From 05eef51453108eaa234682524b9a54ac9f8d2283 Mon Sep 17 00:00:00 2001
From: Diego Ripley
Date: Fri, 13 Feb 2026 11:24:35 -0500
Subject: [PATCH] improvements to file naming convention

---
 .../getting_started/file_naming_convention.md | 24 ++++++++++++-------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/content/docs/getting_started/file_naming_convention.md b/content/docs/getting_started/file_naming_convention.md
index 7b632d9..16ae67a 100644
--- a/content/docs/getting_started/file_naming_convention.md
+++ b/content/docs/getting_started/file_naming_convention.md
@@ -8,6 +8,16 @@ sidebar:
 
 ## Data for Canada: File Naming Convention (DFC-FNC)
 
+## Background
+
+{{< callout type="important" >}}
+  You will need to understand these concepts to fully grasp the file naming convention.
+{{< /callout >}}
+
+See Statistics Canada's [geographic hierarchy](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/fig/index-eng.cfm?ID=F1_1) and use the [Census of Population 2021 Dictionary](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/index-eng.cfm) to understand their conceptual model of representing Canada.
+
+![geographic hierarchy](geographic-hiearchy.svg).
+
 ### 1. The Current Schema
 
 All published datasets must adhere to the following structure to ensure files are machine-parsable, sortable by region, and identifiable by human readers. **This file naming convention will be modified as we solidify our processes**.
@@ -29,7 +39,7 @@ We are open to feedback on the current file naming convention.
 | :--- | :--- | :--- | :--- |
 | **1. ISO Region** | The ISO 3166-2 code for the jurisdiction. | Lowercase. Hyphenated. | `ca-ab`, `ca` |
 | **_** | *Separator* | Underscore | |
-| **2. Data Source and DGUID** | **Data Source**. | Use `[data-source-name]` for the data source. | `city-of-edmonton-2023A00054811061` OR `statcan` |
+| **2. Data Source and DGUID** | **Data Source and DGUID**. | Use `[data-source-name]` for the data source and the DGUID for the geographic area it covers. | `city-of-edmonton-2023A00054811061`, `statcan-2021A000011124` |
 | **_** | *Separator* | Underscore | |
 | **3. Theme** | The primary category or title of the dataset. | Lowercase. **snake_case** allowed for longer titles. | `orthoimagery` |
 | **_** | *Separator* | Underscore | |
@@ -43,12 +53,10 @@ We are open to feedback on the current file naming convention.
 
 ### A. Source / Location ID (Flexible)
 
-This segment defines the "Who" or "Where" of the dataset.
+This segment defines the "Who" of the dataset.
 
-* **For Geographic Datasets:** Use the **Data Source Name** + **Hyphen** + **DGUID**.
+* Use the **Data Source Name** + **Hyphen** + **DGUID**.
     * *Example:* `city-of-edmonton-2023A00054811061`
-* **For Organization Datasets:** Use the **Organization Acronym** when the data is national or not tied to a single DGUID.
-    * *Example:* `statcan`, `cmhc`, `nrcan`
 
 ### B. The DGUID (Capitalization Exception)
 
@@ -66,8 +74,8 @@ Dates follow strictly **ISO 8601**, but the precision can vary based on the natu
 
 **Examples of Date Precision:**
 
-* **Month Precision:** `ca_statcan_national_address_register_2024-12_v0.0.1.parquet`
-* **Day Precision:** `ca_statcan_open_database_of_buildings_2025-04-15_v0.0.1.parquet`
+* **Month Precision:** `ca_statcan-TOFILL_2024-12_v0.0.1.parquet`
+* **Day Precision:** `ca_statcan-TOFILL_2025-04-15_v0.0.1.parquet`
 
 ### D. Variant
 
@@ -99,7 +107,7 @@ We use **SemVer** (`vMAJOR.MINOR.PATCH`) to track changes to datasets.
 ### **Scenario 2: National Organization Data (Source Based)**
 
 * **Context:** The Open Database of Buildings released by Statistics Canada on April 15, 2025.
-* **File Name:** `ca_statcan_census_pop_dissemination_areas_tabular_2021_v0.0.1.parquet`
+* **File Name:** `ca_statcan_TOFILL_2021_v0.0.1.parquet`
 * **Reference:** [Preview and Download Census Data](https://www.dataforcanada.org/docs/processes/statistical_products/statistics_canada/census_data/#how-to-use-the-map-preview)
 
 ## 4. Helper Tools
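
As a rough illustration of the schema described in this patch (not an official Data for Canada tool), the Python sketch below splits a file name into its segments. The function name `parse_dfc_filename`, the regular expression, and the sample file names are assumptions made for this example. In particular, the DGUID is assumed to be a four-digit vintage, one uppercase type letter, a four-digit schema, and an alphanumeric identifier. The theme and the optional variant are returned together, since both may contain underscores and cannot be separated by position alone.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern inferred from the naming schema in this document.
# It requires a DGUID after the data source name, as the updated table does.
FILENAME_RE = re.compile(
    r"^(?P<region>[a-z]{2}(?:-[a-z]{2,3})?)"       # ISO region, e.g. ca or ca-ab
    r"_(?P<source>[a-z0-9]+(?:-[a-z0-9]+)*?)"       # data source name, hyphenated
    r"-(?P<dguid>\d{4}[A-Z]\d{4}[0-9A-Za-z]+)"      # DGUID (assumed layout, keeps its capital letter)
    r"_(?P<theme_and_variant>[a-z0-9_]+)"           # theme plus optional variant, snake_case
    r"_(?P<date>\d{4}(?:-\d{2}(?:-\d{2})?)?)"       # ISO 8601 date: year, month, or day precision
    r"_v(?P<version>\d+\.\d+\.\d+)"                 # SemVer, e.g. v0.0.1
    r"\.(?P<ext>[a-z0-9]+)$"                        # file extension
)


def parse_dfc_filename(name: str) -> Optional[Dict[str, str]]:
    """Return the named segments of a file name, or None if it does not match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


# Made-up file name that follows the schema; the extension is illustrative.
parts = parse_dfc_filename(
    "ca-ab_city-of-edmonton-2023A00054811061_orthoimagery_2023_v0.0.1.tif"
)
print(parts)
```

A check like this could run before publishing a file, so that names missing a DGUID or using uppercase in the region segment are rejected early.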