diff --git a/content/docs/getting_started/_index.md b/content/docs/getting_started/_index.md
index c5fbfe9..ca972a9 100644
--- a/content/docs/getting_started/_index.md
+++ b/content/docs/getting_started/_index.md
@@ -9,6 +9,7 @@ sidebar:
 Look through our [GitHub Organization](https://github.com/orgs/dataforcanada/repositories) to see ongoing work being done in this project.
 
 {{< cards >}}
+  {{< card link="/docs/getting_started/file_naming_convention/" title="File Naming Convention" icon="cog" >}}
   {{< card link="https://data-01.labs.dataforcanada.org/processed/" title="Data in The Lab" icon="cloud-download" >}}
   {{< card link="https://data-01.dataforcanada.org/processed/" title="Production Data" icon="check-circle" >}}
   {{< card link="/docs/processes/" title="Processes" icon="cog" >}}
diff --git a/content/docs/getting_started/file_naming_convention.md b/content/docs/getting_started/file_naming_convention.md
new file mode 100644
index 0000000..73ce5ca
--- /dev/null
+++ b/content/docs/getting_started/file_naming_convention.md
@@ -0,0 +1,104 @@
+---
+title: File Naming Convention
+weight: 2
+next: /docs/processes
+sidebar:
+  open: true
+---
+
+## Data for Canada: File Naming Convention (DFC-FNC)
+
+### 1. The Current Schema
+
+All published datasets must adhere to the following structure to ensure files are machine-parsable, sortable by region, and identifiable by human readers. **This file naming convention will be modified as we solidify our processes**.
+
+{{< callout >}}
+We are open to feedback on the current file naming convention.
+{{< /callout >}}
+
+#### **Syntax**
+
+`[iso-region]_[source-identifier]_[theme]_[iso-date]_[variant].[extension]`
+
+#### **Component Breakdown**
+
+| Segment | Definition | Format / Rules | Example |
+| --- | --- | --- | --- |
+| **1. ISO Region** | The ISO 3166-2 code for the jurisdiction. | Lowercase. Hyphenated. | `ca-ab`, `ca` |
+| **_** | *Separator* | Underscore | |
+| **2. Source / Location ID** | **Organization** OR **Location**. | Use `[colloquial-name]-[dguid]` for locations, OR `[organization-name]` for national bodies. | `edmonton-2023A00054811061` OR `statcan` |
+| **_** | *Separator* | Underscore | |
+| **3. Theme** | The primary category or title of the dataset. | Lowercase. **snake_case** allowed for longer titles. | `orthoimagery`, `open_database_of_buildings` |
+| **_** | *Separator* | Underscore | |
+| **4. ISO Date** | The vintage of the data source. | **ISO 8601**. Flexible precision. | `2023`, `2023-06`, `2023-06-01` |
+| **_** | *Separator* | Underscore | |
+| **5. Variant** | Resolution or specific subset info. | **No Projections.** Alphanumeric. Units included. | `075mm`, `30cm` |
+
+## 2. Component Detail
+
+### A. Source / Location ID (Flexible)
+
+This segment defines the "Who" or "Where" of the dataset.
+
+* **For Geographic Datasets:** Use the **Colloquial Name** + **Hyphen** + **DGUID**.
+* *Example:* `edmonton-2023A00054811061`
+
+
+* **For Organization Datasets:** Use the **Organization Acronym** when the data is national or not tied to a single DGUID.
+* *Example:* `statcan`, `cmhc`, `nrcan`
+
+
+
+### B. The DGUID (Capitalization Exception)
+
+If using a DGUID (Dissemination Geography Unique Identifier), you must adhere to Statistics Canada standards.
+
+* **Link:** [Statistics Canada: DGUID Definition](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/definition-eng.cfm?ID=geo055)
+* **Rule:** While the rest of the filename is lowercase, you **must capitalize the structural type letter** (e.g., 'A' for Administrative areas, 'S' for Statistical areas) within the DGUID.
+* **Example:** `2021A0005...` (Correct) vs `2021a0005...` (Incorrect).
+
+### C. ISO Date Flexibility
+
+Dates strictly follow **ISO 8601**, but the precision can vary based on the nature of the data (Year, Month, or Day).
+
+* **Learn More:** [Wikipedia: ISO 8601 Date and Time Format](https://en.wikipedia.org/wiki/ISO_8601)
+
+**Examples of Date Precision:**
+
+* **Month Precision:** `ca_statcan_national_address_register_2024-12.parquet`
+* **Day Precision:** `ca_statcan_open_database_of_buildings_2025-04-15.parquet`
+
+### D. Variant
+
+This field is strictly for **resolution** (e.g., `075mm`, `1m`) or content subsets.
+
+* **Rule:** **Do not include projection information** (e.g., `EPSG:3857`, `NAD83`) in the filename.
+* **Reasoning:** Projection details are handled exclusively in the file format metadata or the accompanying **[FAIR Data Catalog](https://stac-utils.github.io/stac-geoparquet/latest/spec/stac-geoparquet-spec/)** item.
+
+---
+
+## 3. Example Scenarios
+
+### **Scenario 1: High-Res Orthoimagery (Location Based)**
+
+* **Context:** 7.5cm pixel resolution imagery of Edmonton, Alberta from 2023.
+* **File Name:** `ca-ab_edmonton-2023A00054811061_orthoimagery_2023_075mm.pmtiles`
+* **Reference:** [Preview and Download Orthoimagery](https://www.dataforcanada.org/docs/processes/orthoimagery/#download-and-preview)
+
+### **Scenario 2: National Organization Data (Source Based)**
+
+* **Context:** The Open Database of Buildings released by Statistics Canada on April 15, 2025.
+* **File Name:** `ca_statcan_open_database_of_buildings_2025-04-15.parquet`
+* **Reference:** [Preview and Download Census Data](https://www.dataforcanada.org/docs/processes/statistical_products/statistics_canada/census_data/#how-to-use-the-map-preview)
+
+---
+
+## 4. Helper Tools
+
+### **Statistics Canada Geography Search**
+
+To accurately populate the **Location ID** segment of the schema, use this tool to find 2021 Census geographies and their corresponding DGUIDs.
+
+* **Tool URL:** [https://statcan-geography.labs.dataforcanada.org/](https://statcan-geography.labs.dataforcanada.org/)
+* **Source Code:** [GitHub Repository](https://github.com/dataforcanada/statcan-geography.labs.dataforcanada.org)
+* **Usage:** Enter a city or region name to retrieve the correct colloquial name and DGUID pairing (e.g., searching "Ottawa" returns `2021A00053506008`).
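
Because the schema is meant to be machine-parsable, a reviewer may find it useful to see how a file name could be split back into its segments. The following is a minimal sketch, not part of this PR or of any Data for Canada tooling. The names `DfcFileName` and `parse_dfc_filename` are hypothetical. It assumes the variant segment is optional (Scenario 2 has none), treats the DGUID pattern as a loose shape check (four-digit vintage, one uppercase type letter, four-digit schema, then an identifier), and does not verify that the region is a real ISO 3166-2 code or that the variant is free of projection information.

```python
import datetime
import re
from dataclasses import dataclass
from typing import Optional

# Loose shape checks only; these patterns are assumptions, not official rules.
REGION_RE = re.compile(r"[a-z]{2}(-[a-z0-9]{1,3})?")      # e.g. ca, ca-ab
DATE_RE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")          # year, month or day
DGUID_RE = re.compile(r"\d{4}[A-Z]\d{4}[0-9A-Za-z]+")     # uppercase type letter
DGUID_ANY_CASE_RE = re.compile(DGUID_RE.pattern, re.IGNORECASE)
NAME_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")           # edmonton, st-johns
WORD_RE = re.compile(r"[a-z0-9]+")                        # one theme word or a variant


@dataclass(frozen=True)
class DfcFileName:
    region: str
    source: str
    dguid: Optional[str]
    theme: str
    date: str
    variant: Optional[str]
    extension: str


def parse_dfc_filename(filename: str) -> DfcFileName:
    """Split a file name into DFC-FNC segments or raise ValueError."""
    stem, dot, extension = filename.rpartition(".")
    if not dot or not stem or not extension:
        raise ValueError("expected '<name>.<extension>'")
    tokens = stem.split("_")
    if len(tokens) < 4:
        raise ValueError("expected at least region, source, theme and date")
    region, source, rest = tokens[0], tokens[1], tokens[2:]
    if not REGION_RE.fullmatch(region):
        raise ValueError(f"bad region segment: {region!r}")

    # Source is either '<colloquial-name>-<DGUID>' or an organization name.
    name, _, tail = source.rpartition("-")
    dguid = None
    if DGUID_ANY_CASE_RE.fullmatch(tail):
        if not DGUID_RE.fullmatch(tail):
            raise ValueError("the DGUID type letter must be uppercase")
        if not NAME_RE.fullmatch(name):
            raise ValueError("a DGUID needs a lowercase colloquial name")
        dguid = tail
    elif not NAME_RE.fullmatch(source):
        raise ValueError(f"bad source segment: {source!r}")

    # The theme may contain underscores, so locate the date from the right.
    # A units-free numeric variant such as '1000' would be misread as a date.
    if DATE_RE.fullmatch(rest[-1]):
        theme_words, iso_date, variant = rest[:-1], rest[-1], None
    elif len(rest) >= 3 and DATE_RE.fullmatch(rest[-2]):
        theme_words, iso_date, variant = rest[:-2], rest[-2], rest[-1]
    else:
        raise ValueError("no ISO 8601 date segment found")
    if not all(WORD_RE.fullmatch(word) for word in theme_words):
        raise ValueError("theme must be lowercase snake_case")
    if variant is not None and not WORD_RE.fullmatch(variant):
        raise ValueError(f"bad variant segment: {variant!r}")

    # Pad reduced-precision dates so the calendar values can be checked.
    datetime.date.fromisoformat((iso_date + "-01-01")[:10])

    return DfcFileName(region, source, dguid, "_".join(theme_words),
                       iso_date, variant, extension)
```

Called on the Scenario 1 file name, this sketch yields region `ca-ab`, DGUID `2023A00054811061`, theme `orthoimagery`, date `2023`, and variant `075mm`. The date is located from the right-hand side because the theme itself may contain underscores, which makes a plain left-to-right split ambiguous.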