| title | toc |
|---|---|
| Welcome to Data for Canada | false |
Mission
Data for Canada exists to bridge the gap between open data availability and data usability. We curate, clean, and re-engineer high-value Canadian datasets into high-performance, analysis-ready formats for researchers, developers, and systems.
The Problem
Canada publishes large volumes of open data, from foundational road networks to federal census statistics. However, these datasets are often locked in legacy formats, scattered across fragmented portals, or structured in ways that require significant engineering effort to normalize before use. For a researcher or a system developer, data preparation is often the bottleneck in "time-to-insight".
The Solution
We act as the transformation layer. We aggregate datasets with permissive licenses and process them into "digestible", standardized formats (Parquet, PMTiles, and FlatGeobuf) optimized for modern downstream applications.
- For Researchers: Skip the cleaning phase. Access normalized, documented data ready for analysis.
- For Systems: Standardized data structures designed to feed directly into pipelines, data warehouses, and downstream services.
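As a rough illustration of what "skipping the cleaning phase" saves, the sketch below normalizes a small, made-up CSV extract using only the Python standard library. The sample values, the column names, and the `snake_case` and `normalize_rows` helpers are hypothetical and are not taken from any Data for Canada pipeline.

```python
import csv
import io

# Hypothetical raw extract (made-up values): inconsistent headers,
# padded strings, and thousands separators inside quoted numbers.
RAW_CSV = """Geo Name , Population (2021),Land Area (km2)
 Region A ,"12,345",100.5
 Region B ,"6,789",42
"""


def snake_case(header: str) -> str:
    """Lowercase a header and collapse non-alphanumeric runs into underscores."""
    cleaned = "".join(ch if ch.isalnum() else "_" for ch in header.strip().lower())
    return "_".join(part for part in cleaned.split("_") if part)


def normalize_rows(raw_text: str) -> list[dict]:
    """Return rows with snake_case keys, trimmed strings, and parsed numbers."""
    reader = csv.reader(io.StringIO(raw_text))
    headers = [snake_case(h) for h in next(reader)]
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        row = {}
        for key, value in zip(headers, record):
            text = value.strip()
            number = text.replace(",", "")  # drop thousands separators
            try:
                row[key] = int(number)
            except ValueError:
                try:
                    row[key] = float(number)
                except ValueError:
                    row[key] = text  # not numeric: keep the trimmed string
        rows.append(row)
    return rows


rows = normalize_rows(RAW_CSV)
```

After this step every row shares the same keys and types, which is the property a downstream loader or warehouse needs before it can ingest the data without custom handling.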
High-Level Overview
flowchart TD
%% Define Styles
classDef source fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef process fill:#fff9c4,stroke:#fbc02d,stroke-width:2px
classDef storage fill:#e0f2f1,stroke:#00695c,stroke-width:2px
classDef consumer fill:#f3e5f5,stroke:#8e24aa,stroke-width:2px
subgraph "Data Sources"
StatProducts[("Statistical Products")]:::source
Orthoimagery[("Orthoimagery")]:::source
end
subgraph "Processing Pipeline"
Raw[Raw Data Ingestion<br/>CSVs, Shapefiles, ECW]:::process
Transform[Transformation Engine]:::process
Opt[Optimization]:::process
end
subgraph "Dissemination Formats"
Parquet[("Parquet Files")]:::storage
PMTiles[("PMTiles")]:::storage
FlatGeoBuf[("FlatGeobuf")]:::storage
end
subgraph "Distribution Infrastructure"
ObjectStorage[Object Storage]:::storage
DecentralizedDistribution[Decentralized Distribution]:::storage
Serverless[Serverless Worker<br/>API & Serving]:::storage
end
subgraph "Consumption / End Users"
DataSci[DuckDB, Python, R, Julia, QGIS, Jupyter]:::consumer
WebApps[Web Applications]:::consumer
Systems[Systems]:::consumer
end
%% Relationships
StatProducts --> Raw
Orthoimagery --> Raw
Raw --> Transform
Transform --> Opt
Opt --> Parquet
Opt --> PMTiles
Opt --> FlatGeoBuf
Parquet --> ObjectStorage
PMTiles --> ObjectStorage
FlatGeoBuf --> ObjectStorage
ObjectStorage --> DecentralizedDistribution
ObjectStorage --> Serverless
DecentralizedDistribution --> Systems
Serverless --> WebApps
Serverless --> DataSci
%% Concept Annotations
Transform -.->|"Join Spatial & Tabular"| Parquet
PMTiles -.->|"Stream"| WebApps
FlatGeoBuf -.->|"Stream"| DataSci
FlatGeoBuf -.->|"Stream"| WebApps
click StatProducts "https://www.dataforcanada.org/docs/processes/statistical_products/" _blank
click Orthoimagery "https://www.dataforcanada.org/docs/processes/statistical_products/" _blank
click DecentralizedDistribution "https://www.dataforcanada.org/docs/dissemination/" _blank
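
The "Stream" edges in the diagram rely on a property that PMTiles and FlatGeobuf share: a client can fetch only the bytes it needs from a single file, typically through HTTP range requests. The sketch below shows the idea with the Python standard library only, using an in-memory buffer as a stand-in for object storage. The helper names `range_header` and `read_range` and the toy byte layout are assumptions for illustration; real readers parse each format's own header and index to find the offsets.

```python
import io


def range_header(offset: int, length: int) -> dict:
    """Build the HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_range(blob: io.BufferedIOBase, offset: int, length: int) -> bytes:
    """Read one byte range from a seekable file-like object (stand-in for a remote file)."""
    blob.seek(offset)
    return blob.read(length)


# Toy "archive": a 4-byte header followed by two fixed-size 8-byte payloads.
archive = io.BytesIO(b"HEAD" + b"tile-one" + b"tile-two")

header = read_range(archive, 0, 4)        # the client reads the header first
second_tile = read_range(archive, 12, 8)  # then only the payload it needs
```

Against a real object store, the dictionary returned by `range_header(12, 8)` would be sent with a GET request in place of the local `seek` and `read`, so the client never downloads the whole file.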