| title | toc |
|---|---|
| Welcome to Data for Canada | false |
Mission
Data for Canada exists to bridge the gap between open data availability and data usability. We curate, clean, and re-engineer high-value Canadian datasets into high-performance, analysis-ready formats for researchers, developers, and systems.
The Problem
Canada publishes vast amounts of open data, from foundational road networks to federal census statistics and orthoimagery. However, these datasets are often locked in legacy formats, scattered across fragmented portals, or structured in ways that require significant engineering effort to normalize. For a researcher or system developer, data preparation is often the bottleneck on "time-to-insight".
The Solution
We act as the transformation layer. We aggregate datasets with permissive licenses and process them into "digestible" standards optimized for modern downstream applications.
- For Data Engineers, Researchers/Scientists, and Developers: Skip the cleaning phase. Access normalized, documented data ready for analysis.
- For Systems: Standardized data structures designed to feed directly into pipelines, data warehouses, and downstream services.
Our Stewardship: Data for Canada takes ownership of the datasets we create, from start to finish. We ensure that data structures remain consistent, allowing for reliable analysis across time and space.
What Guides Us
We prioritize our work on a utilitarian basis, aiming to provide the greatest good to the greatest number of people, though we remain open to making exceptions where necessary.
Our approach is informed by key federal digital standards:
- Guidance on assessing readiness to manage data according to Findable, Accessible, Interoperable, Reusable (FAIR) principles
- GC White Paper: Data Sovereignty and Public Cloud
High-Level Overview
flowchart TD
subgraph ds [Data Sources]
Statistical@{ shape: lean-l}
Foundation@{ shape: lean-l}
Orthoimagery@{ shape: lean-l}
FieldImagery@{ shape: lean-l, label: "Field Imagery"}
EnvironmentClimate@{ shape: lean-l, label: "Environmental & Climate"}
Elevation@{ shape: lean-l}
WebCorpus@{ shape: lean-l, label: "Web Corpus"}
end
subgraph pp [Processing Pipeline]
Raw@{ shape: rect, label: "Raw Data Ingestion"}
Transform@{ shape: rect, label: "Transform and Optimize"}
end
subgraph df [Dissemination Formats]
Parquet@{ shape: lean-l}
FlatGeoBuf@{ shape: lean-l}
MVT@{ shape: lean-l}
MLT@{ shape: lean-l}
PMTiles@{ shape: lean-l}
COG@{ shape: lean-l}
Zarr@{ shape: lean-l}
WebP@{ shape: lean-l}
JPEGXL@{ shape: lean-l, label: "JPEG XL"}
AV1@{ shape: lean-l, label: "AV1"}
end
subgraph di [Distribution Infrastructure]
ObjectStorage@{ shape: bow-rect, label: "Object Storage"}
Metadata@{ shape: rect}
HTTP@{ shape: rect, label: "Static Files"}
DecentralizedDistribution@{ shape: rect, label: "Decentralized Distribution"}
end
subgraph ei [Experimental Infrastructure]
GeoSpatialServices@{ shape: rect, label: "Geospatial Services"}
%%Martin@{ shape: rect}
%%GeoServer@{ shape: rect}
%%ZOOProject@{ shape: rect, label: "ZOO-Project"}
%%BBOXServer@{ shape: rect, label: "BBOX Server"}
%%Panoramax@{ shape: rect}
%%Pelias@{ shape: rect}
end
subgraph "Consumption"
DataSci@{ shape: rect, label: "Researchers & Developers"}
Systems@{ shape: rect, label: "Systems"}
end
%% Relationships
Statistical a1@--> Raw
a1@{animate: true, animation: slow}
Foundation a2@--> Raw
a2@{animate: true, animation: slow}
Orthoimagery a3@--> Raw
a3@{animate: true, animation: slow}
FieldImagery a4@--> Raw
a4@{animate: true, animation: fast}
EnvironmentClimate a5@--> Raw
a5@{animate: true, animation: fast}
Elevation a6@--> Raw
a6@{animate: true, animation: slow}
WebCorpus a7@--> Raw
a7@{animate: true, animation: fast}
Raw a8@--> Transform
a8@{animate: true, animation: slow}
Transform a9@--> df
a9@{animate: true, animation: slow}
Parquet a10@--> FlatGeoBuf
a10@{animate: true, animation: slow}
FlatGeoBuf a11@--> MVT
a11@{animate: true, animation: slow}
FlatGeoBuf a91@--> MLT
a91@{animate: true, animation: slow}
MVT a90@--> PMTiles
a90@{animate: true, animation: slow}
MLT a92@--> PMTiles
a92@{animate: true, animation: slow}
Zarr a12@--> WebP
a12@{animate: true, animation: slow}
df a13@--> di
a13@{animate: true, animation: slow}
COG a14@--> WebP
a14@{animate: true, animation: slow}
WebP a93@--> PMTiles
a93@{animate: true, animation: slow}
ObjectStorage a15@--> Metadata
a15@{animate: true, animation: slow}
Metadata a16@--> HTTP
a16@{animate: true, animation: slow}
HTTP a17@--> ei
a17@{animate: true, animation: slow}
HTTP a18@--> DecentralizedDistribution
a18@{animate: true, animation: slow}
HTTP a19@--> DataSci
a19@{animate: true, animation: slow}
DecentralizedDistribution a20@--> Systems
a20@{animate: true, animation: fast}
DecentralizedDistribution a21@--> DataSci
a21@{animate: true, animation: fast}
Systems a22@--> DataSci
a22@{animate: true, animation: fast}
ei a23@--> DataSci
a23@{animate: true, animation: slow}
%% URLs
click Foundation "https://github.com/dataforcanada/process-foundation-labs/" _blank
click Statistical "https://github.com/dataforcanada/process-statistical-labs/" _blank
click Orthoimagery "https://github.com/dataforcanada/process-orthoimagery-labs/" _blank
click FieldImagery "https://github.com/dataforcanada/process-field-imagery-labs/" _blank
click EnvironmentClimate "https://github.com/dataforcanada/process-environmental-climate-labs/" _blank
click Elevation "https://www.dataforcanada.org/docs/dissemination/" _blank
click WebCorpus "https://github.com/dataforcanada/process-web-corpus-labs/" _blank
click Parquet "https://github.com/apache/parquet-format/" _blank
click FlatGeoBuf "https://flatgeobuf.org/" _blank
click MVT "https://github.com/mapbox/vector-tile-spec/" _blank
click MLT "https://github.com/maplibre/maplibre-tile-spec/" _blank
click COG "https://cogeo.org/" _blank
click Zarr "https://github.com/zarr-developers/geozarr-spec/" _blank
click WebP "https://developers.google.com/speed/webp/" _blank
click PMTiles "https://github.com/protomaps/PMTiles/blob/main/spec/v3/spec.md" _blank
click JPEGXL "https://jpeg.org/jpegxl/" _blank
click AV1 "https://aomedia.org/specifications/av1/" _blank
click DecentralizedDistribution "https://www.dataforcanada.org/docs/dissemination/" _blank
click Metadata "https://stac-utils.github.io/stac-geoparquet/latest/spec/stac-geoparquet-spec/" _blank
click GeoSpatialServices "https://github.com/dataforcanada/geo-services-labs/" _blank
click Martin "https://martin.maplibre.org/" _blank
click GeoServer "https://geoserver.org/" _blank
click ZOOProject "https://zoo-project.org/" _blank
click BBOXServer "https://www.bbox.earth/" _blank
click Panoramax "https://gitlab.com/panoramax" _blank
click Pelias "https://pelias.io" _blank
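
The arrows between dissemination formats in the flowchart can also be read as a small directed graph. The sketch below is illustrative only: it transcribes those edges into a Python dictionary, assuming each arrow means "is converted into", and walks them to list the formats derived downstream of a given one. The names `FORMAT_EDGES` and `downstream_formats` are hypothetical and are not part of any Data for Canada tooling.

```python
from collections import deque

# Edges between dissemination formats, transcribed from the flowchart.
# Assumption: an arrow "A --> B" means format A is converted into format B.
FORMAT_EDGES = {
    "Parquet": ["FlatGeoBuf"],
    "FlatGeoBuf": ["MVT", "MLT"],
    "MVT": ["PMTiles"],
    "MLT": ["PMTiles"],
    "Zarr": ["WebP"],
    "COG": ["WebP"],
    "WebP": ["PMTiles"],
}


def downstream_formats(start: str) -> list[str]:
    """Return every format reachable from `start`, in breadth-first order."""
    seen: list[str] = []
    queue = deque(FORMAT_EDGES.get(start, []))
    while queue:
        fmt = queue.popleft()
        if fmt in seen:
            continue  # already visited through another branch
        seen.append(fmt)
        queue.extend(FORMAT_EDGES.get(fmt, []))
    return seen


if __name__ == "__main__":
    # JPEG XL and AV1 have no outgoing arrows in the diagram, so they yield [].
    for source in ("Parquet", "COG", "AV1"):
        print(source, "->", downstream_formats(source))
```

Read this way, vector data flows from Parquet through FlatGeoBuf and the tile encodings into PMTiles, while raster data reaches PMTiles through WebP.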