---
theme: seriph
layout: cover
# some information about your slides (markdown enabled)
title: Data for Canada / the Universe Background and Strategy
author: Diego Ripley
# apply UnoCSS classes to the current slide
class: text-center
# https://sli.dev/features/drawing
drawings:
  persist: false
# slide transition: https://sli.dev/guide/animations.html#slide-transitions
transition: slide-up
# enable Comark Syntax: https://comark.dev/syntax/markdown
comark: true
# duration of the presentation
duration: 35min
hideInToc: true
favicon: 'https://www.dataforcanada.org/favicon.svg'
routerMode: hash
---

# Data for Canada / the Universe

## Background and Strategy

Presented By: Diego Ripley

Date: April 10, 2026

---
layout: cover
hideInToc: true
---

"Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space."

Douglas Adams, Hitchhiker's Guide to the Galaxy, #1

---
layout: cover
transition: slide-left
hideInToc: true
---
GitHub Issue
---
layout: two-cols-header
zoom: 1.2
transition: slide-left
hideInToc: true
---

# Notes

::left::

- Please hold questions until after the presentation

---
hideInToc: true
transition: slide-left
zoom: 0.8
---

# Table of Contents

---
layout: center
hideInToc: true
---

# Guided By

* [Cloud-Native Geospatial Storage Cheatsheet](https://bdon.github.io/cng-storage-guide/)
* [Guidance on Assessing Readiness to Manage Data According to the Findable, Accessible, Interoperable and Reusable (FAIR) Principles](https://www.canada.ca/en/government/system/digital-government/digital-government-innovations/information-management/guidance-assessing-readiness-manage-data-according-findable-accessible-interoperable-reusable-principles.html)
* [Cloud-Optimized Geospatial Formats Guide](https://guide.cloudnativegeo.org/)
* [Link rot in LIS literature: a 20-year study of web citation decay, recovery and preservation challenges](https://doi.org/10.1108/AJIM-05-2025-0286)
* [Science Needs a Social Network for Sharing Big Data](https://hackmd.io/wKKm4cIDR6a9kYwZ3srVFg?view)
* [Sustainability of Digital Formats: Planning for Library of Congress Collections](https://www.loc.gov/preservation/digital/formats/index.html)
* [GC White Paper: Data Sovereignty and Public Cloud](https://www.canada.ca/en/government/system/digital-government/digital-government-innovations/cloud-services/digital-sovereignty/gc-white-paper-data-sovereignty-public-cloud.html)

---
layout: center
transition: slide-left
hideInToc: true
---

# In Plain Words

- Make sure that data lasts as long as humanly possible, presenting all perspectives, by creating efficient data and processes for long-term archival.
- I want what I am building to be in libraries.
- Create processes, tools, infrastructure, and datasets to empower everyday citizens, to make their lives just a little easier, and to filter out the noise.
- Yes, **ethics** is at the core of everything.

---
layout: center
---

- And for any citizen to contribute to its resilience.
This is why my project makes use of P2P technologies (ex. [BitTorrent](https://tixati.com/specs/bittorrent), [IPFS](https://ipfs.tech/), [libp2p](https://libp2p.io/)).
- Hence the [data dissemination strategy](https://www.dataforcanada.org/docs/d4c-infra-distribution/).

---
layout: center
---

```mermaid {scale: 0.5}
flowchart TD
    classDef linkNode stroke:#0000EE,color:#0000EE,stroke-width:2px;

    subgraph mirrors [Mirrors & Preservation]
        SourceCoop[Source Cooperative]
        Tigris[Tigris]
        Community[Community]
        Cloudflare
        Zenodo[Zenodo]
        InternetArchive[Internet Archive]
        Metadata[FAIR Data Catalogue]
    end

    Sources[Open Data Sources]
    Processes[Data Packages]
    Artifacts[Systems-Ready Data]
    P2P["P2P Technology"]

    subgraph Consumers [Consumption]
        Users[Data People & Developers]
        Systems[Systems]
    end

    %% Flow with Animations
    Sources a1@<--> Processes
    a1@{animate: true, animation: slow}
    Processes a2@<--> Artifacts
    a2@{animate: true, animation: slow}
    Artifacts a3@<--> Metadata
    a3@{animate: true, animation: fast}
    Metadata a20@<--> SourceCoop
    a20@{animate: true, animation: slow}
    Metadata a21@<--> Tigris
    a21@{animate: true, animation: fast}
    Metadata a22@<--> Community
    a22@{animate: true, animation: fast}
    Metadata a23@<--> Zenodo
    a23@{animate: true, animation: slow}
    Metadata a24@<--> Cloudflare
    a24@{animate: true, animation: fast}
    Metadata a25@<--> InternetArchive
    a25@{animate: true, animation: slow}

    %% Mirror Connections
    mirrors a12@<--> Consumers
    a12@{animate: true, animation: slow}

    %% Hint, the FAIR Data Catalogue can also be decentralized 🤯
    %%Metadata a30@<--> P2P
    %%a30@{animate: true, animation: fast}
    mirrors a9@<--> P2P
    a9@{animate: true, animation: fast}

    %% P2P Connections
    P2P a10@<--> Consumers
    a10@{animate: true, animation: fast}

    style Sources fill:#FFB74D,stroke:#EF6C00,color:#000000
    style Artifacts fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    %% Opera concertmaster
    style Metadata fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    class Metadata Metadata
    style Processes fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    class Processes Processes
    style SourceCoop fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    style Tigris fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    style Cloudflare fill:#FFB74D,stroke:#EF6C00,color:#000000
    style Zenodo fill:#FFB74D,stroke:#EF6C00,color:#000000
    style Community fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    style P2P fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    style InternetArchive fill:#66BB6A,stroke:#2E7D32,color:#000000
    style Users fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    style Systems fill:#B71C1C,stroke:#7F0000,color:#FFFFFF

    %% Click Actions
    click P2P "https://libp2p.io/" _blank
    click Tigris "https://d4c-pkgs.t3.storage.dev/" _blank
    click Sources "https://www.dataforcanada.org/#high-level-overview" _blank
    click Processes "https://www.dataforcanada.org/docs/d4c-pkgs/" _blank
    click Metadata "https://stac-utils.github.io/stac-geoparquet/latest/spec/stac-geoparquet-spec/" _blank
    click Zenodo "https://zenodo.org/communities/dataforcanada/" _blank
    click SourceCoop "https://source.coop/dataforcanada/" _blank
    click InternetArchive "https://archive.org/details/@diegoripley/uploads/" _blank

    %% APPLY STYLES TO LINKED NODES
    class Sources linkNode
```

---
layout: center
level: 1
---

# Solutions

---
layout: center
level: 2
---

# Assess

Mapping **data portals** and all **data assets** in Canada.

---
layout: iframe-unscaled
url: https://directory.opendatasociety.ca/directory
level: 2
---

---
layout: center
level: 2
---

## Rank

- Rank datasets according to impact on Canadians (ex. COVID-19 death cases by [dissemination block](https://www150.statcan.gc.ca/n1/pub/92-195-x/2021001/geo/db-id/db-id-eng.htm)).

---
layout: iframe-unscaled
url: https://dataindex.us/collections/
level: 2
---

---
layout: iframe-unscaled
url: ./98-301-x2021001-eng.pdf
level: 2
---

---
layout: center
level: 2
---

# Archive

- Download datasets and convert them into efficient file formats for long-term storage.
- Make them available to the community via something like [Backblaze B2 Overdrive](https://www.backblaze.com/cloud-storage/b2-overdrive), which has a throughput ranging from 100 Gbps up to 1 Tbps (minimum 1 PB commitment).
  - $15 USD / TB
  - $15K USD per month, $180K USD per year (at the 1 PB minimum)
- Assign unique identifiers to the datasets.

---
layout: center
level: 2
---

- Download them via something like [geoparquet-io](https://geoparquet.io/), which enables downloading from Esri data portals and WFS servers. It supports both vector data and raster data.

---
layout: full
level: 2
zoom: 0.9
---

# File Formats

```mermaid {scale: 0.25}
flowchart TD
    classDef high fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    classDef med fill:#FBC02D,stroke:#F9A825,color:#000000
    classDef low fill:#66BB6A,stroke:#2E7D32,color:#000000
    classDef medOrange fill:#FFCC80,stroke:#FB8C00,color:#000000
    classDef darkOrange fill:#EF6C00,stroke:#E65100,color:#000000
    classDef highLight fill:#FFCDD2,stroke:#E57373,color:#000000
    classDef white fill:#fff,stroke:#388E3C,color:#000000

    subgraph sot [Long-Term Storage]
        Parquet["Parquet"]:::highLight
        Lance:::high
        FlatCityBuf:::high
        Zarr:::highLight
        GeoTIFF:::medOrange
        JPEGXL["JPEG XL"]:::highLight
        AV1:::highLight
        FAIRCat["FAIR Data Catalogue"]:::high
    end

    FlatGeoBuf:::med

    subgraph vt [Vector Tiles]
        VectorTiles["Mapbox Vector Tiles"]:::low
        NextGenVT["Next-Gen Vector Tiles"]:::high
        GLB["glTF GLB"]:::high
    end

    subgraph visuals [Imagery]
        AVIF:::high
        WebP:::medOrange
        JPG:::low
        PNG:::low
    end

    subgraph pkg [Portable Databases]
        PMTiles:::medOrange
        SQLite:::darkOrange
    end

    subgraph ent [Enterprise]
        FileGDB["File Geodatabase"]:::white
    end

    sot <--> FlatGeoBuf
    FlatGeoBuf --> vt
    sot <--> visuals
    vt <--> pkg
    visuals <--> pkg
    sot <--> ent
    visuals --> ent

    style sot fill:#EF9A9A,stroke:#C62828,color:#000000
    style vt fill:#FBC02D,stroke:#F9A825,color:#000000
    style visuals fill:#FBC02D,stroke:#F9A825,color:#000000
    style pkg fill:#FFB74D,stroke:#EF6C00,color:#000000
    style ent fill:#66BB6A,stroke:#2E7D32,color:#000000

    click FlatCityBuf "https://github.com/cityjson/flatcitybuf" _blank
    click Parquet "https://github.com/apache/parquet-format/" _blank
    click FlatGeoBuf "https://flatgeobuf.org/" _blank
    click SQLite "https://www.geopackage.org/" _blank
    click FileGDB "https://gdal.org/en/stable/drivers/vector/openfilegdb.html" _blank
    click VectorTiles "https://github.com/mapbox/vector-tile-spec/" _blank
    click NextGenVT "https://github.com/maplibre/maplibre-tile-spec/" _blank
    click GLB "https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html#glb-file-format-specification" _blank
    click Lance "https://docs.lancedb.com/lance" _blank
    click GeoTIFF "https://cogeo.org/" _blank
    click Zarr "https://github.com/zarr-developers/geozarr-spec/" _blank
    click WebP "https://developers.google.com/speed/webp/" _blank
    click PMTiles "https://github.com/protomaps/PMTiles/blob/main/spec/v3/spec.md" _blank
    click JPEGXL "https://jpeg.org/jpegxl/" _blank
    click AV1 "https://aomedia.org/specifications/av1/" _blank
    click FAIRCat "https://stac-utils.github.io/stac-geoparquet/latest/spec/stac-geoparquet-spec/" _blank
```

---
layout: center
level: 2
---

# Standard Interfaces

---
layout: center
level: 2
hideInToc: true
---

## S3

---
layout: center
level: 2
hideInToc: true
---

## P2P

[BitTorrent](https://tixati.com/specs/bittorrent), [IPFS](https://ipfs.tech/), [libp2p](https://libp2p.io/)

SETI@home

---
layout: center
level: 2
hideInToc: true
---

## Other

SSH, etc.

---
layout: center
level: 2
hideInToc: true
---

## Discrete Global Grid Systems (DGGS)

- [Standard](https://docs.ogc.org/DRAFTS/21-038r1.html)
- [Pilot](https://aidggs-pilot.hartis.org/)

*Hint*: it pairs well with MCP

---
layout: iframe-unscaled
level: 2
hideInToc: true
url: https://aidggs-pilot.hartis.org/
---

---
layout: center
level: 2
---

# Unique Identifiers

- ARKs are open, mainstream, non-paywalled, decentralized persistent identifiers that you can start creating in under 48 hours.
They identify anything digital, physical, or abstract.
- Archival Resource Key (ARK)
  - [Spec](https://arks-org.github.io/arkspec/draft-kunze-ark.html), [Overview](https://arks.org/about/ark-overview/)

---
layout: center
level: 2
---

# Ledger

- Unique identifier
- Added/Updated/Deleted
- File hash
- Location
- Reputation
  - across time by stakeholders

---
layout: center
level: 2
---

# Data Packages

```mermaid {scale: 0.6}
flowchart TD
    subgraph ds [Data Sources]
        Statistical@{ shape: lean-l}
        Foundation@{ shape: lean-l}
        EnvClimate@{ shape: lean-l, label: "Environment, Climate, & Health"}
        Orthoimagery@{ shape: lean-l}
        FieldImagery@{ shape: lean-l, label: "Field Imagery"}
        Elevation@{ shape: lean-l}
        WebCorpus@{ shape: lean-l, label: "Web Corpus"}
    end

    DataPkgs@{ shape: rect, label: "Data Packages"}

    Statistical e1@<--> DataPkgs
    e1@{animate: true, animation: slow}
    Foundation e2@<--> DataPkgs
    e2@{animate: true, animation: slow}
    EnvClimate e4@<--> DataPkgs
    e4@{animate: true, animation: fast}
    Orthoimagery e3@<--> DataPkgs
    e3@{animate: true, animation: slow}
    FieldImagery e7@<--> DataPkgs
    e7@{animate: true, animation: fast}
    Elevation e5@<--> DataPkgs
    e5@{animate: true, animation: slow}
    WebCorpus e6@<--> DataPkgs
    e6@{animate: true, animation: fast}

    style EnvClimate fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    style Orthoimagery fill:#FBC02D,stroke:#F9A825,color:#000000
    style FieldImagery fill:#FBC02D,stroke:#F9A825,color:#000000
    style WebCorpus fill:#66BB6A,stroke:#2E7D32,color:#000000
    style Elevation fill:#66BB6A,stroke:#2E7D32,color:#000000
    style Statistical fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    style Foundation fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
    style DataPkgs fill:#B71C1C,stroke:#7F0000,color:#FFFFFF

    classDef linkNode stroke:#333333,color:#333333,stroke-width:1.5px
    class FieldImagery linkNode

    click DataPkgs "https://github.com/dataforcanada/d4c-pkgs" _blank
    click Foundation "https://github.com/dataforcanada/d4c-datapkg-foundation" _blank
    click Statistical "https://github.com/dataforcanada/d4c-datapkg-statistical" _blank
    click Orthoimagery "https://github.com/dataforcanada/d4c-datapkg-orthoimagery" _blank
    click FieldImagery "https://github.com/dataforcanada/d4c-datapkg-field-imagery" _blank
    click EnvClimate "https://github.com/dataforcanada/d4c-datapkg-environment-climate-health" _blank
    click Elevation "https://github.com/dataforcanada/d4c-datapkg-elevation" _blank
    click WebCorpus "https://github.com/dataforcanada/d4c-datapkg-web-corpus" _blank
```

---
layout: center
hideInToc: true
---

All datasets can be downloaded at https://source.coop/dataforcanada

---
layout: center
level: 3
---

# Statistical

- This is how our governments see the world and what they are supposed to use when making decisions.
- We need to request that statistical data be tied to individual authors, so that we can start to trust institutions. If someone's credibility in the community becomes a factor, I believe that individuals will fight to keep it.
- Open processes.

---
layout: center
level: 3
---

---
layout: center
level: 4
hideInToc: true
---

# Statistical Tables

- I did this in 2025 for 7918 Statistics Canada data tables.
- Started with 3314.57 GB of CSVs and turned them into 25.73 GB.
---
layout: iframe-unscaled
hideInToc: true
level: 4
url: https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/
---

---
layout: center
hideInToc: true
level: 4
---

https://source.coop/dataforcanada/d4c-datapkg-statistical/processed/tables

---
layout: center
level: 4
hideInToc: true
---

# Census Data

---
layout: iframe-unscaled
hideInToc: true
level: 4
url: https://docs.google.com/spreadsheets/d/14FmFGaqU7EDZ19zRZXBNX4La4VeIDXa7kbgP_g7ai9s/edit?usp=sharing
---

---
layout: iframe-unscaled
hideInToc: true
level: 4
url: https://static-01.dataforcanada.org/processed/ca_statcan_2021A000011124_d4c-datapkg-statistical_census_pop_dissemination_areas_digital_2021_v0.1.0-beta/#12.2/45.4294/-75.74374/0/60
---

---
layout: iframe-unscaled
hideInToc: true
level: 4
url: https://static-01.dataforcanada.org/processed/ca_statcan_2021A000011124_d4c-datapkg-statistical_census_pop_federal_electoral_districts_2013_representation_order_digital_2021_v0.1.0-beta/#4.93/56.91/-111.54
---

---
layout: center
---

[2021 Census Data](https://www.dataforcanada.org/docs/d4c-pkgs/d4c-datapkg-statistical/statistics_canada/census_data/)

---
layout: center
level: 3
---

# Foundation

- Minimum information that a civilization needs to start from scratch.
- Buildings, roads, address points.
- Placenames
  - [GNBC](https://geonames.nrcan.gc.ca/search-place-names/search)

---
layout: iframe-unscaled
hideInToc: true
level: 4
url: https://pmtiles.io/#url=https%3A%2F%2Fdata.source.coop%2Fdataforcanada%2Fd4c-datapkg-foundation%2Fprocessed%2Fca_statcan_2021A000011124_d4c-datapkg-foundation_open_database_of_buildings_2025-04-15_v0.1.0-beta.pmtiles&map=15.17/45.402295/-75.691511
---

---
layout: iframe-unscaled
hideInToc: true
level: 4
url: https://pmtiles.io/#url=https%3A%2F%2Fdata.source.coop%2Fdataforcanada%2Fd4c-datapkg-foundation%2Fprocessed%2Fca_statcan_2021A000011124_d4c-datapkg-foundation_road_network_2021_v0.1.0-beta.pmtiles&map=15.17/45.402295/-75.691511
---

---
layout: iframe-unscaled
hideInToc: true
level: 4
url: https://pmtiles.io/#url=https%3A%2F%2Fdata.source.coop%2Fdataforcanada%2Fd4c-datapkg-foundation%2Fprocessed%2Fca_statcan_2021A000011124_d4c-datapkg-foundation_national_address_register_2025-07_v0.1.0-beta.pmtiles&map=15.17/45.402295/-75.691511
---

---
layout: center
level: 3
---

# Environment, Climate and Health

---
layout: center
hideInToc: true
level: 4
---

[Source](https://journals.lww.com/epidem/fulltext/2011/01001/assessing_the_value_of_including_global_position.252.aspx), [Internet Archive Snapshot](https://web.archive.org/web/20240915054043/https:/journals.lww.com/epidem/fulltext/2011/01001/assessing_the_value_of_including_global_position.252.aspx)

---
layout: center
hideInToc: true
level: 4
---

---
layout: center
hideInToc: true
level: 4
---

---
layout: center
hideInToc: true
level: 4
---

---
layout: center
hideInToc: true
level: 4
---

---
layout: center
hideInToc: true
level: 4
---

And now citizens can all create their own air quality stations. See [opensensor.space](https://opensensor.space/) for more information.
---
layout: center
level: 3
---

# Orthoimagery

---
layout: center
level: 3
---

https://github.com/dataforcanada/d4c-datapkg-orthoimagery/issues

---
layout: iframe-unscaled
url: https://pmtiles.io/#url=https%3A%2F%2Fdata.source.coop%2Fdataforcanada%2Fd4c-datapkg-orthoimagery%2Fprocessed%2Fca-on_province_of_ontario-2024A000235_drape_eastern_ontario_orthoimagery_2024_16cm_v0.1.0-beta.pmtiles&map=8.02/45.196/-76.357
---

---
layout: center
level: 3
hideInToc: true
---

- Currently working on downloading 100 TB of [QC, CAN](https://github.com/dataforcanada/d4c-datapkg-orthoimagery/issues/14) orthoimagery.

---
layout: center
level: 3
---

# Web Corpus

[Source](https://archive.org/details/911/day/20010911)

---
layout: center
level: 3
---

# Field Imagery

- Latitude, longitude, heading
- Can be audio, video, etc.
- Any device (ex. drone, webcam)

---
layout: two-cols-header
level: 3
hideInToc: true
---

::left::

# Toronto Eglinton LRT

::right::


---
layout: center
---


---
layout: center
---

[Download](https://source.coop/dataforcanada/d4c-datapkg-field-imagery/archive/ca-on_dataforcanada-2026A00053520005_d4c-datapkg-field-imagery_test_field_imagery_01_2026-02-15-1542-1742)

---
layout: center
---


---
layout: center
---

[Download](https://source.coop/dataforcanada/d4c-datapkg-field-imagery/archive/ca-on_dataforcanada-2026A00053520005_d4c-datapkg-field-imagery_test_field_imagery_02_2026-03-07-1435-1627)

---
layout: center
level: 2
---

# Communicate to Your Audience and Create Trust

- [The crisis we face is not technical; it is cultural](https://www.cloreleadership.org/provocation_paper/the-crisis-we-face-is-not-technical-it-is-cultural-legitimacy-place-and-the-systems-that-shape-change/)
- [Culture and 21st Century Challenges – Reframing Culture as the Foundation of Place and Identity](https://www.cloreleadership.org/research/working-paper-culture-and-21st-century-challenges-reframing-culture-as-the-foundation-of-place-and-identity/)
- [Connecting Complex Ecosystems: The Craft of Making "Collaboration" Tangible](https://caribou.global/publications/connecting-complex-ecosystems-the-craft-of-making-collaboration-tangible/)
- [Human Interoperability](https://open.substack.com/pub/ramage123/p/human-interoperability?utm_campaign=post-expanded-share&utm_medium=web)

---
layout: center
hideInToc: true
---

Collaboration

- In essence: speak people's language. A scientist might be interested in the facts, while other stakeholders care about other things.

---
layout: center
---

# Matrix Bridges

---
layout: center
hideInToc: true
---

# Questions?

[Main Website](https://www.dataforcanada.org) · [GitHub](https://github.com/dataforcanada)