| theme | layout | title | author | class | drawings | transition | comark | hideInToc | favicon | routerMode |
|---|---|---|---|---|---|---|---|---|---|---|
| seriph | cover | Data for Canada / the Universe Background and Strategy | Diego Ripley | text-center | | slide-up | true | true | https://www.dataforcanada.org/favicon.svg | hash |
Data for Canada / the Universe
Background and Strategy
Presented By: Diego Ripley
Date: April 10, 2026
layout: cover hideInToc: true
"Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space."
Douglas Adams, The Hitchhiker's Guide to the Galaxy, #1
<style> .slidev-layout.cover { background-image: url('/dont-panic-background-01.webp') !important; background-size: cover !important; background-position: center !important; } p { text-shadow: 1px 1px 4px rgba(0,0,0,0.9); } </style>
hideInToc: true level: 4
flowchart TD
Client(["🌐 S3 Client / User"])
Gateway["<b>s3.dataforcanada.org</b>\nS3-Compatible Gateway"]
Client -->|"S3 API Request"| Gateway
Gateway -->|"sourcecooperative bucket"| AWS
Gateway -->|"backblaze-ca-east-006 bucket"| BB
Gateway -->|"cloudflare-apac bucket"| CFAPAC
Gateway -->|"cloudflare-enam bucket"| CFENAM
Gateway -->|"tigris bucket"| TIGRIS
subgraph AWS ["☁️ Amazon Web Services"]
AWSNode["📍 Oregon, United States"]
end
subgraph BB ["🔵 Backblaze B2"]
BBNode["📍 Toronto, ON, Canada"]
end
subgraph CFAPAC ["🟠 Cloudflare R2"]
CFAPACNode["📍 Asia Pacific Region"]
end
subgraph CFENAM ["🟠 Cloudflare R2"]
CFENAMNode["📍 Eastern North America"]
end
subgraph TIGRIS ["⚡ Tigris Data"]
TIGRISNode["11 Regions Worldwide 🌍\nAuto-routes to nearest location\nfor lowest latency"]
end
style Gateway fill:#1a5f7a,color:#fff,stroke:#0d3d52
style Client fill:#2d6a4f,color:#fff,stroke:#1b4332
style AWSNode fill:#ff9900,color:#000,stroke:#cc7a00
style BBNode fill:#e03c31,color:#fff,stroke:#b02d24
style CFAPACNode fill:#f6821f,color:#fff,stroke:#c4681a
style CFENAMNode fill:#f6821f,color:#fff,stroke:#c4681a
style TIGRISNode fill:#6c3483,color:#fff,stroke:#512e6b
- This slide can cover the size of the dataset in each bucket. For example, sourcecooperative has X TB of data.
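A minimal sketch of how a client might address an object through the gateway, assuming path-style URLs of the form `https://s3.dataforcanada.org/<bucket>/<key>`. The bucket names are the edge labels from the diagram; the `object_url` helper and the example key are hypothetical and only illustrate the routing idea.

```python
from urllib.parse import quote

GATEWAY = "https://s3.dataforcanada.org"

# Bucket names as labelled on the diagram's edges, mapped to the backend each routes to.
BUCKETS = {
    "sourcecooperative": "Amazon Web Services (Oregon, United States)",
    "backblaze-ca-east-006": "Backblaze B2 (Toronto, ON, Canada)",
    "cloudflare-apac": "Cloudflare R2 (Asia Pacific Region)",
    "cloudflare-enam": "Cloudflare R2 (Eastern North America)",
    "tigris": "Tigris Data (11 regions worldwide)",
}


def object_url(bucket: str, key: str) -> str:
    """Build a path-style URL for an object behind the gateway (assumed layout)."""
    if bucket not in BUCKETS:
        raise ValueError(f"unknown bucket: {bucket!r}")
    # Percent-encode the key but keep "/" so the object path stays readable.
    return f"{GATEWAY}/{bucket}/{quote(key.lstrip('/'), safe='/')}"


# Hypothetical key, used only to show the shape of the resulting URL.
print(object_url("tigris", "example/hello world.txt"))
```

The bucket name selects the storage backend, so the same key layout can be mirrored across providers without clients changing anything but the first path segment.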
layout: iframe-unscaled hideInToc: true level: 4 url: https://objex.labs.dataforcanada.org/
layout: cover transition: slide-left hideInToc: true
<style> .slidev-layout.cover { background-image: url('/dont-panic-background-02.webp') !important; background-size: cover !important; background-position: center !important; } a { text-shadow: 1px 1px 4px rgba(0,0,0,0.9); } </style>
layout: two-cols-header zoom: 1.2 transition: slide-left hideInToc: true
Notes
::left::
- Please hold questions until after the presentation
hideInToc: true transition: slide-left zoom: 0.8
Table of Contents
layout: center hideInToc: true
Guided By
- Cloud-Native Geospatial Storage Cheatsheet
- Guidance on Assessing Readiness to Manage Data According to the Findable, Accessible, Interoperable and Reusable (FAIR) Principles
- Cloud-Optimized Geospatial Formats Guide
- Link rot in LIS literature: a 20-year study of web citation decay, recovery and preservation challenges
- Science Needs a Social Network for Sharing Big Data
- Sustainability of Digital Formats: Planning for Library of Congress Collections
- GC White Paper: Data Sovereignty and Public Cloud
layout: center transition: slide-left hideInToc: true
In Plain Words
- Make sure that data lasts as long as humanly possible, and presents all perspectives, by creating efficient data and processes for long-term archival.
- I want what I am building to be in libraries.
- Create processes, tools, infrastructure, and datasets to empower everyday citizens, to make their lives just a little easier, and to filter out all of the noise.
- Yes, ethics is at the core of everything.
layout: center
- And for any citizen to be able to contribute to its resilience. This is why my project makes use of P2P technologies (e.g., BitTorrent, IPFS, libp2p).
- Hence the data dissemination strategy.
layout: center
flowchart TD
classDef linkNode stroke:#0000EE,color:#0000EE,stroke-width:2px;
subgraph mirrors [Mirrors & Preservation]
SourceCoop[Source Cooperative]
Tigris[Tigris]
Community[Community]
Cloudflare
Zenodo[Zenodo]
InternetArchive[Internet Archive]
Metadata[FAIR Data Catalogue]
end
Sources[Open Data Sources]
Processes[Data Packages]
Artifacts[Systems-Ready Data]
P2P["P2P Technology"]
subgraph Consumers [Consumption]
Users[Data People & Developers]
Systems[Systems]
end
%% Flow with Animations
Sources a1@<--> Processes
a1@{animate: true, animation: slow}
Processes a2@<--> Artifacts
a2@{animate: true, animation: slow}
Artifacts a3@<--> Metadata
a3@{animate: true, animation: fast}
Metadata a20@<--> SourceCoop
a20@{animate: true, animation: slow}
Metadata a21@<--> Tigris
a21@{animate: true, animation: fast}
Metadata a22@<--> Community
a22@{animate: true, animation: fast}
Metadata a23@<--> Zenodo
a23@{animate: true, animation: slow}
Metadata a24@<--> Cloudflare
a24@{animate: true, animation: fast}
Metadata a25@<--> InternetArchive
a25@{animate: true, animation: slow}
%% Mirror Connections
mirrors a12@<--> Consumers
a12@{animate: true, animation: slow}
%% Hint, the FAIR Data Catalogue can also be decentralized 🤯
%%Metadata a30@<--> P2P
%%a30@{animate: true, animation: fast}
mirrors a9@<--> P2P
a9@{animate: true, animation: fast}
%% P2P Connections
P2P a10@<--> Consumers
a10@{animate: true, animation: fast}
style Sources fill:#FFB74D,stroke:#EF6C00,color:#000000
style Artifacts fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
%% Opera concertmaster
style Metadata fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
class Metadata Metadata
style Processes fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
class Processes Processes
style SourceCoop fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
style Tigris fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
style Cloudflare fill:#FFB74D,stroke:#EF6C00,color:#000000
style Zenodo fill:#FFB74D,stroke:#EF6C00,color:#000000
style Community fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
style P2P fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
style InternetArchive fill:#66BB6A,stroke:#2E7D32,color:#000000
style Users fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
style Systems fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
%% Click Actions
click P2P "https://libp2p.io/" _blank
click Tigris "https://d4c-pkgs.t3.storage.dev/" _blank
click Sources "https://www.dataforcanada.org/#high-level-overview" _blank
click Processes "https://www.dataforcanada.org/docs/d4c-pkgs/" _blank
click Metadata "https://stac-utils.github.io/stac-geoparquet/latest/spec/stac-geoparquet-spec/" _blank
click Zenodo "https://zenodo.org/communities/dataforcanada/" _blank
click SourceCoop "https://source.coop/dataforcanada/" _blank
click InternetArchive "https://archive.org/details/@diegoripley/uploads/" _blank
%% APPLY STYLES TO LINKED NODES
class Sources linkNode
layout: center level: 1
Solutions
layout: center level: 2
Assess
Mapping data portals and all data assets in Canada.
layout: center level: 2
Rank
- Rank datasets according to their impact on Canadians (e.g., COVID-19 deaths by dissemination block).
layout: iframe-unscaled url: https://dataindex.us/collections/ level: 2
layout: iframe-unscaled url: https://s3.dataforcanada.org/tigris/d4u-datapkg-web-corpus/archive/1777398477.100686/www12.statcan.gc.ca/census-recensement/2021/ref/dict/98-301-x2021001-eng.pdf level: 2
layout: center level: 2
Archive
- Download datasets and convert them into efficient long-term storage file formats.
- Make them available to the community via something like Backblaze B2 Overdrive, which offers throughput ranging from 100 Gbps up to 1 Tbps (minimum 1 PB commitment).
- $15 USD per TB per month
- At the 1 PB minimum: $15K USD per month, $180K USD per year
- Have unique identifiers to the datasets.
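The cost figures above follow from simple arithmetic, sketched here under two assumptions: the $15 per TB price is billed monthly, and 1 PB is counted as 1,000 TB (decimal units). The transfer-time helper assumes ideal sustained throughput, which real transfers rarely reach.

```python
PRICE_USD_PER_TB_MONTH = 15  # price quoted on the slide
MIN_COMMITMENT_TB = 1_000    # 1 PB minimum, assuming 1 PB = 1,000 TB


def storage_cost_usd(tb: int, months: int = 1) -> int:
    """Storage cost in USD for `tb` terabytes held for `months` months."""
    return tb * PRICE_USD_PER_TB_MONTH * months


def transfer_hours(tb: float, gbps: float) -> float:
    """Hours to move `tb` terabytes at a sustained `gbps` gigabits per second."""
    bits = tb * 1e12 * 8
    return bits / (gbps * 1e9) / 3600


print(storage_cost_usd(MIN_COMMITMENT_TB))             # monthly cost at the minimum
print(storage_cost_usd(MIN_COMMITMENT_TB, months=12))  # yearly cost at the minimum
print(round(transfer_hours(MIN_COMMITMENT_TB, 100), 1), "hours at 100 Gbps")
print(round(transfer_hours(MIN_COMMITMENT_TB, 1000), 1), "hours at 1 Tbps")
```

Under these assumptions, a full petabyte could in principle be pulled by the community in under a day at the low end of the quoted throughput range.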
layout: center level: 2
- Download them with a tool such as geoparquet-io, which can download from Esri data portals and WFS servers. It supports both vector data and raster data.
layout: full level: 2 zoom: 0.9
File Formats
flowchart TD
classDef high fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
classDef med fill:#FBC02D,stroke:#F9A825,color:#000000
classDef low fill:#66BB6A,stroke:#2E7D32,color:#000000
classDef medOrange fill:#FFCC80,stroke:#FB8C00,color:#000000
classDef darkOrange fill:#EF6C00,stroke:#E65100,color:#000000
classDef highLight fill:#FFCDD2,stroke:#E57373,color:#000000
classDef white fill:#fff,stroke:#388E3C,color:#000000
subgraph sot [Long-Term Storage]
Parquet["Parquet"]:::highLight
Lance:::high
FlatCityBuf:::high
Zarr:::highLight
GeoTIFF:::medOrange
JPEGXL["JPEG XL"]:::highLight
AV1:::highLight
FAIRCat["FAIR Data Catalogue"]:::high
end
FlatGeoBuf:::med
subgraph vt [Vector Tiles]
VectorTiles["Mapbox Vector Tiles"]:::low
NextGenVT["Next-Gen Vector Tiles"]:::high
GLB["glTF GLB"]:::high
end
subgraph visuals [Imagery]
AVIF:::high
WebP:::medOrange
JPG:::low
PNG:::low
end
subgraph pkg [Portable Databases]
PMTiles:::medOrange
SQLite:::darkOrange
end
subgraph ent [Enterprise]
FileGDB["File Geodatabase"]:::white
end
sot <--> FlatGeoBuf
FlatGeoBuf --> vt
sot <--> visuals
vt <--> pkg
visuals <--> pkg
sot <--> ent
visuals --> ent
style sot fill:#EF9A9A,stroke:#C62828,color:#000000
style vt fill:#FBC02D,stroke:#F9A825,color:#000000
style visuals fill:#FBC02D,stroke:#F9A825,color:#000000
style pkg fill:#FFB74D,stroke:#EF6C00,color:#000000
style ent fill:#66BB6A,stroke:#2E7D32,color:#000000
click FlatCityBuf "https://github.com/cityjson/flatcitybuf" _blank
click Parquet "https://github.com/apache/parquet-format/" _blank
click FlatGeoBuf "https://flatgeobuf.org/" _blank
click SQLite "https://www.geopackage.org/" _blank
click FileGDB "https://gdal.org/en/stable/drivers/vector/openfilegdb.html" _blank
click VectorTiles "https://github.com/mapbox/vector-tile-spec/" _blank
click NextGenVT "https://github.com/maplibre/maplibre-tile-spec/" _blank
click GLB "https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html#glb-file-format-specification" _blank
click Lance "https://docs.lancedb.com/lance" _blank
click GeoTIFF "https://cogeo.org/" _blank
click Zarr "https://github.com/zarr-developers/geozarr-spec/" _blank
click WebP "https://developers.google.com/speed/webp/" _blank
click PMTiles "https://github.com/protomaps/PMTiles/blob/main/spec/v3/spec.md" _blank
click JPEGXL "https://jpeg.org/jpegxl/" _blank
click AV1 "https://aomedia.org/specifications/av1/" _blank
click FAIRCat "https://stac-utils.github.io/stac-geoparquet/latest/spec/stac-geoparquet-spec/" _blank
layout: center level: 2
Standard Interfaces
layout: center level: 2 hideInToc: true
S3
layout: center level: 2 hideInToc: true
P2P
BitTorrent, IPFS, libp2p
SETI@home
layout: center level: 2 hideInToc: true
Other
SSH, etc.
layout: center level: 2 hideInToc: true
Discrete Global Grid Systems (DGGS)
layout: iframe-unscaled level: 2 hideInToc: true url: https://aidggs-pilot.hartis.org/
layout: center level: 2
Unique Identifiers
- ARKs are open, mainstream, non-paywalled, decentralized persistent identifiers that you can start creating in under 48 hours. They identify anything digital, physical, or abstract.
- Archival Resource Key (ARK) - Spec, Overview
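As an illustration of what an ARK looks like in practice, here is a minimal parsing sketch. It assumes the common layout `[https://resolver/]ark:[/]NAAN/Name[/Qualifier]`. The `parse_ark` helper, the `example.org` resolver, and the identifier shown are hypothetical placeholders, and the pattern is deliberately simpler than the full specification.

```python
import re
from typing import NamedTuple, Optional


class Ark(NamedTuple):
    resolver: Optional[str]  # optional web address that resolves the ARK
    naan: str                # Name Assigning Authority Number
    name: str                # name assigned by that authority
    qualifier: str           # optional sub-object path, empty if absent


# Simplified pattern: optional resolver, "ark:" label (with or without "/"),
# then NAAN, name, and an optional "/..." qualifier.
_ARK_RE = re.compile(
    r"^(?:(?P<resolver>https?://[^/]+)/)?"
    r"ark:/?(?P<naan>[^/]+)/(?P<name>[^/]+)(?P<qualifier>/.*)?$"
)


def parse_ark(text: str) -> Ark:
    """Split an ARK string into its parts (simplified, not spec-complete)."""
    match = _ARK_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an ARK: {text!r}")
    return Ark(match["resolver"], match["naan"], match["name"], match["qualifier"] or "")


# Placeholder identifier, not a real dataset.
print(parse_ark("https://example.org/ark:/12345/x54xz321/s3/f8.05v.tiff"))
```

Because the resolver part is optional, the same identifier stays valid if the hosting location changes, which is the property that matters for long-term archival.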
layout: center level: 2
Ledger
- Unique identifier
- Added/Updated/Deleted
- File hash
- Location
- Reputation - across time by stakeholders
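The ledger fields listed above could be modelled roughly as follows. This is a sketch, not the project's actual schema: the class and field names, the choice of SHA-256 for the file hash, and the representation of reputation as (stakeholder, score) pairs are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Tuple


class Action(Enum):
    ADDED = "added"
    UPDATED = "updated"
    DELETED = "deleted"


@dataclass(frozen=True)
class LedgerEntry:
    identifier: str  # unique identifier, e.g. an ARK
    action: Action   # added / updated / deleted
    file_hash: str   # hex digest of the file contents (SHA-256 assumed)
    location: str    # where the file can be fetched
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Hypothetical shape: (stakeholder, score) observations collected over time.
    reputation: Tuple[Tuple[str, float], ...] = ()


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of `data`."""
    return hashlib.sha256(data).hexdigest()


# Placeholder values, only to show how an entry would be built.
entry = LedgerEntry(
    identifier="ark:12345/example",
    action=Action.ADDED,
    file_hash=sha256_hex(b"example file contents"),
    location="https://source.coop/dataforcanada/",
)
print(entry)
```

Making entries immutable (`frozen=True`) reflects the idea that a ledger is append-only: a change to a dataset becomes a new `UPDATED` entry rather than an edit to an old one.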
layout: center level: 2
Data Packages
flowchart TD
subgraph ds [Data Sources]
Statistical@{ shape: lean-l}
Foundation@{ shape: lean-l}
EnvClimate@{ shape: lean-l, label: "Environment, Climate, & Health"}
Orthoimagery@{ shape: lean-l}
FieldImagery@{ shape: lean-l, label: "Field Imagery"}
Elevation@{ shape: lean-l}
WebCorpus@{ shape: lean-l, label: "Web Corpus"}
end
DataPkgs@{ shape: rect, label: "Data Packages"}
Statistical e1@<--> DataPkgs
e1@{animate: true, animation: slow}
Foundation e2@<--> DataPkgs
e2@{animate: true, animation: slow}
EnvClimate e4@<--> DataPkgs
e4@{animate: true, animation: fast}
Orthoimagery e3@<--> DataPkgs
e3@{animate: true, animation: slow}
FieldImagery e7@<--> DataPkgs
e7@{animate: true, animation: fast}
Elevation e5@<--> DataPkgs
e5@{animate: true, animation: slow}
WebCorpus e6@<--> DataPkgs
e6@{animate: true, animation: fast}
style EnvClimate fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
style Orthoimagery fill:#FBC02D,stroke:#F9A825,color:#000000
style FieldImagery fill:#FBC02D,stroke:#F9A825,color:#000000
style WebCorpus fill:#66BB6A,stroke:#2E7D32,color:#000000
style Elevation fill:#66BB6A,stroke:#2E7D32,color:#000000
style Statistical fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
style Foundation fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
style DataPkgs fill:#B71C1C,stroke:#7F0000,color:#FFFFFF
classDef linkNode stroke:#333333,color:#333333,stroke-width:1.5px
class FieldImagery linkNode
click DataPkgs "https://github.com/dataforcanada/d4c-pkgs" _blank
click Foundation "https://github.com/dataforcanada/d4c-datapkg-foundation" _blank
click Statistical "https://github.com/dataforcanada/d4c-datapkg-statistical" _blank
click Orthoimagery "https://github.com/dataforcanada/d4c-datapkg-orthoimagery" _blank
click FieldImagery "https://github.com/dataforcanada/d4c-datapkg-field-imagery" _blank
click EnvClimate "https://github.com/dataforcanada/d4c-datapkg-environment-climate-health" _blank
click Elevation "https://github.com/dataforcanada/d4c-datapkg-elevation" _blank
click WebCorpus "https://github.com/dataforcanada/d4c-datapkg-web-corpus" _blank
layout: center hideInToc: true
All datasets can be downloaded at https://source.coop/dataforcanada
layout: center level: 3
Statistical
- This is how our governments see the world and what they are supposed to use when making decisions.
- We need to request that statistical data be tied to individual authors, so that we can start to trust institutions. If someone's credibility in the community becomes a factor, I believe that individuals will fight to keep their credibility with the community.
- Open processes.
layout: center level: 3
layout: center level: 4 hideInToc: true
Statistical Tables
- I did this in 2025 for 7918 Statistics Canada data tables.
- Started with 3314.57 GB of CSVs and turned them into 25.73 GB.
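To put those numbers in perspective, the size reduction works out to roughly 129 times smaller. The short calculation below uses only the figures from the slide; the per-table averages assume decimal units (1 GB = 1,000 MB).

```python
TABLES = 7918         # Statistics Canada data tables processed
CSV_GB = 3314.57      # size of the original CSVs
PROCESSED_GB = 25.73  # size after processing

ratio = CSV_GB / PROCESSED_GB
reduction_pct = (1 - PROCESSED_GB / CSV_GB) * 100

# Average size per table before and after, in MB (decimal units assumed).
avg_csv_mb = CSV_GB / TABLES * 1000
avg_processed_mb = PROCESSED_GB / TABLES * 1000

print(f"{ratio:.1f}x smaller ({reduction_pct:.2f}% reduction)")
print(f"average per table: {avg_csv_mb:.0f} MB -> {avg_processed_mb:.2f} MB")
```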
layout: iframe-unscaled hideInToc: true level: 4 url: https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/
layout: center hideInToc: true level: 4
https://source.coop/dataforcanada/d4c-datapkg-statistical/processed/tables
layout: center level: 4 hideInToc: true
Census Data
layout: iframe-unscaled hideInToc: true level: 4 url: https://docs.google.com/spreadsheets/d/14FmFGaqU7EDZ19zRZXBNX4La4VeIDXa7kbgP_g7ai9s/edit?usp=sharing
layout: iframe-unscaled hideInToc: true level: 4 url: https://static-01.dataforcanada.org/processed/ca_statcan_2021A000011124_d4c-datapkg-statistical_census_pop_dissemination_areas_digital_2021_v0.1.0-beta/#12.2/45.4294/-75.74374/0/60
layout: iframe-unscaled hideInToc: true level: 4 url: https://static-01.dataforcanada.org/processed/ca_statcan_2021A000011124_d4c-datapkg-statistical_census_pop_federal_electoral_districts_2013_representation_order_digital_2021_v0.1.0-beta/#4.93/56.91/-111.54
layout: center
layout: center level: 3
Foundation
- Minimum information that a civilization needs to start from scratch.
- Buildings, roads, address points.
- Placenames
layout: iframe-unscaled hideInToc: true level: 4 url: https://pmtiles.io/#url=https%3A%2F%2Fdata.source.coop%2Fdataforcanada%2Fd4c-datapkg-foundation%2Fprocessed%2Fca_statcan_2021A000011124_d4c-datapkg-foundation_open_database_of_buildings_2025-04-15_v0.1.0-beta.pmtiles&map=15.17/45.402295/-75.691511
layout: iframe-unscaled hideInToc: true level: 4 url: https://pmtiles.io/#url=https%3A%2F%2Fdata.source.coop%2Fdataforcanada%2Fd4c-datapkg-foundation%2Fprocessed%2Fca_statcan_2021A000011124_d4c-datapkg-foundation_road_network_2021_v0.1.0-beta.pmtiles&map=15.17/45.402295/-75.691511
layout: iframe-unscaled hideInToc: true level: 4 url: https://pmtiles.io/#url=https%3A%2F%2Fdata.source.coop%2Fdataforcanada%2Fd4c-datapkg-foundation%2Fprocessed%2Fca_statcan_2021A000011124_d4c-datapkg-foundation_national_address_register_2025-07_v0.1.0-beta.pmtiles&map=15.17/45.402295/-75.691511
layout: center level: 3
Environment, Climate and Health
layout: center hideInToc: true level: 4
Source, Internet Archive Snapshot
layout: center hideInToc: true level: 4
layout: center hideInToc: true level: 4
layout: center hideInToc: true level: 4
layout: center hideInToc: true level: 4
layout: center hideInToc: true level: 4
And now citizens can all create their own air quality stations.
See opensensor.space for more information.
layout: center level: 3
Orthoimagery
layout: center level: 3
https://github.com/dataforcanada/d4c-datapkg-orthoimagery/issues
layout: iframe-unscaled url: https://pmtiles.io/#url=https%3A%2F%2Fdata.source.coop%2Fdataforcanada%2Fd4c-datapkg-orthoimagery%2Fprocessed%2Fca-on_province_of_ontario-2024A000235_drape_eastern_ontario_orthoimagery_2024_16cm_v0.1.0-beta.pmtiles&map=8.02/45.196/-76.357
layout: center level: 3 hideInToc: true
- Currently working on downloading 100 TB of Quebec, Canada orthoimagery.
layout: center level: 3
Web Corpus
layout: center level: 3
Field Imagery
- Latitude, longitude, heading
- Can be audio, video, etc.
- Any device (e.g., drone, webcam)
layout: two-cols-header level: 3 hideInToc: true
::left::
Toronto Eglinton LRT
::right::
layout: center
layout: center
layout: center
layout: center
layout: center level: 2
Communicate to Your Audience and Create Trust
- The crisis we face is not technical; it is cultural
- Culture and 21st Century Challenges – Reframing Culture as the Foundation of Place and Identity
- Connecting Complex Ecosystems: The Craft of Making "Collaboration" Tangible
- Human Interoperability
layout: center hideInToc: true
Collaboration
- In essence: speak people's language. A scientist might be interested in the facts, while other stakeholders care about other things.
layout: center
Matrix Bridges