mirror of
https://github.com/dataforcanada/d4c-service-main-site.git
synced 2026-06-13 14:00:51 +02:00
| title | toc | weight |
|---|---|---|
| Data Dissemination Strategy | true | 3 |
Our dissemination strategy prioritizes interoperability, long-term preservation, and decentralized resilience.
Once data products reach a production-ready state, the workflow is as follows:
- Cloud-Native First: Priority is given to highly performant, system-to-system file formats (e.g., [Geo]Parquet) to enable efficient bulk analysis. Support for specific desktop GIS clients (e.g., Esri products, QGIS) will follow.
- Persistent Identification & Cataloging: Every dataset version will be assigned a DOI for citation and immutability.
  - The endpoint https://data-01.dataforcanada.org/processed/ will strictly serve the latest version of a dataset.
  - Global metadata will be aggregated into a single, queryable STAC GeoParquet file. This catalog will track all versions and DOIs, providing direct download links to Zenodo which serves as the long-term data repository.
- Decentralized Distribution: We will pilot BitTorrent to maximize infrastructure resilience. By leveraging HTTP Web Seeding (BEP 19), torrents will be seeded simultaneously by Zenodo, the Data for Canada infrastructure, and community peers, ensuring high availability without a single point of failure.
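To illustrate the web-seeding idea in the last point, here is a minimal Python sketch of a single-file torrent whose metainfo carries a BEP 19 `url-list` key. It is not the project's actual tooling. The file name, the payload, the piece length, and both seed URLs (`zenodo.example.org`, `data.example.org`) are hypothetical placeholders, and the `announce` key is left out on the assumption that peers are found through other means such as DHT.

```python
import hashlib


def bencode(value):
    """Encode ints, strings/bytes, lists, and dicts with BitTorrent bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_metainfo(name, data, piece_length, web_seeds):
    """Build single-file torrent metainfo with HTTP web seeds in `url-list`."""
    # Each piece is identified by its 20-byte SHA-1 digest, concatenated in order.
    pieces = b"".join(
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    )
    return {
        "info": {
            "length": len(data),
            "name": name,
            "piece length": piece_length,
            "pieces": pieces,
        },
        # BEP 19: for a single-file torrent, a URL ending in "/" gets the
        # torrent's "name" appended by the client to form the full file URL.
        "url-list": list(web_seeds),
    }


# Hypothetical payload and seed URLs, for illustration only.
payload = b"x" * 100_000
metainfo = build_metainfo(
    name="example_dataset.parquet",
    data=payload,
    piece_length=32 * 1024,
    web_seeds=[
        "https://zenodo.example.org/records/0000000/files/",
        "https://data.example.org/processed/",
    ],
)

torrent_bytes = bencode(metainfo)
# The info hash identifies the torrent regardless of which seeds are listed.
info_hash = hashlib.sha1(bencode(metainfo["info"])).hexdigest()
```

Because `url-list` sits outside the `info` dictionary, HTTP seeds can be added or removed without changing the info hash. The same torrent identity can therefore be served by the archive, the project's own infrastructure, and ordinary peers.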
High-Level Overview
flowchart TD
Sources[Open Data Sources<br/>Statistics Canada and Others]
Notebook[Jupyter Notebooks]
DuckDB[DuckDB]
QGIS[QGIS]
Artifacts[Analysis-Ready Data<br/>Parquet and GeoParquet]
Distribution[Decentralized Distribution]
Portal[Static Data Portal]
Zenodo[Long-Term Archive]
Torrent[Peer Distribution]
Users[Researchers and Developers]
Sources --> Notebook
Notebook --> DuckDB
DuckDB --> QGIS
QGIS --> Artifacts
Artifacts --> Distribution
Distribution --> Portal
Distribution --> Zenodo
Distribution --> Torrent
Portal --> Users
Zenodo --> Users
Torrent --> Users