Simplified high-level overview diagrams. Still needs improvement

Diego Ripley
2026-01-27 12:04:08 +00:00
parent 78cd6c1638
commit d09a39924d
2 changed files with 32 additions and 22 deletions
+31 -21
@@ -8,11 +8,11 @@ Our dissemination strategy prioritizes interoperability, long-term preservation,
Once data products reach a production-ready state, the workflow is as follows:
* **Cloud-Native First:** Priority is given to highly performant, system-to-system file formats (e.g., [Geo]Parquet) to enable efficient bulk analysis. Support for specific desktop GIS clients (e.g., Esri products, QGIS) will follow.
* **Cloud-Native First:** Priority is given to highly performant, system-to-system file formats (e.g., Parquet) to enable efficient bulk analysis.
* **Persistent Identification & Cataloging:** Every dataset version will be assigned a DOI for citation and immutability.
* The endpoint `https://data-01.dataforcanada.org/processed/` will strictly serve the **latest** version of a dataset.
* Global metadata will be aggregated into a single, queryable [STAC GeoParquet](https://stac-utils.github.io/stac-geoparquet/latest/spec/stac-geoparquet-spec/) file. This catalog will track all versions and DOIs, providing direct download links to [Zenodo](https://zenodo.org) which serves as the long-term data repository.
* **Decentralized Distribution:** We will pilot BitTorrent to maximize infrastructure resilience. By leveraging HTTP Web Seeding (BEP 19), torrents will be seeded simultaneously by Zenodo, the Data for Canada infrastructure, and community peers, ensuring high availability without a single point of failure.
* **Decentralized Distribution:** We will pilot BitTorrent to maximize infrastructure resilience. By leveraging [HTTP Web Seeding (BEP 19)](https://www.bittorrent.org/beps/bep_0019.html), torrents will be seeded simultaneously by Zenodo, the Data for Canada infrastructure, and community peers, ensuring high availability without a single point of failure.
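To illustrate how HTTP Web Seeding fits into a torrent, here is a minimal sketch of a single-file metainfo dictionary that carries a BEP 19 `url-list` key alongside the usual `info` dictionary. It is not the project's tooling: the helper names `bencode` and `build_metainfo`, the file name `example.parquet`, the Zenodo record path, and the 32 KiB piece length are all assumptions made for illustration. A real torrent would normally also declare a tracker (`announce`) or rely on DHT, and would be produced with an established torrent-creation tool.

```python
import hashlib


def bencode(value):
    """Bencode ints, byte/text strings, lists, and dicts."""
    if isinstance(value, int) and not isinstance(value, bool):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings, emitted in sorted raw-byte order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_metainfo(name, payload, piece_length, web_seeds):
    """Build a single-file metainfo dict with BEP 19 web seeds (hypothetical helper)."""
    # "pieces" is the concatenation of the 20-byte SHA-1 digest of each piece.
    pieces = b"".join(
        hashlib.sha1(payload[i:i + piece_length]).digest()
        for i in range(0, len(payload), piece_length)
    )
    return {
        "info": {
            "length": len(payload),
            "name": name,
            "piece length": piece_length,
            "pieces": pieces,
        },
        # BEP 19: HTTP(S) locations that clients may use as additional seeds.
        "url-list": list(web_seeds),
    }


# Assumed URLs: the file name and the Zenodo record path are placeholders.
WEB_SEEDS = [
    "https://data-01.dataforcanada.org/processed/example.parquet",
    "https://zenodo.org/records/0000000/files/example.parquet",
]

payload = b"x" * 100_000  # stand-in for the dataset bytes
metainfo = build_metainfo("example.parquet", payload, 32_768, WEB_SEEDS)
torrent_bytes = bencode(metainfo)
```

Writing `torrent_bytes` to a `.torrent` file would give clients two HTTP sources in addition to any community peers, which is the redundancy the bullet above describes.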
## High-Level Overview
@@ -20,28 +20,38 @@ Once data products reach a production-ready state, the workflow is as follows:
flowchart TD
Sources[Open Data Sources]
Notebook[Jupyter Notebooks]
Artifacts[Analysis-Ready Data]
Portal[Static Data Portal]
Processes[Transformation Processes]
Artifacts[Systems-Ready Data]
Portal[Object Storage]
Metadata[Metadata]
Distribution[Decentralized Distribution]
Zenodo[Zenodo]
Torrent[BitTorrent]
Users[Researchers & Developers]
Systems[Systems]
Zenodo[Long-Term Archive]
Torrent[Peer Distribution]
Sources a1@--> Processes
a1@{animate: true, animation: slow}
Processes a2@--> Artifacts
a2@{animate: true, animation: slow}
Artifacts a3@--> Portal
a3@{animate: true, animation: slow}
Portal a4@--> Metadata
a4@{animate: true, animation: fast}
Metadata a5@--> Distribution
a5@{animate: true, animation: fast}
Users[Researchers, Developers, Systems]
Distribution a6@--> Zenodo
a6@{animate: true, animation: slow}
Distribution a7@--> Torrent
a7@{animate: true, animation: slow}
Sources --> Notebook
Notebook --> Artifacts
Artifacts --> Portal
Portal --> Distribution
Zenodo a9@--> Users
a9@{animate: true, animation: slow}
Torrent a10@--> Users
a10@{animate: true, animation: fast}
Torrent a11@--> Systems
a11@{animate: true, animation: fast}
Distribution --> Zenodo
Distribution --> Torrent
Portal --> Users
Zenodo --> Users
Torrent --> Users
click Zenodo "https://zenodo.org/communities/dataforcanada" _blank
```