From 8013bf424d49ba78fa3175050035a434950fe273 Mon Sep 17 00:00:00 2001
From: Diego Ripley
Date: Thu, 19 Mar 2026 07:36:38 -0400
Subject: [PATCH] Update Vancouver 2022 orthoimagery README. It is just downloading the index file and creating a txt file dataset that lists the files to download, which is run by the HTTP ingestor

---
 .../README.md | 35 +------------------
 1 file changed, 1 insertion(+), 34 deletions(-)

diff --git a/scripts/ca-bc_vancouver-2022A00055915022_d4c-datapkg-orthoimagery_2022_075mm/README.md b/scripts/ca-bc_vancouver-2022A00055915022_d4c-datapkg-orthoimagery_2022_075mm/README.md
index 54c0729..d57e6c3 100644
--- a/scripts/ca-bc_vancouver-2022A00055915022_d4c-datapkg-orthoimagery_2022_075mm/README.md
+++ b/scripts/ca-bc_vancouver-2022A00055915022_d4c-datapkg-orthoimagery_2022_075mm/README.md
@@ -1,34 +1 @@
-# Vancouver 2022 Orthoimagery — Download Script
-
-This directory contains the automation script for acquiring the **City of Vancouver 2022 Orthophoto Imagery** dataset (7.5 cm resolution) from [Vancouver Open Data](https://opendata.vancouver.ca/explore/dataset/orthophoto-imagery-2022/).
-
-## What the Script Does
-
-`download.sh` performs four sequential steps:
-
-1. **Download Index** — Uses `aria2c` to fetch the dataset catalogue as a Parquet file from the Vancouver Open Data API.
-2. **Extract URLs** — Queries the Parquet file with `duckdb` to extract all MrSID image URLs into a plain-text file suitable for batch downloading.
-3. **Create Output Directory** — Ensures the data input directory exists at `../../data/input/ca-bc_vancouver-2022A00055915022_d4c-datapkg-orthoimagery_2022_075mm/` (relative to this script).
-4. **Download Images** — Uses `aria2c` to download all images in parallel (12 concurrent connections, 4 connections per server) into the data input directory.
-
-## Dependencies
-
-The following command-line tools must be installed and available on your `PATH`:
-
-| Tool | Purpose | Install |
-|---|---|---|
-| [aria2c](https://aria2.github.io/) | High-speed parallel downloads | `sudo apt install aria2` |
-| [duckdb](https://duckdb.org/) | Query Parquet files from the CLI | [Install guide](https://duckdb.org/docs/installation/) |
-
-## Usage
-
-```bash
-cd scripts/ca-bc_vancouver-2022A00055915022_d4c-datapkg-orthoimagery_2022_075mm
-bash download.sh
-```
-
-The script will print progress for each step. Once complete, the downloaded MrSID image files will be located in:
-
-```
-data/input/ca-bc_vancouver-2022A00055915022_d4c-datapkg-orthoimagery_2022_075mm/
-```
+# Vancouver 2022 Orthoimagery