cf-data-ingestor
A Cloudflare Worker that acts as a secure proxy: it downloads a file from a
URL provided in a JSON payload and streams it directly into an S3 bucket in
us-west-2, keeping memory usage bounded regardless of file size. It also
accepts a raw file body via PUT for direct uploads.
Architecture
Client POST ──▶ Worker ──stream──▶ S3 multipart upload
                │
                ├─ Auth check (Bearer token)
                ├─ Fetch source URL (custom User-Agent)
                └─ Sign with AWS Sig V4 (aws4fetch)
Client PUT  ──▶ Worker ──stream──▶ S3 multipart upload
                │
                ├─ Auth check (Bearer token)
                └─ Direct binary upload (X-S3-Key header)
All uploads use S3 multipart upload with 5 MiB parts, so the Worker holds roughly one part (about 5 MiB) in memory at a time regardless of file size. This avoids the Cloudflare Workers 128 MiB memory limit, which buffering a large request body for a single PUT could exceed.
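The part splitting can be sketched as an async generator that re-chunks the incoming byte stream into fixed-size parts. This is a minimal illustration, not the Worker's actual code: `toParts` and `PART_SIZE` are hypothetical names, and the S3 calls themselves (CreateMultipartUpload, one UploadPart per part, CompleteMultipartUpload) are left out.

```typescript
// Assumed part size: 5 MiB, which is also S3's minimum for every part except the last.
const PART_SIZE = 5 * 1024 * 1024;

// Re-chunk an arbitrary byte stream into fixed-size parts. The generator itself
// holds at most one part buffer at a time, so its memory stays near `partSize`.
async function* toParts(
  stream: ReadableStream<Uint8Array>,
  partSize: number = PART_SIZE,
): AsyncGenerator<Uint8Array> {
  const reader = stream.getReader();
  let buffer = new Uint8Array(partSize);
  let filled = 0;
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      let offset = 0;
      // Copy the incoming chunk into the part buffer, emitting each part as it fills.
      while (offset < value.length) {
        const n = Math.min(partSize - filled, value.length - offset);
        buffer.set(value.subarray(offset, offset + n), filled);
        filled += n;
        offset += n;
        if (filled === partSize) {
          yield buffer;
          buffer = new Uint8Array(partSize); // new buffer; the caller may still hold the old one
          filled = 0;
        }
      }
    }
    // The final part may be shorter than partSize; S3 permits that for the last part.
    if (filled > 0) yield buffer.slice(0, filled);
  } finally {
    reader.releaseLock();
  }
}
```

Because the generator only reads more of the source stream when the caller asks for the next part, a caller that uploads each part before pulling the next one paces the download to the upload.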
Setup
1. Install dependencies
pnpm install
2. Configure wrangler.toml
Edit the [vars] section:
[vars]
S3_BUCKET = "us-west-2.opendata.source.coop"
S3_REGION = "us-west-2"
S3_ENDPOINT = ""
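S3_ENDPOINT should be left empty when targeting AWS S3; path-style
addressing is then used automatically, which also keeps bucket names that
contain dots, like the one above, working over HTTPS. Set it only for non-AWS
S3-compatible services; https:// is prepended automatically if the scheme is
omitted.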
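A minimal sketch of how an object URL might be resolved from these three variables. It assumes path-style addressing for custom endpoints too (this README only states it for AWS); `objectUrl` and `S3Config` are hypothetical names. `https://s3.<region>.amazonaws.com` is the standard AWS regional endpoint.

```typescript
// The three [vars] from wrangler.toml that determine where objects go.
interface S3Config {
  S3_BUCKET: string;
  S3_REGION: string;
  S3_ENDPOINT: string; // "" means AWS S3
}

// Hypothetical helper: build a path-style URL, https://<endpoint>/<bucket>/<key>.
function objectUrl(cfg: S3Config, key: string): string {
  const ep = cfg.S3_ENDPOINT.trim();
  let base: string;
  if (ep === "") {
    // AWS regional endpoint; the bucket goes in the path, not the hostname.
    base = `https://s3.${cfg.S3_REGION}.amazonaws.com`;
  } else {
    // Prepend https:// when no scheme is given.
    base = /^https?:\/\//i.test(ep) ? ep : `https://${ep}`;
  }
  base = base.replace(/\/+$/, ""); // drop trailing slashes
  // Percent-encode each key segment but keep the "/" separators.
  const encodedKey = key.split("/").map(encodeURIComponent).join("/");
  return `${base}/${cfg.S3_BUCKET}/${encodedKey}`;
}
```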
3. Set secrets
Copy the example .env file and fill in your values:
cp .env.example .env
AUTH_TOKEN="your-auth-token"
AWS_ACCESS_KEY_ID="AKIAxxxxxxxxxxxxxxxxxxxx"
AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Wrangler automatically loads the .env file during local development
(pnpm run dev). For deployed Workers, push each secret with:
pnpm wrangler secret put AUTH_TOKEN
pnpm wrangler secret put AWS_ACCESS_KEY_ID
pnpm wrangler secret put AWS_SECRET_ACCESS_KEY
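In a Worker, [vars] and secrets both arrive on the same `env` object, so the handler can treat them uniformly. The interface below types the six bindings named in this README. `missingBindings` is a hypothetical startup check, not part of the project.

```typescript
// Bindings visible on `env`: [vars] from wrangler.toml plus secrets
// (from .env locally, or `wrangler secret put` when deployed).
interface Env {
  // [vars]
  S3_BUCKET: string;
  S3_REGION: string;
  S3_ENDPOINT: string;
  // secrets
  AUTH_TOKEN: string;
  AWS_ACCESS_KEY_ID: string;
  AWS_SECRET_ACCESS_KEY: string;
}

// S3_ENDPOINT is deliberately absent: it may legitimately be empty for AWS S3.
const REQUIRED: ReadonlyArray<keyof Env> = [
  "S3_BUCKET",
  "S3_REGION",
  "AUTH_TOKEN",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
];

// Hypothetical helper: names of required bindings that are missing or empty.
function missingBindings(env: Partial<Env>): string[] {
  return REQUIRED.filter((name) => !env[name]);
}
```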
4. Deploy
pnpm run deploy
Usage
Download mode (POST)
Downloads a file from a URL and uploads it to S3.
Method: POST
Content-Type: application/json
Authorization: Bearer <AUTH_TOKEN>
Payload parameters:
| Field | Required | Description |
|---|---|---|
| download_url | Yes | Direct link to the source file |
| user_agent | Yes | User-Agent string for the download request |
| key_prefix | No | Destination path within the S3 bucket |
Example
curl -X POST https://cf-data-ingestor.labs.dataforcanada.org \
-H "Authorization: Bearer <AUTH_TOKEN>" \
-H "Content-Type: application/json" \
-d '{
"download_url": "https://diffusion.mern.gouv.qc.ca/diffusion/RGQ/Imagerie/Orthomosaique/Generique/Mosa30rvb0015_30cm_Rvb/Mtm9/Jpeg2000/mos_14_31n02_se_30cm_f09.JP2",
"user_agent": "Data for Canada - d4c-datapkg-orthoimagery",
"key_prefix": "dataforcanada/d4c-datapkg-orthoimagery/archive/ca-qc_government_and_municipalities_of_quebec-2026A000224_d4c-datapkg-orthoimagery_orthorectified_imagery_from_quebec"
}'
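The same request can be made from TypeScript with the standard Fetch API (available in Workers, browsers, and Node.js 18+). This is a sketch: `buildIngestRequest` and `ingest` are hypothetical helpers, and the caller supplies the endpoint and token.

```typescript
interface IngestPayload {
  download_url: string;
  user_agent: string;
  key_prefix?: string; // optional; omit to upload to the bucket root
}

// Build the POST request without sending it (handy for inspection and tests).
function buildIngestRequest(endpoint: string, token: string, payload: IngestPayload): Request {
  return new Request(endpoint, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });
}

// Send the request and return the parsed JSON body, throwing on a non-2xx status.
async function ingest(endpoint: string, token: string, payload: IngestPayload): Promise<unknown> {
  const res = await fetch(buildIngestRequest(endpoint, token, payload));
  if (!res.ok) {
    throw new Error(`ingest failed: HTTP ${res.status}: ${await res.text()}`);
  }
  return res.json();
}
```

For example, `await ingest("https://cf-data-ingestor.labs.dataforcanada.org", token, { download_url, user_agent, key_prefix })` sends the same payload as the curl command.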
Successful response
{
"ok": true,
"bucket": "us-west-2.opendata.source.coop",
"key": "dataforcanada/d4c-datapkg-orthoimagery/archive/ca-qc_government_and_municipalities_of_quebec-2026A000224_d4c-datapkg-orthoimagery_orthorectified_imagery_from_quebec/mos_14_31n02_se_30cm_f09.JP2",
"content_type": "application/x-msdownload",
"size_bytes": 773722941,
"etag": "abc123def456",
"multipart_part_size": 5242880,
"multipart_number_parts": 148,
"started_at": "2026-03-12T21:00:00.000Z",
"finished_at": "2026-03-12T21:01:30.000Z"
}
multipart_part_size and multipart_number_parts are always present because all uploads use multipart.
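The part count follows from simple arithmetic. S3 requires every part except the last to be at least 5 MiB and allows at most 10,000 parts per upload. With a fixed 5 MiB part size, the 773,722,941-byte example needs 148 parts, and the largest object this design could ingest would be about 48.8 GiB. That ceiling assumes the part size never grows with the file; this README does not say whether it does. `partCount` is a hypothetical helper.

```typescript
const PART_SIZE = 5 * 1024 * 1024; // 5 MiB = 5,242,880 bytes
const S3_MAX_PARTS = 10_000; // S3 limit on parts per multipart upload

// Every part is full except possibly the last, so round up.
function partCount(sizeBytes: number, partSize: number = PART_SIZE): number {
  return Math.ceil(sizeBytes / partSize);
}

// Largest object that fits in the part limit at a fixed 5 MiB part size.
const maxObjectBytes = PART_SIZE * S3_MAX_PARTS; // 52,428,800,000 bytes

console.log(partCount(773_722_941)); // 148, as in the POST example response
console.log(maxObjectBytes / 1024 ** 3); // 48.828125 (GiB)
```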
Direct upload mode (PUT)
Uploads a binary file body directly to S3. Useful for uploading local files (e.g. Parquet artifacts) without needing a public download URL.
Method: PUT
Authorization: Bearer <AUTH_TOKEN>
Required headers:
| Header | Description |
|---|---|
| X-S3-Key | Full S3 object key (e.g. dataforcanada/my-dataset/data.parquet) |
Optional headers:
| Header | Description |
|---|---|
| Content-Type | MIME type (default: application/octet-stream) |
| Content-Length | File size in bytes |
Body: Raw binary file content.
Example
curl -X PUT https://cf-data-ingestor.labs.dataforcanada.org \
-H "Authorization: Bearer <AUTH_TOKEN>" \
-H "X-S3-Key: dataforcanada/my-dataset/downloads.parquet" \
-H "Content-Type: application/octet-stream" \
-H "Content-Length: $(stat -c%s downloads.parquet)" \
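--data-binary @downloads.parquet
stat -c%s is GNU coreutils syntax; on macOS, use stat -f%z instead. curl also sets Content-Length by itself when sending a file with --data-binary, so that header can be omitted.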
Successful response
{
"ok": true,
"bucket": "us-west-2.opendata.source.coop",
"key": "dataforcanada/my-dataset/downloads.parquet",
"content_type": "application/octet-stream",
"size_bytes": 45231,
"etag": "def456abc789",
"multipart_part_size": 5242880,
"multipart_number_parts": 1,
"started_at": "2026-03-12T21:00:00.000Z",
"finished_at": "2026-03-12T21:00:01.000Z"
}
Response fields
| Field | Type | Present | Description |
|---|---|---|---|
| ok | boolean | Always | true on success |
| bucket | string | Always | S3 bucket name |
| key | string | Always | S3 object key |
| content_type | string | Always | MIME type of the uploaded file |
| size_bytes | number | When Content-Length is known | File size in bytes |
| etag | string | When available | S3 ETag (quotes stripped); for multipart uploads this is not an MD5 of the whole object |
| multipart_part_size | number | Always | Part size in bytes (5 MiB = 5242880) |
| multipart_number_parts | number | Always | Number of parts uploaded |
| started_at | string | Always | ISO-8601 UTC timestamp when processing started |
| finished_at | string | Always | ISO-8601 UTC timestamp when processing finished |
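For typed clients, the table maps onto a TypeScript interface. This sketch checks the fields the table marks as always present and treats size_bytes and etag as optional. `IngestResponse`, `isIngestResponse` and `elapsedMs` are hypothetical names.

```typescript
interface IngestResponse {
  ok: true;
  bucket: string;
  key: string;
  content_type: string;
  size_bytes?: number; // only when Content-Length is known
  etag?: string; // only when available
  multipart_part_size: number;
  multipart_number_parts: number;
  started_at: string; // ISO-8601 UTC
  finished_at: string; // ISO-8601 UTC
}

// Runtime guard for a parsed JSON body.
function isIngestResponse(value: unknown): value is IngestResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.ok === true &&
    typeof v.bucket === "string" &&
    typeof v.key === "string" &&
    typeof v.content_type === "string" &&
    typeof v.multipart_part_size === "number" &&
    typeof v.multipart_number_parts === "number" &&
    typeof v.started_at === "string" &&
    typeof v.finished_at === "string" &&
    (v.size_bytes === undefined || typeof v.size_bytes === "number") &&
    (v.etag === undefined || typeof v.etag === "string")
  );
}

// Processing time in milliseconds, derived from the two timestamps.
function elapsedMs(r: IngestResponse): number {
  return Date.parse(r.finished_at) - Date.parse(r.started_at);
}
```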
Error responses
| Status | Meaning |
|---|---|
| 401 | Missing or invalid Bearer token |
| 405 | Non-POST/PUT method |
| 415 | Content-Type is not application/json (POST only) |
| 400 | Malformed JSON, missing fields, or missing X-S3-Key header |
| 502 | Source download or S3 upload failed |
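A Worker could produce the 401/405/415/400 statuses with a cheap pre-check before any network I/O. This sketch assumes the checks run in the table's order (auth, method, content type, required header). `precheck` is a hypothetical name, and the real Worker may order or combine the checks differently. JSON body validation, which also yields 400, is left out because it requires reading the body asynchronously.

```typescript
// Returns an HTTP status to reject with, or null if the request may proceed.
function precheck(req: Request, authToken: string): number | null {
  // Plain comparison for brevity; in a Worker, crypto.subtle.timingSafeEqual
  // could be used instead to avoid leaking timing information.
  const auth = req.headers.get("Authorization") ?? "";
  if (auth !== `Bearer ${authToken}`) return 401;

  if (req.method !== "POST" && req.method !== "PUT") return 405;

  if (req.method === "POST") {
    // Ignore parameters such as "; charset=utf-8".
    const ct = req.headers.get("Content-Type") ?? "";
    const mediaType = (ct.split(";")[0] ?? "").trim().toLowerCase();
    if (mediaType !== "application/json") return 415;
  }

  if (req.method === "PUT" && !req.headers.get("X-S3-Key")) return 400;

  return null;
}
```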
S3 Object Key
POST mode
Only the filename is extracted from the download_url and placed under the key_prefix. The source URL's directory hierarchy is not preserved.
download_url: https://diffusion.mern.gouv.qc.ca/diffusion/RGQ/Imagerie/Orthomosaique/Generique/Mosa30rvb0015_30cm_Rvb/Mtm9/Jpeg2000/mos_14_31n02_se_30cm_f09.JP2
key_prefix: "dataforcanada/d4c-datapkg-orthoimagery/archive/ca-qc_government_and_municipalities_of_quebec-2026A000224_d4c-datapkg-orthoimagery_orthorectified_imagery_from_quebec"
→ key: dataforcanada/d4c-datapkg-orthoimagery/archive/ca-qc_government_and_municipalities_of_quebec-2026A000224_d4c-datapkg-orthoimagery_orthorectified_imagery_from_quebec/mos_14_31n02_se_30cm_f09.JP2
If key_prefix is omitted or empty, the file is uploaded to the bucket root.
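Here is a sketch of the POST-mode key derivation. Beyond what this README states, it assumes that the query string is ignored, the filename is percent-decoded, and leading or trailing slashes on key_prefix are trimmed. `deriveKey` is a hypothetical name.

```typescript
// Key = key_prefix + "/" + last path segment of download_url (or just the filename).
function deriveKey(downloadUrl: string, keyPrefix?: string): string {
  // URL.pathname excludes the query string and fragment.
  const lastSegment = new URL(downloadUrl).pathname.split("/").pop() ?? "";
  const filename = decodeURIComponent(lastSegment);
  if (filename === "") throw new Error("download_url does not end in a filename");
  // Trim leading/trailing slashes so "a/b/" and "/a/b" both become "a/b".
  const prefix = (keyPrefix ?? "").replace(/^\/+|\/+$/g, "");
  return prefix === "" ? filename : `${prefix}/${filename}`;
}
```

Applied to the example above, `deriveKey(download_url, key_prefix)` returns the key shown after the arrow.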
PUT mode
The full S3 key is specified directly via the X-S3-Key header.
Local Development
pnpm run dev
Then POST or PUT to http://localhost:8787. Wrangler reads secrets from the .env file you created in step 3. You can also create environment-specific overrides (e.g. .env.staging); see the Cloudflare docs for the full .env precedence rules.