Initial commit

Diego Ripley
2025-06-02 18:13:00 -04:00
commit c73a343599
16 changed files with 46480 additions and 0 deletions
+2
@@ -0,0 +1,2 @@
node_modules/
.wrangler/
+21
@@ -0,0 +1,21 @@
MIT License
Copyright (c) Diego Ripley. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+56
@@ -0,0 +1,56 @@
## Table of Contents
- [Table of Contents](#table-of-contents)
- [About](#about)
- [Steps to Deploy](#steps-to-deploy)
- [License](#license)
## About
**statcan-geographies-search** is a Cloudflare Worker for searching Statistics Canada's geographies. It uses SQLite's full-text search extension, [FTS5](https://www.sqlite.org/fts5.html).
You can preview the worker at https://geographies.sisyphus.ca/. After searching, click a DGUID to open a map page that zooms to that geography.
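As a rough local illustration of the FTS5 setup, the sketch below builds the `geographies` table and its `geographies_fts` index in an in-memory SQLite database and runs a prefix search. It is a minimal sketch, not the worker's actual query: the helper names `build_demo_db` and `search`, the prefix-matching behaviour, the ordering by `geographic_level`, and the sample rows are all assumptions made for illustration. It also assumes the SQLite bundled with your Python was compiled with FTS5.

```python
import sqlite3

# Same table layout as the project's SQL file.
SCHEMA = """
CREATE TABLE geographies (
    id INTEGER PRIMARY KEY,
    dguid TEXT,
    search_name TEXT,
    geographic_level INTEGER
);
CREATE VIRTUAL TABLE geographies_fts USING fts5(
    id UNINDEXED,
    search_name,
    content='geographies',
    content_rowid='id',
    tokenize = "unicode61 tokenchars '-/.,''&():+'"
);
"""


def build_demo_db(rows):
    """Create an in-memory database and index the given rows (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO geographies VALUES (?, ?, ?, ?)", rows)
    # External-content FTS5 tables are populated with the special 'rebuild' command.
    con.execute("INSERT INTO geographies_fts(geographies_fts) VALUES ('rebuild')")
    con.commit()
    return con


def search(con, text):
    """Prefix-search search_name (assumed behaviour, not taken from the worker)."""
    # Quote the user text as an FTS5 string and append * for a prefix query.
    query = '"' + text.replace('"', '""') + '"*'
    return con.execute(
        "SELECT dguid, search_name, geographic_level FROM geographies "
        "WHERE id IN (SELECT rowid FROM geographies_fts WHERE geographies_fts MATCH ?) "
        "ORDER BY geographic_level DESC, id",
        (query,),
    ).fetchall()


if __name__ == "__main__":
    demo = build_demo_db([
        (1, "2021A00033506", "Ottawa", 8),
        (2, "2021S05003510", "Ottawa", 10),
        (3, "2021A00033520", "Toronto", 8),  # illustrative row
    ])
    for row in search(demo, "otta"):
        print(row)
```

Quoting the user's text before appending `*` keeps characters such as `-` or `(` from being read as FTS5 query operators.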
The search currently covers the following geographic levels:
- **Country**: This includes the geographic boundary for Canada for the 2021 Census.
- **Geographical Regions of Canada (GRCs)**: This includes the [GRCs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo027a) for the 2021 Census.
- **Provinces or Territories (PRs)**: This includes the [PRs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo038) for the 2021 Census.
- **Economic Regions (ERs)**: This includes the [ERs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo022) for the 2021 Census.
- **Census Agricultural Regions (CARs)**: This includes the [CARs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo006) for the 2021 Census.
- **Census Divisions (CDs)**: This includes the [CDs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo008) for the 2021 Census.
- **Census Consolidated Subdivisions (CCSs)**: This includes the [CCSs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo007) for the 2021 Census.
- **Census Metropolitan Areas or Census Agglomerations (CMACAs)**: This includes the [CMACAs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo009) for the 2021 Census.
- **Census Subdivisions (CSDs)**: This includes the [CSDs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo012) for the 2021 Census.
- **Federal Electoral Districts (FEDs)**: This includes the [FEDs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo025) for the 2021 Census, which are based on the 2013 Representation Order.
- **Designated Places (DPLs)**: This includes the [DPLs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo018) for the 2021 Census.
- **Population Centres (POPCTRs)**: This includes the [POPCTRs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo049a) for the 2021 Census.
- **Place Names (PNs)**: This includes the [PNs](https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/Definition-eng.cfm?ID=geo033) for the 2021 Census.
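Each level is stored as an integer code, from 13 (Country) down to 1 (Place Name), and the search page reads each result as a `[dguid, search_name, geographic_level]` row. The following sketch shows one way a client could turn such rows into labelled records. The function name `describe_result` and the fallback text for unknown codes are assumptions; the labels are the ones used by the search page.

```python
# Level codes and labels as used by the search page's script.
GEOGRAPHIC_LEVEL_LABELS = {
    13: "Country",
    12: "Region",
    11: "Province and Territory",
    10: "Economic Region",
    9: "Census Agricultural Region",
    8: "Census Division",
    7: "Census Consolidated Subdivision",
    6: "Census Metropolitan Area",
    5: "Census Subdivision",
    4: "Federal Electoral District",
    3: "Designated Place",
    2: "Population Centre",
    1: "Place Name",
}


def describe_result(row):
    """Convert one [dguid, search_name, geographic_level] row into a dict (hypothetical helper)."""
    dguid, search_name, level = row
    return {
        "dguid": dguid,
        "name": search_name,
        # Fall back to a readable placeholder rather than failing on unknown codes.
        "level": GEOGRAPHIC_LEVEL_LABELS.get(level, f"Unknown level {level}"),
    }


if __name__ == "__main__":
    print(describe_result(["2021A00033506", "Ottawa", 8]))
```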
## Steps to Deploy
```shell
# 1. Clone the repository
git clone https://github.com/diegoripley/statcan-geographies-search.git
# 2. Navigate to the project directory
cd statcan-geographies-search
# 3. Install the JavaScript dependencies
npm install
# 4. Make sure you are logged in with wrangler
npx wrangler login
# 5. Create a Cloudflare D1 database, then add its database_id to wrangler.toml
npx wrangler d1 create geographies_search
# 6. Create an R2 bucket to host the GeoJSON geometries for Statistics Canada's geographies. If you use a different bucket name, update the bucket_name value in wrangler.toml
npx wrangler r2 bucket create geographies-search
# 7. Import the geographies SQL into D1. To generate it yourself, see the instructions in the notebooks folder. You will need to run the notebook to generate the GeoJSON files
npx wrangler d1 execute geographies_search --remote --file=db/geographies.sql
# 8. Optionally regenerate dist/output.css
npx @tailwindcss/cli -i ./src/input.css -o ./dist/output.css --minify
# 9. Deploy the Cloudflare worker
npx wrangler deploy
```
## License
This repo is distributed under an MIT license.
[Back to top](#table-of-contents)
+42587
File diff suppressed because it is too large
+109
@@ -0,0 +1,109 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Statistics Canada Geographies Search</title>
<link href="./output.css" rel="stylesheet">
</head>
<body class="bg-gray-100 min-h-screen flex items-center justify-center p-6">
<div class="w-full max-w-3xl">
<h1 class="text-3xl font-bold mb-6 text-center text-gray-800">
Statistics Canada Geographies Search
</h1>
<input
id="searchInput"
type="text"
placeholder="Type at least 3 characters..."
class="w-full px-4 py-2 border border-gray-300 rounded-md shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
/>
<div class="mt-6 overflow-x-auto rounded-md shadow">
<table class="min-w-full bg-white">
<thead>
<tr class="bg-gray-200 text-sm font-semibold text-gray-700 text-left">
<th class="px-4 py-2">DGUID</th>
<th class="px-4 py-2">Name</th>
<th class="px-4 py-2">Geographic Level</th>
</tr>
</thead>
<tbody id="resultsBody" class="text-sm text-gray-800 divide-y divide-gray-200">
<!-- Results will be inserted here -->
</tbody>
</table>
</div>
</div>
<script>
const input = document.getElementById('searchInput');
const resultsBody = document.getElementById('resultsBody');
let debounceTimer;
const geographicLevelLabels = {
13: "Country",
12: "Region",
11: "Province and Territory",
10: "Economic Region",
9: "Census Agricultural Region",
8: "Census Division",
7: "Census Consolidated Subdivision",
6: "Census Metropolitan Area",
5: "Census Subdivision",
4: "Federal Electoral District",
3: "Designated Place",
2: "Population Centre",
1: "Place Name"
};
input.addEventListener('input', () => {
const query = input.value.trim();
clearTimeout(debounceTimer);
if (query.length >= 3) {
debounceTimer = setTimeout(() => {
//fetch(`https://geographies.sisyphus.ca/api/?search=${encodeURIComponent(query)}`)
fetch(`/api/?search=${encodeURIComponent(query)}`)
.then(response => {
if (!response.ok) throw new Error('Network response was not ok');
return response.json();
})
.then(data => {
resultsBody.innerHTML = '';
if (data.length === 0) {
resultsBody.innerHTML = `
<tr>
<td colspan="3" class="px-4 py-2 text-red-500">No results</td>
</tr>
`;
}
data.forEach(item => {
const dguid = item[0];
const searchName = item[1];
const geographicLevel = item[2];
const geographicLevelText = geographicLevelLabels[geographicLevel];
const row = document.createElement('tr');
row.innerHTML = `
<td class="px-4 py-2"><a href="map.html?dguid=${encodeURIComponent(dguid)}" target="_blank" rel="noopener">${dguid}</a></td>
<td class="px-4 py-2">${searchName}</td>
<td class="px-4 py-2">${geographicLevelText}</td>
`;
resultsBody.appendChild(row);
});
})
.catch(error => {
console.error('Error fetching data:', error);
resultsBody.innerHTML = `
<tr>
<td colspan="3" class="px-4 py-2 text-red-500">Error fetching data</td>
</tr>
`;
});
}, 300);
} else {
resultsBody.innerHTML = '';
}
});
</script>
</body>
</html>
+166
@@ -0,0 +1,166 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<title>Statistics Canada Geography DGUID Map</title>
<link href="./maplibre-gl.css" rel="stylesheet" />
<style>
body { margin: 0; padding: 0; }
html, body, #map { height: 100%; }
</style>
</head>
<body>
<div id="map"></div>
<script src="./maplibre-gl.js"></script>
<script>
// Initialize the map
const map = new maplibregl.Map({
container: 'map',
style: {
version: 8,
sources: {
'openstreetmap': {
type: 'raster',
tiles: [
'https://tile.openstreetmap.org/{z}/{x}/{y}.png'
],
tileSize: 256,
attribution: '&copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors'
}
},
layers: [
{
id: 'osm',
type: 'raster',
source: 'openstreetmap'
}
]
},
center: [-79.3832, 43.6532], // Default center (Toronto)
zoom: 0,
maxZoom: 18,
dragRotate: false, // disable drag rotation
keyboard: false, // disable keyboard interactions
pitchWithRotate: false // disable pitch with rotate
});
// Function to calculate the bounding box of GeoJSON data
function getGeoJSONBounds(geojson) {
const bounds = new maplibregl.LngLatBounds();
function extendBounds(coords) {
if (typeof coords[0] === 'number' && typeof coords[1] === 'number') {
bounds.extend(coords);
} else {
coords.forEach(extendBounds);
}
}
geojson.features.forEach(feature => {
const coords = feature.geometry.coordinates;
extendBounds(coords);
});
return bounds;
}
// Function to load GeoJSON from URL parameter
function loadGeoJSONFromURL() {
const urlParams = new URLSearchParams(window.location.search);
const dguid = urlParams.get('dguid');
//const geojsonUrl = `https://geographies.sisyphus.ca/dguid/?dguid=${dguid}`;
const geojsonUrl = `/dguid/?dguid=${encodeURIComponent(dguid)}`;
if (dguid) {
fetch(geojsonUrl)
.then(response => {
if (!response.ok) throw new Error('Network response was not ok');
return response.json();
})
.then(data => {
const geojsonData = {
type: 'FeatureCollection',
features: [{
type: 'Feature',
geometry: data,
properties: {}
}]
};
map.addSource('geojson-source', {
type: 'geojson',
data: geojsonData
});
// Determine layer type based on geometry
const firstFeature = geojsonData.features[0];
let layerType = 'circle';
if (firstFeature.geometry.type === 'Polygon' || firstFeature.geometry.type === 'MultiPolygon') {
layerType = 'fill';
} else if (firstFeature.geometry.type === 'LineString' || firstFeature.geometry.type === 'MultiLineString') {
layerType = 'line';
}
const layer = {
id: 'geojson-layer',
type: layerType,
source: 'geojson-source',
paint: {}
};
if (layerType === 'circle') {
layer.paint = {
'circle-radius': 15,
'circle-color': '#ff0000'
};
} else if (layerType === 'fill') {
layer.paint = {
'fill-color': '#ff0000',
'fill-opacity': 0.5
};
} else if (layerType === 'line') {
layer.paint = {
'line-color': '#ff0000',
'line-width': 2
};
}
map.addLayer(layer);
// Zoom for point geometries
if (layerType === 'circle') {
const latitude = data['coordinates'][1];
const longitude = data['coordinates'][0];
map.flyTo({
center: [longitude, latitude],
zoom: 15,
animate: false
});
return;
}
// Zoom for other types of geometries
const bounds = getGeoJSONBounds(geojsonData);
if (bounds.isEmpty()) {
console.warn('No valid coordinates found in GeoJSON.');
} else {
map.fitBounds(bounds, {
padding: 20,
animate: false
});
}
})
.catch(error => console.error('Error loading GeoJSON:', error));
} else {
console.warn('No dguid URL parameter provided.');
}
}
// Disable only the rotation aspect
map.touchZoomRotate.disableRotation();
// Load GeoJSON when the map is ready
map.on('load', loadGeoJSONFromURL);
map.on('style.load', () => {
map.setProjection({ type: 'globe' });
});
</script>
</body>
</html>
+1
File diff suppressed because one or more lines are too long
+59
File diff suppressed because one or more lines are too long
+2
File diff suppressed because one or more lines are too long
+47
@@ -0,0 +1,47 @@
# Instructions
- Run the Jupyter Notebook (`jupyter execute generate_sql.ipynb`). This will generate `geography.db` and a GeoJSON file for every Statistics Canada geography (`geojson` folder).
- Export the geographies table using sqlite3
```
sqlite3 notebooks/geography.db
.output geographies.sql
.dump geographies
.exit
```
- Edit the `geographies.sql` file
- Remove `PRAGMA foreign_keys=OFF;` at the beginning of the file
- Remove `BEGIN TRANSACTION;` at the beginning of the file
- Remove `COMMIT;` at the end of the file
- Remove the `CREATE TABLE geographies` statement
- Add the following to the top of the file:
```
DROP TABLE IF EXISTS geographies;
CREATE TABLE IF NOT EXISTS geographies (
id INTEGER PRIMARY KEY,
dguid TEXT,
search_name TEXT,
geographic_level INTEGER
);
DROP TABLE IF EXISTS geographies_fts;
CREATE VIRTUAL TABLE IF NOT EXISTS geographies_fts USING fts5(
id UNINDEXED,
search_name,
content='geographies',
content_rowid='id',
tokenize = "unicode61 tokenchars '-/.,''&():+'"
);
```
- Add the following to the bottom of the file:
```
INSERT INTO geographies_fts(geographies_fts) VALUES ('rebuild');
PRAGMA optimize;
```
- Upload the GeoJSON files to your R2 bucket
```
cd geojson
rclone copy . --transfers 50 --progress cloudflare:/geographies-search
```
- Insert the SQL into the D1 database
```
npx wrangler d1 execute geographies_search --remote --file=geographies.sql
```
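The manual edits to `geographies.sql` described above could be scripted. The following is a minimal sketch under stated assumptions: the function name `prepare_for_d1` is hypothetical, and it assumes the dump follows the usual `sqlite3 .dump` layout, with the `PRAGMA`, `BEGIN TRANSACTION;` and `COMMIT;` statements each on their own line and a `CREATE TABLE` statement that ends on a line closing with `);`. It has not been checked against the real dump.

```python
# Statements to place at the top of the file, copied from the instructions above.
HEADER = """DROP TABLE IF EXISTS geographies;
CREATE TABLE IF NOT EXISTS geographies (
    id INTEGER PRIMARY KEY,
    dguid TEXT,
    search_name TEXT,
    geographic_level INTEGER
);
DROP TABLE IF EXISTS geographies_fts;
CREATE VIRTUAL TABLE IF NOT EXISTS geographies_fts USING fts5(
    id UNINDEXED,
    search_name,
    content='geographies',
    content_rowid='id',
    tokenize = "unicode61 tokenchars '-/.,''&():+'"
);
"""

# Statements to place at the bottom of the file.
FOOTER = """INSERT INTO geographies_fts(geographies_fts) VALUES ('rebuild');
PRAGMA optimize;
"""


def prepare_for_d1(dump_sql: str) -> str:
    """Strip the statements listed above from a dump and wrap it in HEADER/FOOTER."""
    kept = []
    in_create = False  # True while skipping a multi-line CREATE TABLE statement
    for line in dump_sql.splitlines():
        stripped = line.strip()
        upper = stripped.upper()
        if in_create:
            if stripped.endswith(");"):
                in_create = False
            continue
        if upper.startswith("PRAGMA FOREIGN_KEYS"):
            continue
        if upper in ("BEGIN TRANSACTION;", "COMMIT;"):
            continue
        if upper.startswith("CREATE TABLE"):
            in_create = not stripped.endswith(");")
            continue
        kept.append(line)
    return HEADER + "\n".join(kept) + "\n" + FOOTER


if __name__ == "__main__":
    # Read the raw dump and write the edited version next to it.
    with open("geographies.sql", encoding="utf-8") as src:
        edited = prepare_for_d1(src.read())
    with open("geographies_d1.sql", "w", encoding="utf-8") as dst:
        dst.write(edited)
```

The output file name `geographies_d1.sql` is also an assumption; pass whichever file you produce to `--file` in the `wrangler d1 execute` command.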
+838
@@ -0,0 +1,838 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "05ac8556",
"metadata": {},
"source": [
"# TODO\n",
"- Fix encoding issues with place names table (see below for troublesome records)\n",
"- Add remaining geographic hierarchy (Health Regions, CT, DA, DB, ADA, HCCSS)\n",
"- Read geographic hierarchy from Parquet files and do the SQL work using DuckDB\n",
"- Add field so user can search by province (if possible). It won't be possible to add the field to the country and region tables\n",
"- Add field so user can search by census year\n",
"- Standardize search values. Look into porting CASK into JavaScript as the user input will need to be standardized as well"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "68f3cacd",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import sqlite3\n",
"\n",
"from dotenv import load_dotenv\n",
"import duckdb\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "88719ee9",
"metadata": {},
"source": [
"# Create the geographies table"
]
},
{
"cell_type": "markdown",
"id": "400083f5",
"metadata": {},
"source": [
"## Create tables in SQLite\n",
"These are the instructions for exporting the database tables and importing them into Cloudflare D1. These steps are currently manual and should be automated.\n",
"\n",
"1. Export the geographies table using `sqlite3`\n",
"```\n",
"sqlite3 geography.db\n",
".output ./geographies.sql\n",
".dump geographies\n",
"```\n",
"2. Remove the `PRAGMA foreign_keys=off`, `BEGIN TRANSACTION` and `COMMIT` parts\n",
"3. Remove the `CREATE TABLE geographies` statement\n",
"4. Add the following to the top, before the insert statements\n",
"```\n",
"DROP TABLE IF EXISTS geographies;\n",
"CREATE TABLE IF NOT EXISTS geographies (\n",
" id INTEGER PRIMARY KEY,\n",
" dguid TEXT,\n",
" search_name TEXT,\n",
" geographic_level INTEGER\n",
");\n",
"\n",
"DROP TABLE IF EXISTS geographies_fts;\n",
"CREATE VIRTUAL TABLE IF NOT EXISTS geographies_fts USING fts5(\n",
" id UNINDEXED,\n",
" search_name,\n",
" content='geographies',\n",
" content_rowid='id',\n",
" tokenize = \"unicode61 tokenchars '-/.,''&():+'\"\n",
");\n",
"\n",
"```\n",
"5. Add `INSERT INTO geographies_fts(geographies_fts) VALUES ('rebuild');` at the end of the SQL file\n",
"6. Add `PRAGMA optimize;` at the end of the SQL file, as recommended in the Cloudflare D1 documentation: https://developers.cloudflare.com/d1/best-practices/use-indexes/\n",
"7. Log in to Cloudflare by running `npx wrangler login`\n",
"8. Import as follows\n",
"```\n",
"npx wrangler d1 execute geographies_search --remote --file=./geographies.sql\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "17f8ffd8",
"metadata": {},
"outputs": [],
"source": [
"con = sqlite3.connect(\"geography.db\")\n",
"cur = con.cursor()\n",
"\n",
"cur.executescript(\"\"\"\n",
"DROP TABLE IF EXISTS geographies;\n",
"CREATE TABLE IF NOT EXISTS geographies (\n",
" id INTEGER PRIMARY KEY,\n",
" dguid TEXT,\n",
" search_name TEXT,\n",
" geographic_level INTEGER\n",
");\n",
"\"\"\")\n",
"\n",
"# Allow searches to use -/.,'&():+\n",
"cur.executescript(\"\"\"\n",
"DROP TABLE IF EXISTS geographies_fts;\n",
"CREATE VIRTUAL TABLE IF NOT EXISTS geographies_fts USING fts5(\n",
" id UNINDEXED,\n",
" search_name,\n",
" content='geographies',\n",
" content_rowid='id',\n",
" tokenize = \"unicode61 tokenchars '-/.,''&():+'\"\n",
");\n",
"\"\"\")\n",
"\n",
"con.commit()"
]
},
{
"cell_type": "markdown",
"id": "f8010194",
"metadata": {},
"source": [
"## SQL to create search table\n",
"For tables that have both an English and a French name field, this creates two records. A field could be added to the search table to indicate whether the name is English, French, or both.\n",
"\n",
"Statistics Canada searches the English field when the page is in English and the French field when the page is in French. Here are two examples:\n",
"- **English:** https://www150.statcan.gc.ca/n1/en/geo?geotext=Quebec%20%5BProvince%5D&geocode=A000224\n",
"- **French:** https://www150.statcan.gc.ca/n1/fr/geo?geotext=Qu%C3%A9bec%20%5BProvince%5D&geocode=A000224"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c04b4979",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7a9c8c29df524416a8280e3f80b2a6cb",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"duck_con = duckdb.connect()\n",
"duck_con.install_extension(\"spatial\")\n",
"duck_con.load_extension(\"spatial\")\n",
"\n",
"duck_con.sql(\"\"\"\n",
"DROP TABLE IF EXISTS geography;\n",
"CREATE TABLE geography AS\n",
"WITH country AS (\n",
"\tSELECT country_dguid AS dguid, country_en_name AS search_name, 13 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/country_2021.parquet'\n",
"), regions AS (\n",
"\tSELECT DISTINCT grc_dguid AS dguid, grc_en_name AS search_name, 12 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/grc_2021.parquet'\n",
"\tUNION\n",
"\tSELECT DISTINCT grc_dguid AS dguid, grc_fr_name AS search_name, 12 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/grc_2021.parquet'\n",
"), pr AS (\n",
"\tSELECT DISTINCT pr_dguid AS dguid, pr_en_name AS search_name, 11 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/pr_2021.parquet'\n",
"\tUNION\n",
"\tSELECT DISTINCT pr_dguid AS dguid, pr_fr_name AS search_name, 11 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/pr_2021.parquet'\n",
"), er AS (\n",
"\tSELECT DISTINCT er_dguid AS dguid, er_name AS search_name, 10 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/er_2021.parquet'\n",
"), car AS (\n",
"\tSELECT DISTINCT car_dguid AS dguid, car_en_name AS search_name, 9 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/car_2021.parquet'\n",
"\tUNION\n",
"\tSELECT DISTINCT car_dguid AS dguid, car_fr_name AS search_name, 9 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/car_2021.parquet'\n",
"), cd AS (\n",
"\tSELECT cd_dguid AS dguid, cd_name AS search_name, 8 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/cd_2021.parquet'\n",
"), ccs AS (\n",
"\tSELECT ccs_dguid AS dguid, ccs_name AS search_name, 7 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/ccs_2021.parquet'\n",
"), cma AS (\n",
"\tSELECT \n",
"\tCASE \n",
"\t\tWHEN cma_p_dguid IS NOT NULL THEN cma_p_dguid\n",
"\t\tELSE cma_dguid \n",
"\tEND AS dguid, cma_name AS search_name, 6 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/cma_2021.parquet'\n",
"), csd AS (\n",
"\tSELECT csd_dguid AS dguid, csd_name AS search_name, 5 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/csd_2021.parquet'\n",
"), fed AS (\n",
"\tSELECT DISTINCT fed_dguid AS dguid, fed_en_name AS search_name, 4 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/fed_2021_2013.parquet'\n",
"\tUNION\n",
"\tSELECT DISTINCT fed_dguid AS dguid, fed_fr_name AS search_name, 4 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/fed_2021_2013.parquet'\n",
"), dpl AS (\n",
"\tSELECT dpl_dguid AS dguid, dpl_name AS search_name, 3 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/dpl_2021.parquet'\n",
"), pc AS (\n",
"\tSELECT \n",
"\tCASE \n",
"\t\tWHEN pop_ctr_p_dguid IS NOT NULL THEN pop_ctr_p_dguid\n",
"\t\tELSE pop_ctr_dguid\n",
"\tEND AS dguid, pop_ctr_name AS search_name, 2 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/pop_ctr_2021.parquet'\n",
"), pn AS (\n",
"\tSELECT pn_dguid AS dguid, pn_name AS search_name, 1 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/placenames/2021/pn_2021.parquet'\n",
"), concatenation AS (\n",
"\tSELECT * FROM country\n",
"\tUNION\n",
"\tSELECT * FROM regions\n",
"\tUNION\n",
"\tSELECT * FROM pr\n",
"\tUNION\n",
"\tSELECT * FROM er\n",
"\tUNION\n",
"\tSELECT * FROM car\n",
"\tUNION\n",
"\tSELECT * FROM cd\n",
"\tUNION\n",
"\tSELECT * FROM ccs\n",
"\tUNION\n",
"\tSELECT * FROM cma\n",
"\tUNION\n",
"\tSELECT * FROM csd\n",
"\tUNION\n",
"\tSELECT * FROM fed\n",
"\tUNION\n",
"\tSELECT * FROM dpl\n",
"\tUNION\n",
"\tSELECT * FROM pc\n",
" UNION\n",
"\tSELECT * FROM pn\n",
")\n",
"SELECT * FROM concatenation\n",
"ORDER BY search_name, geographic_level DESC;\n",
"\"\"\")\n",
"duck_con.commit()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "4c873914",
"metadata": {},
"outputs": [],
"source": [
"geography = duck_con.sql(\"SELECT * FROM geography;\").df()"
]
},
{
"cell_type": "markdown",
"id": "d1180ec8",
"metadata": {},
"source": [
"# TODO\n",
"## Fix encoding issues with place names"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "0d807307",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dguid</th>\n",
" <th>search_name</th>\n",
" <th>geographic_level</th>\n",
" <th>geom</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>5653</th>\n",
" <td>2021S0515005422</td>\n",
" <td>Cascapédia–Saint-Jules</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-65.9166667,48....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8243</th>\n",
" <td>2021S0515007864</td>\n",
" <td>Côte-des-Neiges–Notre-Dame-de-Grâce</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-73.6263889,45....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18297</th>\n",
" <td>2021S0515017557</td>\n",
" <td>L'Île-Bizard–Sainte-Geneviève</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-73.866667,45.4...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20327</th>\n",
" <td>2021S0515019487</td>\n",
" <td>Le Coteau-des-Sœurs</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-70.456886,47.0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20569</th>\n",
" <td>2021S0515019731</td>\n",
" <td>Le Sacré-Cœur</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-69.979863,46.9...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23733</th>\n",
" <td>2021S0515022795</td>\n",
" <td>Mercier–Hochelaga-Maisonneuve</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-73.5388889,45....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25319</th>\n",
" <td>2021S0515024311</td>\n",
" <td>Métabetchouan–Lac-à-la-Croix</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-71.8666667,48....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29619</th>\n",
" <td>2021S0515028429</td>\n",
" <td>Port-Daniel–Gascons</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-64.9666667,48....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31289</th>\n",
" <td>2021S0515030028</td>\n",
" <td>Rivière-des-Prairies–Pointe-aux-Trembles</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-73.516667,45.65]}</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31432</th>\n",
" <td>2021S0515030168</td>\n",
" <td>Rock Forest–Saint-Élie–Deauville</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-72.0416667,45....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31702</th>\n",
" <td>2021S0515030432</td>\n",
" <td>Rosemont–La Petite-Patrie</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-73.5902778,45....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32617</th>\n",
" <td>2021S0515031197</td>\n",
" <td>Saint-Côme–Linière</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-70.5166667,46....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32737</th>\n",
" <td>2021S0515031295</td>\n",
" <td>Saint-Faustin–Lac-Carré</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-74.4833333,46....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33177</th>\n",
" <td>2021S0515031660</td>\n",
" <td>Saint-Lin–Laurentides</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-73.755663,45.8...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34031</th>\n",
" <td>2021S0515032370</td>\n",
" <td>Sainte-Foy–Sillery–Cap-Rouge</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-71.308333,46.7...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40241</th>\n",
" <td>2021S0515038300</td>\n",
" <td>Vieux-Québec–Basse-Ville</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-71.2069444,46....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40329</th>\n",
" <td>2021S0515038389</td>\n",
" <td>Villeray–Saint-Michel–Parc-Extension</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-73.6222222,45....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42483</th>\n",
" <td>2021S0515040448</td>\n",
" <td>Yuneŝit'in</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-123.1363889,51...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42561</th>\n",
" <td>2021S0515040522</td>\n",
" <td>ʔEsdilagh</td>\n",
" <td>1</td>\n",
" <td>{\"type\":\"Point\",\"coordinates\":[-122.4972222,52...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dguid search_name \\\n",
"5653 2021S0515005422 Cascapédia–Saint-Jules \n",
"8243 2021S0515007864 Côte-des-Neiges–Notre-Dame-de-Grâce \n",
"18297 2021S0515017557 L'Île-Bizard–Sainte-Geneviève \n",
"20327 2021S0515019487 Le Coteau-des-Sœurs \n",
"20569 2021S0515019731 Le Sacré-Cœur \n",
"23733 2021S0515022795 Mercier–Hochelaga-Maisonneuve \n",
"25319 2021S0515024311 Métabetchouan–Lac-à-la-Croix \n",
"29619 2021S0515028429 Port-Daniel–Gascons \n",
"31289 2021S0515030028 Rivière-des-Prairies–Pointe-aux-Trembles \n",
"31432 2021S0515030168 Rock Forest–Saint-Élie–Deauville \n",
"31702 2021S0515030432 Rosemont–La Petite-Patrie \n",
"32617 2021S0515031197 Saint-Côme–Linière \n",
"32737 2021S0515031295 Saint-Faustin–Lac-Carré \n",
"33177 2021S0515031660 Saint-Lin–Laurentides \n",
"34031 2021S0515032370 Sainte-Foy–Sillery–Cap-Rouge \n",
"40241 2021S0515038300 Vieux-Québec–Basse-Ville \n",
"40329 2021S0515038389 Villeray–Saint-Michel–Parc-Extension \n",
"42483 2021S0515040448 Yuneŝit'in \n",
"42561 2021S0515040522 ʔEsdilagh \n",
"\n",
" geographic_level geom \n",
"5653 1 {\"type\":\"Point\",\"coordinates\":[-65.9166667,48.... \n",
"8243 1 {\"type\":\"Point\",\"coordinates\":[-73.6263889,45.... \n",
"18297 1 {\"type\":\"Point\",\"coordinates\":[-73.866667,45.4... \n",
"20327 1 {\"type\":\"Point\",\"coordinates\":[-70.456886,47.0... \n",
"20569 1 {\"type\":\"Point\",\"coordinates\":[-69.979863,46.9... \n",
"23733 1 {\"type\":\"Point\",\"coordinates\":[-73.5388889,45.... \n",
"25319 1 {\"type\":\"Point\",\"coordinates\":[-71.8666667,48.... \n",
"29619 1 {\"type\":\"Point\",\"coordinates\":[-64.9666667,48.... \n",
"31289 1 {\"type\":\"Point\",\"coordinates\":[-73.516667,45.65]} \n",
"31432 1 {\"type\":\"Point\",\"coordinates\":[-72.0416667,45.... \n",
"31702 1 {\"type\":\"Point\",\"coordinates\":[-73.5902778,45.... \n",
"32617 1 {\"type\":\"Point\",\"coordinates\":[-70.5166667,46.... \n",
"32737 1 {\"type\":\"Point\",\"coordinates\":[-74.4833333,46.... \n",
"33177 1 {\"type\":\"Point\",\"coordinates\":[-73.755663,45.8... \n",
"34031 1 {\"type\":\"Point\",\"coordinates\":[-71.308333,46.7... \n",
"40241 1 {\"type\":\"Point\",\"coordinates\":[-71.2069444,46.... \n",
"40329 1 {\"type\":\"Point\",\"coordinates\":[-73.6222222,45.... \n",
"42483 1 {\"type\":\"Point\",\"coordinates\":[-123.1363889,51... \n",
"42561 1 {\"type\":\"Point\",\"coordinates\":[-122.4972222,52... "
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dguids_to_fix = ['2021S0515005422',\n",
" '2021S0515007864',\n",
" '2021S0515017557',\n",
" '2021S0515019487',\n",
" '2021S0515019731',\n",
" '2021S0515022795',\n",
" '2021S0515024311',\n",
" '2021S0515028429',\n",
" '2021S0515030028',\n",
" '2021S0515030168',\n",
" '2021S0515030432',\n",
" '2021S0515031197',\n",
" '2021S0515031295',\n",
" '2021S0515031660',\n",
" '2021S0515032370',\n",
" '2021S0515038300',\n",
" '2021S0515038389',\n",
" '2021S0515040448',\n",
" '2021S0515040522']\n",
"place_names_to_fix = geography[geography['dguid'].isin(dguids_to_fix)]\n",
"place_names_to_fix.head(19)"
]
},
{
"cell_type": "markdown",
"id": "5bec091c",
"metadata": {},
"source": [
"## Generate GeoJSON file for every dguid\n",
"Copy the files into Cloudflare R2 by running\n",
"```\n",
"cd geojson\n",
"rclone copy . --transfers 50 --progress cloudflare:/geographies-search\n",
"```"
]
},
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "12ddba86",
   "metadata": {},
   "outputs": [],
   "source": [
    "if not os.path.exists(\"geojson\"):\n",
    "    print(\"Creating DGUID geojson folder\")\n",
    "    os.mkdir(\"geojson\")\n",
    "\n",
    "# to_records() puts the index at position 0, so position 1 is the first column (dguid here);\n",
    "# the last column holds the geometry as a GeoJSON string\n",
    "for record in geography.to_records():\n",
    "    dguid = record[1]\n",
    "    geom = record[-1]\n",
    "    path = f\"geojson/{dguid}.geojson\"\n",
    "    # Skip files already written by an earlier run\n",
    "    if os.path.exists(path):\n",
    "        continue\n",
    "    with open(path, 'w') as geography_fp:\n",
    "        geography_fp.write(geom)"
   ]
},
{
"cell_type": "markdown",
"id": "39c1ff9f",
"metadata": {},
"source": [
"## Insert data into SQLite database"
]
},
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "6e39bbc4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Subset of fields to import into SQLite database, add id field as well\n",
    "# .copy() so that insert() modifies an independent DataFrame rather than a slice of geography\n",
    "geography_subset = geography[['dguid', 'search_name', 'geographic_level']].copy()\n",
    "geography_subset.insert(0, 'id', geography_subset.index)\n",
    "\n",
    "cur.executemany(\"INSERT INTO geographies VALUES(?, ?, ?, ?)\", geography_subset.values.tolist())\n",
    "# FTS5 'rebuild' command: regenerate the full-text index from the table's content\n",
    "cur.execute(\"INSERT INTO geographies_fts(geographies_fts) VALUES ('rebuild')\")\n",
    "con.commit()"
   ]
},
{
"cell_type": "markdown",
"id": "0675ca6d",
"metadata": {},
"source": [
"### Test out a search query"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "c49c2f06",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dguid</th>\n",
" <th>search_name</th>\n",
" <th>geographic_level</th>\n",
" <th>rank</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2021S05003510</td>\n",
" <td>Ottawa</td>\n",
" <td>10</td>\n",
" <td>-9.011603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2021A00033506</td>\n",
" <td>Ottawa</td>\n",
" <td>8</td>\n",
" <td>-9.011603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2021S05023506008</td>\n",
" <td>Ottawa</td>\n",
" <td>7</td>\n",
" <td>-9.011603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2021A00053506008</td>\n",
" <td>Ottawa</td>\n",
" <td>5</td>\n",
" <td>-9.011603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2013A000435078</td>\n",
" <td>Ottawa--Vanier</td>\n",
" <td>4</td>\n",
" <td>-9.011603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2013A000435075</td>\n",
" <td>Ottawa-Centre</td>\n",
" <td>4</td>\n",
" <td>-9.011603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2013A000435079</td>\n",
" <td>Ottawa-Ouest--Nepean</td>\n",
" <td>4</td>\n",
" <td>-9.011603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2013A000435077</td>\n",
" <td>Ottawa-Sud</td>\n",
" <td>4</td>\n",
" <td>-9.011603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>2021S0515026282</td>\n",
" <td>Ottawa</td>\n",
" <td>1</td>\n",
" <td>-9.011603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2021S0515026283</td>\n",
" <td>Ottawa</td>\n",
" <td>1</td>\n",
" <td>-9.011603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>2013A000435075</td>\n",
" <td>Ottawa Centre</td>\n",
" <td>4</td>\n",
" <td>-6.940322</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>2013A000435077</td>\n",
" <td>Ottawa South</td>\n",
" <td>4</td>\n",
" <td>-6.940322</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>2013A000435079</td>\n",
" <td>Ottawa West--Nepean</td>\n",
" <td>4</td>\n",
" <td>-6.940322</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>2021S0515026271</td>\n",
" <td>Ottawa Brook</td>\n",
" <td>1</td>\n",
" <td>-6.940322</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>2021S0515026273</td>\n",
" <td>Ottawa East</td>\n",
" <td>1</td>\n",
" <td>-6.940322</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>2021S0515026275</td>\n",
" <td>Ottawa South</td>\n",
" <td>1</td>\n",
" <td>-6.940322</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>2021S0515026277</td>\n",
" <td>Ottawa West</td>\n",
" <td>1</td>\n",
" <td>-6.940322</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>2021S0511240616</td>\n",
" <td>Ottawa - Gatineau</td>\n",
" <td>2</td>\n",
" <td>-5.643245</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>2021S0511350616</td>\n",
" <td>Ottawa - Gatineau</td>\n",
" <td>2</td>\n",
" <td>-5.643245</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>2021S050535505</td>\n",
" <td>Ottawa - Gatineau (Ontario part / partie de l'...</td>\n",
" <td>6</td>\n",
" <td>-2.660226</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>2021S050524505</td>\n",
" <td>Ottawa - Gatineau (partie du Québec / Quebec p...</td>\n",
" <td>6</td>\n",
" <td>-2.660226</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dguid search_name \\\n",
"0 2021S05003510 Ottawa \n",
"1 2021A00033506 Ottawa \n",
"2 2021S05023506008 Ottawa \n",
"3 2021A00053506008 Ottawa \n",
"4 2013A000435078 Ottawa--Vanier \n",
"5 2013A000435075 Ottawa-Centre \n",
"6 2013A000435079 Ottawa-Ouest--Nepean \n",
"7 2013A000435077 Ottawa-Sud \n",
"8 2021S0515026282 Ottawa \n",
"9 2021S0515026283 Ottawa \n",
"10 2013A000435075 Ottawa Centre \n",
"11 2013A000435077 Ottawa South \n",
"12 2013A000435079 Ottawa West--Nepean \n",
"13 2021S0515026271 Ottawa Brook \n",
"14 2021S0515026273 Ottawa East \n",
"15 2021S0515026275 Ottawa South \n",
"16 2021S0515026277 Ottawa West \n",
"17 2021S0511240616 Ottawa - Gatineau \n",
"18 2021S0511350616 Ottawa - Gatineau \n",
"19 2021S050535505 Ottawa - Gatineau (Ontario part / partie de l'... \n",
"20 2021S050524505 Ottawa - Gatineau (partie du Québec / Quebec p... \n",
"\n",
" geographic_level rank \n",
"0 10 -9.011603 \n",
"1 8 -9.011603 \n",
"2 7 -9.011603 \n",
"3 5 -9.011603 \n",
"4 4 -9.011603 \n",
"5 4 -9.011603 \n",
"6 4 -9.011603 \n",
"7 4 -9.011603 \n",
"8 1 -9.011603 \n",
"9 1 -9.011603 \n",
"10 4 -6.940322 \n",
"11 4 -6.940322 \n",
"12 4 -6.940322 \n",
"13 1 -6.940322 \n",
"14 1 -6.940322 \n",
"15 1 -6.940322 \n",
"16 1 -6.940322 \n",
"17 2 -5.643245 \n",
"18 2 -5.643245 \n",
"19 6 -2.660226 \n",
"20 6 -2.660226 "
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_sql_query(\"\"\"\n",
"SELECT geographies.dguid, fts.search_name, geographies.geographic_level, rank\n",
"FROM geographies_fts AS fts,\n",
" geographies\n",
"WHERE fts.search_name MATCH '\"Ottawa\"*'\n",
"AND fts.id = geographies.id\n",
"ORDER BY fts.rank, geographies.geographic_level DESC\n",
"\"\"\", con)\n",
"df"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
+2496
View File
File diff suppressed because it is too large
+17
View File
@@ -0,0 +1,17 @@
{
"name": "statcan-geographies-search",
"version": "0.0.1",
"author": "Diego Ripley <diego@diegoripley.ca>",
"scripts": {
"deploy": "wrangler deploy",
"dev": "wrangler dev",
"start": "wrangler dev"
},
"dependencies": {
"@tailwindcss/cli": "^4.1.4",
"tailwindcss": "^4.1.4"
},
"devDependencies": {
"wrangler": "^4.13.2"
}
}
+58
View File
@@ -0,0 +1,58 @@
export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);
    const params = url.searchParams;

    // TODO: Add caching
    // DGUID API, returns GeoJSON from R2 bucket. Called in map.html
    if (url.pathname.startsWith("/dguid/")) {
      const dguid = params.get("dguid");
      if (!dguid) {
        return new Response("Missing dguid parameter.", { status: 400 });
      }
      const object = await env.BUCKET.get(`${dguid}.geojson`);
      // R2 returns null when the key does not exist
      if (object === null) {
        return new Response("DGUID not found.", { status: 404 });
      }
      return new Response(object.body);
    }

    // Autocomplete API, called in index.html
    if (url.pathname.startsWith("/api/")) {
      const searchParam = params.get("search");
      if (!searchParam || searchParam.length < 3) {
        return new Response("Search parameter must be at least 3 characters.", { status: 400 });
      }

      // Return the cached response for this URL if there is one
      const cache = caches.default;
      const cacheKey = new Request(url.toString(), request);
      let response = await cache.match(cacheKey);
      if (response) {
        return response;
      }

      // Quoted FTS5 string followed by * for prefix matching
      const searchTerm = `"${searchParam}"*`;
      // Need to use a D1 session to use read replica
      const session = env.DB.withSession();
      const stmt = session
        .prepare(`
          SELECT geographies.dguid, fts.search_name, geographies.geographic_level
          FROM geographies_fts AS fts,
               geographies
          WHERE fts.search_name MATCH ?
          AND fts.id = geographies.id
          ORDER BY fts.rank, geographies.geographic_level DESC
        `).bind(searchTerm);
      const returnValue = await stmt.raw();

      response = new Response(JSON.stringify(returnValue), {
        headers: {
          "Content-Type": "application/json",
          "Cache-Control": "max-age=604800"
        }
      });
      // Store a copy in the cache without delaying the response
      ctx.waitUntil(cache.put(cacheKey, response.clone()));
      return response;
    }

    // Everything else: serve static assets in /dist folder
    return env.ASSETS.fetch(request);
  }
};
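The autocomplete handler builds its FTS5 prefix query by wrapping the raw `search` parameter in double quotes. A search term that itself contains a double quote would end the FTS5 string early and is likely to make the `MATCH` fail. A minimal sketch of one way to guard against that, assuming a hypothetical helper `toFtsPrefixQuery` (not part of the worker above) and relying on FTS5's rule that a double quote inside a string is escaped by doubling it:

```javascript
// Hypothetical helper, not part of the original worker.
// Wraps user input in an FTS5 string and appends `*` for prefix matching.
function toFtsPrefixQuery(term) {
  // FTS5 escapes an embedded double quote SQL-style, by doubling it
  const escaped = term.replaceAll('"', '""');
  return `"${escaped}"*`;
}

console.log(toFtsPrefixQuery("Ottawa"));     // "Ottawa"*
console.log(toFtsPrefixQuery('St. "John"')); // "St. ""John"""*
```

If adopted, the result would be passed to `.bind()` in place of the inline template string; the SQL itself stays parameterized either way.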
+1
View File
@@ -0,0 +1 @@
@import "tailwindcss";
+20
View File
@@ -0,0 +1,20 @@
name = "statcan-geographies-search"
main = "src/index.js"
compatibility_date = "2025-04-02"
workers_dev = true
routes = [
{ pattern = "statcan-geography.dataforcanada.org", custom_domain = true }
]
[assets]
directory = "./dist"
binding = "ASSETS"
[[d1_databases]]
binding = "DB"
database_name = "geographies_search"
database_id = "bc12b7cd-9f12-45c7-bb89-4ab22b3b2c8b"
[[r2_buckets]]
binding = 'BUCKET'
bucket_name = 'geographies-search'