mirror of
https://github.com/dataforcanada/d4c-service-statcan-geography.git
synced 2026-06-13 14:31:01 +02:00
839 lines
32 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "05ac8556",
|
|
"metadata": {},
|
|
"source": [
|
|
"# TODO\n",
|
|
"- Fix encoding issues with place names table (see below for troublesome records)\n",
|
|
"- Add remaining geographic hierarchy (Health Regions, CT, DA, DB, ADA, HCCSS)\n",
|
|
"- Read geographic hierarchy from Parquet files and do the SQL work using DuckDB\n",
|
|
"- Add field so user can search by province (if possible). It won't be possible to add the field to the country and region tables\n",
|
|
"- Add field so user can search by census year\n",
|
|
"- Standardize search values. Look into porting CASK into JavaScript as the user input will need to be standardized as well"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 33,
|
|
"id": "68f3cacd",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import os\n",
|
|
"import sqlite3\n",
|
|
"\n",
|
|
"from dotenv import load_dotenv\n",
|
|
"import duckdb\n",
|
|
"import pandas as pd"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "88719ee9",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Create the geographies table"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "400083f5",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Create tables in SQLite\n",
|
|
"These are the instructions for exporting the database tables and importing them into Cloudflare D1. The steps are currently manual and should be automated.\n",
|
|
"\n",
|
|
"1. Export the geographies table using `sqlite3`\n",
|
|
"```\n",
|
|
"sqlite3 geography.db\n",
|
|
".output ./geographies.sql\n",
|
|
".dump geographies\n",
|
|
"```\n",
|
|
"2. Remove the `PRAGMA foreign_keys=off`, `BEGIN TRANSACTION` and `COMMIT` parts\n",
|
|
"3. Remove the `CREATE TABLE geographies` statement\n",
|
|
"4. Add the following to the top, before the insert statements\n",
|
|
"```\n",
|
|
"DROP TABLE IF EXISTS geographies;\n",
|
|
"CREATE TABLE IF NOT EXISTS geographies (\n",
|
|
" id INTEGER PRIMARY KEY,\n",
|
|
" dguid TEXT,\n",
|
|
" search_name TEXT,\n",
|
|
" geographic_level INTEGER\n",
|
|
");\n",
|
|
"\n",
|
|
"DROP TABLE IF EXISTS geographies_fts;\n",
|
|
"CREATE VIRTUAL TABLE IF NOT EXISTS geographies_fts USING fts5(\n",
|
|
" id UNINDEXED,\n",
|
|
" search_name,\n",
|
|
" content='geographies',\n",
|
|
" content_rowid='id',\n",
|
|
" tokenize = \"unicode61 tokenchars '-/.,''&():+'\"\n",
|
|
");\n",
|
|
"\n",
|
|
"```\n",
|
|
"5. Add `INSERT INTO geographies_fts(geographies_fts) VALUES ('rebuild');` at the end of the SQL file\n",
|
|
"6. Add `PRAGMA optimize;` at the end of the SQL file. This is recommended; see https://developers.cloudflare.com/d1/best-practices/use-indexes/\n",
|
|
"7. Log into Cloudflare by running `npx wrangler login`\n",
|
|
"8. Import as follows\n",
|
|
"```\n",
|
|
"npx wrangler d1 execute geographies_search --remote --file=./geographies.sql\n",
|
|
"```"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 49,
|
|
"id": "17f8ffd8",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"con = sqlite3.connect(\"geography.db\")\n",
|
|
"cur = con.cursor()\n",
|
|
"\n",
|
|
"cur.executescript(\"\"\"\n",
|
|
"DROP TABLE IF EXISTS geographies;\n",
|
|
"CREATE TABLE IF NOT EXISTS geographies (\n",
|
|
" id INTEGER PRIMARY KEY,\n",
|
|
" dguid TEXT,\n",
|
|
" search_name TEXT,\n",
|
|
" geographic_level INTEGER\n",
|
|
");\n",
|
|
"\"\"\")\n",
|
|
"\n",
|
|
"# Allow searches to use -/.,'&():+\n",
|
|
"cur.executescript(\"\"\"\n",
|
|
"DROP TABLE IF EXISTS geographies_fts;\n",
|
|
"CREATE VIRTUAL TABLE IF NOT EXISTS geographies_fts USING fts5(\n",
|
|
" id UNINDEXED,\n",
|
|
" search_name,\n",
|
|
" content='geographies',\n",
|
|
" content_rowid='id',\n",
|
|
" tokenize = \"unicode61 tokenchars '-/.,''&():+'\"\n",
|
|
");\n",
|
|
"\"\"\")\n",
|
|
"\n",
|
|
"con.commit()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "f8010194",
|
|
"metadata": {},
|
|
"source": [
|
|
"## SQL to create search table\n",
|
|
"For source tables that have both an English and a French name field, the query creates two records, one per language. A field could probably be added to the search table to indicate whether a record's name is English, French, or both.\n",
|
|
"\n",
|
|
"Statistics Canada searches the English field when the page is in English and the French field when the page is in French. Here are two examples:\n",
|
|
"- **English:** https://www150.statcan.gc.ca/n1/en/geo?geotext=Quebec%20%5BProvince%5D&geocode=A000224\n",
|
|
"- **French:** https://www150.statcan.gc.ca/n1/fr/geo?geotext=Qu%C3%A9bec%20%5BProvince%5D&geocode=A000224"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c04b4979",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"application/vnd.jupyter.widget-view+json": {
|
|
"model_id": "7a9c8c29df524416a8280e3f80b2a6cb",
|
|
"version_major": 2,
|
|
"version_minor": 0
|
|
},
|
|
"text/plain": [
|
|
"FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
|
|
]
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"duck_con = duckdb.connect()\n",
|
|
"duck_con.install_extension(\"spatial\")\n",
|
|
"duck_con.load_extension(\"spatial\")\n",
|
|
"\n",
|
|
"duck_con.sql(\"\"\"\n",
|
|
"DROP TABLE IF EXISTS geography;\n",
|
|
"CREATE TABLE geography AS\n",
|
|
"WITH country AS (\n",
|
|
"\tSELECT country_dguid AS dguid, country_en_name AS search_name, 13 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/country_2021.parquet'\n",
|
|
"), regions AS (\n",
|
|
"\tSELECT DISTINCT grc_dguid AS dguid, grc_en_name AS search_name, 12 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/grc_2021.parquet'\n",
|
|
"\tUNION\n",
|
|
"\tSELECT DISTINCT grc_dguid AS dguid, grc_fr_name AS search_name, 12 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/grc_2021.parquet'\n",
|
|
"), pr AS (\n",
|
|
"\tSELECT DISTINCT pr_dguid AS dguid, pr_en_name AS search_name, 11 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/pr_2021.parquet'\n",
|
|
"\tUNION\n",
|
|
"\tSELECT DISTINCT pr_dguid AS dguid, pr_fr_name AS search_name, 11 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/pr_2021.parquet'\n",
|
|
"), er AS (\n",
|
|
"\tSELECT DISTINCT er_dguid AS dguid, er_name AS search_name, 10 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/er_2021.parquet'\n",
|
|
"), car AS (\n",
|
|
"\tSELECT DISTINCT car_dguid AS dguid, car_en_name AS search_name, 9 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/car_2021.parquet'\n",
|
|
"\tUNION\n",
|
|
"\tSELECT DISTINCT car_dguid AS dguid, car_fr_name AS search_name, 9 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/car_2021.parquet'\n",
|
|
"), cd AS (\n",
|
|
"\tSELECT cd_dguid AS dguid, cd_name AS search_name, 8 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/cd_2021.parquet'\n",
|
|
"), ccs AS (\n",
|
|
"\tSELECT ccs_dguid AS dguid, ccs_name AS search_name, 7 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/ccs_2021.parquet'\n",
|
|
"), cma AS (\n",
|
|
"\tSELECT \n",
|
|
"\tCASE \n",
|
|
"\t\tWHEN cma_p_dguid IS NOT NULL THEN cma_p_dguid\n",
|
|
"\t\tELSE cma_dguid \n",
|
|
"\tEND AS dguid, cma_name AS search_name, 6 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/cma_2021.parquet'\n",
|
|
"), csd AS (\n",
|
|
"\tSELECT csd_dguid AS dguid, csd_name AS search_name, 5 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/csd_2021.parquet'\n",
|
|
"), fed AS (\n",
|
|
"\tSELECT DISTINCT fed_dguid AS dguid, fed_en_name AS search_name, 4 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/fed_2021_2013.parquet'\n",
|
|
"\tUNION\n",
|
|
"\tSELECT DISTINCT fed_dguid AS dguid, fed_fr_name AS search_name, 4 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/fed_2021_2013.parquet'\n",
|
|
"), dpl AS (\n",
|
|
"\tSELECT dpl_dguid AS dguid, dpl_name AS search_name, 3 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/dpl_2021.parquet'\n",
|
|
"), pc AS (\n",
|
|
"\tSELECT \n",
|
|
"\tCASE \n",
|
|
"\t\tWHEN pop_ctr_p_dguid IS NOT NULL THEN pop_ctr_p_dguid\n",
|
|
"\t\tELSE pop_ctr_dguid\n",
|
|
"\tEND AS dguid, pop_ctr_name AS search_name, 2 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/boundaries/2021/digital_boundary_files/pop_ctr_2021.parquet'\n",
|
|
"), pn AS (\n",
|
|
"\tSELECT pn_dguid AS dguid, pn_name AS search_name, 1 AS geographic_level, ST_AsGeoJSON(geom) AS geom FROM 'https://data.dataforcanada.org/processed/statistics_canada/placenames/2021/pn_2021.parquet'\n",
|
|
"), concatenation AS (\n",
|
|
"\tSELECT * FROM country\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM regions\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM pr\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM er\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM car\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM cd\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM ccs\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM cma\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM csd\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM fed\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM dpl\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM pc\n",
|
|
"\tUNION\n",
|
|
"\tSELECT * FROM pn\n",
|
|
")\n",
|
|
"SELECT * FROM concatenation\n",
|
|
"ORDER BY search_name, geographic_level DESC;\n",
|
|
"\"\"\")\n",
|
|
"duck_con.commit()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 37,
|
|
"id": "4c873914",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"geography = duck_con.sql(\"SELECT * FROM geography;\").df()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d1180ec8",
|
|
"metadata": {},
|
|
"source": [
|
|
"# TODO\n",
|
|
"## Fix encoding issues with place names"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 39,
|
|
"id": "0d807307",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>dguid</th>\n",
|
|
" <th>search_name</th>\n",
|
|
" <th>geographic_level</th>\n",
|
|
" <th>geom</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>5653</th>\n",
|
|
" <td>2021S0515005422</td>\n",
|
|
" <td>CascapédiaSaint-Jules</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-65.9166667,48....</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8243</th>\n",
|
|
" <td>2021S0515007864</td>\n",
|
|
" <td>Côte-des-NeigesNotre-Dame-de-Grâce</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-73.6263889,45....</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>18297</th>\n",
|
|
" <td>2021S0515017557</td>\n",
|
|
" <td>L'Île-BizardSainte-Geneviève</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-73.866667,45.4...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>20327</th>\n",
|
|
" <td>2021S0515019487</td>\n",
|
|
" <td>Le Coteau-des-Surs</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-70.456886,47.0...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>20569</th>\n",
|
|
" <td>2021S0515019731</td>\n",
|
|
" <td>Le Sacré-Cur</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-69.979863,46.9...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>23733</th>\n",
|
|
" <td>2021S0515022795</td>\n",
|
|
" <td>MercierHochelaga-Maisonneuve</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-73.5388889,45....</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>25319</th>\n",
|
|
" <td>2021S0515024311</td>\n",
|
|
" <td>MétabetchouanLac-à-la-Croix</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-71.8666667,48....</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>29619</th>\n",
|
|
" <td>2021S0515028429</td>\n",
|
|
" <td>Port-DanielGascons</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-64.9666667,48....</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>31289</th>\n",
|
|
" <td>2021S0515030028</td>\n",
|
|
" <td>Rivière-des-PrairiesPointe-aux-Trembles</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-73.516667,45.65]}</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>31432</th>\n",
|
|
" <td>2021S0515030168</td>\n",
|
|
" <td>Rock ForestSaint-ÉlieDeauville</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-72.0416667,45....</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>31702</th>\n",
|
|
" <td>2021S0515030432</td>\n",
|
|
" <td>RosemontLa Petite-Patrie</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-73.5902778,45....</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>32617</th>\n",
|
|
" <td>2021S0515031197</td>\n",
|
|
" <td>Saint-CômeLinière</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-70.5166667,46....</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>32737</th>\n",
|
|
" <td>2021S0515031295</td>\n",
|
|
" <td>Saint-FaustinLac-Carré</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-74.4833333,46....</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>33177</th>\n",
|
|
" <td>2021S0515031660</td>\n",
|
|
" <td>Saint-LinLaurentides</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-73.755663,45.8...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>34031</th>\n",
|
|
" <td>2021S0515032370</td>\n",
|
|
" <td>Sainte-FoySilleryCap-Rouge</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-71.308333,46.7...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>40241</th>\n",
|
|
" <td>2021S0515038300</td>\n",
|
|
" <td>Vieux-QuébecBasse-Ville</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-71.2069444,46....</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>40329</th>\n",
|
|
" <td>2021S0515038389</td>\n",
|
|
" <td>VilleraySaint-MichelParc-Extension</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-73.6222222,45....</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>42483</th>\n",
|
|
" <td>2021S0515040448</td>\n",
|
|
" <td>YunesÌit'in</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-123.1363889,51...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>42561</th>\n",
|
|
" <td>2021S0515040522</td>\n",
|
|
" <td>ÊEsdilagh</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>{\"type\":\"Point\",\"coordinates\":[-122.4972222,52...</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" dguid search_name \\\n",
|
|
"5653 2021S0515005422 CascapédiaSaint-Jules \n",
|
|
"8243 2021S0515007864 Côte-des-NeigesNotre-Dame-de-Grâce \n",
|
|
"18297 2021S0515017557 L'Île-BizardSainte-Geneviève \n",
|
|
"20327 2021S0515019487 Le Coteau-des-Surs \n",
|
|
"20569 2021S0515019731 Le Sacré-Cur \n",
|
|
"23733 2021S0515022795 MercierHochelaga-Maisonneuve \n",
|
|
"25319 2021S0515024311 MétabetchouanLac-à-la-Croix \n",
|
|
"29619 2021S0515028429 Port-DanielGascons \n",
|
|
"31289 2021S0515030028 Rivière-des-PrairiesPointe-aux-Trembles \n",
|
|
"31432 2021S0515030168 Rock ForestSaint-ÉlieDeauville \n",
|
|
"31702 2021S0515030432 RosemontLa Petite-Patrie \n",
|
|
"32617 2021S0515031197 Saint-CômeLinière \n",
|
|
"32737 2021S0515031295 Saint-FaustinLac-Carré \n",
|
|
"33177 2021S0515031660 Saint-LinLaurentides \n",
|
|
"34031 2021S0515032370 Sainte-FoySilleryCap-Rouge \n",
|
|
"40241 2021S0515038300 Vieux-QuébecBasse-Ville \n",
|
|
"40329 2021S0515038389 VilleraySaint-MichelParc-Extension \n",
|
|
"42483 2021S0515040448 YunesÌit'in \n",
|
|
"42561 2021S0515040522 ÊEsdilagh \n",
|
|
"\n",
|
|
" geographic_level geom \n",
|
|
"5653 1 {\"type\":\"Point\",\"coordinates\":[-65.9166667,48.... \n",
|
|
"8243 1 {\"type\":\"Point\",\"coordinates\":[-73.6263889,45.... \n",
|
|
"18297 1 {\"type\":\"Point\",\"coordinates\":[-73.866667,45.4... \n",
|
|
"20327 1 {\"type\":\"Point\",\"coordinates\":[-70.456886,47.0... \n",
|
|
"20569 1 {\"type\":\"Point\",\"coordinates\":[-69.979863,46.9... \n",
|
|
"23733 1 {\"type\":\"Point\",\"coordinates\":[-73.5388889,45.... \n",
|
|
"25319 1 {\"type\":\"Point\",\"coordinates\":[-71.8666667,48.... \n",
|
|
"29619 1 {\"type\":\"Point\",\"coordinates\":[-64.9666667,48.... \n",
|
|
"31289 1 {\"type\":\"Point\",\"coordinates\":[-73.516667,45.65]} \n",
|
|
"31432 1 {\"type\":\"Point\",\"coordinates\":[-72.0416667,45.... \n",
|
|
"31702 1 {\"type\":\"Point\",\"coordinates\":[-73.5902778,45.... \n",
|
|
"32617 1 {\"type\":\"Point\",\"coordinates\":[-70.5166667,46.... \n",
|
|
"32737 1 {\"type\":\"Point\",\"coordinates\":[-74.4833333,46.... \n",
|
|
"33177 1 {\"type\":\"Point\",\"coordinates\":[-73.755663,45.8... \n",
|
|
"34031 1 {\"type\":\"Point\",\"coordinates\":[-71.308333,46.7... \n",
|
|
"40241 1 {\"type\":\"Point\",\"coordinates\":[-71.2069444,46.... \n",
|
|
"40329 1 {\"type\":\"Point\",\"coordinates\":[-73.6222222,45.... \n",
|
|
"42483 1 {\"type\":\"Point\",\"coordinates\":[-123.1363889,51... \n",
|
|
"42561 1 {\"type\":\"Point\",\"coordinates\":[-122.4972222,52... "
|
|
]
|
|
},
|
|
"execution_count": 39,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"dguids_to_fix = ['2021S0515005422',\n",
|
|
" '2021S0515007864',\n",
|
|
" '2021S0515017557',\n",
|
|
" '2021S0515019487',\n",
|
|
" '2021S0515019731',\n",
|
|
" '2021S0515022795',\n",
|
|
" '2021S0515024311',\n",
|
|
" '2021S0515028429',\n",
|
|
" '2021S0515030028',\n",
|
|
" '2021S0515030168',\n",
|
|
" '2021S0515030432',\n",
|
|
" '2021S0515031197',\n",
|
|
" '2021S0515031295',\n",
|
|
" '2021S0515031660',\n",
|
|
" '2021S0515032370',\n",
|
|
" '2021S0515038300',\n",
|
|
" '2021S0515038389',\n",
|
|
" '2021S0515040448',\n",
|
|
" '2021S0515040522']\n",
|
|
"place_names_to_fix = geography[geography['dguid'].isin(dguids_to_fix)]\n",
|
|
"place_names_to_fix.head(19)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5bec091c",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Generate GeoJSON file for every dguid\n",
|
|
"Copy the files into Cloudflare R2 by running:\n",
|
|
"```\n",
|
|
"cd geojson\n",
|
|
"rclone copy . --transfers 50 --progress cloudflare:/geographies-search\n",
|
|
"```"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"id": "12ddba86",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"if not os.path.exists(\"geojson\"):\n",
|
|
" print(\"Creating DGUID geojson folder\")\n",
|
|
" os.mkdir(\"geojson\")\n",
|
|
"\n",
|
|
"for record in geography.to_records():\n",
|
|
" dguid = record[1]\n",
|
|
" geom = record[-1]\n",
|
|
" path = f\"geojson/{dguid}.geojson\"\n",
|
|
" if os.path.exists(path):\n",
|
|
" continue\n",
|
|
" with open(path, 'w') as geography_fp:\n",
|
|
" geography_fp.write(geom)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "39c1ff9f",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Insert data into SQLite database"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 50,
|
|
"id": "6e39bbc4",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Subset of fields to import into SQLite database, add id field as well\n",
|
|
"geography_subset = geography[['dguid', 'search_name', 'geographic_level']].copy()\n",
|
|
"geography_subset.insert(0, 'id', geography_subset.index)\n",
|
|
"\n",
|
|
"cur.executemany(\"INSERT INTO geographies VALUES(?, ?, ?, ?)\", geography_subset.values.tolist())\n",
|
|
"cur.execute(\"INSERT INTO geographies_fts(geographies_fts) VALUES ('rebuild')\")\n",
|
|
"con.commit()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "0675ca6d",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Test out a search query"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 51,
|
|
"id": "c49c2f06",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>dguid</th>\n",
|
|
" <th>search_name</th>\n",
|
|
" <th>geographic_level</th>\n",
|
|
" <th>rank</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>2021S05003510</td>\n",
|
|
" <td>Ottawa</td>\n",
|
|
" <td>10</td>\n",
|
|
" <td>-9.011603</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>2021A00033506</td>\n",
|
|
" <td>Ottawa</td>\n",
|
|
" <td>8</td>\n",
|
|
" <td>-9.011603</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>2021S05023506008</td>\n",
|
|
" <td>Ottawa</td>\n",
|
|
" <td>7</td>\n",
|
|
" <td>-9.011603</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>2021A00053506008</td>\n",
|
|
" <td>Ottawa</td>\n",
|
|
" <td>5</td>\n",
|
|
" <td>-9.011603</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>2013A000435078</td>\n",
|
|
" <td>Ottawa--Vanier</td>\n",
|
|
" <td>4</td>\n",
|
|
" <td>-9.011603</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>2013A000435075</td>\n",
|
|
" <td>Ottawa-Centre</td>\n",
|
|
" <td>4</td>\n",
|
|
" <td>-9.011603</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>2013A000435079</td>\n",
|
|
" <td>Ottawa-Ouest--Nepean</td>\n",
|
|
" <td>4</td>\n",
|
|
" <td>-9.011603</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>2013A000435077</td>\n",
|
|
" <td>Ottawa-Sud</td>\n",
|
|
" <td>4</td>\n",
|
|
" <td>-9.011603</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>2021S0515026282</td>\n",
|
|
" <td>Ottawa</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>-9.011603</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9</th>\n",
|
|
" <td>2021S0515026283</td>\n",
|
|
" <td>Ottawa</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>-9.011603</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>10</th>\n",
|
|
" <td>2013A000435075</td>\n",
|
|
" <td>Ottawa Centre</td>\n",
|
|
" <td>4</td>\n",
|
|
" <td>-6.940322</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>11</th>\n",
|
|
" <td>2013A000435077</td>\n",
|
|
" <td>Ottawa South</td>\n",
|
|
" <td>4</td>\n",
|
|
" <td>-6.940322</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>12</th>\n",
|
|
" <td>2013A000435079</td>\n",
|
|
" <td>Ottawa West--Nepean</td>\n",
|
|
" <td>4</td>\n",
|
|
" <td>-6.940322</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>13</th>\n",
|
|
" <td>2021S0515026271</td>\n",
|
|
" <td>Ottawa Brook</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>-6.940322</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>14</th>\n",
|
|
" <td>2021S0515026273</td>\n",
|
|
" <td>Ottawa East</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>-6.940322</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>15</th>\n",
|
|
" <td>2021S0515026275</td>\n",
|
|
" <td>Ottawa South</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>-6.940322</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>16</th>\n",
|
|
" <td>2021S0515026277</td>\n",
|
|
" <td>Ottawa West</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>-6.940322</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>17</th>\n",
|
|
" <td>2021S0511240616</td>\n",
|
|
" <td>Ottawa - Gatineau</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>-5.643245</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>18</th>\n",
|
|
" <td>2021S0511350616</td>\n",
|
|
" <td>Ottawa - Gatineau</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>-5.643245</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>19</th>\n",
|
|
" <td>2021S050535505</td>\n",
|
|
" <td>Ottawa - Gatineau (Ontario part / partie de l'...</td>\n",
|
|
" <td>6</td>\n",
|
|
" <td>-2.660226</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>20</th>\n",
|
|
" <td>2021S050524505</td>\n",
|
|
" <td>Ottawa - Gatineau (partie du Québec / Quebec p...</td>\n",
|
|
" <td>6</td>\n",
|
|
" <td>-2.660226</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" dguid search_name \\\n",
|
|
"0 2021S05003510 Ottawa \n",
|
|
"1 2021A00033506 Ottawa \n",
|
|
"2 2021S05023506008 Ottawa \n",
|
|
"3 2021A00053506008 Ottawa \n",
|
|
"4 2013A000435078 Ottawa--Vanier \n",
|
|
"5 2013A000435075 Ottawa-Centre \n",
|
|
"6 2013A000435079 Ottawa-Ouest--Nepean \n",
|
|
"7 2013A000435077 Ottawa-Sud \n",
|
|
"8 2021S0515026282 Ottawa \n",
|
|
"9 2021S0515026283 Ottawa \n",
|
|
"10 2013A000435075 Ottawa Centre \n",
|
|
"11 2013A000435077 Ottawa South \n",
|
|
"12 2013A000435079 Ottawa West--Nepean \n",
|
|
"13 2021S0515026271 Ottawa Brook \n",
|
|
"14 2021S0515026273 Ottawa East \n",
|
|
"15 2021S0515026275 Ottawa South \n",
|
|
"16 2021S0515026277 Ottawa West \n",
|
|
"17 2021S0511240616 Ottawa - Gatineau \n",
|
|
"18 2021S0511350616 Ottawa - Gatineau \n",
|
|
"19 2021S050535505 Ottawa - Gatineau (Ontario part / partie de l'... \n",
|
|
"20 2021S050524505 Ottawa - Gatineau (partie du Québec / Quebec p... \n",
|
|
"\n",
|
|
" geographic_level rank \n",
|
|
"0 10 -9.011603 \n",
|
|
"1 8 -9.011603 \n",
|
|
"2 7 -9.011603 \n",
|
|
"3 5 -9.011603 \n",
|
|
"4 4 -9.011603 \n",
|
|
"5 4 -9.011603 \n",
|
|
"6 4 -9.011603 \n",
|
|
"7 4 -9.011603 \n",
|
|
"8 1 -9.011603 \n",
|
|
"9 1 -9.011603 \n",
|
|
"10 4 -6.940322 \n",
|
|
"11 4 -6.940322 \n",
|
|
"12 4 -6.940322 \n",
|
|
"13 1 -6.940322 \n",
|
|
"14 1 -6.940322 \n",
|
|
"15 1 -6.940322 \n",
|
|
"16 1 -6.940322 \n",
|
|
"17 2 -5.643245 \n",
|
|
"18 2 -5.643245 \n",
|
|
"19 6 -2.660226 \n",
|
|
"20 6 -2.660226 "
|
|
]
|
|
},
|
|
"execution_count": 51,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"df = pd.read_sql_query(\"\"\"\n",
|
|
"SELECT geographies.dguid, fts.search_name, geographies.geographic_level, rank\n",
|
|
"FROM geographies_fts AS fts,\n",
|
|
" geographies\n",
|
|
"WHERE fts.search_name MATCH '\"Ottawa\"*'\n",
|
|
"AND fts.id = geographies.id\n",
|
|
"ORDER BY fts.rank, geographies.geographic_level DESC\n",
|
|
"\"\"\", con)\n",
|
|
"df"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.12.9"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|