Commit Graph

15 Commits

Author SHA1 Message Date
Diego Ripley a55e1d325d Made changes 2025-06-26 13:35:37 +00:00
Diego Ripley 4ed5fb4bbb Add DuckDB example for duplicate column name 2025-06-25 15:38:12 +00:00
Diego Ripley b71a7b326e DuckDB issue with duplicate column names (ex. 'Value' and 'VALUE' are treated the same) 2025-06-25 15:30:36 +00:00
Diego Ripley e929850d4a Finish comment on issue with Value and VALUE columns being treated the same by DuckDB 2025-06-21 18:03:29 +00:00
Diego Ripley 8875722d10 Made changes to processing of data tables 2025-06-21 18:01:16 +00:00
Diego Ripley 7c8211cb5f Found some issues with the output parquet files 2025-06-21 05:26:50 +00:00
Diego Ripley 887291d2f7 Read all DGUIDs from subset parquet output (100,000 records each) 2025-06-21 00:54:26 -04:00
Diego Ripley 72ca6c87e1 Made changes 2025-06-20 17:32:01 -04:00
Diego Ripley 5a95616b3c Calculate CSV file size by viewing inside of zip file 2025-06-20 16:01:20 -04:00
Diego Ripley e836363cd1 Had to optimize the code. Leaving it outside of function for now in case I need to continue working on it 2025-06-20 16:00:51 -04:00
Diego Ripley f6d88c5fd0 Continue work on processing data tables 2025-06-19 15:58:30 -04:00
Diego Ripley ab8f40c708 Keeping track of processed files in case processing crashes and I have to restart again 2025-06-19 11:46:31 -04:00
Diego Ripley faa63451ab Experiment with Jupyter notebook on downloading and processing statcan cubes 2025-06-18 21:26:51 +00:00
Diego Ripley c0899080f4 Remove scratch files after processing. Was running out of space 2025-06-18 09:26:18 -04:00
Diego Ripley ea603f2914 Convert statcan CSV into parquet 2025-06-17 20:46:24 +00:00