Pull request overview
This PR upgrades the project’s debug mode to support running a single module for a single year by re-using a pre-existing complete run_id, while avoiding writes to the production database and saving outputs locally instead.
Changes:
- Add a `debug` flag to runtime configuration/parsing and propagate it through module entrypoints from `main.py`.
- When `debug` is enabled, skip database inserts/updates in modules and write inputs/outputs to a local `debug_output/` folder.
- Simplify debug configuration to `{run_id, year, module}` and validate that the referenced run is complete.
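The first change above (one `debug` flag threaded from `main.py` into every orchestrator, with only the selected module running in debug mode) can be sketched as follows. This is a minimal illustration under stated assumptions: the `ORCHESTRATORS` registry, `run_modules`, and the stub orchestrators are hypothetical and only report how they were called; the real `main.py` calls orchestrators such as `run_hs_hh` and `run_pop` directly with `debug=utils.DEBUG`.

```python
from typing import Callable, Dict, List, Optional, Sequence


# Hypothetical stand-ins for real orchestrators such as run_hs_hh and run_pop.
# Each takes the estimates year and the debug flag; here they just report
# what they were called with so the control flow is easy to see.
def run_hs_hh(year: int, debug: bool) -> str:
    return f"hs_hh {year} debug={debug}"


def run_pop(year: int, debug: bool) -> str:
    return f"pop_type {year} debug={debug}"


# Hypothetical registry of module name -> orchestrator, in execution order.
ORCHESTRATORS: Dict[str, Callable[[int, bool], str]] = {
    "hs_hh": run_hs_hh,
    "pop_type": run_pop,
}


def run_modules(
    years: Sequence[int], debug: bool, debug_module: Optional[str] = None
) -> List[str]:
    """Run every module for every year, or only `debug_module` in debug mode."""
    results = []
    for year in years:
        for name, orchestrator in ORCHESTRATORS.items():
            # In debug mode only the single configured module is executed.
            if debug and name != debug_module:
                continue
            # The same flag is threaded into every orchestrator call.
            results.append(orchestrator(year, debug))
    return results
```

For example, `run_modules([2020], debug=True, debug_module="pop_type")` runs only the population-by-type stub, while `debug=False` runs every registered module for every year.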
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `python/utils.py` | Exposes `DEBUG` from parser, adds `debug_output/` folder creation, expands runtime logging. |
| `python/parsers.py` | Reworks debug config to single `{run_id, year, module}`, validates run completeness, adds module list constant. |
| `main.py` | Threads `debug=utils.DEBUG` into all module orchestrators. |
| `python/startup.py` | Adds `debug` parameter and skips DB insertion in debug mode. |
| `python/staging.py` | Adds `debug` parameter and skips metadata update in debug mode. |
| `python/hs_hh.py` | Adds `debug` parameter; writes CSVs instead of DB inserts in debug mode. |
| `python/pop_type.py` | Adds `debug` parameter; writes CSVs instead of DB inserts in debug mode. |
| `python/ase.py` | Adds `debug` parameter; writes CSVs instead of DB inserts in debug mode (including BULK INSERT bypass). |
| `python/hh_characteristics.py` | Adds `debug` parameter; writes CSVs instead of DB inserts in debug mode. |
| `python/employment.py` | Adds `debug` parameter; writes CSVs instead of DB inserts in debug mode. |
| `config.toml` | Updates debug configuration shape to `{run_id, year, module}` with new comments. |
| `README.md` | Minor TOML example quoting change for `sql.staging`. |
| `.gitignore` | Adds `debug_output/` and replaces with a more complete Python `.gitignore` template. |
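The per-module pattern summarized in the table (write CSVs to `debug_output/` instead of inserting into the database) might look roughly like the helper below. This is a sketch, not the PR's code: `save_or_insert` and its `insert_fn` callback are hypothetical, rows are assumed to be plain dicts written with the standard-library `csv` module (the real modules may use pandas instead), and resolving `debug_output/` relative to the working directory is an assumption.

```python
import csv
import pathlib
from typing import Callable, Dict, List

# Folder name taken from the PR; resolving it relative to the working directory is assumed.
DEBUG_OUTPUT = pathlib.Path("debug_output")

Row = Dict[str, object]


def save_or_insert(
    rows: List[Row],
    table_name: str,
    debug: bool,
    insert_fn: Callable[[List[Row], str], None],
    output_dir: pathlib.Path = DEBUG_OUTPUT,
) -> None:
    """Insert rows into the database, or export them to CSV in debug mode."""
    if not debug:
        # Normal run: hand the rows to the (hypothetical) database writer.
        insert_fn(rows, table_name)
        return

    # Debug run: never touch the production database; write a local CSV instead.
    output_dir.mkdir(parents=True, exist_ok=True)
    fieldnames = list(rows[0]) if rows else []
    with (output_dir / f"{table_name}.csv").open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Routing every write through one branch point like this keeps the "no production writes in debug mode" guarantee in a single place rather than repeating the check in each module.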
Pull request overview
This PR upgrades the project’s debug mode so a single module can be run against a previously completed run_id, while avoiding writes to the production database by exporting outputs locally.
Changes:
- Reworks debug configuration parsing/validation to target exactly one module + one year for an existing complete `run_id`.
- Threads a `debug` flag through module entrypoints to skip DB inserts/updates and write outputs to `debug_output/` instead.
- Updates documentation/config examples and expands `.gitignore` to ignore `debug_output/` (and adopts a standard Python ignore template).
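The reworked validation (exactly one module and one year, against an existing complete `run_id`) could be sketched as below. Only the `{run_id, year, module}` shape comes from the PR; the `DEBUG_MODULES` names, the `parse_debug_config` function, and the `is_run_complete` callback (standing in for a database lookup) are assumptions for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical list of modules that can be debugged; the PR adds a similar
# module-list constant in python/parsers.py, but these exact names are assumed.
DEBUG_MODULES: Tuple[str, ...] = (
    "hs_hh",
    "pop_type",
    "ase",
    "hh_characteristics",
    "employment",
)


def parse_debug_config(
    cfg: Dict[str, object], is_run_complete: Callable[[int], bool]
) -> Tuple[int, int, str]:
    """Validate a debug config of the shape {run_id, year, module}.

    `is_run_complete` is a hypothetical callback that would look the run up
    in the database and report whether every module finished for it.
    """
    missing = {"run_id", "year", "module"} - set(cfg)
    if missing:
        raise ValueError(f"debug config is missing keys: {sorted(missing)}")

    run_id, year, module = cfg["run_id"], cfg["year"], cfg["module"]
    if not isinstance(run_id, int) or not isinstance(year, int):
        raise ValueError("debug run_id and year must be integers")
    if module not in DEBUG_MODULES:
        raise ValueError(f"unknown debug module {module!r}; expected one of {DEBUG_MODULES}")
    # Debug mode re-uses an existing run's data, so that run must be complete.
    if not is_run_complete(run_id):
        raise ValueError(f"run_id {run_id} is not a complete run")
    return run_id, year, module
```

Failing fast on an incomplete run matters here because the single debugged module reads its upstream inputs from that run rather than recomputing them.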
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `python/utils.py` | Exposes `DEBUG`, expands startup logging, creates `debug_output/` folder in debug mode |
| `python/parsers.py` | Implements new debug config schema/validation and module selection logic |
| `python/startup.py` | Adds `debug` flag and skips startup DB insertion in debug mode |
| `python/staging.py` | Adds `debug` flag and skips metadata UPDATE in debug mode |
| `python/hs_hh.py` | Adds `debug` flag; exports inputs/outputs to CSV instead of DB in debug mode |
| `python/pop_type.py` | Adds `debug` flag; exports inputs/outputs to CSV instead of DB in debug mode |
| `python/ase.py` | Adds `debug` flag; exports controls/results to CSV instead of DB/BULK INSERT in debug mode |
| `python/hh_characteristics.py` | Adds `debug` flag; exports inputs/outputs to CSV instead of DB in debug mode |
| `python/employment.py` | Adds `debug` flag; exports inputs/outputs to CSV instead of DB in debug mode |
| `main.py` | Passes `utils.DEBUG` into module entrypoints |
| `config.toml` | Updates debug configuration format (`run_id`, `year`, `module`) |
| `README.md` | Updates user-facing configuration documentation for new debug mode |
| `.gitignore` | Ignores `debug_output/` and replaces with a fuller Python `.gitignore` template |
Comments suppressed due to low confidence (5)
python/hs_hh.py:41
- The function signature now includes `debug`, but the docstring `Args:` section only documents `year`. Please document what `debug` changes (e.g., skip DB writes and export to `debug_output/`).
```python
def run_hs_hh(year: int, debug: bool) -> None:
    """Orchestrator function to calculate and insert housing stock and households.

    Inserts housing stock by MGRA from SANDAG's LUDU database for a given year
    into the production database. Then calculates households by MGRA using
    the housing stock by MGRA, applying both Census tract and jurisdiction-level
    occupancy controls, and then runs an integerization and reallocation
    procedure to produce total households by MGRA. Results are inserted into
    the production database.

    Functionality is segmented into functions for code encapsulation:
    _get_hs_hh_inputs - Get housing stock and occupancy controls
    _validate_hs_hh_inputs - Validate the households input data from the above
        function
    _create_hs_hh - Calculate households by MGRA applying occupancy
        controls, integerization, and reallocation
    _validate_hs_hh_outputs - Validate the households output data from the above
        function
    _insert_hs_hh - Insert occupancy controls and households by MGRA to
        production database

    A single utility function is also defined:
    _calculate_hh_adjustment - Calculate adjustments to make to households

    Args:
        year (int): estimates year
    """
```
python/employment.py:32
- The function signature now includes `debug`, but the docstring `Args:` section only documents `year`. Please document the `debug` parameter so its effect on DB writes / local exports is clear.
```python
def run_employment(year: int, debug: bool):
    """Control function to create jobs data by naics_code (NAICS) at the MGRA level.

    Get the LEHD LODES data, aggregate to the MGRA level using the block to MGRA
    crosswalk, then apply control totals from QCEW using integerization.

    Functionality is split apart for code encapsulation (function inputs not included):
    _get_jobs_inputs - Get all input data related to jobs, including LODES data,
        block to MGRA crosswalk, and control totals from QCEW. Then process the
        LODES data to the MGRA level by naics_code.
    _validate_jobs_inputs - Validate the input tables from the above function
    _create_jobs_output - Apply control totals to employment data using
        utils.integerize_1d() and create output table
    _validate_jobs_outputs - Validate the output table from the above function
    _insert_jobs - Store input and output data related to jobs to the database.

    Args:
        year: estimates year
    """
```
python/pop_type.py:46
- The function signature now includes `debug`, but the docstring `Args:` section only documents `year`. Please document what `debug` controls (e.g., skip DB writes and export to `debug_output/`).
```python
def run_pop(year: int, debug: bool):
    """Control function to create population by type (GQ and HHP) data

    Get MGRA group quarters input data, create the output data, then load both into the
    production database. Also get MGRA household population input data, create the
    output data, then load both into the production database. See the wiki linked at the
    top of this file for additional details.

    Functionality is split apart for code encapsulation (function inputs not included):
    _get_gq_inputs - Get city level group quarter controls (DOF E-5) and GQ point
        data pre-aggregated into MGRAs
    _validate_gq_inputs - Validate the data from the above function
    _create_gq_outputs - Control MGRA level GQ data to the city level group
        quarter controls
    _validate_gq_outputs - Validate the data from the above function
    _insert_gq - Store both the city level control data and controlled
        MGRA level GQ data into the production database
    _get_hhp_inputs - Get city level household population controls (DOF E-5),
        MGRA level households, and tract level household size
    _validate_hhp_inputs - Validate the data from the above function
    _create_hhp_outputs - Compute MGRA household population, then control to
        city level household population
    _validate_hhp_outputs - Validate the data from the above function
    _insert_hhp - Store certain household population input/output data to
        the production database

    A single utility function is also defined:
    _calculate_hhp_adjustment - Calculate adjustments to make to household
        population

    Args:
        year (int): estimates year
    """
```
python/hh_characteristics.py:46
- The function signature now includes `debug`, but the docstring `Args:` section only documents `year`. Please add documentation for the `debug` flag so callers know how outputs are handled in debug mode.
```python
def run_hh_characteristics(year: int, debug: bool) -> None:
    """Orchestrator function to calculate and insert household characteristics.

    The exact household characteristics created are:
        1. Households split by household income category
        2. Households split by number of people in each household

    Both characteristics are generated by applying ACS data to MGRA level households,
    which are created by the HS/HH module.

    Functionality is segmented into functions for code encapsulation. The following are
    used for households split by income category:
    _get_hh_income_inputs - Get MGRA households and ACS tract distributions for
        income
    _validate_hh_income_inputs - Validate the hh income input data
    _create_hh_income - Calculate the hh income, control to MGRA households
    _validate_hh_income_outputs - Validate the hh income output data
    _insert_hh_income - Insert hh income and tract level income distributions to
        database

    The following functions are used for households split by size
    _get_hh_size_inputs - Get MGRA households, MGRA household population, and ACS
        tract distributions for size
    _validate_hh_size_inputs - Validate the hh size input data
    _create_hh_size - Calculate the hh size, control to MGRA households and MGRA
        household population
    _validate_hh_size_outputs - Validate the hh size output data
    _insert_hh_size - Insert hh size and tract level size distributions to database

    Args:
        year (int): estimates year
    """
```
python/ase.py:64
- The function signature now includes `debug`, but the docstring `Args:` section only documents `year`. Please document `debug` (e.g., controls whether outputs are written to DB vs exported locally).
```python
def run_ase(year: int, debug: bool) -> None:
    """Orchestrator function for age/sex/ethnicity population by type.

    Creates regional age/sex/ethnicity controls by population type. Then
    calculates MGRA level age/sex/ethnicity population by type using these
    regional controls, synthesized census tract level seed data, and MGRA
    level population by type generated by the Population by Type module.
    Results are inserted into the production database along with the regional
    controls.

    Functionality is segmented into functions for code encapsulation:
    _get_controls_inputs - Get regional age/sex/ethnicity controls from
        CA DOF for total population, regional age/sex/ethnicity group
        quarters by type distributions from the 5-year ACS PUMS, and
        regional population by type generated by the Population by Type
        module
    _validate_controls_inputs - Validate the controls input data from the
        above function
    _create_controls - Calculate regional age/sex/ethnicity controls by
        population type
    _validate_controls_outputs - Validate the controls output data from
        the above function
    _insert_controls - Insert regional age/sex/ethnicity controls by
        population type to production database
    _get_seed_inputs - Get 5-year ACS Detailed Tables B010001, B03002, and
        B01001(B-I)
    _create_seed - Calculate census tract level age/sex/ethnicity seed
        data for total population
    _get_ase_inputs - Get MGRA population by type generated by the
        Population by Type module, special MGRAs with age/sex restrictions
        by population type, regional age/sex/ethnicity controls by
        population type, and census tract level age/sex/ethnicity seed
        data for total population
    _validate_ase_inputs - Validate the age/sex/ethnicity input data from
        the above function
    _create_ase - Calculate MGRA level age/sex/ethnicity population by
        population type
    _validate_ase_outputs - Validate the age/sex/ethnicity output data from
        the above function
    _insert_ase - Insert MGRA level age/sex/ethnicity population by
        population type to production database

    Args:
        year (int): estimates year
    """
```
Describe this pull request. What changes are being made?
Upgrades to `debug` mode to make it not completely useless
What issues does this pull request address?
Additional context
N/A