On a clean-main PE-US rebuild smoke run, after working around missing PE-US-data prerequisite files (#147, #148), the pipeline got through CPS/PUF loading and failed while loading the ACS donor provider.
Clean-main worktree used for this run:
/Users/administrator/Documents/PolicyEngine/worktrees/microplex-us/fix-pe-rebuild-smoke-issues
Command shape:
python -m microplex_us.pipelines.pe_us_data_rebuild_checkpoint \
--output-root artifacts/local_us_microplex_smoke \
--version-id local-smoke-v1 \
--baseline-dataset /Users/administrator/Documents/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/enhanced_cps_2024.h5 \
--targets-db /Users/administrator/Documents/PolicyEngine/calibration-diagnostics/.artifacts/policy_data.db \
--policyengine-us-data-repo /Users/administrator/Documents/PolicyEngine/policyengine-us-data \
--calibration-backend microcalibrate \
--donor-imputer-backend zi_qrf \
--policyengine-materialize-batch-size 100000 \
--cps-sample-n 1000 --puf-sample-n 1000 --donor-sample-n 1000 \
--n-synthetic 1000 \
--defer-policyengine-harness \
--defer-policyengine-native-score \
--defer-native-audit \
--defer-imputation-ablation
Progress before failure:
Loading processed CPS ASEC 2023 from ~/.cache/microplex/cps_asec_2023_processed_v20260601_ecps_spm_takeup_inputs.parquet
Loading PUF from ~/.cache/microplex/puf_2015.csv...
Raw records: 207,692
Loading demographics from ~/.cache/microplex/demographics_2015.csv...
After demographics merge: 207,692
Expanded 1,000 tax units to 1,921 persons
Loading processed CPS ASEC 2023 from ~/.cache/microplex/cps_asec_2023_processed_v20260601_ecps_spm_takeup_inputs.parquet
Failure from the ACS donor loader subprocess:
Traceback (most recent call last):
File "<string>", line 6, in <module>
import numpy as np
ModuleNotFoundError: No module named 'numpy'
The parent traceback shows the subprocess command uses:
/Users/administrator/Documents/PolicyEngine/policyengine-us-data/.venv/bin/python
Relevant traceback:
microplex_us/data_sources/donor_surveys.py:880 load_frame
microplex_us/data_sources/donor_surveys.py:598 _default_acs_tables_loader
microplex_us/data_sources/donor_surveys.py:572 _run_policyengine_dataset_loader_from_spec
microplex_us/data_sources/donor_surveys.py:539 _run_policyengine_dataset_loader
subprocess.CalledProcessError
For a fresh smoke command, --policyengine-us-data-repo is not enough if the sibling checkout has a stale or incomplete .venv, because the ACS loader subprocess runs under that checkout's .venv/bin/python. The CLI does expose --policyengine-us-data-python, so the workaround is to rerun with that flag pointing at the active smoke environment's Python. A better contract may be to default to sys.executable unless the caller explicitly supplies a PE-US-data Python, or to validate the selected subprocess Python for numpy, pandas, and policyengine_us_data before starting the expensive source load.
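A minimal sketch of what that contract could look like, using only the standard library. The helper names (`resolve_loader_python`, `missing_modules`, `ensure_loader_python`) are hypothetical and do not exist in microplex_us; the required-module list is taken from the failure above. The probe uses `importlib.util.find_spec`, so it only confirms the modules are installed in the target interpreter, not that they import cleanly.

```python
import json
import subprocess
import sys

# Modules the loader subprocess needs, per the traceback in this report.
REQUIRED_MODULES = ("numpy", "pandas", "policyengine_us_data")

# Runs inside the candidate interpreter. find_spec locates a top-level
# module without importing it, so the probe stays cheap.
_PROBE = (
    "import importlib.util, json, sys\n"
    "missing = [m for m in sys.argv[1:]"
    " if importlib.util.find_spec(m) is None]\n"
    "print(json.dumps(missing))\n"
)


def resolve_loader_python(explicit_python=None):
    """Prefer an explicitly supplied interpreter, else the current one."""
    return explicit_python or sys.executable


def missing_modules(python_executable, modules=REQUIRED_MODULES):
    """Return the subset of `modules` not installed for `python_executable`."""
    result = subprocess.run(
        [python_executable, "-c", _PROBE, *modules],
        capture_output=True,
        text=True,
        check=True,  # a broken interpreter raises CalledProcessError here
    )
    return json.loads(result.stdout)


def ensure_loader_python(explicit_python=None, modules=REQUIRED_MODULES):
    """Fail fast, before any source loading, if the interpreter is unusable."""
    python_executable = resolve_loader_python(explicit_python)
    missing = missing_modules(python_executable, modules)
    if missing:
        raise RuntimeError(
            f"{python_executable} is missing required modules: "
            f"{', '.join(missing)}; pass --policyengine-us-data-python "
            "to select a different interpreter"
        )
    return python_executable
```

Calling something like `ensure_loader_python(args.policyengine_us_data_python)` before the CPS/PUF load would turn the late `ModuleNotFoundError` into an immediate, actionable error. Where exactly such a check belongs in the pipeline is left open here.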