[codex] Cache PE SOI targets for rebuild PUF uprating #155

Draft
anth-volk wants to merge 5 commits into main from codex/fix-147-soi-targets-fallback

Conversation

@anth-volk
Contributor

@anth-volk anth-volk commented Jun 1, 2026

Fixes #147

Summary

  • Keeps --soi-path as the explicit override for PE-SOI PUF uprating.
  • Adds a cache-backed default SOI resolver: when no explicit SOI path is supplied, microplex-us downloads the pinned PE-style long SOI target table from the PolicyEngine/policyengine-us-data repo into a versioned Microplex cache.
  • Stops requiring a local policyengine_us_data/storage/soi.csv file for SOI resolution while preserving repo-local PE-US-data handling for other PUF inputs, including raw PUF/demographics and uprating_factors.csv.
  • Adds regression coverage for cached SOI resolution, missing-cache download, schema validation, CLI/provider wiring, and PE-SOI provider behavior without repo-local soi.csv.
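The resolution order described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in src/microplex_us/data_sources/puf.py: the function name `resolve_soi_path`, the `download` callback, and the `PINNED_COMMIT` placeholder are all hypothetical. Only the precedence (explicit path, then cache, then download) and the cache filename pattern come from this PR.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical placeholder; the real pinned commit is not shown in this PR text.
PINNED_COMMIT = "0000000"


def resolve_soi_path(
    explicit_path: Optional[Path],
    cache_dir: Path,
    download: Callable[[Path], None],
    commit: str = PINNED_COMMIT,
) -> Path:
    """Return the SOI targets CSV to use for PUF uprating."""
    # 1. An explicit --soi-path always wins and is never replaced by the cache.
    if explicit_path is not None:
        if not explicit_path.is_file():
            raise FileNotFoundError(f"SOI path does not exist: {explicit_path}")
        return explicit_path

    # 2. Otherwise look for the versioned cache entry, keyed by commit.
    cached = cache_dir / f"soi_targets_pe_us_data_{commit}.csv"
    if cached.is_file():
        return cached

    # 3. Cache miss: download to a temporary name, then rename into place so
    #    an interrupted download never leaves a partial file under the final name.
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = cached.with_name(cached.name + ".tmp")
    download(tmp)
    tmp.replace(cached)
    return cached
```

Passing the downloader in as a callback keeps the sketch testable without network access, which mirrors the cached-resolution and missing-cache-download cases listed in the regression coverage.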

Root Cause

The PE-SOI PUF uprating path previously expected policyengine_us_data/storage/soi.csv when only --policyengine-us-data-repo was provided. Fresh PE-US-data checkouts do not reliably materialize that file, while the tracked historical PE-style target table lives at policyengine_us_data/storage/calibration_targets/soi_targets.csv in the PE-US-data repo.

SOI Source

For this PR, the canonical default remains the PE-US-data SOI setup: microplex-us uses the PE-style long soi_targets.csv from a pinned PolicyEngine/policyengine-us-data commit and caches it as soi_targets_pe_us_data_<commit>.csv. This is a dependency on the PE-US-data repo artifact, not an import from the PE-US-data Python package.
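As an illustration of what pinning to a commit means here, the sketch below derives a download URL and the cache filename from a commit SHA. The repo name, the in-repo path, and the filename pattern come from this PR description. Fetching through raw.githubusercontent.com is an assumption (the PR text does not say which endpoint the resolver uses), and the helper names are hypothetical.

```python
PE_US_DATA_REPO = "PolicyEngine/policyengine-us-data"
SOI_TARGETS_REPO_PATH = (
    "policyengine_us_data/storage/calibration_targets/soi_targets.csv"
)


def pinned_soi_url(commit: str) -> str:
    """Build a raw-content URL for the SOI table at one specific commit.

    Assumption: the file is fetched via raw.githubusercontent.com.
    """
    return (
        f"https://raw.githubusercontent.com/{PE_US_DATA_REPO}/"
        f"{commit}/{SOI_TARGETS_REPO_PATH}"
    )


def cache_filename(commit: str) -> str:
    """Name the cached copy after the commit, so a new pin gets a new file."""
    return f"soi_targets_pe_us_data_{commit}.csv"
```

Because the commit appears in both the URL and the filename, bumping the pin cannot silently reuse a stale cached table.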

I checked the policyengine-us-data model repo on Hugging Face (HF), but it does not currently publish the historical long SOI target table needed by this uprating path. It publishes only current raw/target DB artifacts, which do not cover the required 2015+ historical surface.

Longer term, this should move to Arch: Arch should own source-backed IRS SOI facts/provenance across the needed years, while microplex-us owns the adapter/export into the PE-style target surface used by PUF uprating.

Validation

  • uv run --no-sync pytest -q tests/test_puf_source_provider.py -k "soi or download_pe_soi" -> 8 passed, 29 deselected
  • uv run --no-sync pytest -q tests/test_puf_source_provider.py tests/pipelines/test_pe_us_data_rebuild.py tests/pipelines/test_pe_us_data_rebuild_checkpoint.py::test_main_passes_donor_condition_selection_override -> 47 passed
  • uv run --no-sync ruff check src/microplex_us/data_sources/puf.py tests/test_puf_source_provider.py
  • uv run --no-sync python -m py_compile src/microplex_us/data_sources/puf.py tests/test_puf_source_provider.py
  • git diff --check

Notes

  • GitNexus tools were not exposed in this session, so I used a manual rg caller scan for the touched PUF/rebuild symbols before committing.

@anth-volk anth-volk changed the title from "[codex] Resolve PE SOI targets from fresh data checkout" to "[codex] Add explicit PE SOI path override for rebuild CLI" Jun 1, 2026
@anth-volk anth-volk changed the title from "[codex] Add explicit PE SOI path override for rebuild CLI" to "[codex] Cache PE SOI targets for rebuild PUF uprating" Jun 2, 2026
Development

Successfully merging this pull request may close these issues.

PUF SOI uprating should not rely on PE-US-data storage/soi.csv