
Fix Enhanced CPS PUF clone calibration guards #1150

Merged: MaxGhenis merged 5 commits into main from fix-ecps-household-weight-total-guard on May 29, 2026

MaxGhenis (Contributor) commented May 28, 2026

Summary

  • Adds high-weight Enhanced CPS calibration targets for the source household total and a 50/50 CPS vs PUF-clone household split.
  • Seeds CPS and PUF-clone household priors deterministically so each half starts with half the source household count before calibration.
  • Excludes Forbes-marked PUF rows from QRF clone training; when metadata is missing, excludes all-record top-tail donors at $10M+ in AGI or selected financial components.
  • Treats all-default Forbes metadata columns as missing metadata, so fallback PUFs do not silently use the Forbes-specific $250M filter.
  • Keeps poverty as a post-build QA guard only; this PR does not add any poverty calibration target.
  • Refreshes policyengine-us to 1.715.2 from upstream/main to resolve the merge conflict and satisfy freshness checks.
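The seeding and donor-exclusion rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the function names (`seed_half_priors`, `metadata_is_missing`, `clone_training_mask`), the plain-list inputs, and the choice to halve every source weight are all hypothetical. Only the $10M fallback cutoff and the "all-default metadata counts as missing" rule come from the summary; the Forbes-specific $250M filter is deliberately left out because its exact form is not described here.

```python
from typing import Optional, Sequence

# $10M fallback cutoff named in the summary; applied only when metadata is missing.
TOP_TAIL_THRESHOLD = 10_000_000


def seed_half_priors(source_weights: Sequence[float]):
    """Assumed seeding: each half starts with 50% of every source household weight."""
    cps = [0.5 * w for w in source_weights]
    clone = [0.5 * w for w in source_weights]
    return cps, clone


def metadata_is_missing(forbes_flags: Optional[Sequence[bool]]) -> bool:
    """Treat absent flags and all-default (all False) flags the same way."""
    return forbes_flags is None or not any(forbes_flags)


def clone_training_mask(agi, financial_components, forbes_flags=None):
    """Return True for rows kept as clone-training donors (hypothetical helper).

    agi: one value per record.
    financial_components: list of columns, each with one value per record.
    forbes_flags: optional per-record marker; True means Forbes-marked.
    """
    if not metadata_is_missing(forbes_flags):
        # Usable metadata: drop exactly the marked rows.
        return [not flag for flag in forbes_flags]
    # Fallback: drop any record at or above the cutoff in AGI
    # or in any of the selected financial components.
    keep = []
    for i, value in enumerate(agi):
        top_tail = value >= TOP_TAIL_THRESHOLD or any(
            column[i] >= TOP_TAIL_THRESHOLD for column in financial_components
        )
        keep.append(not top_tail)
    return keep
```

Under these assumptions, a flag column that is present but entirely `False` takes the same $10M fallback path as a missing column, which is the behavior the fourth bullet describes.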

Local validation

  • Full local 2024 Enhanced CPS rebuild completed under policyengine-us==1.715.2 at commit 192edf1e.
  • Final guard output: 134.6M weighted households vs 135.3M source households, 49.9% PUF-clone household share, 19.3% person poverty QA, 31,670 nonzero weights.
  • MicroSeries headline sums: LTCG $719.5B, household market income $16.50T, income tax $2.56T, SPM taxes $4.13T.
  • Capital gains residuals: CBO loss-limited net capital gains $950.7B vs $1.291T (-26.4%); SOI long-term capital gains $719.5B vs $1.274T (-43.5%).
  • Full target residual sweep: mean squared relative error 0.465, median 0.00278, 71.2% of 3,706 targets within 10%.
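The residual figures above follow from a signed relative error, (estimate - target) / target. The sketch below reproduces the capital gains and household-count percentages from the reported values and shows one way the sweep summary could be computed. The `summarize` helper is hypothetical, it assumes every target is nonzero, and it assumes the reported median refers to squared relative errors; the actual sweep script may differ.

```python
from statistics import mean, median


def relative_error(estimate: float, target: float) -> float:
    """Signed relative error; assumes a nonzero target."""
    return (estimate - target) / target


def summarize(estimates, targets, tolerance=0.10):
    """Summary statistics over paired estimates and targets (hypothetical helper)."""
    rel = [relative_error(e, t) for e, t in zip(estimates, targets)]
    squared = [r * r for r in rel]
    return {
        "mean_squared_relative_error": mean(squared),
        "median_squared_relative_error": median(squared),
        "share_within_tolerance": sum(abs(r) <= tolerance for r in rel) / len(rel),
    }


# Values reported in the validation notes ($B for gains, millions for households).
cbo_gap = relative_error(950.7, 1291.0)
soi_gap = relative_error(719.5, 1274.0)
household_gap = relative_error(134.6, 135.3)

print(f"{cbo_gap:.1%}")        # -26.4%
print(f"{soi_gap:.1%}")        # -43.5%
print(f"{household_gap:.1%}")  # -0.5%
```

A mean squared relative error far above the median, as reported here, is what one would expect when a small number of targets miss by a wide margin while most are close.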

Checks

  • PYTHONUNBUFFERED=1 uv run python -m policyengine_us_data.datasets.cps.enhanced_cps
  • uv run ruff format --check .
  • uv run ruff check policyengine_us_data/calibration/puf_impute.py policyengine_us_data/datasets/cps/enhanced_cps.py policyengine_us_data/utils/loss.py policyengine_us_data/utils/__init__.py tests/unit/calibration/test_calibration_puf_impute.py tests/unit/calibration/test_loss_targets.py tests/unit/datasets/test_enhanced_cps_seeding.py
  • uv run pytest tests/unit/calibration/test_calibration_puf_impute.py tests/unit/calibration/test_loss_targets.py tests/unit/datasets/test_enhanced_cps_seeding.py -q (83 passed)
  • git diff --check
  • Full local target residual sweep script over build_loss_matrix(LocalEnhancedCPS, 2024)

MaxGhenis force-pushed the fix-ecps-household-weight-total-guard branch from 1967ba0 to a1232ba on May 28, 2026 01:09
MaxGhenis changed the title from "Fail ECPS calibration on household count drift" to "Fix Enhanced CPS PUF clone calibration guards" on May 29, 2026