Enforce PUF-clone weight-share floor in the dataset upload validator #1159

Draft
MaxGhenis wants to merge 1 commit into main from fix-clone-upload-floor

MaxGhenis commented May 30, 2026

Fixes #1158

Summary

Enforce the PUF-clone weight-share floor (and the taxes-exceed-market-income cap) in the dataset upload validator, so a degraded Enhanced CPS artifact cannot be published.

The generation-time guard in enhanced_cps.py already rejects builds where the PUF-clone household weight share falls below the 5% floor (or where the clone taxes-exceed-market-income share is above the 25% cap). The upload validator, however, only checked that each clone metric was finite and in [0, 100]; it did not enforce the floor or the cap. That gap is why the currently published Enhanced CPS (clone share ≈ 0%, built before the clone fix) was able to ship.

Changes

  • policyengine_us_data/storage/upload_completed_datasets.py: _clone_diagnostics_errors now imports MIN_PUF_CLONE_HOUSEHOLD_WEIGHT_SHARE_PCT and MAX_PUF_CLONE_TAXES_EXCEED_MARKET_INCOME_SHARE_PCT from enhanced_cps.py (single source of truth) and appends an error when clone_household_weight_share_pct < 5% or clone_taxes_exceed_market_income_share_pct > 25%. Both checks are presence- and finiteness-guarded, so older sidecars without these fields still validate (back-compat).
  • tests/unit/test_upload_clone_diagnostics_floor.py: 2% share → rejected (floor); 66% tax share → rejected (cap); healthy 10%/5% → passes; missing fields → passes.

Tightens the upload gate only; healthy artifacts and older sidecars are unaffected.
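As a rough illustration of the guarded checks described above (not the actual diff), the sketch below shows the intended behavior. The function name `clone_threshold_errors`, the helper `_is_finite_number`, the dict-shaped sidecar, and the error strings are all hypothetical. The two constants are hardcoded here only so the snippet runs standalone; the real validator imports them from enhanced_cps.py.

```python
import math

# Assumed values mirroring the 5% floor and 25% cap stated in this PR.
# The real validator imports these from enhanced_cps.py instead.
MIN_PUF_CLONE_HOUSEHOLD_WEIGHT_SHARE_PCT = 5.0
MAX_PUF_CLONE_TAXES_EXCEED_MARKET_INCOME_SHARE_PCT = 25.0


def _is_finite_number(value):
    """Hypothetical helper: True only for finite, non-boolean numbers."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    )


def clone_threshold_errors(diagnostics):
    """Hypothetical sketch: return error strings for floor/cap violations.

    Missing or non-finite fields are skipped, so older sidecars that lack
    these metrics still validate.
    """
    errors = []

    share = diagnostics.get("clone_household_weight_share_pct")
    if (
        _is_finite_number(share)
        and share < MIN_PUF_CLONE_HOUSEHOLD_WEIGHT_SHARE_PCT
    ):
        errors.append(
            f"clone_household_weight_share_pct={share} is below the "
            f"{MIN_PUF_CLONE_HOUSEHOLD_WEIGHT_SHARE_PCT}% floor"
        )

    tax_share = diagnostics.get("clone_taxes_exceed_market_income_share_pct")
    if (
        _is_finite_number(tax_share)
        and tax_share > MAX_PUF_CLONE_TAXES_EXCEED_MARKET_INCOME_SHARE_PCT
    ):
        errors.append(
            f"clone_taxes_exceed_market_income_share_pct={tax_share} is "
            f"above the {MAX_PUF_CLONE_TAXES_EXCEED_MARKET_INCOME_SHARE_PCT}% cap"
        )

    return errors
```

Under these assumptions, a sidecar with a 2% share or a 66% tax share yields one error each, while a healthy 10%/5% sidecar and an empty sidecar both yield no errors, matching the four unit-test cases listed above.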

🤖 Generated with Claude Code

The Enhanced CPS build has two guards against degraded PUF-clone behavior,
but they were not equivalent:

- The generation-time guard
  (``enhanced_cps.validate_clone_diagnostics``) raises when the clone
  household weight share falls below
  ``MIN_PUF_CLONE_HOUSEHOLD_WEIGHT_SHARE_PCT`` (5%) or the clone
  taxes-exceed-market-income share exceeds
  ``MAX_PUF_CLONE_TAXES_EXCEED_MARKET_INCOME_SHARE_PCT`` (25%).
- The upload-time guard
  (``upload_completed_datasets._clone_diagnostics_errors``) only checked
  that each metric was finite and within [0, 100].

Because the upload-time guard was weaker, a degraded artifact could still
be published even though generation would have rejected it.

Import the two thresholds from ``enhanced_cps`` (rather than re-hardcoding
5.0 / 25.0) and enforce the same floor/bound in the upload validator,
after the existing finite/range checks. The share fields are absent on
some periods/older sidecars, so enforcement is guarded by presence and
finiteness checks to preserve back-compatibility.

Fixes #1158

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>