Align benchmark prompts with PolicyEngine target-year input values #19

@MaxGhenis

PolicyBench error audit found prompt/reference mismatches that can make model errors look larger or qualitatively different than they are.

Issues to fix before the next paid run:

- UK prompts currently show raw transfer-dataset input amounts, while the 2026-27 PolicyEngine reference simulation uprates inputs before calculating outputs. In the 100-household run, all 69 UK scenarios with positive employment income differed between prompt and PE calculation, with median PE/prompt ratio about 1.034. Capital gains, savings interest, dividends, and private pensions were also uprated before reference calculation.
- UK pension contribution wording is ambiguous. Models interpreted employee pension contributions as reducing taxable pay, while the PE reference did not always reduce taxable income that way. Prompts should either expose target-year tax variables in the same semantics PE uses or specify the pension contribution treatment precisely.
- ACA PTC prompts are not clean no-web household calculations when local SLCSP/benchmark premium and Marketplace enrollment assumptions are implicit. Either supply the benchmark/selected plan facts, label PTC as a no-web local-market stress test, or exclude/stratify those cases.
- SSDI/disability plus Medicare inference is a prompt-contract risk: some models inferred Medicare from SSDI even though unlisted statuses are false and PE had Medicare eligibility false. Clarify or avoid cases where waiting-period/enrollment facts are not supplied.

Suggested acceptance criteria:

- Scenario prompt values are generated from the same target-year PolicyEngine values used to produce reference outputs, or the prompt explicitly says the displayed values are base-year values and gives the uprating rule. Prefer target-year values.
- Add a test that UK prompt-visible income/gain/pension inputs reconcile to PE target-year input variables for sampled scenarios.
- Add prompt-contract tests or snapshot examples for PTC and SSDI/Medicare edge cases.

Context: this came from the post-run error audit for results/local/full_run_20260513_policyengine_4_4_4_nested_outputs.
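
One possible shape for the reconciliation test in the acceptance criteria, as a minimal sketch rather than the repo's actual test. It assumes the prompt-visible inputs and the PE target-year inputs for a scenario are each available as a plain dict of variable name to amount; `find_mismatches`, the dict layout, and the sample figures are all hypothetical. The sample reuses the roughly 1.034 uprating ratio reported above to show what a failing scenario would look like.

```python
import math


def find_mismatches(prompt_inputs, pe_inputs, rel_tol=1e-6, abs_tol=0.005):
    """Return (variable, prompt_value, pe_value) for every prompt-visible
    input that does not reconcile to the target-year reference value.

    Hypothetical helper: both arguments are plain dicts of amounts.
    """
    mismatches = []
    for name, shown in prompt_inputs.items():
        reference = pe_inputs.get(name)
        # A missing reference value counts as a mismatch.
        if reference is None or not math.isclose(
            shown, reference, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches.append((name, shown, reference))
    return mismatches


# Illustrative scenario (made-up amounts): the prompt shows the raw
# dataset value, while the reference run used an uprated value
# (30,000 * 1.034 = 31,020).
prompt_inputs = {"employment_income": 30_000.0, "dividend_income": 0.0}
pe_inputs = {"employment_income": 31_020.0, "dividend_income": 0.0}

for name, shown, reference in find_mismatches(prompt_inputs, pe_inputs):
    print(f"{name}: prompt shows {shown}, reference used {reference}")
```

A test built on this would assert that the returned list is empty for each sampled scenario. How the two dicts are extracted from the prompt generator and the PolicyEngine simulation is left open here.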
