Impute leaf input variables instead of formula aggregates in QRF #596

@MaxGhenis

Summary

The QRF currently imputes formula-level aggregates (e.g. taxable_pension_income) and then renames them to leaf inputs (e.g. taxable_private_pension_income) before storing. This loses information: all taxable pension income is attributed to private pensions, all interest deductions to mortgage interest, and so on.

Current workaround (PR #594)

_rename_imputed_to_inputs maps:

  • taxable_pension_income → taxable_private_pension_income (loses public pension split)
  • tax_exempt_pension_income → tax_exempt_private_pension_income (same)
  • interest_deduction → deductible_mortgage_interest (loses non-mortgage interest)
  • self_employed_pension_contribution_ald_person (entity mapping only)
  • self_employed_health_insurance_ald_person (entity mapping only)
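
To make the information loss concrete, here is a minimal sketch of a rename step of the kind described above. It is not the actual `_rename_imputed_to_inputs` implementation; `RENAME_MAP`, `rename_imputed`, and the sample values are hypothetical and only mirror the first three mappings in the list.

```python
# Hypothetical mapping mirroring the aggregate-to-leaf renames listed above.
RENAME_MAP = {
    "taxable_pension_income": "taxable_private_pension_income",
    "tax_exempt_pension_income": "tax_exempt_private_pension_income",
    "interest_deduction": "deductible_mortgage_interest",
}


def rename_imputed(imputed):
    """Return a copy of `imputed` with aggregate names replaced by leaf names.

    Names without an entry in RENAME_MAP are kept unchanged.
    """
    return {RENAME_MAP.get(name, name): values for name, values in imputed.items()}


# Made-up imputed values for three records.
imputed = {"taxable_pension_income": [1000.0, 0.0, 2500.0]}
stored = rename_imputed(imputed)

# The whole aggregate now sits under the private-pension leaf;
# no public-pension variable is produced at all.
print(sorted(stored))
```

Because the aggregate is moved wholesale onto one leaf, any share that belongs to the other leaf cannot be recovered afterwards.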

Proper fix

Train the QRF on leaf input variables from the PUF rather than formula aggregates. This would:

  1. Preserve the public/private pension split
  2. Preserve mortgage vs non-mortgage interest
  3. Eliminate the need for post-hoc renaming
  4. Give more accurate distributions for each sub-component
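
The idea can be sketched as follows. This is an illustration under loud assumptions: a nearest-neighbour donor match on a single predictor stands in for the QRF, and the training records, the `age` predictor, and the helper `impute_leaves` are all hypothetical. The point is only that when the targets are leaf variables, the split is carried through and the aggregate follows by summation.

```python
# Hypothetical training records standing in for PUF rows:
# one predictor ("age") plus the leaf target variables.
train = [
    {"age": 45, "taxable_public_pension_income": 0.0,
     "taxable_private_pension_income": 0.0},
    {"age": 67, "taxable_public_pension_income": 12000.0,
     "taxable_private_pension_income": 8000.0},
    {"age": 72, "taxable_public_pension_income": 20000.0,
     "taxable_private_pension_income": 0.0},
]

LEAF_TARGETS = [
    "taxable_public_pension_income",
    "taxable_private_pension_income",
]


def impute_leaves(record, donors, targets):
    """Copy leaf values from the donor closest in age.

    A simple stand-in for the QRF; it is NOT a quantile regression forest.
    """
    donor = min(donors, key=lambda d: abs(d["age"] - record["age"]))
    return {target: donor[target] for target in targets}


receiver = {"age": 66}
leaves = impute_leaves(receiver, train, LEAF_TARGETS)

# The aggregate is derived from the leaves, so no renaming step is needed.
taxable_pension_income = sum(leaves.values())
print(leaves, taxable_pension_income)
```

Imputing the leaves jointly from one donor (or, in the real model, one multivariate draw) keeps the sub-components consistent with each other.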

Variables to split

  • taxable_pension_income → taxable_public_pension_income + taxable_private_pension_income
  • tax_exempt_pension_income → tax_exempt_public_pension_income + tax_exempt_private_pension_income
  • interest_deduction → deductible_mortgage_interest + non_mortgage_interest (or deeper: investment_interest_expense)

Requires checking which sub-components are available in the PUF training data.
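
A rough sketch of that availability check, assuming the training data exposes its variables as a list of column names. `SPLITS` mirrors the list above; `missing_leaves` and the example `puf_columns` are hypothetical, and the real PUF columns still need to be inspected.

```python
# Aggregate -> proposed leaf inputs, taken from the list above.
SPLITS = {
    "taxable_pension_income": [
        "taxable_public_pension_income",
        "taxable_private_pension_income",
    ],
    "tax_exempt_pension_income": [
        "tax_exempt_public_pension_income",
        "tax_exempt_private_pension_income",
    ],
    "interest_deduction": [
        "deductible_mortgage_interest",
        "non_mortgage_interest",
    ],
}


def missing_leaves(splits, available_columns):
    """Return {aggregate: [leaves absent from the training data]}.

    Aggregates whose leaves are all present are omitted from the result.
    """
    available = set(available_columns)
    report = {}
    for aggregate, leaves in splits.items():
        absent = [leaf for leaf in leaves if leaf not in available]
        if absent:
            report[aggregate] = absent
    return report


# Made-up column list for illustration only.
puf_columns = [
    "taxable_private_pension_income",
    "tax_exempt_private_pension_income",
    "deductible_mortgage_interest",
    "non_mortgage_interest",
]

print(missing_leaves(SPLITS, puf_columns))
```

Any aggregate that shows up in the report would either need a different leaf decomposition or would have to keep the current workaround.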
