Impute stochastic qualified-dividend shares for unsplit dividend totals #203

@MaxGhenis

Follow-up to #202 and part of the broader target-surface cleanup in #200.

#202 fixes the immediate inversion by applying a documented constant fallback: when a record has only a dividend total and no observed qualified/non-qualified components, split it 78% qualified / 22% non-qualified based on the 2015 PUF aggregate E00650/E00600 share. That is a good first-order patch, but it gives every unsplit CPS dividend row the same qualified share, so the imputed split carries no record-level variation.
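For reference, a minimal sketch of what the constant fallback amounts to. The function name `split_constant` and the sample totals are hypothetical and are not taken from the #202 implementation; only the 78% share comes from the description above.

```python
import numpy as np

QUALIFIED_SHARE = 0.78  # constant fallback share described for #202


def split_constant(dividend_total, share=QUALIFIED_SHARE):
    """Split unsplit dividend totals using one fixed qualified share."""
    dividend_total = np.asarray(dividend_total, dtype=float)
    qualified = dividend_total * share
    # Take non-qualified as the residual so the row total is preserved.
    non_qualified = dividend_total - qualified
    return qualified, non_qualified


# Made-up dividend totals, for illustration only.
totals = np.array([100.0, 2_500.0, 40_000.0])
qualified, non_qualified = split_constant(totals)
implied_share = qualified / totals  # the same value on every row
```

Every row ends up with an identical `implied_share`, regardless of dividend size.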

We should replace that constant fallback with a stochastic or modeled qualified_dividend_share imputation learned from PUF rows with observed dividend composition.

Suggested shape:

  • Train/impute qualified_dividend_share = qualified_dividend_income / ordinary_dividend_income from PUF donor rows where ordinary dividends are positive and the qualified/non-qualified split is observed.
  • Apply the imputed share only to rows with an unsplit positive dividend total and no observed components, e.g. CPS DIV_VAL-only rows.
  • Preserve each row's total dividend exactly: qualified + non_qualified == ordinary_dividend_income == dividend_income up to numerical tolerance.
  • Keep observed PUF component rows unchanged.
  • Make the stochastic draw reproducible via the pipeline seed/checkpoint metadata.
  • Prefer conditioning on relevant predictors if available, such as dividend amount, income/AGI proxies, age, filing/tax-unit features, and asset/investment indicators.
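One possible shape for the bullets above, as a rough sketch rather than a proposed implementation: a seeded hot-deck draw of donor shares, conditioned only on dividend amount through quantile bins. The function name `impute_qualified_share`, the `n_bins` parameter, and the toy arrays are all assumptions for illustration. A real version would likely condition on more predictors and take its seed from the pipeline metadata. The sketch assumes at least one valid donor row.

```python
import numpy as np


def impute_qualified_share(donor_total, donor_qualified, recipient_total,
                           seed, n_bins=5):
    """Hot-deck draw of qualified_dividend_share, binned by dividend amount."""
    donor_total = np.asarray(donor_total, dtype=float)
    donor_qualified = np.asarray(donor_qualified, dtype=float)
    recipient_total = np.asarray(recipient_total, dtype=float)

    # Donors: rows with positive ordinary dividends and an observed split.
    ok = donor_total > 0
    donor_amount = donor_total[ok]
    donor_share = np.clip(donor_qualified[ok] / donor_amount, 0.0, 1.0)

    # Interior bin edges from donor quantiles of dividend amount.
    edges = np.quantile(donor_amount, np.linspace(0, 1, n_bins + 1)[1:-1])
    donor_bin = np.searchsorted(edges, donor_amount)
    recipient_bin = np.searchsorted(edges, recipient_total)

    rng = np.random.default_rng(seed)  # same seed gives the same draws
    share = np.zeros_like(recipient_total)
    needs_split = recipient_total > 0  # only unsplit positive totals
    for b in range(n_bins):
        rows = np.flatnonzero(needs_split & (recipient_bin == b))
        if rows.size == 0:
            continue
        pool = donor_share[donor_bin == b]
        if pool.size == 0:  # empty donor bin: fall back to all donors
            pool = donor_share
        share[rows] = rng.choice(pool, size=rows.size, replace=True)

    qualified = recipient_total * share
    # Residual keeps qualified + non_qualified equal to the row total.
    non_qualified = recipient_total - qualified
    return share, qualified, non_qualified


# Made-up donor (PUF-like) and recipient (CPS-like) values.
puf_total = np.array([50.0, 200.0, 1_000.0, 5_000.0, 20_000.0, 80_000.0])
puf_qualified = np.array([10.0, 120.0, 700.0, 4_000.0, 17_000.0, 76_000.0])
cps_total = np.array([0.0, 75.0, 3_000.0, 60_000.0])

share, qualified, non_qualified = impute_qualified_share(
    puf_total, puf_qualified, cps_total, seed=203
)
```

Drawing whole donor shares keeps every imputed share inside the observed range, and computing non-qualified as a residual preserves each row's total. Observed PUF component rows are never passed in as recipients, so they stay unchanged.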

Validation target:

  • Rebuild or run a focused diagnostic showing the qualified/non-qualified split moves toward the SOI/eCPS evidence without breaking export support parity.
  • Report national weighted totals and filer counts for qualified_dividend_income, non_qualified_dividend_income, and total dividends before/after.
  • Confirm this does not reintroduce the old all-non-qualified CPS-spine failure.
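A focused diagnostic along these lines could look like the sketch below. The helpers `weighted_summary` and `compare_split` are hypothetical names, the weights and amounts are made up, and the weighted count of rows with a nonzero amount is used as a stand-in for filer counts, which may not match the pipeline's actual filer definition.

```python
import numpy as np


def weighted_summary(values, weights):
    """Weighted total and weighted count of rows with a nonzero amount."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return {
        "total": float(np.sum(values * weights)),
        "count": float(np.sum(weights[values != 0])),
    }


def compare_split(weights, before, after):
    """Before/after weighted summaries for each dividend variable."""
    report = {}
    for name in before:
        b = weighted_summary(before[name], weights)
        a = weighted_summary(after[name], weights)
        report[name] = {
            "before": b,
            "after": a,
            "delta_total": a["total"] - b["total"],
        }
    return report


# Toy data: "before" mimics the old all-non-qualified failure mode.
weights = np.array([100.0, 200.0, 50.0])
before = {
    "qualified_dividend_income": np.array([0.0, 0.0, 0.0]),
    "non_qualified_dividend_income": np.array([0.0, 1_000.0, 4_000.0]),
    "dividend_income": np.array([0.0, 1_000.0, 4_000.0]),
}
after = {
    "qualified_dividend_income": np.array([0.0, 780.0, 3_120.0]),
    "non_qualified_dividend_income": np.array([0.0, 220.0, 880.0]),
    "dividend_income": np.array([0.0, 1_000.0, 4_000.0]),
}

report = compare_split(weights, before, after)

# Guard against the all-non-qualified regression and against total drift.
assert report["qualified_dividend_income"]["after"]["total"] > 0
assert abs(report["dividend_income"]["delta_total"]) < 1e-6
```

The two closing assertions are the kind of check that could live in a regression test: total dividends must not move, and the qualified total must not collapse back to zero.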

This should be treated as a quality improvement after #202, not a reason to block the constant-share bug fix.
