extra_variables silently ignored in Simulation.run() (US microsim path) #303

@MaxGhenis

Summary

Simulation.extra_variables is defined as a dict[str, list[str]] field (see src/policyengine/core/simulation.py:84), but the US model's PolicyEngineUSLatest.run() path never consults it. Only self.entity_variables is iterated when filling output_dataset.data, so variables a caller adds via extra_variables silently don't appear on the output dataframes.

Repro

from policyengine.core.simulation import Simulation
from policyengine.tax_benefit_models.us import ensure_datasets, us_latest

YEAR = 2026
datasets = ensure_datasets(
    datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"],
    years=[YEAR],
    data_folder="./data",
)

sim = Simulation(
    dataset=datasets[f"enhanced_cps_2024_{YEAR}"],
    tax_benefit_model_version=us_latest,
    extra_variables={"household": ["net_worth"]},
)
sim.run()

assert "net_worth" in sim.output_dataset.data.household.columns  # fails

net_worth is a valid household-level variable on policyengine_us (imputed from SCF) and is available via the underlying Microsimulation.calc, so there's no upstream reason it shouldn't appear when requested.

Where it's missing

src/policyengine/tax_benefit_models/us/model.py around the output-building loop (~line 232):

for entity, variables in self.entity_variables.items():
    for var in variables:
        if var not in id_columns and var not in weight_columns:
            data[entity][var] = microsim.calculate(
                var, period=simulation.dataset.year, map_to=entity
            ).values

This only considers self.entity_variables; simulation.extra_variables is never merged in.

The household-calc path (src/policyengine/tax_benefit_models/us/household.py and the dispatch_extra_variables helper) does handle extra variables. The microsim path should do the same: merge simulation.extra_variables[entity] into the per-entity variable list before the loop.

Suggested fix

Roughly:

extra = simulation.extra_variables or {}
for entity, variables in self.entity_variables.items():
    all_vars = list(variables) + list(extra.get(entity, []))
    for var in all_vars:
        if var not in id_columns and var not in weight_columns and var not in data[entity].columns:
            data[entity][var] = microsim.calculate(
                var, period=simulation.dataset.year, map_to=entity
            ).values

Plus validation that every extra_variables entity key is a known group entity and every variable name resolves on the tax-benefit system.
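
A minimal sketch of that merge-plus-validation step, written as a standalone helper so it can be tested without the model. The function name merge_extra_variables and the known_variables argument are hypothetical, not part of policyengine; in the real model the set of valid names would presumably come from the tax-benefit system, and the set of valid entities is assumed here to be the keys of entity_variables.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def merge_extra_variables(
    entity_variables: Mapping[str, Iterable[str]],
    extra_variables: Mapping[str, Iterable[str]] | None,
    known_variables: Iterable[str],
) -> dict[str, list[str]]:
    """Return per-entity variable lists with validated extras appended.

    Hypothetical helper: entity names are validated against the keys of
    ``entity_variables`` and variable names against ``known_variables``.
    """
    extra = extra_variables or {}
    known = set(known_variables)

    # Fail loudly on entity keys the model does not know about.
    unknown_entities = sorted(set(extra) - set(entity_variables))
    if unknown_entities:
        raise ValueError(
            f"Unknown entities in extra_variables: {unknown_entities}"
        )

    # Fail loudly on variable names that do not resolve.
    requested = {name for names in extra.values() for name in names}
    unknown_variables = sorted(requested - known)
    if unknown_variables:
        raise ValueError(
            f"Unknown variables in extra_variables: {unknown_variables}"
        )

    merged: dict[str, list[str]] = {}
    for entity, variables in entity_variables.items():
        combined = [*variables, *extra.get(entity, [])]
        # dict.fromkeys drops duplicates while preserving first-seen order.
        merged[entity] = list(dict.fromkeys(combined))
    return merged


merged = merge_extra_variables(
    {"household": ["household_net_income"], "person": ["age"]},
    {"household": ["net_worth"]},
    known_variables={"household_net_income", "age", "net_worth"},
)
print(merged["household"])  # ['household_net_income', 'net_worth']
```

Deduplicating up front would make the `var not in data[entity].columns` check in the loop unnecessary for the extras, and raising on unknown names turns the current silent omission into an explicit error.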

Context

Hit this while running a wealth-distribution analysis against the Enhanced CPS. Worked around it by dropping to policyengine_us.Microsimulation directly, but the ergonomics of extra_variables are nice and the field already exists — it should just work.

Same bug almost certainly exists on the UK side; haven't verified.
