Summary
Simulation.extra_variables is defined as a dict[str, list[str]] field (see src/policyengine/core/simulation.py:84), but the US model's PolicyEngineUSLatest.run() path never consults it. Only self.entity_variables is iterated when filling output_dataset.data, so variables a caller adds via extra_variables silently don't appear on the output dataframes.
Repro
    from policyengine.core.simulation import Simulation
    from policyengine.tax_benefit_models.us import ensure_datasets, us_latest

    YEAR = 2026

    datasets = ensure_datasets(
        datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"],
        years=[YEAR],
        data_folder="./data",
    )

    sim = Simulation(
        dataset=datasets[f"enhanced_cps_2024_{YEAR}"],
        tax_benefit_model_version=us_latest,
        # Request a variable that is not in the model's default entity_variables.
        extra_variables={"household": ["net_worth"]},
    )
    sim.run()

    assert "net_worth" in sim.output_dataset.data.household.columns  # fails
net_worth is a valid household-level variable on policyengine_us (imputed from SCF) and is available via the underlying Microsimulation.calc, so there's no upstream reason it shouldn't appear when requested.
Where it's missing
src/policyengine/tax_benefit_models/us/model.py around the output-building loop (~line 232):
    for entity, variables in self.entity_variables.items():
        for var in variables:
            if var not in id_columns and var not in weight_columns:
                data[entity][var] = microsim.calculate(
                    var, period=simulation.dataset.year, map_to=entity
                ).values
This only considers self.entity_variables — simulation.extra_variables is never merged in.
The household-calc path (src/policyengine/tax_benefit_models/us/household.py and the dispatch_extra_variables helper) does handle extra variables. The microsim path should do the same: merge simulation.extra_variables[entity] into the per-entity variable list before the loop.
Suggested fix
Roughly:
    extra = simulation.extra_variables or {}
    for entity, variables in self.entity_variables.items():
        all_vars = list(variables) + list(extra.get(entity, []))
        for var in all_vars:
            if (
                var not in id_columns
                and var not in weight_columns
                and var not in data[entity].columns
            ):
                data[entity][var] = microsim.calculate(
                    var, period=simulation.dataset.year, map_to=entity
                ).values
Plus validation that every extra_variables entity key is a known group entity and every variable name resolves on the tax-benefit system.
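A minimal sketch of what that validation could look like. `validate_extra_variables`, `known_entities` and `known_variables` are hypothetical names, not existing policyengine APIs. In the real model the two sets would presumably be built from the tax-benefit system's entity and variable registries, but here they are plain Python sets so the check stands alone. It only checks that names resolve, not that a variable is defined on the requested entity, since the loop above already passes map_to=entity.

```python
from typing import Iterable, Mapping


def validate_extra_variables(
    extra_variables: Mapping[str, Iterable[str]],
    known_entities: Iterable[str],
    known_variables: Iterable[str],
) -> None:
    """Raise ValueError naming every unknown entity key or variable.

    Hypothetical helper: the inputs are plain containers, not the real
    tax-benefit system object.
    """
    entities = set(known_entities)
    variables = set(known_variables)
    problems = []

    for entity, requested in extra_variables.items():
        if entity not in entities:
            problems.append(f"unknown entity {entity!r}")
            continue  # no point checking variables under a bad key
        for var in requested:
            if var not in variables:
                problems.append(
                    f"unknown variable {var!r} (requested on {entity!r})"
                )

    # Report everything at once so the caller can fix all typos in one pass.
    if problems:
        raise ValueError("invalid extra_variables: " + "; ".join(problems))


# Illustrative values only; the real sets would come from the model.
validate_extra_variables(
    {"household": ["net_worth"]},
    known_entities={"person", "household", "tax_unit"},
    known_variables={"net_worth", "employment_income"},
)
```

Running this at the top of run(), before the output-building loop, would turn a typo in an entity key or variable name into an immediate error instead of a silently missing column.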
Context
Hit this while running a wealth-distribution analysis against the Enhanced CPS. Worked around it by dropping to policyengine_us.Microsimulation directly, but the ergonomics of extra_variables are nice and the field already exists — it should just work.
Same bug almost certainly exists on the UK side; haven't verified.