| title | Microsimulation |
|---|
For population-level estimates — budget cost, winners and losers, poverty impact — run a microsimulation over calibrated microdata.
import policyengine as pe
from policyengine.core import Simulation
from policyengine.outputs import Aggregate, AggregateType
datasets = pe.us.ensure_datasets(
datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"],
years=[2026],
)
dataset = datasets["enhanced_cps_2024_2026"]
baseline = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
baseline.ensure()
total_snap = Aggregate(
simulation=baseline,
variable="snap",
aggregate_type=AggregateType.SUM,
)
total_snap.run()
total_snap.resultSimulation.ensure() loads a cached result if one exists, or runs and caches on miss. Call Simulation.run() explicitly if you want to bypass the cache.
Microdata is stored as HDF5 on Hugging Face. ensure_datasets downloads, caches, and uprates:
datasets = pe.us.ensure_datasets(
datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"],
years=[2024, 2026],
data_folder="./data", # local cache directory
)
# Keys are "<dataset_stem>_<year>":
dataset = datasets["enhanced_cps_2024_2026"]The default US dataset is Enhanced CPS 2024 — CPS ASEC fused with IRS SOI tax-return records and calibrated to IRS, CMS, SNAP, and other administrative totals. The UK default is Enhanced FRS 2023/24 — the Family Resources Survey fused with tax-return microdata and calibrated to HMRC and DWP totals.
List datasets already known to the country:
pe.us.load_datasets() # or pe.uk.load_datasets()UK population data uses licensed Family Resources Survey inputs. The default
UK release bundle points to the private
policyengine/policyengine-uk-data-private Hugging Face model repository. Set
HUGGING_FACE_TOKEN to a token from a Hugging Face account with access:
export HUGGING_FACE_TOKEN=hf_...For policyengine.py analyses, use the logical dataset name from the release
bundle. ensure_datasets resolves it to the pinned private Hugging Face file,
downloads it, caches it locally, and creates year-specific uprated datasets:
import policyengine as pe
from policyengine.core import Simulation
datasets = pe.uk.ensure_datasets(
datasets=["enhanced_frs_2023_24"],
years=[2026],
data_folder="./data",
)
dataset = datasets["enhanced_frs_2023_24_2026"]
simulation = Simulation(
dataset=dataset,
tax_benefit_model_version=pe.uk.model,
)
simulation.run()To download the raw h5 artifact directly from Hugging Face, use
huggingface_hub and specify repo_type="model":
import os
from huggingface_hub import hf_hub_download
path = hf_hub_download(
repo_id="policyengine/policyengine-uk-data-private",
filename="enhanced_frs_2023_24.h5",
repo_type="model",
token=os.environ["HUGGING_FACE_TOKEN"],
)
print(path)The repository URL is
https://huggingface.co/policyengine/policyengine-uk-data-private. A 404 from
the website or RepositoryNotFoundError from the Hub API usually means the
browser or token is not authenticated as an account with access, or that the
Hub call omitted repo_type="model".
A Simulation needs a dataset, a tax-benefit model version, and optionally a policy (reform):
baseline = Simulation(
dataset=dataset,
tax_benefit_model_version=pe.us.model,
)
reformed = Simulation(
dataset=dataset,
tax_benefit_model_version=pe.us.model,
policy={"gov.irs.credits.ctc.amount.base[0].amount": 3_000},
)policy= accepts the same flat {"param.path": value} dict shape as pe.us.calculate_household(reform=...), or a Policy object with explicit ParameterValue entries. Scale parameters use bracket indexing — see Reforms.
Every output has the same lifecycle: instantiate with the simulation(s) and configuration, call .run(), read the typed result fields.
from policyengine.outputs import (
Aggregate, AggregateType,
ChangeAggregate, ChangeAggregateType,
)
snap_cost = Aggregate(
simulation=baseline,
variable="snap",
aggregate_type=AggregateType.SUM,
)
snap_cost.run()
budget = ChangeAggregate(
baseline_simulation=baseline,
reform_simulation=reformed,
variable="household_net_income",
aggregate_type=ChangeAggregateType.SUM,
)
budget.run()See Outputs for the full catalog.
A full Enhanced CPS microsimulation uses roughly 4 GB of memory and takes 15–30 seconds on a laptop. For parameter sweeps, reuse the baseline:
baseline = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
for amount in [0, 1_000, 2_000, 3_000]:
reformed = Simulation(
dataset=dataset,
tax_benefit_model_version=pe.us.model,
policy={"gov.irs.credits.ctc.amount.base[0].amount": amount},
)
# each iteration runs only the reformDownsampled datasets are available for testing:
datasets = pe.us.ensure_datasets(
datasets=["hf://policyengine/policyengine-us-data/cps_small_2024.h5"],
years=[2026],
)These run in seconds and are fine for integration tests. Don't use them for production analysis — the weights are not calibration-tuned.
managed_microsimulation constructs a country-package Microsimulation pinned to the policyengine.py release bundle (so the dataset selection is certified, not ad-hoc):
from policyengine.tax_benefit_models.us import managed_microsimulation
sim = managed_microsimulation()
# `sim` is a policyengine_us.Microsimulation — use its API directlyPass allow_unmanaged=True with a custom dataset= to opt out of the release bundle.
Every policyengine release pins specific country-model and country-data versions so results are reproducible. pe.us.model and pe.uk.model expose the pinned TaxBenefitModelVersion.
If the installed country-package version doesn't match the pinned manifest, managed_microsimulation warns. For strict reproducibility, pin country packages to the versions the policyengine release was built against — see Release bundles.
- Outputs — catalog of typed output classes
- Impact analysis — full baseline-vs-reform in one call
- Regions — sub-national analysis