Cache QRF models to avoid retraining when only weights change

## Summary

The extended CPS build retrains all QRF models from scratch every time, even when only calibration weights change. Since the QRF imputation depends on source CPS + PUF data (not weights), the fitted models could be cached and reused.

## Current cost

- 85+ variable sequential QRF on ~20K PUF records: ~30-60 min
- Additional QRF calls for weeks_unemployed, retirement contributions, SS sub-components
- This runs on every `make data` or CI build

## Proposed approach

- Serialize fitted QRF models (e.g. pickle/joblib) keyed by a hash of the training data
- On rebuild, check whether the hash of the current source data matches the key of a cached model; if it does, skip training and only run prediction
- microimpute could potentially support this natively (save/load fitted models)
- Could also cache the full `extended_cps_2024.h5` and only rebuild when CPS/PUF inputs change
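
A minimal sketch of the hash-keyed cache described above, using only the standard library. The names `fingerprint`, `load_or_fit`, the `qrf_<hash>.pkl` file layout, and the `extra` argument are hypothetical and not part of microimpute or the current build; `fit` stands in for whatever call currently trains the QRF. Weights are deliberately left out of the key, so a weights-only change reuses the cached model.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable


def fingerprint(paths: list[Path], extra: str = "") -> str:
    """SHA-256 over the training input files (hypothetical helper).

    `extra` is an assumed hook for anything else that should invalidate
    the cache, e.g. the predictor list or model hyperparameters.
    """
    digest = hashlib.sha256(extra.encode())
    for path in sorted(paths):
        digest.update(path.name.encode())
        with path.open("rb") as f:
            # Stream in 1 MiB chunks so large inputs are not loaded at once.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()


def load_or_fit(cache_dir: Path, key: str, fit: Callable[[], Any]) -> Any:
    """Return the model cached under `key`, fitting and caching it if absent."""
    path = cache_dir / f"qrf_{key}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)  # cache hit: skip training entirely

    model = fit()  # cache miss: run the expensive training step
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as f:
        pickle.dump(model, f)
    tmp.replace(path)  # rename last so a crash never leaves a partial cache file
    return model
```

The same `fingerprint` value could also gate the full `extended_cps_2024.h5` rebuild. Two caveats apply to any pickle-based cache: it should only be loaded from a trusted location, and the key likely needs to include library versions, since a pickled model may not load correctly under a different version of the modelling library.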

## Context

Related to the sequential QRF migration in #594. Now that all 85 variables are fitted in a single fit() call, caching that one fitted model would skip the entire training phase.

