Autonomous demand forecasting with a seasonal regime-change agent.
Watch the model fail in December (missing the holiday spike), the agent detect the recurring pattern via Arize telemetry, decline a one-time decoy feature, and switch its forecast strategy — live.
Time cursor advances
→ model predicts (pack_default)
→ predictions + feature performance logged to Arize
→ agent reads Arize history (observe)
→ agent evaluates recurrence (decide)
├─ banner_red (fluke) → decline, log reasoning
└─ days_until_christmas (regime) → switch to pack_holiday
→ model predicts better (verify)
→ UI shows failure → recovery
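The loop above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code in `agent.py`. `agent_step`, the `decide` callable, and the dictionary shapes are hypothetical. Verification is reduced to comparing MAPE for the old and new pack on the same window.

```python
from typing import Callable, Dict, List, Tuple


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error in percent; assumes non-zero actuals."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def agent_step(
    features: List[str],
    decide: Callable[[str], str],
    forecasts: Dict[str, List[float]],
    actual: List[float],
    active_pack: str = "pack_default",
) -> Tuple[str, List[str], float, float]:
    """One observe -> decide -> act -> verify pass (hypothetical sketch).

    features: features whose telemetry looked unusual this step (observe).
    decide: returns "fluke" or "regime" for a feature.
    forecasts: pack name -> predictions for the current window.
    """
    log: List[str] = []
    new_pack = active_pack
    for feature in features:
        verdict = decide(feature)                      # decide
        if verdict == "regime" and feature == "days_until_christmas":
            new_pack = "pack_holiday"                  # act: a config switch, no retraining
            log.append(f"{feature}: regime, switching to {new_pack}")
        else:
            log.append(f"{feature}: {verdict}, no action")
    before = mape(actual, forecasts[active_pack])      # verify: error under the old pack
    after = mape(actual, forecasts[new_pack])          # ...and under the chosen pack
    return new_pack, log, before, after


# Illustrative inputs only; not project data.
pack, log, before, after = agent_step(
    ["banner_red", "days_until_christmas"],
    decide=lambda f: "regime" if f == "days_until_christmas" else "fluke",
    forecasts={"pack_default": [50.0, 50.0], "pack_holiday": [90.0, 110.0]},
    actual=[100.0, 100.0],
)
```

Keeping `decide` as a parameter mirrors the README's split between Gemini and rule-based reasoning. Either can be plugged in without changing the loop.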
| File | Purpose |
|---|---|
| `prepare_data.py` | Download and preprocess the Rossmann data; inject the `banner_red` decoy |
| `model.py` | LightGBM with two switchable feature packs (no retraining) |
| `arize_logging.py` | Log predictions to Arize; read per-feature history |
| `agent.py` | Observe→decide→act→verify loop; Gemini or rule-based reasoning |
| `server.py` | Flask API serving the UI |
| `ui/` | React two-panel demo UI |
Download the rossmann-store-sales dataset from Kaggle and place `train.csv` and `store.csv` in the `data/` folder. Or, if you have the Kaggle CLI configured, run `./setup.sh --download`.

Copy the environment template with `cp .env.example .env`, then edit `.env` to add Arize keys and/or a Google Cloud project for Gemini. The system works fully offline without Arize or Gemini credentials. In that case it uses local feature-performance storage and rule-based reasoning.
Make the setup script executable and run it with `chmod +x setup.sh` followed by `./setup.sh`. This installs the Python dependencies, prepares the dataset, trains the model, seeds the Arize performance log, and runs the initial agent simulation.
Start the API in terminal 1 with `python3 server.py`. In terminal 2, start the UI by running `cd ui`, `npm install`, and then `npm run dev`.

Then walk through the demo with the time cursor:

- Stable period: the cursor sits at the start of the year. The forecast tracks actuals, and the agent stays quiet.
- Decoy fires: advance to March (year 1). `banner_red` spikes. The agent logs a fluke verdict, citing no prior recurrence, and declines the feature.
- Holiday break: advance to November/December. The default forecast diverges badly from actuals, and Panel A turns red.
- Agent acts: the agent finds `days_until_christmas` recurring across ≥2 prior years, logs a regime verdict, and switches to the holiday strategy.
- Recovery: the forecast snaps back to tracking actuals, Panel A turns green, and verification shows a MAPE drop.
| Pack | Features | When active |
|---|---|---|
| `pack_default` | `day_of_week`, `month`, `day_of_year`, `is_promo`, lags, `banner_red` | Default |
| `pack_holiday` | `pack_default` features + `days_until_christmas`, holiday proximity | After the agent detects the December regime |
The agent's action is a config switch — instant, no retraining. The model was trained on all features; inactive ones are zeroed at inference.
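Below is a minimal sketch of the masking step. It assumes each row is a dict of feature values and that the model expects columns in a fixed training order. `ALL_FEATURES`, `PACKS`, and `mask_row` are hypothetical names, and the feature list is abbreviated (holiday proximity features are omitted). The real definitions live in `model.py`.

```python
from typing import Dict, List, Set

# Hypothetical training column order, abbreviated from the pack table.
ALL_FEATURES: List[str] = [
    "day_of_week", "month", "day_of_year", "is_promo",
    "sales_lag_365", "banner_red", "days_until_christmas",
]

# Each pack is the set of columns passed through unchanged at inference.
PACKS: Dict[str, Set[str]] = {
    "pack_default": set(ALL_FEATURES) - {"days_until_christmas"},
    "pack_holiday": set(ALL_FEATURES),
}


def mask_row(row: Dict[str, float], pack: str) -> List[float]:
    """Return a feature vector in training order with inactive features zeroed."""
    active = PACKS[pack]
    return [float(row[name]) if name in active else 0.0 for name in ALL_FEATURES]


# Illustrative row (values invented for the example), e.g. Dec 11 of a non-leap year.
row = {"day_of_week": 3, "month": 12, "day_of_year": 345, "is_promo": 1,
       "sales_lag_365": 6100.0, "banner_red": 0, "days_until_christmas": 14}
default_vec = mask_row(row, "pack_default")
holiday_vec = mask_row(row, "pack_holiday")
# A trained LightGBM model would then score, e.g., model.predict([holiday_vec]).
```

Only the input vector changes, so switching packs is a dictionary lookup rather than a training run. One property worth checking with this design is that 0 is a neutral value for every masked feature. For a countdown such as `days_until_christmas`, 0 may also be a value the model saw during training.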
The demo meets its acceptance criteria when:

- `data/plots/acceptance_check.png` shows repeating December spikes plus an isolated spring `banner_red` bump.
- The December MAPE gap between `pack_default` and `pack_holiday` is large (>20 pp).
- The agent outputs `fluke` for `banner_red`, with evidence citing no prior recurrence.
- The agent outputs `regime` for `days_until_christmas`, with evidence citing ≥2 prior years.
- In UI Panel A, the failure is visually obvious to a non-technical viewer and the recovery is unmissable.
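The MAPE-gap criterion is simple arithmetic. With the December figures reported in the table below (42.8% for the default pack vs. 15.3% for the holiday pack), the gap is 42.8 − 15.3 = 27.5 pp, which clears the 20 pp bar. The helper here is a hypothetical sketch, not `run_visual_acceptance_checks()` itself. It assumes aligned sequences of December actuals and predictions.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; assumes non-zero actuals."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need equal-length, non-empty sequences")
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def december_gap_ok(
    actual: Sequence[float],
    default_pred: Sequence[float],
    holiday_pred: Sequence[float],
    min_gap_pp: float = 20.0,
) -> bool:
    """True if the default pack's MAPE exceeds the holiday pack's by more than min_gap_pp points."""
    gap = mape(actual, default_pred) - mape(actual, holiday_pred)
    return gap > min_gap_pp
```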
This section exists so judges and reviewers can evaluate the demo honestly.
| Artifact | Detail |
|---|---|
| Dataset | Rossmann Store Sales — Store 1, Open==1 only → 781 trading days 2013-01-02–2015-07-31 |
| Model | LightGBM trained on ALL_FEATURES; single train run, 85/15 time split, early stopping |
| December MAPE gap | 27.5 percentage points (default 42.8% vs. holiday 15.3%) verified by run_visual_acceptance_checks() in model.py |
| Feature masking | Inactive features are zeroed at inference; the model is never retrained on a different set |
| Recurrence evidence | Counts only prior calendar years (< current simulation year) with elevated signal; current year is excluded |
| Leakage guard | TelemetryStore._visible() hard-filters all records to date <= current_date; belt-and-suspenders assertions in agent.py will raise if any future record slips through |
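The leakage guard amounts to one filter plus a redundant check. The sketch below assumes telemetry records carry a date. `Record`, `LocalTelemetry`, and `history` are hypothetical stand-ins for `TelemetryStore` and its callers, not the project's actual classes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    day: date
    feature: str
    signal: float


class LocalTelemetry:
    """Hypothetical stand-in for TelemetryStore."""

    def __init__(self, records: Iterable[Record]):
        self._records = list(records)

    def _visible(self, current_date: date) -> List[Record]:
        # Hard filter: nothing dated after the simulation cursor is returned.
        return [r for r in self._records if r.day <= current_date]

    def history(self, feature: str, current_date: date) -> List[Record]:
        rows = [r for r in self._visible(current_date) if r.feature == feature]
        # Belt-and-suspenders check; an explicit raise (unlike assert) also survives python -O.
        if any(r.day > current_date for r in rows):
            raise RuntimeError("future telemetry record leaked into agent input")
        return rows
```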
| Artifact | Detail |
|---|---|
| Time cursor | The UI slider does not ingest live sales data; it re-runs the stored agent log |
| Arize telemetry | The feature performance CSV is seeded by walking history month-by-month via seed_simulation_log(); it mimics what Arize would have recorded in a live deployment |
| Holiday signal | strengthen_holiday_signal() applies a 1.65× multiplier to the 21 days before Dec 25 each year in the Rossmann data; this amplifies an already-present real signal |
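The holiday amplification can be sketched as below. This assumes "the 21 days before Dec 25" means Dec 4 through Dec 24 inclusive, and that sales are keyed by date. `in_pre_christmas_window` and `amplify_pre_christmas` are hypothetical names, not `strengthen_holiday_signal()` itself.

```python
from datetime import date, timedelta
from typing import Dict

HOLIDAY_MULTIPLIER = 1.65  # from the table above
WINDOW_DAYS = 21           # days before Dec 25 that get amplified


def in_pre_christmas_window(d: date, window_days: int = WINDOW_DAYS) -> bool:
    """True for the window_days days immediately before Dec 25 of d's year (Dec 25 excluded)."""
    christmas = date(d.year, 12, 25)
    return christmas - timedelta(days=window_days) <= d < christmas


def amplify_pre_christmas(sales_by_date: Dict[date, float],
                          multiplier: float = HOLIDAY_MULTIPLIER) -> Dict[date, float]:
    """Return a copy of the series with pre-Christmas sales scaled by multiplier."""
    return {d: s * multiplier if in_pre_christmas_window(d) else s
            for d, s in sales_by_date.items()}
```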
| Artifact | Detail |
|---|---|
| Recurrence scorer | score_recurrence() in agent.py is a structured evidence checker, not ML; it is predictable and auditable |
| `banner_red` verdict | Always fluke: March 2013 only, no recurrence across years |
| `days_until_christmas` verdict | Regime after Dec 2014: two prior December cycles (2013, 2014) with elevated signal |
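A structured evidence checker like the one described can be sketched in a few lines. It assumes the input is a per-year flag saying whether the feature's signal was elevated. `score_recurrence_sketch` and `Verdict` are hypothetical names, not `score_recurrence()` itself. As in the recurrence-evidence row earlier, only years strictly before the current simulation year count.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Verdict:
    feature: str
    label: str              # "fluke" or "regime"
    prior_years: List[int]  # evidence: earlier years with an elevated signal
    evidence: str


def score_recurrence_sketch(feature: str, elevated_by_year: Dict[int, bool],
                            current_year: int, min_years: int = 2) -> Verdict:
    """Label a feature 'regime' if its signal was elevated in >= min_years prior years."""
    # The current year is excluded, so a spike can never count as evidence for itself.
    prior = sorted(y for y, elevated in elevated_by_year.items()
                   if elevated and y < current_year)
    label = "regime" if len(prior) >= min_years else "fluke"
    evidence = f"elevated in prior years {prior}" if prior else "no prior recurrence"
    return Verdict(feature, label, prior, evidence)
```

Under these assumptions, `days_until_christmas` stays a fluke while only 2013 counts as prior evidence and becomes a regime once both 2013 and 2014 do. This matches the "after Dec 2014" timing in the table.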
| Partner | How it is used |
|---|---|
| Arize Phoenix | Set ARIZE_API_KEY + ARIZE_SPACE_KEY; ArizeTelemetryStore logs predictions and feature signals; the read path falls back to local CSV |
| Gemini via Vertex AI | Set GOOGLE_CLOUD_PROJECT; agent sends structured evidence to Gemini for natural-language explanation; falls back to score_recurrence() if not configured |
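The credential-based fallback can be pictured as a small selection step. This is a sketch: the environment variable names come from the table, but `select_backends` and the returned labels are hypothetical. It covers only which backend predictions are logged to and which reasoner runs, not the read path.

```python
import os
from typing import Dict, Mapping, Optional


def select_backends(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Choose Arize/Gemini when credentials are present, else the offline fallbacks."""
    env = os.environ if env is None else env
    has_arize = bool(env.get("ARIZE_API_KEY")) and bool(env.get("ARIZE_SPACE_KEY"))
    has_gemini = bool(env.get("GOOGLE_CLOUD_PROJECT"))
    return {
        "log_predictions_to": "arize" if has_arize else "local_csv",
        "reasoning": "gemini" if has_gemini else "rule_based",
    }
```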
- Rossmann data ends July 2015, so only 2 complete December cycles are available. The recurrence threshold (`recurrence_count >= 2`) is therefore at the lower bound of statistical credibility.
- `sales_lag_365` echoes the prior year's holiday uplift into `pack_default` predictions, inflating them in December and exaggerating the apparent failure. This is acknowledged, not hidden.
- Feature signal strength uses `abs(Pearson r)` of raw feature values vs. the target in a sliding window. This is not the same as SHAP or model gain importance; the proxy is labeled `feature_signal_strength` to distinguish it from model-based importance.
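The `feature_signal_strength` proxy can be sketched as a rolling absolute Pearson correlation. The function names are hypothetical. Reporting 0.0 when a window has zero variance (for example, a `banner_red` flag that stays 0 throughout) is an assumption made for illustration.

```python
import math
from typing import List, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant (assumed convention)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rolling_signal_strength(feature: Sequence[float], target: Sequence[float],
                            window: int) -> List[float]:
    """abs(Pearson r) over each trailing window of `window` observations."""
    return [abs(pearson_r(feature[end - window:end], target[end - window:end]))
            for end in range(window, len(target) + 1)]
```

Taking the absolute value treats strong negative correlation as strong signal, which suits a "does this feature matter now" check. It is also why the proxy says nothing about direction or about how the model actually uses the feature.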
MIT