Route reform/distributional questions to Opus by vahid-ahmadi · Pull Request #89 · PolicyEngine/policyengine-uk-chat

vahid-ahmadi · 2026-05-29T09:10:51Z

Summary

Tunes _select_chat_model in backend/routes/chatbot.py to upgrade reform / distributional / reasoning-heavy turns from Haiku to Opus.
Plan mode always keeps the fast model (clarifying-question turns, low cognitive load) — Opus override does NOT apply there.
Length-based Sonnet fallback is preserved for non-reform questions whose input exceeds the fast-model context budget.

Why

Live test: a user asked a distributional/reform question, the stream confirmed routing to claude-haiku-4-5, and Haiku never converged on the reform API shape; it exhausted the 60-iteration budget guessing. Opus converges in 2–4 iterations on the same prompt. Opus costs ~5x Haiku per turn, but reform questions on Haiku run to the iteration cap and use ~15x more turns, so net cost should go down.

Heuristic

Case-insensitive substring/regex match on the latest user message only:

Keywords (substring, case-insensitive):

Distributional: decile, quintile, distributional, winners, losers, poverty, inequality, gini
Reform shape: reform, increase the, raise the, cut the, change the, replace, freeze, uprate, bump
Marginal/effective: marginal rate, effective rate, marginal tax, effective tax
Magnitude: percentage point (catches "points" too), 1pp

Regex for magnitudes not captured as substrings:

(?:\bby\s+\d+(?:\.\d+)?\s*%)
| (?:\bfrom\s+\d+(?:\.\d+)?\s*%\s*to\s+\d+(?:\.\d+)?\s*%)
| (?:\b\d+\s*pp\b)

This catches "by 5%", "from 20% to 25%", and "2pp".
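The keyword and regex checks described above can be sketched roughly as follows. This is a minimal illustration, not the code in backend/routes/chatbot.py: the names `REFORM_KEYWORDS`, `MAGNITUDE_RE`, and `find_reform_signal` are hypothetical, and the order in which keywords are checked is an assumption.

```python
import re
from typing import Optional

# Keywords copied from the PR description; matched as lowercase substrings.
REFORM_KEYWORDS = (
    "decile", "quintile", "distributional", "winners", "losers",
    "poverty", "inequality", "gini",
    "reform", "increase the", "raise the", "cut the", "change the",
    "replace", "freeze", "uprate", "bump",
    "marginal rate", "effective rate", "marginal tax", "effective tax",
    "percentage point", "1pp",
)

# The three alternatives from the PR description, compiled in verbose mode
# so the pattern can keep its multi-line layout.
MAGNITUDE_RE = re.compile(
    r"""
    (?:\bby\s+\d+(?:\.\d+)?\s*%)
    | (?:\bfrom\s+\d+(?:\.\d+)?\s*%\s*to\s+\d+(?:\.\d+)?\s*%)
    | (?:\b\d+\s*pp\b)
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_reform_signal(message: str) -> Optional[str]:
    """Return the first keyword or magnitude expression found, else None."""
    lowered = message.lower()
    for keyword in REFORM_KEYWORDS:
        if keyword in lowered:
            return keyword
    match = MAGNITUDE_RE.search(message)
    return match.group(0) if match else None
```

Because the keywords are plain substrings, "percentage point" also matches "percentage points", but unrelated words can trigger a match too (for example, "imagining" contains "gini").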

Wiring

chat_request.plan_mode and chat_request.charts_mode are passed as kwargs to _select_chat_model (preferred over overriding outside — keeps the routing decision in one place).
charts_mode=True also upgrades to Opus (charts usually imply distributional analysis).
plan_mode=True ALWAYS returns DEFAULT_FAST_MODEL, regardless of any other signal.
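The precedence described above could look something like the sketch below. It is an assumption-laden outline rather than the actual `_select_chat_model`: the function name, the `reform_signal` parameter, `DEFAULT_LONG_MODEL`, and the character budget are all hypothetical, and only the model identifiers and the ordering of the plan/charts rules come from this description.

```python
DEFAULT_FAST_MODEL = "claude-haiku-4-5"
DEFAULT_LONG_MODEL = "claude-sonnet-4-6"      # hypothetical constant name
DEFAULT_REASONING_MODEL = "claude-opus-4-5"
FAST_MODEL_CHAR_BUDGET = 20_000               # hypothetical threshold


def select_chat_model_sketch(
    message: str,
    *,
    reform_signal=None,
    plan_mode: bool = False,
    charts_mode: bool = False,
) -> str:
    """Pick a model for one turn; all routing rules live in this function."""
    # Plan mode wins over every other signal.
    if plan_mode:
        return DEFAULT_FAST_MODEL
    # Reform/distributional signals and charts mode upgrade to Opus.
    if reform_signal is not None or charts_mode:
        return DEFAULT_REASONING_MODEL
    # Length-based fallback for long, non-reform input.
    if len(message) > FAST_MODEL_CHAR_BUDGET:
        return DEFAULT_LONG_MODEL
    return DEFAULT_FAST_MODEL
```

Passing the two flags as keyword-only arguments keeps call sites explicit and leaves the whole routing decision in one function.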

Model constant

Added DEFAULT_REASONING_MODEL (defaults to claude-opus-4-5, env-overridable via ANTHROPIC_REASONING_MODEL) — there was no Opus constant in this file yet despite the issue's wording. Naming/version follows the same convention as claude-haiku-4-5 / claude-sonnet-4-6.
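An env-overridable constant of this kind is typically a one-line lookup. The sketch below assumes the file reads the variable with `os.environ.get`; the helper `reasoning_model` is hypothetical and exists only so the lookup can be exercised with a custom mapping.

```python
import os


def reasoning_model(env=os.environ) -> str:
    """Return the reasoning model, honouring ANTHROPIC_REASONING_MODEL if set."""
    return env.get("ANTHROPIC_REASONING_MODEL", "claude-opus-4-5")


# Module-level constant, resolved once at import time.
DEFAULT_REASONING_MODEL = reasoning_model()
```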

Logging

Each Opus upgrade emits:

[MODEL] Routed to Opus (reform signal: 'decile')
[MODEL] Routed to Opus (charts_mode=True)
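Log lines in this shape could be produced as sketched below. Whether the backend uses the `logging` module or plain `print` is not stated here, so the logger name and the helper `opus_routing_message` are assumptions for illustration.

```python
import logging

logger = logging.getLogger("chatbot.model")  # hypothetical logger name


def opus_routing_message(reform_signal=None, charts_mode=False):
    """Build the log line for an Opus upgrade, or None if no upgrade applies."""
    if reform_signal is not None:
        # !r adds the single quotes seen in the example output.
        return f"[MODEL] Routed to Opus (reform signal: {reform_signal!r})"
    if charts_mode:
        return "[MODEL] Routed to Opus (charts_mode=True)"
    return None


message = opus_routing_message(reform_signal="decile")
if message:
    logger.info(message)
```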

Cost note

Opus ~5x Haiku per turn. But on reform questions Haiku reaches max_iterations=60; Opus converges in 2–4. Net cost should drop.
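The cost argument can be checked with back-of-the-envelope arithmetic. The sketch below assumes every iteration costs the same within a model (in practice token counts vary per iteration) and uses only the figures quoted above: a 5x per-turn ratio, 60 iterations for Haiku, and 2–4 for Opus.

```python
OPUS_TO_HAIKU_COST_RATIO = 5.0   # from the cost note above
HAIKU_ITERATIONS = 60            # max_iterations cap reached by Haiku
OPUS_ITERATIONS = (2, 4)         # observed convergence range for Opus


def relative_cost(iterations: int, per_iteration_cost: float) -> float:
    """Total cost in units of one Haiku iteration."""
    return iterations * per_iteration_cost


haiku_cost = relative_cost(HAIKU_ITERATIONS, 1.0)
opus_costs = [relative_cost(n, OPUS_TO_HAIKU_COST_RATIO) for n in OPUS_ITERATIONS]
savings = [haiku_cost / cost for cost in opus_costs]

print(haiku_cost)   # 60.0
print(opus_costs)   # [10.0, 20.0]
print(savings)      # [6.0, 3.0]
```

Under these assumptions a reform question costs 3x to 6x less on Opus than a Haiku run that hits the cap, before counting that the Haiku run produced no usable answer.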

Test plan

AST validates on Python 3.13 target (one pre-existing 3.10-only f-string on line 771 is unrelated)
Smoke-tested regex + keyword matching standalone
Added TestSelectChatModel unit tests in backend/tests/test_api.py:
- decile + reform → Opus
- plain question → Haiku
- plan_mode override beats reform signal → Haiku
- charts_mode alone → Opus
Re-run the original failing user question end-to-end and confirm the stream shows "model": "claude-opus-4-5"

Closes #83

🤖 Generated with Claude Code

Haiku tends to burn the iteration budget guessing the reform-API shape on distributional / reform-shape questions; Opus converges in 2–4 iterations on the same prompt, so net cost goes down even though per-turn cost is higher.

Add a cheap pure-Python heuristic in `_select_chat_model` that looks at the latest user message for:
- distributional vocabulary (decile, quintile, distributional, winners, losers, poverty, inequality, gini)
- reform-shape verbs/nouns (reform, increase/raise/cut/change the, replace, freeze, uprate, bump)
- marginal/effective rate reasoning
- magnitude expressions (1pp, "percentage point", "by N%", "from X% to Y%") via a small compiled regex

`charts_mode=True` also upgrades (charts usually imply distributional analysis). `plan_mode=True` ALWAYS keeps the fast model: plan turns are just clarifying questions, low cognitive load.

Length-based Sonnet fallback is preserved for non-reform questions whose input exceeds the fast-model context budget.

Closes #83

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-29T09:10:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
policyengine-uk-chat	Ready	Preview, Comment	May 29, 2026 9:11am

github-actions · 2026-05-29T09:11:31Z

Beta preview is ready.

Frontend: open preview
Backend: open backend

vercel Bot deployed to Preview May 29, 2026 09:11 View deployment

vahid-ahmadi wants to merge 1 commit into main from feat/opus-on-reform-questions