
Route reform/distributional questions to Opus #89

Open
vahid-ahmadi wants to merge 1 commit into main from feat/opus-on-reform-questions

@vahid-ahmadi
Collaborator

Summary

  • Tunes _select_chat_model in backend/routes/chatbot.py to upgrade reform / distributional / reasoning-heavy turns from Haiku to Opus.
  • Plan mode always keeps the fast model (clarifying-question turns, low cognitive load) — Opus override does NOT apply there.
  • Length-based Sonnet fallback is preserved for non-reform questions whose input exceeds the fast-model context budget.

Why

Live test: a user asked a distributional/reform question, the stream confirmed routing to claude-haiku-4-5, and Haiku never converged on the reform API shape; it exhausted the 60-iteration budget guessing. Opus converges in 2–4 iterations on the same prompt. Opus costs ~5x Haiku per turn, but a reform question on Haiku runs to the iteration cap, which is 15–30x the iterations Opus needs, so net cost should go down.

Heuristic

Case-insensitive substring/regex match on the latest user message only:

Keywords (substring, case-insensitive):

  • Distributional: decile, quintile, distributional, winners, losers, poverty, inequality, gini
  • Reform shape: reform, increase the, raise the, cut the, change the, replace, freeze, uprate, bump
  • Marginal/effective: marginal rate, effective rate, marginal tax, effective tax
  • Magnitude: percentage point (catches "points" too), 1pp
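
A minimal sketch of the keyword check, assuming a plain lowercase substring scan. The keyword strings are copied from the list above; the names `REFORM_KEYWORDS` and `first_keyword_hit` are hypothetical and not taken from backend/routes/chatbot.py.

```python
from typing import Optional

# Keyword strings copied from the PR description.
# The constant name and grouping are assumptions made for illustration.
REFORM_KEYWORDS = (
    # Distributional
    "decile", "quintile", "distributional", "winners", "losers",
    "poverty", "inequality", "gini",
    # Reform shape
    "reform", "increase the", "raise the", "cut the", "change the",
    "replace", "freeze", "uprate", "bump",
    # Marginal/effective
    "marginal rate", "effective rate", "marginal tax", "effective tax",
    # Magnitude ("percentage point" is also a substring of "percentage points")
    "percentage point", "1pp",
)


def first_keyword_hit(message: str) -> Optional[str]:
    """Return the first keyword found in the message, or None if none match."""
    lowered = message.lower()
    for keyword in REFORM_KEYWORDS:
        if keyword in lowered:
            return keyword
    return None
```

Returning the matched keyword rather than a bare boolean makes it easy to include the trigger in a log line. Substring matching is deliberately loose: "reform" also fires on "reformat", which trades some false positives for simplicity.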

Regex for magnitudes not captured as substrings:

(?:\bby\s+\d+(?:\.\d+)?\s*%)
| (?:\bfrom\s+\d+(?:\.\d+)?\s*%\s*to\s+\d+(?:\.\d+)?\s*%)
| (?:\b\d+\s*pp\b)

This catches "by 5%", "from 20% to 25%", and "2pp".
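
The pattern above can be exercised standalone. This sketch assumes it is compiled with `re.VERBOSE` (the whitespace around `|` suggests so) and `re.IGNORECASE`; the names `_MAGNITUDE_RE` and `has_magnitude` are hypothetical.

```python
import re

# Pattern copied from the PR description. The flags are an assumption:
# VERBOSE lets the alternation span lines, IGNORECASE matches "By 5%".
_MAGNITUDE_RE = re.compile(
    r"""
    (?:\bby\s+\d+(?:\.\d+)?\s*%)
    | (?:\bfrom\s+\d+(?:\.\d+)?\s*%\s*to\s+\d+(?:\.\d+)?\s*%)
    | (?:\b\d+\s*pp\b)
    """,
    re.IGNORECASE | re.VERBOSE,
)


def has_magnitude(message: str) -> bool:
    """True if the message contains a 'by N%', 'from X% to Y%' or 'Npp' expression."""
    return _MAGNITUDE_RE.search(message) is not None
```

Compiling once at module level keeps the per-request cost to a single `search` call.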

Wiring

  • chat_request.plan_mode and chat_request.charts_mode are passed as kwargs to _select_chat_model. This is preferred over overriding the result in the caller because it keeps the routing decision in one place.
  • charts_mode=True also upgrades to Opus (charts usually imply distributional analysis).
  • plan_mode=True ALWAYS returns DEFAULT_FAST_MODEL, regardless of any other signal.
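
The precedence described above can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: the function name, the `has_reform_signal` parameter, the constant `LONG_INPUT_MODEL`, and the character-based `FAST_MODEL_CHAR_BUDGET` threshold are all hypothetical, and the real length check may measure something other than characters.

```python
DEFAULT_FAST_MODEL = "claude-haiku-4-5"
DEFAULT_REASONING_MODEL = "claude-opus-4-5"
LONG_INPUT_MODEL = "claude-sonnet-4-6"  # hypothetical constant name
FAST_MODEL_CHAR_BUDGET = 20_000         # hypothetical threshold


def select_chat_model_sketch(
    message: str,
    *,
    plan_mode: bool = False,
    charts_mode: bool = False,
    has_reform_signal: bool = False,
) -> str:
    """Pick a model using the precedence described in the PR."""
    # 1. Plan mode always wins: clarifying-question turns stay on the fast model.
    if plan_mode:
        return DEFAULT_FAST_MODEL
    # 2. Charts mode or a reform/distributional signal upgrades to Opus.
    if charts_mode or has_reform_signal:
        return DEFAULT_REASONING_MODEL
    # 3. Otherwise, long inputs fall back to Sonnet.
    if len(message) > FAST_MODEL_CHAR_BUDGET:
        return LONG_INPUT_MODEL
    # 4. Everything else stays on the fast model.
    return DEFAULT_FAST_MODEL
```

Checking `plan_mode` first is what makes the override unconditional: no later branch can upgrade a plan turn.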

Model constant

Added DEFAULT_REASONING_MODEL (defaults to claude-opus-4-5, env-overridable via ANTHROPIC_REASONING_MODEL) — there was no Opus constant in this file yet despite the issue's wording. Naming/version follows the same convention as claude-haiku-4-5 / claude-sonnet-4-6.
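
One common way to make such a constant env-overridable is a lookup with a default. The helper `resolve_reasoning_model` below is hypothetical and exists only so the behaviour can be shown without touching the real environment; the PR may simply assign the constant at import time.

```python
import os
from typing import Mapping


def resolve_reasoning_model(env: Mapping[str, str] = os.environ) -> str:
    """Return the reasoning model, honouring ANTHROPIC_REASONING_MODEL if set."""
    return env.get("ANTHROPIC_REASONING_MODEL", "claude-opus-4-5")


# Module-level constant, resolved once at import time.
DEFAULT_REASONING_MODEL = resolve_reasoning_model()
```

Passing the environment as a parameter keeps the lookup easy to test with a plain dict.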

Logging

Each Opus upgrade emits:

[MODEL] Routed to Opus (reform signal: 'decile')
[MODEL] Routed to Opus (charts_mode=True)
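
A small formatter that reproduces the two log shapes above. The PR does not say whether these lines are emitted with `print` or the `logging` module, so this sketch only builds the string; `format_route_log` is a hypothetical name.

```python
from typing import Optional


def format_route_log(reform_signal: Optional[str] = None) -> str:
    """Build the '[MODEL] Routed to Opus (...)' line for an upgrade.

    With a matched keyword, report it as the reform signal; otherwise
    assume the upgrade came from charts_mode.
    """
    if reform_signal is not None:
        # !r adds the quotes seen in the example output.
        return f"[MODEL] Routed to Opus (reform signal: {reform_signal!r})"
    return "[MODEL] Routed to Opus (charts_mode=True)"
```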

Cost note

Opus ~5x Haiku per turn. But on reform questions Haiku reaches max_iterations=60; Opus converges in 2–4. Net cost should drop.
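
The arithmetic behind that claim, as a rough sketch. It assumes every iteration costs the same within a model and that the ~5x ratio applies per iteration; real costs also depend on how the context grows across iterations, which this ignores. Costs are in relative units with one Haiku iteration equal to 1.

```python
HAIKU_COST_PER_ITERATION = 1.0  # relative unit
OPUS_COST_PER_ITERATION = 5.0   # ~5x Haiku, per the PR description


def relative_cost(iterations: int, cost_per_iteration: float) -> float:
    """Total relative cost of a turn that runs for the given number of iterations."""
    return iterations * cost_per_iteration


haiku_total = relative_cost(60, HAIKU_COST_PER_ITERATION)  # hits max_iterations=60
opus_best = relative_cost(2, OPUS_COST_PER_ITERATION)      # converges in 2
opus_worst = relative_cost(4, OPUS_COST_PER_ITERATION)     # converges in 4

print(f"Haiku: {haiku_total}, Opus: {opus_best} to {opus_worst}")
```

Under these assumptions Haiku costs 60 units against 10 to 20 for Opus, so the upgrade is roughly 3x to 6x cheaper on questions where Haiku would otherwise hit the cap.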

Test plan

  • AST validates on Python 3.13 target (one pre-existing 3.10-only f-string on line 771 is unrelated)
  • Smoke-tested regex + keyword matching standalone
  • Added TestSelectChatModel unit tests in backend/tests/test_api.py:
    • decile + reform → Opus
    • plain question → Haiku
    • plan_mode override beats reform signal → Haiku
    • charts_mode alone → Opus
  • Re-run the original failing user question end-to-end and confirm the stream shows "model": "claude-opus-4-5"
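
The four unit-test cases listed above might look roughly like this. The tests in backend/tests/test_api.py are not shown in the PR, so this sketch runs against `select_chat_model_stub`, a hypothetical stand-in with a deliberately tiny keyword set, rather than the real `_select_chat_model`.

```python
import unittest

FAST_MODEL = "claude-haiku-4-5"
REASONING_MODEL = "claude-opus-4-5"


def select_chat_model_stub(message, *, plan_mode=False, charts_mode=False):
    """Hypothetical stand-in for _select_chat_model, for illustration only."""
    if plan_mode:
        return FAST_MODEL
    has_signal = any(k in message.lower() for k in ("decile", "reform"))
    return REASONING_MODEL if (charts_mode or has_signal) else FAST_MODEL


class TestSelectChatModel(unittest.TestCase):
    def test_decile_and_reform_route_to_opus(self):
        model = select_chat_model_stub("Show the reform impact by decile")
        self.assertEqual(model, REASONING_MODEL)

    def test_plain_question_stays_on_haiku(self):
        model = select_chat_model_stub("What is the personal allowance?")
        self.assertEqual(model, FAST_MODEL)

    def test_plan_mode_beats_reform_signal(self):
        model = select_chat_model_stub("Reform impact by decile", plan_mode=True)
        self.assertEqual(model, FAST_MODEL)

    def test_charts_mode_alone_routes_to_opus(self):
        model = select_chat_model_stub("Plot income tax revenue", charts_mode=True)
        self.assertEqual(model, REASONING_MODEL)
```

Saved as a module, this can be run with `python -m unittest` or collected by pytest.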

Closes #83

🤖 Generated with Claude Code

Haiku tends to burn the iteration budget guessing the reform-API shape on
distributional / reform-shape questions; Opus converges in 2–4 iterations
on the same prompt, so net cost goes down even though per-turn cost is
higher.

Add a cheap pure-Python heuristic in `_select_chat_model` that looks at
the latest user message for:

- distributional vocabulary (decile, quintile, distributional, winners,
  losers, poverty, inequality, gini)
- reform-shape verbs/nouns (reform, increase/raise/cut/change the,
  replace, freeze, uprate, bump)
- marginal/effective rate reasoning
- magnitude expressions (1pp, "percentage point", "by N%",
  "from X% to Y%") via a small compiled regex

`charts_mode=True` also upgrades (charts usually imply distributional
analysis). `plan_mode=True` ALWAYS keeps the fast model — plan turns are
just clarifying questions, low cognitive load.

Length-based Sonnet fallback is preserved for non-reform questions whose
input exceeds the fast-model context budget.

Closes #83

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| policyengine-uk-chat | Ready | Preview, Comment | May 29, 2026 9:11am |


@github-actions

Beta preview is ready.

Development

Successfully merging this pull request may close these issues.

Upgrade to Opus on distributional / reform questions in _select_chat_model
