Symptom (live test 2026-05-28)
The decile-impact reform question was routed to Haiku 4.5 (visible in stream: "model": "claude-haiku-4-5"). Haiku is fast per turn but semantically weaker on API derivation: it guessed repeatedly at the reform shape and never converged within the iteration budget. Opus would likely have nailed it in 2–4 iterations.
Diagnosis
_select_chat_model (backend/routes/chatbot.py:153) routes by message length / token count. That heuristic doesn't capture cognitive difficulty — distributional simulation questions are short to ask but reasoning-heavy.
Fix
Add a keyword/intent check to upgrade to Opus when the latest user message contains any of:
- decile, distributional, winners, losers
- reform, increase the X, change the X by
- 1pp, 1 percentage point, 1%, points
- marginal rate, effective rate (paired with by/from)
- poverty, inequality, gini
Tune the list against real usage rather than guessing — start small, watch which queries the agent fumbles, expand.
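A minimal sketch of the keyword check, assuming a plain regex pass over the latest user message is acceptable. The names `_OPUS_PATTERNS` and `needs_opus` are hypothetical and do not come from chatbot.py. Matching any number before pp or %, rather than only 1, is also an assumption.

```python
import re

# Hypothetical pattern list mirroring the keywords above (not the real chatbot.py code).
_OPUS_PATTERNS = [
    r"\bdeciles?\b",
    r"\bdistributional\b",
    r"\bwinners\b",
    r"\blosers\b",
    r"\breforms?\b",
    r"\bincrease the\b",
    r"\bchange the .+? by\b",
    r"\b\d+(?:\.\d+)?\s*pp\b",        # assumption: any number before "pp", not only 1
    r"\bpercentage points?\b",
    r"\d+(?:\.\d+)?\s*%",             # assumption: any number before "%"
    r"\bpoints\b",
    # "marginal rate" / "effective rate" only count when followed by "by" or "from"
    r"\b(?:marginal|effective) rate\b.*\b(?:by|from)\b",
    r"\bpoverty\b",
    r"\binequality\b",
    r"\bgini\b",
]

# Compile once at import time so the per-request cost is a single regex search.
_OPUS_RE = re.compile("|".join(f"(?:{p})" for p in _OPUS_PATTERNS), re.IGNORECASE)


def needs_opus(message: str) -> bool:
    """Return True when the message contains any upgrade keyword."""
    return _OPUS_RE.search(message) is not None
```

The check is a single compiled-regex search, so it adds no LLM call and negligible latency. Broad terms such as "points" and "reform" will likely over-trigger and are the first candidates to prune during tuning.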
Constraints
- Don't change pricing visibility (the cost UI already shows per-message £).
- Cache hits still preferred where possible — keep the routing decision cheap (no extra LLM call for routing).
- Should compose with Plan mode and Charts mode (no override conflict).
Open questions
- Should charts_mode=True also bias toward Opus? Charts often imply distributional analysis. Probably yes.
- Should Plan mode always upgrade? Plan turns generate clarifying questions, low cognitive load — probably stay on Haiku.
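If the tentative answers above hold, the precedence could look like the sketch below. `Tier`, `choose_tier`, and the `keyword_hit` and `plan_mode` parameters are hypothetical names, and the final fallback stands in for the existing length-based heuristic, which is not reproduced here.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"
    OPUS = "opus"


def choose_tier(keyword_hit: bool, plan_mode: bool = False, charts_mode: bool = False) -> Tier:
    """Resolve the model tier from the keyword check and the two mode flags."""
    # Assumption: Plan turns only generate clarifying questions, so they stay cheap
    # even when the message contains upgrade keywords.
    if plan_mode:
        return Tier.HAIKU
    # Assumption: Charts mode biases toward Opus, same as a keyword hit.
    if keyword_hit or charts_mode:
        return Tier.OPUS
    # Placeholder for the existing length / token-count heuristic.
    return Tier.HAIKU
```

Checking plan_mode first gives a single, explicit precedence order, which is one way to avoid override conflicts between the modes. Whether Plan should win over Charts when both are set is still an open call.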