Add reform recipes to system-prompt library reference #85

@vahid-ahmadi

Symptom (live test 2026-05-28)

Asked the agent to compute the decile impact of "increasing the income tax basic rate by 1 percentage point". The agent called run_python more than 15 times trying to construct the reform. Each attempt produced nonsensical results (revenue falling when the rate goes up); the run never converged and never produced a chart. Several minutes of compute were wasted on guessing the API.

The agent's own commentary during the run:

"There's clearly a bug in how I'm specifying the tax brackets - they're causing revenue to DROP, not rise."

There's no bug. The agent guessed the wrong shape for Parameters(income_tax={...}): most likely a bracket override that replaces the whole schedule (wiping the other bands) instead of an additive change to a single rate.
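To make the suspected failure mode concrete, here is a toy sketch in plain Python. It does not use the real library: `income_tax`, `BASELINE` and the list-of-tuples schedule are hypothetical stand-ins, the four incomes are made up, and the personal allowance taper is ignored. It only shows why passing a single bracket as a full replacement can make revenue fall even though the basic rate went up.

```python
# Toy illustration only: NOT the real Parameters(...) API.
# Each entry is (lower threshold in GBP, marginal rate above that threshold).
BASELINE = [(12_570, 0.20), (50_270, 0.40), (125_140, 0.45)]


def income_tax(income, brackets):
    """Tax due under a marginal schedule of (lower threshold, rate) pairs."""
    ordered = sorted(brackets)
    tax = 0.0
    for i, (lower, rate) in enumerate(ordered):
        # The band ends where the next one starts; the top band is open-ended.
        upper = ordered[i + 1][0] if i + 1 < len(ordered) else float("inf")
        if income > lower:
            tax += (min(income, upper) - lower) * rate
    return tax


def revenue(incomes, brackets):
    """Total tax raised from a list of incomes."""
    return sum(income_tax(y, brackets) for y in incomes)


incomes = [20_000, 40_000, 80_000, 200_000]  # made-up sample, not survey data

# Wrong shape: only the changed bracket is supplied, so it replaces the
# whole schedule and the 40% and 45% bands disappear.
replaced = [(12_570, 0.21)]

# Intended shape: change the basic rate and keep every other band.
partial = [(t, 0.21 if t == 12_570 else r) for t, r in BASELINE]

baseline_revenue = revenue(incomes, BASELINE)
replaced_revenue = revenue(incomes, replaced)
partial_revenue = revenue(incomes, partial)

print(f"baseline: {baseline_revenue:,.0f}")
print(f"replaced: {replaced_revenue:,.0f}  (falls: higher bands were wiped)")
print(f"partial:  {partial_revenue:,.0f}  (rises: only the basic rate changed)")
```

In this toy model the replaced schedule raises less than baseline while the partial change raises more, which matches the "revenue drops when the rate goes up" symptom.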

Diagnosis

The cached library reference passed via _build_system_blocks (backend/routes/chatbot.py:170) apparently doesn't include concrete reform examples. Without them, the model has to re-derive the API on every cold session and often fails.

Fix

Add 3–5 canonical worked examples to the library reference (whatever string is being cached) — exact Python the model can copy:

  • Basic rate +1pp (the failure case from today)
  • Personal allowance increase (e.g. £12,570 → £15,000)
  • NI primary threshold change
  • Child benefit uprating
  • Marriage allowance toggle

Each recipe should show the precise Parameters(income_tax=...) / Parameters(national_insurance=...) etc. shape that produces a partial reform (not a full schedule replacement).
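As a rough illustration of what "partial reform" means here, the sketch below merges an override into a baseline so that only the named keys change. Everything in it is an assumption for illustration: `apply_partial`, the nested-dict layout and the key names are hypothetical and say nothing about how the real Parameters object behaves. The recipes themselves would need to use whatever shape the library actually accepts.

```python
from copy import deepcopy


def apply_partial(baseline: dict, override: dict) -> dict:
    """Return a copy of baseline with only the keys named in override changed."""
    merged = deepcopy(baseline)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling keys inside a nested section are preserved.
            merged[key] = apply_partial(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical baseline layout; key names are illustrative only.
baseline = {
    "income_tax": {
        "personal_allowance": 12_570,
        "basic_rate": 0.20,
        "higher_rate": 0.40,
    }
}

# Basic rate +1pp as a partial change: the other fields stay untouched.
reform = apply_partial(baseline, {"income_tax": {"basic_rate": 0.21}})
print(reform)
```

A full replacement would instead be `{"income_tax": {"basic_rate": 0.21}}` on its own, which drops the personal allowance and higher rate; that is the behaviour the recipes need to steer the model away from.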

~50 lines of system-prompt content. One-time cost. Cached.

Constraints

  • Don't add new tools.
  • Don't change _build_system_blocks or run_python sandbox.
  • Examples should be copy-pasteable Python.
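One cheap way to guard the "copy-pasteable" constraint is to check that every recipe at least parses as Python before it goes into the cached reference. The sketch below is a suggestion only: `RECIPES` and `syntax_errors` are hypothetical names, and the first recipe string is the placeholder shape quoted in this issue rather than a working reform. It checks syntax only and does not verify that a recipe runs or produces sensible results.

```python
import ast

# Hypothetical recipe snippets. The first line is quoted from this issue with
# {...} left as a placeholder; the second is deliberately broken.
RECIPES = {
    "basic_rate_plus_1pp": "reform = Parameters(income_tax={...})",
    "broken_example": "reform = Parameters(income_tax={",
}


def syntax_errors(recipes):
    """Map recipe name to error message for every recipe that fails to parse."""
    errors = {}
    for name, source in recipes.items():
        try:
            ast.parse(source)
        except SyntaxError as exc:
            errors[name] = exc.msg
    return errors


errors = syntax_errors(RECIPES)
for name, message in errors.items():
    print(f"{name}: {message}")
```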
