Add canonical reform recipes to cached library reference #88

Open
vahid-ahmadi wants to merge 1 commit into main from feat/system-prompt-reform-recipes
@vahid-ahmadi
Collaborator

Summary

A recent live test failed badly: a user asked for the distributional effect of raising the income tax basic rate by 1pp, and the agent called run_python 15+ times trying to construct the reform, with every guess producing wrong results (revenue dropping when the rate went up). Root cause: Parameters(income_tax={"uk_brackets": [...]}) REPLACES the full schedule, so passing only the basic-rate band silently zeroed out the higher and additional rates. The run never converged (262s wasted).
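To make the failure mode concrete, here is a minimal toy sketch of replace semantics. It does not use the PolicyEngine library: `tax_due`, `revenue`, the band thresholds, and the three incomes are hypothetical stand-ins chosen for illustration, and the toy assumes the last supplied band extends indefinitely. It only shows why a schedule containing just the raised basic-rate band can lower revenue even though the rate went up.

```python
# Toy model only: NOT the PolicyEngine API. All names and numbers are illustrative.

def tax_due(taxable_income, brackets):
    """Marginal-rate tax for (threshold, rate) pairs sorted by threshold."""
    total = 0.0
    for i, (threshold, rate) in enumerate(brackets):
        # The band ends where the next one starts; the last band is open-ended.
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        if taxable_income > threshold:
            total += (min(taxable_income, upper) - threshold) * rate
    return total


def revenue(incomes, brackets):
    """Total tax raised across a list of taxable incomes."""
    return sum(tax_due(income, brackets) for income in incomes)


# Assumed three-band schedule (basic, higher, additional).
BASELINE = [(0, 0.20), (37_700, 0.40), (125_140, 0.45)]
# Correct reform: restate every band, changing only the basic rate.
FULL_REFORM = [(0, 0.21), (37_700, 0.40), (125_140, 0.45)]
# Failure case: only the basic-rate band is supplied, so the other bands vanish.
PARTIAL_REFORM = [(0, 0.21)]

INCOMES = [20_000, 60_000, 200_000]

baseline_revenue = revenue(INCOMES, BASELINE)
full_revenue = revenue(INCOMES, FULL_REFORM)
partial_revenue = revenue(INCOMES, PARTIAL_REFORM)

print("full reform raises revenue:", full_revenue > baseline_revenue)
print("partial reform lowers revenue:", partial_revenue < baseline_revenue)
```

In this toy, the partial schedule loses the higher and additional bands entirely, which is the same shape of error the agent kept hitting.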

This PR adds a "Reform recipes" section to the cached library reference produced by scripts/build_reference.py, so the agent sees five copy-pasteable reform constructions inside the prompt-cached reference document.

Recipes

  1. Basic rate +1pp via the full uk_brackets list (the actual failure case) — VERIFIED against backend/tests/test_agent_tools.py::test_uk_brackets_reform.
  2. Personal allowance £12,570 -> £15,000 — VERIFIED against backend/tests/test_agent_tools.py::test_valid_reform.
  3. NI primary threshold +£1,000 — unverified; uses conventional naming, recipe carries an inline "verify against the JSON schema" note.
  4. Child benefit +10% uprating — unverified; uses conventional amount_for_first_child / amount_for_additional_child, recipe carries an inline verify note.
  5. Marriage allowance toggle — unverified; placed under income_tax, recipe carries an inline verify note.

The Parameters JSON schema (already dumped in this reference) is the authoritative fallback when a field name is rejected — the recipes explicitly point the agent at it.

Where it lands

  • File: backend/scripts/build_reference.py — new REFORM_RECIPES constant inserted into render() between Public API and Parameters JSON schema sections.
  • Output: backend/reference.md (generated at deploy time) — picked up by _build_system_blocks in backend/routes/chatbot.py inside the existing cache_control={"type":"ephemeral"} block. One-time cache invalidation on merge, then cached again on subsequent requests.
  • Section is ~80 lines of system-prompt content.
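The placement described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the function bodies, the argument names, and the placeholder recipe text are hypothetical, and the system-block layout is assumed to follow the Anthropic Messages API text-block shape with a `cache_control` entry.

```python
# Hypothetical sketch; the real render() and _build_system_blocks may differ.

REFORM_RECIPES = "## Reform recipes\n\n(recipe text goes here)"


def render(public_api_section, schema_section):
    """Assemble the reference with recipes between Public API and the schema dump."""
    return "\n\n".join([public_api_section, REFORM_RECIPES, schema_section])


def build_system_blocks(instructions, reference_doc):
    """Return system blocks, with the reference inside a single cached block."""
    return [
        {"type": "text", "text": instructions},
        {
            "type": "text",
            "text": reference_doc,
            # One breakpoint covers the whole reference, recipes included.
            "cache_control": {"type": "ephemeral"},
        },
    ]


reference = render("## Public API", "## Parameters JSON schema")
blocks = build_system_blocks("You are a policy analysis agent.", reference)
print(len(blocks), "system blocks;", "recipes cached:", "Reform recipes" in blocks[-1]["text"])
```

Because the recipes are part of the same text as the rest of the reference, changing them alters the cached prefix once, and later requests can reuse the new cache entry.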

Constraints respected

  • No changes to _build_system_blocks signature or logic.
  • No new tools, no new dependencies.
  • No frontend touched.
  • Recipes live inside the cached blob (no extra cache breakpoint).

Test plan

  • CI is green (the syntax check python -c "import ast; ast.parse(open('backend/scripts/build_reference.py').read())" already passes locally).
  • After merge, run scripts/build_reference.py in the deploy image and confirm reference.md contains the "Reform recipes" section.
  • Live-test the original prompt ("distributional effect of raising income tax basic rate by 1pp") and confirm the agent converges in one tool call.
  • Verify recipes 3–5 against the actual pe.Parameters.model_json_schema() and, if any field names are wrong, follow up with a fix-up PR.
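The last check could be scripted along these lines. This is a sketch under assumptions: `field_exists` is a hypothetical helper, and `TOY_SCHEMA` is a hand-written stand-in for the real output of pe.Parameters.model_json_schema(), whose actual layout is not shown in this PR. The helper assumes the usual Pydantic v2 schema conventions (`properties`, `$defs`, `$ref`, and `anyOf` for optional fields).

```python
# Hypothetical helper; TOY_SCHEMA is a stand-in, not the real Parameters schema.

def _resolve(schema, node):
    """Follow $ref and anyOf wrappers until a node with 'properties' is found."""
    if "$ref" in node:
        name = node["$ref"].rsplit("/", 1)[-1]
        return _resolve(schema, schema.get("$defs", {}).get(name, {}))
    for option in node.get("anyOf", []):
        resolved = _resolve(schema, option)
        if "properties" in resolved:
            return resolved
    return node


def field_exists(schema, path):
    """Return True if the nested field path exists in a JSON schema dict."""
    node = schema
    for name in path:
        properties = _resolve(schema, node).get("properties", {})
        if name not in properties:
            return False
        node = properties[name]
    return True


TOY_SCHEMA = {
    "properties": {
        "income_tax": {"anyOf": [{"$ref": "#/$defs/IncomeTax"}, {"type": "null"}]},
    },
    "$defs": {
        "IncomeTax": {
            "properties": {
                "personal_allowance": {"type": "number"},
                "uk_brackets": {"type": "array"},
            }
        }
    },
}

# In the deploy image, TOY_SCHEMA would be swapped for the real schema dict.
for path in (["income_tax", "personal_allowance"], ["income_tax", "made_up_field"]):
    print(".".join(path), "->", field_exists(TOY_SCHEMA, path))
```

Running each field path used by recipes 3–5 through such a check would flag a wrong name before the agent ever sees it.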

Closes #85

🤖 Generated with Claude Code

In a recent live session a user asked for the distributional effect of
raising the income tax basic rate by 1pp. The agent ran `run_python` 15+
times trying to construct the reform, every guess producing wrong results
(revenue dropping when the rate went up) because
`Parameters(income_tax={"uk_brackets": [...]})` REPLACES the full
schedule — passing a single-bracket list zeroed out higher and additional
rates. The run never converged (262s of wasted compute).

Add a "Reform recipes" section to the cached library reference produced
by scripts/build_reference.py. Each recipe is exact, copy-pasteable
Python the agent can adapt verbatim:

  1. Basic rate +1pp via the full `uk_brackets` list (the failure case) — VERIFIED
  2. Personal allowance change — VERIFIED
  3. NI primary threshold change — unverified, marked for follow-up
  4. Child benefit uprating — unverified, marked for follow-up
  5. Marriage allowance toggle — unverified, marked for follow-up

Recipes 1–2 are verified against backend/tests/test_agent_tools.py
(`uk_brackets` and `personal_allowance` are exercised there). Recipes
3–5 use the conventional PolicyEngine UK naming from
`_build_compiled_policy` in backend/agent_tools.py and reference the
Parameters JSON schema dump (which lives in the same reference file) as
the authoritative source if a field name is rejected.

The section is inserted between Public API and Parameters JSON schema in
`render()` so it lands inside the existing cached `REFERENCE_DOC` block
in `_build_system_blocks` — one-time cache invalidation on this merge,
then cached again on every subsequent request.

Closes #85

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 29, 2026

The latest updates on your projects.

Project Deployment Actions Updated (UTC)
policyengine-uk-chat Ready Ready Preview, Comment May 29, 2026 9:10am

@github-actions

Beta preview is ready.

Development

Successfully merging this pull request may close these issues.

Add reform recipes to system-prompt library reference