WIP: US Python backend (latency spike, do not merge) #54
- USPolicyEnginePythonBackend in model_backends.py: mirrors the UK Python backend, swaps to policyengine_us, capabilities() lists US variables and parameter roots, and the prompt notes the state_code requirement.
- Add policyengine_us to backend/requirements.txt.
- Frontend label map: UK (Compiled) / UK (Python) / US (Python).
Known gaps deferred:
- reference.md is still UK-compiled-only, so Claude sees UK API docs on the US backend. Acceptable for the initial latency smoke test.
- The system prompt still says "British English", and the title-generation route still calls itself "a UK tax and benefit policy assistant".
- The Modal region is "eu", so US response latency will reflect the transatlantic hop, not a US-optimised deploy.
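A minimal sketch of the backend shape described above. Everything in it is hypothetical: the real class and its base in backend/model_backends.py are not shown in this PR, so the class name USPythonBackendSketch, the "us_python" key, prompt_notes(), households_missing_state_code() and the return shapes are assumptions. The variable names (income_tax, state_income_tax, household_net_income) and parameter roots (gov.irs, gov.states) are examples of policyengine_us names, not the list the PR actually ships. The state_code check assumes PolicyEngine's situation-dict layout, where each household carries state_code keyed by period.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL sketch: the real backend class, its base class and method names
# are not shown in the PR. Variable/parameter names are illustrative examples.


@dataclass
class USPythonBackendSketch:
    name: str = "us_python"          # assumed key, by analogy with uk_python
    label: str = "US (Python)"       # matches the frontend label map
    package: str = "policyengine_us"
    variables: list[str] = field(
        default_factory=lambda: ["income_tax", "state_income_tax", "household_net_income"]
    )
    parameter_roots: list[str] = field(default_factory=lambda: ["gov.irs", "gov.states"])

    def capabilities(self) -> dict:
        """Advertise which variables and parameter subtrees Claude may use."""
        return {
            "package": self.package,
            "variables": list(self.variables),
            "parameter_roots": list(self.parameter_roots),
        }

    def prompt_notes(self) -> str:
        """Extra system-prompt text: US results depend on the household's state."""
        return (
            "Every household must set state_code (e.g. 'CA', 'TX'); "
            "state taxes and many benefits depend on it."
        )

    @staticmethod
    def households_missing_state_code(situation: dict, period: str) -> list[str]:
        """Names of households in a situation dict with no state_code for `period`."""
        return [
            name
            for name, household in situation.get("households", {}).items()
            if period not in household.get("state_code", {})
        ]
```

Validating state_code before a simulation runs would let the backend return a clear error instead of silently computing with a default state.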
Beta preview is ready.
Smoke test findings
Tested on preview: https://policyengine-uk-chat-git-feat-us-backend-policy-engine.vercel.app
What works ✅
Verification (tight prompt):
Returned $1,616.00, which matches the hand calc exactly (standard deduction $14,600 → taxable income $15,400 → 10% × $11,600 + 12% × $3,800 = $1,160 + $456).
What doesn't work yet
Closing: this was a latency-test spike to measure how long US simulations take in the chat interface. It was confirmed working end to end (the $1,616 result on the tight prompt matched the hand calc), but reference.md is UK-only, so open-ended US prompts cause tool-call wandering. Findings are preserved in the PR comments.
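The hand calc behind the $1,616 check can be reproduced with a short script. This is a sketch under stated assumptions: the figures match the 2024 federal parameters for a single filer (standard deduction $14,600; 10% bracket up to $11,600, 12% up to $47,150), but the PR does not state the tax year or filing status, and the $30,000 gross income is inferred from taxable income plus the deduction. The function name hand_calc_income_tax is hypothetical, and it ignores credits and other adjustments, so it only checks simple wage-only cases.

```python
from decimal import Decimal

# ASSUMPTION: 2024 federal parameters for a single filer; the PR does not
# state the tax year or filing status used in the tight prompt.
STANDARD_DEDUCTION = Decimal("14600")

# (upper bound of bracket, marginal rate); None means "no upper bound".
BRACKETS = [
    (Decimal("11600"), Decimal("0.10")),
    (Decimal("47150"), Decimal("0.12")),
    (Decimal("100525"), Decimal("0.22")),
    (Decimal("191950"), Decimal("0.24")),
    (Decimal("243725"), Decimal("0.32")),
    (Decimal("609350"), Decimal("0.35")),
    (None, Decimal("0.37")),
]


def hand_calc_income_tax(gross_income: Decimal) -> Decimal:
    """Apply the standard deduction, then tax each slice at its bracket rate.

    Ignores credits and all other adjustments (hypothetical helper).
    """
    taxable = max(gross_income - STANDARD_DEDUCTION, Decimal("0"))
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in BRACKETS:
        if taxable <= lower:
            break  # no income left in this or any higher bracket
        top = taxable if upper is None else min(taxable, upper)
        tax += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return tax.quantize(Decimal("0.01"))
```

For example, hand_calc_income_tax(Decimal("30000")) gives Decimal('1616.00'), the figure the preview returned. Decimal keeps the cents exact, so the comparison with the model's answer is not muddied by float rounding.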
Purpose
Spike to measure how long US simulations take in the chat interface. Not for merge: it exists to deploy a preview where we can run timed prompts against the US backend.
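A minimal timing-harness sketch for those timed prompts. The PR does not say how timings were taken, so time_prompt and the send_prompt callable are hypothetical: send_prompt is assumed to submit one prompt to the preview and block until the full chat response (including any tool calls) has finished.

```python
import statistics
import time
from typing import Callable


def time_prompt(send_prompt: Callable[[str], object], prompt: str, runs: int = 3) -> dict:
    """Run `prompt` `runs` times through `send_prompt` and summarise wall-clock seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        send_prompt(prompt)  # HYPOTHETICAL: blocks until the whole response returns
        durations.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }
```

Sending the same tight prompt through each backend keeps the comparison like-for-like, and the median is less sensitive than the mean to a single slow run.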
Branches off PR #51 (feat/model-backend-selector), so it already includes the backend-selector and scenario_context plumbing.
What's in the diff
- USPolicyEnginePythonBackend in backend/model_backends.py (mirrors the UK Python backend)
- policyengine_us added to backend/requirements.txt
Known gaps deferred for the latency test
- reference.md is UK-compiled-only. Claude sees UK API docs when the US backend is selected, so it will write some wrong code on the first attempt; that's part of what we want to measure (recovery latency).
- Modal region is eu, so US response latency will reflect the transatlantic hop, not a US-optimised deploy.
Latency numbers to capture
uk_python
What this PR does NOT do