fix: retry transient Bedrock-fallback billing 403s instead of aborting #708
Draft
posthog[bot] wants to merge 1 commit into
The gateway routes failed Anthropic calls to AWS Bedrock (the wizard forces `x-posthog-use-bedrock-fallback: true`). Bedrock can answer with a 403 `INVALID_PAYMENT_INSTRUMENT` / AWS Marketplace subscription error: a transient, PostHog-side billing condition that clears on its own and whose own message advises retrying after ~2 minutes.

Previously `output-signals.ts` only special-cased 401/429, so this 403 fell through to a generic `API_ERROR` and the linear runner escalated it to a fatal "report this to wizard@posthog.com" abort, breaking the whole integration flow.

Classify the billing 403 as a new `PROVISIONING_ERROR`, retry it with backoff in the linear runner, and if retries are exhausted surface a friendly "temporarily unavailable, try again shortly" message instead of the terminal abort.

Generated-By: PostHog Code
Task-Id: 089e04c7-5d59-41f2-a9ab-8119a97a1c3a
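To make the classification order concrete, here is a minimal sketch of how a billing 403 could be separated from a plain 403 before falling back to the generic error. It is not the PR's code: `classifyError`, `looksLikeProvisioningError`, the `(status, message)` inputs, and the `AUTH_ERROR` / `RATE_LIMIT` names are assumptions for illustration; only `PROVISIONING_ERROR`, `API_ERROR`, and the two marker strings come from the description.

```typescript
// Illustrative sketch only; the real detection lives in output-signals.ts
// and agent-interface.ts and may work on different inputs.
type SketchErrorType =
  | 'AUTH_ERROR' // assumed name for the 401 case
  | 'RATE_LIMIT' // assumed name for the 429 case
  | 'PROVISIONING_ERROR'
  | 'API_ERROR';

// Marker strings named in the PR description.
const BILLING_MARKERS = ['INVALID_PAYMENT_INSTRUMENT', 'AWS Marketplace'];

// True only when BOTH conditions hold: status 403 and a billing marker.
// A plain 403, or a billing keyword on a non-403, must not match.
function looksLikeProvisioningError(status: number, message: string): boolean {
  if (status !== 403) return false;
  const haystack = message.toLowerCase();
  return BILLING_MARKERS.some((marker) => haystack.includes(marker.toLowerCase()));
}

// The specific checks run before the generic API_ERROR fallback.
function classifyError(status: number, message: string): SketchErrorType {
  if (status === 401) return 'AUTH_ERROR';
  if (status === 429) return 'RATE_LIMIT';
  if (looksLikeProvisioningError(status, message)) return 'PROVISIONING_ERROR';
  return 'API_ERROR';
}
```

Requiring both the status code and the marker keeps unrelated 403s on the existing `API_ERROR` path, which matches the negative cases listed in the test plan.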
Problem
Users running `npx @posthog/wizard` could have their entire integration flow crash with an unrecoverable error when PostHog's LLM gateway falls back to AWS Bedrock and Bedrock returns a billing-related 403.

The wizard forces `x-posthog-use-bedrock-fallback: true`, so a failed Anthropic call is re-routed to Bedrock. Bedrock can answer with a 403 `INVALID_PAYMENT_INSTRUMENT` / AWS Marketplace subscription error: a transient, PostHog-side billing condition that clears on its own and whose own message advises retrying after ~2 minutes. But `output-signals.ts` only special-cased 401 (auth) and 429 (rate limit), so the 403 fell through to a generic `API_ERROR`, and the linear runner escalated it to a fatal `WizardError` + abort with the terminal "Please report this to wizard@posthog.com" message, terminating the run with no retry.

Changes
- `AgentErrorType.PROVISIONING_ERROR` for transient provisioning/billing failures from the Bedrock fallback.
- `AgentOutputSignals.hasProvisioningError()` detects the 403 + `INVALID_PAYMENT_INSTRUMENT` / `AWS Marketplace` signature; `agent-interface.ts` classifies it ahead of the generic `API_ERROR`.
- `runWithProvisioningRetry` helper retries the agent run with backoff (~2 min, twice, per the upstream guidance), wired into the linear runner.

Test plan
Covered by CI. Added unit tests for: the new signal detection (and negatives: a plain 403 / billing keywords on a non-403 must not match), the retry helper (no-retry on success, retry-then-succeed, give-up-after-backoff, no-retry on other error types), and `runAgent` classifying the billing 403 as `PROVISIONING_ERROR`. Full suite (1043 tests) + lint green.

LLM context
Authored by PostHog Code (Claude) from an error-tracking inbox report.
Created with PostHog Code from an inbox report.
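For readers who want a picture of the retry behaviour described under Changes, the following is a rough sketch of a retry wrapper that re-runs only on `PROVISIONING_ERROR`. It is an assumption-laden illustration, not the PR's `runWithProvisioningRetry`: the `RunResult` shape, the `retryOnProvisioningError` name, the option names, and the injectable `sleep` are all hypothetical.

```typescript
// Hypothetical result shape; the real runner's return type may differ.
type RunResult = { ok: true } | { ok: false; errorType: string };

interface RetryOptions {
  maxRetries: number; // retries after the first attempt (the PR mentions two)
  delayMs: number; // wait between attempts (the PR mentions ~2 minutes)
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `run` once, then retries only while the result is a PROVISIONING_ERROR
// and the retry budget is not exhausted. Any other outcome is returned as-is.
async function retryOnProvisioningError(
  run: () => Promise<RunResult>,
  { maxRetries, delayMs, sleep = defaultSleep }: RetryOptions,
): Promise<RunResult> {
  let result = await run();
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    if (result.ok || result.errorType !== 'PROVISIONING_ERROR') return result;
    await sleep(delayMs);
    result = await run();
  }
  // Either the last retry succeeded or the budget ran out; the caller decides
  // how to surface a remaining PROVISIONING_ERROR to the user.
  return result;
}
```

Under these assumptions, a caller would pass `{ maxRetries: 2, delayMs: 120_000 }` to approximate the "~2 min, twice" policy, and show the "temporarily unavailable, try again shortly" message if the returned result is still a `PROVISIONING_ERROR`.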