fix: retry transient Bedrock-fallback billing 403s instead of aborting #708

Draft
posthog[bot] wants to merge 1 commit into main from posthog-code/retry-bedrock-billing-403

posthog[bot] commented Jun 22, 2026

Problem

Users running npx @posthog/wizard could have their entire integration flow crash with an unrecoverable error when PostHog's LLM gateway falls back to AWS Bedrock and Bedrock returns a billing-related 403.

The wizard forces x-posthog-use-bedrock-fallback: true, so a failed Anthropic call is re-routed to Bedrock. Bedrock can answer with a 403 INVALID_PAYMENT_INSTRUMENT / AWS Marketplace subscription error. This is a transient, PostHog-side billing condition that clears on its own, and the error message itself advises retrying after ~2 minutes. However, output-signals.ts only special-cased 401 (auth) and 429 (rate limit), so the 403 fell through to a generic API_ERROR. The linear runner then escalated it to a fatal WizardError and aborted with the terminal "Please report this to wizard@posthog.com" message, ending the run without a retry.

Changes

  • New AgentErrorType.PROVISIONING_ERROR for transient provisioning/billing failures from the Bedrock fallback.
  • AgentOutputSignals.hasProvisioningError() detects the 403 + INVALID_PAYMENT_INSTRUMENT / AWS Marketplace signature; agent-interface.ts classifies it ahead of the generic API_ERROR.
  • New runWithProvisioningRetry helper retries the agent run with backoff (~2 min, twice, per the upstream guidance), wired into the linear runner.
  • If retries are exhausted, the runner surfaces a friendly "model service is temporarily unavailable, try again shortly" message instead of the "report this" abort.
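
For reviewers, the detection and classification order described above could look roughly like the sketch below. This is a minimal illustration under stated assumptions, not the code in this PR: only PROVISIONING_ERROR, API_ERROR, and hasProvisioningError are names taken from the description, while AUTH_ERROR, RATE_LIMIT, classifyError, the string-based input, and the exact marker patterns are hypothetical.

```typescript
// Hypothetical sketch: the enum members other than PROVISIONING_ERROR and
// API_ERROR, and the matching rules below, are assumptions for illustration.
enum AgentErrorType {
  AUTH_ERROR = 'AUTH_ERROR',
  RATE_LIMIT = 'RATE_LIMIT',
  PROVISIONING_ERROR = 'PROVISIONING_ERROR',
  API_ERROR = 'API_ERROR',
}

// Assumed billing markers; the real signature strings may differ.
const BILLING_MARKERS: RegExp[] = [
  /INVALID_PAYMENT_INSTRUMENT/i,
  /AWS Marketplace/i,
];

// True only when the output carries BOTH a 403 status and a billing marker,
// so a plain 403 or billing keywords on a non-403 do not match.
function hasProvisioningError(output: string): boolean {
  const has403 = /\b403\b/.test(output);
  return has403 && BILLING_MARKERS.some((re) => re.test(output));
}

// Hypothetical classifier showing the ordering: the billing 403 is checked
// before the generic API_ERROR fallback.
function classifyError(output: string): AgentErrorType {
  if (/\b401\b/.test(output)) return AgentErrorType.AUTH_ERROR;
  if (/\b429\b/.test(output)) return AgentErrorType.RATE_LIMIT;
  if (hasProvisioningError(output)) return AgentErrorType.PROVISIONING_ERROR;
  return AgentErrorType.API_ERROR;
}
```

Requiring both the status code and a billing marker keeps unrelated 403s on the existing API_ERROR path, which matches the negative cases listed in the test plan.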

Test plan

Covered by CI. Added unit tests for: the new signal detection (and negatives — a plain 403 / billing keywords on a non-403 must not match), the retry helper (no-retry on success, retry-then-succeed, give-up-after-backoff, no-retry on other error types), and runAgent classifying the billing 403 as PROVISIONING_ERROR. Full suite (1043 tests) + lint green.
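
The retry cases listed in the test plan can be pictured with a sketch like the one below. It is an assumption-laden illustration, not the helper added in this PR: the RunResult shape, the option names (maxRetries, delayMs, sleep), and the string error tag are all hypothetical; only the name runWithProvisioningRetry and the "about 2 minutes, twice" policy come from the description.

```typescript
// Hypothetical result shape: an absent `error` means the run succeeded.
type RunResult = { error?: string };

interface RetryOptions {
  maxRetries: number; // retries allowed after the first attempt
  delayMs: number; // wait between attempts
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Sketch of a provisioning-retry wrapper (assumed signature).
async function runWithProvisioningRetry(
  run: () => Promise<RunResult>,
  { maxRetries, delayMs, sleep = defaultSleep }: RetryOptions,
): Promise<RunResult> {
  let result = await run();
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // Only the transient provisioning error is retried; success and any
    // other error type are returned to the caller unchanged.
    if (result.error !== 'PROVISIONING_ERROR') return result;
    await sleep(delayMs);
    result = await run();
  }
  // Retries exhausted (or maxRetries was 0): return the last result so the
  // caller can decide which message to show.
  return result;
}
```

Under these assumptions, the policy from the description would be expressed as `runWithProvisioningRetry(runAgent, { maxRetries: 2, delayMs: 2 * 60 * 1000 })`, with the caller showing the "temporarily unavailable" message if the returned result still carries PROVISIONING_ERROR.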

LLM context

Authored by PostHog Code (Claude) from an error-tracking inbox report.


Created with PostHog Code from an inbox report.

The gateway routes failed Anthropic calls to AWS Bedrock (the wizard forces
`x-posthog-use-bedrock-fallback: true`). Bedrock can answer with a 403
`INVALID_PAYMENT_INSTRUMENT` / AWS Marketplace subscription error — a transient,
PostHog-side billing condition that clears on its own and whose own message
advises retrying after ~2 minutes. Previously `output-signals.ts` only
special-cased 401/429, so this 403 fell through to a generic API_ERROR and the
linear runner escalated it to a fatal "report this to wizard@posthog.com" abort,
breaking the whole integration flow.

Classify the billing 403 as a new `PROVISIONING_ERROR`, retry it with backoff in
the linear runner, and if retries are exhausted surface a friendly
"temporarily unavailable, try again shortly" message instead of the terminal
abort.

Generated-By: PostHog Code
Task-Id: 089e04c7-5d59-41f2-a9ab-8119a97a1c3a
@github-actions

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

  • /wizard-ci all

Test all apps in a directory:

  • /wizard-ci basic-integration
  • /wizard-ci error-tracking-upload-source-maps
  • /wizard-ci misc
  • /wizard-ci revenue

Test an individual app:

  • /wizard-ci basic-integration/android
  • /wizard-ci basic-integration/angular
  • /wizard-ci basic-integration/astro
  • /wizard-ci basic-integration/django
  • /wizard-ci basic-integration/fastapi
  • /wizard-ci basic-integration/flask
  • /wizard-ci basic-integration/javascript-node
  • /wizard-ci basic-integration/javascript-web
  • /wizard-ci basic-integration/laravel
  • /wizard-ci basic-integration/next-js
  • /wizard-ci basic-integration/nuxt
  • /wizard-ci basic-integration/python
  • /wizard-ci basic-integration/rails
  • /wizard-ci basic-integration/react-native
  • /wizard-ci basic-integration/react-router
  • /wizard-ci basic-integration/sveltekit
  • /wizard-ci basic-integration/swift
  • /wizard-ci basic-integration/tanstack-router
  • /wizard-ci basic-integration/tanstack-start
  • /wizard-ci basic-integration/vue
  • /wizard-ci error-tracking-upload-source-maps/android
  • /wizard-ci error-tracking-upload-source-maps/cicd-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-nested-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-gitlab-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-ssh-vps-node-raw
  • /wizard-ci error-tracking-upload-source-maps/flutter
  • /wizard-ci error-tracking-upload-source-maps/ios
  • /wizard-ci error-tracking-upload-source-maps/next
  • /wizard-ci error-tracking-upload-source-maps/next-no-posthog
  • /wizard-ci error-tracking-upload-source-maps/node-raw
  • /wizard-ci error-tracking-upload-source-maps/node-rollup
  • /wizard-ci error-tracking-upload-source-maps/node-rollup-typescript-plugin
  • /wizard-ci error-tracking-upload-source-maps/node-webpack
  • /wizard-ci error-tracking-upload-source-maps/nuxt-3-6
  • /wizard-ci error-tracking-upload-source-maps/nuxt-4-3
  • /wizard-ci error-tracking-upload-source-maps/react-native
  • /wizard-ci error-tracking-upload-source-maps/react-vite
  • /wizard-ci error-tracking-upload-source-maps/rust
  • /wizard-ci misc/quack-quack
  • /wizard-ci revenue/stripe

Results will be posted here when complete.
