Revise LLM gateway design: SDK owns retry #357
Merged
rockfordlhotka merged 1 commit into main on May 8, 2026
Conversation
…ency

The original design called for the gateway to own rate-limit retry and for the SDK's built-in retry to be disabled. Implementation in PR #356 revealed that the OpenAI SDK's ClientRetryPolicy is more capable than what the gateway would reasonably build:

- Honors Retry-After on 429 and 503
- Exponential backoff with jitter (the gateway version had no jitter, risking thundering-herd retries on shared rate limits)
- Retries on 408, 429, 5xx, and transient network errors (the gateway version handled only 429)
- Maintained as providers change behavior

Reimplementing this in the gateway traded a more mature retry for a less mature one, with no benefit beyond centralized metrics. PR #356 was closed without merging.

Updates the design doc to reflect the decision:

- Retry stays in the SDK pipeline and runs inside the gateway slot
- Gateway focuses on concurrency capping, cancellation, and queue-depth telemetry (the load-bearing pieces)
- Pipeline-level retry telemetry is recorded as a future enhancement
- Implementation phases renumbered: phase 3 = bounded queue, phase 4 = CancellationToken audit + mandatory CancellationToken on ILlmClient
- The Non-goals section explicitly records "owning rate-limit retry" as rejected, with reasoning, so this decision survives future drift

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
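The jitter point is worth making concrete: without jitter, every client that trips a shared rate limit at the same instant also retries at the same instant, re-creating the spike. A minimal C# sketch of full-jitter exponential backoff (the `Backoff.ComputeDelay` helper and its constants are illustrative, not the SDK's actual schedule):

```csharp
using System;

static class Backoff
{
    // Illustrative constants; the real ClientRetryPolicy has its own schedule.
    const double BaseDelaySeconds = 0.8;
    const double MaxDelaySeconds = 60;

    // Full-jitter backoff: the delay is drawn uniformly from
    // [0, min(max, base * 2^attempt)], so clients that failed together
    // spread out instead of retrying in lockstep.
    public static TimeSpan ComputeDelay(int attempt, Random rng)
    {
        double cap = Math.Min(MaxDelaySeconds,
                              BaseDelaySeconds * Math.Pow(2, attempt));
        return TimeSpan.FromSeconds(rng.NextDouble() * cap);
    }
}
```

A fixed-schedule retry would return `cap` itself; drawing uniformly below the cap is what breaks the synchronization.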
Summary
Updates design/llm-gateway.md to reflect the decision made after attempting phase 2 of #352: keep the OpenAI SDK's ClientRetryPolicy enabled and let the gateway focus on concurrency capping. PR #356 (the gateway-retry implementation) was closed without merging.

The SDK retry is more mature than what the gateway would reasonably reimplement (Retry-After honoring + jitter + 429/5xx/transient-network coverage). Reimplementing it traded maturity for a small observability gain that can be addressed later via a PipelinePolicy listener if it becomes important.

Changes
Test plan
Read design/llm-gateway.md end-to-end and confirm the rationale for keeping SDK retry is captured clearly enough to deter future "let's add gateway retry again" drift.

🤖 Generated with Claude Code
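For context, the "gateway slot" the design doc describes reduces to a small amount of code, which is part of why it is worth keeping while gateway-owned retry is not. The sketch below is a hedged illustration, not the project's actual gateway: `LlmGateway`, `ExecuteAsync`, and `QueueDepth` are hypothetical names. The key property is that the SDK's retry pipeline runs inside the awaited call, so a slot is held for the full duration of all retry attempts:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical gateway slot: caps concurrent LLM calls, exposes queue depth
// for telemetry, and propagates cancellation. Retry is deliberately absent;
// it stays in the SDK pipeline and executes inside the held slot.
public sealed class LlmGateway
{
    private readonly SemaphoreSlim _slots;
    private int _waiting;

    public LlmGateway(int maxConcurrency) =>
        _slots = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    // Queue-depth telemetry: callers currently blocked waiting for a slot.
    public int QueueDepth => Volatile.Read(ref _waiting);

    public async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> call, CancellationToken ct)
    {
        Interlocked.Increment(ref _waiting);
        try
        {
            await _slots.WaitAsync(ct); // honors cancellation while queued
        }
        finally
        {
            Interlocked.Decrement(ref _waiting);
        }

        try
        {
            // The SDK's ClientRetryPolicy (backoff, Retry-After) runs in here.
            return await call(ct);
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

Because `WaitAsync` takes the caller's token, cancellation is honored even while queued, which is why the mandatory-CancellationToken audit in phase 4 matters: a call that cannot be cancelled holds a slot hostage.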