LLM gateway: rate-limit retry + disable SDK retry (phase 2 of #352) #356

Closed

rockfordlhotka wants to merge 1 commit into main from
Conversation
Adds retry-on-429 to the gateway, honoring Retry-After with an
exponential-backoff fallback. Disables the OpenAI SDK's built-in retry
at every chat-client construction site so the gateway and the SDK don't
silently double-retry rate-limit responses.
Phase 2 scope per design/llm-gateway.md:
Gateway side:
- LlmGatewayOptions adds MaxRateLimitRetries (default 5) and
MaxBackoffSeconds (default 16)
- ILlmRateLimitClassifier strategy with DefaultLlmRateLimitClassifier
recognizing ClientResultException status 429 (with Retry-After header
parsing) and HttpRequestException with TooManyRequests status
- LlmGateway.ExecuteWithRetryAsync wraps the operation: on a classifier-
recognized rate limit, honors Retry-After when supplied or falls back
to exponential backoff (1s, 2s, 4s, ...) capped at MaxBackoffSeconds.
Slot is held throughout (releasing wouldn't help since rate limits
are per-tier). Sleep is ct-aware so cancellation aborts the retry.
- New metric: rockbot.llm.gateway.rate_limit_retries (counter), tagged
by tier and retry-after source ("header" vs "backoff")
- Constructor validates MaxRateLimitRetries >= 0 and MaxBackoffSeconds >= 1
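
The retry behaviour above can be sketched in Python (a stand-in for the C# ExecuteWithRetryAsync; names are illustrative, not the gateway's actual API). The injectable sleep stands in for the ct-aware delay, and classify returns whether the error is a rate limit plus any Retry-After seconds:

```python
import time

def execute_with_retry(operation, classify, max_retries=5, max_backoff_s=16.0,
                       sleep=time.sleep):
    # Retry only classifier-recognized rate limits. Honor Retry-After when
    # the classifier extracted one; otherwise fall back to exponential
    # backoff (1s, 2s, 4s, ...) capped at max_backoff_s. In the real
    # gateway the slot is held across this whole loop.
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception as exc:
            is_rate_limit, retry_after = classify(exc)
            if not is_rate_limit or attempt == max_retries:
                raise  # not a rate limit, or retries exhausted
            delay = (retry_after if retry_after is not None
                     else min(2.0 ** attempt, max_backoff_s))
            sleep(delay)
```

Non-rate-limit errors propagate immediately, and exhaustion rethrows the last rate-limit error rather than swallowing it.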
SDK side:
- Adds RetryPolicy = new ClientRetryPolicy(maxRetries: 0) to every
OpenAIClient construction in agent Program.cs files (Agent,
ResearchAgent, SampleAgent). The Sample HTTP agent and the embedding
client are intentionally not modified — they don't route through the
gateway.
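
What silent double-retry would cost: each gateway attempt runs the full SDK retry cycle of its own, so worst-case attempts multiply. A back-of-envelope sketch (the SDK default of 3 retries here is an assumption for illustration, not stated in this PR):

```python
def worst_case_attempts(gateway_retries: int, sdk_retries: int) -> int:
    # Each of the gateway's (retries + 1) attempts runs the operation
    # through the SDK pipeline, which itself makes (retries + 1) attempts.
    return (gateway_retries + 1) * (sdk_retries + 1)

# With the gateway default of 5 retries stacked on a hypothetical SDK
# default of 3, a hard rate limit could be hit 24 times; zeroing the
# SDK retry brings it back to 6.
```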
Tests:
- 8 new gateway retry tests (success-after-retry, exhaustion, Retry-
After honoring, exponential backoff, cancel-during-wait, non-rate-
limit-not-retried) using a stubbed FakeRateLimitClassifier
- 5 new DefaultLlmRateLimitClassifier tests covering positive/negative
HttpRequestException paths and inner-exception walking
- All 21 gateway+classifier unit tests pass; all 637 Host tests pass
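
The Retry-After handling these tests exercise accepts two header forms (numeric delta-seconds and HTTP-date, per the PR description). A Python sketch of that parsing; parse_retry_after is an illustrative name, not the classifier's real API:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value, now=None):
    # Retry-After is either delta-seconds ("30") or an HTTP-date.
    # Returns seconds to wait, or None if the header is absent/unparseable
    # (in which case the caller falls back to exponential backoff).
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```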
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rockfordlhotka (Member, Author) commented:
Closing without merging — see thread on the design tradeoff. Decision: re-enable the OpenAI SDK retry (more mature than the gateway retry we'd build: jitter, 5xx, transient network errors) and let the gateway focus on its load-bearing role of concurrency capping. The design doc and #352 will be updated accordingly. The phase 2 gateway-retry code is discarded with this PR; the SDK retry was never actually disabled in main since #355 only landed phase 1.
This was referenced May 8, 2026
rockfordlhotka added a commit that referenced this pull request on May 8, 2026:
…ency (#357)

The original design called for the gateway to own rate-limit retry and for the SDK's built-in retry to be disabled. Implementation in PR #356 revealed that the OpenAI SDK's ClientRetryPolicy is more capable than what the gateway would reasonably build:

- Honors Retry-After on 429 and 503
- Exponential backoff with jitter (the gateway version had no jitter, risking thundering-herd retry on shared rate limits)
- Retries on 408, 429, 5xx, and transient network errors (the gateway version handled only 429)
- Maintained as providers change behavior

Reimplementing this in the gateway traded a more mature retry for a less mature one, with no benefit beyond centralized metrics. PR #356 was closed without merging.

Updates the design doc to reflect the decision:

- Retry stays in the SDK pipeline, runs inside the gateway slot
- Gateway focuses on concurrency capping, cancellation, and queue-depth telemetry — the load-bearing pieces
- Pipeline-level retry telemetry recorded as a future enhancement
- Implementation phases renumbered: phase 3 = bounded queue, phase 4 = ct audit + mandatory ct on ILlmClient
- Non-goals section explicitly records "owning rate-limit retry" as rejected, with reasoning, so this decision survives future drift

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
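
The jitter gap called out in that commit message can be made concrete with a small sketch (Python stand-in; "full jitter" is one common scheme, shown for illustration, and not necessarily the SDK's exact algorithm):

```python
import random

def backoff_no_jitter(attempt: int, cap_s: float = 16.0) -> float:
    # The discarded phase-2 gateway schedule: deterministic 1s, 2s, 4s, ...
    # capped at cap_s. Every client retries at exactly the same offsets,
    # so clients sharing a per-tier rate limit stampede together.
    return min(2.0 ** attempt, cap_s)

def backoff_full_jitter(attempt: int, cap_s: float = 16.0,
                        rng=random.random) -> float:
    # "Full jitter": pick uniformly in [0, deterministic delay], spreading
    # concurrent retries out instead of synchronizing them.
    return rng() * backoff_no_jitter(attempt, cap_s)
```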
Second of four phases implementing design/llm-gateway.md. Adds retry-on-429 to the gateway and disables the OpenAI SDK's built-in retry so the two layers don't fight. Builds on phase 1 (#355).
Summary
Gateway side
- `LlmGatewayOptions.MaxRateLimitRetries` (default 5) and `MaxBackoffSeconds` (default 16)
- `ILlmRateLimitClassifier` strategy with `DefaultLlmRateLimitClassifier` recognising `ClientResultException` status 429 (with `Retry-After` header parsing — both numeric seconds and HTTP-date forms) and `HttpRequestException` with `TooManyRequests` status. Walks the inner-exception chain for wrapped errors.
- `LlmGateway.ExecuteWithRetryAsync` wraps the operation: on a recognised rate limit, honours `Retry-After` when present, falls back to exponential backoff (1s, 2s, 4s, ...) capped at `MaxBackoffSeconds`. Slot is held throughout (releasing wouldn't help — rate limits are per-tier). Sleep is `ct`-aware so cancellation aborts the retry wait.
- New metric: `rockbot.llm.gateway.rate_limit_retries` (counter) tagged by tier and `retry_after_source` (`"header"` vs `"backoff"`).
- Constructor validates `MaxRateLimitRetries >= 0`, `MaxBackoffSeconds >= 1`.

SDK side
- `RetryPolicy = new ClientRetryPolicy(maxRetries: 0)` added to every `OpenAIClient` construction in agent `Program.cs` files (`RockBot.Agent`, `RockBot.ResearchAgent`, `RockBot.SampleAgent`). The Sample HTTP agent and the embedding client are intentionally untouched — neither routes through the gateway.

Tests
- 8 new gateway retry tests (`LlmGatewayTests.cs`): success-after-retry, exhaustion, `Retry-After` honouring, exponential backoff, cancel-during-wait, non-rate-limit-not-retried, plus options validation
- 5 new classifier tests (`DefaultLlmRateLimitClassifierTests.cs`): positive/negative `HttpRequestException` paths, inner-exception walking, AggregateException-collection-not-walked (documenting current behaviour)

Test plan
- `dotnet build RockBot.slnx` — clean
- `dotnet test --filter "FullyQualifiedName~LlmGatewayTests|FullyQualifiedName~DefaultLlmRateLimitClassifierTests"` — 21/21 pass

Out of scope (later phases)
- Bounded queue (`MaxPendingPerTier`) with fail-fast
- `ct` mandatory on `ILlmClient`

🤖 Generated with Claude Code