
LLM gateway: rate-limit retry + disable SDK retry (phase 2 of #352) #356

Closed
rockfordlhotka wants to merge 1 commit into main from feature/llm-gateway-retry

Conversation

@rockfordlhotka
Member

Second of four phases implementing design/llm-gateway.md. Adds retry-on-429 to the gateway and disables the OpenAI SDK's built-in retry so the two layers don't fight.

Builds on phase 1 (#355).

Summary

Gateway side

  • LlmGatewayOptions.MaxRateLimitRetries (default 5) and MaxBackoffSeconds (default 16)
  • ILlmRateLimitClassifier strategy with DefaultLlmRateLimitClassifier recognizing ClientResultException status 429 (with Retry-After header parsing — both numeric-seconds and HTTP-date forms) and HttpRequestException with TooManyRequests status. Walks the inner-exception chain for wrapped errors.
  • LlmGateway.ExecuteWithRetryAsync wraps the operation: on a recognized rate limit it honors Retry-After when present, falling back to exponential backoff (1s, 2s, 4s, ...) capped at MaxBackoffSeconds. The slot is held throughout (releasing it wouldn't help — rate limits are per-tier). The sleep is ct-aware, so cancellation aborts the retry wait. A sketch of the loop follows this list.
  • New metric rockbot.llm.gateway.rate_limit_retries (counter) tagged by tier and retry_after_source ("header" vs "backoff").
  • New constructor validation: MaxRateLimitRetries >= 0, MaxBackoffSeconds >= 1.
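
For readers without the diff, here is a minimal sketch of the retry shape described above. Names the PR itself uses (ILlmRateLimitClassifier, ExecuteWithRetryAsync, MaxRateLimitRetries, MaxBackoffSeconds) are real; the signatures, the TryClassify shape, and the LlmGatewayRetrySketch wrapper are assumptions for illustration, not the actual code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: the real LlmGateway also manages tier slots and metrics.
public interface ILlmRateLimitClassifier
{
    // True when ex (walking the inner-exception chain) is a rate limit;
    // retryAfter carries the parsed Retry-After value when a header was present.
    bool TryClassify(Exception ex, out TimeSpan? retryAfter);
}

public sealed class LlmGatewayRetrySketch
{
    private readonly ILlmRateLimitClassifier _classifier;
    private readonly int _maxRateLimitRetries;   // default 5
    private readonly int _maxBackoffSeconds;     // default 16

    public LlmGatewayRetrySketch(
        ILlmRateLimitClassifier classifier,
        int maxRateLimitRetries = 5,
        int maxBackoffSeconds = 16)
    {
        // Mirrors the constructor validation the PR describes.
        if (maxRateLimitRetries < 0) throw new ArgumentOutOfRangeException(nameof(maxRateLimitRetries));
        if (maxBackoffSeconds < 1) throw new ArgumentOutOfRangeException(nameof(maxBackoffSeconds));
        _classifier = classifier;
        _maxRateLimitRetries = maxRateLimitRetries;
        _maxBackoffSeconds = maxBackoffSeconds;
    }

    public async Task<T> ExecuteWithRetryAsync<T>(
        Func<CancellationToken, Task<T>> operation, CancellationToken ct)
    {
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return await operation(ct);
            }
            catch (Exception ex) when (
                attempt < _maxRateLimitRetries &&
                _classifier.TryClassify(ex, out var retryAfter))
            {
                // Honor Retry-After when the server supplied one; otherwise
                // exponential backoff (1s, 2s, 4s, ...) capped at the maximum.
                var delay = retryAfter
                    ?? TimeSpan.FromSeconds(Math.Min(Math.Pow(2, attempt), _maxBackoffSeconds));

                // ct-aware sleep: cancelling the token aborts the retry wait,
                // not just the in-flight call.
                await Task.Delay(delay, ct);
            }
        }
    }
}
```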

SDK side

  • RetryPolicy = new ClientRetryPolicy(maxRetries: 0) added to every OpenAIClient construction in agent Program.cs files (RockBot.Agent, RockBot.ResearchAgent, RockBot.SampleAgent). The Sample HTTP agent and the embedding client are intentionally untouched — neither routes through the gateway.
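
Concretely, each construction site gains roughly the following. This is a sketch: the credential, endpoint, and DI wiring in the real Program.cs files will differ, and apiKey here is a stand-in.

```csharp
using System;
using System.ClientModel;
using System.ClientModel.Primitives;
using OpenAI;

string apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("OPENAI_API_KEY not set");

var client = new OpenAIClient(
    new ApiKeyCredential(apiKey),
    new OpenAIClientOptions
    {
        // The gateway owns rate-limit retry; zeroing the SDK's built-in
        // retry keeps the two layers from double-retrying on 429.
        RetryPolicy = new ClientRetryPolicy(maxRetries: 0)
    });
```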

Tests

  • 8 new gateway retry tests (LlmGatewayTests.cs): success-after-retry, exhaustion, Retry-After honoring, exponential backoff, cancel-during-wait, non-rate-limit-not-retried, plus options validation; a representative shape is sketched after this list
  • 5 new classifier tests (DefaultLlmRateLimitClassifierTests.cs): positive/negative HttpRequestException paths, inner-exception walking, AggregateException-collection-not-walked (documenting current behavior)
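
As a rough illustration of the success-after-retry case, here is a plausible shape using the sketch types above and the FakeRateLimitClassifier stub the commit message mentions (xUnit is assumed; the real tests in LlmGatewayTests.cs exercise the actual gateway):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Xunit;

public class LlmGatewayRetrySketchTests
{
    // Stub: treats InvalidOperationException as a rate limit and asks for
    // a zero wait so the test never actually sleeps.
    private sealed class FakeRateLimitClassifier : ILlmRateLimitClassifier
    {
        public bool TryClassify(Exception ex, out TimeSpan? retryAfter)
        {
            retryAfter = TimeSpan.Zero;
            return ex is InvalidOperationException;
        }
    }

    [Fact]
    public async Task Succeeds_after_transient_rate_limits()
    {
        var gateway = new LlmGatewayRetrySketch(new FakeRateLimitClassifier());
        var calls = 0;

        var result = await gateway.ExecuteWithRetryAsync<string>(
            _ => ++calls < 3
                ? throw new InvalidOperationException("429")
                : Task.FromResult("ok"),
            CancellationToken.None);

        Assert.Equal("ok", result);
        Assert.Equal(3, calls);   // two rate-limited attempts, then success
    }
}
```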

Test plan

  • dotnet build RockBot.slnx — clean
  • dotnet test --filter "FullyQualifiedName~LlmGatewayTests|FullyQualifiedName~DefaultLlmRateLimitClassifierTests" — 21/21 pass
  • Full Host suite — 637/637 pass
  • Full solution test run — all projects pass

Out of scope (later phases)

  • Phase 3: bounded queue depth (MaxPendingPerTier) with fail-fast
  • Phase 4: ct propagation audit + make ct mandatory on ILlmClient

🤖 Generated with Claude Code

Adds retry-on-429 to the gateway with Retry-After honoring and
exponential-backoff fallback. Disables the OpenAI SDK's built-in retry
in every chat-client construction site so gateway and SDK don't
double-retry silently on rate-limit responses.

Phase 2 scope per design/llm-gateway.md:

Gateway side:
- LlmGatewayOptions adds MaxRateLimitRetries (default 5) and
  MaxBackoffSeconds (default 16)
- ILlmRateLimitClassifier strategy with DefaultLlmRateLimitClassifier
  recognizing ClientResultException status 429 (with Retry-After header
  parsing) and HttpRequestException with TooManyRequests status
- LlmGateway.ExecuteWithRetryAsync wraps the operation: on a classifier-
  recognized rate limit, honors Retry-After when supplied or falls back
  to exponential backoff (1s, 2s, 4s, ...) capped at MaxBackoffSeconds.
  Slot is held throughout (releasing wouldn't help since rate limits
  are per-tier). Sleep is ct-aware so cancellation aborts the retry.
- New metric: rockbot.llm.gateway.rate_limit_retries (counter), tagged
  by tier and retry-after source ("header" vs "backoff")
- Constructor validates MaxRateLimitRetries >= 0 and MaxBackoffSeconds >= 1

SDK side:
- Adds RetryPolicy = new ClientRetryPolicy(maxRetries: 0) to every
  OpenAIClient construction in agent Program.cs files (Agent,
  ResearchAgent, SampleAgent). The Sample HTTP agent and the embedding
  client are intentionally not modified — they don't route through the
  gateway.

Tests:
- 8 new gateway retry tests (success-after-retry, exhaustion, Retry-
  After honoring, exponential backoff, cancel-during-wait, non-rate-
  limit-not-retried) using a stubbed FakeRateLimitClassifier
- 5 new DefaultLlmRateLimitClassifier tests covering positive/negative
  HttpRequestException paths and inner-exception walking
- All 21 gateway+classifier unit tests pass; all 637 Host tests pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rockfordlhotka
Member Author

Closing without merging — see thread on the design tradeoff. Decision: re-enable the OpenAI SDK retry (more mature than the gateway retry we'd build: jitter, 5xx, transient network errors) and let the gateway focus on its load-bearing role of concurrency capping. The design doc and #352 will be updated accordingly. The phase 2 gateway-retry code is discarded with this PR; the SDK retry was never actually disabled in main since #355 only landed phase 1.

@rockfordlhotka rockfordlhotka deleted the feature/llm-gateway-retry branch May 8, 2026 00:53
rockfordlhotka added a commit that referenced this pull request May 8, 2026
…ency (#357)

The original design called for the gateway to own rate-limit retry and for
the SDK's built-in retry to be disabled. Implementation in PR #356
revealed that the OpenAI SDK's ClientRetryPolicy is more capable than what
the gateway would reasonably build:

- Honors Retry-After on 429 and 503
- Exponential backoff with jitter (the gateway version had no jitter,
  risking thundering-herd retry on shared rate limits)
- Retries on 408, 429, 5xx, and transient network errors (the gateway
  version handled only 429)
- Maintained as providers change behavior

Reimplementing this in the gateway traded a more mature retry for a less
mature one, with no benefit beyond centralized metrics. PR #356 was closed
without merging.

Updates the design doc to reflect the decision:
- Retry stays in the SDK pipeline, runs inside the gateway slot
- Gateway focuses on concurrency capping, cancellation, and queue-depth
  telemetry — the load-bearing pieces
- Pipeline-level retry telemetry recorded as a future enhancement
- Implementation phases renumbered: phase 3 = bounded queue, phase 4 =
  ct audit + mandatory ct on ILlmClient
- Non-goals section explicitly records "owning rate-limit retry" as
  rejected, with reasoning, so this decision survives future drift

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
