
Revise LLM gateway design: SDK owns retry #357

Merged
rockfordlhotka merged 1 commit into main from design/llm-gateway-retry-decision
May 8, 2026

Conversation

@rockfordlhotka
Member

Summary

Updates design/llm-gateway.md to reflect the decision made after attempting phase 2 of #352: keep the OpenAI SDK's ClientRetryPolicy enabled and let the gateway focus on concurrency capping. PR #356 (the gateway-retry implementation) was closed without merging.

The SDK retry is more mature than what the gateway would reasonably reimplement (Retry-After honoring + jitter + 429/5xx/transient-network coverage). Reimplementing it traded maturity for a small observability gain that can be addressed later via a PipelinePolicy listener if it becomes important.
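To make the division of responsibilities concrete, here is a minimal sketch of the resulting shape (Python with hypothetical names; the actual gateway is presumably .NET wrapping the OpenAI SDK). The gateway only caps concurrency; any retrying happens inside the call it wraps, so a slot is held for the full duration of a request including its retries:

```python
import asyncio

class ConcurrencyCappedGateway:
    """Hypothetical sketch: caps in-flight LLM calls.

    Retry is NOT implemented here -- in the real design the SDK's
    ClientRetryPolicy retries inside the wrapped call, so the slot
    stays held across retries.
    """

    def __init__(self, max_concurrency: int):
        self._slots = asyncio.Semaphore(max_concurrency)

    async def send(self, call):
        # Slot is held for the entire call, including any retries
        # performed by the SDK pipeline inside `call`.
        async with self._slots:
            return await call()

async def demo():
    gateway = ConcurrencyCappedGateway(max_concurrency=2)

    async def fake_llm_call():
        await asyncio.sleep(0)  # stand-in for SDK pipeline + retry
        return "ok"

    return await asyncio.gather(
        *(gateway.send(fake_llm_call) for _ in range(5))
    )
```

This is the "slot-held-during-retry" behavior the design doc calls out: a retrying request occupies capacity for its whole lifetime, which is the intended trade-off.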

Changes

  • Goals/Non-goals: removes "honor Retry-After" and "bounded retry" goals; adds an explicit non-goal "owning rate-limit retry" with reasoning so the decision survives future drift
  • Architecture diagram: updated to show retry happening in the SDK pipeline inside the gateway slot, not in the gateway itself
  • Retry policy section: rewritten to describe the SDK as the retry owner; notes the slot-held-during-retry behavior and the cross-provider implications
  • Implementation phases: phase 1 ✅ marked complete; phases renumbered (phase 3 = bounded queue, phase 4 = ct audit + mandatory ct on ILlmClient); the abandoned retry phase explicitly recorded
  • Open questions: replaces "retry budget per cycle" with "pipeline-level retry telemetry" as the relevant future concern

Test plan

  • Read updated design/llm-gateway.md end-to-end and confirm the rationale for keeping SDK retry is captured clearly enough to deter future "let's add gateway retry again" drift

🤖 Generated with Claude Code

…ency

The original design called for the gateway to own rate-limit retry and for
the SDK's built-in retry to be disabled. Implementation in PR #356
revealed that the OpenAI SDK's ClientRetryPolicy is more capable than what
the gateway would reasonably build:

- Honors Retry-After on 429 and 503
- Exponential backoff with jitter (the gateway version had no jitter,
  risking thundering-herd retry on shared rate limits)
- Retries on 408, 429, 5xx, and transient network errors (the gateway
  version handled only 429)
- Maintained as providers change behavior

Reimplementing this in the gateway traded a more mature retry for a less
mature one, with no benefit beyond centralized metrics. PR #356 was closed
without merging.

Updates the design doc to reflect the decision:
- Retry stays in the SDK pipeline, runs inside the gateway slot
- Gateway focuses on concurrency capping, cancellation, and queue-depth
  telemetry — the load-bearing pieces
- Pipeline-level retry telemetry recorded as a future enhancement
- Implementation phases renumbered: phase 3 = bounded queue, phase 4 =
  ct audit + mandatory ct on ILlmClient
- Non-goals section explicitly records "owning rate-limit retry" as
  rejected, with reasoning, so this decision survives future drift
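If pipeline-level retry telemetry is picked up later, the shape could be as simple as an observer that records each attempt without owning the retry decision (hypothetical Python sketch; in .NET this would be a PipelinePolicy inserted into the SDK pipeline):

```python
class RetryObserver:
    """Counts attempts per logical request; never decides retry policy."""

    def __init__(self):
        self.attempts = 0

    def record_attempt(self):
        self.attempts += 1

def send_with_observer(pipeline_call, observer, max_attempts=3):
    # The retry loop here stands in for the SDK pipeline; the observer
    # only records telemetry, it never chooses whether to retry.
    last_error = None
    for _ in range(max_attempts):
        observer.record_attempt()
        try:
            return pipeline_call()
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

The point of the split: telemetry is additive and can be bolted on later, whereas owning the retry loop is what PR #356 showed to be a losing trade.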

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rockfordlhotka rockfordlhotka merged commit 1801cb4 into main May 8, 2026
@rockfordlhotka rockfordlhotka deleted the design/llm-gateway-retry-decision branch May 8, 2026 01:18