LLM gateway: per-tier concurrency cap (phase 1 of #352) by rockfordlhotka · Pull Request #355 · MarimerLLC/rockbot

rockfordlhotka · 2026-05-07T20:47:13Z

First of four phases implementing design/llm-gateway.md. Phase 1 lands the gateway core: a singleton ILlmGateway that all LLM calls flow through, with per-tier SemaphoreSlim concurrency caps. This unblocks parallelism work (notably the observation framework, #353) by ensuring bursty parallel callers can't overwhelm a tier.

Summary

New LlmGateway (singleton) and ILlmGateway interface
New LlmGatewayOptions with per-tier concurrency caps; defaults Low=8, Balanced=4, High=2
LlmClient.CallTierAsync now routes the actual SDK invocation through the gateway, preserving the existing tier-fallback and SDK-quirk-retry behavior outside the gated section
New metric rockbot.llm.gateway.slot_wait.duration (histogram, ms) tagged by tier
Internal GetPendingCount / GetInFlightCount accessors surfaced for diagnostics and tests
8 unit tests (LlmGatewayTests.cs) covering cap enforcement, ct propagation, per-tier independence, exception-releases-slot, and cancel-while-waiting cleanup
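
The shape of the gateway described in this summary can be sketched as follows. This is a rough illustration only, written in Python with asyncio instead of the project's C#; `asyncio.Semaphore` stands in for `SemaphoreSlim`. `TierGateway`, `run`, `pending_count`, `in_flight_count`, and `slot_waits_ms` are hypothetical names, not the PR's API. Only the default caps (Low=8, Balanced=4, High=2) come from the description above.

```python
import asyncio
import time

# Default caps quoted in the summary above; everything else is illustrative.
DEFAULT_CAPS = {"low": 8, "balanced": 4, "high": 2}


class TierGateway:
    """Hypothetical stand-in for the gateway: one semaphore per tier."""

    def __init__(self, caps=None):
        caps = dict(caps or DEFAULT_CAPS)
        self._slots = {tier: asyncio.Semaphore(cap) for tier, cap in caps.items()}
        self._pending = {tier: 0 for tier in caps}
        self._in_flight = {tier: 0 for tier in caps}
        # (tier, milliseconds) samples; a stand-in for the slot-wait histogram.
        self.slot_waits_ms = []

    def pending_count(self, tier):
        return self._pending[tier]

    def in_flight_count(self, tier):
        return self._in_flight[tier]

    async def run(self, tier, call):
        """Wait for a slot on `tier`, run `call`, and always release the slot."""
        slot = self._slots[tier]
        started = time.perf_counter()
        self._pending[tier] += 1
        try:
            await slot.acquire()  # a cancellation here propagates; no slot is held
        finally:
            self._pending[tier] -= 1
        self.slot_waits_ms.append((tier, (time.perf_counter() - started) * 1000.0))
        self._in_flight[tier] += 1
        try:
            return await call()
        finally:
            # Runs on success, exception, or cancellation of the call itself.
            self._in_flight[tier] -= 1
            slot.release()


async def demo():
    gateway = TierGateway()
    peak = 0

    async def fake_llm_call():
        nonlocal peak
        peak = max(peak, gateway.in_flight_count("high"))
        await asyncio.sleep(0.01)  # simulated provider latency
        return "ok"

    # Ten bursty callers hit the "high" tier, which has a cap of 2.
    results = await asyncio.gather(
        *(gateway.run("high", fake_llm_call) for _ in range(10))
    )
    return peak, results


if __name__ == "__main__":
    peak, results = asyncio.run(demo())
    print(f"peak in-flight on high tier: {peak}, completed calls: {len(results)}")
```

In this sketch the retry and tier-fallback logic would sit outside `run`, which matches the description's point that only the SDK invocation itself is gated.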

User-priority semantics emerge from cancellation: when dream's cancellation token (ct) fires, every dream call still waiting on the gateway throws OperationCanceledException and leaves the queue, so the user call's WaitAsync completes promptly. No priority lanes are needed.
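
The cancellation behavior described above can be illustrated with a small sketch, again in Python with asyncio and not the project's C#. Here cancelling a task plays the role of a ct firing, and asyncio raises `CancelledError` where .NET would throw `OperationCanceledException`. The `gated` helper, the cap of 1, and the task labels are all assumptions made for the example.

```python
import asyncio


async def gated(slot, call):
    """Wait for a slot, run the call, and always give the slot back."""
    await slot.acquire()  # cancelled while waiting: raises before a slot is held
    try:
        return await call()
    finally:
        slot.release()


async def demo():
    slot = asyncio.Semaphore(1)  # hypothetical tier with a cap of 1
    release_holder = asyncio.Event()

    async def hold():
        await release_holder.wait()
        return "in-flight background call"

    async def quick(label):
        return label

    # One background call occupies the only slot.
    holder = asyncio.create_task(gated(slot, hold))
    await asyncio.sleep(0)

    # Three more background calls queue up, then a user call queues behind them.
    background = [
        asyncio.create_task(gated(slot, lambda: quick("background")))
        for _ in range(3)
    ]
    await asyncio.sleep(0)
    user = asyncio.create_task(gated(slot, lambda: quick("user")))
    await asyncio.sleep(0)

    # The background work is cancelled: every pending waiter drops out.
    for task in background:
        task.cancel()
    cancelled = await asyncio.gather(*background, return_exceptions=True)

    # Once the in-flight call finishes, the user call is next in line.
    release_holder.set()
    await holder
    user_result = await asyncio.wait_for(user, timeout=1.0)
    return cancelled, user_result, slot.locked()


if __name__ == "__main__":
    cancelled, user_result, locked = asyncio.run(demo())
    print(len(cancelled), user_result, locked)
```

The cancelled waiters never held a slot, so nothing needs to be released on their behalf. That is the cleanup property the cancel-while-waiting test in the summary is described as covering.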

Out of scope (later phases)

Phase 2: retry on 429, honoring Retry-After with an exponential fallback; disable the provider SDK's own retry
Phase 3: bounded queue depth (MaxPendingPerTier) with fail-fast
Phase 4: ct propagation audit + make ct mandatory on ILlmClient

Test plan

dotnet build RockBot.slnx — clean
dotnet test tests/RockBot.Host.Tests/RockBot.Host.Tests.csproj --filter "FullyQualifiedName~LlmGatewayTests" — 8/8 pass
Full Host suite — 624/624 pass (existing tests unaffected by LlmClient ctor change)
Full solution test run — all projects pass

🤖 Generated with Claude Code

Adds the gateway core: a singleton ILlmGateway with per-tier SemaphoreSlim instances that all LLM calls flow through. Bursty parallel callers cannot overwhelm a tier; pending waiters cancel automatically when their ct fires, which is how user work effectively preempts dream-cycle work without an explicit priority queue.

Phase 1 scope per design/llm-gateway.md:

- LlmGateway singleton with per-tier semaphores (Low/Balanced/High)
- LlmGatewayOptions with configurable concurrency caps (defaults: Low=8, Balanced=4, High=2)
- LlmClient.CallTierAsync routes through the gateway, preserving existing tier-fallback and SDK-quirk-retry behavior
- New metric: rockbot.llm.gateway.slot_wait.duration
- Internal counters (Pending, InFlight) surfaced for diagnostics and tests
- 8 unit tests covering cap enforcement, ct propagation, cross-tier independence, slot release on exception, and cancellation-while-waiting cleanup

Phases 2-4 (retry-on-429, bounded queue, ct audit) land separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rockfordlhotka merged commit 76ae8b9 into main May 7, 2026
2 checks passed

rockfordlhotka deleted the feature/llm-gateway-core branch May 7, 2026 21:46

This was referenced May 7, 2026

LLM gateway: rate-limit retry + disable SDK retry (phase 2 of #352) #356 (Closed)

LLM gateway: bounded queue with fail-fast (phase 3 of #352) #358 (Merged)

LLM gateway: ct audit + mandatory ct on ILlmClient (phase 4 of #352) #359 (Merged)

rockfordlhotka merged 1 commit into main from feature/llm-gateway-core
