LLM gateway: per-tier concurrency cap (phase 1 of #352) #355

Merged
rockfordlhotka merged 1 commit into main from feature/llm-gateway-core on May 7, 2026

@rockfordlhotka (Member)

First of four phases implementing design/llm-gateway.md. Phase 1 lands the gateway core: a singleton ILlmGateway that all LLM calls flow through, with per-tier SemaphoreSlim concurrency caps. This unblocks parallelism work (notably the observation framework, #353) by ensuring bursty parallel callers can't overwhelm a tier.
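A minimal sketch of the gating pattern as a C# top-level program. It assumes string tier keys and a hypothetical `ExecuteAsync` helper in place of the real `ILlmGateway` surface, whose signatures are not shown in this PR; only the per-tier `SemaphoreSlim` idea and the default caps come from the description.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// One SemaphoreSlim per tier, sized with the PR's default caps.
// The string keys are hypothetical stand-ins for the real tier type.
var gates = new Dictionary<string, SemaphoreSlim>
{
    ["Low"] = new SemaphoreSlim(8, 8),
    ["Balanced"] = new SemaphoreSlim(4, 4),
    ["High"] = new SemaphoreSlim(2, 2),
};

// Hypothetical gateway entry point: every LLM call for a tier passes through here.
async Task<T> ExecuteAsync<T>(string tier, Func<CancellationToken, Task<T>> call, CancellationToken ct)
{
    SemaphoreSlim gate = gates[tier];

    // Waits for a free slot; throws OperationCanceledException if ct fires first.
    await gate.WaitAsync(ct);
    try
    {
        // Only the provider call itself runs inside the gated section.
        return await call(ct);
    }
    finally
    {
        // Runs whether the call succeeds, throws, or is cancelled mid-flight,
        // so an acquired slot is never leaked.
        gate.Release();
    }
}

string reply = await ExecuteAsync("Balanced", _ => Task.FromResult("hello"), CancellationToken.None);
Console.WriteLine(reply);                          // prints "hello"
Console.WriteLine(gates["Balanced"].CurrentCount); // prints 4
```

Releasing in `finally` is what keeps a thrown SDK exception from permanently shrinking a tier's capacity.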

Summary

  • New LlmGateway (singleton) and ILlmGateway interface
  • New LlmGatewayOptions with per-tier concurrency caps; defaults Low=8, Balanced=4, High=2
  • LlmClient.CallTierAsync now routes the actual SDK invocation through the gateway, preserving the existing tier-fallback and SDK-quirk-retry behavior outside the gated section
  • New metric rockbot.llm.gateway.slot_wait.duration (histogram, ms) tagged by tier
  • Internal GetPendingCount / GetInFlightCount accessors surfaced for diagnostics and tests
  • 8 unit tests (LlmGatewayTests.cs) covering cap enforcement, cancellation-token (ct) propagation, per-tier independence, slot release on exception, and cleanup when a waiting caller is cancelled
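The options, pending/in-flight counters, and slot-wait histogram from the summary could fit together roughly like this. It is a sketch, not the merged code: the integer tier indices, the `Meter` name `RockBot.Sketch`, and the array-backed counters are assumptions. The metric name, its millisecond unit, the tier tag, and the default caps are taken from the list above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical tier indices; the caps are the PR's defaults.
const int Low = 0, Balanced = 1, High = 2;
string[] tierNames = { "Low", "Balanced", "High" };
int[] caps = { 8, 4, 2 };

SemaphoreSlim[] gates = Array.ConvertAll(caps, cap => new SemaphoreSlim(cap, cap));
int[] pending = new int[caps.Length];   // callers waiting for a slot
int[] inFlight = new int[caps.Length];  // callers currently holding a slot

// Metric name and unit come from the PR; the meter name is an assumption.
var meter = new Meter("RockBot.Sketch");
Histogram<double> slotWait = meter.CreateHistogram<double>(
    "rockbot.llm.gateway.slot_wait.duration", unit: "ms");

async Task<T> ExecuteAsync<T>(int tier, Func<CancellationToken, Task<T>> call, CancellationToken ct)
{
    var waitTimer = Stopwatch.StartNew();
    Interlocked.Increment(ref pending[tier]);
    try
    {
        await gates[tier].WaitAsync(ct);
    }
    finally
    {
        // Leave the pending count whether the wait succeeded or was cancelled.
        Interlocked.Decrement(ref pending[tier]);
    }
    slotWait.Record(waitTimer.Elapsed.TotalMilliseconds,
        new KeyValuePair<string, object?>("tier", tierNames[tier]));

    Interlocked.Increment(ref inFlight[tier]);
    try
    {
        return await call(ct);
    }
    finally
    {
        Interlocked.Decrement(ref inFlight[tier]);
        gates[tier].Release();
    }
}

// Diagnostic accessors in the spirit of GetPendingCount / GetInFlightCount.
int GetPendingCount(int tier) => Volatile.Read(ref pending[tier]);
int GetInFlightCount(int tier) => Volatile.Read(ref inFlight[tier]);

// Three High-tier calls against a cap of 2: two run, one waits.
var release = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
var calls = new List<Task<bool>>();
for (int i = 0; i < 3; i++)
    calls.Add(ExecuteAsync(High, _ => release.Task, CancellationToken.None));
Console.WriteLine($"{GetInFlightCount(High)} in flight, {GetPendingCount(High)} pending"); // prints "2 in flight, 1 pending"

release.SetResult(true);
await Task.WhenAll(calls);
Console.WriteLine($"{GetInFlightCount(High)} in flight, {GetPendingCount(High)} pending"); // prints "0 in flight, 0 pending"
```

In this sketch the wait duration is recorded only for callers that obtain a slot, so cancelled waiters do not skew the histogram; the PR does not say whether the real gateway makes the same choice.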

User-priority semantics emerge from cancellation: when the dream cycle's ct fires, every dream-cycle waiter pending on the gateway throws OperationCanceledException and leaves the queue, so the user call's WaitAsync completes promptly. No priority lanes needed.
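The cancellation-as-priority behavior can be seen with a single-slot tier. In this hypothetical setup one dream-cycle call holds the slot, two more queue behind it, and a user call queues last; cancelling the dream token clears the way. The names `SlowDreamCall` and `dreamCts` are illustrative, and the sketch assumes the in-flight dream call honors its ct. A call that ignored it would keep its slot until it finished on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// A single-slot tier makes the queueing easy to observe.
var gate = new SemaphoreSlim(1, 1);

// Same shape as a gated call: wait for a slot, run the call, always release.
async Task<string> ExecuteAsync(Func<CancellationToken, Task<string>> call, CancellationToken ct)
{
    await gate.WaitAsync(ct);  // a cancelled ct removes a queued caller from the line
    try { return await call(ct); }
    finally { gate.Release(); }
}

// Hypothetical dream-cycle call that honors its ct, as a well-behaved SDK call would.
async Task<string> SlowDreamCall(CancellationToken ct)
{
    await Task.Delay(Timeout.Infinite, ct);
    return "dream reply";
}

using var dreamCts = new CancellationTokenSource();

// The first dream call takes the slot; the next two queue behind it.
var dreamCalls = new List<Task<string>>();
for (int i = 0; i < 3; i++)
    dreamCalls.Add(ExecuteAsync(SlowDreamCall, dreamCts.Token));

// The user call queues last; its own ct never fires.
Task<string> userCall = ExecuteAsync(_ => Task.FromResult("user reply"), CancellationToken.None);

dreamCts.Cancel();                 // the dream cycle is abandoned
Console.WriteLine(await userCall); // prints "user reply"

int cancelled = 0;
foreach (Task<string> dream in dreamCalls)
{
    try { await dream; }
    catch (OperationCanceledException) { cancelled++; }
}
Console.WriteLine(cancelled);      // prints 3
```

No ordering or priority logic appears anywhere: the user call gets through only because every competing waiter was removed by its own token.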

Out of scope (later phases)

  • Phase 2: retry on HTTP 429, honoring Retry-After and falling back to exponential backoff when it is absent; disable the provider SDK's built-in retry
  • Phase 3: bounded queue depth (MaxPendingPerTier) with fail-fast
  • Phase 4: ct propagation audit + make ct mandatory on ILlmClient

Test plan

  • dotnet build RockBot.slnx — clean
  • dotnet test tests/RockBot.Host.Tests/RockBot.Host.Tests.csproj --filter "FullyQualifiedName~LlmGatewayTests" — 8/8 pass
  • Full Host suite — 624/624 pass (existing tests unaffected by LlmClient ctor change)
  • Full solution test run — all projects pass
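A cap-enforcement check in the spirit of `LlmGatewayTests` can be written without a test framework. This is a hypothetical sketch, not the PR's test code (which is not shown here): it pushes 50 calls through a cap-4 `SemaphoreSlim` and records the peak number of callers inside the gate at once.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int cap = 4;
var gate = new SemaphoreSlim(cap, cap);
int current = 0;      // callers inside the gate right now
int maxObserved = 0;  // highest value `current` has reached

async Task GatedCallAsync()
{
    await gate.WaitAsync();
    try
    {
        int now = Interlocked.Increment(ref current);

        // Lock-free "max" update: retry until maxObserved >= now.
        int seen;
        while (now > (seen = Volatile.Read(ref maxObserved)) &&
               Interlocked.CompareExchange(ref maxObserved, now, seen) != seen)
        {
        }

        await Task.Delay(10);  // simulated provider latency
    }
    finally
    {
        Interlocked.Decrement(ref current);
        gate.Release();
    }
}

// 50 concurrent callers against a cap of 4.
await Task.WhenAll(Enumerable.Range(0, 50).Select(_ => GatedCallAsync()));
Console.WriteLine(maxObserved);  // prints 4
```

Asserting that the peak equals the cap, rather than merely not exceeding it, also confirms the tier was actually saturated during the run.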

🤖 Generated with Claude Code

Adds the gateway core: a singleton ILlmGateway with per-tier
SemaphoreSlim instances that all LLM calls flow through. Bursty
parallel callers cannot overwhelm a tier; pending waiters cancel
automatically when their ct fires, which is how user work effectively
preempts dream-cycle work without an explicit priority queue.

Phase 1 scope per design/llm-gateway.md:
- LlmGateway singleton with per-tier semaphores (Low/Balanced/High)
- LlmGatewayOptions with configurable concurrency caps
  (defaults: Low=8, Balanced=4, High=2)
- LlmClient.CallTierAsync routes through the gateway, preserving
  existing tier-fallback and SDK-quirk-retry behavior
- New metric: rockbot.llm.gateway.slot_wait.duration
- Internal counters (Pending, InFlight) surfaced for diagnostics
  and tests
- 8 unit tests covering cap enforcement, ct propagation,
  cross-tier independence, slot release on exception, and
  cancellation-while-waiting cleanup

Phases 2-4 (retry-on-429, bounded queue, ct audit) land separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rockfordlhotka merged commit 76ae8b9 into main on May 7, 2026
2 checks passed
rockfordlhotka deleted the feature/llm-gateway-core branch on May 7, 2026 at 21:46