LLM gateway: bounded queue with fail-fast (phase 3 of #352) #358

Merged
rockfordlhotka merged 1 commit into main from
feature/llm-gateway-bounded-queue
May 8, 2026
@rockfordlhotka
Member

Third of the remaining LLM gateway phases per design/llm-gateway.md. Adds per-tier capacity bounds so the gateway fails fast under saturation instead of accumulating an unbounded queue of waiters.

Builds on phase 1 (#355). Phase 2 was dropped (#357 / #356); SDK retry stays enabled and runs inside the gateway slot.

Summary

  • LlmGatewayOptions adds LowMaxPending (64), BalancedMaxPending (32), HighMaxPending (16). Defaults are intentionally generous — this is a backstop against runaway loops or saturation cascades, not a normal-operations limit.
  • LlmGatewaySaturatedException (public, in RockBot.Host.Abstractions) carries the saturated Tier and the CapacityCap that was exceeded.
  • LlmGateway tracks a per-tier Active counter (Pending + InFlight). On entry it atomically increments, checks against MaxConcurrent + MaxPending, and rejects with the typed exception if the cap is exceeded. Outer try/finally guarantees Active is decremented on every exit path (rejection, cancellation, exception, success).
  • Constructor validation: MaxPending >= 0.
  • Startup log line updated to surface both caps: "Low=8+64 Balanced=4+32 High=2+16 (MaxConcurrent + MaxPending)".
  • New metric rockbot.llm.gateway.saturation_rejections (counter, tagged by tier). A non-zero rate is the operator signal to investigate.
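
The increment-then-check admission described in the summary can be sketched as follows. This is a minimal illustration, not the merged code: `TierGate` and `SaturatedException` are hypothetical stand-ins for `LlmGateway` and `LlmGatewaySaturatedException`, the tier is a plain string, and using a `SemaphoreSlim` for the `MaxConcurrent` slots is an assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for LlmGatewaySaturatedException; the real type's
// exact shape is not shown in this PR.
public sealed class SaturatedException : Exception
{
    public string Tier { get; }
    public int CapacityCap { get; }

    public SaturatedException(string tier, int capacityCap)
        : base($"LLM gateway tier '{tier}' is saturated (cap {capacityCap}).")
    {
        Tier = tier;
        CapacityCap = capacityCap;
    }
}

// Hypothetical single-tier gate. Active counts pending plus in-flight callers.
public sealed class TierGate
{
    private readonly string _tier;
    private readonly int _capacityCap;      // MaxConcurrent + MaxPending
    private readonly SemaphoreSlim _slots;  // assumed mechanism for MaxConcurrent
    private int _active;

    public TierGate(string tier, int maxConcurrent, int maxPending)
    {
        if (maxConcurrent < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrent));
        if (maxPending < 0) throw new ArgumentOutOfRangeException(nameof(maxPending));
        _tier = tier;
        _capacityCap = maxConcurrent + maxPending;
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    public int Active => Volatile.Read(ref _active);

    public async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> call, CancellationToken ct = default)
    {
        // Take a ticket first, then check it against the cap.
        var active = Interlocked.Increment(ref _active);
        try
        {
            if (active > _capacityCap)
                throw new SaturatedException(_tier, _capacityCap);

            // Within the cap: wait for a concurrency slot (this is the pending state).
            await _slots.WaitAsync(ct).ConfigureAwait(false);
            try
            {
                return await call(ct).ConfigureAwait(false);
            }
            finally
            {
                _slots.Release();
            }
        }
        finally
        {
            // Runs on rejection, cancellation, exception, and success alike.
            Interlocked.Decrement(ref _active);
        }
    }
}
```

Incrementing before the check keeps admission to a single atomic operation, so two callers racing for the last ticket cannot both be admitted. The price is that a rejected caller briefly holds a ticket until the outer `finally` returns it.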

Tests

4 new tests added to LlmGatewayTests:

  • ExecuteAsync_AtSaturation_ThrowsLlmGatewaySaturatedException — at cap, the next call throws with correct Tier + CapacityCap; after drain, calls succeed again
  • ExecuteAsync_RejectionDoesNotConsumeCapacity — 5 rejections in a row leave the in-flight count at 1 (no leaked tickets)
  • ExecuteAsync_SaturationIsPerTier — saturating Low does not affect Balanced or High
  • Constructor_NegativeMaxPending_Throws — options validation
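
The two invariants behind the no-leak and per-tier tests can be illustrated with a counter-only model. This sketch is an assumption for illustration: the `Tier` enum and `TierCounters` class are hypothetical, the caps of 1 in the usage below are arbitrary, and the actual tests presumably exercise `LlmGateway.ExecuteAsync` rather than a helper like this.

```csharp
using System;
using System.Threading;

// Hypothetical tier enum; the real tier type is not shown in this PR.
public enum Tier { Low = 0, Balanced = 1, High = 2 }

// Counter-only model: one independent Active counter and cap per tier.
public sealed class TierCounters
{
    private readonly int[] _caps;
    private readonly int[] _active = new int[3];

    public TierCounters(int lowCap, int balancedCap, int highCap)
    {
        _caps = new[] { lowCap, balancedCap, highCap };
    }

    // Returns true if the caller was admitted. A rejected caller hands its
    // ticket straight back, so rejections never consume capacity.
    public bool TryEnter(Tier tier)
    {
        if (Interlocked.Increment(ref _active[(int)tier]) > _caps[(int)tier])
        {
            Interlocked.Decrement(ref _active[(int)tier]);
            return false;
        }
        return true;
    }

    // Called by an admitted caller when its work finishes.
    public void Exit(Tier tier) => Interlocked.Decrement(ref _active[(int)tier]);

    public int Active(Tier tier) => Volatile.Read(ref _active[(int)tier]);
}
```

With every cap set to 1, one admitted `Low` caller followed by five rejected attempts leaves `Active(Tier.Low)` at 1, and `TryEnter(Tier.Balanced)` still succeeds because each tier has its own counter.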

Test plan

  • dotnet build RockBot.slnx — clean
  • dotnet test --filter "FullyQualifiedName~LlmGatewayTests" — 12/12 pass
  • Full Host suite — 628/628 pass
  • Full solution test run — all projects pass

Out of scope (next phase)

  • Phase 4: ct propagation audit + make ct mandatory on ILlmClient. Lands separately.

🤖 Generated with Claude Code

Adds a per-tier capacity cap on the gateway: total in-flight + queued
callers cannot exceed MaxConcurrent + MaxPending. Beyond that, new calls
fail fast with LlmGatewaySaturatedException rather than waiting
indefinitely. Backstop against runaway loops or saturation cascades.

Phase 3 scope per design/llm-gateway.md:

- LlmGatewayOptions adds Low/Balanced/HighMaxPending (defaults 64/32/16
  — generous; this is a backstop, not a normal-operations limit)
- LlmGatewaySaturatedException (in RockBot.Host.Abstractions so callers
  can catch without internal-type dependency) carries the saturated
  Tier and the CapacityCap that was exceeded
- LlmGateway tracks a per-tier Active counter (Pending + InFlight);
  atomically increments on entry, checks vs MaxConcurrent + MaxPending,
  and rejects with the typed exception if the cap is exceeded. Outer
  try/finally ensures Active is always decremented (rejection,
  cancellation, exception, success — all paths)
- Constructor validation: MaxPending >= 0
- Startup log line updated to surface both caps per tier
  ("Low=8+64 Balanced=4+32 High=2+16")
- New metric: rockbot.llm.gateway.saturation_rejections (counter,
  tagged by tier)
- 4 new tests: at-saturation throws typed exception with correct
  metadata; rejections do not consume capacity (5 rejects in a row,
  Active still 1); saturation is per-tier (Low full, Balanced/High
  unaffected); negative MaxPending rejected at construction

All 12 gateway tests pass; all 628 Host tests pass; full solution test
run clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rockfordlhotka merged commit 72016c1 into main on May 8, 2026
2 checks passed
rockfordlhotka deleted the feature/llm-gateway-bounded-queue branch on May 8, 2026 01:28