LLM gateway: bounded queue with fail-fast (phase 3 of #352) #358
Merged
rockfordlhotka merged 1 commit into main on May 8, 2026
Adds a per-tier capacity cap on the gateway: total in-flight + queued
callers cannot exceed MaxConcurrent + MaxPending. Beyond that, new calls
fail fast with LlmGatewaySaturatedException rather than waiting
indefinitely. Backstop against runaway loops or saturation cascades.
Phase 3 scope per design/llm-gateway.md:
- LlmGatewayOptions adds LowMaxPending, BalancedMaxPending, and
HighMaxPending (defaults 64/32/16; generous, because this is a
backstop, not a normal-operations limit)
- LlmGatewaySaturatedException (in RockBot.Host.Abstractions so callers
can catch without internal-type dependency) carries the saturated
Tier and the CapacityCap that was exceeded
- LlmGateway tracks a per-tier Active counter (Pending + InFlight);
atomically increments on entry, checks vs MaxConcurrent + MaxPending,
and rejects with the typed exception if the cap is exceeded. Outer
try/finally ensures Active is always decremented (rejection,
cancellation, exception, success — all paths)
- Constructor validation: MaxPending >= 0
- Startup log line updated to surface both caps per tier
("Low=8+64 Balanced=4+32 High=2+16")
- New metric: rockbot.llm.gateway.saturation_rejections (counter,
tagged by tier)
- 4 new tests: at-saturation throws typed exception with correct
metadata; rejections do not consume capacity (5 rejects in a row,
Active still 1); saturation is per-tier (Low full, Balanced/High
unaffected); negative MaxPending rejected at construction
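
For illustration, a counter tagged by tier like the one listed above could be recorded roughly as follows. This is a minimal sketch that assumes System.Diagnostics.Metrics is the metrics API in use, which the PR does not state. Only the counter name comes from the PR; the GatewayMetrics class, the meter name, and the "tier" tag key are hypothetical.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical helper, not the RockBot implementation.
public static class GatewayMetrics
{
    // Meter name is an assumption made for this sketch.
    private static readonly Meter GatewayMeter = new Meter("RockBot.Llm.Gateway.Sketch");

    // Counter name taken from the PR description.
    private static readonly Counter<long> SaturationRejections =
        GatewayMeter.CreateCounter<long>("rockbot.llm.gateway.saturation_rejections");

    // Adds one rejection, tagged with the tier that was saturated.
    // The "tier" tag key is an assumption.
    public static void RecordRejection(string tier) =>
        SaturationRejections.Add(1, new KeyValuePair<string, object?>("tier", tier));
}
```

Tagging by tier keeps a saturated Low tier distinguishable from a saturated High tier on a dashboard without needing a separate instrument per tier.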
All 12 gateway tests pass; all 628 Host tests pass; full solution test
run clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Third of the remaining LLM gateway phases per design/llm-gateway.md. Adds per-tier capacity bounds so the gateway fails fast under saturation instead of accumulating an unbounded queue of waiters.

Builds on phase 1 (#355). Phase 2 was dropped (#357 / #356); SDK retry stays enabled and runs inside the gateway slot.
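
For readers who have not opened the diff, the fail-fast bound can be sketched roughly as below. This is a simplified illustration for a single tier, not the code in this PR: BoundedTier, SaturatedException, and the use of SemaphoreSlim for the concurrency slots are all assumptions. It counts the caller first, checks the count against MaxConcurrent + MaxPending, and decrements in an outer finally on every exit path.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for LlmGatewaySaturatedException.
public sealed class SaturatedException : Exception
{
    public SaturatedException(string tier, int capacityCap)
        : base($"Tier '{tier}' is saturated (cap {capacityCap}).")
    {
        Tier = tier;
        CapacityCap = capacityCap;
    }

    public string Tier { get; }
    public int CapacityCap { get; }
}

// Hypothetical single-tier gate; the real gateway manages three tiers.
public sealed class BoundedTier
{
    private readonly string _name;
    private readonly int _cap;              // MaxConcurrent + MaxPending
    private readonly SemaphoreSlim _slots;  // limits in-flight calls (assumed mechanism)
    private int _active;                    // pending + in-flight callers

    public BoundedTier(string name, int maxConcurrent, int maxPending)
    {
        if (maxConcurrent < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrent));
        if (maxPending < 0) throw new ArgumentOutOfRangeException(nameof(maxPending));
        _name = name;
        _cap = maxConcurrent + maxPending;
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    public int Active => Volatile.Read(ref _active);

    public async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> call, CancellationToken ct = default)
    {
        // Increment first, then check, so two concurrent entrants
        // cannot both slip in under the cap.
        int active = Interlocked.Increment(ref _active);
        try
        {
            if (active > _cap)
                throw new SaturatedException(_name, _cap);

            // Within the cap: wait (bounded by MaxPending) for a slot.
            await _slots.WaitAsync(ct).ConfigureAwait(false);
            try
            {
                return await call(ct).ConfigureAwait(false);
            }
            finally
            {
                _slots.Release();
            }
        }
        finally
        {
            // Runs on rejection, cancellation, exception, and success alike.
            Interlocked.Decrement(ref _active);
        }
    }
}
```

In this sketch, a tier built with maxConcurrent 1 and maxPending 0 rejects a second concurrent call with CapacityCap 1 while the first is still running, and Active returns to 0 once the first completes. One consequence of incrementing before checking is that a rejected caller is counted for a brief moment, so a caller arriving during a burst of rejections may be turned away slightly early; for a backstop limit that trade-off seems acceptable.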
Summary
- LlmGatewayOptions adds LowMaxPending (64), BalancedMaxPending (32), HighMaxPending (16). Defaults are intentionally generous: this is a backstop against runaway loops or saturation cascades, not a normal-operations limit.
- LlmGatewaySaturatedException (public, in RockBot.Host.Abstractions) carries the saturated Tier and the CapacityCap that was exceeded.
- LlmGateway tracks a per-tier Active counter (Pending + InFlight). On entry it atomically increments, checks against MaxConcurrent + MaxPending, and rejects with the typed exception if the cap is exceeded. Outer try/finally guarantees Active is decremented on every exit path (rejection, cancellation, exception, success).
- Constructor validation: MaxPending >= 0.
- Startup log line surfaces both caps per tier: "Low=8+64 Balanced=4+32 High=2+16 (MaxConcurrent + MaxPending)".
- New metric: rockbot.llm.gateway.saturation_rejections (counter, tagged by tier). A non-zero rate is the operator signal to investigate.

Tests
4 new tests added to LlmGatewayTests:
- ExecuteAsync_AtSaturation_ThrowsLlmGatewaySaturatedException: at cap, the next call throws with correct Tier + CapacityCap; after drain, calls succeed again
- ExecuteAsync_RejectionDoesNotConsumeCapacity: 5 rejections in a row leave the in-flight count at 1 (no leaked tickets)
- ExecuteAsync_SaturationIsPerTier: saturating Low does not affect Balanced or High
- Constructor_NegativeMaxPending_Throws: options validation

Test plan
- dotnet build RockBot.slnx: clean
- dotnet test --filter "FullyQualifiedName~LlmGatewayTests": 12/12 pass

Out of scope (next phase)
ct mandatory on ILlmClient. Lands separately.

🤖 Generated with Claude Code