B3: concurrency limit with 429 backpressure #404
Open
MrTravisB wants to merge 1 commit into travis/pilo-sentry-scrubber from
Conversation
lmorchard (Collaborator) reviewed Apr 28, 2026:
We run multiple replicas in production for Pilo server, so I think in-process won't quite work there?
Comment on lines +13 to +15:
 * In-process only — fine for the current single-instance deployment. If we
 * ever scale to multiple replicas, replace with a Redis-backed token bucket
 * or rely on the load balancer's queue depth.
Collaborator
Hmm, we might need this sooner than later for Pilo server production: looks like we currently run at least 3 replicas there and a max of 30. So, this would only cap concurrency per replica?
Summary
Third PR in Stack B (server observability + capacity protection). Stacked
on #403 (B2).
Caps how many tasks can run concurrently across all SSE and WebSocket
connections. When at the limit, new requests get an immediate
AT_CAPACITY rejection (HTTP 429 with Retry-After for SSE; structured
error event for WS) instead of queuing indefinitely on the server and
tying up file descriptors and memory.
Why
Today, the server accepts unbounded concurrent tasks. If the AI provider
slows down or the browser pool is saturated, requests pile up holding SSE
streams open, exhausting FDs and burning memory. A bounded queue with
clean backpressure lets the caller (tabs-api) handle overload gracefully
rather than waiting forever.
Behavior
- PILO_MAX_CONCURRENT_TASKS (default 10). Read lazily so tests can override at runtime; production sets once at startup.
- The slot is released in the finally of the SSE stream callback / WS task execution block.
- SSE: HTTP 429 with a Retry-After: 30 header and the structured error shape (code: "AT_CAPACITY", reason: "AT_CAPACITY", phase: "setup", taskId).
- WS: error event with the same structured payload. The connection stays open so the client can retry without reconnecting.
- The taskRunning check fires before the capacity check, so misconfigured requests don't burn slots.
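For reference, the acquire/release semantics described above could be sketched roughly like this. The exported names mirror this PR's Changes list, but the internals (env parsing, fallback handling) are assumptions, not the actual implementation:

```typescript
// Hypothetical sketch of the in-process concurrency counter; names match
// the PR's module surface, details are illustrative.
let inflight = 0;

const DEFAULT_MAX = 10;

// Read the limit lazily on every acquire so tests can override the env
// var at runtime; production sets it once at startup.
export function getMaxConcurrent(): number {
  const raw = process.env.PILO_MAX_CONCURRENT_TASKS;
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_MAX;
}

export function tryAcquire(): boolean {
  if (inflight >= getMaxConcurrent()) return false;
  inflight += 1;
  return true;
}

export function release(): void {
  // Clamp at zero so a double release can't go negative.
  inflight = Math.max(0, inflight - 1);
}

export function getInflight(): number {
  return inflight;
}

// Test-only helper to reset state between cases.
export function _resetInflight(): void {
  inflight = 0;
}
```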
Changes
- packages/server/src/concurrency.ts - new module: tryAcquire, release, getInflight, getMaxConcurrent, plus _resetInflight for tests
- packages/server/src/taskRunner.ts - add AT_CAPACITY to the ErrorReason enum and REASON_HINTS map
- packages/server/src/routes/pilo.ts - acquire after validation, return 429 with Retry-After if at capacity, release in stream finally
- packages/server/src/routes/piloWs.ts - acquire after validation, send AT_CAPACITY error event if at capacity, release in finally
- test/concurrency.test.ts (11): tryAcquire/release semantics, env var parsing, default/fallback handling
- routes/pilo.test.ts (+2): 429 response shape, slot release after successful task
- routes/piloWs.test.ts (+3): AT_CAPACITY error + no runTask call, slot release after success, slot release after thrown error
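The acquire-after-validation / release-in-finally pattern used by both routes can be illustrated framework-agnostically. This is a sketch, not the actual route code: the `CapacityGate` interface, `handleTask` helper, and response shape are invented for illustration, though the 429 body fields follow the structured error shape described above:

```typescript
// Hypothetical handler skeleton illustrating the acquire/release pattern.
interface CapacityGate {
  tryAcquire(): boolean;
  release(): void;
}

interface TaskResponse {
  status: number;
  headers: Record<string, string>;
  body: unknown;
}

async function handleTask(
  gate: CapacityGate,
  taskId: string,
  runTask: () => Promise<unknown>,
): Promise<TaskResponse> {
  // Validation runs before this point, so misconfigured requests are
  // rejected without ever consuming a slot.
  if (!gate.tryAcquire()) {
    return {
      status: 429,
      headers: { "Retry-After": "30" },
      body: { code: "AT_CAPACITY", reason: "AT_CAPACITY", phase: "setup", taskId },
    };
  }
  try {
    const result = await runTask();
    return { status: 200, headers: {}, body: result };
  } finally {
    // finally frees the slot on success and on thrown error alike.
    gate.release();
  }
}
```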
Out of scope (explicit)
- pilo.inflight_tasks gauge metric - mentioned in the original plan but deferred to Stack C2's metrics bridge, where we'll already be setting up the OTel metrics SDK pattern. Adding it here in isolation would duplicate that work.
- /health exposure of inflight count - dropped per offline scoping decision (we're not changing the health endpoint in Stack B).
- Cross-replica limiting - the counter is in-process and assumes a single-instance deployment. Multi-replica would need Redis-backed tokens or a load-balancer-level queue depth check; track separately.
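If the multi-replica concern raised in review becomes pressing, one common shape for the Redis-backed alternative is a shared counter: INCR, then roll back with DECR if the increment overshot the limit. A hedged sketch against a minimal client interface (the key name, interface, and function names are assumptions, not a committed design; real code would use a client such as ioredis):

```typescript
// Minimal slice of a Redis-like client so the sketch stays self-contained.
interface CounterClient {
  incr(key: string): Promise<number>;
  decr(key: string): Promise<number>;
}

const KEY = "pilo:inflight_tasks"; // hypothetical key name

// Atomically bump the shared counter; if that pushed us past the limit,
// roll back and report "at capacity". This caps concurrency across all
// replicas because the counter lives in Redis, not in process memory.
export async function tryAcquireShared(
  client: CounterClient,
  maxConcurrent: number,
): Promise<boolean> {
  const inflight = await client.incr(KEY);
  if (inflight > maxConcurrent) {
    await client.decr(KEY);
    return false;
  }
  return true;
}

export async function releaseShared(client: CounterClient): Promise<void> {
  await client.decr(KEY);
}
```

One caveat with this pattern: a crashed replica leaks its slots, so production use would pair the counter with a TTL or periodic reconciliation.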
Test plan
- pnpm --filter pilo-server run test (129 passing: 114 prior + 15 new)
- pnpm --filter pilo-core run test (still green)
- pnpm run typecheck green
- pnpm run format:check green

Notes

- Branch targets travis/pilo-sentry-scrubber (stacks on B2: scrub Sentry events and add per-request scope tags #403)
- The default of 10 matches typical browserless instance sizing; override via PILO_MAX_CONCURRENT_TASKS if your deployment has different headroom