
B3: concurrency limit with 429 backpressure#404

Open
MrTravisB wants to merge 1 commit into travis/pilo-sentry-scrubber from travis/pilo-concurrency-limit

Conversation

@MrTravisB (Collaborator)

Summary

Third PR in Stack B (server observability + capacity protection). Stacked
on #403 (B2).

Caps how many tasks can run concurrently across all SSE and WebSocket
connections. When at the limit, new requests get an immediate
AT_CAPACITY rejection (HTTP 429 with Retry-After for SSE; structured
error event for WS) instead of queuing indefinitely on the server and
tying up file descriptors and memory.

Why

Today, the server accepts unbounded concurrent tasks. If the AI provider
slows down or browser pool is saturated, requests pile up holding SSE
streams open, exhausting FDs and burning memory. A bounded queue with
clean backpressure lets the caller (tabs-api) handle overload gracefully
rather than waiting forever.

Behavior

  • New env var: PILO_MAX_CONCURRENT_TASKS (default 10). Read lazily so
    tests can override at runtime; production sets once at startup.
  • Each accepted task increments an in-process counter; release happens in
    the finally of the SSE stream callback / WS task execution block.
  • When at capacity:
    • SSE: HTTP 429 with Retry-After: 30 header and the structured
      error shape (code: "AT_CAPACITY", reason: "AT_CAPACITY",
      phase: "setup", taskId).
    • WS: an error event with the same structured payload. Connection
      stays open so the client can retry without reconnecting.
  • Slot is released on every termination path: success, error, abort.
  • Validation errors and the per-WS-connection taskRunning check fire
    before the capacity check, so misconfigured requests don't burn slots.
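The ordering above (validation first, then the capacity check, with release in `finally`) can be sketched as a framework-neutral handler. This is a hedged illustration, not the PR's actual route code: `handleTask`, `runTask`, and the `VALIDATION_ERROR` code are stand-ins, while the 429 status, `Retry-After: 30`, and the structured error fields come from the description above.

```typescript
// Stand-in for the in-process counter from concurrency.ts.
let inflight = 0;
const limit = () => Number(process.env.PILO_MAX_CONCURRENT_TASKS ?? 10);

type SseResult =
  | { status: 200; body: string }
  | { status: 400; body: { code: string; taskId: string } }
  | {
      status: 429;
      headers: { "Retry-After": "30" };
      body: { code: "AT_CAPACITY"; reason: "AT_CAPACITY"; phase: "setup"; taskId: string };
    };

async function handleTask(taskId: string, runTask: () => Promise<string>): Promise<SseResult> {
  // 1. Validation fires first, so malformed requests never consume a slot.
  if (!taskId) return { status: 400, body: { code: "VALIDATION_ERROR", taskId } };

  // 2. Capacity check: reject immediately instead of queuing indefinitely.
  if (inflight >= limit()) {
    return {
      status: 429,
      headers: { "Retry-After": "30" },
      body: { code: "AT_CAPACITY", reason: "AT_CAPACITY", phase: "setup", taskId },
    };
  }

  // 3. Slot held for the task's lifetime; released on every termination path.
  inflight += 1;
  try {
    return { status: 200, body: await runTask() };
  } finally {
    inflight -= 1;
  }
}
```

The WS path differs only in how the rejection is delivered: the same structured payload goes out as an error event and the connection stays open.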

Changes

  • packages/server/src/concurrency.ts - new module: tryAcquire,
    release, getInflight, getMaxConcurrent, plus _resetInflight for
    tests
  • packages/server/src/taskRunner.ts - add AT_CAPACITY to the
    ErrorReason enum and REASON_HINTS map
  • packages/server/src/routes/pilo.ts - acquire after validation, return
    429 with Retry-After if at capacity, release in stream finally
  • packages/server/src/routes/piloWs.ts - acquire after validation,
    send AT_CAPACITY error event if at capacity, release in finally
  • 16 new tests:
    • test/concurrency.test.ts (11): tryAcquire/release semantics, env
      var parsing, default/fallback handling
    • routes/pilo.test.ts (+2): 429 response shape, slot release after
      successful task
    • routes/piloWs.test.ts (+3): AT_CAPACITY error + no runTask call,
      slot release after success, slot release after thrown error

Out of scope (explicit)

  • OTel pilo.inflight_tasks gauge metric - mentioned in the original
    plan but deferred to Stack C2's metrics bridge, where we'll already be
    setting up the OTel metrics SDK pattern. Adding it here in isolation
    would duplicate that work.
  • /health exposure of inflight count - dropped per offline scoping
    decision (we're not changing the health endpoint in Stack B).
  • Multi-instance concurrency - the counter is in-process. Fine for
    single-instance deployment. Multi-replica would need Redis-backed
    tokens or a load-balancer-level queue depth check; track separately.

Test plan

  • pnpm --filter pilo-server run test (130 passing: 114 prior + 16 new)
  • pnpm --filter pilo-core run test (still green)
  • pnpm run typecheck green
  • pnpm run format:check green

Notes

  • Base: travis/pilo-sentry-scrubber (stacks on B2: scrub Sentry events and add per-request scope tags #403)
  • Default limit 10 matches typical browserless instance sizing; override
    via PILO_MAX_CONCURRENT_TASKS if your deployment has different headroom
  • Completes Stack B. Stack C (core observability) is next


@lmorchard left a comment:


We run multiple replicas in production for Pilo server, so I think in-process won't quite work there?

Comment on lines +13 to +15
* In-process only — fine for the current single-instance deployment. If we
* ever scale to multiple replicas, replace with a Redis-backed token bucket
* or rely on the load balancer's queue depth.

Hmm, we might need this sooner than later for Pilo server production: looks like we currently run at least 3 replicas there and a max of 30. So, this would only cap concurrency per replica?
