feat: add per-program concurrency limits with backpressure #44

isaacbmiller wants to merge 1 commit into main
Conversation
Add asyncio.Semaphore per program (default 20, configurable via server.max_concurrent_per_program in config). Requests queue for up to 30s, then get 429 Too Many Requests. Prevents unbounded concurrency from overwhelming upstream LLM providers.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
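For context, here is a minimal sketch of the queue-then-429 pattern the description outlines, assuming a FastAPI app (the PR touches src/dspy_cli/server/routes.py). The endpoint path, the constants, and the `invoke_program` helper are hypothetical; `_program_semaphores` and `max_concurrent` mirror names visible in the diff below.

```python
import asyncio

from fastapi import FastAPI, HTTPException

app = FastAPI()

# One semaphore per program, created lazily on first request.
_program_semaphores: dict[str, asyncio.Semaphore] = {}

MAX_CONCURRENT = 20     # server.max_concurrent_per_program
QUEUE_TIMEOUT_S = 30.0  # how long a request may wait for a free slot


async def invoke_program(program_name: str) -> dict:
    """Stand-in for the actual DSPy program invocation."""
    await asyncio.sleep(0.1)
    return {"program": program_name, "status": "ok"}


@app.post("/programs/{program_name}")
async def run_program(program_name: str) -> dict:
    if program_name not in _program_semaphores:
        _program_semaphores[program_name] = asyncio.Semaphore(MAX_CONCURRENT)
    sem = _program_semaphores[program_name]

    try:
        # Queue for up to 30s waiting for a slot, then shed load with a 429.
        await asyncio.wait_for(sem.acquire(), timeout=QUEUE_TIMEOUT_S)
    except asyncio.TimeoutError:
        raise HTTPException(status_code=429, detail="Too Many Requests")
    try:
        return await invoke_program(program_name)
    finally:
        sem.release()
```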
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 83e9440d46
```python
if program_name not in _program_semaphores:
    _program_semaphores[program_name] = asyncio.Semaphore(max_concurrent)
sem = _program_semaphores[program_name]
```
Keep concurrency semaphores per app instance
Using the module-level _program_semaphores cache makes semaphore state global across all create_app() calls in the same Python process: apps that share a program_name end up sharing a single limiter, and the first app's max_concurrent_per_program setting is reused by later apps. This can produce unexpected 429s (cross-app backpressure) and ignores updated concurrency config in test/integration setups or embedded multi-app deployments. Storing semaphores on app.state (or rebuilding them per app) avoids this bleed-through, as in the sketch below.
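A short sketch of that suggestion, assuming a create_app() factory; the factory signature and route shape are illustrative, not the repo's actual code:

```python
import asyncio

from fastapi import FastAPI, HTTPException, Request


def create_app(max_concurrent: int = 20) -> FastAPI:
    app = FastAPI()
    # Per-instance state: no bleed-through between create_app() calls.
    app.state.program_semaphores = {}
    app.state.max_concurrent = max_concurrent

    @app.post("/programs/{program_name}")
    async def run_program(program_name: str, request: Request) -> dict:
        sems = request.app.state.program_semaphores
        if program_name not in sems:
            sems[program_name] = asyncio.Semaphore(request.app.state.max_concurrent)
        try:
            await asyncio.wait_for(sems[program_name].acquire(), timeout=30.0)
        except asyncio.TimeoutError:
            raise HTTPException(status_code=429, detail="Too Many Requests")
        try:
            return {"program": program_name}  # placeholder for the real call
        finally:
            sems[program_name].release()

    return app
```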
Dropping -- the semaphore overlaps with the bounded executor (#42) for sync modules, doesn't match how LLM providers actually rate-limit (tokens/min, not concurrent connections), and the wait_for(sem.acquire()) pattern has edge cases pre-3.11 (sketched below). DSPy/LiteLLM already handle provider 429s with exponential backoff; real rate limiting belongs at the infra layer.
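For reference, the comment doesn't spell out which edge case, so this is an inference: before 3.11, asyncio.wait_for had a known race (bpo-37658) where the awaited operation could complete just as the timeout fired, so wait_for raised TimeoutError while the semaphore permit stayed acquired, leaking a slot on each such timeout. On 3.11+ the usual replacement is asyncio.timeout():

```python
import asyncio


async def acquire_with_timeout(sem: asyncio.Semaphore, timeout: float) -> bool:
    """Return True if a permit was acquired within `timeout`, else False."""
    try:
        # asyncio.timeout (3.11+) cancels acquire() in place, letting the
        # Semaphore's own cancellation handling pass the permit to the
        # next waiter instead of leaking it.
        async with asyncio.timeout(timeout):
            await sem.acquire()
        return True
    except TimeoutError:
        return False
```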
Split from #38 (5/5)
Changes
Add an `asyncio.Semaphore` per program that limits concurrent in-flight requests. When the limit is exceeded, new requests queue for up to 30 seconds, then receive a 429 Too Many Requests response.

Configuration

`server.max_concurrent_per_program` in dspy.config.yaml (default: 20)
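For illustration, the setting would sit in dspy.config.yaml roughly like this (a sketch; no other keys are taken from the PR):

```yaml
server:
  max_concurrent_per_program: 20  # max in-flight requests per program
```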
Why this matters

Without backpressure, a burst of requests can spawn unlimited concurrent LLM calls, overwhelming the upstream provider's rate limits and causing cascading failures. The semaphore makes the concurrency limit explicit and configurable.
Files changed
`src/dspy_cli/server/routes.py` (20 insertions)

Testing
All 123 existing tests pass.