feat: add per-program concurrency limits with backpressure #44

isaacbmiller wants to merge 1 commit into main
Conversation
Add asyncio.Semaphore per program (default 20, configurable via server.max_concurrent_per_program in config). Requests queue for up to 30s, then get 429 Too Many Requests. Prevents unbounded concurrency from overwhelming upstream LLM providers.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
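For context, here is a minimal sketch of the queue-then-429 pattern the description outlines, assuming a FastAPI app (the PR touches src/dspy_cli/server/routes.py). The endpoint path, the constants, and the `invoke_program` helper are hypothetical; `_program_semaphores` and `max_concurrent` mirror names visible in the diff below.

```python
import asyncio

from fastapi import FastAPI, HTTPException

app = FastAPI()

# One semaphore per program, created lazily on first request.
_program_semaphores: dict[str, asyncio.Semaphore] = {}

MAX_CONCURRENT = 20     # server.max_concurrent_per_program
QUEUE_TIMEOUT_S = 30.0  # how long a request may wait for a free slot


async def invoke_program(program_name: str) -> dict:
    """Stand-in for the actual DSPy program invocation."""
    await asyncio.sleep(0.1)
    return {"program": program_name, "status": "ok"}


@app.post("/programs/{program_name}")
async def run_program(program_name: str) -> dict:
    if program_name not in _program_semaphores:
        _program_semaphores[program_name] = asyncio.Semaphore(MAX_CONCURRENT)
    sem = _program_semaphores[program_name]

    try:
        # Queue for up to 30s waiting for a slot, then shed load with a 429.
        await asyncio.wait_for(sem.acquire(), timeout=QUEUE_TIMEOUT_S)
    except asyncio.TimeoutError:
        raise HTTPException(status_code=429, detail="Too Many Requests")
    try:
        return await invoke_program(program_name)
    finally:
        sem.release()
```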
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 83e9440d46
```python
if program_name not in _program_semaphores:
    _program_semaphores[program_name] = asyncio.Semaphore(max_concurrent)
sem = _program_semaphores[program_name]
```
Keep concurrency semaphores per app instance
Using the module-level _program_semaphores cache makes semaphore state global across all create_app() calls in the same Python process: apps that share a program_name end up sharing a single limiter, and the first app's max_concurrent_per_program setting is reused by later apps. This can produce unexpected 429s (cross-app backpressure) and ignores updated concurrency config in test/integration setups or embedded multi-app deployments. Storing semaphores on app.state (or rebuilding them per app) avoids this bleed-through, as in the sketch below.
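A short sketch of that suggestion, assuming a create_app() factory; the factory signature and route shape are illustrative, not the repo's actual code:

```python
import asyncio

from fastapi import FastAPI, HTTPException, Request


def create_app(max_concurrent: int = 20) -> FastAPI:
    app = FastAPI()
    # Per-instance state: no bleed-through between create_app() calls.
    app.state.program_semaphores = {}
    app.state.max_concurrent = max_concurrent

    @app.post("/programs/{program_name}")
    async def run_program(program_name: str, request: Request) -> dict:
        sems = request.app.state.program_semaphores
        if program_name not in sems:
            sems[program_name] = asyncio.Semaphore(request.app.state.max_concurrent)
        try:
            await asyncio.wait_for(sems[program_name].acquire(), timeout=30.0)
        except asyncio.TimeoutError:
            raise HTTPException(status_code=429, detail="Too Many Requests")
        try:
            return {"program": program_name}  # placeholder for the real call
        finally:
            sems[program_name].release()

    return app
```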
Dropping -- the semaphore overlaps with the bounded executor (#42) for sync modules, doesn't match how LLM providers actually rate-limit (tokens/min, not concurrent connections), and the wait_for(sem.acquire()) pattern has edge cases pre-3.11 (sketched below). DSPy/LiteLLM already handle provider 429s with exponential backoff; real rate limiting belongs at the infra layer.
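For reference, the comment doesn't spell out which edge case, so this is an inference: before 3.11, asyncio.wait_for had a known race (bpo-37658) where the awaited operation could complete just as the timeout fired, so wait_for raised TimeoutError while the semaphore permit stayed acquired, leaking a slot on each such timeout. On 3.11+ the usual replacement is asyncio.timeout():

```python
import asyncio


async def acquire_with_timeout(sem: asyncio.Semaphore, timeout: float) -> bool:
    """Return True if a permit was acquired within `timeout`, else False."""
    try:
        # asyncio.timeout (3.11+) cancels acquire() in place, letting the
        # Semaphore's own cancellation handling pass the permit to the
        # next waiter instead of leaking it.
        async with asyncio.timeout(timeout):
            await sem.acquire()
        return True
    except TimeoutError:
        return False
```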
Split from #38 (5/5)
Changes
Add an `asyncio.Semaphore` per program that limits concurrent in-flight requests. When the limit is exceeded, new requests queue for up to 30 seconds, then receive a 429 Too Many Requests response.

Configuration

`server.max_concurrent_per_program` in dspy.config.yaml (default: 20)
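For illustration, the setting would sit in dspy.config.yaml roughly like this (a sketch; no other keys are taken from the PR):

```yaml
server:
  max_concurrent_per_program: 20  # max in-flight requests per program
```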
Why this matters

Without backpressure, a burst of requests can spawn unlimited concurrent LLM calls, overwhelming the upstream provider's rate limits and causing cascading failures. The semaphore makes the concurrency limit explicit and configurable.
Files changed
`src/dspy_cli/server/routes.py` (20 insertions)

Testing
All 123 existing tests pass.