
feat: add per-program concurrency limits with backpressure#44

Closed
isaacbmiller wants to merge 1 commit into main from isaac/perf-backpressure

Conversation

@isaacbmiller
Contributor

Split from #38 (5/5)

Changes

Add an asyncio.Semaphore per program that limits concurrent in-flight requests. When the limit is exceeded, new requests queue for up to 30 seconds, then receive a 429 Too Many Requests response.

Configuration

  • server.max_concurrent_per_program in dspy.config.yaml (default: 20)
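As an illustration, the setting might appear in dspy.config.yaml like this (only the key name and its default of 20 come from the PR; any surrounding keys are assumptions):

```yaml
server:
  # Maximum in-flight requests per program; excess requests queue
  # for up to 30s and then receive a 429. Default: 20.
  max_concurrent_per_program: 20
```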

Why this matters

Without backpressure, a burst of requests can spawn unlimited concurrent LLM calls, overwhelming the upstream provider's rate limits and causing cascading failures. The semaphore makes the concurrency limit explicit and configurable.

Files changed

  • src/dspy_cli/server/routes.py (20 insertions)

Testing

All 123 existing tests pass.

Add asyncio.Semaphore per program (default 20, configurable via
server.max_concurrent_per_program in config). Requests queue for
up to 30s then get 429 Too Many Requests. Prevents unbounded
concurrency from overwhelming upstream LLM providers.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83e9440d46


Comment on lines +86 to +88

```python
if program_name not in _program_semaphores:
    _program_semaphores[program_name] = asyncio.Semaphore(max_concurrent)
sem = _program_semaphores[program_name]
```


P2: Keep concurrency semaphores per app instance

Using the module-level _program_semaphores cache here makes semaphore state global across all create_app() calls in the same Python process, so apps with the same program_name end up sharing a single limiter and the first app’s max_concurrent_per_program setting is reused by later apps. This can produce unexpected 429s (cross-app backpressure) and ignores updated concurrency config in test/integration setups or embedded multi-app deployments; storing semaphores on app.state (or rebuilding them per app) avoids this bleed-through.
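The reviewer's suggestion can be sketched as follows. This is a hypothetical outline, not the PR's code: `SimpleNamespace` stands in for the real app object, and only the idea of keeping the semaphore map on `app.state` instead of at module level comes from the review comment.

```python
import asyncio
from types import SimpleNamespace


def create_app(max_concurrent: int = 20):
    """Stand-in for the real create_app(); only `state` matters for this sketch."""
    app = SimpleNamespace(state=SimpleNamespace())
    app.state.program_semaphores = {}  # per-app, not module-level
    app.state.max_concurrent = max_concurrent
    return app


def get_semaphore(app, program_name: str) -> asyncio.Semaphore:
    """Fetch or lazily create this app's semaphore for a given program."""
    sems = app.state.program_semaphores
    if program_name not in sems:
        sems[program_name] = asyncio.Semaphore(app.state.max_concurrent)
    return sems[program_name]
```

Because each app owns its own map, two apps created with different `max_concurrent_per_program` values get independent limiters for the same program name, avoiding the cross-app backpressure the reviewer describes.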


@isaacbmiller
Contributor Author

Dropping -- the semaphore overlaps with the bounded executor (#42) for sync modules, doesn't match how LLM providers actually rate-limit (tokens/min, not concurrent connections), and the wait_for(sem.acquire()) pattern has edge cases pre-3.11. DSPy/LiteLLM already handle provider 429s with exponential backoff. Real rate limiting belongs at the infra layer.
