
feat: add rate limit handling with automatic retry for OpenRouter API #14

Open
armand0e wants to merge 3 commits into `main` from `feat/adaptive-rate-limit-gating`

@armand0e
Contributor

Add global rate-limit state tracking per model/endpoint and implement retry logic with exponential backoff for 429 responses. Parse the `x-ratelimit-reset` and `x-ratelimit-remaining` headers to coordinate wait times across concurrent requests. Retry up to 5 times, with delays calculated from the reset timestamp, falling back to exponential backoff (1s, 2s, 4s, etc.) when the headers can't be parsed. The change is non-invasive and non-breaking.
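The per-attempt delay calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the helper name `computeRetryDelayMs`, the `HeaderLike` type, and the reading of `x-ratelimit-reset` as a Unix timestamp (treated as seconds when below 10^12, milliseconds otherwise) are all hypothetical. The fallback uses a doubling 1s, 2s, 4s, ... schedule.

```typescript
// Hypothetical helper: how long to wait before retrying a 429 response.
// Assumption: x-ratelimit-reset carries a Unix timestamp; values below 1e12
// are treated as seconds, larger values as milliseconds.
type HeaderLike = { get(name: string): string | null | undefined };

const BASE_BACKOFF_MS = 1000;

function computeRetryDelayMs(
  headers: HeaderLike,
  attempt: number, // 0-based retry attempt
  nowMs: number = Date.now(),
): number {
  const raw = headers.get("x-ratelimit-reset");
  const reset = raw == null ? NaN : Number(raw);

  if (Number.isFinite(reset) && reset > 0) {
    // Header parsed: wait until the reset moment (never a negative delay).
    const resetMs = reset < 1e12 ? reset * 1000 : reset;
    return Math.max(0, resetMs - nowMs);
  }

  // Header missing or unparseable: exponential backoff 1s, 2s, 4s, 8s, 16s, ...
  return BASE_BACKOFF_MS * 2 ** attempt;
}
```

A `fetch` response's `headers` object satisfies `HeaderLike`, so it can be passed directly; `attempt` is the number of retries already made.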

All tests passed:

```
✔ parseArgs requires model and prompts (1.0002ms)
✔ parseArgs defaults store-system to true (0.0923ms)
✔ parseArgs defaults concurrent to 1 (0.0497ms)
✔ parseArgs parses --concurrent (0.065ms)
✔ parseArgs parses OpenRouter provider flags (1.4609ms)
✔ parseArgs parses --reasoningEffort (0.079ms)
✔ parseArgs parses --openrouter.isFree (0.0592ms)
✔ parseArgs supports --config YAML (10.1642ms)
✔ parseArgs lets CLI override config (4.4964ms)
✔ buildRequestMessages omits system when empty (0.1571ms)
✔ buildOutputMessages respects storeSystem flag (0.1172ms)
✔ formatAssistantContent wraps reasoning in <think> (0.0488ms)
✔ callOpenRouter sends correct payload and parses reasoning (0.3448ms)
✔ callOpenRouter includes provider prefs when provided (0.1282ms)
✔ callOpenRouter includes reasoning.effort when provided (0.0767ms)
✔ callOpenRouter reasoning.effort works for non-OpenRouter apiBase (0.0646ms)
✔ ensureReadableFile throws if missing or not a file (1.9111ms)
ℹ tests 17
ℹ suites 0
ℹ pass 17
ℹ fail 0
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 75.0296
```

Add global rate-limit state tracking per model/endpoint and implement retry logic with exponential backoff for 429 responses. Parse the `x-ratelimit-reset` and `x-ratelimit-remaining` headers to coordinate wait times across concurrent requests. Retry up to 5 times, with delays calculated from reset timestamps or exponential backoff (1s, 2s, 4s, etc.). Track rate-limit state in a `Map` keyed by `apiBase|model` to prevent redundant requests when
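A possible shape for shared rate-limit state keyed by `apiBase|model`, combined with a bounded retry loop, is sketched below. It is an illustration under assumptions rather than the PR's implementation: `withRateLimitRetry`, the `send` callback, the `RateLimitedResponse` shape, and the injectable `wait`/`now` functions are hypothetical, and `x-ratelimit-reset` is assumed to be a Unix timestamp in milliseconds. Every caller for the same key consults a single `blockedUntil` entry, so a limit observed by one request also delays the others.

```typescript
// Hypothetical sketch: shared per-endpoint gate plus retry on HTTP 429.
// Assumptions: `send` performs one request; x-ratelimit-reset is a Unix
// timestamp in milliseconds. Names here are illustrative, not the PR's.
type HeaderLike = { get(name: string): string | null | undefined };

interface RateLimitedResponse {
  status: number;
  headers: HeaderLike;
}

const MAX_RETRIES = 5;
const BASE_BACKOFF_MS = 1000;

// Earliest time (ms since epoch) each `apiBase|model` key may send again.
// Module-level, so all concurrent calls for the same key share it.
const blockedUntil = new Map<string, number>();

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

function resetTimeMs(headers: HeaderLike): number | undefined {
  const reset = Number(headers.get("x-ratelimit-reset"));
  return Number.isFinite(reset) && reset > 0 ? reset : undefined;
}

async function withRateLimitRetry<T extends RateLimitedResponse>(
  apiBase: string,
  model: string,
  send: () => Promise<T>,
  wait: (ms: number) => Promise<void> = sleep,
  now: () => number = Date.now,
): Promise<T> {
  const key = `${apiBase}|${model}`;
  for (let attempt = 0; ; attempt++) {
    // Respect any deadline set by this or another in-flight request.
    const t = now();
    const gate = blockedUntil.get(key) ?? 0;
    if (gate > t) await wait(gate - t);

    const res = await send();
    const reset = resetTimeMs(res.headers);

    if (res.status !== 429) {
      // Quota used up on a non-429 response: make later callers wait too.
      if (res.headers.get("x-ratelimit-remaining") === "0" && reset !== undefined) {
        blockedUntil.set(key, Math.max(blockedUntil.get(key) ?? 0, reset));
      }
      return res;
    }
    if (attempt >= MAX_RETRIES) return res; // out of retries: surface the 429

    // Prefer a future reset time; otherwise back off 1s, 2s, 4s, 8s, 16s.
    const current = now();
    const until =
      reset !== undefined && reset > current
        ? reset
        : current + BASE_BACKOFF_MS * 2 ** attempt;
    blockedUntil.set(key, Math.max(blockedUntil.get(key) ?? 0, until));
  }
}
```

Storing a deadline per key, rather than a per-request delay, is what lets concurrent requests coordinate: whichever request learns about the reset first pushes the deadline forward, and the others wait for it instead of sending requests that would also be rejected. Injecting `wait` and `now` keeps the loop testable without real timers.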
@armand0e
Contributor Author

Don't merge yet. I'll be bulletproofing this later today so that this is the final PR for this functionality.

armand0e marked this pull request as draft February 13, 2026 21:26
armand0e marked this pull request as ready for review February 16, 2026 07:36