feat(retry): add exponential backoff for transient API failures#91
feat(retry): add exponential backoff for transient API failures#91qorexdev wants to merge 2 commits intolinearis-oss:mainfrom
Conversation
retry on 429 and 5xx, up to 3 attempts with 500ms base delay. applies to GraphQLClient.request() which is the shared entry point for all API calls. Closes linearis-oss#62
There was a problem hiding this comment.
Pull request overview
Adds a shared retry utility with exponential backoff and wires it into the GraphQL request path to improve resilience against transient API failures (rate limiting / server errors), along with unit tests for the retry helper.
Changes:
- Introduces
isRetryable()andwithRetry()insrc/common/retry.ts(exponential backoff, configurable retries/delay). - Wraps
GraphQLClient.request()withwithRetry()to retry failed requests. - Adds unit tests for
isRetryableandwithRetrybehavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/unit/common/retry.test.ts | Adds unit coverage for retryability detection and retry loop behavior. |
| src/common/retry.ts | Implements retry classification and exponential backoff retry wrapper. |
| src/client/graphql-client.ts | Applies retry wrapper to GraphQL requests and keeps existing auth/error mapping behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/client/graphql-client.ts
Outdated
| return withRetry(async () => { | ||
| try { | ||
| const response = await Promise.race([ | ||
| this.rawClient.rawRequest(print(document), variables), | ||
| new Promise<never>((_, reject) => | ||
| setTimeout( | ||
| () => reject(new Error("Request timed out")), | ||
| REQUEST_TIMEOUT_MS, | ||
| ), | ||
| ), | ||
| ), | ||
| ]); | ||
| return response.data as TResult; | ||
| } catch (error: unknown) { | ||
| const gqlError = error as GraphQLErrorResponse; | ||
| const errorMessage = gqlError.response?.errors?.[0]?.message ?? ""; | ||
| ]); | ||
| return response.data as TResult; | ||
| } catch (error: unknown) { | ||
| const gqlError = error as GraphQLErrorResponse; | ||
| const errorMessage = gqlError.response?.errors?.[0]?.message ?? ""; | ||
|
|
||
| if (isAuthError(new Error(errorMessage))) { | ||
| throw new AuthenticationError(errorMessage || undefined); | ||
| } | ||
| if (isAuthError(new Error(errorMessage))) { | ||
| throw new AuthenticationError(errorMessage || undefined); | ||
| } | ||
|
|
||
| if (errorMessage) { | ||
| throw new Error(errorMessage); | ||
| if (errorMessage) { | ||
| throw new Error(errorMessage); | ||
| } | ||
| throw new Error( | ||
| `GraphQL request failed: ${error instanceof Error ? error.message : String(error)}`, | ||
| ); |
There was a problem hiding this comment.
withRetry wraps a function that catches the original SDK error and rethrows a newly created Error (or AuthenticationError). That loses the original error shape (e.g., response.status), so isRetryable() will never see 429/5xx and retries won’t trigger for the main intended cases. Consider moving withRetry to wrap only the rawRequest/timeout call and do the auth/error-message mapping after retries are exhausted, or preserve/rethrow the original error (with status) for retryable cases and only translate it once you’re done retrying.
src/client/graphql-client.ts
Outdated
| return withRetry(async () => { | ||
| try { | ||
| const response = await Promise.race([ | ||
| this.rawClient.rawRequest(print(document), variables), | ||
| new Promise<never>((_, reject) => | ||
| setTimeout( | ||
| () => reject(new Error("Request timed out")), | ||
| REQUEST_TIMEOUT_MS, | ||
| ), | ||
| ), | ||
| ), | ||
| ]); | ||
| return response.data as TResult; | ||
| } catch (error: unknown) { | ||
| const gqlError = error as GraphQLErrorResponse; | ||
| const errorMessage = gqlError.response?.errors?.[0]?.message ?? ""; | ||
| ]); | ||
| return response.data as TResult; | ||
| } catch (error: unknown) { |
There was a problem hiding this comment.
No unit test currently asserts that GraphQLClient.request() actually retries on transient failures (429/5xx) and stops retrying on non-retryable errors. Adding a test that mocks rawRequest to fail with { response: { status: 503 } } once (or up to max) and then succeed would catch the current integration issue and prevent regressions; using fake timers would keep the test fast despite backoff delays.
| describe("withRetry", () => { | ||
| it("returns result on first success", async () => { | ||
| const fn = vi.fn().mockResolvedValueOnce("ok"); | ||
| const result = await withRetry(fn, { baseDelayMs: 1 }); | ||
| expect(result).toBe("ok"); | ||
| expect(fn).toHaveBeenCalledTimes(1); | ||
| }); | ||
|
|
||
| it("retries on 429 and succeeds", async () => { | ||
| const fn = vi | ||
| .fn() | ||
| .mockRejectedValueOnce({ response: { status: 429 } }) | ||
| .mockResolvedValueOnce("ok"); | ||
| const result = await withRetry(fn, { maxRetries: 3, baseDelayMs: 1 }); | ||
| expect(result).toBe("ok"); | ||
| expect(fn).toHaveBeenCalledTimes(2); | ||
| }); | ||
|
|
||
| it("retries on 503 up to maxRetries then throws", async () => { | ||
| const err = { response: { status: 503 } }; | ||
| const fn = vi.fn().mockRejectedValue(err); | ||
| await expect( | ||
| withRetry(fn, { maxRetries: 2, baseDelayMs: 1 }), | ||
| ).rejects.toEqual(err); | ||
| expect(fn).toHaveBeenCalledTimes(3); // 1 initial + 2 retries |
There was a problem hiding this comment.
The tests for withRetry validate retry counts, but they don’t verify the exponential backoff timing (e.g., 500ms → 1s → 2s). Since backoff is a core requirement of this PR, consider adding a test with fake timers that asserts the scheduled delays (or at least the sequence of setTimeout durations) to ensure future changes don’t accidentally make the backoff linear or constant.
iamfj
left a comment
There was a problem hiding this comment.
Hey @qorexdev — nice work on the retry utility itself. isRetryable and withRetry are clean, well-typed, and the test suite covers the important cases. The exponential backoff math is correct and the code is easy to follow. Good stuff.
I found one issue that unfortunately breaks the core feature, though. See the inline comment on graphql-client.ts for the full explanation, but the short version: withRetry currently wraps the entire method body including the error handler, so by the time isRetryable() inspects the error, the HTTP status code has been stripped. 429 and 5xx errors are never actually retried.
The fix is to wrap only the raw HTTP call in withRetry and let the error handler run after retries are exhausted. Once that's in place, I'd also suggest adding a test in graphql-client.test.ts that mocks a 429→success sequence to prove the integration works end-to-end.
Small thing: the issue (#62) also mentions FileService — totally fine to leave that for a follow-up, just noting it's still open.
Looking forward to the next iteration — this is close!
src/client/graphql-client.ts
Outdated
| setTimeout( | ||
| () => reject(new Error("Request timed out")), | ||
| REQUEST_TIMEOUT_MS, | ||
| return withRetry(async () => { |
There was a problem hiding this comment.
This wraps the error handler inside withRetry, which means the original error's response.status gets stripped before isRetryable() ever sees it.
When the API returns a 429, the catch block below transforms it into a plain Error("GraphQL request failed: ...") — no more .response.status for isRetryable() to check. So it returns false and the error is thrown immediately without retrying.
To fix, wrap only the raw call so the error handler runs after retries are exhausted:
try {
const response = await withRetry(() =>
Promise.race([
this.rawClient.rawRequest(print(document), variables),
new Promise<never>((_, reject) =>
setTimeout(
() => reject(new Error("Request timed out")),
REQUEST_TIMEOUT_MS,
),
),
])
);
return response.data as TResult;
} catch (error: unknown) {
// existing error handling here — runs after retries exhausted
...
}This way isRetryable() sees the original SDK error with response.status intact, and the error handler only runs once we've given up retrying.
I'd also recommend adding a test in graphql-client.test.ts that mocks rawRequest to reject with { response: { status: 429 } } on the first call and resolve on the second — that would have caught this.
|
good catch — the try-catch was eating the http status before isRetryable() could see it. moved withRetry to wrap only the rawRequest call, so the original error propagates to isRetryable on retryable failures, and the error handler only runs after retries are exhausted. also added a test that mocks 429→success to prove the integration works end-to-end. |
closes #62
adds
withRetryinsrc/common/retry.tsand applies it toGraphQLClient.request(). retries on 429 and 5xx up to 3 times with 500ms base delay (doubles each attempt). non-retryable errors (auth 401, 404, etc.) throw immediately.12 new tests covering
isRetryableandwithRetrybehavior.