feat(retrier): add context-aware retries via DoWithContext#17
Conversation
Introduce RetryableFuncWithContext and DoWithContext to support cancellation and deadlines using context.Context. Implement supporting helpers for attempt handling and backoff that respect context (e.g., shouldRetry, waitRetry, handleAttemptError, runAttemptWithContext, ensureInitialized). Update README with guidance and examples for long-running operations using DoWithContext. Expand PRD with acceptance criteria and test plan for context-aware retries. Add TestDoWithContext_Cancel to verify cancellation propagates (expects context.Canceled). No breaking changes; existing Do() behavior remains unchanged.
There was a problem hiding this comment.
Pull request overview
This PR introduces context-aware retry functionality by adding a new DoWithContext method that accepts functions observing context cancellation. The implementation refactors the existing Do method to extract reusable helper functions (shouldRetry, waitRetry, handleAttemptError) and adds new infrastructure (runAttemptWithContext, attemptOutcome type) to support concurrent execution with proper cancellation handling.
Changes:
- Added
RetryableFuncWithContexttype andDoWithContextmethod for context-aware retries - Refactored
Domethod to use extracted helper functions - Updated documentation in README and PRD to explain when to use context-aware retries
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| retrier.go | Added DoWithContext method, RetryableFuncWithContext type, and helper methods (shouldRetry, waitRetry, handleAttemptError, runAttemptWithContext, ensureInitialized); refactored Do to use helpers |
| tests/retrier_test.go | Added TestDoWithContext_Cancel to verify context cancellation behavior |
| README.md | Added example and guidance for using DoWithContext with long-running operations |
| PRD.md | Updated goals, acceptance criteria, and test plan to include context-aware retry support |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (r *Retrier) runAttemptWithContext( | ||
| ctx context.Context, | ||
| timeoutTimer *time.Timer, | ||
| retryableFunc RetryableFuncWithContext, | ||
| ) (attemptErr, terminalErr error) { | ||
| attemptCtx, attemptCancel := context.WithCancelCause(ctx) | ||
| resultCh := make(chan error, 1) | ||
|
|
||
| go func() { | ||
| resultCh <- retryableFunc(attemptCtx) | ||
| }() | ||
|
|
||
| select { | ||
| case err := <-resultCh: | ||
| attemptCancel(nil) | ||
|
|
||
| return err, nil | ||
| case <-ctx.Done(): | ||
| attemptCancel(context.Cause(ctx)) | ||
|
|
||
| return nil, ewrap.Wrap(ctx.Err(), "context done") | ||
| case <-r.ctx.Done(): | ||
| attemptCancel(ErrOperationStopped) | ||
| //nolint:contextcheck | ||
| return nil, ewrap.Wrap(context.Cause(r.ctx), "retrier context done") | ||
| case <-timeoutTimer.C: | ||
| attemptCancel(ErrTimeoutReached) | ||
|
|
||
| return nil, ErrTimeoutReached | ||
| } | ||
| } |
There was a problem hiding this comment.
The goroutine launched in runAttemptWithContext may leak if the select statement exits through the ctx.Done(), r.ctx.Done(), or timeoutTimer.C cases. After calling attemptCancel, the function returns immediately without waiting for the goroutine to complete. The goroutine will continue executing retryableFunc(attemptCtx) and attempt to send to resultCh, but no receiver will be reading from it. While the channel is buffered with size 1, this prevents immediate blocking, but the goroutine itself is not guaranteed to terminate promptly if retryableFunc is long-running or doesn't observe cancellation. Consider adding a mechanism to ensure the goroutine completes, or document that retryableFunc must observe context cancellation to avoid goroutine leaks.
| func TestDoWithContext_Cancel(t *testing.T) { | ||
| r, err := again.NewRetrier( | ||
| context.Background(), | ||
| again.WithMaxRetries(1), | ||
| again.WithInterval(10*time.Millisecond), | ||
| again.WithTimeout(1*time.Second), | ||
| ) | ||
| if err != nil { | ||
| t.Fatalf("failed to create retrier: %v", err) | ||
| } | ||
|
|
||
| ctx, cancel := context.WithCancel(context.Background()) | ||
|
|
||
| errs := r.DoWithContext(ctx, func(ctx context.Context) error { | ||
| cancel() | ||
| <-ctx.Done() | ||
|
|
||
| return ctx.Err() | ||
| }) | ||
|
|
||
| if errs.Last == nil || !errors.Is(errs.Last, context.Canceled) { | ||
| t.Errorf("expected last error to match %v, got %v", context.Canceled, errs.Last) | ||
| } | ||
| } |
There was a problem hiding this comment.
This test has a race condition. The function calls cancel() and then waits for ctx.Done() before returning ctx.Err(). However, in runAttemptWithContext, there's a race between the function completing (case err := <-resultCh) and the context being detected as done (case <-ctx.Done()). Depending on which case wins, the error handling path differs: if resultCh wins, context.Canceled is treated as attemptErr and goes through shouldRetry logic; if ctx.Done() wins, it's returned as a terminalErr. This makes the test behavior non-deterministic. Consider using a more controlled scenario, such as a longer-running operation or explicit synchronization to ensure the desired execution path is consistently tested.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Introduce RetryableFuncWithContext and DoWithContext to support cancellation and deadlines using context.Context. Implement supporting helpers for attempt handling and backoff that respect context (e.g., shouldRetry, waitRetry, handleAttemptError, runAttemptWithContext, ensureInitialized). Update README with guidance and examples for long-running operations using DoWithContext. Expand PRD with acceptance criteria and test plan for context-aware retries. Add TestDoWithContext_Cancel to verify cancellation propagates (expects context.Canceled).
No breaking changes; existing Do() behavior remains unchanged.