feat(retrier): per-attempt error tracking, early stop, and init guards#16
feat(retrier): per-attempt error tracking, early stop, and init guards#16
Conversation
• Track errors per attempt (populate Errors.Attempts); remove shared r.err usage. • Return early when a non-temporary error is encountered (when a temporary-error list/registry is provided). • Wrap timeouts with ErrTimeoutReached and include attempt context. • Add ensureInitialized/ensureContext; default registry, logger, timer pool; support manual initialization. • Tests: cover early-stop behavior, timeouts, and manual init (e.g., TestDo_NonTemporaryStopsEarly, TestDo_ManualRetrierInitialization). • Docs: add PRD for Retrier Hardening; update README and examples (temporary error registry usage). BREAKING CHANGE: Errors.Retries renamed to Errors.Attempts. Update usages and expectations: • Use Errors.Attempts for attempt traces. • Register temporary errors via retrier.Registry.RegisterTemporaryError. • Expect timeout errors as ErrTimeoutReached and max-retries errors wrapped by ErrMaxRetriesReached.
…temp error policy - MaxRetries now defined as “retries after the first attempt”; total attempts = MaxRetries + 1. - Validation permits MaxRetries >= 0 and updates the error message accordingly. - Default behavior when temporaryErrors is omitted: - If a registry is configured, retry only registry-identified temporary errors. - If the registry is empty, treat all errors as temporary and retry. - Update retrier loop/logic, option docs, tests, README, and PRD to reflect the above. Refs: options.go, retrier.go, tests/retrier_test.go, README.md, PRD.md
There was a problem hiding this comment.
Pull request overview
This pull request hardens the go-again retrier implementation by fixing concurrency issues, improving error handling, and adding support for manually constructed Retrier instances. The PR introduces per-attempt error tracking, early stopping on non-temporary errors, and proper timeout error wrapping.
Changes:
- Removed shared mutable state (
r.errfield) to enable safe concurrent calls toDo - Added initialization guards (
ensureInitialized,ensureContext) to support manual Retrier construction withoutNewRetrier - Implemented early return when non-temporary errors are encountered with a temporary error list provided
- Wrapped timeout errors with
ErrTimeoutReachedforerrors.Iscompatibility
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| retrier.go | Core implementation changes: removed shared error state, added per-call error tracking, implemented initialization guards, improved timeout error wrapping |
| tests/retrier_test.go | Added tests for early-stop behavior, timeout error detection, and manual Retrier initialization |
| README.md | Updated API documentation to reflect Errors.Attempts field rename and corrected registry usage examples |
| PRD.md | Added Product Requirements Document describing the hardening goals and implementation approach |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
PRD.md
Outdated
| arly on non-temporary errors when a list is provided. | ||
|
|
||
| arly on non-temporary errors when a list is provided. | ||
|
|
||
| - Update README snippets and API notes to match current code. |
There was a problem hiding this comment.
The "Rollout" section contains duplicated and incomplete text fragments. The section should be properly completed or removed if not needed. The text "arly on non-temporary errors when a list is provided" appears to be a copy-paste error from the "Proposed Changes" section.
| arly on non-temporary errors when a list is provided. | |
| arly on non-temporary errors when a list is provided. | |
| - Update README snippets and API notes to match current code. | |
| - Release changes as a minor version (no breaking API changes). | |
| - Coordinate with dependent services to update the `go-again` dependency and run regression tests. | |
| - Monitor retry behavior, error rates, and timeout frequency after rollout. | |
| - Be prepared to roll back to the previous version if regressions in retry or timeout handling are observed. |
• Track errors per attempt (populate Errors.Attempts); remove shared r.err usage. • Return early when a non-temporary error is encountered (when a temporary-error list/registry is provided). • Wrap timeouts with ErrTimeoutReached and include attempt context. • Add ensureInitialized/ensureContext; default registry, logger, timer pool; support manual initialization. • Tests: cover early-stop behavior, timeouts, and manual init (e.g., TestDo_NonTemporaryStopsEarly, TestDo_ManualRetrierInitialization). • Docs: add PRD for Retrier Hardening; update README and examples (temporary error registry usage).
BREAKING CHANGE: Errors.Retries renamed to Errors.Attempts. Update usages and expectations: • Use Errors.Attempts for attempt traces.
• Register temporary errors via retrier.Registry.RegisterTemporaryError. • Expect timeout errors as ErrTimeoutReached and max-retries errors wrapped by ErrMaxRetriesReached.