Exponential backoff for reconciler requests #822
Can you add unit tests and an integration test to validate that the backoff is working as expected?
d905b3d to 04db094
04db094 to f95c3f6
	reason: reason,
}
// Retry errors caused by Lattice APIs which need high requeueAfter seconds
func NewRetryError() *RequeueNeededAfter {
Are these the non-retriable errors?
These are retriable, for example when the target group status is "create in progress". But the default exponential backoff starts at 5ms, which might be too aggressive for the Lattice APIs. That's why I kept it at 10 seconds.
* exponential backoff for reconciler requests
* fix typo in comments
---------
Co-authored-by: vbedi <vbedi@amazon.com>
…off to prevent delay in reconciliation
What type of PR is this?
Cleanup and Fix
Which issue does this PR fix:
Fix HandleReconcileError function to enable exponential backoff. In case of transient errors from Lattice APIs, keep a static 10-second delay.
What does this PR do / Why do we need it:
According to the docs
RequeueResult if the returned error is non-nil RetryError, but since RetryError is just an alias for the error type, all error types satisfy that if condition, which leads to a 20-second requeue delay for all errors. Refactored the NewRetryError function to return the RequeueNeededAfter error type instead.
Replaced RetryErr, which created an error with the Lattice_Retry error message, with a NewRetryError function call, which returns a custom error type to fix type comparisons.
If an issue # is not available please add repro steps and logs from aws-gateway-controller showing the issue:
#462
Testing done on this change:
make presubmit and make e2e-test run successfully.
Automation added to e2e:
N/A
Will this PR introduce any new dependencies?:
No
Will this break upgrades or downgrades. Has updating a running cluster been tested?:
No
Does this PR introduce any user-facing change?:
No
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.