Prevent duplicate lifecycle bucket continuation processing #2700

delthas wants to merge 1 commit into development/9.2
Conversation
This commit adds:

- Idempotence on the lifecycle bucket processor producer, ensuring at-most-once producing on that topic. Enabling idempotence is safe: it has been enabled by default since Kafka 3.x [1], and a study by Apache showed no significant performance impact [2].
- Committing bucket entries that are being processed before pushing continuation entries. Previously, we could create continuation entries first, then fail (due to e.g. Kafka network issues) and fail/skip committing offsets. This moves committing before pushing continuation entries, so that in the worst case we now stop processing the bucket rather than push a duplicate.

[1]: https://cwiki.apache.org/confluence/display/KAFKA/KIP-679%3A+Producer+will+enable+the+strongest+delivery+guarantee+by+default
[2]: https://cwiki.apache.org/confluence/display/KAFKA/An+analysis+of+the+impact+of+max.in.flight.requests.per.connection+and+acks+on+Producer+performance

Issue: BB-723

Signed-off-by: Thomas Flament <thomas.flament@scality.com>
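A minimal sketch of the intended ordering (hypothetical names throughout; `consumer.onEntryCommittable()` is the only call taken from the actual diff):

```js
// Hypothetical sketch of the commit-before-produce ordering; not the
// actual Backbeat code changed by this PR.
async function processBucketEntry(consumer, producer, entry, continuations) {
    // Mark the consumed entry committable *before* producing continuation
    // entries. Worst case on a failure below: bucket processing stops early
    // and is retried later, instead of the same continuation being produced
    // twice.
    if (entry) {
        consumer.onEntryCommittable(entry);
    }
    for (const message of continuations) {
        // With an idempotent producer, the client's internal retries cannot
        // duplicate this message on the topic.
        await producer.send(message);
    }
}
```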
Hello delthas, my role is to assist you with the merge of this pull request. Status report is not available.
Codecov Report

❌ Patch coverage is 60.00%. Your patch status has failed because the patch coverage (60.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

... and 6 files with indirect coverage changes
```diff
@@               Coverage Diff                @@
##           development/9.2    #2700   +/-   ##
===================================================
+ Coverage            74.72%   74.74%   +0.02%
===================================================
  Files                  200      200
  Lines                13488    13492       +4
===================================================
+ Hits                 10079    10085       +6
+ Misses                3399     3397       -2
  Partials                10       10
```
Flags with carried forward coverage won't be shown.
Pull request overview
This PR aims to prevent duplicate lifecycle bucket continuation processing by enabling idempotence on the Kafka producer and committing bucket entries before creating continuation entries.
Changes:
- Adds idempotence configuration option to BackbeatProducer with conditional requiredAcks validation (see the sketch after this list)
- Enables idempotent producer for lifecycle bucket processor to prevent duplicate continuations
- Modifies processBucketEntry to commit entries before processing to avoid reprocessing and duplicate continuations
- Updates all test files to pass the new entry parameter (as null in test contexts)
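As a rough illustration of the producer-side items above, here is what enabling idempotence looks like at the node-rdkafka level (a sketch under the assumption that BackbeatProducer wraps node-rdkafka; the option names are librdkafka's, not BackbeatProducer's):

```js
const Kafka = require('node-rdkafka');

// Sketch only: raw librdkafka configuration, not the BackbeatProducer
// option plumbing added by this PR.
const producer = new Kafka.Producer({
    'metadata.broker.list': 'localhost:9092',
    // Broker-side deduplication of the client's internal retries.
    'enable.idempotence': true,
}, {
    // Idempotence requires acks=all, which is why the schema further down
    // forces requiredAcks to -1 when idempotent is true.
    'request.required.acks': -1,
});

producer.connect();
```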
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| lib/BackbeatProducer.js | Adds idempotence configuration support with conditional requiredAcks enforcement |
| extensions/lifecycle/bucketProcessor/LifecycleBucketProcessor.js | Enables idempotent producer and passes consumer/entry to task scheduler |
| extensions/lifecycle/tasks/LifecycleTask.js | Adds entry parameter and commits before processing to prevent duplicate continuations |
| tests/unit/lifecycle/LifecycleTask.spec.js | Updates test calls to include null entry parameter |
| tests/functional/lifecycle/LifecycleTaskV2.js | Updates test calls to include null entry parameter |
| tests/functional/lifecycle/LifecycleTaskV2-versioned.js | Updates test calls to include null entry parameter |
| tests/functional/lifecycle/LifecycleTask.js | Updates test calls to include null entry parameter |
```js
// Commit now to avoid re-processing this entry after pushing continuation entries below.
if (entry) {
    this.consumer.onEntryCommittable(entry);
}
```
- Committing early means we may stop (if interrupted) without having processed the entry.
- Backbeat is designed to process multiple entries at the same time (typically 10-20 for lifecycle, but possibly 1000s for bucket notification); calling `this.consumer.onEntryCommittable()` does not actually commit (since commits must be done in order), it only marks the entry in Backbeat. So many reasons can still delay the commit.
- I don't remember how it works precisely, but AFAIR there are 2 modes of using the Backbeat client: either letting it handle the commits once processing is complete (as it used to do for lifecycle), or manually calling `onEntryCommittable()`. So if you start to call it, other changes must be made to have something consistent.

(btw, the issue we have is not about some occasional duplicated message, which is fine since processing is idempotent, but really about massive duplication: where we end up with 10 times or more the exact same message in the Kafka log)
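To illustrate the in-order constraint mentioned above, a generic watermark sketch (not Backbeat's actual consumer internals): entries may finish out of order, but only the longest contiguous prefix of finished offsets is safe to commit.

```js
// Generic in-order commit tracking; illustrative only.
class OffsetTracker {
    constructor(firstOffset) {
        this.nextToCommit = firstOffset; // lowest offset not yet finished
        this.finished = new Set();       // finished offsets above the prefix
    }

    // Called when one entry finishes processing (cf. onEntryCommittable()).
    markFinished(offset) {
        this.finished.add(offset);
        // Advance the watermark across the contiguous finished prefix only.
        while (this.finished.has(this.nextToCommit)) {
            this.finished.delete(this.nextToCommit);
            this.nextToCommit += 1;
        }
        return this.nextToCommit; // commits may only advance up to here
    }
}

// Entries 11 and 12 finish before 10: nothing new is committable until 10 is.
const tracker = new OffsetTracker(10);
console.log(tracker.markFinished(11)); // 10
console.log(tracker.markFinished(12)); // 10
console.log(tracker.markFinished(10)); // 13 (offsets 10..12 now committable)
```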
```js
    is: true,
    then: joi.valid(-1).failover(-1),
}),
idempotent: joi.boolean().default(false),
```
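If the surrounding lines define `requiredAcks` with a joi `when` on `idempotent` (only the four lines above are from the actual diff; the rest is a guess), the conditional would behave like this:

```js
const joi = require('joi');

// Hypothetical reconstruction around the diff snippet above.
const schema = joi.object({
    requiredAcks: joi.number().when('idempotent', {
        is: true,
        then: joi.valid(-1).failover(-1),
    }),
    idempotent: joi.boolean().default(false),
});

// With idempotence enabled, any other acks value fails valid(-1) and falls
// over to -1 (acks=all), which the idempotent producer requires.
const { value } = schema.validate({ idempotent: true, requiredAcks: 1 });
console.log(value); // { idempotent: true, requiredAcks: -1 }
```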
What does the Kafka idempotent producer protect against? The "client code" (e.g. Backbeat) producing the same message twice, or just the Kafka library & network duplicating a message which was sent just once by the app (Backbeat)?
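For context, a sketch of the distinction under standard Kafka semantics (topic name illustrative, reusing the hypothetical node-rdkafka producer from the earlier sketch): broker-side idempotence dedupes the client library's internal retries of a single send, keyed by producer id and sequence number; it does not dedupe two separate produce calls made by the application.

```js
const payload = Buffer.from(JSON.stringify({ bucket: 'example-bucket' }));

// Protected: if the broker's ack is lost and librdkafka retries this request
// internally, the broker dedupes it; exactly one message is still appended.
producer.produce('lifecycle-bucket-tasks', null, payload);

// NOT protected: a second produce() from the application gets a fresh
// sequence number, so a second message is appended even with idempotence on.
// App-level duplicates need the commit-ordering fix (or transactions).
producer.produce('lifecycle-bucket-tasks', null, payload);
```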
As far as performance impact,
Closing for now, as the root cause is still unclear. We will investigate further by adding more logs before changing how Backbeat works.