Durable emitter - error handling#2061
Conversation
|
👋 tarcisiozf, thanks for creating this pull request! To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team. Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks! |
|
There was a problem hiding this comment.
Pull request overview
This PR attempts to add partial PublishBatch error handling for the durable emitter by enabling chipingress partial-success responses and recording failed event batches in the durable store.
Changes:
- Adds
MarkFailedBatchto the durable event store interface and metrics wrapper. - Updates typed batch publishing to request partial success and process per-event errors.
- Points the root module at the local
pkg/chipingressmodule.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
pkg/beholder/durable_event_store.go |
Adds failure-recording API support and metrics instrumentation. |
pkg/beholder/durable_emitter.go |
Enables typed PublishBatch partial-success handling and records partial failures. |
pkg/beholder/durable_emitter_test.go |
Adds a no-op MarkFailedBatch implementation to the in-memory test store. |
go.mod |
Adds a local replace for the chipingress module. |
go.sum |
Removes chipingress module checksums now hidden by the local replace. |
Comments suppressed due to low confidence (2)
pkg/beholder/durable_emitter.go:804
- After recording per-event failures, this path still returns nil, so
flushBatchtreats the whole PublishBatch as successful and callsMarkDeliveredBatchfor every ID in the batch. Any events present inresponse.Resultswith errors will be marked delivered/deleted instead of remaining pending for retransmit, which defeats the failure tracking added here.
if len(groupByError) > 0 {
if err := d.reportPartialFailures(ctx, groupByError); err != nil {
return fmt.Errorf("publish errors reporting failed: %w", err)
}
}
return nil
pkg/beholder/durable_emitter.go:795
- If the server returns an errored result for an event ID that is not in the batch, continuing here lets
flushBatchmark the entire batch delivered even though the response could not be reconciled to stored rows. Treat an unknown erroredevent_idas a publish failure for the batch so the original rows remain pending instead of risking data loss.
id, exists := mapEventIDtoTableId[result.EventId]
if !exists {
d.log.Warnw("PublishBatch returned invalid event", "event_id", result.EventId)
continue
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Requires
Supports