Skip to content

Flink: Add extensibility support to IcebergSink for downstream composition#15316

Closed
herbherbherb wants to merge 1 commit intoapache:mainfrom
herbherbherb:flink-sink-extensibility
Closed

Flink: Add extensibility support to IcebergSink for downstream composition#15316
herbherbherb wants to merge 1 commit intoapache:mainfrom
herbherbherb:flink-sink-extensibility

Conversation

@herbherbherb
Copy link
Copy Markdown

Summary

  • Add CommittableMetadata framework: a composition-based extension point allowing downstream to
    attach custom metadata to committables flowing through the sink pipeline
  • Widen access modifiers on sink classes to enable downstream connector implementations to compose
    with (reference, wrap, delegate to) Iceberg's existing sink infrastructure

Resolves #15315.

Changes

New files (3, in both v2.0 and v2.1)

  • CommittableMetadata.javaSerializable marker interface for custom metadata
  • CommittableMetadataSerializer.java — serializer interface for metadata
  • CommittableMetadataRegistry.java — global registry (downstream calls register() before
    sink creation)

Modified files (in both v2.0 and v2.1)

  • IcebergCommittable — public class, public accessors, optional @Nullable metadata field,
    backward-compatible constructor chaining
  • IcebergCommittableSerializer — writes boolean metadata flag + delegates serialization
  • IcebergCommitter — public class + constructor
  • IcebergSinkWriter — public class + constructor
  • IcebergWriteAggregator — public class + constructor
  • IcebergSinkBuilder — public interface
  • IcebergSink — protected constructor, protected getters, protected Builder()
  • CachingTableSupplier — public class + constructor
  • RowDataTaskWriterFactory — protected getters for spec/format/outputFileFactory
  • SinkUtil — public checkAndGetEqualityFieldIds()

Backward compatibility

  • All existing constructors still work (new IcebergCommittable constructor chains with
    metadata = null)
  • Serialization is backward-compatible (boolean flag distinguishes metadata presence)
  • No behavioral changes to any existing code path

Test plan

  • Existing TestIcebergCommitter passes
  • Existing TestIcebergSink* tests pass
  • Serialization backward compatibility: old serialized data (no metadata flag) still
    deserializes correctly

@bryanck
Copy link
Copy Markdown
Contributor

bryanck commented Feb 13, 2026

The general approach is first make your change to the current version (2.1), then once that is merged, cherry pick to the other supported versions (separate PRs for each version).

@herbherbherb
Copy link
Copy Markdown
Author

The general approach is first make your change to the current version (2.1), then once that is merged, cherry pick to the other supported versions (separate PRs for each version).

ack, thank you

…ition (apache#15315)

Add CommittableMetadata framework (marker interface, serializer, registry)
allowing downstream to attach custom metadata to committables. Widen access
modifiers on sink pipeline classes to enable downstream connector
implementations to compose with Iceberg's existing sink infrastructure.

All changes are additive with no behavioral changes to existing code paths.
@herbherbherb herbherbherb force-pushed the flink-sink-extensibility branch from 3e34caa to 1c53254 Compare February 13, 2026 23:10
@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 16, 2026
@herbherbherb
Copy link
Copy Markdown
Author

This PR still needs a review from the community please

@pvary
Copy link
Copy Markdown
Contributor

pvary commented Mar 16, 2026

There are currently multiple changes for the IcebergSink. I don't think we would like to expose the internals of the Sink as it would bind us to these contracts which we would like to change in the near future.

If you think this still would be a valuable addition, please prepare a proposal and start a dev list discussion around this.

Thanks,
Peter

@github-actions github-actions bot removed the stale label Mar 17, 2026
@herbherbherb
Copy link
Copy Markdown
Author

There are currently multiple changes for the IcebergSink. I don't think we would like to expose the internals of the Sink as it would bind us to these contracts which we would like to change in the near future.

If you think this still would be a valuable addition, please prepare a proposal and start a dev list discussion around this.

Thanks, Peter

Thank you Peter for the feedback. I'll close this PR.

I've broken the approach into three smaller, focused PRs that add plugin hooks rather than exposing sink internals or changing access modifiers. Each adds a single @FunctionalInterface callback set via the builder and no visibility changes to existing classes:

  1. OutputFileFactoryProvider — custom OutputFileFactory creation (Flink Sink V2: Add OutputFileFactoryProvider plugin interface #15763 /
    Flink Sink V2: Add OutputFileFactoryProvider plugin interface #15764)
  2. PostCommitHook — callback after successful Iceberg commit (Flink Sink V2: Add PostCommitHook plugin interface #15768 /
    Flink Sink V2: Add PostCommitHook plugin interface #15769)
  3. CommitGate — deferred commits with ListState buffering (Flink Sink V2: Add CommitGate plugin interface for deferred commits #15770 /
    Flink Sink V2: Add CommitGate plugin interface for deferred commits #15771)

All three are backward compatible with no behavioral change when the hooks are not set. Happy to start a dev list discussion if you'd prefer that before reviewing the individual PRs, thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flink: Add extensibility support to IcebergSink for downstream connector implementations

3 participants