Skip to content

Flink: Add extensibility support to IcebergSink for downstream connector implementations #15315

@herbherbherb

Description

@herbherbherb

Feature Request / Improvement

Query Engine

Flink

Feature Request / Improvement

The Flink Sink implementation (IcebergSink and related classes) currently uses package-private visibility for most internal classes, constructors, and accessors. This makes it impractical for downstream connector implementations to compose custom sink pipelines that build on Iceberg's existing infrastructure.

Use Case / Motivation

Downstream Flink integrations often need to:

  1. Add custom metadata to committables — e.g. watermark information, lineage data, or
    application-specific tracking that flows from writers through to committers. Currently there is no
    extension point on IcebergCommittable for this.

  2. Compose custom sink topologies — e.g. wrapping IcebergSinkWriter or IcebergCommitter
    with custom metrics, logging, or error handling. Even composition (delegation/wrapping) requires
    being able to reference these types, which is impossible when they are package-private.

  3. Extend IcebergSink to override createWriter() or createCommitter() with custom
    implementations while reusing the base builder logic and data distribution.

Without these extension points, downstream projects must copy large portions of the sink code,
creating maintenance burden and version skew.

Proposed Changes

Part 1: CommittableMetadata framework (new composition-based extension point)

  • CommittableMetadata — marker interface for custom metadata on committables
  • CommittableMetadataSerializer — serializer interface for metadata
  • CommittableMetadataRegistry — global registry allowing downstream to register a serializer
  • IcebergCommittable — adds optional @Nullable metadata field (backward-compatible: existing
    constructors chain with metadata = null)
  • IcebergCommittableSerializer — writes boolean flag + delegates to registered serializer

This is a pure composition pattern — downstream registers a serializer, attaches metadata via the
existing constructors, and the pipeline carries it through transparently.

Part 2: Access modifier changes (enabling downstream composition)
Widen visibility of key sink pipeline classes (IcebergCommittable, IcebergCommitter,
IcebergSinkWriter, IcebergWriteAggregator, etc.) from package-private to public, and add
protected accessors on IcebergSink so that downstream implementations can reference, wrap, or
extend these types.

All changes are additive. No existing behavior changes. No new dependencies.

#15316

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time

Metadata

Metadata

Assignees

No one assigned

    Labels

    improvementPR that improves existing functionality

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions