Skip to content

[FLINK-39588][flink-table][connector/datagen] Migrate the DataGen table connector off deprecated source APIs#28083

Open
Dennis-Mircea wants to merge 1 commit intoapache:masterfrom
Dennis-Mircea:FLINK-39588
Open

[FLINK-39588][flink-table][connector/datagen] Migrate the DataGen table connector off deprecated source APIs#28083
Dennis-Mircea wants to merge 1 commit intoapache:masterfrom
Dennis-Mircea:FLINK-39588

Conversation

@Dennis-Mircea
Copy link
Copy Markdown

What is the purpose of the change

This pull request migrates the built-in 'connector' = 'datagen' table source (DataGenTableSource in flink-table-api-java-bridge) off the long-deprecated SourceFunctionProvider and the legacy SourceFunction-based DataGeneratorSource / RandomGenerator / SequenceGenerator hierarchy, onto the FLIP-27 source stack (Source API + SourceProvider).

These were the last remaining usages of those deprecated APIs from a built-in table connector and contributed a sizeable chunk of the frozen entries in flink-architecture-tests-production. User-visible behavior of the connector is preserved: same DDL options, same value distributions, same parallelism, same rows-per-second rate limiting, and same termination semantics for bounded sequence fields.

Brief change log

  • Replaced SourceFunctionProvider with SourceProvider.of(Source, parallelism) in DataGenTableSource.
  • Switched the underlying source to FLIP-27 org.apache.flink.connector.datagen.source.DataGeneratorSource, driven by a composite GeneratorFunction<Long, RowData> and RateLimiterStrategy.perSecond(rowsPerSecond).
  • Introduced new RandomGeneratorFunction<T> / SequenceGeneratorFunction<T> types implementing GeneratorFunction<Long, T>, replacing the legacy RandomGenerator / SequenceGenerator hierarchies. Migrated RowDataGenerator, DataGeneratorMapper, DecimalDataRandomGenerator, DataGeneratorContainer, and the random/sequence visitors accordingly.
  • Sequence checkpoint state ownership moved from per-generator ListState<Long> deques to the NumberSequenceSource driving the FLIP-27 source. The legacy "halt when any sequence range is exhausted" semantic is preserved via a source-level effective count (min(sequenceFieldsTotalCount) when number-of-rows is unset).
  • Added flink-connector-datagen as a dependency of flink-table-api-java-bridge.
  • Removed the now-stale entries from flink-architecture-tests-production/archunit-violations/.

Verifying this change

This change added tests and can be verified as follows:

  • DataGenTableSourceFactoryTest rewritten to drive the generator function directly via DataGenTableSource#buildRowGenerator() + computeEffectiveCount() (no MiniCluster needed). All 14 existing factory + per-field behavior assertions are retained.
  • New test testSequenceProducesEachValueExactlyOnce asserts every value in the requested [start, end] range is produced exactly once.
  • DecimalDataRandomGeneratorTest updated for the new map(Long) entry point.
  • Architecture tests in flink-architecture-tests-production pass with the legacy-API violation entries removed from the frozen list.
  • Existing ITCases that exercise 'connector' = 'datagen' (e.g. DataGeneratorConnectorITCase, SqlGatewayServiceITCase, AbstractMaterializedTableStatementITCase, OpenAIChatModelTest, ProcessTableFunctionTest) continue to pass unchanged.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes (flink-table-api-java-bridge now depends on flink-connector-datagen)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: [GitHub Copilot]

@flinkbot
Copy link
Copy Markdown
Collaborator

flinkbot commented May 1, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants