feat: add dataset selector operator #4364

Open
aglinxinyuan wants to merge 9 commits into main from xinyuan-dataset-selector


@aglinxinyuan (Contributor) commented Apr 11, 2026

What changes were proposed in this PR?

This PR adds a new Dataset Selector operator that lets users select a dataset version from the property panel and outputs one tuple per file path in that version. The emitted values follow Texera’s existing dataset file path format, so downstream operators can consume them directly. On the frontend, this PR adds a dedicated dataset-version selector field to the property panel and binds datasetVersionPath to that custom UI.
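
The behaviour described above can be modelled in a few lines. The sketch below is in TypeScript for illustration only; the operator itself is implemented in Scala. `emitFilePathTuples`, the `FileTuple` type, the sample values, and the `<datasetVersionPath>/<relativePath>` layout are all assumptions rather than code or a format taken from this PR. Only the `filename` output attribute and the `datasetVersionPath` property name come from the PR.

```typescript
// Hypothetical tuple shape: the PR's output schema has a single `filename` attribute.
interface FileTuple {
  filename: string;
}

// Emits one tuple per file in the selected dataset version.
// ASSUMPTION: a dataset file path is `<datasetVersionPath>/<relativePath>`;
// the real format is whatever Texera's dataset file paths already use.
function emitFilePathTuples(datasetVersionPath: string, relativePaths: string[]): FileTuple[] {
  const base = datasetVersionPath.replace(/\/+$/, ""); // drop trailing slashes
  return relativePaths.map(rel => ({
    filename: `${base}/${rel.replace(/^\/+/, "")}`, // drop leading slashes
  }));
}

// Example with made-up values.
const tuples = emitFilePathTuples("/alice@example.com/sales/v1", ["a.csv", "raw/b.csv"]);
console.log(tuples.map(t => t.filename));
```

Keeping each emitted value a complete path is what would let a downstream operator consume it without knowing which dataset version was selected.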


Any related issues, documentation, discussions?

Closes #4363.

How was this PR tested?

Tested manually, and a test case was added.
The test covers the dataset selector descriptor metadata and output schema.

Was this PR authored or co-authored using generative AI tooling?

No.

Copilot AI review requested due to automatic review settings April 11, 2026 00:04
@github-actions github-actions bot added the frontend and common labels Apr 11, 2026
@chenlica chenlica requested review from kunwp1 and removed request for Copilot April 11, 2026 00:06
@aglinxinyuan aglinxinyuan self-assigned this Apr 11, 2026
@aglinxinyuan aglinxinyuan removed the request for review from Xiao-zhen-Liu April 11, 2026 00:09
Copilot AI review requested due to automatic review settings April 11, 2026 00:12
Copilot AI (Contributor) left a comment

Pull request overview

Adds a new Dataset Selector source operator that lets users pick a dataset version in the property panel and emits one tuple per file path in that version, using Texera’s dataset path format.

Changes:

  • Backend: introduce DatasetSelectorSourceOpDesc + DatasetSelectorSourceOpExec and register the operator type.
  • Frontend: add a custom Formly field (datasetversionselector) to select dataset + version and bind it to datasetVersionPath.
  • Tests/assets: add a basic descriptor/schema unit test and an operator icon.
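
The frontend change in the list above amounts to routing one property key to a custom field type. A minimal sketch of that mapping, assuming a simplified field shape: `FieldLike` and `mapDatasetVersionField` are hypothetical names standing in for Formly's field config and the PR's actual mapping code, while `datasetVersionPath` and `datasetversionselector` are the identifiers named in this PR.

```typescript
// Hypothetical stand-in for the parts of a Formly field config used here.
interface FieldLike {
  key?: string;
  type?: string;
}

// Route the `datasetVersionPath` property to the custom `datasetversionselector`
// field type; every other field is returned unchanged.
function mapDatasetVersionField(field: FieldLike): FieldLike {
  if (field.key === "datasetVersionPath") {
    return { ...field, type: "datasetversionselector" };
  }
  return field;
}

// Example: a plain string field for datasetVersionPath gets the custom type.
console.log(mapDatasetVersionField({ key: "datasetVersionPath", type: "string" }));
```

Matching on the property key keeps the custom UI scoped to this one property, so other string fields keep their default controls.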

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| frontend/src/assets/operator_images/DatasetSelector.png | Adds an icon for the new operator. |
| frontend/src/app/workspace/component/property-editor/operator-property-edit-frame/operator-property-edit-frame.component.ts | Maps datasetVersionPath to a custom Formly control type. |
| frontend/src/app/workspace/component/dataset-version-selector/dataset-version-selector.component.ts | Implements dataset+version selector logic and writes back datasetVersionPath. |
| frontend/src/app/workspace/component/dataset-version-selector/dataset-version-selector.component.html | UI for dataset and version dropdowns. |
| frontend/src/app/common/formly/formly-config.ts | Registers the new Formly field type datasetversionselector. |
| frontend/src/app/app.module.ts | Declares the new selector component. |
| common/workflow-operator/src/test/scala/.../DatasetSelectorSourceOpDescSpec.scala | Adds unit tests for descriptor metadata and output schema. |
| common/workflow-operator/src/main/scala/.../DatasetSelectorSourceOpExec.scala | Implements tuple production by resolving dataset version and listing objects. |
| common/workflow-operator/src/main/scala/.../DatasetSelectorSourceOpDesc.scala | Defines operator metadata + output schema (filename). |
| common/workflow-operator/src/main/scala/.../LogicalOp.scala | Registers DatasetSelector in the operator type list. |


Labels

common, frontend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a Dataset Selector operator that outputs dataset file paths

2 participants