fix(dataset edit tools): clarifying root argument usage + adding related features#3049

Merged
CarolinePascal merged 19 commits into main from fix/dataset-edit-root
Mar 3, 2026
Conversation

@CarolinePascal
Collaborator

@CarolinePascal CarolinePascal commented Feb 27, 2026

Type / Scope

  • Type: Chore/Feature
  • Scope: LeRobot Dataset editing tools

Summary / Motivation

Following #3035, this PR clarifies the meaning of root in lerobot-edit-dataset:

  • root is the complete path to the input dataset, except for merge operations, where it is the complete path to the output dataset
  • new_root is the complete path to the output dataset, except for split operations, where it is the common path of all split datasets
  • [NEW] roots is introduced for merge operations and holds the list of complete paths to the datasets to be merged
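
The rules above can be sketched as a small path resolver. This is only an illustration of the semantics described in this summary: the function `resolve_paths` and the operation names passed to it are hypothetical and are not taken from lerobot_edit_dataset.py, whose merged implementation may differ.

```python
from pathlib import Path


def resolve_paths(operation, root=None, new_root=None, roots=None):
    """Hypothetical helper: return (input_paths, output_path) per the rules above."""
    if operation == "merge":
        # merge: `roots` lists the input datasets, `root` is the output dataset.
        inputs = [Path(r) for r in (roots or [])]
        output = Path(root) if root is not None else None
        return inputs, output
    # Every other operation: `root` is the input dataset.
    inputs = [Path(root)] if root is not None else []
    # `new_root` is the output dataset; for split it is the common path
    # shared by all split datasets rather than a single dataset path.
    output = Path(new_root) if new_root is not None else None
    return inputs, output


# Usage: the same argument name plays a different role for merge.
merge_inputs, merge_output = resolve_paths(
    "merge", root="/data/merged", roots=["/data/a", "/data/b"]
)
edit_inputs, edit_output = resolve_paths(
    "delete_episodes", root="/data/a", new_root="/data/a_filtered"
)
```

Keeping the resolution in one place makes the merge exception explicit instead of spreading it across each operation handler.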

Related issues

What changed

How was this tested (or how to run locally)

pytest tests/datasets/test_dataset_tools.py is green

Example: see the usage examples in lerobot_edit_dataset.py

Checklist (required before merge)

  • Linting/formatting run (pre-commit run -a)
  • All tests pass locally (pytest)
  • Documentation updated
  • CI is green

Reviewer notes

  • Anything the reviewer should focus on (performance, edge-cases, specific files) or general notes.
  • Anyone in the community is free to review the PR.

Copilot AI review requested due to automatic review settings February 27, 2026 16:43
@github-actions github-actions Bot added the dataset Issues regarding data inputs, processing, or datasets label Feb 27, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR clarifies and refactors the root argument usage in LeRobot dataset editing tools, following up on PR #3035. The main changes introduce a clearer distinction between input and output dataset paths by adding new_root and roots parameters.

Changes:

  • Clarified root as the input dataset path (except for merge operations where it's the output path)
  • Added new_root parameter to specify the complete output dataset path
  • Added roots parameter for merge operations to accept a list of input dataset paths
  • Updated docstrings across dataset_tools.py to align with the new parameter semantics
  • Changed default behavior to use in-place modification with backup instead of appending "_modified" suffix
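
The last point, in-place modification with a backup, could look roughly like the sketch below. It is an assumption-laden illustration: the helper name `prepare_in_place_edit` and the `_old` backup suffix are hypothetical and are not confirmed by this PR.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_in_place_edit(root, new_root=None, backup_suffix="_old"):
    """Hypothetical helper: return (read_from, write_to) for an edit operation."""
    root = Path(root)
    if new_root is not None:
        # Explicit output path: the input dataset is left untouched.
        return root, Path(new_root)
    # In-place edit: move the original aside as a backup, then write to `root`.
    backup = root.with_name(root.name + backup_suffix)
    if backup.exists():
        raise FileExistsError(f"Backup already exists: {backup}")
    shutil.move(str(root), str(backup))
    return backup, root


# Usage: edit a throwaway dataset directory in place.
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "my_dataset"
    dataset.mkdir()
    read_from, write_to = prepare_in_place_edit(dataset)
    print(read_from.name, write_to.name)  # my_dataset_old my_dataset
```

Compared with appending "_modified" to the output name, this keeps the dataset at its original path while the untouched copy remains available under the backup name.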

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/lerobot/scripts/lerobot_edit_dataset.py | Added examples for new parameters, updated EditDatasetConfig with comments, refactored get_output_path to handle new_root, added roots support in merge operations, updated all handlers to use new parameter conventions |
| src/lerobot/datasets/dataset_tools.py | Updated docstrings for consistency with new parameter naming, changed default repo_id behavior from appending "_modified" to in-place modification, added repo_id parameter to split_dataset, made output_dir optional for convert_image_to_video_dataset |

Comment thread src/lerobot/scripts/lerobot_edit_dataset.py Outdated
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py
Comment thread src/lerobot/datasets/dataset_tools.py Outdated
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py Outdated
Comment thread src/lerobot/datasets/dataset_tools.py Outdated
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py Outdated
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py Outdated
Comment thread src/lerobot/datasets/dataset_tools.py Outdated
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

Comment thread src/lerobot/scripts/lerobot_edit_dataset.py
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py Outdated
Comment thread src/lerobot/datasets/dataset_tools.py Outdated
Comment thread src/lerobot/datasets/dataset_tools.py Outdated
Comment thread src/lerobot/datasets/dataset_tools.py Outdated
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py Outdated
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py Outdated
Comment thread src/lerobot/datasets/dataset_tools.py Outdated
Comment thread src/lerobot/datasets/dataset_tools.py Outdated
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py
@CarolinePascal CarolinePascal force-pushed the fix/dataset-edit-root branch from 53aac97 to 4a105cc Compare March 2, 2026 15:59
@CarolinePascal CarolinePascal force-pushed the fix/dataset-edit-root branch from 4a105cc to a110ee0 Compare March 2, 2026 16:06
Comment thread src/lerobot/datasets/dataset_tools.py Outdated
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py Outdated
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py Outdated
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py
Comment thread src/lerobot/datasets/dataset_tools.py
Comment thread src/lerobot/datasets/dataset_tools.py
@github-actions github-actions Bot added the tests Problems with test coverage, failures, or improvements to testing label Mar 3, 2026
Comment thread src/lerobot/scripts/lerobot_edit_dataset.py Outdated
Member

@s1lent4gnt s1lent4gnt left a comment

Great work!
LGTM!

@CarolinePascal CarolinePascal merged commit 63dca86 into main Mar 3, 2026
15 checks passed
@CarolinePascal CarolinePascal deleted the fix/dataset-edit-root branch March 3, 2026 14:40
LePao1 pushed a commit to LePao1/lerobot that referenced this pull request Mar 7, 2026
…lated features (huggingface#3049)

* fix(root): adding proper support for the root and new_root arguments

* feat(roots): adding a roots agrument for the merge operation

* chore(clean): cleaning up code

* chore(doctrings): updating doctrings with new features

* fix(repo_id): setting repo_id to None when not needed

* fix(roots/repo_ids): making mypy happy by using repo_ids and roots for merge operation

* fix(path): fixing path related issues

* fix(repo_id): fixing issues related to repo_id

* chore(doctrings): updating docstrings + fix typo

* chore(clean): cleaning code

* fix(split new_repo_id): reverting new_repo_id addition for split operation

* docs(dosctrings): completing docstrings

* fix(repo_ids/roots): improving checks for repo_ids/roots lengths

* fix(repo_ids): making repo_ids optional in MergeConfig but raise if not given

* fix(docstrings): fixing docstrings for split operation

* fix(hints): updating get_output_path hints to accept paths as strings too

* fix(y/N prompts): removing y/N prompts in lerobot_edit_dataset

* fix(merge repo_id): fixing merge operation to use new_repo_id instead of repo_id

* fix(typo): fixing typo in doctrings

Labels

  • dataset: Issues regarding data inputs, processing, or datasets
  • tests: Problems with test coverage, failures, or improvements to testing

3 participants