fix(dataset edit tools): clarifying root argument usage + adding related features #3049
Merged
CarolinePascal merged 19 commits into main on Mar 3, 2026
Pull request overview
This PR clarifies and refactors the root argument usage in LeRobot dataset editing tools, following up on PR #3035. The main changes introduce a clearer distinction between input and output dataset paths by adding new_root and roots parameters.
Changes:
- Clarified `root` as the input dataset path (except for merge operations, where it's the output path)
- Added `new_root` parameter to specify the complete output dataset path
- Added `roots` parameter for merge operations to accept a list of input dataset paths
- Updated docstrings across dataset_tools.py to align with the new parameter semantics
- Changed default behavior to use in-place modification with backup instead of appending "_modified" suffix
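The new default (modify in place and keep a backup, unless an explicit output path is given) can be illustrated with a minimal sketch. The helper name `resolve_output` and the `_backup` suffix are hypothetical and chosen only for illustration; they are not taken from `lerobot_edit_dataset.py`, whose actual backup naming may differ.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_output(root: Path, new_root: Optional[Path] = None) -> Tuple[Path, Optional[Path]]:
    """Return (output_path, backup_path) for a single-dataset edit.

    Hypothetical helper: backup_path is None when the result is written
    to a different location than the input dataset.
    """
    root = Path(root)
    if new_root is not None and Path(new_root) != root:
        # Explicit output location: the input dataset is left untouched.
        return Path(new_root), None
    # In-place edit: the original would be preserved next to the dataset.
    # The "_backup" suffix is an assumption made for this sketch.
    return root, root.with_name(root.name + "_backup")


out, backup = resolve_output(Path("/data/my_dataset"))
print(out, backup)
```

With no `new_root`, the output path equals the input path and a sibling backup path is proposed; with a distinct `new_root`, no backup is needed because the input is never overwritten.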
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/lerobot/scripts/lerobot_edit_dataset.py | Added examples for new parameters, updated EditDatasetConfig with comments, refactored get_output_path to handle new_root, added roots support in merge operations, updated all handlers to use new parameter conventions |
| src/lerobot/datasets/dataset_tools.py | Updated docstrings for consistency with new parameter naming, changed default repo_id behavior from appending "_modified" to in-place modification, added repo_id parameter to split_dataset, made output_dir optional for convert_image_to_video_dataset |
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.
53aac97 to 4a105cc
4a105cc to a110ee0
s1lent4gnt
reviewed
Mar 3, 2026
LePao1 pushed a commit to LePao1/lerobot that referenced this pull request on Mar 7, 2026
…lated features (huggingface#3049)
* fix(root): adding proper support for the root and new_root arguments
* feat(roots): adding a roots agrument for the merge operation
* chore(clean): cleaning up code
* chore(doctrings): updating doctrings with new features
* fix(repo_id): setting repo_id to None when not needed
* fix(roots/repo_ids): making mypy happy by using repo_ids and roots for merge operation
* fix(path): fixing path related issues
* fix(repo_id): fixing issues related to repo_id
* chore(doctrings): updating docstrings + fix typo
* chore(clean): cleaning code
* fix(split new_repo_id): reverting new_repo_id addition for split operation
* docs(dosctrings): completing docstrings
* fix(repo_ids/roots): improving checks for repo_ids/roots lengths
* fix(repo_ids): making repo_ids optional in MergeConfig but raise if not given
* fix(docstrings): fixing docstrings for split operation
* fix(hints): updating get_output_path hints to accept paths as strings too
* fix(y/N prompts): removing y/N prompts in lerobot_edit_dataset
* fix(merge repo_id): fixing merge operation to use new_repo_id instead of repo_id
* fix(typo): fixing typo in doctrings
Type / Scope
Summary / Motivation
Following #3035, this PR clarifies the meaning of `root` in `lerobot-edit-dataset`:
- `root` describes the complete path to the input dataset, except for merge operations, where it defines the complete path to the output dataset
- `new_root` describes the complete path to the output dataset, except for split operations, where it defines the common path of all split datasets
- `roots` is introduced in merge operations to describe a list of complete paths to the datasets to be merged
Related issues
- `root`/`repo_id` path in `lerobot_dataset.py` #1057
- Deleting episodes by `lerobot_edit_dataset` fails when `root` is used #2316
- `root` argument description in LeRobotDataset class #3035
What changed
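As a rough illustration of the `root` / `new_root` / `roots` semantics stated in the summary, the sketch below maps each operation to its input and output paths. The function `interpret_paths`, the `DatasetPaths` tuple, and the operation name strings are hypothetical and are not the actual `lerobot_edit_dataset.py` code; the sketch also assumes `root` is always given, which the real tool may not require.

```python
from pathlib import Path
from typing import List, NamedTuple, Optional, Sequence


class DatasetPaths(NamedTuple):
    inputs: List[Path]  # dataset(s) read by the operation
    output: Path        # dataset written (for split: common base path)


def interpret_paths(
    operation: str,
    root: Optional[str] = None,
    new_root: Optional[str] = None,
    roots: Optional[Sequence[str]] = None,
) -> DatasetPaths:
    """Hypothetical mapping of CLI arguments to input/output paths."""
    if operation == "merge":
        # Merge: `roots` lists the inputs, `root` is the output.
        if not roots:
            raise ValueError("merge needs a non-empty `roots` list")
        if root is None:
            raise ValueError("this sketch requires `root` as the merge output")
        return DatasetPaths([Path(r) for r in roots], Path(root))

    # Every other operation: `root` is the input dataset.
    if root is None:
        raise ValueError("this sketch requires `root` as the input dataset")
    # `new_root` is the output (for split, the common path of all splits);
    # without it, the edit happens in place on `root`.
    output = Path(new_root) if new_root is not None else Path(root)
    return DatasetPaths([Path(root)], output)


print(interpret_paths("merge", root="/out/merged", roots=["/in/a", "/in/b"]))
print(interpret_paths("delete_episodes", root="/in/a", new_root="/out/a_clean"))
```

How the individual split datasets are laid out under the common `new_root` path is not described here, so the sketch leaves that step out.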
How was this tested (or how to run locally)
- `pytest tests/datasets/test_dataset_tools.py` is green
- Example: Check in `lerobot_edit_dataset.py`
Checklist (required before merge)
- `pre-commit run -a`
- `pytest`
Reviewer notes