docs and sim: improve setup and validation workflows #559
Open
dcol91863 wants to merge 8 commits into NVIDIA:main from
Conversation
Adds a standalone, well-documented example demonstrating core GR00T policy usage without requiring GPU hardware. Perfect for users getting started with GR00T or integrating into custom projects.

Key features:
- Load pre-trained GR00T policy from HuggingFace Hub
- Prepare observations (images + proprioception)
- Get action predictions with error handling
- CPU-only execution for accessibility
- Comprehensive docstrings and step-by-step comments

Includes:
- inference_minimal.py: Annotated example code (158 lines)
- README.md: Complete usage guide (315 lines)
  * Quick start with installation steps
  * Detailed explanation of each phase
  * How to extend for real robots
  * Common issues and troubleshooting
  * Performance optimization tips
- requirements.txt: Core dependencies

This addresses common community questions:
1. 'How do I use GR00T policy for inference?'
2. 'Can I test without a GPU?'
3. 'How do I integrate GR00T into my project?'

Follows established patterns in examples/robocasa, examples/LIBERO, examples/SimplerEnv, ensuring consistency with existing codebase.

Signed-off-by: David <35550068+dcol91863@users.noreply.github.com>
Signed-off-by: David <35550068+dcol91863@users.noreply.github.com>
- Add validate_dataset.py: Comprehensive validation tool for GR00T LeRobot format datasets
  * Validates directory structure (meta, videos, data)
  * Checks metadata files (modality.json, episodes.jsonl, tasks.jsonl, info.json)
  * Verifies modality configuration integrity
  * Validates parquet file structure and content
  * Checks video file presence and naming conventions
  * Calculates dataset statistics (episodes, frames, size)
  * Provides detailed error and warning reports
- Add inspect_dataset.py: Detailed inspection and analysis tool
  * Provides comprehensive dataset structure overview
  * Analyzes metadata (episodes, tasks, info)
  * Inspects modality configuration in detail
  * Calculates data statistics (frames, size, file counts)
  * Infers embodiment type hints based on action/state dimensions
  * Supports JSON report export for documentation
- Add README_DATASET_TOOLS.md: Complete documentation
  * Usage instructions for both tools
  * Feature descriptions with examples
  * Common workflows and troubleshooting guide
  * Data format reference

These tools help users validate and understand their datasets before training, ensuring compliance with the GR00T LeRobot format specification.
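The directory and metadata checks listed above could be sketched roughly as follows. This is an illustrative sketch only, not the code in validate_dataset.py: the function name `check_dataset_layout` is hypothetical, and it assumes the four metadata files sit directly under `meta/`.

```python
from pathlib import Path

# Directory and file names are taken from the commit message above.
REQUIRED_DIRS = ("meta", "videos", "data")
REQUIRED_META_FILES = ("modality.json", "episodes.jsonl", "tasks.jsonl", "info.json")


def check_dataset_layout(root):
    """Return a list of human-readable problems; an empty list means the layout looks fine.

    Hypothetical helper for illustration, not the PR's implementation.
    """
    root = Path(root)
    problems = []
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")
    for name in REQUIRED_META_FILES:
        # Assumption: metadata files live directly under meta/.
        if not (root / "meta" / name).is_file():
            problems.append(f"missing metadata file: meta/{name}")
    return problems
```

Collecting problems in a list instead of raising on the first one matches the "detailed error and warning reports" goal: a user sees every missing piece in a single run.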
- Add embodiment_config_reference.py: Comprehensive tool for understanding robot embodiments
  * List all available embodiments (pre-trained and post-training)
  * Show detailed configuration for specific embodiments
  * Display state/action dimensions and modality keys
  * View action configurations (RELATIVE/ABSOLUTE, EEF/NON_EEF)
  * Generate configuration templates for new robots
  * Validate custom configuration files
  * Summary table with dimensions for all embodiments

Features:
  * --list: List all embodiments with configuration status
  * --all: Show summary table with state/action dims and video counts
  * --show <embodiment>: Display detailed configuration
  * --template <name>: Generate template for custom robot
  * --validate <file>: Validate configuration file syntax

This tool helps users:
- Understand what embodiments are available and their specs
- Debug configuration issues
- Create custom embodiment configurations
- Compare different robots' configurations

- Update README_DATASET_TOOLS.md with complete documentation
  * Usage examples for all commands
  * Sample outputs showing table format and details
  * Common workflows for different use cases
  * Integration with other dataset tools
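For readers who want a picture of how the five flags above fit together, here is a minimal argparse sketch. It is not the parser from embodiment_config_reference.py: the `build_parser` name is hypothetical, and treating the flags as mutually exclusive (exactly one per invocation) is an assumption, not something the commit message states.

```python
import argparse


def build_parser():
    """Build a parser exposing the flags named in the commit message (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="embodiment_config_reference.py",
        description="Inspect embodiment configurations (sketch).",
    )
    # Assumption: exactly one mode is selected per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--list", action="store_true",
                      help="list all embodiments with configuration status")
    mode.add_argument("--all", action="store_true",
                      help="show summary table with state/action dims and video counts")
    mode.add_argument("--show", metavar="EMBODIMENT",
                      help="display detailed configuration")
    mode.add_argument("--template", metavar="NAME",
                      help="generate template for a custom robot")
    mode.add_argument("--validate", metavar="FILE",
                      help="validate configuration file syntax")
    return parser


if __name__ == "__main__":
    # "my_robot" is a placeholder embodiment name, not a real GR00T tag.
    args = build_parser().parse_args(["--show", "my_robot"])
    print(args.show)
```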
- Add validate_training_config.py: Pre-training validation and optimization tool
  * Validate dataset structure and metadata completeness
  * Check embodiment configuration compatibility
  * Validate hyperparameter ranges and values
  * Estimate GPU memory requirements (model + optimizer + batch)
  * Suggest optimal hyperparameters based on dataset size
  * Calculate per-GPU batch sizes for distributed training
  * Provide GPU type recommendations

Features:
  * Dataset validation: directory structure, metadata files, episode count
  * Embodiment validation: tag existence, modality config availability
  * Hyperparameter validation: batch size, learning rate, warmup, weight decay
  * Memory analysis: Model (6GB) + Optimizer (12GB AdamW) + per-sample costs
  * Smart suggestions for batch size, learning rate, and max steps
  * Per-GPU batch size calculation for multi-GPU training

Helps users:
- Catch configuration errors before training
- Optimize batch size and learning rate for their dataset
- Estimate GPU memory needs
- Validate dataset before expensive training runs
- Get rapid feedback on configuration parameters

- Update README_DATASET_TOOLS.md with complete documentation
  * Full usage examples with different scenarios
  * Sample output showing all validation checks
  * GPU memory breakdown explanations
  * Common use cases and workflows
  * Integration with finetuning pipeline
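The memory analysis described above (model + optimizer + per-sample costs) amounts to simple arithmetic, sketched below. The 6 GB and 12 GB figures come from the commit message. Everything else is an assumption for illustration: the function names are hypothetical, the per-sample cost is left as a caller-supplied parameter because the commit message gives no value for it, and the model and optimizer state are assumed to be replicated on every GPU.

```python
import math

MODEL_GB = 6.0       # model weights, figure quoted in the commit message
OPTIMIZER_GB = 12.0  # AdamW optimizer state, figure quoted in the commit message


def per_gpu_batch_size(global_batch_size, num_gpus):
    """Split a global batch across GPUs, rounding up so no samples are dropped."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return math.ceil(global_batch_size / num_gpus)


def estimate_gpu_memory_gb(per_gpu_batch, per_sample_gb):
    """Rough per-GPU estimate: fixed costs plus a linear per-sample term.

    per_sample_gb is a hypothetical input; measure it for your own setup.
    """
    return MODEL_GB + OPTIMIZER_GB + per_gpu_batch * per_sample_gb
```

For example, a global batch of 32 on 4 GPUs gives 8 samples per GPU; with an assumed 0.5 GB per sample, the estimate is 6 + 12 + 8 × 0.5 = 22 GB per GPU.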
Signed-off-by: David <35550068+dcol91863@users.noreply.github.com>
Signed-off-by: David <35550068+dcol91863@users.noreply.github.com>
Author
Added a follow-up fix for #408 on this branch. What changed:
This is intended to make the custom-embodiment path actionable for users starting from current Hugging Face LeRobot datasets.
Signed-off-by: David <35550068+dcol91863@users.noreply.github.com>
Author
Added a small follow-up for #551 on this branch. The RoboCasa GR1 setup script was installing