Ego2Robot 🤖: Egocentric Factory Episodes for Robot Foundation Models

Transform egocentric factory video into robot-ready training data

MIT License · Python 3.8+ · Dataset on Hugging Face

Ego2Robot is an open-source pipeline that converts egocentric human demonstrations into LeRobot-compatible datasets for robot foundation model training.

✨ Features

  • 🏭 Real manufacturing data from 10,000 hours of factory work
  • 🔍 Intelligent curation with motion + hand visibility filtering
  • 🧠 Unsupervised skill discovery via VideoMAE embeddings + clustering
  • 🤖 LeRobot v3 format with observations + pseudo-actions
  • 📊 Rich annotations including zero-shot labels and quality scores
  • 🚀 Reusable pipeline for any egocentric video dataset

🎯 Quick Start

Installation

git clone https://github.com/msunbot/ego2robot.git
cd ego2robot
pip install -r requirements.txt

Usage

from ego2robot.data.sampler import EgocentricSampler
from ego2robot.data.clips import ClipExtractor

# `config` is the pipeline configuration object; see examples/day5_build_dataset.py
# for how it is constructed.
sampler = EgocentricSampler(config)
extractor = ClipExtractor(config)

# Stream filtered videos and cut each one into clips
for video in sampler.filter_videos():
    clips = extractor.extract_clips(video['video_bytes'], video['metadata'])
    # Process clips...

Load Pre-built Dataset

from datasets import load_dataset

# Load the default split directly (split name "train" assumed); iterating the
# bare DatasetDict would yield split names rather than episodes.
ds = load_dataset("msunbot1/ego2robot-factory-episodes", split="train")

for episode in ds:
    images = episode['observation.images.top']   # RGB frames, 360x640 @ 6 fps
    actions = episode['action']                  # 2D hand-motion pseudo-actions
    # Your code here

📊 Dataset

50 curated episodes of factory manipulation tasks:

  • Quality Inspection: 50% (25 episodes)
  • Assembly: 18% (9 episodes)
  • Fastening: 16% (8 episodes)
  • Machine Operation: 8% (4 episodes)
  • Mixed: 8% (4 episodes)

Format: LeRobot v3 with:

  • Observations: RGB (360x640@6fps) + hand bounding boxes
  • Actions: 2D hand motion vectors (pseudo-actions)
  • Metadata: Skill clusters, quality scores, zero-shot labels

→ View Dataset on Hugging Face

🏗️ Architecture

┌──────────────────────────────────────────┐
│     Egocentric-10K (10,000 hours)        │
└──────────────────────────────────────────┘
                    ↓
          ┌───────────────────────┐
          │  Quality Filtering    │
          │  - Motion scoring     │
          │  - Hand detection     │
          └───────────────────────┘
                    ↓
          ┌───────────────────────┐
          │ Feature Extraction    │
          │  - VideoMAE (768-dim) │
          │  - CLIP labels        │
          └───────────────────────┘
                    ↓
          ┌───────────────────────┐
          │  Skill Clustering     │
          │  - K-means (k=10)     │
          │  - t-SNE viz          │
          └───────────────────────┘
                    ↓
          ┌───────────────────────┐
          │   LeRobot Export      │
          │  - Hand tracking      │
          │  - Pseudo-actions     │
          └───────────────────────┘
                    ↓
      50 Robot-Ready Episodes

πŸ“ Project Structure

ego2robot/
├── data/
│   ├── sampler.py          # Stream videos from HF
│   ├── clips.py            # Extract 6s clips
│   ├── quality.py          # Motion + hand filtering
│   └── storage.py          # Save curated clips
├── vision/
│   ├── motion.py           # Motion scoring
│   ├── hands.py            # Hand detection
│   ├── videomae.py         # Video embeddings
│   ├── clip_text.py        # Zero-shot labeling
│   └── hand_tracker.py     # Trajectory extraction
├── skills/
│   └── cluster.py          # K-means clustering
├── export/
│   └── lerobot_builder.py  # LeRobot format
└── examples/
    ├── day5_build_dataset.py        # Full pipeline
    ├── day12_build_lerobot_dataset.py
    └── day17_training_demo.py       # Validation

🚀 Pipeline Steps

1. Curate Clips (Week 1)

python examples/day5_build_dataset.py

Outputs: 50-100 high-quality clips in data/ego2robot_dataset/
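
For orientation, here is a minimal sketch of the curation rule, assuming a motion score based on mean absolute frame difference and a caller-supplied hand detector. Function names are illustrative; the repo's actual logic lives in ego2robot/vision/motion.py, ego2robot/vision/hands.py, and ego2robot/data/quality.py.

# Illustrative sketch of the clip-curation rule (not the repo's exact API):
# keep a clip only if both scores clear the thresholds reported under Results.
import cv2
import numpy as np

def motion_score(frames):
    """Mean absolute difference between consecutive grayscale frames, in [0, 1]."""
    grays = [cv2.cvtColor(f, cv2.COLOR_RGB2GRAY).astype(np.float32) / 255.0 for f in frames]
    return float(np.mean([np.abs(a - b).mean() for a, b in zip(grays[:-1], grays[1:])]))

def hand_visibility(frames, detect_hands):
    """Fraction of frames in which the supplied detector finds at least one hand."""
    return float(np.mean([1.0 if detect_hands(f) else 0.0 for f in frames]))

def keep_clip(frames, detect_hands, motion_thresh=0.15, hand_thresh=0.30):
    return motion_score(frames) > motion_thresh and hand_visibility(frames, detect_hands) > hand_thresh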

2. Extract Features (Week 2)

python examples/day9_extract_all_embeddings.py
python examples/day10_add_all_labels.py
python examples/day11_cluster_skills.py

Outputs: Embeddings, labels, and cluster IDs
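
A hedged sketch of what these scripts compute, assuming the stock Hugging Face VideoMAE and CLIP checkpoints and an illustrative label set; the repo's actual model choices, prompts, and function names live in ego2robot/vision/ and ego2robot/skills/cluster.py.

# Illustrative feature extraction, zero-shot labeling, and skill clustering
# (checkpoints, label list, and the `curated_clips` input are assumptions).
import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import CLIPModel, CLIPProcessor, VideoMAEImageProcessor, VideoMAEModel

vm_proc = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-base")
vm_model = VideoMAEModel.from_pretrained("MCG-NJU/videomae-base").eval()

@torch.no_grad()
def embed_clip(frames):
    """frames: list of 16 RGB frames (H, W, 3) -> 768-dim clip embedding."""
    inputs = vm_proc(list(frames), return_tensors="pt")
    return vm_model(**inputs).last_hidden_state.mean(dim=1).squeeze(0).numpy()

clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
LABELS = ["quality inspection", "assembly", "fastening", "machine operation"]  # assumed prompts

@torch.no_grad()
def zero_shot_label(frame):
    """frame: one RGB frame -> most probable label from LABELS."""
    inputs = clip_proc(text=LABELS, images=frame, return_tensors="pt", padding=True)
    probs = clip_model(**inputs).logits_per_image.softmax(dim=-1)[0]
    return LABELS[int(probs.argmax())]

# Cluster every curated clip's embedding into 10 fine-grained skills.
embeddings = np.stack([embed_clip(clip) for clip in curated_clips])  # clips from step 1
cluster_ids = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(embeddings)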

3. Export to LeRobot (Week 3)

python examples/day12_build_lerobot_dataset.py

Outputs: 50 episodes in data/lerobot_dataset/
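
The pseudo-actions are 2D hand motion vectors. One plausible reading, sketched below, is the per-frame displacement of the tracked hand-box center normalized by image size; the exact definition used by vision/hand_tracker.py and export/lerobot_builder.py may differ.

# Illustrative derivation of 2D pseudo-actions from tracked hand boxes.
import numpy as np

def pseudo_actions(hand_boxes, width=640, height=360):
    """hand_boxes: (T, 4) per-frame [x1, y1, x2, y2] of the dominant hand, in pixels."""
    boxes = np.asarray(hand_boxes, dtype=np.float32)
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2.0,
                        (boxes[:, 1] + boxes[:, 3]) / 2.0], axis=1)   # (T, 2) box centers
    deltas = np.diff(centers, axis=0, prepend=centers[:1])            # displacement per frame
    return deltas / np.array([width, height], dtype=np.float32)       # normalized (T, 2) actions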

4. Upload to HF Hub

python examples/day14_upload_to_hf.py
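
The upload can be as simple as a huggingface_hub folder push. The sketch below reuses the paths and repo id from this README; it is not necessarily what day14_upload_to_hf.py does.

# Minimal upload sketch (requires `huggingface-cli login` beforehand).
from huggingface_hub import HfApi

HfApi().upload_folder(
    folder_path="data/lerobot_dataset",
    repo_id="msunbot1/ego2robot-factory-episodes",
    repo_type="dataset",
)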

📈 Results

Quality Metrics

  • Motion score: 0.168 avg (>0.15 threshold)
  • Hand visibility: 0.421 avg (>0.30 threshold)
  • Cluster separation: Clear in t-SNE visualization
  • Training demo: converged MSE loss (see the sketch below)
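
For context on that last metric, here is a minimal sketch of the kind of validation run meant by "converged MSE loss": a tiny torch regressor from per-frame hand boxes to the 2D pseudo-action. The feature choice and architecture are assumptions; the actual demo is examples/day17_training_demo.py.

# Toy behavior-cloning check: regress the 2D pseudo-action from the hand box.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(hand_boxes, actions):
    """hand_boxes: (N, 4) float tensor, actions: (N, 2) float tensor."""
    opt.zero_grad()
    loss = loss_fn(policy(hand_boxes), actions)
    loss.backward()
    opt.step()
    return float(loss)  # should trend downward over epochs on learnable data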

Discovered Skills

10 fine-grained clusters mapping to 5 high-level actions:

  1. Quality Inspection (6 variants) - 30 clips
  2. Assembly (2 variants) - 10 clips
  3. Fastening - 10 clips
  4. Machine Operation - 5 clips
  5. Mixed - 5 clips

→ View t-SNE Visualization

🎓 Use Cases

For Researchers

  • VLA pretraining: Diverse visual data for models like π₀
  • Representation learning: Learn manipulation primitives
  • Skill discovery: Study unsupervised clustering approaches
  • Domain adaptation: Manufacturing → other domains

For Companies

  • Custom datasets: Process your factory video
  • Robot training: Fine-tune policies on domain-specific data
  • Quality control: Automated task recognition

🀝 Contributing

We welcome contributions! Areas of interest:

  • Additional domains (warehouses, kitchens, etc.)
  • Depth estimation integration
  • Improved action generation (3D trajectories)
  • Evaluation benchmarks
  • Documentation improvements

See CONTRIBUTING.md for guidelines.

πŸ“ Citation

If you use this dataset or code, please cite:

@software{ego2robot2025,
  author = {Michelle Sun},
  title = {Ego2Robot: Egocentric Factory Episodes for Robot Learning},
  year = {2025},
  url = {https://github.com/msunbot/ego2robot}
}

📄 License

  • Code: MIT License
  • Dataset: Apache 2.0 (inherits from Egocentric-10K)

🙏 Acknowledgments

📬 Contact

Michelle Sun

Interested in:

  • Collaborations on Physical AI data & ecosystem
  • Advisory & angel investing opportunities in robotics/AI

🔗 Links


Built with ❤️ for the robotics community
