Chinese Documentation: README_CN.md | English Documentation: README.md
This project adapts UC Berkeley RAIL Lab's HIL-SERL framework and innovatively integrates NVIDIA Isaac Sim as a "pre-validation platform" for policy feasibility. By validating policies inside a simulator, we greatly reduce the trial-and-error risk of Real-World Reinforcement Learning (Real-World RL) on physical robots, enabling efficient RL training deployed directly on real hardware and achieving seamless transfer from virtual verification to real-world execution.
The IsaacSim-Hil-Serl repository has the following structure:
| Directory | Description |
|---|---|
| `dependencies` | Local dependency libraries required by the project |
| `examples` | Scripts for workspace calibration, demonstration data collection, reward classifier training, and policy training |
| `robot_infra` | Core infrastructure code that supports both simulation and real-robot operation |
| `robot_infra.gym_env` | Gym-style environment definitions for robot tasks |
| `robot_infra.isaacsim_venvs` | IsaacSim-based robot simulation environment configuration and initialization modules |
| `robot_infra.robot_servers` | Flask-based server implementation for ROS2 <-> robot interaction |
| `serl_launcher` | Core runtime logic and shared utilities that connect modules |
| `serl_launcher.agents` | Implementations of RL agent policies |
| `serl_launcher.common` | Shared foundational modules used across the framework |
| `serl_launcher.data` | Experience replay buffers and data storage management |
| `serl_launcher.networks` | Neural network layers and architectures used in training |
| `serl_launcher.utils` | Utility scripts and helper functions |
| `serl_launcher.vision` | Vision models and related helper functions |
| `serl_launcher.wrappers` | Gym environment wrappers and adapters |
- Ubuntu 22.04
- CUDA 12.8
- Python 3.11
```bash
git clone https://github.com/Incalos/IsaacSim-Hil-Serl
cd IsaacSim-Hil-Serl
```

```bash
# Install system dependencies
sudo apt update && sudo apt install -y \
    xvfb \
    x11-utils \
    cmake \
    build-essential \
    coinor-libipopt-dev \
    gfortran \
    liblapack-dev \
    pkg-config \
    swig \
    git \
    python3 \
    python3-pip \
    git-lfs \
    foxglove-studio \
    --install-recommends
```

```bash
# Install uv and create a basic Python venv
pip install uv --user
uv venv --python=3.11
source .venv/bin/activate
uv pip install ml_collections

# Install PyTorch
uv pip install -U torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu128

# Install IsaacSim 5.1
uv pip install "isaacsim[all,extscache]==5.1.0" --extra-index-url https://pypi.nvidia.com
```

Run each of the following blocks from the repository root (each starts with `cd dependencies/`):

```bash
# Install IsaacLab under dependencies/
cd dependencies/ && \
    git clone https://github.com/isaac-sim/IsaacLab.git && \
    cd IsaacLab/ && \
    ./isaaclab.sh --install
```

```bash
# Install LeIsaac under dependencies/
cd dependencies/ && \
    git clone --branch v0.3.0 --depth 1 https://github.com/LightwheelAI/leisaac.git && \
    cd leisaac/ && \
    uv pip install -e source/leisaac
```

```bash
# Install LeRobot under dependencies/
cd dependencies/ && \
    git clone --branch v0.4.3 --depth 1 https://github.com/huggingface/lerobot.git && \
    cd lerobot && \
    uv pip install -e .
```

```bash
# Install cuRobo under dependencies/
cd dependencies/ && \
    git clone --branch v0.7.7 --depth 1 https://github.com/NVlabs/curobo.git && \
    cd curobo && \
    uv pip install -e . --no-build-isolation
```

```bash
# Install agentlace under dependencies/
cd dependencies/ && \
    git clone https://github.com/youliangtan/agentlace.git && \
    cd agentlace/ && \
    uv pip install -e .
```

This chapter walks through the complete configuration and training workflow for Real-World Reinforcement Learning (RL) with the SO101 robotic arm on the SO101-Grasp-Orange task inside the IsaacSim simulation environment.
Note that the Real-World RL scheme in this chapter runs entirely in simulation: IsaacSim provides a high-fidelity replica of the real robot's physical scene, so policies can be trained and validated iteratively without directly manipulating the physical hardware.
Prior to initiating the Real-World RL training process for the SO101 robotic arm, please first download the USD assets and complete the deployment and configuration of the simulation scene.
Extract the downloaded archive and place all the extracted asset files into the robot_infra/isaacsim_venvs/tasks/scenes directory to complete the path deployment for the simulation resources.
The scenes folder must look like this:

```
tasks/
├── robots/
└── scenes/
    └── kitchen_with_orange/
        ├── scene.usd
        ├── assets/
        └── objects/
```

Enter `robot_infra/isaacsim_venvs/tasks/scenes/kitchen_with_orange/objects` and remove the redundant assets Orange002, Orange003, and Plate from that folder.
IsaacSim acts as a high-fidelity digital-twin environment that provides low-latency control and accurate physics for the SO101 robot arm.
For SO101, this project supports both Cartesian pose control and joint position control. To improve policy robustness, domain randomization is integrated into the environment. Press `R` during simulation to quickly reset the environment.
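Domain randomization here means perturbing scene parameters between episodes so the policy does not overfit to one exact layout. A minimal sketch of the idea (the parameter names and ranges below are illustrative, not the project's actual configuration):

```python
import random

# Illustrative randomization ranges -- not the project's actual values.
RANDOMIZATION = {
    "orange_xy_offset_m": (-0.05, 0.05),  # shift the orange on the table
    "light_intensity":    (0.7, 1.3),     # scale scene lighting
    "table_friction":     (0.6, 1.0),     # vary contact friction
}

def sample_episode_params(seed=None):
    """Draw one set of scene parameters for the next episode reset."""
    rng = random.Random(seed)
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANDOMIZATION.items()}

print(sample_episode_params(seed=0))
```

Sampling a fresh set of parameters at every reset is what forces the learned policy to be robust to small scene variations.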
During simulation, key physical states (joint torques, end-effector poses, and camera image streams) are published over ROS2 to ensure observations align with realistic physics. We recommend using Foxglove Studio for visualization and debugging to monitor ROS2 topics and send control commands.
```bash
cd examples/experiments/so101_grasp_orange
bash ./1_start_isaacsim_venv.sh

# In a new terminal run:
bash ./2_foxglove_inspect_data.sh
```

After Foxglove Studio starts, you can import `examples/experiments/so101_grasp_orange/foxglove_layout.json` to load a preset visualization layout.
In addition to Foxglove Studio, we provide a Flask server that bridges ROS2 for monitoring and interaction. Steps to use the Flask server:
- Build the Flask server

  Enter the ROS2 workspace and build the Flask server code:

  ```bash
  cd robot_infra/robot_servers
  colcon build
  ```

- Start simulation and the server

  Start IsaacSim first, then the Flask server node:

  ```bash
  cd examples/experiments/so101_grasp_orange
  bash ./1_start_isaacsim_venv.sh

  # In a new terminal:
  bash ./3_start_robot_server.sh
  ```

- Monitor and interact

  Open a new terminal and run the commands below to poll topics or send commands, similarly to Foxglove Studio.
```bash
while true; do curl -X POST http://127.0.0.1:5000/get_joint_positions; echo; done
while true; do curl -X POST http://127.0.0.1:5000/get_joint_efforts; echo; done
while true; do curl -X POST http://127.0.0.1:5000/get_joint_forces; echo; done
while true; do curl -X POST http://127.0.0.1:5000/get_joint_torques; echo; done
while true; do curl -X POST http://127.0.0.1:5000/get_eef_poses_quat; echo; done
while true; do curl -X POST http://127.0.0.1:5000/get_eef_poses_euler; echo; done
while true; do curl -X POST http://127.0.0.1:5000/get_eef_forces; echo; done
while true; do curl -X POST http://127.0.0.1:5000/get_eef_torques; echo; done
while true; do curl -X POST http://127.0.0.1:5000/get_eef_velocities; echo; done
while true; do curl -X POST http://127.0.0.1:5000/get_eef_jacobians; echo; done
while true; do curl -X POST http://127.0.0.1:5000/get_state; echo; done

# Reset robot to initial pose and reset IsaacSim
curl -X POST http://127.0.0.1:5000/reset_robot

# Reset IsaacSim environment
curl -X POST http://127.0.0.1:5000/reset_isaacsim

# Publish joints as positions
curl -X POST http://127.0.0.1:5000/move_joints -H "Content-Type: application/json" -d '{"joint_pose":[0.5,0.1,-0.4,0.2,1.2,0.7]}'

# Publish EEF using position + RPY
curl -X POST http://127.0.0.1:5000/move_eef -H "Content-Type: application/json" -d '{"eef_pose":[0.27138811349868774,-0.0001829345856094733,0.21648338437080383,0.7695847901163139,0.030466061901383457,-1.6022399150116016], "gripper_state":0.5}'

# Publish EEF using position + quaternion (x,y,z,w)
curl -X POST http://127.0.0.1:5000/move_eef -H "Content-Type: application/json" -d '{"eef_pose":[0.2988014817237854,-0.05197674408555031,0.18513618409633636,-0.6495405435562134,-0.5627134442329407,0.3135982155799866,0.4038655161857605], "gripper_state":1.0}'
```

If you plan to use an Xbox controller for teleoperation, first connect the controller and obtain its unique GUID to avoid interference from other devices. Put the GUID into `examples/experiments/so101_grasp_orange/exp_params.yaml`. Keep the device connected and run the following command to list joystick devices and GUIDs:
```bash
python3 -c "import os; os.environ['PYGAME_HIDE_SUPPORT_PROMPT']='1'; import pygame; pygame.init(); pygame.joystick.init(); [print(f'\nIndex: {i}\nName: {j.get_name()}\nGUID: {j.get_guid()}\n' + '-'*20) or j.init() for i in range(pygame.joystick.get_count()) for j in [pygame.joystick.Joystick(i)]]; pygame.quit()"
```

The mapping between Xbox controller buttons and robotic arm operations is shown in the table below.
| Control Button | Description |
|---|---|
| Move Left Joystick Forward/Backward | Translate the end effector of the robotic arm forward and backward |
| Move Left Joystick Left/Right | Control the shoulder_pan joint to swing left and right |
| Move Right Joystick Forward/Backward | Control the wrist_flex joint to pitch the end effector up and down |
| Move Right Joystick Left/Right | Control the wrist_roll joint to rotate the end effector |
| Press LB Button | Control the end effector of the robotic arm to translate upward |
| Press LT Button | Control the end effector of the robotic arm to translate downward |
| Press RB Button | Control the grasp joint to open the gripper |
| Press RT Button | Control the grasp joint to close the gripper |
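The `/move_eef` examples earlier accept either a roll-pitch-yaw or a quaternion target. If you need to convert between the two, here is a plain-Python sketch assuming the common ZYX (yaw-pitch-roll) convention and `(x, y, z, w)` ordering; verify both assumptions against the server's actual convention before relying on it:

```python
import math

def rpy_to_quat(roll, pitch, yaw):
    """Convert roll/pitch/yaw (radians, ZYX convention) to a quaternion (x, y, z, w)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (
        sr * cp * cy - cr * sp * sy,  # x
        cr * sp * cy + sr * cp * sy,  # y
        cr * cp * sy - sr * sp * cy,  # z
        cr * cp * cy + sr * sp * sy,  # w
    )

# Identity rotation maps to the unit quaternion (0, 0, 0, 1).
print(rpy_to_quat(0.0, 0.0, 0.0))
```

The result is always unit-norm, so it can be dropped directly into the quaternion form of the `eef_pose` payload.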
To avoid collisions and other safety risks during the exploration phase of RL, you must define the robot's workspace limits carefully before training.
These workspace parameters are monitored and adjusted during training and automatically saved to the task-specific config file: examples/experiments/so101_grasp_orange/exp_params.yaml.
```bash
cd examples/experiments/so101_grasp_orange
bash ./1_start_isaacsim_venv.sh

# In a new terminal
bash ./3_start_robot_server.sh

# In a new terminal
bash ./4_check_robot_workspace.sh
```

Operating Instructions:
- Robotic arm control: Refer to Teleoperate the SO101 Arm.
- Workspace definition: Determine the arm's reasonable motion range according to the task requirements. Before formal training begins, fully verify that the arm can avoid collisions at all limit positions and attitudes within the task space, to ensure the workspace is safe.
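A workspace limit of this kind ultimately acts as a box constraint on commanded end-effector positions. A sketch of the clamping idea (the bounds and key names here are hypothetical; the real values live in `exp_params.yaml`):

```python
# Hypothetical workspace bounds in meters -- check exp_params.yaml for the real ones.
WORKSPACE = {"x": (0.15, 0.40), "y": (-0.20, 0.20), "z": (0.02, 0.35)}

def clamp_to_workspace(x, y, z):
    """Clamp a commanded end-effector position into the safe box."""
    clamp = lambda v, lo, hi: min(max(v, lo), hi)
    return tuple(clamp(v, *WORKSPACE[axis]) for axis, v in zip("xyz", (x, y, z)))

print(clamp_to_workspace(0.50, 0.0, -0.10))  # -> (0.4, 0.0, 0.02)
```

Clamping every commanded target this way is what keeps random exploration actions from driving the arm into the table or its own base.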
Use the Xbox controller to teleoperate the robot and collect keyframes from recorded videos for manual annotation. These labeled samples are used to train a reward classifier. Collected samples are stored in examples/experiments/so101_grasp_orange/classifier_data/.
```bash
cd examples/experiments/so101_grasp_orange
bash ./1_start_isaacsim_venv.sh

# In a new terminal
bash ./3_start_robot_server.sh

# In a new terminal
bash ./5_record_classifier_data.sh
```

Operating Instructions:
- Robotic arm control: Refer to Teleoperate the SO101 Arm.
- Annotation guidelines:
  - Start recording: press `b` to begin recording the current episode.
  - Manually mark success: press `space` to mark the current attempt as a "success"; the episode will end and the robot and IsaacSim will reset to the initial state.
  - Auto terminate and reset: if an episode exceeds the configured max steps, the attempt auto-terminates and the robot and IsaacSim state are reset.
- Reset IsaacSim: press `r` (useful if the robot is stuck or the task fails).
After collecting data, train the reward classifier. The trained weights are saved to examples/experiments/so101_grasp_orange/classifier_ckpt/.
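At its core, the reward classifier is a binary success/failure model over image observations. A toy logistic-regression version of that idea in plain NumPy (the random features below are a simplified stand-in for the project's real vision backbone and data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for image features: "success" frames cluster away from "failure" frames.
X = np.vstack([rng.normal(1.0, 0.3, (50, 8)), rng.normal(-1.0, 0.3, (50, 8))])
y = np.array([1] * 50 + [0] * 50)

w, b = np.zeros(8), 0.0
for _ in range(500):  # plain gradient descent on the logistic loss
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

acc = (((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y).mean()
print(f"train accuracy: {acc:.2f}")  # separable toy data -> accuracy near 1.0
```

The trained model then serves as an automatic reward signal: frames classified as "success" end the episode with reward 1 instead of requiring a human to press a key.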
```bash
# In a new terminal run:
bash ./6_train_reward_classifier.sh
```

Before training policies, collect a set of high-quality demonstration trajectories, using the trained reward classifier as a guide. Demonstrations are still collected via the Xbox controller.
```bash
cd examples/experiments/so101_grasp_orange
bash ./1_start_isaacsim_venv.sh

# In a new terminal
bash ./3_start_robot_server.sh

# In a new terminal
bash ./7_record_demos.sh
```

Operating Instructions:
- Robotic arm control: Refer to Teleoperate the SO101 Arm.
- Demo collection:
  - Press `b` to start recording an episode; the reward classifier judges success and automatically ends and restarts recording for successful episodes.
  - Episodes that exceed max steps will auto-terminate and reset.
Demonstrations are saved to examples/experiments/so101_grasp_orange/demo_data/. To validate collected demos, replay them:
```bash
# In a new terminal run:
bash ./8_replay_demos.sh
```

Following the HIL-SERL training paradigm, the early stages require intensive human intervention: operators teleoperate the robot to guide it through many successful trials. High-frequency human guidance helps the policy learn quickly and adapt.
```bash
cd examples/experiments/so101_grasp_orange
bash ./1_start_isaacsim_venv.sh

# In a new terminal
bash ./3_start_robot_server.sh

# In a new terminal
bash ./run_learner.sh

# In a new terminal
bash ./run_actor.sh
```

Operating Instructions:
- Robotic arm control: Refer to Teleoperate the SO101 Arm.
After training, load the learned policy and evaluate it inside IsaacSim to measure task performance, motion stability, and generalization under high-fidelity physical conditions. This provides strong evidence for successful sim-to-real transfer.
```bash
cd examples/experiments/so101_grasp_orange
bash ./1_start_isaacsim_venv.sh

# In a new terminal
bash ./3_start_robot_server.sh

# In a new terminal
bash ./9_val_actor.sh
```

