A comprehensive simulation and deep learning framework for DIGIT tactile sensors, featuring bidirectional neural networks for physical-visual mapping and advanced perception capabilities.
The complete dataset for this project is available on Hugging Face:
- NodalDataOutput.zip: CSV files containing nodal displacement data [x, y, z] from FEM simulations
- DeltaImages_100x75.zip: RGB delta images (100×75 resolution) showing tactile sensor changes
- ContactMasks_100x75.zip: Binary contact masks indicating interaction regions
- Sensors: 4 different DIGIT sensors
- Objects: 20 different shapes of indenters
- Time Steps: ~100 interaction sequences per object
- Total Samples: 50000 paired CSV-image samples for training
- Test Data: YCB unseen objects and additional unseen indenter configurations with 100 steps each
# Download the dataset with the huggingface_hub library
from huggingface_hub import snapshot_download
# Download the entire dataset
snapshot_download(repo_id="Ndolphin/DIGIT_simulation", repo_type="dataset", local_dir="./DIGIT_data")
# Or download specific files
from huggingface_hub import hf_hub_download
# Download only the nodal data; hf_hub_download returns the local path of the downloaded file
nodal_data = hf_hub_download(repo_id="Ndolphin/DIGIT_simulation", filename="datasets/NodalDataOutput.zip", repo_type="dataset")

This repository contains a complete pipeline for DIGIT tactile sensor simulation and analysis, including:
- FEM Simulation: SOFA-based finite element modeling of tactile interactions
- Perception Network: Visual-to-physical displacement mapping using U-Net
- Rendering Network: Physical-to-visual delta image synthesis using UNetSmall
- Sensor Experiments: Real DIGIT sensor data collection and processing

Perception Network:
- Input: RGB delta images (320×240×3)
- Output: Physical displacement fields [x, y, dz] (320×240×3)
- Architecture: Clean U-Net with skip connections
- Purpose: Extract physical sensor data from visual representations

Rendering Network:
- Input: Multi-channel sensor data (19×75×100)
  - Displacement field (1 channel)
  - Coordinate grids (2 channels)
  - Fourier positional encoding (16 channels)
- Output: RGB delta images (3×75×100)
- Architecture: UNetSmall with masked loss
- Purpose: Generate realistic tactile visualizations from sensor data

FEM Simulation:
- SOFA-based finite element modeling
- Contact simulation with various indenters
- Nodal displacement data generation
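
The 19-channel rendering input listed above (1 displacement + 2 coordinate + 16 Fourier channels) could be assembled roughly as follows. This is a minimal sketch, not the repository's data loader: the function names, the [-1, 1] coordinate normalization, and the choice of 4 octave frequencies with sin/cos on both axes (4 × 2 × 2 = 16) are assumptions chosen only to match the stated channel counts.

```python
import numpy as np

def fourier_features(xx, yy, num_freqs=4):
    """Return 4 * num_freqs channels: sin and cos of each grid at octave frequencies."""
    feats = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * np.pi  # assumed frequency schedule
        for grid in (xx, yy):
            feats.append(np.sin(freq * grid))
            feats.append(np.cos(freq * grid))
    return np.stack(feats, axis=0)

def build_rendering_input(dz, num_freqs=4):
    """Stack displacement, coordinate grids, and Fourier features into (19, H, W)."""
    h, w = dz.shape
    # Normalized pixel coordinates in [-1, 1] (assumed normalization).
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")  # both have shape (H, W)
    channels = [dz[None], xx[None], yy[None], fourier_features(xx, yy, num_freqs)]
    return np.concatenate(channels, axis=0).astype(np.float32)

dz = np.zeros((75, 100), dtype=np.float32)  # placeholder displacement map
x = build_rendering_input(dz)
print(x.shape)  # (19, 75, 100)
```

The actual encoding used in `Rendering_Network/data_loader.py` may differ; check that file before reusing trained weights with inputs built this way.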
DIGIT_simulation/
├── FEM_simulation_SOFAscene/        # SOFA-based FEM simulation
│   ├── ContactTest_in_loop.py       # Batch contact simulation
│   ├── ContactTest_single.py        # Single contact test
│   ├── meshes/                      # 3D mesh files for simulation
│   └── NodalDataOutput/             # Generated CSV nodal data
├── Perception_Network/              # Visual-to-physical mapping
│   ├── Train.py                     # Training script for perception
│   ├── data_preprocessing.py        # Data preparation utilities
│   ├── Testset_inference.py         # Inference on test data
│   └── perception_unet_model.pth    # Trained model weights
├── Rendering_Network/               # Physical-to-visual synthesis
│   ├── Train.py                     # Training script for rendering
│   ├── data_loader.py               # Data loading utilities
│   ├── inference.py                 # Inference script
│   └── best_model.pt                # Trained model weights
├── DIGIT_sensor_experiment/         # Real sensor experiments
│   ├── DIGIT_CameraView.py          # Camera interface
│   ├── ImageExtraction.py           # Image processing
│   └── Recorded_Videos/             # Experimental data
└── requirements.txt                 # Python dependencies
- Clone the repository:
git clone https://github.com/ndolphin-github/DIGIT_simulation.git
cd DIGIT_simulation
- Create a virtual environment:
python -m venv digit_env
source digit_env/bin/activate # On Windows: digit_env\Scripts\activate
- Install dependencies:
pip install -r requirements.txt

# Perception network: train, then run inference on the test set
cd Perception_Network
python Train.py --epochs 50 --batch_size 4 --learning_rate 1e-3
python Testset_inference.py --model_path perception_unet_model.pth

# Rendering network (from the repository root): train, then generate images
cd Rendering_Network
python Train.py --epochs 50 --batch_size 8 --lr 2e-4
python inference.py -m best_model.pt -u NodalDataOutput -o generated_images

# FEM simulation (from the repository root)
cd FEM_simulation_SOFAscene
python ContactTest_single.py # Single contact simulation
python ContactTest_in_loop.py # Batch simulation

The complete dataset is available in the Hugging Face dataset repository Ndolphin/DIGIT_simulation.
DIGIT_simulation_dataset/
├── datasets/
│   ├── NodalDataOutput.zip              # Nodal displacement data
│   │   ├── D21119/                      # Sensor ID
│   │   │   ├── hammar/                  # Object name
│   │   │   │   ├── topROI_step_000.csv
│   │   │   │   ├── topROI_step_001.csv
│   │   │   │   └── ...
│   │   │   ├── mug/
│   │   │   └── ...
│   │   ├── D21242/
│   │   └── D21273/
│   ├── DeltaImages_100x75.zip           # Tactile visualization images
│   │   ├── D21119/
│   │   │   ├── hammar/
│   │   │   │   ├── hammar_000.jpg
│   │   │   │   ├── hammar_001.jpg
│   │   │   │   └── ...
│   │   │   └── ...
│   │   └── ...
│   └── ContactMasks_100x75.zip          # Binary contact masks
│       ├── D21119/
│       │   ├── hammar/
│       │   │   ├── hammar_000_mask.jpg
│       │   │   ├── hammar_001_mask.jpg
│       │   │   └── ...
│       │   └── ...
│       └── ...
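
The naming scheme in the tree above makes it possible to locate the CSV, delta image, and contact mask that belong to the same sample. The sketch below is illustrative only: `sample_paths` is a hypothetical helper, and it assumes each archive has been extracted into a folder named after the zip file (for example `NodalDataOutput/`) under a common root.

```python
from pathlib import Path

def sample_paths(root, sensor, obj, step):
    """Return the CSV, delta image, and mask paths for one (sensor, object, step)."""
    root = Path(root)
    return {
        "csv": root / "NodalDataOutput" / sensor / obj / f"topROI_step_{step:03d}.csv",
        "image": root / "DeltaImages_100x75" / sensor / obj / f"{obj}_{step:03d}.jpg",
        "mask": root / "ContactMasks_100x75" / sensor / obj / f"{obj}_{step:03d}_mask.jpg",
    }

# The root folder is an assumption; adjust it to wherever the archives were extracted.
paths = sample_paths("DIGIT_data/datasets", "D21119", "hammar", 1)
for kind, path in paths.items():
    print(kind, path.as_posix())
```

If the archives extract without a top-level folder, drop the first path component for each entry accordingly.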
- CSV Files: Nodal displacement data with columns [x, y, z]
- RGB Images: Delta images showing tactile sensor changes
- Contact Masks: Binary masks indicating contact regions
- X Range: [-7.5mm, +7.5mm] (sensor width)
- Y Range: [0mm, 20mm] (sensor height)
- Resolution: 100×75 pixels (rendering), 320×240 pixels (perception)
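
To relate the millimetre ranges above to image pixels, nodal coordinates can be rescaled to array indices. The sketch below assumes that X (15 mm) spans the 75 image rows and Y (20 mm) spans the 100 columns, which is a guess based only on the matching 3:4 aspect ratio; the repository's own preprocessing may use a different orientation or flip an axis. `mm_to_pixel` is a hypothetical helper.

```python
import numpy as np

X_RANGE_MM = (-7.5, 7.5)   # sensor width, from the coordinate system above
Y_RANGE_MM = (0.0, 20.0)   # sensor height, from the coordinate system above

def mm_to_pixel(x_mm, y_mm, height=75, width=100):
    """Map sensor coordinates in mm to integer (row, col) indices."""
    x_mm = np.asarray(x_mm, dtype=np.float64)
    y_mm = np.asarray(y_mm, dtype=np.float64)
    # Normalize each coordinate to [0, 1] over its physical range.
    u = (x_mm - X_RANGE_MM[0]) / (X_RANGE_MM[1] - X_RANGE_MM[0])
    v = (y_mm - Y_RANGE_MM[0]) / (Y_RANGE_MM[1] - Y_RANGE_MM[0])
    # Assumed orientation: X -> rows, Y -> columns; out-of-range nodes are clipped.
    rows = np.clip(np.rint(u * (height - 1)), 0, height - 1).astype(int)
    cols = np.clip(np.rint(v * (width - 1)), 0, width - 1).astype(int)
    return rows, cols

rows, cols = mm_to_pixel([-7.5, 0.0, 7.5], [0.0, 5.0, 20.0])
print(rows, cols)
```

Passing `height=240, width=320` gives indices for the perception resolution under the same assumption.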
import pandas as pd
from PIL import Image
import numpy as np
# Load nodal data
df = pd.read_csv("topROI_step_000.csv")
x_coords = df["x"].values # Node X coordinates
y_coords = df["y"].values # Node Y coordinates
dz_values = df["dz"].values # Displacement values
# Load corresponding delta image
delta_img = Image.open("hammar_000.jpg")
delta_array = np.array(delta_img) # Shape: (75, 100, 3)
# Load contact mask as grayscale; threshold at mid-gray so JPEG compression noise is not counted as contact
mask_img = Image.open("hammar_000_mask.jpg").convert("L")
mask_array = np.array(mask_img) > 127 # Binary mask, shape: (75, 100)

| Parameter | Perception Network | Rendering Network |
|---|---|---|
| Input Size | 320×240×3 | 19×75×100 |
| Output Size | 320×240×3 | 3×75×100 |
| Batch Size | 4 | 8 |
| Learning Rate | 1e-3 | 2e-4 |
| Optimizer | Adam | AdamW |
| Loss Function | MSE (dz only) | Masked L1 |
| Epochs | 50 | 50 |
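
The two loss entries in the table can be illustrated with a small NumPy sketch. It is not the training code: the perception loss assumes channel-last arrays with dz as the third channel, the rendering loss uses the 0.1 and 0.02 weights from this README's rendering loss formula, and the neutrality term (penalizing non-zero predicted delta outside the contact region) is a guess, since the README does not define it. All function names are hypothetical.

```python
import numpy as np

def perception_loss(pred, target):
    """MSE over the dz channel only; arrays are (H, W, 3) with dz assumed last."""
    return float(np.mean((pred[..., 2] - target[..., 2]) ** 2))

def masked_l1(pred, target, mask):
    """Mean absolute error over elements where mask is True (0.0 if mask is empty)."""
    weights = np.broadcast_to(mask, pred.shape).astype(np.float64)
    total = weights.sum()
    if total == 0:
        return 0.0
    return float((np.abs(pred - target) * weights).sum() / total)

def rendering_loss(pred, target, contact_mask):
    """contact + 0.1 * background + 0.02 * neutrality for (3, H, W) delta images."""
    mask = contact_mask[None].astype(bool)  # (1, H, W), broadcast over RGB
    contact = masked_l1(pred, target, mask)
    background = masked_l1(pred, target, ~mask)
    # Assumed neutrality term: predicted delta should stay near zero off-contact.
    neutrality = masked_l1(pred, np.zeros_like(pred), ~mask)
    return contact + 0.1 * background + 0.02 * neutrality

pred = np.full((3, 2, 2), 0.5)
target = np.zeros((3, 2, 2))
contact_mask = np.array([[True, False], [False, False]])
print(rendering_loss(pred, target, contact_mask))
```

Weighting the contact region most heavily keeps the small contact patch from being swamped by the much larger unchanged background.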
Perception Network:
loss = MSE(predicted_dz, target_dz) # Focus on displacement only
Rendering Network:
loss = contact_loss + 0.1 * background_loss + 0.02 * neutrality_loss

- Perception Network: ~0.35ms per image (FP32), ~0.22ms (FP16)
- Rendering Network: ~0.20ms per image (FP32), ~0.12ms (FP16)
- Throughput: 2k-10k+ images/second depending on batch size
- Perception Network: ~500K-1M parameters
- Rendering Network: ~200K parameters (lightweight design)
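
As a rough consistency check on the speed figures above, per-image latency can be converted to images per second. This is simple arithmetic on the quoted numbers, not a benchmark; it assumes the per-image latencies hold steadily, and `images_per_second` is a hypothetical helper.

```python
# Per-image latencies in milliseconds, as quoted in this README.
latencies_ms = {
    ("perception", "FP32"): 0.35,
    ("perception", "FP16"): 0.22,
    ("rendering", "FP32"): 0.20,
    ("rendering", "FP16"): 0.12,
}

def images_per_second(latency_ms):
    """Convert a per-image latency in milliseconds to throughput."""
    return 1000.0 / latency_ms

for (network, precision), ms in latencies_ms.items():
    print(f"{network} {precision}: {images_per_second(ms):.0f} images/s")
# perception FP32: 2857 images/s
# perception FP16: 4545 images/s
# rendering FP32: 5000 images/s
# rendering FP16: 8333 images/s
```

These values fall inside the quoted 2k-10k+ range; the upper end presumably reflects larger batch sizes.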
"Bidirectional Mapping Between Physical Contacts and Visual Tactile Images for Physics-based Simulation," T. Hong and Y.-L. Park, Proceedings of the IEEE-RAS International Conference on Humanoid Robots, 2025.
- SOFA Framework for FEM simulation capabilities
For questions and support, please open an issue on GitHub or contact ndolphin93@gmail.com.