TinyMyo (EMG) model support in BioFoundation #6
base: main
Conversation
- Updated `run_train.py` to fetch existing env variables for the data path and checkpoint dir, added support for FP32 with high precision, and run the final test on a single GPU rather than DDP to ensure reproducibility of reported metrics
- Introduced `finetune_task_EMG.py` for fine-tuning EMG classification models
- Added `pretrain_task_EMG.py` for masked-reconstruction training, including token masking and signal logging
- Updated `train_utils.py` with a new `MinMaxNormalization` class for input normalization using min-max scaling
- Improved code structure and readability across all modified files
- Enabled persistent workers in the finetuning data module to avoid spawning and destroying the worker group multiple times
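The `MinMaxNormalization` class mentioned above could look roughly like this minimal sketch; the per-channel scaling axis and the `eps` guard are assumptions for illustration, not details taken from the PR:

```python
import torch

class MinMaxNormalization:
    """Scale each sample into [0, 1] per channel via min-max scaling (illustrative sketch)."""

    def __init__(self, eps: float = 1e-8):
        self.eps = eps  # guards against division by zero on flat signals

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        # x: (channels, time) — normalize each channel independently over time
        x_min = x.amin(dim=-1, keepdim=True)
        x_max = x.amax(dim=-1, keepdim=True)
        return (x - x_min) / (x_max - x_min + self.eps)
```

A transform like this is typically applied per window inside the dataset's `__getitem__`, before batching.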
… tasks with general model docs
…for clarity and usability
…ript - removed input normalization in finetuning config - updated timm imports
…training and finetuning tasks
- auto strategy for finetuning
- removed logging interval in pretraining
- updated target model class in pretraining
- added persistent workers in the pretrain data module
- added cache eviction policy in the EMG pretrain dataset
- updated the TinyMyo forward method with masking support for both pretraining and finetuning
- updated the pretrain task with a cleaner masking approach, moving the tokenization logic inside the model
- updated the finetuning task with dummy masking generation
- Updated YAML configuration files for EMG fine-tuning and pre-training datasets to ensure consistency.
- Refined the TinyMyo model configuration, including adjustments to class definitions and parameters.
- Improved code readability by consolidating dictionary definitions and removing unnecessary line breaks.
- Enhanced the EMG dataset classes by optimizing type hints and initialization methods.
- Streamlined the TinyMyo model's forward pass and weight-initialization methods for better clarity and performance.
- Fixed minor formatting issues across various files to adhere to coding standards.
Pull request overview
This PR introduces comprehensive support for the TinyMyo EMG foundation model in the BioFoundation codebase, enabling both pretraining and finetuning workflows for electromyography signals. The implementation follows a transformer-based architecture with rotary position embeddings and includes efficient HDF5-backed datasets with caching.
Key changes:
- Added TinyMyo model architecture (3.6M parameters) with support for pretraining, classification, and regression tasks
- Implemented EMG-specific datasets with HDF5 loading, caching, and channel padding capabilities
- Enhanced dataloaders with persistent workers for improved throughput
- Added checkpoint conversion utilities and comprehensive configuration files
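To make the masked-reconstruction pretraining concrete, here is a sketch of one common way to generate a random token mask; the fixed-ratio, noise-ranking scheme is an illustrative assumption, not TinyMyo's exact implementation:

```python
import torch

def random_token_mask(batch: int, num_tokens: int, mask_ratio: float = 0.5) -> torch.Tensor:
    """Return a boolean mask (True = masked) with a fixed fraction of tokens masked per sample."""
    num_masked = int(num_tokens * mask_ratio)
    # Draw random noise per token and mask the positions with the lowest noise,
    # which selects a uniformly random subset of exactly `num_masked` tokens.
    noise = torch.rand(batch, num_tokens)
    ranks = noise.argsort(dim=-1)
    mask = torch.zeros(batch, num_tokens, dtype=torch.bool)
    mask.scatter_(1, ranks[:, :num_masked], True)
    return mask
```

During pretraining the model reconstructs the signal at the masked positions; during finetuning an all-False "dummy" mask of the same shape can be passed so the forward signature stays unchanged.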
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 21 comments.
| File | Description |
|---|---|
| util/train_utils.py | Added MinMaxNormalization utility class for input normalization |
| util/ckpt_to_safetensor.py | New checkpoint-to-safetensors conversion script with CLI |
| tasks/pretrain_task_EMG.py | Pretraining task with masked reconstruction and signal logging |
| tasks/finetune_task_EMG.py | Finetuning task with classification/regression support and metrics |
| models/TinyMyo.py | Complete TinyMyo model architecture with rotary attention blocks |
| datasets/emg_pretrain_dataset.py | HDF5-backed pretraining dataset with multi-file support |
| datasets/emg_finetune_dataset.py | HDF5-backed finetuning dataset with lazy loading |
| data_module/pretrain_data_module.py | Lightning data module with persistent workers enabled |
| data_module/finetune_data_module.py | Lightning data module for finetuning with persistent workers |
| run_train.py | Updated training script with rank-zero testing and process group management |
| pyproject.toml | Project configuration for uv package manager with dependencies |
| config/* | YAML configuration files for experiments, models, tasks, and data modules |
| docs/model/TinyMyo.md | Comprehensive documentation of model architecture and performance |
| .gitignore | Standard Python gitignore with Hydra outputs |
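As an illustration of the lazy HDF5-backed loading pattern the dataset files use, here is a hedged sketch; the `EMGWindowDataset` name and the `emg`/`label` dataset keys are hypothetical, not the actual class or file layout:

```python
import h5py
import torch
from torch.utils.data import Dataset

class EMGWindowDataset(Dataset):
    """Lazy HDF5-backed dataset: the file handle is opened on first access per worker."""

    def __init__(self, path: str, signal_key: str = "emg", label_key: str = "label"):
        self.path = path
        self.signal_key = signal_key
        self.label_key = label_key
        self._file = None  # opened lazily so the handle is never pickled into workers
        with h5py.File(path, "r") as f:  # open briefly just to read the length
            self._length = f[signal_key].shape[0]

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, idx: int):
        if self._file is None:
            self._file = h5py.File(self.path, "r")
        x = torch.from_numpy(self._file[self.signal_key][idx]).float()
        y = int(self._file[self.label_key][idx])
        return x, y
```

Deferring `h5py.File` until `__getitem__` matters because HDF5 handles are not fork-safe; combined with `persistent_workers=True` in the DataLoader, each worker opens its handle once and reuses it across epochs.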
Comments suppressed due to low confidence (2)
tasks/pretrain_task_EMG.py:283
- This assignment to `indices_array` is unnecessary, as it is redefined before this value is used: `indices_array = np.array(indices)`
tasks/pretrain_task_EMG.py:284
- This assignment to `indices_array` is unnecessary, as it is redefined before this value is used: `indices_array = np.unique(indices)`
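The two flagged assignments collapse into one: `np.unique` accepts a plain Python list (any array-like) and already returns a sorted ndarray, so the preceding `np.array` conversion is dead code:

```python
import numpy as np

indices = [3, 1, 3, 2, 1]
# np.unique converts array-likes itself and returns a sorted ndarray of
# distinct values, so a prior `indices_array = np.array(indices)` is redundant.
indices_array = np.unique(indices)
print(indices_array)  # [1 2 3]
```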
…and raising error when default values are unchanged - in final test, best checkpoint after training is loaded instead of last checkpoint
Updates
- `EMGDataset` class with efficient HDF5-backed loading, caching, and streamlined batch access.
- `.gitignore` for Python build and cache artifacts.
- `pyproject.toml` configured for the `uv` package and project manager.
- `MinMaxNormalization` under training utilities.
- Checkpoint conversion script (`util/ckpt_to_safetensor.py`) with a CLI for exporting Lightning checkpoints to HuggingFace format.
- `run_train.py` to enforce rank-zero single-GPU testing for reproducibility, aligning with practices from the Meta Generic Neuromotor Interface codebase.