@MatteoFasulo
Updates

  • Added end-to-end configuration and dataset support for TinyMyo EMG pretraining and finetuning pipelines.
  • Introduced new YAML configuration files covering experiments, data modules, model setups, and task definitions.
  • Implemented a dedicated EMGDataset class with efficient HDF5-backed loading, caching, and streamlined batch access.
  • Improved dataloader throughput by enabling persistent workers, reducing process spawn overhead.
  • Added a project-level .gitignore for Python build and cache artifacts.
  • Added a pyproject.toml configured for the uv package and project manager.
  • Added MinMaxNormalization under training utilities.
  • Added a checkpoint-to-safetensors conversion script (util/ckpt_to_safetensor.py) with a CLI for exporting Lightning checkpoints to HuggingFace format.
  • Added documentation for TinyMyo describing the model, the pretraining and downstream datasets, model size, and performance.
  • Updated run_train.py to enforce rank-zero, single-GPU testing for reproducibility, aligning with the practice in Meta's Generic Neuromotor Interface codebase.
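
To make the MinMaxNormalization item above concrete, here is a minimal sketch of min-max scaling on a single EMG window. It is not the class from util/train_utils.py: the constructor arguments (`low`, `high`, `eps`), the use of NumPy instead of tensors, and the choice to take the min and max over the whole window are all assumptions made for illustration.

```python
import numpy as np


class MinMaxNormalization:
    """Rescale one sample to [low, high] using its own min and max.

    Hypothetical sketch: the real class in util/train_utils.py may use
    different arguments, reduction axes, or tensor types.
    """

    def __init__(self, low: float = -1.0, high: float = 1.0, eps: float = 1e-8):
        self.low = low
        self.high = high
        self.eps = eps  # avoids division by zero on a flat signal

    def __call__(self, x: np.ndarray) -> np.ndarray:
        x = np.asarray(x, dtype=np.float32)
        x_min = x.min()
        x_max = x.max()
        # Map the observed range onto roughly [0, 1], then onto [low, high].
        scaled = (x - x_min) / (x_max - x_min + self.eps)
        return scaled * (self.high - self.low) + self.low
```

Applied as `MinMaxNormalization(0.0, 1.0)(window)`, this maps the smallest value in the window to 0 and the largest to approximately 1. A constant window maps to `low` everywhere.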

- Updated `run_train.py` to read existing environment variables for the data path and checkpoint directory, to support FP32 with high precision, and to run the final test on a single GPU rather than DDP so that reported metrics are reproducible.
- Introduced `finetune_task_EMG.py` for fine-tuning EMG classification models
- Added `pretrain_task_EMG.py` for masked reconstruction training, including token masking and signal logging.
- Updated `train_utils.py` with a new `MinMaxNormalization` class for input normalization using min-max scaling.
- Improved code structure and readability across all modified files.
- Enabled persistent workers in the finetuning data module to avoid spawning and destroying the worker group multiple times.
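
The environment-variable lookup for the data path and checkpoint directory could look roughly like the sketch below. The variable names, the `PLACEHOLDER` value, and the decision to raise on an unchanged placeholder are hypothetical; the description does not show what `run_train.py` actually uses.

```python
import os
from pathlib import Path

# Hypothetical placeholder: stands in for whatever default the configs ship with.
PLACEHOLDER = "#CHANGEME"


def resolve_dir(env_var: str, configured: str) -> Path:
    """Prefer an existing environment variable, else fall back to the config value."""
    value = os.environ.get(env_var, configured)
    if not value or value == PLACEHOLDER:
        # Assumption: fail early instead of training against a bogus path.
        raise ValueError(
            f"{env_var} is not set and the configured default was left unchanged"
        )
    return Path(value)
```

A call such as `resolve_dir("DATA_PATH", cfg_value)` (both names illustrative) returns the environment value when it is set and otherwise the configured one.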
@MatteoFasulo MatteoFasulo changed the title from "[Draft] TinyMyo (EMG ) model support in BioFoundation" to "[Draft] TinyMyo (EMG) model support in BioFoundation" on Dec 7, 2025
…ript

- removed input normalization in finetuning config
- updated timm imports
…training and finetuning tasks

- auto strategy for finetuning
- removed logging interval in pretraining
- update target model class in pretraining
- added persistent workers in pretrain data module
- added cache eviction policy in EMG pretrain dataset
- updated TinyMyo forward method with masking support for both pretraining and finetuning
- updated pretrain task with a cleaner masking approach, moving the tokenization logic inside the model
- updated finetuning task with dummy mask generation
- Updated YAML configuration files for EMG fine-tuning and pre-training datasets to ensure consistency.
- Refined the TinyMyo model configuration, including adjustments to class definitions and parameters.
- Improved code readability by consolidating dictionary definitions and removing unnecessary line breaks.
- Enhanced the EMG dataset classes by optimizing type hints and initialization methods.
- Streamlined the TinyMyo model's forward pass and weight initialization methods for better clarity and performance.
- Fixed minor formatting issues across various files to adhere to coding standards.
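
The commits above mention a cache eviction policy in the EMG pretraining dataset without naming it. As one plausible reading, the sketch below shows a least-recently-used (LRU) cache that could bound the number of open HDF5 file handles. The LRU choice, the class name, and the `on_evict` hook are assumptions, not the PR's implementation.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

V = TypeVar("V")


class LRUCache(Generic[V]):
    """Keep at most `capacity` items, evicting the least recently used one."""

    def __init__(self, capacity: int, on_evict: Callable[[V], None] = lambda v: None):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self.on_evict = on_evict  # e.g. close an evicted file handle
        self._items: "OrderedDict[Hashable, V]" = OrderedDict()

    def get(self, key: Hashable, load: Callable[[], V]) -> V:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = load()  # cache miss: build the value, e.g. open the file
        self._items[key] = value
        if len(self._items) > self.capacity:
            _, evicted = self._items.popitem(last=False)  # drop the oldest entry
            self.on_evict(evicted)
        return value

    def __len__(self) -> int:
        return len(self._items)
```

In a dataset, `load` would open the file for a given path and `on_evict` would close it, so each dataloader worker holds a bounded number of handles.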
@MatteoFasulo MatteoFasulo marked this pull request as ready for review December 19, 2025 14:53
Copilot AI review requested due to automatic review settings December 19, 2025 14:53
@MatteoFasulo MatteoFasulo changed the title from "[Draft] TinyMyo (EMG) model support in BioFoundation" to "TinyMyo (EMG) model support in BioFoundation" on Dec 19, 2025
Copilot AI left a comment

Pull request overview

This PR introduces comprehensive support for the TinyMyo EMG foundation model in the BioFoundation codebase, enabling both pretraining and finetuning workflows for electromyography signals. The implementation follows a transformer-based architecture with rotary position embeddings and includes efficient HDF5-backed datasets with caching.

Key changes:

  • Added TinyMyo model architecture (3.6M parameters) with support for pretraining, classification, and regression tasks
  • Implemented EMG-specific datasets with HDF5 loading, caching, and channel padding capabilities
  • Enhanced dataloaders with persistent workers for improved throughput
  • Added checkpoint conversion utilities and comprehensive configuration files
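
The channel padding mentioned above can be illustrated with a short sketch that pads a window to a fixed channel count so recordings from different electrode layouts share one input shape. Zero-padding at the end of the channel axis and the `(channels, samples)` layout are assumptions; the review summary does not say how the datasets pad.

```python
import numpy as np


def pad_channels(window: np.ndarray, target_channels: int) -> np.ndarray:
    """Zero-pad a (channels, samples) EMG window up to `target_channels`."""
    channels, _ = window.shape
    if channels > target_channels:
        raise ValueError(
            f"window has {channels} channels, more than {target_channels}"
        )
    # Pad only the end of the channel axis; the time axis is left untouched.
    return np.pad(window, ((0, target_channels - channels), (0, 0)))
```

For example, an 8-channel window padded to 16 channels keeps its original rows first and gains 8 all-zero rows.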

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 21 comments.

| File | Description |
| --- | --- |
| util/train_utils.py | Added MinMaxNormalization utility class for input normalization |
| util/ckpt_to_safetensor.py | New checkpoint-to-safetensors conversion script with CLI |
| tasks/pretrain_task_EMG.py | Pretraining task with masked reconstruction and signal logging |
| tasks/finetune_task_EMG.py | Finetuning task with classification/regression support and metrics |
| models/TinyMyo.py | Complete TinyMyo model architecture with rotary attention blocks |
| datasets/emg_pretrain_dataset.py | HDF5-backed pretraining dataset with multi-file support |
| datasets/emg_finetune_dataset.py | HDF5-backed finetuning dataset with lazy loading |
| data_module/pretrain_data_module.py | Lightning data module with persistent workers enabled |
| data_module/finetune_data_module.py | Lightning data module for finetuning with persistent workers |
| run_train.py | Updated training script with rank-zero testing and process group management |
| pyproject.toml | Project configuration for uv package manager with dependencies |
| config/* | YAML configuration files for experiments, models, tasks, and data modules |
| docs/model/TinyMyo.md | Comprehensive documentation of model architecture and performance |
| .gitignore | Standard Python gitignore with Hydra outputs |

Comments suppressed due to low confidence (2)

tasks/pretrain_task_EMG.py:283

  • This assignment to 'indices_array' is unnecessary as it is redefined before this value is used.
            indices_array = np.array(indices)

tasks/pretrain_task_EMG.py:284

  • This assignment to 'indices_array' is unnecessary as it is redefined before this value is used.
            indices_array = np.unique(indices)

Thoriri and others added 3 commits December 19, 2025 16:04
…and raising error when default values are unchanged

- in final test, best checkpoint after training is loaded instead of last checkpoint