Cmoris/SilenceLLM

SilenceLLM

Multimodal LLM Training (Audio + Video + Text)

Training script for a multimodal large language model for silence recognition. The model jointly learns from audio, video, and text data.

Built with PyTorch, Transformers, and torchrun.


🧩 Overview

This repository provides a training pipeline for SilenceQwen3, a multimodal extension of the Qwen3-1.7B model.
It integrates:

  • Audio encoder: openai/whisper-medium
  • Vision encoder: google/siglip2-base-patch16-224
  • Language backbone: Qwen3-1.7B
  • Q-Former for modality alignment (bert-large-uncased)
  • LoRA fine-tuning for efficient parameter adaptation
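
As a rough illustration of how the pretrained components listed above might be tracked and loaded, the sketch below keeps each checkpoint in a small registry and loads it on demand with Hugging Face Transformers. The `ComponentSpec` class, the `COMPONENTS` dict, the `load_component` helper, the hub ID `Qwen/Qwen3-1.7B`, and the `trainable` flags are all assumptions made for illustration. The repository's actual loading code lives under model/ and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentSpec:
    """Hypothetical record describing one pretrained building block."""
    role: str
    checkpoint: str
    trainable: bool


# Checkpoint names come from the list above. The Qwen hub ID and the
# trainable flags are assumptions, not the repository's settings.
COMPONENTS = {
    "audio": ComponentSpec("audio encoder", "openai/whisper-medium", False),
    "vision": ComponentSpec("vision encoder", "google/siglip2-base-patch16-224", False),
    "qformer": ComponentSpec("Q-Former", "bert-large-uncased", True),
    "llm": ComponentSpec("language backbone", "Qwen/Qwen3-1.7B", False),
}


def load_component(name: str):
    """Load one component on demand (downloads weights on first use)."""
    # Imported lazily so the registry can be inspected without transformers.
    from transformers import AutoModel, AutoModelForCausalLM

    spec = COMPONENTS[name]
    loader = AutoModelForCausalLM if name == "llm" else AutoModel
    model = loader.from_pretrained(spec.checkpoint)
    # Freeze the weights of components marked as not trainable.
    model.requires_grad_(spec.trainable)
    return model
```

Calling `load_component("audio")` would fetch the Whisper checkpoint. LoRA adapters for the language backbone would be attached separately, for example with the peft package.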

Installation

pip install torch torchvision torchaudio
pip install transformers accelerate peft
pip install pillow tqdm
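
After installing, a quick check that every dependency can be imported may save a failed training launch. The helper below is a minimal sketch that uses only the standard library. `REQUIRED_MODULES` and `missing_modules` are names introduced here for illustration and are not part of the repository.

```python
from importlib.util import find_spec

# Import names for the packages installed above. Note that the
# "pillow" distribution is imported as "PIL".
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio",
    "transformers", "accelerate", "peft",
    "PIL", "tqdm",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the subset of `modules` that cannot be found for import."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```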

Project

project_root/
├── train.py
├── eval.py
├── silence_trainer.py
├── data/
│   ├── collector.py
│   ├── dataset.py
│   └── mm_utils.py
├── model/
│   ├── submodels
│   ├── silence_llama.py
│   ├── silence_model.py
│   ├── silence_perceiver.py
│   └── silence_qwen.py
└── script/
    ├── train.sh
    └── eval.sh
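
Because the pipeline is built with torchrun, each worker process receives its position in the job through the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables that torchrun sets. The sketch below shows one way an entry point such as train.py could read them. The `DistEnv` class and `read_dist_env` function are hypothetical names, and how the repository's scripts actually handle this is not shown in this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistEnv:
    """Hypothetical container for the values torchrun exports."""
    rank: int        # global index of this process
    local_rank: int  # index of this process on its node (usually the GPU index)
    world_size: int  # total number of processes in the job

    @property
    def is_main(self) -> bool:
        # Rank 0 conventionally handles logging and checkpoint saving.
        return self.rank == 0


def read_dist_env(environ=os.environ) -> DistEnv:
    """Read torchrun's variables, defaulting to a single-process run."""
    return DistEnv(
        rank=int(environ.get("RANK", 0)),
        local_rank=int(environ.get("LOCAL_RANK", 0)),
        world_size=int(environ.get("WORLD_SIZE", 1)),
    )
```

When the script is run without torchrun, the defaults describe a single process, so the same code path works for quick local debugging.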
