Real-time hand gesture recognition using state-of-the-art vision transformers.
- Real-time gesture recognition through webcam integration
- 19 different hand gestures recognized with high accuracy
- Efficient model architecture using FastViT for low-latency predictions
- Simple deployment through Google Colab (no local setup required)
This project uses the FastViT architecture, a hybrid vision transformer developed by Apple that offers an excellent balance between accuracy and computational efficiency:
- Backbone: `fastvit_t8.apple_in1k` pretrained model
- Training approach: Transfer learning with frozen backbone
- Input size: 256×256 px
- Classes: 19 hand gestures
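The configuration above can be sketched in plain PyTorch with the `timm` library. This is a minimal illustration, not the project's own training code (the references suggest the notebooks use fastai). The helper names `freeze_backbone` and `build_gesture_model` are hypothetical, and the sketch assumes `torch` and `timm` are installed.

```python
import torch.nn as nn


def freeze_backbone(model: nn.Module, head: nn.Module) -> int:
    """Freeze every parameter except those in `head`; return the trainable count."""
    for param in model.parameters():
        param.requires_grad = False
    for param in head.parameters():
        param.requires_grad = True
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def build_gesture_model(num_classes: int = 19, pretrained: bool = True) -> nn.Module:
    """Create a FastViT-T8 model with a new 19-class head and a frozen backbone."""
    import timm  # imported lazily so freeze_backbone works without timm installed

    model = timm.create_model(
        "fastvit_t8.apple_in1k", pretrained=pretrained, num_classes=num_classes
    )
    # Only the freshly initialised classifier layer stays trainable.
    freeze_backbone(model, model.get_classifier())
    return model
```

With the backbone frozen, only the classifier weights receive gradient updates, which keeps training cheap enough for a Colab session.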
FastViT was chosen over alternatives such as ConvNeXt for its efficiency: it maintains high accuracy at a lower computational cost, which suits a resource-constrained environment.
The model was trained on the Hand Gesture Recognition Image Dataset (HaGRID) 150k subset:
- 19 gesture classes including common gestures like "thumbs up", "peace sign", and "stop"
- The more manageable 150k subset was used because the full dataset is too large to train on in Colab
- Properly split between training and validation sets
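One way to produce such a split is a per-class (stratified) shuffle, sketched below using only the standard library. The 80/20 ratio, the fixed seed, and the `stratified_split` helper are assumptions made for illustration; the split actually used in the notebooks is not stated here.

```python
import random
from collections import defaultdict


def stratified_split(samples, valid_fraction=0.2, seed=42):
    """Split (path, label) pairs into train/valid lists, keeping class proportions."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, valid = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        # Hold out at least one image per class for validation.
        n_valid = max(1, round(len(paths) * valid_fraction))
        valid += [(p, label) for p in paths[:n_valid]]
        train += [(p, label) for p in paths[n_valid:]]
    return train, valid
```

Splitting per class keeps all 19 gestures represented in both sets, so the validation accuracy is not skewed by a class that happens to be missing.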
- Open the training notebook
- Run all cells to train the model or load pretrained weights
- Follow instructions for webcam integration
- Open the inference notebook
- Upload the pretrained model file (`sign_lang_model.pkl`)
- Run the webcam inference cell to start real-time detection
- 97.5% accuracy on the validation set
- Robust performance across different lighting conditions
- Real-time inference capability (>30 FPS on modern hardware)
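A throughput figure like the one above can be checked on your own hardware with a small timing helper. The sketch below is a generic measurement, not the method used to obtain the reported number; `measure_fps` and its `predict` argument are hypothetical names, where `predict` stands for any callable that runs the model on one frame.

```python
import time


def measure_fps(predict, frames, warmup=3):
    """Return the average frames per second of `predict` over `frames`."""
    frames = list(frames)
    for frame in frames[:warmup]:
        predict(frame)  # warm-up calls are not timed
    start = time.perf_counter()
    for frame in frames:
        predict(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```

The warm-up calls matter because the first few predictions often include one-off costs such as lazy initialisation, which would otherwise pull the average down.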
- Expanded gesture vocabulary: Scale to cover the entire sign language alphabet and common phrases
- Improved deployment: Create a standalone application for integration with video conferencing platforms
- Sequence modeling: Incorporate temporal information for dynamic gesture recognition
- Model optimization: Further quantization and pruning for edge device deployment
- HaGRID Dataset
- FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
- fastai Library
⭐ If you find this project useful, please consider giving it a star!