High-performance audio transcription for Apple Silicon Macs
Transform your audio files into text with state-of-the-art AI, optimized for Apple's M1, M2, and M3 chips.
For fresh macOS systems, use our automated installer:
# Run the installer
bash install.sh

This will handle everything automatically:
- System requirements check
- Install dependencies (Homebrew, Python, MLX)
- Set up virtual environment with exact package versions
- Create command-line tool and desktop app
- Download AI models
Installation time: 10-20 minutes
Audio2Text-Distribution/
├── install.sh                    # Automated installer script
├── README.md                     # This file
├── docs/                         # Detailed documentation
│   └── INSTALLATION.md           # Step-by-step installation guide
└── resources/                    # Application files
    ├── transcribe_standalone.py  # Main transcription script
    ├── npz_loading_fix.py        # NPZ compatibility fix
    ├── config/                   # Configuration templates
    ├── docs/                     # User documentation
    └── test/                     # Installation tests
- Apple Silicon Optimized - Uses MLX framework for native M1/M2/M3 acceleration
- Multiple AI Models - MLX Whisper (primary) with WhisperX fallback
- Speaker Diarization - Identify and separate different speakers
- Multi-language Support - 99+ languages with auto-detection
- Flexible Output - Text, JSON, SRT subtitle formats
- Audio Format Support - WAV, MP3, M4A, FLAC, MP4, and more
- Dual Interface - Command-line tool and drag-drop desktop app
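To make the SRT output option concrete, here is a minimal sketch of how timed segments map onto the SubRip layout (a counter, a `start --> end` line, then the text). The segment shape (`start`, `end`, `text`) and both helper names are assumptions for illustration; they are not taken from Audio2Text's own code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments (hypothetical dicts with start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Subtitle blocks are separated by a blank line.
    return "\n".join(blocks)
```

Keeping the timestamp arithmetic in integer milliseconds avoids rounding surprises at second boundaries.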
| Requirement | Details |
|---|---|
| macOS | 11.0+ (Big Sur or later) |
| Hardware | Apple Silicon Mac (M1/M2/M3) |
| Storage | 8GB free space |
| Internet | Required for setup and model downloads |
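The requirements in the table can be checked by hand before running the installer. The sketch below is one way to do that with the Python standard library; the function names are hypothetical, and it is not the check that `install.sh` itself performs.

```python
import platform
import shutil


def parse_version(version: str) -> tuple:
    """Turn '13.4.1' into (13, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def check_requirements(machine, macos_version, free_bytes,
                       min_macos=(11, 0), min_free_gb=8):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Note: Python running under Rosetta reports x86_64 even on Apple Silicon.
    if machine != "arm64":
        problems.append(f"Apple Silicon required, found architecture '{machine}'")
    if parse_version(macos_version) < min_macos:
        problems.append(f"macOS 11.0+ required, found '{macos_version or 'unknown'}'")
    if free_bytes < min_free_gb * 1024**3:
        problems.append(f"{min_free_gb} GB of free space required")
    return problems


if __name__ == "__main__":
    issues = check_requirements(
        machine=platform.machine(),
        macos_version=platform.mac_ver()[0],  # empty string on non-macOS systems
        free_bytes=shutil.disk_usage("/").free,
    )
    print("\n".join(issues) if issues else "All requirements met")
```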
For production use on fresh or existing macOS systems:
./install.sh

Features:
- Zero configuration required
- Installs all dependencies correctly
- Creates proper directory structure
- Works on fresh macOS systems
- Easy to uninstall
- Takes 10-20 minutes
- Downloads ~2-4GB of dependencies
For safe testing on development systems with existing Python/ML setups:
./install-test.sh

Safe Testing Features:
- Completely isolated installation
- No conflicts with existing Python packages
- No system PATH modifications
- Installs to ~/Applications/Audio2Text-Test/
- Easy removal: rm -rf ~/Applications/Audio2Text-Test/
- Faster setup with minimal dependencies
# Basic transcription
audio2text recording.wav
# German audio with speaker identification
audio2text --language de --speakers interview.mp3
# Output as SRT subtitles
audio2text --format srt --output-dir ~/Desktop video.m4a

- Double-click Audio2Text.app in Applications
- Drag audio files onto the app window
- Transcriptions save to ~/Applications/Audio2Text/output/
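For batch work, the command-line examples above can be driven from a script. This is a minimal sketch that only uses the flags shown in this README (`--language`, `--speakers`, `--format`, `--output-dir`); `build_command` and `transcribe_folder` are hypothetical helpers, and the extension filter is an arbitrary subset of the supported formats.

```python
import subprocess
from pathlib import Path


def build_command(audio_file, language=None, speakers=False,
                  output_format=None, output_dir=None):
    """Assemble an audio2text invocation from the flags shown in this README."""
    cmd = ["audio2text"]
    if language:
        cmd += ["--language", language]
    if speakers:
        cmd.append("--speakers")
    if output_format:
        cmd += ["--format", output_format]
    if output_dir:
        # No shell is involved, so "~" is not expanded automatically;
        # pass Path("~/Desktop").expanduser() if you need that.
        cmd += ["--output-dir", str(output_dir)]
    cmd.append(str(audio_file))
    return cmd


def transcribe_folder(folder, **options):
    """Run audio2text once for each .wav, .mp3, or .m4a file in a folder."""
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in {".wav", ".mp3", ".m4a"}:
            subprocess.run(build_command(path, **options), check=True)
```

For example, `transcribe_folder("recordings", language="de", speakers=True)` would run the German speaker-identification example once per matching file, stopping at the first failure because of `check=True`.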
To download AI models, you need a free Hugging Face token:
- Create an account at https://huggingface.co/join
- Get a token at https://huggingface.co/settings/tokens
- Configure it:
# Edit config file
nano ~/Applications/Audio2Text/config/env
# Add your token
HF_TOKEN=your_token_here

# Test system compatibility
python ~/Applications/Audio2Text/test/test_installation.py
# Check logs
tail -f ~/Applications/Audio2Text/logs/audio2text_*.log
# Test core functionality
audio2text --help

"No transcription engines available"
- Ensure you're on an Apple Silicon Mac
- Check that MLX is properly installed
- Verify macOS version is 11.0+
"Failed to download models"
- Configure your Hugging Face token
- Check internet connection
- Ensure sufficient disk space
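The first item, the token, is easy to verify with a script. The sketch below assumes the config file holds plain `KEY=VALUE` lines, as in the `HF_TOKEN=your_token_here` example; `read_hf_token` is a hypothetical helper, not part of Audio2Text.

```python
from pathlib import Path

PLACEHOLDER = "your_token_here"  # the example value from this README


def read_hf_token(env_text: str):
    """Return the HF_TOKEN value from KEY=VALUE text, or None if it is unset."""
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "HF_TOKEN":
            value = value.strip().strip("'\"")
            if value and value != PLACEHOLDER:
                return value
    return None


if __name__ == "__main__":
    env_file = Path.home() / "Applications/Audio2Text/config/env"
    text = env_file.read_text() if env_file.exists() else ""
    print("HF_TOKEN configured" if read_hf_token(text) else "HF_TOKEN missing")
```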
"Command not found: audio2text"
- Restart terminal to pick up PATH changes
- Or run directly: ~/Applications/Audio2Text/bin/audio2text
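A script that calls the tool can apply the same fallback: look on `PATH` first, then try the install location given above. This is only a sketch; `find_audio2text` is a hypothetical helper.

```python
import shutil
from pathlib import Path

# Install location named in this README.
FALLBACK = Path.home() / "Applications/Audio2Text/bin/audio2text"


def find_audio2text(which=shutil.which, fallback=FALLBACK):
    """Return the path to audio2text, preferring PATH, else the install dir."""
    on_path = which("audio2text")
    if on_path:
        return Path(on_path)
    if Path(fallback).is_file():
        return Path(fallback)
    return None  # not installed, or installed somewhere else
```

The `which` parameter exists only so the lookup can be swapped out in tests.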
- Installation Guide - Detailed setup instructions
- User Manual - Complete usage guide
- Troubleshooting - Common issues and solutions
| Model Size | Speed | Accuracy | Memory |
|---|---|---|---|
| tiny | 10x realtime | Good | 1GB |
| base | 8x realtime | Better | 1GB |
| small | 6x realtime | Very Good | 2GB |
| medium | 4x realtime | Excellent | 3GB |
| large-v3 | 3x realtime | Best | 4GB |
1 hour of audio: 3-6 minutes processing time
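The Memory column gives a simple rule for choosing a model: take the most accurate one that fits. The sketch below encodes the table's figures; `pick_model` is a hypothetical helper that only returns a model name, since this README does not show how a model is selected on the command line.

```python
# (model name, approximate memory in GB) from the table above,
# ordered from least to most accurate.
MODELS = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 3), ("large-v3", 4)]


def pick_model(available_gb: float, models=MODELS):
    """Return the most accurate model whose listed memory fits, or None."""
    fitting = [name for name, mem_gb in models if mem_gb <= available_gb]
    return fitting[-1] if fitting else None
```

The listed figures are approximate, so leave some headroom for the rest of the system in practice.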
# Run the uninstaller
~/Applications/Audio2Text/uninstall.sh
# Or manual cleanup
rm -rf ~/Applications/Audio2Text/
rm -rf ~/Applications/Audio2Text.app

Audio2Text is released under the MIT License. See LICENSE for full details.
This software uses several open-source components:
- MLX Whisper (Apache 2.0) - Apple's efficient Whisper implementation
- WhisperX (BSD-4-Clause) - Enhanced Whisper with alignment and diarization
- PyTorch (BSD 3-Clause) - Machine learning framework
- Transformers (Apache 2.0) - Hugging Face transformer models
- pyannote.audio (MIT) - Speaker diarization toolkit
- librosa (ISC) - Audio analysis library
- OpenAI Whisper Models: MIT License, free for commercial use
- Pyannote Diarization Models: May require Hugging Face agreement
- Other Hugging Face Models: Individual licensing terms apply
Users are responsible for ensuring compliance with all model licenses for their intended use case.
The Audio2Text software itself is free for commercial use under the MIT License. However:
- Whisper models are MIT licensed (commercial OK)
- Some speaker diarization models may have restrictions
- Verify individual model licenses before commercial deployment
For commercial applications, we recommend:
- Reviewing all model licenses on Hugging Face Hub
- Using only commercially-licensed models
- Consulting legal counsel for compliance
- Installation Issues: Check INSTALLATION.md
- Usage Questions: See User Manual
- Bug Reports: Include logs from ~/Applications/Audio2Text/logs/
- License Questions: See LICENSE file

Ready to start transcribing!
Run the automated installer (./install.sh) for the best experience on Apple Silicon Macs.