
Audio2Text - MLX Whisper Transcription for Apple Silicon - Native macOS app for fast, offline audio transcription using MLX Whisper on Apple Silicon. Supports German/multilingual transcription, speaker diarization, drag-and-drop interface. Optimized for M1/M2/M3 Macs with Metal GPU acceleration. Easy installer included.


frischeDaten/audio2text


🎙️ Audio2Text - MLX Whisper Distribution

High-performance audio transcription for Apple Silicon Macs

Transform your audio files into text with state-of-the-art AI, optimized for Apple's M1, M2, and M3 chips.

Quick Start

Automated Installation

For fresh macOS systems, use our automated installer:

# Run the installer
bash install.sh

This will handle everything automatically:

  • ✅ System requirements check
  • ✅ Install dependencies (Homebrew, Python, MLX)
  • ✅ Set up virtual environment with exact package versions
  • ✅ Create command-line tool and desktop app
  • ✅ Download AI models

Installation time: 10-20 minutes

What's Included

📁 Distribution Contents

Audio2Text-Distribution/
├── install.sh              # Automated installer script
├── README.md               # This file
├── docs/                   # Detailed documentation
│   └── INSTALLATION.md     # Step-by-step installation guide
└── resources/              # Application files
    ├── transcribe_standalone.py  # Main transcription script
    ├── npz_loading_fix.py         # NPZ compatibility fix
    ├── config/             # Configuration templates
    ├── docs/               # User documentation
    └── test/               # Installation tests

🚀 Key Features

  • Apple Silicon Optimized - Uses MLX framework for native M1/M2/M3 acceleration
  • Multiple AI Models - MLX Whisper (primary) with WhisperX fallback
  • Speaker Diarization - Identify and separate different speakers
  • Multi-language Support - 99+ languages with auto-detection
  • Flexible Output - Text, JSON, SRT subtitle formats
  • Audio Format Support - WAV, MP3, M4A, FLAC, MP4, and more
  • Dual Interface - Command-line tool and drag-drop desktop app

System Requirements

| Requirement | Details |
|---|---|
| macOS | 11.0+ (Big Sur or later) |
| Hardware | Apple Silicon Mac (M1/M2/M3) |
| Storage | 8GB free space |
| Internet | Required for setup and model downloads |
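The requirements above can be checked from Terminal before running the installer. This is a minimal sketch, not part of the Audio2Text distribution: the helper names `meets_platform`, `meets_storage`, and `preflight` are hypothetical. It assumes `uname -m` reports `arm64` on Apple Silicon and `sw_vers -productVersion` returns the macOS version, and it treats the 8GB figure as 8 GiB of free space on the volume holding your home directory.

```shell
# Hypothetical pre-install check for the System Requirements table.
# Not shipped with Audio2Text; thresholds are taken from the table above.

REQUIRED_FREE_KB=$((8 * 1024 * 1024))   # 8 GB, treated as 8 GiB, in 1 KiB blocks

# Succeeds when ARCH is arm64 (Apple Silicon) and VERSION is macOS 11.0 or later.
meets_platform() {
  local arch="$1" version="$2"
  local major="${version%%.*}"            # "14.5.1" -> "14"
  [ "$arch" = "arm64" ] || return 1
  [ "$major" -ge 11 ] 2>/dev/null         # non-numeric or empty versions fail
}

# Succeeds when FREE_KB (in 1 KiB blocks) meets the storage requirement.
meets_storage() {
  [ "$1" -ge "$REQUIRED_FREE_KB" ] 2>/dev/null
}

# Runs both checks against the current Mac and prints a short report.
preflight() {
  local arch version free_kb status=0
  arch="$(uname -m)"
  version="$(sw_vers -productVersion 2>/dev/null)" || version="unknown"
  # Column 4 of POSIX-format `df -Pk` output is the available space in KiB.
  free_kb="$(df -Pk "$HOME" | awk 'NR==2 {print $4}')"

  if meets_platform "$arch" "$version"; then
    echo "OK   platform: $arch, macOS $version"
  else
    echo "FAIL platform: $arch, macOS $version (need arm64 and macOS 11.0+)"
    status=1
  fi
  if meets_storage "$free_kb"; then
    echo "OK   storage: $((free_kb / 1024 / 1024)) GB free"
  else
    echo "FAIL storage: need 8 GB free in $HOME"
    status=1
  fi
  return "$status"
}
```

Paste the functions into Terminal (or `source` the file) and run `preflight`; a non-zero exit status means at least one requirement is not met.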

Installation

🔧 Production Installation

For production use on fresh or existing macOS systems:

./install.sh

Features:

  • ✅ No configuration needed beyond a HuggingFace token
  • ✅ Installs all dependencies correctly
  • ✅ Creates proper directory structure
  • ✅ Works on fresh macOS systems
  • ✅ Easy to uninstall
  • ⏱️ Takes 10-20 minutes
  • 💾 Downloads ~2-4GB of dependencies

🧪 Test Installation (For Developers)

For safe testing on development systems with existing Python/ML setups:

./install-test.sh

Safe Testing Features:

  • 🔒 Completely isolated installation
  • 🚫 No conflicts with existing Python packages
  • 🚫 No system PATH modifications
  • 📁 Installs to ~/Applications/Audio2Text-Test/
  • 🗑️ Easy removal: rm -rf ~/Applications/Audio2Text-Test/
  • ⏱️ Faster setup with minimal dependencies

Quick Usage

Command Line

# Basic transcription
audio2text recording.wav

# German audio with speaker identification
audio2text --language de --speakers interview.mp3

# Output as SRT subtitles
audio2text --format srt --output-dir ~/Desktop video.m4a
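To process a whole folder, the same options can be applied in a loop. This is a sketch under assumptions: it relies only on the `--language`, `--format`, and `--output-dir` flags shown above, assumes `audio2text` takes one input file per call as in these examples, and `transcribe_dir` is a hypothetical helper name. The extension list mirrors the formats named under Key Features.

```shell
# Hypothetical helper: run audio2text on every supported file in a folder.
# Usage: transcribe_dir INPUT_DIR LANGUAGE FORMAT OUTPUT_DIR
# The body runs in a subshell ( ... ) so the shopt changes do not leak.
transcribe_dir() (
  in_dir="$1"; lang="$2"; fmt="$3"; out_dir="$4"
  count=0
  mkdir -p "$out_dir"
  # nullglob: patterns with no matches expand to nothing.
  # nocaseglob: *.mp3 also matches FILE.MP3.
  shopt -s nullglob nocaseglob
  for file in "$in_dir"/*.{wav,mp3,m4a,flac,mp4}; do
    echo "Transcribing: $file"
    if audio2text --language "$lang" --format "$fmt" --output-dir "$out_dir" "$file"; then
      count=$((count + 1))
    else
      echo "Failed: $file" >&2
    fi
  done
  echo "Done: $count file(s) transcribed into $out_dir"
)
```

For example, `transcribe_dir ~/Recordings de srt ~/Desktop/transcripts` would create German SRT subtitles for every matching file in `~/Recordings`. Files that fail are reported and skipped, so one bad recording does not stop the batch.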

Desktop App

  1. Double-click Audio2Text.app in ~/Applications
  2. Drag audio files onto the app window
  3. Transcriptions save to ~/Applications/Audio2Text/output/

Configuration

HuggingFace Token (Required)

To download AI models, you need a free HuggingFace token:

  1. Create an account at https://huggingface.co/join
  2. Create a token at https://huggingface.co/settings/tokens
  3. Configure it:
# Edit config file
nano ~/Applications/Audio2Text/config/env

# Add your token
HF_TOKEN=your_token_here
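If you prefer not to edit the file by hand, the token line can also be written from Terminal. This sketch assumes the config file is a plain `KEY=VALUE` file as shown above; the helper name `set_hf_token` is hypothetical, and the `chmod 600` step is an added precaution so that only your user account can read the token.

```shell
# Hypothetical helper: write HF_TOKEN into the Audio2Text config file,
# replacing any existing HF_TOKEN line instead of appending a duplicate.
# Usage: set_hf_token TOKEN [CONFIG_FILE]
set_hf_token() {
  local token="$1"
  local config="${2:-$HOME/Applications/Audio2Text/config/env}"
  local tmp
  if [ -z "$token" ]; then
    echo "usage: set_hf_token TOKEN [CONFIG_FILE]" >&2
    return 1
  fi
  mkdir -p "$(dirname "$config")"
  touch "$config"
  tmp="$(mktemp)"
  # Keep every line except an old HF_TOKEN entry (grep exits 1 on an empty result).
  grep -v '^HF_TOKEN=' "$config" > "$tmp" || true
  printf 'HF_TOKEN=%s\n' "$token" >> "$tmp"
  mv "$tmp" "$config"
  chmod 600 "$config"   # the token is a secret: owner read/write only
}
```

Passing the token as an argument stores it in your shell history; `read -rs token && set_hf_token "$token"` reads it without echoing or recording it.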

Troubleshooting

Quick Diagnostics

# Test system compatibility
python ~/Applications/Audio2Text/test/test_installation.py

# Check logs
tail -f ~/Applications/Audio2Text/logs/audio2text_*.log

# Test core functionality
audio2text --help
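For a bug report, the most recent log is usually the useful one. A small sketch, assuming log files follow the `audio2text_*.log` naming used above; `latest_log` is a hypothetical helper that compares modification times with the shell's `-nt` ("newer than") test.

```shell
# Hypothetical helper: print the newest audio2text_*.log in a log directory.
# Usage: latest_log [LOG_DIR]
latest_log() {
  local dir="${1:-$HOME/Applications/Audio2Text/logs}"
  local newest="" f
  for f in "$dir"/audio2text_*.log; do
    [ -e "$f" ] || continue               # the pattern matched nothing
    # -nt is true when $f was modified more recently than $newest
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] || { echo "No logs found in $dir" >&2; return 1; }
  echo "$newest"
}
```

For example, `tail -n 200 "$(latest_log)"` shows the last 200 lines of the newest log, which can be pasted into a bug report.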

Common Issues

"No transcription engines available"

  • Ensure you're on Apple Silicon Mac
  • Check that MLX is properly installed
  • Verify macOS version is 11.0+

"Failed to download models"

  • Configure HuggingFace token
  • Check internet connection
  • Ensure sufficient disk space

"Command not found: audio2text"

  • Restart terminal to pick up PATH changes
  • Or run directly: ~/Applications/Audio2Text/bin/audio2text
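If restarting Terminal does not help, the `bin` directory may not be on your `PATH`. This sketch assumes zsh, the default macOS shell, which reads `~/.zshrc`, and the install location shown above. `ensure_on_path` is a hypothetical helper. It appends an `export` line only when the directory is not already on `PATH` and the line is not already in the file.

```shell
# Hypothetical helper: add a directory to PATH via a shell startup file, once.
# Usage: ensure_on_path [DIR] [RC_FILE]
ensure_on_path() {
  local dir="${1:-$HOME/Applications/Audio2Text/bin}"
  local rc="${2:-$HOME/.zshrc}"
  local line="export PATH=\"$dir:\$PATH\""
  # Already on the current PATH? Nothing to do.
  case ":$PATH:" in
    *":$dir:"*) echo "$dir is already on PATH"; return 0 ;;
  esac
  # Avoid appending the same line twice if the rc file already has it.
  touch "$rc"
  if ! grep -qxF "$line" "$rc"; then
    printf '\n%s\n' "$line" >> "$rc"
  fi
  echo "Added $dir to $rc; open a new Terminal window or run: source $rc"
}
```

Writing to the startup file, rather than only exporting `PATH` in the current session, means the change survives new Terminal windows.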

Documentation

Performance

Benchmarks (M2 Pro)

| Model Size | Speed | Accuracy | Memory |
|---|---|---|---|
| tiny | 10x realtime | Good | 1GB |
| base | 8x realtime | Better | 1GB |
| small | 6x realtime | Very Good | 2GB |
| medium | 4x realtime | Excellent | 3GB |
| large-v3 | 3x realtime | Best | 4GB |

At these speeds, 1 hour of audio takes roughly 6 minutes (tiny, 10x) to 20 minutes (large-v3, 3x) to process.
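Processing time follows from the table: time ≈ audio duration ÷ speed factor. For example, 60 minutes of audio at 10x realtime takes about 6 minutes, and at 3x realtime about 20 minutes. The sketch below shows that arithmetic. `estimate_minutes` is a hypothetical helper, and the factors are the M2 Pro figures above, so times on other Macs will differ.

```shell
# Estimated processing time = audio duration / realtime speed factor.
# Usage: estimate_minutes AUDIO_MINUTES SPEED_FACTOR
estimate_minutes() {
  awk -v m="$1" -v f="$2" 'BEGIN { printf "%.1f\n", m / f }'
}

# Speed factors from the M2 Pro benchmark table above.
for entry in tiny:10 base:8 small:6 medium:4 large-v3:3; do
  model="${entry%%:*}"     # text before the colon
  factor="${entry##*:}"    # text after the colon
  echo "$model: $(estimate_minutes 60 "$factor") min for 1 hour of audio"
done
```

The loop prints 6.0 minutes for tiny, 7.5 for base, 10.0 for small, 15.0 for medium, and 20.0 for large-v3.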

Uninstallation

# Run the uninstaller
~/Applications/Audio2Text/uninstall.sh

# Or manual cleanup
rm -rf ~/Applications/Audio2Text/
rm -rf ~/Applications/Audio2Text.app

License & Legal

Software License

Audio2Text is released under the MIT License. See LICENSE for full details.

Third-Party Dependencies

This software uses several open-source components:

  • MLX Whisper (MIT) - Apple's efficient Whisper implementation
  • WhisperX (BSD-4-Clause) - Enhanced Whisper with alignment and diarization
  • PyTorch (BSD 3-Clause) - Machine learning framework
  • Transformers (Apache 2.0) - Hugging Face transformer models
  • pyannote.audio (MIT) - Speaker diarization toolkit
  • librosa (ISC) - Audio analysis library

AI Model Licenses

⚠️ Important: This software downloads AI models that have their own licenses:

  • OpenAI Whisper Models: MIT License, free for commercial use
  • Pyannote Diarization Models: May require Hugging Face agreement
  • Other Hugging Face Models: Individual licensing terms apply

Users are responsible for ensuring compliance with all model licenses for their intended use case.

Commercial Use

The Audio2Text software itself is free for commercial use under the MIT License. However:

  1. ✅ Whisper models are MIT licensed (commercial OK)
  2. ⚠️ Some speaker diarization models may have restrictions
  3. ⚠️ Verify individual model licenses before commercial deployment

For commercial applications, we recommend:

  • Reviewing all model licenses on Hugging Face Hub
  • Using only commercially-licensed models
  • Consulting legal counsel for compliance

Support

  • Installation Issues: Check INSTALLATION.md
  • Usage Questions: See User Manual
  • Bug Reports: Include logs from ~/Applications/Audio2Text/logs/
  • License Questions: See LICENSE file

🎙️ Ready to start transcribing!

Run the automated installer (./install.sh) for the best experience on Apple Silicon Macs.
