Add MLX-Whisper backend support for Apple Silicon #294

Open

fffilimonov wants to merge 2 commits into KoljaB:master from fffilimonov:feature/mlx-support

Conversation

@fffilimonov

As mentioned in #48

Implements native Apple Silicon (M1/M2/M3/M4) acceleration through the MLX-Whisper backend,
enabling efficient speech-to-text without CUDA dependencies.

Key Features:
- New 'backend' parameter supporting "faster-whisper" (default) and "mlx-whisper" (see the usage sketch after this list)
- Automatic model path translation (tiny -> mlx-community/whisper-tiny)
- Compatible transcription format (drop-in replacement for faster-whisper)
- Multiprocessing-safe implementation with pickle-compatible classes
- Near real-time performance (RTF ~1.07x with tiny model on M2)
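
A minimal usage sketch, assuming the `backend` parameter is exposed on RealtimeSTT's `AudioToTextRecorder` constructor as this PR describes (this parameter exists only in this branch, not in the released library):

```python
from RealtimeSTT import AudioToTextRecorder

# "backend" selects the transcription engine added by this PR:
# "faster-whisper" is the default; "mlx-whisper" requires Apple Silicon.
# Short model names like "tiny" are translated internally to their
# MLX hub equivalents (e.g. "tiny" -> "mlx-community/whisper-tiny").
recorder = AudioToTextRecorder(model="tiny", backend="mlx-whisper")

print(recorder.text())  # blocks until an utterance is transcribed
```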

Implementation Details:
- Added MLX import guard with graceful fallback (sketched after this list)
- Created MLXTranscriptionInfo and MLXTranscriptionSegment compatibility classes
- Modified TranscriptionWorker to support both backends
- Updated requirements.txt with conditional MLX dependency for macOS
- Added comprehensive documentation (MLX_SUPPORT.md, MLX_README_ADDITION.md)
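
A sketch of the import guard and the faster-whisper-shaped compatibility shims; class and field names are illustrative rather than the PR's exact ones, though `mlx_whisper.transcribe(audio, path_or_hf_repo=...)` is the real MLX-Whisper entry point:

```python
from dataclasses import dataclass

# Import guard: mlx-whisper only installs on Apple Silicon, so the
# backend degrades gracefully to faster-whisper everywhere else.
try:
    import mlx_whisper
    MLX_AVAILABLE = True
except ImportError:
    MLX_AVAILABLE = False

# Plain dataclasses mirror faster-whisper's Segment/TranscriptionInfo
# shapes, so downstream code (and pickling across worker processes)
# stays unchanged.
@dataclass
class MLXTranscriptionSegment:
    text: str
    start: float
    end: float

@dataclass
class MLXTranscriptionInfo:
    language: str
    language_probability: float

def mlx_transcribe(audio, model_path: str):
    # mlx_whisper.transcribe returns a dict with "text", "segments",
    # and "language" keys, much like openai-whisper.
    result = mlx_whisper.transcribe(audio, path_or_hf_repo=model_path)
    segments = [
        MLXTranscriptionSegment(s["text"], s["start"], s["end"])
        for s in result["segments"]
    ]
    info = MLXTranscriptionInfo(result.get("language", "en"), 1.0)
    return segments, info
```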

Performance (Apple M2):
- Short audio (6.6s): RTF 1.80x
- Long audio (167s): RTF 1.07x (near real-time)
- Initialization: ~3s (includes model download/cache)
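
The RTF figures above read as real-time factors where values above 1.0 mean faster than real-time, i.e. (assuming this convention) RTF = audio duration / transcription time:

```python
# Illustrative arithmetic only; elapsed time is back-derived from the
# stated RTF, not a measured value.
audio_duration = 167.0   # seconds of input audio
elapsed = 156.0          # seconds spent transcribing (hypothetical)
rtf = audio_duration / elapsed
print(f"RTF {rtf:.2f}x")  # -> RTF 1.07x, i.e. near real-time
```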

Closes: Apple Silicon support request
Tested on: macOS with M2, Python 3.13

The second commit replaces the synthetic RTF numbers above with actual
real-time streaming test results that better reflect real-world usage
scenarios.

Test Setup:
- Real-time audio streaming (1.0x speed, simulating microphone; driver sketched after this list)
- Multiple test scenarios (short audio, multi-sentence with pauses)
- Tested both tiny and medium models
- Apple M2, macOS, Python 3.13
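
A sketch of how such a streaming test can be driven, assuming RealtimeSTT's existing `use_microphone=False` / `feed_audio()` path plus this PR's `backend` parameter; the chunk size and file name are placeholders:

```python
import time
import wave

from RealtimeSTT import AudioToTextRecorder

# Feed a 16 kHz mono WAV at 1.0x speed to simulate a live microphone.
recorder = AudioToTextRecorder(use_microphone=False, model="tiny",
                               backend="mlx-whisper")  # backend per this PR

CHUNK = 1024  # frames per push (placeholder)
with wave.open("test_multi_sentence.wav", "rb") as wf:  # placeholder file
    rate = wf.getframerate()
    while True:
        data = wf.readframes(CHUNK)
        if not data:
            break
        recorder.feed_audio(data)
        time.sleep(CHUNK / rate)  # pace chunks at real-time speed

print(recorder.text())
```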

Key Findings:
- Tiny model: MLX 0.1-0.2s faster per transcription
- Medium model: MLX 4.2s faster for first transcription
- MLX captures 4/5 sentences vs CPU's 3/5 with medium model
- Heavier models show a larger performance gap (MLX's GPU advantage grows with model size)
- MLX maintains transcription quality under load

Updated Documentation:
- MLX_SUPPORT.md: Replaced benchmark section with streaming test results
- Added multi-sentence test data showing sentence detection performance
- Highlighted quality advantage (complete vs incomplete transcriptions)
