Add MLX-Whisper backend support for Apple Silicon #294
Open
fffilimonov wants to merge 2 commits into KoljaB:master
Implements native Apple Silicon (M1/M2/M3/M4) acceleration through MLX-Whisper backend, enabling efficient speech-to-text without CUDA dependencies.

Key Features:
- New 'backend' parameter supporting "faster-whisper" (default) and "mlx-whisper"
- Automatic model path translation (tiny -> mlx-community/whisper-tiny)
- Compatible transcription format (drop-in replacement for faster-whisper)
- Multiprocessing-safe implementation with pickle-compatible classes
- Near real-time performance (RTF ~1.07x with tiny model on M2)

Implementation Details:
- Added MLX import guard with graceful fallback
- Created MLXTranscriptionInfo and MLXTranscriptionSegment compatibility classes
- Modified TranscriptionWorker to support both backends
- Updated requirements.txt with conditional MLX dependency for macOS
- Added comprehensive documentation (MLX_SUPPORT.md, MLX_README_ADDITION.md)

Performance (Apple M2):
- Short audio (6.6s): RTF 1.80x
- Long audio (167s): RTF 1.07x (near real-time)
- Initialization: ~3s (includes model download/cache)

Closes: Apple Silicon support request
Tested on: macOS with M2, Python 3.13
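The import guard, model path translation, and compatibility classes described above could look roughly like the sketch below. It is not the code from this PR: the helper names `resolve_backend` and `to_mlx_model_path`, the fallback to "faster-whisper" when MLX is missing, and the fields on the two classes are all assumptions made for illustration. Only the `tiny -> mlx-community/whisper-tiny` mapping is taken from the description; other model sizes may not follow the same pattern.

```python
import importlib.util
import pickle
from dataclasses import dataclass

# Import guard: check whether mlx_whisper is installed without importing it.
MLX_AVAILABLE = importlib.util.find_spec("mlx_whisper") is not None

SUPPORTED_BACKENDS = ("faster-whisper", "mlx-whisper")


def resolve_backend(backend: str = "faster-whisper") -> str:
    """Return the backend to use (hypothetical helper).

    Assumption: "graceful fallback" means falling back to faster-whisper
    when mlx-whisper is requested but not installed.
    """
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if backend == "mlx-whisper" and not MLX_AVAILABLE:
        return "faster-whisper"
    return backend


def to_mlx_model_path(model: str) -> str:
    """Translate a short model name to an MLX repo id (hypothetical helper).

    Names that already contain "/" are assumed to be repo ids or paths
    and are returned unchanged.
    """
    if "/" in model:
        return model
    return f"mlx-community/whisper-{model}"


# Defined at module level so that pickle can locate them by name, which is
# what lets instances cross a multiprocessing boundary. Field names mirror
# common faster-whisper attributes and are an assumption here.
@dataclass
class MLXTranscriptionSegment:
    text: str
    start: float
    end: float


@dataclass
class MLXTranscriptionInfo:
    language: str
    language_probability: float


def survives_pickle(obj) -> bool:
    """Round-trip an object through pickle and compare it with the original."""
    return pickle.loads(pickle.dumps(obj)) == obj
```

Plain dataclasses are used here because they carry no handles to models or devices, so they serialize cleanly between a worker process and the main process.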
Replace synthetic RTF numbers with actual real-time streaming test results that better reflect real-world usage scenarios.

Test Setup:
- Real-time audio streaming (1.0x speed, simulating microphone)
- Multiple test scenarios (short audio, multi-sentence with pauses)
- Tested both tiny and medium models
- Apple M2, macOS, Python 3.13

Key Findings:
- Tiny model: MLX 0.1-0.2s faster per transcription
- Medium model: MLX 4.2s faster for first transcription
- MLX captures 4/5 sentences vs CPU's 3/5 with medium model
- Heavier models show larger performance gap (GPU advantage)
- MLX maintains transcription quality under load

Updated Documentation:
- MLX_SUPPORT.md: Replaced benchmark section with streaming test results
- Added multi-sentence test data showing sentence detection performance
- Highlighted quality advantage (complete vs incomplete transcriptions)
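For readers who want to reproduce a test of this kind, one way to simulate a microphone is to hand out pre-recorded audio in small chunks and wait one chunk's duration between them. The sketch below is not the test harness used for the numbers above. The function name `stream_realtime`, the 16 kHz 16-bit mono format, and the 100 ms chunk size are all assumptions.

```python
import time
from typing import Callable, Iterator


def stream_realtime(
    pcm: bytes,
    sample_rate: int = 16000,      # assumed: 16 kHz mono input
    bytes_per_sample: int = 2,     # assumed: 16-bit PCM
    chunk_ms: int = 100,           # assumed chunk size
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield PCM chunks at 1.0x speed, like a live microphone would.

    After each chunk the generator sleeps for exactly the amount of audio
    that chunk contains, so total wall time is close to the audio duration.
    The `sleep` parameter is injectable so the pacing can be tested
    without actually waiting.
    """
    bytes_per_second = sample_rate * bytes_per_sample
    chunk_bytes = bytes_per_second * chunk_ms // 1000
    for offset in range(0, len(pcm), chunk_bytes):
        chunk = pcm[offset:offset + chunk_bytes]
        yield chunk
        sleep(len(chunk) / bytes_per_second)
```

Each yielded chunk would then be passed to whichever recorder or backend is under test, with latency measured from the end of speech to the arrival of the transcription.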
As mentioned in #48