Troubleshooting
Error: `RuntimeError: ffmpeg was not found` or `FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'`

Fix: Install FFmpeg:

```bash
# macOS (Homebrew)
brew install ffmpeg

# Verify
ffmpeg -version
```

FFmpeg is required to decode audio files into raw PCM for processing.
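If you script around Vayu, you can fail fast on a missing binary instead of hitting the decode error mid-run. A minimal sketch; the `require_binary` helper is ours, not part of Vayu:

```python
import shutil


def require_binary(name: str = "ffmpeg") -> str:
    """Return the absolute path of `name` on PATH, or raise a clear error early."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name} was not found; install it first (e.g. `brew install {name}`)"
        )
    return path
```

Calling this at startup turns a confusing mid-run failure into an immediate, actionable error.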
Error: `MemoryError`, or the process is killed by macOS

Fix: Reduce the batch size or use a smaller/quantized model:

```python
# Lower batch size
whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=4)

# Use quantization
whisper = LightningWhisperMLX(model="large-v3", quant="4bit", batch_size=4)

# Use a smaller model
whisper = LightningWhisperMLX(model="small", batch_size=16)
```

From the CLI:

```bash
vayu audio.mp3 --batch-size 4
```

Memory usage scales with model size × batch size. See Models for recommended batch sizes.
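To reason about that scaling before a run, a back-of-the-envelope estimate helps. The per-model sizes and per-chunk cost below are illustrative assumptions, not measured Vayu figures:

```python
# Illustrative fp16 weight sizes in GB; real numbers depend on the checkpoint
MODEL_GB = {"small": 0.5, "distil-large-v3": 1.5, "large-v3": 3.1}


def estimated_peak_gb(model: str, batch_size: int,
                      per_item_gb: float = 0.3, quant_factor: float = 1.0) -> float:
    """Rough peak memory: weights (scaled by quantization) + per-chunk activations."""
    return MODEL_GB[model] * quant_factor + batch_size * per_item_gb
```

4-bit quantization roughly corresponds to `quant_factor=0.25`, so quantizing and halving the batch size both shrink the estimate.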
Error: `OSError: We couldn't connect to 'https://huggingface.co'`, or the download hangs

Fixes:
- Check your internet connection
- Retry — HuggingFace Hub has occasional downtime
- If behind a proxy, set environment variables:

  ```bash
  export HTTP_PROXY=http://proxy:port
  export HTTPS_PROXY=http://proxy:port
  ```

- Pre-download models:

  ```python
  from huggingface_hub import snapshot_download

  snapshot_download("mlx-community/distil-whisper-large-v3")
  ```

Models are cached after the first download at ~/.cache/huggingface/hub/.
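The cache layout follows the Hub's `models--{org}--{name}` folder convention, so you can check whether a model is already on disk before going offline. A small sketch; the helper name is ours:

```python
from pathlib import Path


def cached_model_dir(repo_id: str, cache_root: str = "~/.cache/huggingface/hub") -> Path:
    """Map a repo id to its expected folder under the HF hub cache."""
    folder = "models--" + repo_id.replace("/", "--")
    return Path(cache_root).expanduser() / folder
```

`cached_model_dir("mlx-community/distil-whisper-large-v3").exists()` then tells you whether a fresh download will be needed.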
Error: `ModuleNotFoundError: No module named 'mlx'`

Cause: MLX only works on Apple Silicon Macs. It cannot be installed on Intel Macs, Linux, or Windows.

Fix: Verify you're on Apple Silicon:

```bash
uname -m
# Should output: arm64
```

Then install:

```bash
# Using uv
uv pip install "mlx>=0.11"

# Using pip
pip install "mlx>=0.11"
```

Error: `RuntimeError: Error running ffmpeg` with codec errors
Fix: FFmpeg handles most formats (mp3, wav, flac, ogg, m4a, webm, etc.). If you hit an issue:
- Verify the file isn't corrupted:

  ```bash
  ffprobe audio.mp3
  ```

- Convert manually:

  ```bash
  ffmpeg -i input.format -ar 16000 -ac 1 output.wav
  ```

- Pass the converted WAV to Vayu
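If you convert files often, building the command programmatically keeps the resample settings consistent. A sketch; the helper name is ours, and the flags mirror the manual command above:

```python
def ffmpeg_convert_cmd(src: str, dst: str = "output.wav",
                       rate: int = 16000, channels: int = 1) -> list[str]:
    """Build the manual-conversion command: resample to 16 kHz mono WAV."""
    return ["ffmpeg", "-i", src, "-ar", str(rate), "-ac", str(channels), dst]
```

Run it with `subprocess.run(ffmpeg_convert_cmd("broken.m4a"), check=True)` and pass the resulting WAV to Vayu.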
Symptom: Output contains repeated phrases or nonsensical text

Fixes:
- Lower the compression ratio threshold:

  ```bash
  vayu audio.mp3 --compression-ratio-threshold 2.0
  ```

- Use a better model (larger models hallucinate less):

  ```bash
  vayu audio.mp3 --model large-v3
  ```

- Disable conditioning on previous text (prevents error propagation):

  ```bash
  vayu audio.mp3 --condition-on-previous-text False
  ```

- Set a hallucination silence threshold:

  ```bash
  vayu audio.mp3 --hallucination-silence-threshold 2.0
  ```
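The compression-ratio check works because repeated phrases compress far better than natural speech. This is the standard Whisper-style heuristic; a minimal sketch:

```python
import zlib


def compression_ratio(text: str) -> float:
    """Higher ratio = more repetitive text; values above ~2.4 suggest hallucination."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))
```

Whisper's conventional default threshold is 2.4; lowering it to 2.0 rejects repetitive segments sooner.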
Symptom: Transcription is in the wrong language or garbled

Fix: Specify the language explicitly:

```python
result = whisper.transcribe("audio.mp3", language="en")
```

```bash
vayu audio.mp3 --language en
```

Auto-detection uses the first 30 seconds of audio. If those are silent or ambiguous, detection may fail.
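The 30-second limitation is easy to model: detection only ever sees the leading window of samples. A toy sketch, assuming Vayu's 16 kHz mono pipeline:

```python
SAMPLE_RATE = 16000  # Vayu resamples everything to 16 kHz mono
DETECT_SECONDS = 30


def detection_window(samples):
    """Slice the leading 30 s that language auto-detection actually sees."""
    return samples[: DETECT_SECONDS * SAMPLE_RATE]
```

If that opening window is mostly silence or music, pass `--language` explicitly instead of relying on detection.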
Symptom: Transcription takes much longer than expected

Checklist:
- Set batch_size > 1 — the default is 1 (no batching):

  ```bash
  vayu audio.mp3 --batch-size 12
  ```

- Use FP16 — enabled by default, but verify:

  ```bash
  vayu audio.mp3 --fp16 True
  ```

- Use a faster model — `turbo` or `distil-large-v3` instead of `large-v3`
- Disable word timestamps if not needed — DTW alignment adds overhead
- Close other memory-intensive apps — MLX shares unified memory with the system
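Batching helps because chunks are decoded together rather than one at a time. A sketch of the grouping; the helper name is ours:

```python
def batches(chunks: list, batch_size: int) -> list[list]:
    """Group 30-second chunks into batches for parallel decoding."""
    return [chunks[i: i + batch_size] for i in range(0, len(chunks), batch_size)]
```

With `batch_size=1` (the default) every chunk is its own batch, which is why raising it is usually the single biggest speedup.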
Symptom: Word timings don't align well with the audio
Tips:
- Word timestamps work best with clear speech and low background noise
- Accuracy varies by language — English has the best alignment
- Languages written without spaces between words (Chinese, Japanese, Thai) use character-level splitting, which may differ from expected word boundaries
- Try a larger model for better cross-attention patterns
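The character-level fallback mentioned above can be illustrated with a toy splitter (not Vayu's actual tokenizer):

```python
def split_words(text: str, space_delimited: bool = True) -> list[str]:
    """Space-delimited languages split on whitespace; others fall back to characters."""
    return text.split() if space_delimited else [ch for ch in text if not ch.isspace()]
```

For character-split languages, each "word" timestamp therefore covers a single character rather than a dictionary word.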
Does Vayu run on Intel Macs?

No. Vayu requires Apple Silicon (M1/M2/M3/M4) because it depends on the MLX framework, which only runs on Apple silicon's GPU and unified memory.
Does Vayu run on Linux or Windows?

No. MLX is macOS-only. For other platforms, consider faster-whisper or whisper.cpp.
What audio formats are supported?

Any format that FFmpeg can decode: MP3, WAV, FLAC, OGG, M4A, AAC, WMA, WebM, and many more. Audio is always resampled to 16kHz mono internally.
Can Vayu transcribe in real time?

Vayu processes audio in 30-second chunks, so it can handle long files. For real-time streaming, pipe audio to stdin:

```bash
# From a microphone (requires sox)
rec -q -r 16000 -c 1 -t wav - | vayu - --output-name live
```

Can I transcribe YouTube videos?

Use yt-dlp to extract audio first:
```bash
yt-dlp -x --audio-format mp3 "https://youtube.com/watch?v=..." -o audio.mp3
vayu audio.mp3 --batch-size 12 --output-format srt
```

Is there a maximum audio length?

There is no hard limit. Vayu processes audio in 30-second chunks, so it can handle files of any length. Memory usage depends on the model and batch size, not audio duration.
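Because processing is chunked, runtime and chunk count grow linearly with duration while memory stays flat. A quick sketch of the chunk arithmetic:

```python
import math

CHUNK_SECONDS = 30


def chunk_count(duration_seconds: float) -> int:
    """Number of 30-second chunks needed for a given audio duration."""
    return math.ceil(duration_seconds / CHUNK_SECONDS)
```

A two-hour file, for example, is processed as 240 chunks, batch by batch.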
How accurate is the transcription?

Accuracy depends on the model, audio quality, and language. `large-v3` achieves the best accuracy across all languages. `distil-large-v3` is nearly as accurate but 2-3x faster. For English, even `small` performs well on clear audio.