
Troubleshooting

Behnam Ebrahimi edited this page Mar 29, 2026 · 2 revisions

Troubleshooting & FAQ

Common Issues

FFmpeg Not Found

Error: RuntimeError: ffmpeg was not found, or FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'

Fix: Install FFmpeg:

# macOS (Homebrew)
brew install ffmpeg

# Verify
ffmpeg -version

FFmpeg is required to decode audio files into raw PCM for processing.
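The decode step can be sketched in Python. This follows the approach of upstream OpenAI Whisper, which asks FFmpeg for 16 kHz mono signed 16-bit little-endian PCM on stdout and scales it to floats. It is an assumption that Vayu does exactly the same, and the helper names (ffmpeg_decode_cmd, pcm16le_to_float, load_audio) are hypothetical.

```python
import shutil
import subprocess
import sys
from array import array

SAMPLE_RATE = 16_000  # Whisper-style models expect 16 kHz mono audio


def ffmpeg_decode_cmd(path: str, sr: int = SAMPLE_RATE) -> list[str]:
    """Build an FFmpeg command that writes raw s16le mono PCM to stdout."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le", "-ar", str(sr),
        "-",  # write to stdout instead of a file
    ]


def pcm16le_to_float(raw: bytes) -> list[float]:
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":  # array uses native byte order; the PCM is little-endian
        samples.byteswap()
    return [s / 32768.0 for s in samples]


def load_audio(path: str) -> list[float]:
    """Decode any FFmpeg-readable file into 16 kHz mono float samples."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    out = subprocess.run(ffmpeg_decode_cmd(path), capture_output=True, check=True).stdout
    return pcm16le_to_float(out)
```

Calling load_audio("audio.mp3") from the same Python environment you run Vayu in is also a quick way to confirm that FFmpeg is visible on that environment's PATH, which can differ from your shell's.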


Out of Memory

Error: MemoryError or the process is killed by macOS

Fix: Reduce batch size or use a smaller/quantized model:

# Lower batch size
whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=4)

# Use quantization
whisper = LightningWhisperMLX(model="large-v3", quant="4bit", batch_size=4)

# Use a smaller model
whisper = LightningWhisperMLX(model="small", batch_size=16)

From the CLI:

vayu audio.mp3 --batch-size 4

Memory usage grows with both model size and batch size. See Models for recommended batch sizes.
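A back-of-envelope estimate of the model-size term: weight memory is roughly parameter count times bits per weight. The parameter counts below are approximate figures from the Whisper and Distil-Whisper model cards, and the helper is hypothetical. Real usage is higher, because activations and the decoder's cache come on top and grow with batch size.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, the decoder's key/value cache, and quantization
    metadata (scales/biases), so real usage is higher and grows with batch size.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Approximate parameter counts from the Whisper / Distil-Whisper model cards.
MODELS = {"small": 244e6, "distil-large-v3": 756e6, "large-v3": 1550e6}

for name, params in MODELS.items():
    fp16 = weight_memory_gb(params, 16)
    q4 = weight_memory_gb(params, 4)
    print(f"{name:>16}: ~{fp16:.2f} GB at fp16, ~{q4:.2f} GB at 4-bit")
```

This is why 4-bit quantization helps: it cuts the weight term to about a quarter of its fp16 size, leaving more headroom for larger batches.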


Model Download Failures

Error: OSError: We couldn't connect to 'https://huggingface.co' or download hangs

Fixes:

  1. Check your internet connection
  2. Retry, since the Hugging Face Hub has occasional outages
  3. If behind a proxy, set environment variables:
    export HTTP_PROXY=http://proxy:port
    export HTTPS_PROXY=http://proxy:port
  4. Pre-download models:
    from huggingface_hub import snapshot_download
    snapshot_download("mlx-community/distil-whisper-large-v3")

After the first download, models are cached in ~/.cache/huggingface/hub/.
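To check whether a model is already cached before going offline, you can compute where the Hub stores it. This is a simplified sketch of the huggingface_hub cache layout (one models--<org>--<name> folder per repository, with downloaded files under snapshots/). It honors HF_HUB_CACHE and HF_HOME but not every override the library supports. The helpers are hypothetical; huggingface_hub's own scan_cache_dir() gives a fuller report.

```python
import os
from pathlib import Path


def hf_hub_cache_dir() -> Path:
    """Resolve the Hugging Face Hub cache directory (simplified).

    Checks HF_HUB_CACHE, then HF_HOME/hub, then the default
    ~/.cache/huggingface/hub. XDG_CACHE_HOME and legacy variables are ignored.
    """
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]).expanduser() / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def cached_model_dir(repo_id: str) -> Path:
    """Map 'org/name' to the Hub's on-disk folder 'models--org--name'."""
    return hf_hub_cache_dir() / ("models--" + repo_id.replace("/", "--"))


def is_cached(repo_id: str) -> bool:
    """True if at least one snapshot of the repo has been downloaded."""
    snapshots = cached_model_dir(repo_id) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


if __name__ == "__main__":
    repo = "mlx-community/distil-whisper-large-v3"
    print(cached_model_dir(repo), "cached" if is_cached(repo) else "not cached")
```

Once a model is cached, setting HF_HUB_OFFLINE=1 tells huggingface_hub not to make network calls.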


ImportError: mlx

Error: ModuleNotFoundError: No module named 'mlx'

Cause: Vayu depends on MLX running on Apple Silicon. The mlx package cannot be installed on Intel Macs, and Vayu does not support Linux or Windows.

Fix: Verify you're on Apple Silicon:

uname -m
# Should output: arm64
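uname -m reports the architecture of the shell you run it in. If your Python is an Intel (x86_64) build running under Rosetta 2, MLX will not install into it even on an M-series Mac. A minimal check to run from the Python you will use for Vayu (the function name is hypothetical):

```python
import platform
import sys


def is_native_apple_silicon_python() -> bool:
    """True when this interpreter runs natively on an Apple Silicon Mac.

    An x86_64 Python running under Rosetta 2 reports 'x86_64' here even on an
    M-series Mac, and MLX wheels will not install into it.
    """
    return sys.platform == "darwin" and platform.machine() == "arm64"


if __name__ == "__main__":
    if is_native_apple_silicon_python():
        print("OK: native arm64 Python on macOS")
    else:
        print(f"Unsupported interpreter: {sys.platform} / {platform.machine()}")
```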

Then install:

# Using uv
uv pip install "mlx>=0.11"

# Using pip
pip install "mlx>=0.11"

Audio Format Not Supported

Error: a RuntimeError raised while running FFmpeg, with codec errors in the message

Fix: FFmpeg handles most formats (mp3, wav, flac, ogg, m4a, webm, etc.). If you hit an issue:

  1. Verify the file isn't corrupted: ffprobe audio.mp3
  2. Convert manually: ffmpeg -i input.format -ar 16000 -ac 1 output.wav
  3. Pass the converted WAV to Vayu

Hallucinated / Repeated Text

Symptom: Output contains repeated phrases or nonsensical text

Fixes:

  1. Lower the compression ratio threshold:
    vayu audio.mp3 --compression-ratio-threshold 2.0
  2. Use a better model (larger models hallucinate less):
    vayu audio.mp3 --model large-v3
  3. Disable conditioning on previous text (reduces error propagation from one segment to the next):
    vayu audio.mp3 --condition-on-previous-text False
  4. Set a hallucination silence threshold (in upstream Whisper this only takes effect when word timestamps are enabled):
    vayu audio.mp3 --hallucination-silence-threshold 2.0
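Why lowering the threshold helps: in upstream OpenAI Whisper, the compression ratio of a decoded segment is its UTF-8 size divided by its zlib-compressed size, and a segment whose ratio exceeds the threshold (default 2.4) is treated as a failed decode and retried at a higher temperature. Looping text compresses very well, so it scores high. A minimal sketch, assuming Vayu keeps this upstream logic:

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


THRESHOLD = 2.4  # upstream Whisper's default compression_ratio_threshold

normal = "The quick brown fox jumps over the lazy dog."
looping = "Thank you for watching. " * 20

for label, text in [("normal", normal), ("looping", looping)]:
    ratio = compression_ratio(text)
    verdict = "flagged" if ratio > THRESHOLD else "accepted"
    print(f"{label}: ratio={ratio:.2f} -> {verdict}")
```

A lower threshold such as 2.0 flags milder repetition as well, at the cost of more retries.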

Wrong Language Detected

Symptom: Transcription is in the wrong language or garbled

Fix: Specify the language explicitly:

# Python
result = whisper.transcribe("audio.mp3", language="en")

# CLI
vayu audio.mp3 --language en

Auto-detection uses the first 30 seconds of audio. If those are silent or ambiguous, detection may fail.
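To test the "silent start" explanation, you can measure the level of the first 30 seconds. This is a rough heuristic sketch: it takes 16 kHz mono float samples in [-1, 1], and the helper names and the 0.01 RMS threshold are illustrative assumptions, not values taken from Vayu.

```python
import math

SAMPLE_RATE = 16_000
DETECT_SECONDS = 30  # language detection looks at the first 30 s


def first_window_rms(samples: list[float], sr: int = SAMPLE_RATE) -> float:
    """Root-mean-square level of the first DETECT_SECONDS of audio."""
    window = samples[: sr * DETECT_SECONDS]
    if not window:
        return 0.0
    return math.sqrt(sum(s * s for s in window) / len(window))


def likely_silent_start(samples: list[float], threshold: float = 0.01) -> bool:
    """Heuristic: an RMS below `threshold` suggests near-silence (threshold is a guess)."""
    return first_window_rms(samples) < threshold
```

If the start is near-silent, pass the language explicitly rather than relying on detection.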


Slow Transcription

Symptom: Transcription takes much longer than expected

Checklist:

  1. Set batch_size above 1 (the default is 1, i.e. no batching):
    vayu audio.mp3 --batch-size 12
  2. Use FP16 (enabled by default, but verify):
    vayu audio.mp3 --fp16 True
  3. Use a faster model: turbo, or distil-large-v3 for English audio, instead of large-v3
  4. Disable word timestamps if you don't need them; DTW alignment adds overhead
  5. Close other memory-intensive apps, since MLX shares unified memory with the rest of the system
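Batching matters because each 30-second window is one decoding job. Below is a simplified count of decoder runs, assuming fixed 30-second chunks grouped batch_size at a time. The real pipeline may seek by timestamps, and each batched run costs more than a single one, so wall-clock gains are smaller than the ratio suggests. decode_passes is a hypothetical helper.

```python
import math

CHUNK_SECONDS = 30  # audio is processed in fixed 30-second windows


def decode_passes(duration_s: float, batch_size: int) -> int:
    """Number of batched decoder runs needed to cover `duration_s` of audio."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    chunks = math.ceil(duration_s / CHUNK_SECONDS)
    return math.ceil(chunks / batch_size)


# A 10-minute file is 20 chunks of 30 s.
for bs in (1, 4, 12):
    print(f"batch_size={bs:>2}: {decode_passes(600, bs)} passes")
```

For the 10-minute file this gives 20 runs at batch_size=1 versus 2 at batch_size=12.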

Word Timestamps Are Inaccurate

Symptom: Word timings don't align well with the audio

Tips:

  • Word timestamps work best with clear speech and low background noise
  • Accuracy varies by language — English has the best alignment
  • Languages written without spaces between words (such as Chinese, Japanese, and Thai) use character-level splitting, which may differ from expected word boundaries
  • Try a larger model for better cross-attention patterns
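Why word boundaries look different in those languages: upstream OpenAI Whisper splits Chinese, Japanese, Thai, Lao, Burmese, and Cantonese (zh, ja, th, lo, my, yue) at unicode-unit boundaries instead of spaces, so one "word" timestamp may cover a single character. The sketch below approximates that per character. Treating Vayu as identical to upstream is an assumption, and split_words is a hypothetical helper.

```python
# Languages that upstream Whisper splits per unicode unit rather than on spaces
# (assumed to match Vayu's behavior).
NO_SPACE_LANGS = {"zh", "ja", "th", "lo", "my", "yue"}


def split_words(text: str, language: str) -> list[str]:
    """Simplified word splitting that illustrates timestamp granularity."""
    if language in NO_SPACE_LANGS:
        # No spaces between words: every non-space character becomes its own unit.
        return [ch for ch in text if not ch.isspace()]
    return text.split()
```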

FAQ

Does Vayu work on Intel Macs?

No. Vayu requires Apple Silicon (M1/M2/M3/M4) because it depends on the MLX framework, which targets the CPU and Metal GPU of Apple Silicon chips (not the Neural Engine).

Can I use Vayu on Linux or Windows?

No. Vayu depends on MLX running on Apple Silicon, so it supports macOS only. For other platforms, consider faster-whisper or whisper.cpp.

What audio formats are supported?

Any format that FFmpeg can decode: MP3, WAV, FLAC, OGG, M4A, AAC, WMA, WebM, and many more. Audio is always resampled to 16 kHz mono internally.

Can I transcribe streaming audio?

Vayu processes audio in 30-second chunks, so it can handle long files. For real-time streaming, pipe audio to stdin:

# From a microphone (requires sox)
rec -q -r 16000 -c 1 -t wav - | vayu - --output-name live

How do I transcribe a YouTube video?

Use yt-dlp to extract audio first:

yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" "https://youtube.com/watch?v=..."
vayu audio.mp3 --batch-size 12 --output-format srt

What's the maximum audio length?

There is no hard limit. Vayu processes audio in 30-second chunks, so it can handle files of any length. Memory usage depends mainly on the model and batch size rather than on audio duration.
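Duration still has a small effect if the whole file is decoded up front. Upstream Whisper stores the decoded waveform as 16 kHz float32 samples, which is assumed here for Vayu, so even long files are modest next to the model weights. A back-of-envelope helper (hypothetical):

```python
SAMPLE_RATE = 16_000   # samples per second after resampling
BYTES_PER_SAMPLE = 4   # float32, as in upstream Whisper's decoded audio


def decoded_audio_mb(duration_s: float) -> float:
    """Approximate size of the decoded mono waveform in MB (1 MB = 1e6 bytes)."""
    return duration_s * SAMPLE_RATE * BYTES_PER_SAMPLE / 1e6


print(f"1 hour: ~{decoded_audio_mb(3600):.0f} MB")
```

An hour of audio comes to roughly 230 MB under these assumptions.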

How accurate is the transcription?

Accuracy depends on the model, audio quality, and language. large-v3 is generally the most accurate across languages. distil-large-v3 is nearly as accurate on English and 2-3x faster, but it is an English-only model. For English, even small performs well on clear audio.
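To check accuracy on your own recordings, compare Vayu's output against a hand-made reference transcript using word error rate (WER), the standard speech-recognition metric. A minimal sketch with no text normalization (casing and punctuation count as errors; libraries such as jiwer handle normalization); word_error_rate is a hypothetical helper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion (reference word missing)
                curr[j - 1] + 1,         # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)
```

Comparing a few minutes of representative audio across models this way is a practical way to decide whether a faster model is accurate enough for your material.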
