Troubleshooting
Error: `RuntimeError: ffmpeg was not found` or `FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'`

Fix: Install FFmpeg:

```bash
# macOS (Homebrew)
brew install ffmpeg

# Verify
ffmpeg -version
```

FFmpeg is required to decode audio files into raw PCM for processing.
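If you script around Vayu, you can fail fast on a missing binary instead of hitting the decode error mid-run. A minimal sketch; the `require_binary` helper is ours, not part of Vayu:

```python
import shutil


def require_binary(name: str = "ffmpeg") -> str:
    """Return the absolute path of `name` on PATH, or raise a clear error early."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name} was not found; install it first (e.g. `brew install {name}`)"
        )
    return path
```

Calling this at startup turns a confusing mid-run failure into an immediate, actionable error.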
Error: `MemoryError`, or the process is killed by macOS

Fix: Reduce the batch size or use a smaller/quantized model:

```python
# Lower batch size
whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=4)

# Use quantization
whisper = LightningWhisperMLX(model="large-v3", quant="4bit", batch_size=4)

# Use a smaller model
whisper = LightningWhisperMLX(model="small", batch_size=16)
```

From the CLI:

```bash
vayu audio.mp3 --batch-size 4
```

Memory usage scales with model size × batch size. See Models for recommended batch sizes.
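To reason about that scaling before a run, a back-of-the-envelope estimate helps. The per-model sizes and per-chunk cost below are illustrative assumptions, not measured Vayu figures:

```python
# Illustrative fp16 weight sizes in GB; real numbers depend on the checkpoint
MODEL_GB = {"small": 0.5, "distil-large-v3": 1.5, "large-v3": 3.1}


def estimated_peak_gb(model: str, batch_size: int,
                      per_item_gb: float = 0.3, quant_factor: float = 1.0) -> float:
    """Rough peak memory: weights (scaled by quantization) + per-chunk activations."""
    return MODEL_GB[model] * quant_factor + batch_size * per_item_gb
```

4-bit quantization roughly corresponds to `quant_factor=0.25`, so quantizing and halving the batch size both shrink the estimate.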
Error: `OSError: We couldn't connect to 'https://huggingface.co'`, or the download hangs

Fixes:
- Check your internet connection
- Retry — HuggingFace Hub has occasional downtime
- If behind a proxy, set environment variables:

  ```bash
  export HTTP_PROXY=http://proxy:port
  export HTTPS_PROXY=http://proxy:port
  ```

- Pre-download models:

  ```python
  from huggingface_hub import snapshot_download

  snapshot_download("mlx-community/distil-whisper-large-v3")
  ```

Models are cached after the first download at ~/.cache/huggingface/hub/.
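The cache layout follows the Hub's `models--{org}--{name}` folder convention, so you can check whether a model is already on disk before going offline. A small sketch; the helper name is ours:

```python
from pathlib import Path


def cached_model_dir(repo_id: str, cache_root: str = "~/.cache/huggingface/hub") -> Path:
    """Map a repo id to its expected folder under the HF hub cache."""
    folder = "models--" + repo_id.replace("/", "--")
    return Path(cache_root).expanduser() / folder
```

`cached_model_dir("mlx-community/distil-whisper-large-v3").exists()` then tells you whether a fresh download will be needed.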
Error: `ModuleNotFoundError: No module named 'mlx'`

Cause: MLX only works on Apple Silicon Macs. It cannot be installed on Intel Macs, Linux, or Windows.

Fix: Verify you're on Apple Silicon:

```bash
uname -m
# Should output: arm64
```

Then install:

```bash
# Using uv
uv pip install "mlx>=0.11"

# Using pip
pip install "mlx>=0.11"
```

Error: `RuntimeError: Error running ffmpeg` with codec errors
Fix: FFmpeg handles most formats (mp3, wav, flac, ogg, m4a, webm, etc.). If you hit an issue:
- Verify the file isn't corrupted:

  ```bash
  ffprobe audio.mp3
  ```

- Convert manually:

  ```bash
  ffmpeg -i input.format -ar 16000 -ac 1 output.wav
  ```

- Pass the converted WAV to Vayu
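If you convert files often, building the command programmatically keeps the resample settings consistent. A sketch; the helper name is ours, and the flags mirror the manual command above:

```python
def ffmpeg_convert_cmd(src: str, dst: str = "output.wav",
                       rate: int = 16000, channels: int = 1) -> list[str]:
    """Build the manual-conversion command: resample to 16 kHz mono WAV."""
    return ["ffmpeg", "-i", src, "-ar", str(rate), "-ac", str(channels), dst]
```

Run it with `subprocess.run(ffmpeg_convert_cmd("broken.m4a"), check=True)` and pass the resulting WAV to Vayu.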
Symptom: Output contains repeated phrases or nonsensical text

Fixes:
- Lower the compression ratio threshold:

  ```bash
  vayu audio.mp3 --compression-ratio-threshold 2.0
  ```

- Use a better model (larger models hallucinate less):

  ```bash
  vayu audio.mp3 --model large-v3
  ```

- Disable conditioning on previous text (prevents error propagation):

  ```bash
  vayu audio.mp3 --condition-on-previous-text False
  ```

- Set a hallucination silence threshold:

  ```bash
  vayu audio.mp3 --hallucination-silence-threshold 2.0
  ```
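The compression-ratio check works because repeated phrases compress far better than natural speech. This is the standard Whisper-style heuristic; a minimal sketch:

```python
import zlib


def compression_ratio(text: str) -> float:
    """Higher ratio = more repetitive text; values above ~2.4 suggest hallucination."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))
```

Whisper's conventional default threshold is 2.4; lowering it to 2.0 rejects repetitive segments sooner.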
Symptom: Transcription is in the wrong language or garbled

Fix: Specify the language explicitly:

```python
result = whisper.transcribe("audio.mp3", language="en")
```

```bash
vayu audio.mp3 --language en
```

Auto-detection uses the first 30 seconds of audio. If those are silent or ambiguous, detection may fail.
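The 30-second limitation is easy to model: detection only ever sees the leading window of samples. A toy sketch, assuming Vayu's 16 kHz mono pipeline:

```python
SAMPLE_RATE = 16000  # Vayu resamples everything to 16 kHz mono
DETECT_SECONDS = 30


def detection_window(samples):
    """Slice the leading 30 s that language auto-detection actually sees."""
    return samples[: DETECT_SECONDS * SAMPLE_RATE]
```

If that opening window is mostly silence or music, pass `--language` explicitly instead of relying on detection.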
Symptom: Transcription takes much longer than expected

Checklist:
- Set batch_size > 1 — the default is 1 (no batching):

  ```bash
  vayu audio.mp3 --batch-size 12
  ```

- Use FP16 — enabled by default, but verify:

  ```bash
  vayu audio.mp3 --fp16 True
  ```

- Use a faster model — `turbo` or `distil-large-v3` instead of `large-v3`
- Disable word timestamps if not needed — DTW alignment adds overhead
- Close other memory-intensive apps — MLX shares unified memory with the system
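Batching helps because chunks are decoded together rather than one at a time. A sketch of the grouping; the helper name is ours:

```python
def batches(chunks: list, batch_size: int) -> list[list]:
    """Group 30-second chunks into batches for parallel decoding."""
    return [chunks[i: i + batch_size] for i in range(0, len(chunks), batch_size)]
```

With `batch_size=1` (the default) every chunk is its own batch, which is why raising it is usually the single biggest speedup.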
Symptom: Word timings don't align well with the audio
Tips:
- Word timestamps work best with clear speech and low background noise
- Accuracy varies by language — English has the best alignment
- Languages written without spaces between words (Chinese, Japanese, Thai) use character-level splitting, which may differ from expected word boundaries
- Try a larger model for better cross-attention patterns
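The character-level fallback mentioned above can be illustrated with a toy splitter (not Vayu's actual tokenizer):

```python
def split_words(text: str, space_delimited: bool = True) -> list[str]:
    """Space-delimited languages split on whitespace; others fall back to characters."""
    return text.split() if space_delimited else [ch for ch in text if not ch.isspace()]
```

For character-split languages, each "word" timestamp therefore covers a single character rather than a dictionary word.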
Does Vayu run on Intel Macs?

No. Vayu requires Apple Silicon (M1/M2/M3/M4) because it depends on the MLX framework, which only runs on Apple silicon's GPU and unified memory.
Does Vayu run on Linux or Windows?

No. MLX is macOS-only. For other platforms, consider faster-whisper or whisper.cpp.
What audio formats are supported?

Any format that FFmpeg can decode: MP3, WAV, FLAC, OGG, M4A, AAC, WMA, WebM, and many more. Audio is always resampled to 16kHz mono internally.
Can Vayu transcribe in real time?

Vayu processes audio in 30-second chunks, so it can handle long files. For real-time streaming, pipe audio to stdin:

```bash
# From a microphone (requires sox)
rec -q -r 16000 -c 1 -t wav - | vayu - --output-name live
```

Can I transcribe YouTube videos?

Use yt-dlp to extract audio first:
```bash
yt-dlp -x --audio-format mp3 "https://youtube.com/watch?v=..." -o audio.mp3
vayu audio.mp3 --batch-size 12 --output-format srt
```

Is there a maximum audio length?

There is no hard limit. Vayu processes audio in 30-second chunks, so it can handle files of any length. Memory usage depends on the model and batch size, not audio duration.
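Because processing is chunked, runtime and chunk count grow linearly with duration while memory stays flat. A quick sketch of the chunk arithmetic:

```python
import math

CHUNK_SECONDS = 30


def chunk_count(duration_seconds: float) -> int:
    """Number of 30-second chunks needed for a given audio duration."""
    return math.ceil(duration_seconds / CHUNK_SECONDS)
```

A two-hour file, for example, is processed as 240 chunks, batch by batch.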
How accurate is the transcription?

Accuracy depends on the model, audio quality, and language. `large-v3` achieves the best accuracy across all languages. `distil-large-v3` is nearly as accurate but 2-3x faster. For English, even `small` performs well on clear audio.