
CLI Reference

Behnam Ebrahimi edited this page Mar 29, 2026 · 1 revision


Vayu provides the vayu command (and whisper-mlx for backward compatibility).

Basic Usage

```bash
vayu <audio_file> [<audio_file> ...] [options]
```

Arguments

Input

| Argument | Description |
|---|---|
| `audio` | One or more audio file paths. Use `-` to read from stdin |

Model Options

| Flag | Default | Description |
|---|---|---|
| `--model` | `mlx-community/whisper-turbo` | Model name or Hugging Face repo ID |
| `--fp16` | `True` | Use float16 for inference |

Output Options

| Flag | Default | Description |
|---|---|---|
| `--output-dir`, `-o` | `.` | Directory for output files |
| `--output-format`, `-f` | `txt` | Output format: `txt`, `vtt`, `srt`, `tsv`, `json`, or `all` |
| `--output-name` | | Custom output filename (without extension) |
| `--verbose` | `True` | Print progress and results |
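This page does not spell out how output filenames are built. The sketch below is a minimal illustration, assuming the output stem is the input file's base name unless `--output-name` is given, and that `all` writes one file per listed format. `output_paths` is a hypothetical helper, not part of Vayu.

```python
from pathlib import Path
from typing import List, Optional

# Formats accepted by --output-format; "all" is assumed to mean every one of them.
FORMATS = ("txt", "vtt", "srt", "tsv", "json")


def output_paths(audio_path: str, output_dir: str = ".", output_format: str = "txt",
                 output_name: Optional[str] = None) -> List[Path]:
    """Hypothetical helper: predict the files written for one input."""
    if output_format != "all" and output_format not in FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    formats = FORMATS if output_format == "all" else (output_format,)
    # Assumption: without --output-name, the input's base name (extension dropped) is reused.
    stem = output_name or Path(audio_path).stem
    return [Path(output_dir) / f"{stem}.{ext}" for ext in formats]
```

Under these assumptions, `output_paths("talks/intro.mp3", "./transcripts", "all")` yields `intro.txt`, `intro.vtt`, `intro.srt`, `intro.tsv`, and `intro.json` inside `transcripts/`. In this model a fixed `--output-name` would give every input the same stem, which is why it pairs most naturally with a single file or stdin.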

Processing Options

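| Flag | Default | Description |
|---|---|---|
| `--batch-size` | `1` | Number of segments per forward pass; set above 1 for batched decoding |
| `--task` | `transcribe` | `transcribe`, or `translate` to translate the speech into English |
| `--language` | `auto` | Language code (e.g., `en`, `fa`); detected automatically if omitted |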
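`--batch-size` trades memory for throughput. The sketch below shows how many forward passes a file would need, assuming the audio is cut into fixed 30-second windows (the input length Whisper models work on) and that each batch is one forward pass. `windows` and `batches` are hypothetical helpers; upstream sequential Whisper decoding instead advances by the last predicted timestamp, so real segment boundaries can differ.

```python
from typing import Iterator, List, Sequence, Tuple

WINDOW_SECONDS = 30.0  # Whisper models take 30-second audio windows as input


def windows(duration: float, window: float = WINDOW_SECONDS) -> List[Tuple[float, float]]:
    """Cut [0, duration) into consecutive windows; the last one may be shorter."""
    spans = []
    start = 0.0
    while start < duration:
        spans.append((start, min(start + window, duration)))
        start += window
    return spans


def batches(items: Sequence, batch_size: int) -> Iterator[list]:
    """Yield groups of at most batch_size items; each group stands for one forward pass."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for i in range(0, len(items), batch_size):
        yield list(items[i:i + batch_size])
```

In this model a 125-second file gives five windows, decoded in three passes with `--batch-size 2` or in one pass with `--batch-size 12`. Windows decoded in the same batch cannot see each other's output, so under a scheme like this, context from `--condition-on-previous-text` could only flow between batches.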

Quality Control

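| Flag | Default | Description |
|---|---|---|
| `--temperature` | `0` | Sampling temperature (`0` means greedy decoding) |
| `--compression-ratio-threshold` | `2.4` | Treat a decoding as failed if its text compression ratio exceeds this value |
| `--logprob-threshold` | `-1.0` | Treat a decoding as failed if its average token log probability falls below this value |
| `--no-speech-threshold` | `0.6` | No-speech probability above which a segment may be treated as silence |
| `--condition-on-previous-text` | `True` | Use the previous segment's text as context for the next one |
| `--initial-prompt` | | Text prompt supplied as context for the first window |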
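These thresholds act together. The sketch below follows the decision rule used by upstream OpenAI Whisper, whose option names Vayu mirrors; that Vayu applies exactly these rules, and retries with Whisper's default temperature schedule (0.0 to 1.0 in steps of 0.2) starting from `--temperature`, are assumptions. `Attempt`, `judge`, and `decode_with_fallback` are hypothetical names.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Attempt:
    """Hypothetical record of one decoding attempt for one audio window."""
    text: str
    avg_logprob: float     # mean log probability of the decoded tokens
    no_speech_prob: float  # model's probability that the window contains no speech


def compression_ratio(text: str) -> float:
    """UTF-8 length divided by zlib-compressed length; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def judge(a: Attempt, compression_ratio_threshold: float = 2.4,
          logprob_threshold: float = -1.0, no_speech_threshold: float = 0.6) -> str:
    """Classify an attempt as 'silence', 'retry', or 'accept' (Whisper-style rules)."""
    low_confidence = a.avg_logprob < logprob_threshold
    # Likely silence: high no-speech probability AND low-confidence text.
    if a.no_speech_prob > no_speech_threshold and low_confidence:
        return "silence"
    # Too repetitive or too unconfident: decode again at a higher temperature.
    if compression_ratio(a.text) > compression_ratio_threshold or low_confidence:
        return "retry"
    return "accept"


def decode_with_fallback(decode: Callable[[float], Attempt],
                         temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                         **thresholds: float) -> Tuple[float, str, Attempt]:
    """Try each temperature in turn until an attempt is not judged 'retry'."""
    for t in temperatures:
        attempt = decode(t)
        verdict = judge(attempt, **thresholds)
        if verdict != "retry":
            break
    # If every temperature fails, keep the last attempt, as upstream Whisper does.
    return t, verdict, attempt
```

The compression ratio catches the looping, repeated output that Whisper models sometimes produce: a phrase repeated hundreds of times compresses extremely well, so its ratio far exceeds 2.4.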

Decoding Options

| Flag | Default | Description |
|---|---|---|
| `--beam-size` | | Beam search width |
| `--patience` | | Beam search patience factor |
| `--best-of` | | Number of candidates for best-of-N selection |
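The sketch below shows what `--patience` and `--best-of` mean in upstream OpenAI Whisper: beam search stops once `round(beam_size * patience)` finished hypotheses are collected (patience defaults to 1.0), and best-of-N keeps the candidate with the highest length-normalized log probability. Upstream, `--beam-size` and `--patience` apply at temperature 0 and `--best-of` applies when sampling at higher temperatures. Whether Vayu's decoder follows these rules is an assumption, and `max_finished_candidates` and `pick_best` are hypothetical helpers.

```python
from typing import Sequence, Tuple


def max_finished_candidates(beam_size: int, patience: float = 1.0) -> int:
    """Beam search stops after this many finished hypotheses (upstream Whisper's rule)."""
    if beam_size < 1 or patience <= 0:
        raise ValueError("beam_size must be >= 1 and patience > 0")
    return round(beam_size * patience)


def pick_best(candidates: Sequence[Tuple[str, float, int]]) -> str:
    """Best-of-N selection; each candidate is (text, sum_logprob, num_tokens).

    Dividing by length keeps short candidates from winning just because
    they accumulated fewer negative log probabilities."""
    text, _, _ = max(candidates, key=lambda c: c[1] / c[2])
    return text
```

For example, a 2-token candidate with total log probability -2.0 scores -1.0 per token and loses to a 6-token candidate at -3.0 total (-0.5 per token), even though its raw sum is higher.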

Timestamp Options

| Flag | Default | Description |
|---|---|---|
| `--word-timestamps` | `False` | Enable word-level timestamps |
| `--highlight-words` | `False` | Underline each word in SRT/VTT output as it is spoken |
| `--max-line-width` | | Maximum characters per subtitle line |
| `--max-line-count` | | Maximum lines per subtitle entry |
| `--max-words-per-line` | | Maximum words per subtitle line |
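The subtitle limits can be pictured as greedy line packing. The sketch below is simplified: `wrap_words` and `group_lines` are hypothetical and work on plain words, whereas a real subtitle writer also carries each word's start and end time. In upstream Whisper these flags require `--word-timestamps True`, `--max-line-count` only takes effect together with `--max-line-width`, and `--max-words-per-line` is ignored when `--max-line-width` is set; this sketch ignores those interactions and may apply both limits at once.

```python
from typing import List, Optional


def wrap_words(words: List[str], max_line_width: Optional[int] = None,
               max_words_per_line: Optional[int] = None) -> List[str]:
    """Greedily pack words into lines limited by character count and/or word count."""
    lines: List[str] = []
    current: List[str] = []
    for word in words:
        candidate = " ".join(current + [word])
        too_wide = max_line_width is not None and len(candidate) > max_line_width
        too_many = max_words_per_line is not None and len(current) >= max_words_per_line
        if current and (too_wide or too_many):
            lines.append(" ".join(current))  # close the line before it overflows
            current = []
        current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


def group_lines(lines: List[str], max_line_count: Optional[int] = None) -> List[List[str]]:
    """Split lines into subtitle entries of at most max_line_count lines each."""
    if not max_line_count:
        return [lines] if lines else []
    return [lines[i:i + max_line_count] for i in range(0, len(lines), max_line_count)]
```

With a 15-character width, "the quick brown fox jumps over the lazy dog" becomes three lines ("the quick brown", "fox jumps over", "the lazy dog"); a line count of 2 then splits them into two subtitle entries.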

Advanced

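| Flag | Default | Description |
|---|---|---|
| `--clip-timestamps` | `0` | Comma-separated list of start,end times (in seconds) to process |
| `--hallucination-silence-threshold` | | When a possible hallucination is detected, skip silent periods longer than this many seconds |
| `--strict` | `False` | Stop at the first transcription error instead of continuing with the remaining files |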
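Assuming Vayu reads `--clip-timestamps` the way upstream OpenAI Whisper does, the values are taken in start,end pairs and an unpaired final start runs to the end of the file. Under that reading, the default `0` covers the whole file and `"0,30,60,90"` processes 0 to 30 s and 60 to 90 s. `parse_clip_timestamps` is a hypothetical helper, and its range check is an extra added for this sketch.

```python
from typing import List, Tuple


def parse_clip_timestamps(spec: str, duration: float) -> List[Tuple[float, float]]:
    """Turn "start,end,start,end,..." (seconds) into (start, end) pairs.

    An unpaired final start runs to the end of the audio, so "0" means the whole file."""
    points = [float(p) for p in spec.split(",")] if spec.strip() else [0.0]
    if len(points) % 2 == 1:
        points.append(duration)
    pairs = list(zip(points[::2], points[1::2]))
    # Extra sanity check for this sketch; the real CLI may validate differently.
    for start, end in pairs:
        if not 0.0 <= start < end <= duration:
            raise ValueError(f"invalid clip ({start}, {end}) for {duration}s of audio")
    return pairs
```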

Examples

```bash
# Fast transcription with batched decoding
vayu audio.mp3 --batch-size 12

# Use a specific model
vayu audio.mp3 --model mlx-community/distil-whisper-large-v3

# Generate SRT subtitles with word highlighting
vayu audio.mp3 --output-format srt --word-timestamps True --highlight-words True

# Transcribe multiple files to a directory
vayu *.mp3 --output-dir ./transcripts --output-format all

# Translate non-English audio to English
vayu french_audio.mp3 --task translate --language fr

# High-quality transcription with beam search
vayu audio.mp3 --beam-size 5 --best-of 5 --batch-size 6

# Process specific time ranges
vayu long_audio.mp3 --clip-timestamps "0,30,60,90"

# Read from stdin
cat audio.mp3 | vayu - --output-name result
```

Error Handling

By default, Vayu continues processing remaining files if one fails. Use --strict to stop on the first error.

Errors are collected and reported at the end:

```
Transcription errors:
  broken.mp3: FileNotFoundError - File not found
```
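The continue-or-stop behavior can be sketched as a loop that either collects per-file failures or re-raises the first one. This is a minimal illustration with a hypothetical `transcribe` callable standing in for Vayu's per-file work; `transcribe_all` and `format_errors` are hypothetical names. It reproduces the report layout from the sample output, while the exit status Vayu returns when errors were collected is not documented on this page.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def transcribe_all(paths: Iterable[str], transcribe: Callable[[str], str],
                   strict: bool = False) -> Tuple[Dict[str, str], List[Tuple[str, Exception]]]:
    """Process every path; collect per-file failures unless strict is set."""
    results: Dict[str, str] = {}
    errors: List[Tuple[str, Exception]] = []
    for path in paths:
        try:
            results[path] = transcribe(path)
        except Exception as exc:
            if strict:
                raise  # --strict: stop at the first failure
            errors.append((path, exc))  # default: remember it and move on
    return results, errors


def format_errors(errors: List[Tuple[str, Exception]]) -> str:
    """Render collected errors as 'path: ExceptionType - message' lines."""
    lines = ["Transcription errors:"]
    lines += [f"  {path}: {type(exc).__name__} - {exc}" for path, exc in errors]
    return "\n".join(lines)
```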
