CLI Reference

Vayu provides the `vayu` command (and `whisper-mlx` for backward compatibility).

Basic Usage

vayu <audio_file> [options]

Arguments

Input

| Argument | Description |
|---|---|
| `audio` | One or more audio file paths. Use `-` for stdin |

Model Options

| Flag | Default | Description |
|---|---|---|
| `--model` | `mlx-community/whisper-turbo` | Model name or HuggingFace repo |
| `--fp16` | `True` | Use float16 for inference |

Output Options

| Flag | Default | Description |
|---|---|---|
| `--output-dir`, `-o` | `.` | Directory for output files |
| `--output-format`, `-f` | `txt` | Format: `txt`, `vtt`, `srt`, `tsv`, `json`, or `all` |
| `--output-name` | — | Custom output filename (without extension) |
| `--verbose` | `True` | Print progress and results |
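
To check that a run produced everything it should, it can help to list the expected output paths up front. The sketch below is only an illustration: `expected_outputs` is a hypothetical helper, and it assumes outputs are written as `<output-dir>/<output-name>.<format>`, which the table suggests but does not state outright.

```shell
# Hypothetical helper (not part of Vayu): list the files a run is expected to
# produce. Assumption: outputs are written as <output-dir>/<output-name>.<format>,
# and `all` expands to the five formats listed in the table.
expected_outputs() {
    dir="$1"; name="$2"; fmt="${3:-txt}"
    if [ "$fmt" = "all" ]; then
        fmt="txt vtt srt tsv json"
    fi
    # Unquoted on purpose: split the format list into one word per format
    for ext in $fmt; do
        printf '%s/%s.%s\n' "$dir" "$name" "$ext"
    done
}
```

For example, `expected_outputs ./transcripts talk all` prints one path per format, which can then be tested with `[ -f "$path" ]` after the run.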

Processing Options

| Flag | Default | Description |
|---|---|---|
| `--batch-size` | `1` | Segments per forward pass (set >1 for batched decoding) |
| `--task` | `transcribe` | `transcribe` or `translate` (to English) |
| `--language` | auto | Language code (e.g., `en`, `fa`). Auto-detected if omitted |

Quality Control

| Flag | Default | Description |
|---|---|---|
| `--temperature` | `0` | Sampling temperature |
| `--compression-ratio-threshold` | `2.4` | Max compression ratio before rejection |
| `--logprob-threshold` | `-1.0` | Min avg log probability |
| `--no-speech-threshold` | `0.6` | Silence detection threshold |
| `--condition-on-previous-text` | `True` | Use previous segment as context |
| `--initial-prompt` | — | Initial text prompt |

Decoding Options

| Flag | Default | Description |
|---|---|---|
| `--beam-size` | — | Beam search width |
| `--patience` | — | Beam search patience factor |
| `--best-of` | — | Number of candidates for best-of-N |

Timestamp Options

| Flag | Default | Description |
|---|---|---|
| `--word-timestamps` | `False` | Enable word-level timestamps |
| `--highlight-words` | `False` | Underline words in SRT/VTT as they're spoken |
| `--max-line-width` | — | Max characters per subtitle line |
| `--max-line-count` | — | Max lines per subtitle entry |
| `--max-words-per-line` | — | Max words per subtitle line |

Advanced

| Flag | Default | Description |
|---|---|---|
| `--clip-timestamps` | `0` | Comma-separated timestamp ranges, given as `start,end,...` (e.g., `0,30,60,90`) |
| `--hallucination-silence-threshold` | — | Skip silent hallucinations longer than this (seconds) |
| `--strict` | `False` | Exit on first transcription error |
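
The `--clip-timestamps` value used in the examples (`"0,30,60,90"`) reads as a flat list of start/end pairs. A minimal sketch of a helper that builds such a value follows. `clip_arg` is a hypothetical name, not part of Vayu, and the sketch deliberately insists on complete pairs; whether Vayu accepts an open-ended final range is not covered here.

```shell
# Hypothetical helper (not part of Vayu): join start/end values into the
# comma-separated form shown for --clip-timestamps.
# Assumption: values come in start,end pairs, so an empty or odd-length
# argument list is rejected by this sketch.
clip_arg() {
    if [ "$#" -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "clip_arg: expected start/end pairs" >&2
        return 1
    fi
    out=""
    for t in "$@"; do
        # Add a comma separator before every value except the first
        out="${out:+$out,}$t"
    done
    printf '%s\n' "$out"
}
```

It could then be used as `vayu long_audio.mp3 --clip-timestamps "$(clip_arg 0 30 60 90)"`.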

Examples

# Fast transcription with batched decoding
vayu audio.mp3 --batch-size 12

# Use a specific model
vayu audio.mp3 --model mlx-community/distil-whisper-large-v3

# Generate SRT subtitles with word highlighting
vayu audio.mp3 --output-format srt --word-timestamps True --highlight-words True

# Transcribe multiple files to a directory
vayu *.mp3 --output-dir ./transcripts --output-format all

# Translate non-English audio to English
vayu french_audio.mp3 --task translate --language fr

# High-quality transcription with beam search
vayu audio.mp3 --beam-size 5 --best-of 5 --batch-size 6

# Process specific time ranges
vayu long_audio.mp3 --clip-timestamps "0,30,60,90"

# Read from stdin
cat audio.mp3 | vayu - --output-name result
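
When each file needs its own explicit output name, one option is to call `vayu` once per file instead of passing a glob. The sketch below uses only flags documented above; `transcribe_each` and the `VAYU_CMD` override are hypothetical names introduced here so the loop can be dry-run with `VAYU_CMD=echo`.

```shell
# Assumption: VAYU_CMD is a hypothetical override for dry runs (VAYU_CMD=echo);
# by default the real `vayu` binary is called.
VAYU_CMD="${VAYU_CMD:-vayu}"

# Transcribe each given file to SRT in one directory, one vayu call per file.
# Usage: transcribe_each <output-dir> <audio-file>...
transcribe_each() {
    outdir="$1"
    shift
    mkdir -p "$outdir"
    for f in "$@"; do
        # Strip the directory, then the extension, to get the output name
        name="$(basename "$f")"
        name="${name%.*}"
        "$VAYU_CMD" "$f" --output-dir "$outdir" --output-format srt \
            --output-name "$name"
    done
}
```

A call such as `transcribe_each ./transcripts talks/*.mp3` would issue one command per file. The trade-off is that a separate process per file may reload the model each time, so the single multi-file invocation shown above is probably faster for large batches.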

Error Handling

By default, Vayu continues processing remaining files if one fails. Use `--strict` to stop on the first error.

Errors are collected and reported at the end:

Transcription errors:
  broken.mp3: FileNotFoundError - File not found
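
To retry only the failures, the file names can be pulled out of that summary. This is a sketch under assumptions: `failed_files` is a hypothetical helper, and it relies on every entry being indented and shaped like `<file>: <ErrorType> - <message>`, as in the sample above. It reads saved output on stdin, so it does not depend on which stream Vayu prints the summary to.

```shell
# Hypothetical helper (not part of Vayu): read saved Vayu output on stdin and
# print the names of files listed under "Transcription errors:".
# Assumption: entries are indented and look like
#   <file>: <ErrorType> - <message>
failed_files() {
    awk '
        /^Transcription errors:/ { in_errors = 1; next }
        # Entries are indented; the first unindented line ends the summary
        in_errors && /^[^ \t]/ { in_errors = 0 }
        in_errors && /^[ \t]+/ {
            line = $0
            sub(/^[ \t]+/, "", line)              # drop the indentation
            sub(/: [A-Za-z_.]+ - .*$/, "", line)  # drop ": ErrorType - message"
            print line
        }
    '
}
```

One way to use it is to capture both streams, for example `vayu *.mp3 > run.log 2>&1`, and then run `failed_files < run.log`.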
