CLI Reference
Behnam Ebrahimi edited this page Mar 29, 2026 · 1 revision

Vayu provides the `vayu` command (and `whisper-mlx` for backward compatibility).

```bash
vayu <audio_file> [options]
```

| Argument | Description |
|---|---|
| `audio` | One or more audio file paths. Use `-` for stdin |

| Flag | Default | Description |
|---|---|---|
| `--model` | `mlx-community/whisper-turbo` | Model name or HuggingFace repo |
| `--fp16` | `True` | Use float16 for inference |

| Flag | Default | Description |
|---|---|---|
| `--output-dir`, `-o` | `.` | Directory for output files |
| `--output-format`, `-f` | `txt` | Format: `txt`, `vtt`, `srt`, `tsv`, `json`, or `all` |
| `--output-name` | — | Custom output filename (without extension) |
| `--verbose` | `True` | Print progress and results |

| Flag | Default | Description |
|---|---|---|
| `--batch-size` | `1` | Segments per forward pass (set >1 for batched decoding) |
| `--task` | `transcribe` | `transcribe` or `translate` (to English) |
| `--language` | auto | Language code (e.g., `en`, `fa`). Auto-detected if omitted |

| Flag | Default | Description |
|---|---|---|
| `--temperature` | `0` | Sampling temperature |
| `--compression-ratio-threshold` | `2.4` | Max compression ratio before rejection |
| `--logprob-threshold` | `-1.0` | Min avg log probability |
| `--no-speech-threshold` | `0.6` | Silence detection threshold |
| `--condition-on-previous-text` | `True` | Use previous segment as context |
| `--initial-prompt` | — | Initial text prompt |
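
The three thresholds above are easier to reason about with a concrete check. The following is a minimal sketch based on how the reference Whisper implementation applies them; whether Vayu follows exactly the same logic is an assumption, and the function names `compression_ratio`, `needs_fallback`, and `is_silence` are hypothetical, not part of Vayu's API.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size.

    Highly repetitive text (a common hallucination pattern) compresses
    well and therefore produces a high ratio.
    """
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def needs_fallback(
    text: str,
    avg_logprob: float,
    compression_ratio_threshold: float = 2.4,  # --compression-ratio-threshold
    logprob_threshold: float = -1.0,           # --logprob-threshold
) -> bool:
    """Assumed rule: reject a decode that is too repetitive or too unlikely."""
    if compression_ratio(text) > compression_ratio_threshold:
        return True
    if avg_logprob < logprob_threshold:
        return True
    return False


def is_silence(
    no_speech_prob: float,
    avg_logprob: float,
    no_speech_threshold: float = 0.6,  # --no-speech-threshold
    logprob_threshold: float = -1.0,   # --logprob-threshold
) -> bool:
    """Assumed rule: treat a segment as silence only when the no-speech
    probability is high AND the decoded text is itself low-confidence."""
    return no_speech_prob > no_speech_threshold and avg_logprob < logprob_threshold
```

Under these assumptions, a looping output such as `"la " * 200` exceeds the 2.4 ratio and would be rejected, while an ordinary sentence with a reasonable average log probability passes.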

| Flag | Default | Description |
|---|---|---|
| `--beam-size` | — | Beam search width |
| `--patience` | — | Beam search patience factor |
| `--best-of` | — | Number of candidates for best-of-N |

| Flag | Default | Description |
|---|---|---|
| `--word-timestamps` | `False` | Enable word-level timestamps |
| `--highlight-words` | `False` | Underline words in SRT/VTT as they're spoken |
| `--max-line-width` | — | Max characters per subtitle line |
| `--max-line-count` | — | Max lines per subtitle entry |
| `--max-words-per-line` | — | Max words per subtitle line |
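
To illustrate what word highlighting means in a subtitle file, here is a small sketch of an SRT timestamp formatter and an underlined-word renderer. The `HH:MM:SS,mmm` timestamp layout is standard SRT; the use of `<u>` tags for the active word matches the reference Whisper writer and is assumed, not confirmed, for Vayu. Both function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def highlight(words: list[str], active: int) -> str:
    """Join words into one subtitle line, underlining the word at index `active`."""
    return " ".join(
        f"<u>{word}</u>" if i == active else word
        for i, word in enumerate(words)
    )
```

With highlighting enabled, a writer of this kind emits one cue per word, each repeating the full line with a different word underlined, which is why `--highlight-words` depends on `--word-timestamps`.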

| Flag | Default | Description |
|---|---|---|
| `--clip-timestamps` | `0` | Comma-separated timestamp ranges |
| `--hallucination-silence-threshold` | — | Skip silent hallucinations longer than this (seconds) |
| `--strict` | `False` | Exit on first transcription error |
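
The `--clip-timestamps` value is a flat comma-separated list, so it helps to see how it might be grouped into ranges. This sketch assumes the reference Whisper convention: values are seconds in `start,end,start,end,...` order, and a trailing unpaired start runs to the end of the file (which would explain the default of `0`). The helper `parse_clip_timestamps` and its `duration` parameter are hypothetical.

```python
def parse_clip_timestamps(spec: str, duration: float) -> list[tuple[float, float]]:
    """Turn "0,30,60,90" into [(0.0, 30.0), (60.0, 90.0)].

    Assumption: an odd number of values means the final clip has no
    explicit end, so it extends to `duration` (the audio length in seconds).
    """
    values = [float(v) for v in spec.split(",") if v.strip()]
    if len(values) % 2 == 1:
        values.append(duration)
    # Pair up even-indexed starts with odd-indexed ends.
    return list(zip(values[::2], values[1::2]))
```

Under this reading, `"0,30,60,90"` selects the first 30 seconds and the span from 60 to 90 seconds, and the default `"0"` selects the whole file.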

```bash
# Fast transcription with batched decoding
vayu audio.mp3 --batch-size 12

# Use a specific model
vayu audio.mp3 --model mlx-community/distil-whisper-large-v3

# Generate SRT subtitles with word highlighting
vayu audio.mp3 --output-format srt --word-timestamps True --highlight-words True

# Transcribe multiple files to a directory
vayu *.mp3 --output-dir ./transcripts --output-format all

# Translate non-English audio to English
vayu french_audio.mp3 --task translate --language fr

# High-quality transcription with beam search
vayu audio.mp3 --beam-size 5 --best-of 5 --batch-size 6

# Process specific time ranges
vayu long_audio.mp3 --clip-timestamps "0,30,60,90"

# Read from stdin
cat audio.mp3 | vayu - --output-name result
```

By default, Vayu continues processing remaining files if one fails. Use `--strict` to stop on the first error.

Errors are collected and reported at the end:

```
Transcription errors:
broken.mp3: FileNotFoundError - File not found
```
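
The difference between the default behaviour and `--strict` can be sketched as a simple loop. This is an illustration of the described behaviour, not Vayu's actual code: `transcribe_all` and the `transcribe_one` callback are hypothetical names, and the error line format is copied from the sample report above.

```python
from typing import Callable, Iterable


def transcribe_all(
    paths: Iterable[str],
    transcribe_one: Callable[[str], None],  # hypothetical per-file worker
    strict: bool = False,
) -> list[str]:
    """Process every path, collecting one error line per failed file.

    With strict=True the first failure is re-raised immediately,
    mirroring the documented effect of --strict.
    """
    errors = []
    for path in paths:
        try:
            transcribe_one(path)
        except Exception as exc:
            if strict:
                raise
            errors.append(f"{path}: {type(exc).__name__} - {exc}")
    return errors


def fake_transcribe(path: str) -> None:
    """Stand-in worker that fails for one file, for demonstration only."""
    if path == "broken.mp3":
        raise FileNotFoundError("File not found")


errors = transcribe_all(["a.mp3", "broken.mp3", "b.mp3"], fake_transcribe)
if errors:
    print("Transcription errors:")
    for line in errors:
        print(line)
```

In the non-strict case the loop still reaches `b.mp3` after `broken.mp3` fails, which is the main reason to leave `--strict` off for large batches.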