
Examples and Use Cases

Behnam Ebrahimi edited this page Mar 29, 2026 · 1 revision


Podcast Transcription

Transcribe a long podcast episode with high accuracy:

from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)
result = whisper.transcribe("podcast_episode.mp3", language="en")

# Save as plain text
with open("transcript.txt", "w") as f:
    f.write(result["text"])

CLI equivalent:

vayu podcast_episode.mp3 --model distil-large-v3 --batch-size 12 -f txt -o ./transcripts

Subtitle Generation

Generate SRT subtitles for a video:

vayu video_audio.mp3 --batch-size 12 --output-format srt --word-timestamps True

With word-by-word highlighting (karaoke-style):

vayu video_audio.mp3 --batch-size 12 -f srt \
    --word-timestamps True \
    --highlight-words True \
    --max-line-width 42 \
    --max-line-count 2

Generate WebVTT for HTML5 video:

vayu video_audio.mp3 --batch-size 12 -f vtt --word-timestamps True
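vayu writes SRT files directly, but the same formatting can be done from Python when you need custom post-processing. A minimal sketch (not part of the library) that renders Whisper-style segment dicts — start/end in seconds, as returned by transcribe in the examples below — into SRT text:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render segment dicts (start, end, text) as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Hard-coded segments in the shape transcribe() returns
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.0, "text": " Let's get started."},
]
print(segments_to_srt(segments))
```

Write the returned string to a .srt file and most video players will pick it up alongside the video.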

Meeting Notes

Transcribe a meeting recording with speaker context:

from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="large-v3", batch_size=6)

result = whisper.transcribe(
    "meeting.m4a",
    language="en",
    word_timestamps=True,
    initial_prompt="Meeting participants: Alice, Bob, Charlie. Topic: Q4 planning.",
)

# Print timestamped segments
for seg in result["segments"]:
    minutes = int(seg["start"] // 60)
    seconds = int(seg["start"] % 60)
    print(f"[{minutes:02d}:{seconds:02d}] {seg['text'].strip()}")

The initial_prompt helps the model with proper nouns and domain-specific vocabulary.


Batch Processing

Transcribe an entire directory of audio files:

# All MP3 files in a directory
vayu recordings/*.mp3 --batch-size 12 -f all -o ./transcripts

# JSON output per file (the batch continues past files that fail)
vayu recordings/*.mp3 --batch-size 12 -f json -o ./transcripts

In Python:

from pathlib import Path
from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)

audio_dir = Path("recordings")
output_dir = Path("transcripts")
output_dir.mkdir(exist_ok=True)

for audio_file in sorted(audio_dir.glob("*.mp3")):
    print(f"Processing: {audio_file.name}")
    result = whisper.transcribe(str(audio_file), language="en")

    output_path = output_dir / f"{audio_file.stem}.txt"
    output_path.write_text(result["text"])
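If one corrupt file should not abort the whole batch, wrap the per-file work in a try/except and collect failures instead of raising. A sketch using a plain helper (the `transcribe_all` function is illustrative, not part of the library):

```python
from pathlib import Path

def transcribe_all(transcribe, audio_dir, output_dir, pattern="*.mp3"):
    """Run `transcribe` over every matching file; collect failures instead of raising."""
    output_dir = Path(output_dir)
    output_dir.mkdir(exist_ok=True)
    failures = []
    for audio_file in sorted(Path(audio_dir).glob(pattern)):
        try:
            result = transcribe(str(audio_file))
            (output_dir / f"{audio_file.stem}.txt").write_text(result["text"])
        except Exception as exc:  # keep the batch alive; report at the end
            failures.append((audio_file.name, exc))
    return failures

# Usage with the model from above:
# whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)
# failed = transcribe_all(lambda p: whisper.transcribe(p, language="en"),
#                         "recordings", "transcripts")
```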

Translation

Translate non-English audio to English text:

from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="large-v3", batch_size=6)

# Spanish audio → English text
result = whisper.transcribe("spanish_interview.mp3", language="es", task="translate")
print(result["text"])  # English translation

CLI equivalent:

vayu spanish_interview.mp3 --language es --task translate --batch-size 6

YouTube Video Transcription

# Step 1: Extract audio with yt-dlp
yt-dlp -x --audio-format mp3 -o "video.mp3" "https://youtube.com/watch?v=VIDEO_ID"

# Step 2: Transcribe
vayu video.mp3 --batch-size 12 -f srt -o ./subtitles

Audio Clip Processing

Transcribe specific portions of a long audio file:

# Process only 0-60s and 120-180s
vayu long_recording.mp3 --clip-timestamps "0,60,120,180" --batch-size 12
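The flag's value is a flat comma-separated list that pairs up into (start, end) ranges. A small illustrative parser (not part of vayu) makes the pairing explicit:

```python
def parse_clip_timestamps(spec: str):
    """Turn "0,60,120,180" into [(0.0, 60.0), (120.0, 180.0)] second ranges."""
    values = [float(v) for v in spec.split(",")]
    if len(values) % 2:
        raise ValueError("clip timestamps must come in start,end pairs")
    return list(zip(values[::2], values[1::2]))

print(parse_clip_timestamps("0,60,120,180"))  # → [(0.0, 60.0), (120.0, 180.0)]
```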

Low Memory Usage

For Macs with limited RAM or when running alongside other apps:

from whisper_mlx import LightningWhisperMLX

# 4-bit quantized model uses ~4x less memory
whisper = LightningWhisperMLX(model="distil-large-v3", quant="4bit", batch_size=6)
result = whisper.transcribe("audio.mp3")

CLI alternative — use a tiny model with a high batch size:

vayu audio.mp3 --model tiny --batch-size 32

JSON Processing Pipeline

Extract structured data from transcriptions:

import json
from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)

result = whisper.transcribe("interview.mp3", language="en", word_timestamps=True)

# Build a searchable index
index = []
for seg in result["segments"]:
    index.append({
        "start": seg["start"],
        "end": seg["end"],
        "text": seg["text"].strip(),
        "words": seg.get("words", []),
    })

with open("searchable_index.json", "w") as f:
    json.dump(index, f, indent=2)
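With the index on disk, a keyword search over segment text is a few lines. A sketch against the index structure built above (the `search_index` helper is illustrative):

```python
import json

def search_index(index_path: str, query: str):
    """Return (start, end, text) for every segment containing `query`."""
    with open(index_path) as f:
        index = json.load(f)
    q = query.lower()
    return [
        (entry["start"], entry["end"], entry["text"])
        for entry in index
        if q in entry["text"].lower()
    ]

# for start, end, text in search_index("searchable_index.json", "budget"):
#     print(f"{start:.1f}s-{end:.1f}s: {text}")
```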

Reading from stdin

Pipe audio directly from another process:

# From FFmpeg conversion
ffmpeg -i video.mkv -vn -ar 16000 -ac 1 -f wav - | vayu - --output-name video_transcript

# From a download
curl -sL "https://example.com/audio.mp3" | vayu - --output-name downloaded

Multiple Output Formats

Generate all formats at once for different consumers:

vayu lecture.mp3 --batch-size 12 -f all -o ./output --word-timestamps True
# Creates: lecture.txt, lecture.srt, lecture.vtt, lecture.tsv, lecture.json
  • .txt — for reading / search indexing
  • .srt — for video players
  • .vtt — for web embedding
  • .tsv — for spreadsheet analysis
  • .json — for programmatic access
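The .tsv output loads directly with the standard csv module. A sketch, assuming the usual Whisper column layout of start, end, text (verify the header row of a generated file before relying on it):

```python
import csv

def read_transcript_tsv(path: str):
    """Load a Whisper-style TSV (start, end, text columns) into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))

# rows = read_transcript_tsv("output/lecture.tsv")
# print(rows[0]["text"])
```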
