# Examples and Use Cases
Behnam Ebrahimi edited this page Mar 29, 2026
Transcribe a long podcast episode with high accuracy:

```python
from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)
result = whisper.transcribe("podcast_episode.mp3", language="en")

# Save as plain text
with open("transcript.txt", "w") as f:
    f.write(result["text"])
```

CLI equivalent:
```shell
vayu podcast_episode.mp3 --model distil-large-v3 --batch-size 12 -f txt -o ./transcripts
```

Generate SRT subtitles for a video:
```shell
vayu video_audio.mp3 --batch-size 12 --output-format srt --word-timestamps True
```

With word-by-word highlighting (karaoke-style):
```shell
vayu video_audio.mp3 --batch-size 12 -f srt \
  --word-timestamps True \
  --highlight-words True \
  --max-line-width 42 \
  --max-line-count 2
```

Generate WebVTT for HTML5 video:
```shell
vayu video_audio.mp3 --batch-size 12 -f vtt --word-timestamps True
```

Transcribe a meeting recording with speaker context:
```python
from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="large-v3", batch_size=6)
result = whisper.transcribe(
    "meeting.m4a",
    language="en",
    word_timestamps=True,
    initial_prompt="Meeting participants: Alice, Bob, Charlie. Topic: Q4 planning.",
)

# Print timestamped segments
for seg in result["segments"]:
    minutes = int(seg["start"] // 60)
    seconds = int(seg["start"] % 60)
    print(f"[{minutes:02d}:{seconds:02d}] {seg['text'].strip()}")
```

The `initial_prompt` helps the model with proper nouns and domain-specific vocabulary.
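The same segment dicts can also be written out as a subtitle file by hand when you want full control over the output. A minimal sketch, using only the `start`, `end`, and `text` keys shown above; the helper names here are my own, not part of the library:

```python
def srt_timestamp(t: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(t * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render a list of {start, end, text} segment dicts as SRT text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

For most cases the built-in SRT output (`-f srt`) is simpler; this is only useful if you post-process segments (e.g. merging or renaming speakers) before writing subtitles.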
Transcribe an entire directory of audio files:

```shell
# All MP3 files in a directory
vayu recordings/*.mp3 --batch-size 12 -f all -o ./transcripts

# With error tolerance (continues on failure)
vayu recordings/*.mp3 --batch-size 12 -f json -o ./transcripts
```

In Python:
```python
from pathlib import Path
from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)
audio_dir = Path("recordings")
output_dir = Path("transcripts")
output_dir.mkdir(exist_ok=True)

for audio_file in sorted(audio_dir.glob("*.mp3")):
    print(f"Processing: {audio_file.name}")
    result = whisper.transcribe(str(audio_file), language="en")
    output_path = output_dir / f"{audio_file.stem}.txt"
    output_path.write_text(result["text"])
```

Translate non-English audio to English text:
```python
whisper = LightningWhisperMLX(model="large-v3", batch_size=6)

# Spanish audio → English text
result = whisper.transcribe("spanish_interview.mp3", language="es", task="translate")
print(result["text"])  # English translation
```

CLI equivalent:

```shell
vayu spanish_interview.mp3 --language es --task translate --batch-size 6
```

Transcribe audio from a YouTube video:

```shell
# Step 1: Extract audio with yt-dlp
yt-dlp -x --audio-format mp3 -o "video.mp3" "https://youtube.com/watch?v=VIDEO_ID"

# Step 2: Transcribe
vayu video.mp3 --batch-size 12 -f srt -o ./subtitles
```

Transcribe specific portions of a long audio file:
```shell
# Process only 0-60s and 120-180s
vayu long_recording.mp3 --clip-timestamps "0,60,120,180" --batch-size 12
```

For Macs with limited RAM or when running alongside other apps:
```python
# 4-bit quantized model uses ~4x less memory
whisper = LightningWhisperMLX(model="distil-large-v3", quant="4bit", batch_size=6)
result = whisper.transcribe("audio.mp3")
```

```shell
# Use a tiny model with high batch size
vayu audio.mp3 --model tiny --batch-size 32
```

Extract structured data from transcriptions:
```python
import json
from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)
result = whisper.transcribe("interview.mp3", language="en", word_timestamps=True)

# Build a searchable index
index = []
for seg in result["segments"]:
    index.append({
        "start": seg["start"],
        "end": seg["end"],
        "text": seg["text"].strip(),
        "words": seg.get("words", []),
    })

with open("searchable_index.json", "w") as f:
    json.dump(index, f, indent=2)
```

Pipe audio directly from another process:
```shell
# From FFmpeg conversion
ffmpeg -i video.mkv -vn -ar 16000 -ac 1 -f wav - | vayu - --output-name video_transcript

# From a download
curl -sL "https://example.com/audio.mp3" | vayu - --output-name downloaded
```

Generate all formats at once for different consumers:
```shell
vayu lecture.mp3 --batch-size 12 -f all -o ./output --word-timestamps True
# Creates: lecture.txt, lecture.srt, lecture.vtt, lecture.tsv, lecture.json
```

- `.txt` — for reading / search indexing
- `.srt` — for video players
- `.vtt` — for web embedding
- `.tsv` — for spreadsheet analysis
- `.json` — for programmatic access
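For programmatic access, the JSON output can be read back with the standard library. A minimal sketch, assuming the file mirrors the result dict (`text`, `segments`) used in the Python examples above; `summarize_transcript` is a hypothetical helper, and the exact schema should be verified against a real output file:

```python
import json
from pathlib import Path

def summarize_transcript(json_path: str) -> str:
    """Return a one-line summary of a JSON transcript.

    Assumes the file contains a "segments" list of {start, end, text} dicts,
    matching the result dict shown in the Python examples above.
    """
    data = json.loads(Path(json_path).read_text())
    segments = data["segments"]
    duration = segments[-1]["end"] if segments else 0.0
    return f"{len(segments)} segments over {duration:.1f}s"
```

Usage: `summarize_transcript("output/lecture.json")` after running the command above.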