SDT — Subtitle from Transcript Generator

Generate accurate .srt subtitle files by aligning a plain text transcript (no timestamps) to a YouTube video or local audio/video file.

How It Works

SDT uses forced alignment: it takes your known transcript and synchronizes it with the audio using stable-ts (built on OpenAI Whisper). Because the words are already known, the model only has to work out when each one is spoken, which is typically faster and more accurate than transcribing from scratch.

Input: YouTube URL or local audio/video file + plain transcript  →  Output: timed .srt subtitle file
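For readers curious what forced alignment looks like in code, here is a minimal sketch that calls stable-ts directly. It is not SDT's internal implementation; the function name `align_with_stable_ts` and its defaults are made up for illustration, and it assumes stable-ts is installed (`pip install stable-ts`).

```python
def align_with_stable_ts(audio_path, transcript, language="en", model_name="base"):
    """Align known text to audio with stable-ts and return its result object.

    Hypothetical helper for illustration; SDT's own align_transcript may differ.
    """
    # Imported lazily so this module still loads without stable-ts installed.
    import stable_whisper

    model = stable_whisper.load_model(model_name)
    # align() keeps the text as given and only estimates when it is spoken.
    return model.align(audio_path, transcript, language=language)


# Example usage (requires stable-ts, FFmpeg, and real files on disk):
#   with open("transcript.txt", encoding="utf-8") as f:
#       result = align_with_stable_ts("audio.mp3", f.read())
#   result.to_srt_vtt("audio.srt", word_level=False)
```

Passing `language` explicitly avoids relying on any default, since alignment needs a tokenizer for the transcript's language.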

Features

🎬 YouTube support — paste a URL, audio is downloaded automatically via yt-dlp
📁 Local files — works with any video/audio format (MP4, MKV, MP3, WAV, etc.)
🌍 Multilingual — supports all Whisper languages (English, Chinese, Japanese, etc.)
🚀 GPU acceleration — auto-detects CUDA GPU for fast alignment
✂️ Smart segmentation — automatically splits subtitles at natural breakpoints
📓 Colab ready — included notebook for easy cloud usage with free GPU
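To make "smart segmentation" concrete, the sketch below groups word-level timings into subtitle cues. It is one plausible approach rather than SDT's actual algorithm: the `Word` type, `split_into_cues`, and the rule of breaking after sentence-ending punctuation are all assumptions, with limits borrowed from the `--max-chars` and `--max-duration` defaults.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def _flush(words):
    """Collapse a run of words into one (start, end, text) cue."""
    return (words[0].start, words[-1].end, " ".join(w.text for w in words))


def split_into_cues(words, max_chars=42, max_duration=5.0):
    """Group timed words into cues without exceeding either limit."""
    cues, current = [], []
    for word in words:
        if current:
            candidate = " ".join(w.text for w in current + [word])
            too_long = len(candidate) > max_chars
            too_slow = word.end - current[0].start > max_duration
            if too_long or too_slow:
                # Adding this word would break a limit, so close the cue first.
                cues.append(_flush(current))
                current = []
        current.append(word)
        # Treat sentence-ending punctuation as a natural breakpoint.
        if word.text.endswith((".", "!", "?", "。", "！", "？")):
            cues.append(_flush(current))
            current = []
    if current:
        cues.append(_flush(current))
    return cues
```

Joining words with spaces suits space-delimited languages; text in Chinese or Japanese would need a different joining rule.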

Quick Start

Installation

pip install -r requirements.txt

Note: FFmpeg must be installed on your system for YouTube downloads.
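If you are unsure whether FFmpeg is reachable, a quick check from Python looks like this. The helper name `ffmpeg_available` is hypothetical and not part of SDT; it only tests that an `ffmpeg` executable is on your PATH.

```python
import shutil


def ffmpeg_available():
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("FFmpeg not found on PATH; install it before downloading from YouTube.")
```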

CLI Usage

# From YouTube
python -m sdt -i "https://youtube.com/watch?v=VIDEO_ID" -t transcript.txt -o output.srt

# From local file
python -m sdt -i video.mp4 -t transcript.txt

# Chinese with large model
python -m sdt -i video.mp4 -t transcript.txt -l zh -m large-v3

# Preview without saving
python -m sdt -i audio.mp3 -t script.txt --preview
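Since `-i` accepts either a YouTube URL or a local path, a tool like this has to tell the two apart. The sketch below shows one way to do that with the standard library; `is_youtube_url` and the host list are assumptions for illustration and may not match how SDT decides.

```python
from urllib.parse import urlparse

# Assumed set of hostnames; a real tool might accept more variants.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(value):
    """Return True for http(s) URLs on a known YouTube host, else False."""
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https"):
        return False  # plain paths such as video.mp4 have no http(s) scheme
    return (parsed.hostname or "") in YOUTUBE_HOSTS
```

Anything that is not recognized as a URL would then be treated as a local file path.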

Python API

from sdt import download_audio, align_transcript
from sdt.srt_writer import write_srt

# 1. Get audio
audio_path = download_audio("https://youtube.com/watch?v=VIDEO_ID")

# 2. Align transcript
with open("transcript.txt", "r", encoding="utf-8") as f:
    transcript = f.read()

result = align_transcript(audio_path, transcript, language="en")

# 3. Generate SRT
write_srt(result, "output.srt")
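For reference, this is roughly what an SRT writer has to produce: numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line followed by the text. The sketch is independent of SDT's `write_srt`; the function names and the `(start, end, text)` tuple layout are assumptions.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT puts a comma before milliseconds)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def render_srt(cues):
    """Render (start, end, text) tuples as the contents of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(render_srt([(0.0, 1.0, "Hello world."), (1.2, 2.3, "This is SDT")]))
```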

Google Colab

Open SDT_Colab.ipynb in Google Colab for a ready-to-use notebook with free GPU.

CLI Options

| Flag | Description | Default |
|---|---|---|
| `-i`, `--input` | YouTube URL or local file path | required |
| `-t`, `--transcript` | Path to plain text transcript | required |
| `-o`, `--output` | Output file path | auto-named |
| `-l`, `--language` | Language code (`en`, `zh`, `ja`, ...) | auto-detect |
| `-m`, `--model` | Whisper model size | `medium` |
| `--max-chars` | Max characters per subtitle | `42` |
| `--max-duration` | Max seconds per subtitle | `5.0` |
| `--format` | Output format (`srt` or `vtt`) | `srt` |
| `--preview` | Print to console, don't save | off |
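The options above map naturally onto `argparse`. The following is a sketch of an equivalent parser built only from the table; the actual parser in the `sdt` package may be organized differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        prog="sdt", description="Align a plain transcript to audio and write subtitles."
    )
    parser.add_argument("-i", "--input", required=True,
                        help="YouTube URL or local file path")
    parser.add_argument("-t", "--transcript", required=True,
                        help="path to plain text transcript")
    parser.add_argument("-o", "--output", default=None,
                        help="output file path (auto-named when omitted)")
    parser.add_argument("-l", "--language", default=None,
                        help="language code such as en, zh, ja (auto-detect when omitted)")
    parser.add_argument("-m", "--model", default="medium",
                        help="Whisper model size")
    parser.add_argument("--max-chars", type=int, default=42,
                        help="max characters per subtitle")
    parser.add_argument("--max-duration", type=float, default=5.0,
                        help="max seconds per subtitle")
    parser.add_argument("--format", choices=["srt", "vtt"], default="srt",
                        help="output format")
    parser.add_argument("--preview", action="store_true",
                        help="print to console instead of saving")
    return parser


args = build_parser().parse_args(["-i", "video.mp4", "-t", "transcript.txt", "-l", "zh"])
print(args.language, args.model, args.max_chars)
```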

Model Sizes

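| Model | Parameters | English accuracy | Multilingual accuracy | Speed |
|---|---|---|---|---|
| `tiny` | 39M | ⭐⭐ | ⭐ | ⚡⚡⚡⚡ |
| `base` | 74M | ⭐⭐⭐ | ⭐⭐ | ⚡⚡⚡ |
| `small` | 244M | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⚡⚡ |
| `medium` | 769M | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⚡ |
| `large-v3` | 1.5B | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 🐢 |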

Requirements

Python ≥ 3.9
FFmpeg (for YouTube downloads)
NVIDIA GPU (optional, but recommended for speed)
