l2yao/SDT

SDT — Subtitle from Transcript Generator

Generate accurate .srt subtitle files by aligning a plain text transcript (no timestamps) to a YouTube video or local audio/video file.

How It Works

SDT uses forced alignment: it takes your known transcript and synchronizes it with the audio using stable-ts, a library built on OpenAI Whisper. This is typically faster and more accurate than transcribing from scratch, because the model already knows what is being said and only needs to work out when each part is spoken.

Input: YouTube URL (or local file) + Plain Transcript  →  Output: Timed .srt Subtitle File
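stable-ts performs the alignment internally with Whisper, and this README does not describe those details. Purely to illustrate the general idea of forced alignment, the toy below uses a different, much simpler technique: a monotonic dynamic-programming (Viterbi-style) search. Given made-up per-frame scores for each transcript word, it assigns every audio frame to exactly one word, in transcript order, and converts the frame spans to seconds. The score matrix, the frame length, and the name `force_align` are assumptions for illustration, not part of SDT or stable-ts.

```python
import math
from typing import List, Tuple


def force_align(scores: List[List[float]], frame_sec: float) -> List[Tuple[float, float]]:
    """Toy forced alignment (not stable-ts's method).

    scores[t][k] is a log-score that audio frame t belongs to transcript word k.
    Every frame is assigned to one word, words keep their transcript order, and
    each word gets at least one frame. Returns (start_sec, end_sec) per word.
    """
    n_frames, n_words = len(scores), len(scores[0])
    if n_frames < n_words:
        raise ValueError("need at least one frame per word")
    neg = -math.inf
    best = [[neg] * n_words for _ in range(n_frames)]
    back = [[0] * n_words for _ in range(n_frames)]  # 0 = stayed on word, 1 = advanced
    best[0][0] = scores[0][0]  # the first frame must belong to the first word
    for t in range(1, n_frames):
        for k in range(n_words):
            stay = best[t - 1][k]
            advance = best[t - 1][k - 1] if k > 0 else neg
            if advance > stay:
                best[t][k], back[t][k] = advance + scores[t][k], 1
            else:
                best[t][k], back[t][k] = stay + scores[t][k], 0
    # Backtrack from the last frame, which must belong to the last word.
    owner = [0] * n_frames
    k = n_words - 1
    for t in range(n_frames - 1, -1, -1):
        owner[t] = k
        if t > 0 and back[t][k] == 1:
            k -= 1
    spans = []
    for word in range(n_words):
        frames = [t for t, o in enumerate(owner) if o == word]
        spans.append((frames[0] * frame_sec, (frames[-1] + 1) * frame_sec))
    return spans


if __name__ == "__main__":
    # Rows are frames; columns are the transcript words ["hello", "world"].
    toy_scores = [[-0.1, -3.0], [-0.2, -2.5], [-2.0, -0.3], [-3.0, -0.1]]
    print(force_align(toy_scores, frame_sec=0.5))  # [(0.0, 1.0), (1.0, 2.0)]
```

Because the word sequence is fixed by the transcript, the search only decides where each word boundary falls; it never has to guess which words were spoken.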

Features

  • 🎬 YouTube support — paste a URL, audio is downloaded automatically via yt-dlp
  • 📁 Local files — works with any video/audio format (MP4, MKV, MP3, WAV, etc.)
  • 🌍 Multilingual — supports all Whisper languages (English, Chinese, Japanese, etc.)
  • 🚀 GPU acceleration — auto-detects CUDA GPU for fast alignment
  • ✂️ Smart segmentation — automatically splits subtitles at natural breakpoints
  • 📓 Colab ready — included notebook for easy cloud usage with free GPU
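SDT's exact splitting rules are not documented here. As a rough sketch, assuming the aligner yields word-level (start, end, word) timings, the function below greedily packs words into subtitles: it closes the current subtitle when adding the next word would exceed a character limit or a duration limit, and also right after sentence-ending punctuation. The defaults 42 and 5.0 mirror the `--max-chars` and `--max-duration` defaults in the CLI options table; the function name and the punctuation rule are illustrative assumptions. Words are joined with spaces, so this suits space-delimited languages such as English.

```python
from typing import List, Tuple

Word = Tuple[float, float, str]  # (start_sec, end_sec, text), assumed aligner output
Cue = Tuple[float, float, str]   # one subtitle: (start_sec, end_sec, text)


def segment_words(words: List[Word], max_chars: int = 42,
                  max_duration: float = 5.0) -> List[Cue]:
    """Hypothetical greedy segmenter: pack words into cues under both limits."""
    cues: List[Cue] = []
    current: List[Word] = []

    def flush() -> None:
        # Emit the buffered words as one cue spanning first start to last end.
        if current:
            text = " ".join(w[2] for w in current)
            cues.append((current[0][0], current[-1][1], text))
            current.clear()

    for word in words:
        if current:
            candidate = " ".join(w[2] for w in current) + " " + word[2]
            too_long = len(candidate) > max_chars
            too_slow = word[1] - current[0][0] > max_duration
            if too_long or too_slow:
                flush()
        current.append(word)
        # Treat sentence-ending punctuation as a natural breakpoint.
        if word[2].endswith((".", "?", "!")):
            flush()
    flush()
    return cues
```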

Quick Start

Installation

pip install -r requirements.txt

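Note: FFmpeg must be installed on your system. It is needed for YouTube downloads, and Whisper (which stable-ts builds on) also uses it to decode audio from local files.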
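A quick way to confirm FFmpeg is reachable before running SDT is to look it up on PATH. The helper below is a standalone sketch, not part of SDT; `ffmpeg_version` is a hypothetical name.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    out = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    lines = out.stdout.splitlines()
    return lines[0] if lines else ""
```

If it returns None, install FFmpeg (for example through your OS package manager) and make sure the `ffmpeg` binary is on PATH.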

CLI Usage

# From YouTube
python -m sdt -i "https://youtube.com/watch?v=VIDEO_ID" -t transcript.txt -o output.srt

# From local file
python -m sdt -i video.mp4 -t transcript.txt

# Chinese with large model
python -m sdt -i video.mp4 -t transcript.txt -l zh -m large-v3

# Preview without saving
python -m sdt -i audio.mp3 -t script.txt --preview

Python API

from sdt import download_audio, align_transcript
from sdt.srt_writer import write_srt

# 1. Get audio
audio_path = download_audio("https://youtube.com/watch?v=VIDEO_ID")

# 2. Align transcript
with open("transcript.txt", "r", encoding="utf-8") as f:
    transcript = f.read()

result = align_transcript(audio_path, transcript, language="en")

# 3. Generate SRT
write_srt(result, "output.srt")
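SDT's `write_srt` takes the alignment result directly, and its internals are not shown here. For reference, the sketch below renders a list of (start, end, text) cues in the standard SRT layout (a numeric counter, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, then a blank line) and in WebVTT, which starts with a `WEBVTT` header and uses a period before the milliseconds. The function names are hypothetical.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start_sec, end_sec, text)


def format_timestamp(seconds: float, fmt: str = "srt") -> str:
    """HH:MM:SS,mmm for SRT; HH:MM:SS.mmm for WebVTT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def render_subtitles(cues: List[Cue], fmt: str = "srt") -> str:
    """Render cues as an SRT or WebVTT document string."""
    if fmt not in ("srt", "vtt"):
        raise ValueError(f"unsupported format: {fmt}")
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start, fmt)} --> {format_timestamp(end, fmt)}"
        # SRT requires a numeric counter; in WebVTT the cue identifier is optional.
        blocks.append(f"{index}\n{timing}\n{text}" if fmt == "srt" else f"{timing}\n{text}")
    body = "\n\n".join(blocks) + "\n"
    return ("WEBVTT\n\n" + body) if fmt == "vtt" else body
```

To save the result, write the string with an explicit encoding, e.g. `Path("output.srt").write_text(render_subtitles(cues), encoding="utf-8")`.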

Google Colab

Open SDT_Colab.ipynb in Google Colab for a ready-to-use notebook that runs on a free GPU.

CLI Options

Flag                 Description                       Default
-i, --input          YouTube URL or local file path    required
-t, --transcript     Path to plain text transcript     required
-o, --output         Output file path                  auto-named
-l, --language       Language code (en, zh, ja, ...)   auto-detect
-m, --model          Whisper model size                medium
--max-chars          Max characters per subtitle       42
--max-duration       Max seconds per subtitle          5.0
--format             Output format (srt or vtt)        srt
--preview            Print to console, don't save      off
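The option table corresponds roughly to an argparse parser like the one below. This is an illustrative sketch reconstructed from the table, not SDT's actual source; in particular, using None to stand for "auto-named" and "auto-detect" is an assumption.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented SDT flags and defaults."""
    p = argparse.ArgumentParser(prog="sdt", description="Align a plain transcript to audio and write subtitles.")
    p.add_argument("-i", "--input", required=True, help="YouTube URL or local file path")
    p.add_argument("-t", "--transcript", required=True, help="Path to plain text transcript")
    p.add_argument("-o", "--output", default=None, help="Output file path (auto-named if omitted)")
    p.add_argument("-l", "--language", default=None, help="Language code (en, zh, ja, ...); auto-detect if omitted")
    p.add_argument("-m", "--model", default="medium", help="Whisper model size")
    p.add_argument("--max-chars", type=int, default=42, help="Max characters per subtitle")
    p.add_argument("--max-duration", type=float, default=5.0, help="Max seconds per subtitle")
    p.add_argument("--format", choices=["srt", "vtt"], default="srt", help="Output format")
    p.add_argument("--preview", action="store_true", help="Print to console instead of saving")
    return p


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

Note that argparse turns `--max-chars` into the attribute `max_chars`, and `store_true` makes `--preview` default to False ("off").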

Model Sizes

Model     Parameters  English     Multilingual  Speed
tiny      39M         ⭐⭐                      ⚡⚡⚡⚡
base      74M         ⭐⭐⭐      ⭐⭐          ⚡⚡⚡
small     244M        ⭐⭐⭐⭐    ⭐⭐⭐        ⚡⚡
medium    769M        ⭐⭐⭐⭐⭐  ⭐⭐⭐⭐
large-v3  1.5B        ⭐⭐⭐⭐⭐  ⭐⭐⭐⭐⭐    🐢

Requirements

  • Python ≥ 3.9
  • FFmpeg (for YouTube downloads and audio decoding)
  • NVIDIA GPU (optional, but recommended for speed)
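The README says SDT auto-detects a CUDA GPU, but how it does so is not shown. A common approach with PyTorch-based models such as Whisper is to call `torch.cuda.is_available()`, as in this sketch; the function name is hypothetical. It falls back to the CPU when PyTorch is not installed or no GPU is visible.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA GPU, else "cpu" (hypothetical helper)."""
    try:
        import torch  # typically installed alongside Whisper
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string can be passed wherever a PyTorch device name is expected.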
