Local-first pipeline that turns narration audio + a script into a timed, subtitled MP4.
FrameForge is a modular monolith: a FastAPI backend runs transcription, scene segmentation, visual assembly, subtitle generation, and FFmpeg-backed rendering; a Next.js studio submits jobs and tracks progress in real time. No database, no cloud API keys required for the default path—artifacts live on disk under backend/storage/.
Creators and engineers use FrameForge when they want repeatable video assembly without handing raw narration to a black-box SaaS editor. You upload audio, paste the script the voice actor (or TTS) followed, and the system:
- Transcribes audio with Faster-Whisper and aligns timing to the script.
- Segments the script into scenes (Ollama LLM with heuristic/timeline fallbacks).
- Attaches visuals per scene: generated placeholders or your own `scene_NNN.jpg`/`.mp4` files.
- Builds SRT and optionally burns cinematic subtitles into the frame.
- Renders a single H.264/AAC MP4 with configurable quality and streaming-friendly `faststart`.
The studio UI exposes a five-step progress stepper, a live processing log (fed from job metadata), and an output preview with export metadata when rendering completes.
| Studio: new project | Studio: pipeline running | Completed output |
|---|---|---|
| ![]() | ![]() | ![]() |
| Upload form + script | Stepper + processing log | Video preview + metadata |

| Landing | API health |
|---|---|
| ![]() | ![]() |
docs/screenshots/
  studio-empty.png       # /studio before job submit
  studio-processing.png  # stepper + logs during render
  studio-complete.png    # output preview + download
  landing.png            # marketing hero
  health-badge.png       # optional: API status on landing
┌─────────────────────────────────────────────────────────────────┐
│  Next.js 16 (App Router), localhost:3000                        │
│  / · /studio · typed API client · job polling · progress UI     │
└────────────────────────────┬────────────────────────────────────┘
                             │ REST /api/v1
┌────────────────────────────▼────────────────────────────────────┐
│  FastAPI, localhost:8000                                        │
│  routes: health · uploads · jobs                                │
│  PipelineOrchestrator (async steps via asyncio.to_thread)       │
└────────────────────────────┬────────────────────────────────────┘
                             │
     ┌───────────────────────┼───────────────────────┐
     ▼                       ▼                       ▼
transcription          segmentation           visual_assembly
(Faster-Whisper)   (Ollama / heuristic)  (validate · normalize · timeline)
     │                       │                       │
     └───────────────────────┼───────────────────────┘
                             ▼
                   subtitles · rendering
                 (SRT · MoviePy + FFmpeg)
                             │
                             ▼
               backend/storage/jobs/<uuid>/
        job.json · transcript.json · output.mp4 · …
| Path | Responsibility |
|---|---|
| `frontend/src/app/` | Routes: landing (`/`), studio (`/studio`) |
| `frontend/src/components/studio/` | Upload form, stepper, logs, output preview |
| `frontend/src/lib/api/` | Jobs, uploads, health clients |
| `backend/app/api/routes/` | Thin HTTP layer |
| `backend/app/pipeline/orchestrator.py` | Ordered pipeline steps + error mapping |
| `backend/app/services/` | Domain modules (no HTTP imports) |
| `backend/app/models/schemas.py` | Shared Pydantic contracts |
| `backend/storage/` | Uploads + per-job artifact directories |
Each job is a folder:
storage/jobs/<job_id>/
  job.json              # status, progress, scenes, metadata
  audio.*               # narration upload
  transcript.json
  scenes.json
  visual_timeline.json
  visuals/              # scene_001.jpg, scene_002.mp4, normalized/
  subtitles.srt
  render_output.json    # ffprobe-style export metadata
  render_state.json     # lifecycle during encode (cleared on success)
  output.mp4
JobStore reads/writes job.json. This keeps local development simple and makes jobs inspectable with ordinary tools. Multi-instance deployment would need shared storage or a persistence swap—see Roadmap.
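A file-backed store of this kind can stay very small. The sketch below is not FrameForge's actual `JobStore`: the class name `FileJobStore`, its method names, and the write-to-temp-then-rename strategy are all assumptions, chosen only to illustrate how `job.json` could be read and written safely.

```python
import json
import os
import tempfile
from pathlib import Path


class FileJobStore:
    """Hypothetical file-backed store: one folder per job, state in job.json."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _job_file(self, job_id: str) -> Path:
        return self.root / "jobs" / job_id / "job.json"

    def save(self, job_id: str, state: dict) -> None:
        path = self._job_file(job_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file in the same directory, then rename over the
        # target so a reader never sees a half-written job.json.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)
        os.replace(tmp_name, path)

    def load(self, job_id: str) -> dict:
        return json.loads(self._job_file(job_id).read_text(encoding="utf-8"))
```

Because the state is plain JSON on disk, the same file can be inspected with any editor or `cat` while a job runs.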
The orchestrator runs five pipeline steps (surfaced in the UI and API as progress.step):
| Step | Service | What happens |
|---|---|---|
| transcribe | `services/transcription` | Faster-Whisper → segments, `transcript.json`, timeline blocks |
| segment | `services/segmentation` | Script + transcript → `Scene` list with `start_time` / `end_time` |
| visuals | `services/visual_assembly` | Per-scene media: validate → preprocess → normalize → `visual_timeline.json` |
| subtitles | `services/subtitles` | SRT from segments; burn-in optional at render |
| render | `services/rendering` | Compose clips + audio → encode MP4 |
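To make the ordering concrete, here is a minimal sketch of running five blocking steps in sequence without stalling an event loop. It is not the real `PipelineOrchestrator`: `blocking_step` and `run_pipeline` are hypothetical names, and the step bodies are placeholders. Only the step names and the use of `asyncio.to_thread` come from this README.

```python
import asyncio
import time
from typing import Callable

# Step names match progress.step; the step functions are placeholders.
STEP_ORDER = ["transcribe", "segment", "visuals", "subtitles", "render"]


def blocking_step(name: str) -> str:
    """Stand-in for blocking work such as Whisper or MoviePy."""
    time.sleep(0.01)
    return f"{name}: done"


async def run_pipeline(steps: dict[str, Callable[[str], str]]) -> list[str]:
    results = []
    for name in STEP_ORDER:
        # to_thread runs the blocking call in a worker thread, so the event
        # loop stays free to answer status polls in the meantime.
        results.append(await asyncio.to_thread(steps[name], name))
    return results


results = asyncio.run(run_pipeline({name: blocking_step for name in STEP_ORDER}))
```

Each step is awaited before the next starts, so later steps can rely on the artifacts written by earlier ones.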
flowchart LR
A[Audio + Script] --> B[Transcribe]
B --> C[Segment]
C --> D[Visual assembly]
D --> E[Subtitles SRT]
E --> F[Render engine]
F --> G[output.mp4]
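The subtitles node in this flow emits SRT, whose cue format is simple enough to sketch. The functions below are illustrative only: `srt_timestamp` and `build_srt` are hypothetical names, and the segment keys `start`, `end`, and `text` are an assumed shape, not necessarily the one used in `transcript.json`.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(segments: list[dict]) -> str:
    """Render segments (assumed keys: start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```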
- Default: gradient placeholder images from `visual_prompt` (Pillow, no external image API).
- Override: place `scene_001.png`, `scene_002.mp4`, etc. in the job `visuals/` directory before the visuals step (or pre-stage in a future upload flow).
- Images are cover-cropped to `VIDEO_WIDTH`×`VIDEO_HEIGHT`; videos are probed and trimmed at render time.
- Transitions: `cut`, `fade`, `crossfade` (configurable per job defaults).
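Cover-cropping means scaling the source until it fully covers the target frame, then trimming the overflow. The arithmetic can be sketched as follows; `cover_crop_box` is a hypothetical helper, and centered cropping is an assumption about how the visuals step behaves.

```python
def cover_crop_box(
    src_w: int, src_h: int, dst_w: int, dst_h: int
) -> tuple[float, tuple[int, int, int, int]]:
    """Return (scale, crop box) for a centered cover crop.

    The source is scaled so it fully covers the target frame, then the
    overflow is trimmed equally from both sides of the longer axis.
    """
    scale = max(dst_w / src_w, dst_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    left = (scaled_w - dst_w) // 2
    top = (scaled_h - dst_h) // 2
    # Box is (left, top, right, bottom) in scaled-image coordinates.
    return scale, (left, top, left + dst_w, top + dst_h)
```

For a 4000×3000 photo and a 1920×1080 frame, the width is the limiting axis: the image scales by 0.48 to 1920×1440 and 180 rows are trimmed from the top and bottom. Pillow's `ImageOps.fit` performs a comparable resize-and-crop in one call.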
VideoRenderEngine (services/rendering/):
- Compose: `build_clips_from_timeline` + subtitle overlays + narration via MoviePy.
- Encode: `write_videofile` (libx264, AAC, CRF, preset) to `output.mp4.partial`.
- Finalize: optional FFmpeg remux for `+faststart`; atomic promote to `output.mp4`.
- Recover: configurable retries with temp/partial cleanup between attempts.
- Queue: `RenderQueue` serializes encodes so concurrent jobs do not fight for CPU.
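The encode, finalize, and recover stages follow a write-partial-then-promote pattern, which can be sketched with the standard library alone. This is not `VideoRenderEngine` itself: `render_with_retries` and its `encode` callback are hypothetical, and the default of three attempts is an assumption rather than the documented `RENDER_MAX_ATTEMPTS` default.

```python
import os
from pathlib import Path
from typing import Callable


def render_with_retries(
    encode: Callable[[Path], None], output: Path, max_attempts: int = 3
) -> int:
    """Encode to <output>.partial, then atomically promote on success.

    Returns the attempt number that succeeded; raises after the last failure.
    """
    partial = output.with_name(output.name + ".partial")
    last_error: Exception | None = None
    for attempt in range(1, max_attempts + 1):
        partial.unlink(missing_ok=True)  # start each attempt clean
        try:
            encode(partial)
            os.replace(partial, output)  # atomic on the same filesystem
            return attempt
        except Exception as exc:  # sketch only; real code would narrow this
            last_error = exc
    partial.unlink(missing_ok=True)
    raise RuntimeError(f"render failed after {max_attempts} attempts") from last_error
```

Because the final file only appears through `os.replace`, a crashed encode leaves either the previous `output.mp4` or nothing, never a truncated file.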
Progress is written to job.metadata.rendering_progress and mapped to studio percent (85–99) during the render step.
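One way to express that mapping is a clamped linear scale. In the sketch below, `studio_percent` is a hypothetical helper and the input is assumed to be a fraction between 0.0 and 1.0; the actual shape of `rendering_progress` is not specified here.

```python
def studio_percent(render_fraction: float, low: int = 85, high: int = 99) -> int:
    """Map encoder progress in [0.0, 1.0] onto the studio's 85-99 band."""
    clamped = min(1.0, max(0.0, render_fraction))
    return low + int(clamped * (high - low))
```

Holding the render step below 100 leaves the final percent for the moment the job is actually marked complete.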
- Script + audio → single MP4 with scene timing
- Faster-Whisper transcription with optional script initial prompt
- Segmentation via Ollama (`auto` backend) with heuristic/timeline fallbacks
- Visual timeline JSON artifact for debugging and re-render
- Cinematic subtitle themes (ratio-based typography, fade animation)
- Studio: stepper, live log panel, output preview, download
- OpenAPI at `/docs`
- Liveness / readiness health checks
- Structured errors (`FrameForgeError` codes)
- CORS for local Next.js
- Typed frontend clients
- Ruff (backend) + ESLint/Prettier/tsc (frontend)
- Unit tests for visual assembly and rendering helpers
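The segmentation fallback listed above can be pictured as an ordered chain of backends. The sketch below is an assumption-heavy illustration: `segment_with_fallback` and `heuristic_segment` are hypothetical names, the paragraph-splitting heuristic is invented for the example, and the `Scene` shape is reduced to a single `text` key.

```python
from typing import Callable

Scene = dict  # assumed minimal shape: {"text": str}


def heuristic_segment(script: str) -> list[Scene]:
    """Fallback: one scene per blank-line-separated paragraph."""
    paragraphs = [p.strip() for p in script.split("\n\n")]
    return [{"text": p} for p in paragraphs if p]


def segment_with_fallback(
    script: str, backends: list[Callable[[str], list[Scene]]]
) -> tuple[list[Scene], str]:
    """Try each backend in order; return scenes plus the backend name used."""
    for backend in backends:
        try:
            scenes = backend(script)
        except Exception:  # e.g. LLM server unreachable; try the next one
            continue
        if scenes:
            return scenes, backend.__name__
    raise RuntimeError("no segmentation backend produced scenes")
```

Returning the backend name alongside the scenes makes it easy to record which path produced a given result.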
| Decision | Rationale |
|---|---|
| Modular monolith | One deployable unit; clear module boundaries without network chatter between pipeline stages. |
| File-backed jobs | Zero ops for SQLite/Postgres in v0; easy to inspect and delete; tradeoff is horizontal scale. |
| Blocking ML in thread pool | Whisper, MoviePy, and Pillow block; asyncio.to_thread keeps the event loop responsive for status polling. |
| Single render queue | Encodes are CPU-heavy; serial queue avoids thrashing on laptops and small VMs. |
| Partial then promote | Failed encodes do not corrupt output.mp4; retries start clean. |
| MoviePy v1/v2 compat | moviepy_compat isolates import differences for clips and effects. |
| Optional Ollama / Gemini | Segmentation quality improves with LLM; heuristics keep the pipeline alive offline. |
| No auth in v0 | Local tool first; auth and tenancy belong behind explicit product requirements. |
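The single-render-queue decision can be illustrated with a lock around a threaded call. `SerialRenderQueue` below is a hypothetical stand-in, not the project's `RenderQueue`; it only shows how serializing blocking encodes and keeping the event loop responsive can coexist.

```python
import asyncio
from typing import Any, Callable


class SerialRenderQueue:
    """Hypothetical stand-in for RenderQueue: one blocking encode at a time."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def submit(self, encode: Callable[..., Any], *args: Any) -> Any:
        # Waiters queue up on the lock; the encode itself runs in a worker
        # thread so the event loop can keep serving status requests.
        async with self._lock:
            return await asyncio.to_thread(encode, *args)
```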
- Inspectable over opaque: JSON artifacts and folders beat opaque project blobs.
- Boring infrastructure: FastAPI, Next.js, FFmpeg; avoid custom job systems until needed.
- Fail with context: `TranscriptionError`, `SegmentationError`, `VisualAssemblyError`, `RenderingError` carry step and cause for the UI and logs.
- Small modules, flat abstractions: `visual_assembly/` and `rendering/` packages are focused; no enterprise framework inside the repo.
- Local-first default: Cloud APIs are optional flags, not prerequisites.
- Python 3.11+
- Node.js 20+
- FFmpeg on `PATH` (required for MoviePy export and faststart remux)
- Ollama (optional, recommended for segmentation): `ollama pull llama3.2`
cd backend
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
# source .venv/bin/activate
pip install -r requirements.txt
copy .env.example .env # Windows — use cp on Unix
python run.py
API: http://localhost:8000 · OpenAPI: http://localhost:8000/docs
cd frontend
npm install
Create `frontend/.env.local`:
NEXT_PUBLIC_API_URL=http://localhost:8000
npm run dev
Studio: http://localhost:3000/studio
curl http://localhost:8000/api/v1/health
curl http://localhost:8000/api/v1/health/ready
After a job is created, before or during the visuals step, add files to:
backend/storage/jobs/<job_id>/visuals/scene_001.jpg
backend/storage/jobs/<job_id>/visuals/scene_002.mp4
Supported images: .jpg, .jpeg, .png, .webp, .bmp
Supported video: .mp4, .mov, .webm, .mkv, .avi
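Given the naming convention and extension lists above, discovering overrides is a directory scan. The sketch below is hypothetical (`find_scene_overrides` is not a FrameForge function), and its tie-break rule, where the alphabetically first file wins if a scene has several candidates, is an assumption.

```python
import re
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}
VIDEO_EXTS = {".mp4", ".mov", ".webm", ".mkv", ".avi"}
SCENE_PATTERN = re.compile(r"^scene_(\d{3})$")


def find_scene_overrides(visuals_dir: Path) -> dict[int, tuple[str, Path]]:
    """Map scene number -> (kind, path) for files named scene_NNN.<ext>."""
    found: dict[int, tuple[str, Path]] = {}
    for path in sorted(visuals_dir.iterdir()):
        match = SCENE_PATTERN.match(path.stem)
        if not path.is_file() or not match:
            continue
        ext = path.suffix.lower()
        number = int(match.group(1))
        if ext in IMAGE_EXTS:
            found.setdefault(number, ("image", path))
        elif ext in VIDEO_EXTS:
            found.setdefault(number, ("video", path))
    return found
```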
Base path: /api/v1
| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Version, Whisper model, Ollama/Gemini flags |
| `GET` | `/health/live` | Process liveness |
| `GET` | `/health/ready` | Storage writable |

| Method | Path | Description |
|---|---|---|
| `POST` | `/uploads` | Upload narration audio → `upload_id` |
| `GET` | `/uploads/{upload_id}` | Upload metadata |

| Method | Path | Description |
|---|---|---|
| `POST` | `/jobs` | Start pipeline (script + `upload_id` or audio file) → `202` |
| `GET` | `/jobs/{job_id}` | Status, progress, metadata (incl. rendering logs) |
| `GET` | `/jobs/{job_id}/transcript` | Transcript artifact |
| `GET` | `/jobs/{job_id}/scenes` | Scenes + segmentation metadata |
| `GET` | `/jobs/{job_id}/video` | MP4 download (completed jobs only) |
Create job (multipart form):
# 1. Upload audio
curl -X POST http://localhost:8000/api/v1/uploads \
  -F "file=@narration.wav"
# 2. Start job
curl -X POST http://localhost:8000/api/v1/jobs \
  -F "script=Your full narration script here..." \
  -F "upload_id=<upload_id>"
Poll status:
curl http://localhost:8000/api/v1/jobs/<job_id>
Response includes `progress.step` (`transcribe` | `segment` | `visuals` | `subtitles` | `render`), `progress.percent`, and `metadata` (e.g. `rendering_progress`, `render_output`).
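For scripted use, the polling loop might look like the sketch below. It relies only on the `status` and `progress` fields described above; the terminal status values `completed` and `failed` are assumptions, and `poll_job` and `http_fetcher` are hypothetical helper names. The fetch function is injected so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable

TERMINAL = {"completed", "failed"}  # assumed status values


def http_fetcher(job_url: str) -> Callable[[], dict]:
    """Build a fetch function for GET /api/v1/jobs/<job_id>."""
    def fetch() -> dict:
        with urllib.request.urlopen(job_url) as response:
            return json.load(response)
    return fetch


def poll_job(fetch: Callable[[], dict], interval: float = 1.0, max_polls: int = 600) -> dict:
    """Poll until the job reaches a terminal status, printing progress changes."""
    last_seen = None
    for _ in range(max_polls):
        job = fetch()
        progress = job.get("progress", {})
        current = (progress.get("step"), progress.get("percent"))
        if current != last_seen:
            print(f"{current[0]}: {current[1]}%")
            last_seen = current
        if job.get("status") in TERMINAL:
            return job
        time.sleep(interval)
    raise TimeoutError("job did not finish within the polling budget")
```

Against a local backend this would be called as `poll_job(http_fetcher("http://localhost:8000/api/v1/jobs/<job_id>"))`, with the real job ID substituted.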
Copy backend/.env.example → backend/.env. Highlights:
| Group | Variables | Notes |
|---|---|---|
| Server | `HOST`, `PORT`, `CORS_ORIGINS` | Default 8000, allow localhost:3000 |
| Whisper | `WHISPER_MODEL`, `WHISPER_DEVICE`, `WHISPER_COMPUTE_TYPE` | `base` + `cpu` + `int8` for laptops |
| Segmentation | `SEGMENTATION_BACKEND`, `OLLAMA_*`, `SCENE_*_DURATION_SECONDS` | `auto` tries LLM then heuristic |
| Video | `VIDEO_WIDTH`, `VIDEO_HEIGHT`, `VIDEO_FPS` | 1920×1080 @ 30 default |
| Visuals | `VISUAL_DEFAULT_TRANSITION`, `VISUAL_*` | fade / crossfade timing |
| Subtitles | `SUBTITLE_ENABLED`, `SUBTITLE_THEME`, ratio knobs | Theme: `cinematic` |
| Render | `RENDER_CRF`, `RENDER_PRESET`, `RENDER_FASTSTART`, `RENDER_MAX_ATTEMPTS` | Export quality + retry |
Frontend: NEXT_PUBLIC_API_URL only.
# Backend
cd backend && ruff check app && ruff format app
cd backend && PYTHONPATH=. python -m unittest discover -s tests -v
# Frontend
cd frontend && npm run lint && npm run typecheck
cd frontend && npm run build
Makefile (`backend/`): `make run`, `make lint`, `make format`.
| Priority | Item |
|---|---|
| Near | User-facing upload of per-scene media in studio (not only filesystem drop-in) |
| Near | Persist screenshot assets in docs/screenshots/ for README |
| Medium | Job queue backend (Redis + worker) for multi-machine encode |
| Medium | Word-level subtitle highlighting when WHISPER_WORD_TIMESTAMPS=true |
| Medium | Pluggable image generation backend (SD / API) behind visual_prompt |
| Later | Auth, org workspaces, shared storage (S3/Azure Blob) |
| Later | Webhook / SDK for headless batch renders |
MIT




