Skip to content

DreamLab-AI/Video-to-Gaussian

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1,673 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Gaussian Toolkit

Video-to-3D-scene pipeline built on LichtFeld Studio. Upload a video, get a textured polygonal mesh and USD scene.

Architecture

Video Upload → Frame Extraction → COLMAP SfM → 3DGS Training →
  → Object Segmentation (SAM3) → Mesh Extraction (TSDF or MILo) →
  → Blender Assembly + Texture Bake → USD Scene + Web Viewer

Two-container Docker deployment:

Container Base GPU Purpose
gaussian-toolkit Ubuntu 24.04, CUDA 12.8, Python 3.12 GPU 0 COLMAP, LichtFeld 3DGS, web UI, Blender, SAM3
milo Ubuntu 22.04, CUDA 11.8, Python 3.10 GPU 1 MILo mesh extraction (SIGGRAPH Asia 2025)

Quick Start

# Set HuggingFace token (needed for SAM3 segmentation models)
export HF_TOKEN=hf_your_token_here

# Start both containers
docker compose -f docker-compose.consolidated.yml up -d

# Open the web interface
# http://localhost:7860

Upload a video at the web UI. The pipeline runs autonomously via Claude Code inside the container.

Pipeline Stages

Stage Tool Output
Frame extraction PyAV JPEG frames
Structure-from-Motion COLMAP 4.1.0 Camera poses + sparse point cloud
3DGS training LichtFeld Studio (MCP) Trained gaussian PLY (~1M splats)
Object segmentation SAM3 (4M concepts, text+visual) Per-object 2D masks
Mask projection Custom (ray casting) Per-object 3D gaussian labels
Mesh extraction TSDF (fast) or MILo (high quality) GLB meshes with vertex colours
Scene assembly Blender (Cycles GPU) Texture-baked USD scene
Web delivery Flask + model-viewer Preview carousel, download ZIP

Mesh Extraction Backends

  • TSDF (default): Open3D volumetric fusion from rendered depth maps. Fast (~2 min), lower geometric quality. Good for previews.
  • MILo (SIGGRAPH Asia 2025): Differentiable mesh-in-the-loop gaussian splatting. Delaunay triangulation + learned SDF. High quality, runs in the dedicated sidecar container on GPU 1.

Web Interface

The Flask app on port 7860 provides:

  • Video upload with drag-and-drop
  • Real-time pipeline progress (SSE log streaming)
  • 3D preview via <model-viewer> (Google)
  • Preview image carousel from Blender renders
  • Download ZIP of all outputs (mesh, USD, previews)
  • Anthropic API key management for Claude Code orchestration

Services

Port Service
7860 Web UI (Flask)
7681 Web terminal (ttyd / Claude Code)
8188 ComfyUI
45677 LichtFeld MCP server (70+ tools)
5901 VNC (Blender remote desktop)

Pipeline Modules

28 Python modules in src/pipeline/:

Category Modules
Core stages.py, orchestrator.py, cli.py, config.py, preflight.py
Reconstruction colmap_parser.py, coordinate_transform.py, frame_selector.py, frame_quality.py
Segmentation sam2_segmentor.py, sam3_segmentor.py, sam3d_client.py, mask_projector.py
Mesh mesh_extractor.py (TSDF), milo_extractor.py (MILo), mesh_cleaner.py
Texturing texture_baker.py, material_assigner.py
Scene blender_assembler.py, usd_assembler.py
Rendering multiview_renderer.py, hunyuan3d_client.py, comfyui_inpainter.py
Utilities mcp_client.py, quality_gates.py, person_remover.py

Web interface in src/web/: app.py, job_manager.py, pipeline_runner.py, templates, static assets.

Hardware

Tested on:

Component Spec
GPU 2x NVIDIA RTX 6000 Ada (48 GB VRAM each, 96 GB total)
CPU AMD Threadripper PRO 48-core
RAM 251 GB
Storage NVMe SSD

Minimum: single GPU with 12 GB VRAM (TSDF only, no MILo sidecar).

Project Boundaries

This is a fork of LichtFeld Studio. Upstream code (src/core/, src/app/, src/mcp/, src/rendering/, src/training/) is not modified. All pipeline additions live in src/pipeline/, src/web/, docker/, and scripts/. See BOUNDARIES.md for the full separation policy.

Current Status

The pipeline runs end-to-end: video upload through COLMAP, 3DGS training, mesh extraction (TSDF or MILo), Blender assembly, to a 3D web viewer. MILo produces 736K vertex meshes with learned SDF extraction.

Key finding: Source video quality is the primary quality bottleneck. YouTube-compressed video of featureless/reflective scenes produces broken gaussian training regardless of mesh extraction method. See docs/capture-methodology.md for filming guidelines that produce good reconstructions.

What works What needs improvement
COLMAP SfM (100% registration) Source video quality (YouTube too lossy)
MILo mesh extraction (736K verts) Gaussian training on reflective/featureless scenes
Web 3D viewer + preview carousel SAM3 per-object segmentation (dtype fix needed)
Blender Cycles texture bake (0.5s) Per-object Hunyuan MV pipeline (not yet wired)
Docker two-container deployment Automated MILo dispatch from web UI

See docs/engineering-log.md for the full development history.

License

Upstream LichtFeld Studio: GPL-3.0. Pipeline additions: GPL-3.0 (derivative work).

About

Video-to-3D scene pipeline: upload video → textured mesh + USD scene. Built on LichtFeld Studio with gsplat, COLMAP, Claude Code orchestration.

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages

  • C++ 57.4%
  • JavaScript 24.5%
  • Python 8.7%
  • Cuda 6.4%
  • TeX 1.0%
  • CMake 0.6%
  • Other 1.4%