feat: Fuse multiple splats from video by richiejp · Pull Request #3 · localai-org/free-splatter.cpp

richiejp · 2026-06-29T08:42:11Z

Recover camera positions for individual splats and merge overlapping splats using a shared
frame. That is, if we have three frames from a video and one of the frames overlaps with both
of the other two, we can create two splats that share a frame. Because the shared frame appears in
both splats, its camera coordinates/pose can be used to merge the coordinate systems of both
splats.
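
A minimal sketch of the shared-frame idea, assuming both splats share one metric scale so the link is rigid. If the shared frame's cam2world pose is A in splat A's world and B in splat B's world, then A * inv(B) maps splat B's world into splat A's (world B -> shared camera -> world A). `Mat4`, `rigid_inverse` and `b_to_a_via_shared_frame` are hypothetical names, not this repo's API. Each run actually has its own scale, which is why the commits below fit a 7-DoF similarity instead.

```cpp
#include <array>
#include <cassert>

// Row-major 4x4 homogeneous transform.
using Mat4 = std::array<std::array<double, 4>, 4>;

// Matrix product a * b.
Mat4 mat4_mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Inverse of a rigid transform [R t; 0 1], which is [R^T  -R^T t; 0 1].
Mat4 rigid_inverse(const Mat4& m) {
    Mat4 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) r[i][j] = m[j][i];
    for (int i = 0; i < 3; ++i)
        r[i][3] = -(r[i][0] * m[0][3] + r[i][1] * m[1][3] + r[i][2] * m[2][3]);
    r[3][3] = 1.0;
    return r;
}

// Hypothetical helper: the shared frame's cam2world pose is `in_a` in splat A's
// world and `in_b` in splat B's world. World B -> camera (inv(in_b)) -> world A (in_a).
Mat4 b_to_a_via_shared_frame(const Mat4& in_a, const Mat4& in_b) {
    assert(in_a[3][3] == 1.0 && in_b[3][3] == 1.0);  // affine, not projective
    return mat4_mul(in_a, rigid_inverse(in_b));
}
```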

Unfortunately, the depth estimated for the gaussians varies a lot between splats, so when we try to
merge them only a few gaussians are shared between the splats and it is difficult to reconcile
them. Various corrective measures (consensus fusion, gaussian-level consensus refinement,
best-frame fusion, a parallax keyframe gate) have been tried with limited success.

pose/: downstream PnP + cross-run alignment prototype, verified vs upstream
pose/: dense GT-posed control (Tanks&Temples OOD, RealEstate10K validates pipeline)
pose/: cross-run consistency on in-distribution re10k (accumulation is clean)
CLAUDE.md: PnP now in scope; C++/Go-only policy, no-Python runtime
pose/: sliding-window accumulation prototype (the live idea, proven end-to-end)
pose/: loop closure -- machinery verified; real loop reveals it's not the lever
pose/: consensus fusion -- removes the edge-noise floaters from the accumulated cloud
pose/: begin the C++ port -- focal + Umeyama align + DLT/RANSAC PnP, golden-tested
pose/: robust C++ PnP (EPnP + Gauss-Newton) -- cv2-parity on real scenes
pose/: C++ accumulation chaining (Accumulator) -- one world from a photo stream
pose/: validate C++ loop closure (sim4_invert + distribute_drift) vs Python
pose/: validate C++ consensus fusion (consensus_fuse) vs Python fuse.py
CLI + C-API: pose recovery + accumulation surface (no Python)
pose/: delete the Python prototype -- fully ported to C++ and shipped
fuzz: pose C-API + image-decode harnesses; fix a SIGFPE the fuzzer found
demo: accumulating-reconstruction viewer (cloud grows as photos are added)
pose: carry gaussian scale + rotation through accumulation (render as splats)
demo: add consensus-fused step + accumulate from cached .f32 pair dumps
demo: make the consensus-fused step unmistakable + defeat stale caching
demo: tighter framing + baseline guidance (forward-dolly clips reconstruct blurry)
pose: carry opacity into the splat alpha (fix the opaque, swirling, blurry cloud)
splat: unify the .splat encoder + pin it with regression tests (#1, #2)
fuse: add dense "kept" mode (fix the sparse fused scene)
pose: de-ghost the accumulated cloud -- gaussian-level consensus_refine + best-frame fusion
demo: de-ghost the bake by default (--refine) + document the fuse/refine knobs
pose: parallax estimation (after-inference C++ metric + independent cv2 reference)
cli: --min-parallax keyframe gate for accumulate
demo: wire --min-parallax gate into the bake (+ honest de-ghost docs)
demo: auto-bake videos from demo-vids/ (scripts/demo/bake-vids.sh)
device: default to GPU/Vulkan, fail-closed; CPU is explicit opt-in
server: scene switcher + upload-a-video-to-make-a-scene (in-process GPU bake)
server: / is a menu of the demo pages (one server, one port)
Tree-merge accumulation + vibrance slider for the accumulate demo

…stream
A self-contained, pure-Python prototype DOWNSTREAM of the engine seam ([N,H,W,23]); deliberately OUTSIDE the validated src/ and NOT wired into CMake/ctest. It prototypes live, accumulating reconstruction from a moving camera: recover each view's camera (PnP) and align successive runs.
- focal.py / pnp.py: faithful port of FreeSplatter's scene estimate_poses -- Weiszfeld shared focal (view 0 only, all pixels; use_first_focal), integer pixels, cv2.solvePnPRansac(SQPNP, reprojErr=5, iters=10), cam2world=inv(w2c), and the runner's 1/baseline camera rescale. numpy DLT+RANSAC fallback runs without cv2.
- align.py: Umeyama similarity fit + RANSAC + a residual ladder (diagnose) that detects whether cross-run mismatch is a uniform-scale similarity or a nonlinear warp; plus similarity chaining and loop-closure metrics.
Verification (all green):
- check_cv2_parity.py -- numpy solver == exact cv2.solvePnPRansac on synthetic ground truth (~1e-7 clean).
- check_upstream_parity.py -- our WHOLE orchestration vs upstream estimate_poses on REAL engine output. Caught and fixed five divergences (the costly one: focal averaged over a low-overlap 2nd view gave 507 vs the correct 596, ~15% -> 1.35 deg pose error). Now bit-exact (0.00 deg) on 2 scene + 2 object dumps; the only residual is upstream's float32 K vs our float64 (a RANSAC inlier-boundary precision effect, <=0.5 deg on near-degenerate object data, 0 on scene), root-caused and documented, not papered over.
- test_pose.py -- 28 asset-free golden tests (no model/fixtures/cv2).
Empirical finding: cross-run mismatch is a uniform-scale similarity (~11% scene scale drift), no nonlinear warp -- a 7-DoF similarity is the right alignment model; diagnose() flags it if that ever changes.
flake.nix: add opencv4 (cv2) for the exact-upstream PnP path; numpy-only fallback otherwise. Not needed by the engine build/test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
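
The Weiszfeld shared-focal step can be sketched as iteratively reweighted least squares: start from the closed-form L2 focal, then reweight each pixel by the inverse of its reprojection residual so a few bad pixels stop dominating. This is a hypothetical C++ sketch (`PixelPoint` and `estimate_focal_weiszfeld` are illustrative names, not focal.py or src/pose.cpp), assuming pixel coordinates are already centred on the principal point and the predicted points have z > 0; the 10-iteration default is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One valid pixel: (u, v) relative to the principal point, plus the engine's
// predicted 3D point (x, y, z) for that pixel in the camera frame (z > 0 assumed).
struct PixelPoint {
    double u, v;
    double x, y, z;
};

// Find f minimising sum_i || (u_i, v_i) - f * (x_i / z_i, y_i / z_i) || (a sum of
// residual norms, not of squares) by Weiszfeld-style reweighting from the L2 start.
// iters = 0 returns the plain least-squares focal.
double estimate_focal_weiszfeld(const std::vector<PixelPoint>& pts, int iters = 10) {
    assert(!pts.empty());
    std::vector<double> a(pts.size()), b(pts.size());
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        a[i] = pts[i].x / pts[i].z;
        b[i] = pts[i].y / pts[i].z;
        num += a[i] * pts[i].u + b[i] * pts[i].v;  // <p_i, q_i>
        den += a[i] * a[i] + b[i] * b[i];          // <q_i, q_i>
    }
    double f = num / den;  // closed-form L2 minimiser
    for (int it = 0; it < iters; ++it) {
        num = den = 0.0;
        for (std::size_t i = 0; i < pts.size(); ++i) {
            const double r = std::hypot(pts[i].u - f * a[i], pts[i].v - f * b[i]);
            const double w = 1.0 / std::max(r, 1e-8);  // Weiszfeld weight: 1 / residual
            num += w * (a[i] * pts[i].u + b[i] * pts[i].v);
            den += w * (a[i] * a[i] + b[i] * b[i]);
        }
        f = num / den;  // weighted least-squares update
    }
    return f;
}
```

A single gross outlier drags the L2 start (`iters = 0`) well away from the true focal, while the reweighted iterations pull it back.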

…ates pipeline)
A known-good, ground-truth-posed control so a live-path failure can be attributed to the data/model vs our code -- and a reusable engine-vs-GT harness.
- tt_control.py / tt_experiment.py: Tanks-and-Temples (NSVF) loader (OpenGL->OpenCV poses, baseline-vs-stride report) + engine-vs-GT check. Verdict: T&T is OUT OF DISTRIBUTION for FreeSplatter-scene (narrow-FOV object orbits) -- opacity confident on only 8-17% of pixels, pose error 28-145deg. Kept as harness + negative result.
- re10k_control.py / re10k_fetch.py / re10k_experiment.py: RealEstate10K loader (parser + GT geometry, GT focal_512 = fy*512), yt-dlp/ffmpeg frame fetch with a dead-video skip, and the engine-vs-GT check. IN DISTRIBUTION -> the control works: relative pose recovered to 0.4-1.5deg vs INDEPENDENT GT, opacity confident on 68-75% of pixels. Validates our PnP beyond the upstream parity, and confirms the re10k camera convention.
Findings: (1) the model has a CONSTANT wide-FOV focal bias (recovers ~274 vs GT ~439, ~37%) -- benign for relative accumulation (consistent across runs), off for metric scale; (2) pose error scales ~linearly (~25% of baseline rotation), no degeneracy even at ~1.4deg baselines on texture-rich interiors.
Data (T&T ~1.5GB, re10k poses ~720MB, fetched frames) stays under .cache/ and the scratchpad -- gitignored, never committed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
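
Relative-pose errors like the 0.4-1.5deg figure are commonly measured as the geodesic angle between the estimated and ground-truth relative rotations, using trace(Ra^T Rb) = 1 + 2 cos(theta). A small sketch, assuming cam2world rotation matrices and the relative rotation R0^T R1; `rotation_angle_deg` and `relative_rotation` are hypothetical names, not the harness's code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Angle in degrees of Ra^T * Rb, i.e. how far apart two rotations are.
double rotation_angle_deg(const Mat3& Ra, const Mat3& Rb) {
    double tr = 0.0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) tr += Ra[i][j] * Rb[i][j];  // trace(Ra^T Rb)
    const double c = std::clamp((tr - 1.0) / 2.0, -1.0, 1.0);  // guard rounding
    const double kPi = std::acos(-1.0);
    return std::acos(c) * 180.0 / kPi;
}

// Rotation of view 1 relative to view 0, given their cam2world rotations: R0^T R1.
Mat3 relative_rotation(const Mat3& R0, const Mat3& R1) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) r[i][j] += R0[k][i] * R1[k][j];
    return r;
}
```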

…s clean) re10k_crossrun.py: a frame shared by two overlapping pairs (m-s,m) and (m,m+s) is reconstructed twice in two coordinate systems; fit a robust similarity between the two and read consistency + the residual ladder, swept over baseline stride. Mirrors empirical.py but on in-distribution data where geometry is good. Result (shared frame 120): cross-run consistency is HIGH and best at small baseline -- stride 20 (0.41deg): 65% of pixels agree within 2% of scene extent, 98% within 10%; stride 80 (6deg): 46% / 95%. Versus the OOD doll's 7% / 21%. The mismatch is a clean uniform-scale similarity everywhere (sim->affine buys ~0.1%, verdict similarity_plus_noise); per-step scale drift is small and grows with baseline (1.7% at 0.4deg, 11% at 6deg). Takeaways for the live path: (1) the small-baseline hypothesis holds -- the sliding window's small steps land in the high-consistency regime, so accumulation is mostly clean; (2) consensus fusion is a polish for the residual ~2% floaters, not a prerequisite; (3) registration is a 7-DoF similarity (no nonlinear warp), with slow scale drift a sim3 pose-graph can bound. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Policy update directed by the project owner:
- PnP pose recovery and cross-run Sim(3) alignment / accumulation are now IN scope (previously out of scope), alongside the engine.
- Everything ships in C++; Go only for the demo web server (purego -> C API -> Vulkan + WebGL). The CLI and C API must have NO Python dependency at runtime.
- Python is confined to (1) dev-time reference/conversion/validation in the CUDA docker (hf_dump/convert/compare_taps), never a runtime dep, and (2) the pose/ research prototype TEMPORARILY -- continued in Python only until the approach is proven, then rewritten in C++ and the Python deleted.
- Per-component discipline now lists the C++ `pose` component, inheriting the parity discipline the Python prototype established (bit-exact to upstream estimate_poses; validated vs independent GT poses).
pose/README.md updated to match (temporary prototype, C++ port pending).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…nd-to-end)
accumulate.py assembles the validated pieces into the live pipeline (minus realtime): slide a window over a re10k clip, recover each pair's camera (PnP), fit a Sim(3) between consecutive runs from their SHARED-frame per-pixel correspondences and compose into a per-run global transform (align.compose), drop every frame's gaussians into one world (frame f_0's), and measure the recovered camera trajectory vs ground truth. Engine dumps are cached so analysis re-runs skip inference. render_ply.py projects the colored cloud through a pinhole camera to a PNG.
Result (13 pairs, stride 20, frames 0..260):
- per-link Sim(3) registration clean: residual ~1.0-1.4% of scene extent.
- scale drift accumulates monotonically (forward pan, no revisit): 0.755 over 12 links, ~2.3%/link -- the monocular 1/d drift compounding.
- recovered camera trajectory tracks GT to ATE ~11% of extent (single global Sim(3) align); drift grows 7%->13% first->second half, worst at the endpoint -- exactly what a Sim(3) pose-graph + loop closure bounds.
- the 2.6M-point accumulated cloud renders from camera 0 as a COHERENT room that matches the input frame (wider FOV due to the model's ~274-vs-439 focal bias).
The accumulating-reconstruction idea is proven; per CLAUDE.md the next implementation step is the C++ port (CLI + C API), after which this Python goes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
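
The per-run global transform is a running composition of per-link similarities x -> s R x + t. A hedged sketch, assuming link k maps run k+1's frame into run k's so that global[k] maps run k into run 0's world (align.compose may order its arguments differently); `Sim3`, `compose` and `chain` are hypothetical names. Scales multiply along the chain, so a steady ~2.3% shrink per link gives 0.977^12, about 0.756, after 12 links: the same monotone drift reported above.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Similarity x -> s * R * x + t (defaults to the identity).
struct Sim3 {
    double s = 1.0;
    Mat3 R{{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    Vec3 t{0, 0, 0};
};

Vec3 apply(const Sim3& T, const Vec3& x) {
    Vec3 y{};
    for (int i = 0; i < 3; ++i)
        y[i] = T.s * (T.R[i][0] * x[0] + T.R[i][1] * x[1] + T.R[i][2] * x[2]) + T.t[i];
    return y;
}

// (a o b)(x) = a(b(x)) = (s_a s_b)(R_a R_b) x + (s_a R_a t_b + t_a)
Sim3 compose(const Sim3& a, const Sim3& b) {
    Sim3 c;
    c.s = a.s * b.s;
    for (int i = 0; i < 3; ++i) {
        double rt = 0.0;  // (R_a t_b)_i
        for (int j = 0; j < 3; ++j) {
            double acc = 0.0;
            for (int k = 0; k < 3; ++k) acc += a.R[i][k] * b.R[k][j];
            c.R[i][j] = acc;
            rt += a.R[i][j] * b.t[j];
        }
        c.t[i] = a.s * rt + a.t[i];
    }
    return c;
}

// links[k] maps run k+1 into run k; the result's entry k maps run k into run 0.
std::vector<Sim3> chain(const std::vector<Sim3>& links) {
    std::vector<Sim3> global{Sim3{}};
    for (const Sim3& link : links) global.push_back(compose(global.back(), link));
    return global;
}
```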

… the lever
find_loop.py searches re10k poses for a clip that revisits its start; loop_closure.py chains open-loop, measures the loop error via a closing pair (f_0,f_n), distributes it by even Sim(3) relaxation (D^(k/n)), and reports ATE before/after. Sim(3) 4x4 helpers (sim_matrix, sim_frac_power) promoted into align.py; golden test added.
Honest result on a real loop clip (camera out to 2.29 and back to 0.23):
- the open-loop chain ALREADY closes the loop (loop error 4.4deg / scale 1.12 / 8% trans) -- there is almost no accumulated drift to distribute.
- the dominant ~34% ATE is per-link ODOMETRY NOISE (Sim(3) inlier% as low as 17-24% on the fast outbound leg) plus the model's focal-bias warp: self-consistent (the loop closes) but distorted vs GT. Loop closure can't fix that; naive distribution slightly hurts.
- diagnosis confirmed not a bug: the correction recovers SYNTHETIC uniform accumulated drift to ~0 (test_pose.py::test_loop_correction, ATE 1e-15).
Lesson: loop closure pays off on LONG trajectories with consistent accumulated drift (cf. the forward clip's monotone 7%->13%); for short loops the lever is better odometry -- smaller baselines, consensus fusion, and the focal bias.
Golden suite green (test_loop_correction added).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
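
Spreading a loop error D evenly means applying D^(k/n) at link k. For the rotation and scale parts of D this is simple: keep the rotation axis and scale the angle by f, and raise the scale to the power f. The sketch below covers only those two parts; the translation part needs the full Sim(3) exponential/logarithm and is omitted. It assumes the loop's rotation error is well below 180 degrees (true for the ~4.4deg measured here), where the axis extraction is well conditioned. `rotation_frac_power` and `scale_frac_power` are hypothetical names, not sim_frac_power.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Rodrigues: rotation by `theta` radians about the unit axis k.
Mat3 rodrigues(const std::array<double, 3>& k, double theta) {
    const double c = std::cos(theta), s = std::sin(theta), v = 1.0 - c;
    Mat3 R{};
    R[0] = {c + k[0] * k[0] * v, k[0] * k[1] * v - k[2] * s, k[0] * k[2] * v + k[1] * s};
    R[1] = {k[1] * k[0] * v + k[2] * s, c + k[1] * k[1] * v, k[1] * k[2] * v - k[0] * s};
    R[2] = {k[2] * k[0] * v - k[1] * s, k[2] * k[1] * v + k[0] * s, c + k[2] * k[2] * v};
    return R;
}

// R^f via axis-angle: same axis, angle scaled by f (identity stays identity).
Mat3 rotation_frac_power(const Mat3& R, double f) {
    const double tr = R[0][0] + R[1][1] + R[2][2];
    const double theta = std::acos(std::clamp((tr - 1.0) / 2.0, -1.0, 1.0));
    if (theta < 1e-12) return rodrigues({1.0, 0.0, 0.0}, 0.0);
    const double d = 2.0 * std::sin(theta);  // skew part of R is sin(theta) * [axis]_x
    const std::array<double, 3> axis{(R[2][1] - R[1][2]) / d, (R[0][2] - R[2][0]) / d,
                                     (R[1][0] - R[0][1]) / d};
    return rodrigues(axis, f * theta);
}

// Scale part of D^f for a similarity with scale s.
double scale_frac_power(double s, double f) { return std::pow(s, f); }
```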

…ccumulated cloud fuse.py answers the question that started this thread ("around the edges there's a lot of noise -- does accumulation remove it?"). Each per-pixel gaussian is partner-view-dependent, so occlusion-edge / depth-ambiguous points are floaters; a real surface point is reconstructed by several overlapping frames and they agree in the global frame. So: voxelize the accumulated cloud at the consistency scale (~2% of extent) and keep only voxels corroborated by >= K distinct frames, averaging the agreeing predictions (which also denoises the surface). Reuses cached engine dumps (no new inference). Result (forward clip, 14 frames, K>=2): 46% of voxels are single-frame, holding 14% of points -- these render as INCOHERENT EDGE-HAZE (floaters + swept-volume periphery). The >=2-frame consensus (86% of points) renders as a CLEAN, CRISP room with the haze gone. Definitive yes: accumulation + consensus fusion removes the edge noise. Honest tradeoff: dropping single-frame points also trims the single-view periphery (coverage vs cleanliness). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
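
The voxel-consensus filter amounts to one hash-grid pass: bucket points by voxel, count the distinct frames per bucket, and keep only points whose bucket was seen by at least K frames. A minimal sketch of the "keep every corroborated point" variant; `TaggedPoint` and `consensus_keep` are hypothetical names, and the 21-bit key packing with clamped voxel indices is an assumption of this sketch. Non-finite points are skipped so the integer cast stays defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct TaggedPoint {
    float x, y, z;
    int frame;  // which input frame predicted this gaussian
};

// Voxel index clamped to 21 signed bits so three indices pack into one 64-bit key
// and a huge coordinate cannot overflow the integer cast.
static int64_t voxel_index(float v, float voxel) {
    const double lim = double((1 << 20) - 1);
    return static_cast<int64_t>(std::clamp(std::floor(double(v) / voxel), -lim, lim));
}

static uint64_t voxel_key(const TaggedPoint& p, float voxel) {
    const uint64_t mask = (uint64_t(1) << 21) - 1;
    return ((uint64_t(voxel_index(p.x, voxel)) & mask) << 42) |
           ((uint64_t(voxel_index(p.y, voxel)) & mask) << 21) |
           (uint64_t(voxel_index(p.z, voxel)) & mask);
}

// Keep every finite point whose voxel was predicted by >= min_frames distinct
// frames; voxels seen by a single frame (the edge-haze floaters) are dropped.
std::vector<TaggedPoint> consensus_keep(const std::vector<TaggedPoint>& cloud, float voxel,
                                        int min_frames) {
    assert(voxel > 0.0f && min_frames >= 1);
    auto finite = [](const TaggedPoint& p) {
        return std::isfinite(p.x) && std::isfinite(p.y) && std::isfinite(p.z);
    };
    std::unordered_map<uint64_t, std::unordered_set<int>> frames_in;
    for (const auto& p : cloud)
        if (finite(p)) frames_in[voxel_key(p, voxel)].insert(p.frame);
    std::vector<TaggedPoint> kept;
    for (const auto& p : cloud)
        if (finite(p) && int(frames_in[voxel_key(p, voxel)].size()) >= min_frames)
            kept.push_back(p);
    return kept;
}
```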

…golden-tested
Start porting the proven Python pose prototype (focal.py + align.py + pnp.py) to shipped C++, dependency-free per CLAUDE.md (only a self-contained Jacobi eigensolver -- no Eigen, no OpenCV), wired into the library and the asset-free test tier.
- src/linalg.h -- small dense linear algebra: symmetric cyclic-Jacobi eigensolver, 3x3 SVD (via MᵀM), det/inv, 4x4 rigid inverse. Everything the pose math needs reduces to the eigensolver.
- src/pose.{h,cpp} -- estimate_focal (Weiszfeld); fit_similarity (Umeyama) + RANSAC + residual-ladder/diagnose; Sim(3) compose/invert/sim_matrix/loop_closure_error; sim_frac_power (closed-form one-parameter subgroup, no complex eig); solve_pnp_numpy (DLT via the 12x12 AᵀA nullspace + cheirality decode + RANSAC); estimate_poses (scene recipe: view-0 all-pixel focal, per-view opacity-masked PnP, optional baseline rescale).
- tests/test_pose.cpp -- the asset-free mirror of pose/test_pose.py: 9 golden tests (similarity roundtrip, scale/nonlinear detection, RANSAC outliers, loop correction/closure, focal, PnP recovery/outliers).
All green under the debug (ASan/UBSan) preset; ctest -LE model.
Cross-checked against the Python reference on a real scene dump: focal is bit-exact (596.408591886). PnP is correct on clean data, but on real scenes the DLT solver inherits the textbook planar/mirror degeneracy (3/5 RANSAC seeds match cv2's ~57deg relative rotation, 2/5 flip) -- the same instability that made the prototype use cv2 (SQPNP) for all real-data results.
Next: a robust in-house PnP (EPnP/SQPNP + Gauss-Newton refine) for cv2-parity with no OpenCV dependency, then the accumulation/loop-closure/fusion chaining and the CLI / C-API surface.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
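
A cyclic Jacobi eigensolver is short enough to ship dependency-free: sweep over the off-diagonal pairs, apply a plane rotation that zeroes each one, and accumulate the rotations into the eigenvectors. This sketch is not src/linalg.h; the function name and the relative convergence threshold are assumptions. It uses the standard smaller-angle choice t = sgn(theta) / (|theta| + sqrt(theta^2 + 1)) with theta = (a_qq - a_pp) / (2 a_pq).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric n x n matrix `a` (row-major). On return diag(a) holds the eigenvalues
// and column j of `v` is the unit eigenvector for a[j][j].
void jacobi_eigen(std::vector<double>& a, std::vector<double>& v, int n, int max_sweeps = 50) {
    assert(n > 0 && a.size() == std::size_t(n) * n);
    v.assign(std::size_t(n) * n, 0.0);
    for (int i = 0; i < n; ++i) v[i * n + i] = 1.0;
    double total = 0.0;  // squared Frobenius norm, preserved by the rotations
    for (double x : a) total += x * x;
    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        double off = 0.0;
        for (int p = 0; p < n; ++p)
            for (int q = p + 1; q < n; ++q) off += a[p * n + q] * a[p * n + q];
        if (off <= 1e-30 * total) break;  // off-diagonal mass negligible
        for (int p = 0; p < n - 1; ++p) {
            for (int q = p + 1; q < n; ++q) {
                const double apq = a[p * n + q];
                if (apq == 0.0) continue;
                const double theta = (a[q * n + q] - a[p * n + p]) / (2.0 * apq);
                const double t = (theta >= 0.0 ? 1.0 : -1.0) /
                                 (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
                const double c = 1.0 / std::sqrt(t * t + 1.0), s = t * c;
                for (int k = 0; k < n; ++k) {  // A <- A J (columns p, q)
                    const double akp = a[k * n + p], akq = a[k * n + q];
                    a[k * n + p] = c * akp - s * akq;
                    a[k * n + q] = s * akp + c * akq;
                }
                for (int k = 0; k < n; ++k) {  // A <- J^T A (rows p, q)
                    const double apk = a[p * n + k], aqk = a[q * n + k];
                    a[p * n + k] = c * apk - s * aqk;
                    a[q * n + k] = s * apk + c * aqk;
                }
                for (int k = 0; k < n; ++k) {  // V <- V J
                    const double vkp = v[k * n + p], vkq = v[k * n + q];
                    v[k * n + p] = c * vkp - s * vkq;
                    v[k * n + q] = s * vkp + c * vkq;
                }
            }
        }
    }
}
```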

The DLT/RANSAC solver inherits the planar/mirror degeneracy on real dumps (seed-dependent: ~3/5 seeds near cv2, ~2/5 a ~135-152deg flip). Add the shipped real-data solver, solve_pnp:
* EPnP (Lepetit/Moreno-Noguer/Fua) for the init -- barycentric control points, the camera-frame control points from the 12x12 MᵀM null space (the same Jacobi eigensolver), beta solve for N=1,2,3 with cheirality sign-fix, R,t via the rigid Umeyama on the 4 control points, best-N by reprojection. Non-iterative, uses ALL points (no random minimal samples -> no seed-dependent flips), planar-robust by construction.
* Huber-robust Gauss-Newton reprojection refine (6-DoF left perturbation, J = [-[Xc]_x | I]) to polish to the reprojection minimum and downweight outliers -- the deterministic analogue of cv2's RANSAC+SQPNP+refine.
On the real scene dump (A_scn): deterministic across all RANSAC seeds, and within 0.73deg rotation / 0.74deg translation-direction of the upstream cv2/SQPNP -- vs the numpy DLT's 175deg miss there. estimate_poses now uses solve_pnp; the DLT solve_pnp_numpy stays as the asset-free golden reference.
Golden tests (all green under ASan/UBSan, ctest -LE model): exact recovery on clean data, a near-planar slab (where the DLT's coplanar minimal samples flip), and 15% gross-outlier rejection.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…oto stream
Port accumulate.py to src/pose.{h,cpp}: the sliding-window accumulating-reconstruction loop. Accumulator::add_pair takes each consecutive pair's [2,H,W,gc] engine output, recovers the pair's cameras (estimate_poses), fits the cross-run Sim(3) on the shared frame (fit_similarity_ransac), composes a global chain, and drops every new frame's gaussians into one world. Exposes cloud(), camera_path(), and per-link ChainLink diagnostics (scale/inlier/valid/resid).
Validation:
- Asset-free golden (test_accumulate_chain): synthetic pinhole clip with distinct per-run scales -> trajectory ATE 7.6e-8 of extent, per-link scale to 3e-9, fit residual ~1e-7. Green under ASan/UBSan.
- Real-data parity on the 13 cached pair_*.f32 dumps vs the numpy/cv2 prototype: cloud size bit-exact (2,633,725), per-link valid% identical (deterministic mask), per-link Sim(3) scale to mean 0.5% (11/12 links <1%), trajectory within 6.6% of the cv2 chain. Residual = known RANSAC-RNG + EPnP-vs-SQPNP delta.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…Python
The loop-closure machinery (sim4_invert, distribute_drift, sim_frac_power) shipped with the Accumulator commit; this records its parity validation.
- sim_frac_power C++ closed-form == numpy eig-based to 5e-10 across f in [-0.5,1.3] (so distribute_drift is bit-identical to the prototype's distribution).
- sim4_invert is an exact similarity inverse (1e-9); golden recovers a known drifted loop to 4e-16.
- Real-data parity on the loopcache (13 chain pairs + close_0_260): recovered drift matches the prototype's loop error (scale 1.09 vs 1.12, 4.6 vs 4.4 deg), deterministic valid% identical. The corrected-trajectory delta is the known EPnP-vs-cv2 PnP backend feeding D, not the distribution math.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
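
For reference, a similarity has a closed-form inverse: y = s R x + t gives x = (1/s) R^T y - (1/s) R^T t. A small sketch with hypothetical `Sim3`/`invert` names (this is not sim4_invert's signature, which works on 4x4 matrices).

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct Sim3 {
    double s;  // uniform scale
    Mat3 R;    // rotation
    Vec3 t;    // translation; the map is x -> s * R * x + t
};

Vec3 apply(const Sim3& T, const Vec3& x) {
    Vec3 y{};
    for (int i = 0; i < 3; ++i)
        y[i] = T.s * (T.R[i][0] * x[0] + T.R[i][1] * x[1] + T.R[i][2] * x[2]) + T.t[i];
    return y;
}

// Exact inverse: scale 1/s, rotation R^T, translation -(1/s) R^T t.
Sim3 invert(const Sim3& T) {
    assert(T.s != 0.0);
    Sim3 inv{};
    inv.s = 1.0 / T.s;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) inv.R[i][j] = T.R[j][i];
    for (int i = 0; i < 3; ++i)
        inv.t[i] = -inv.s * (inv.R[i][0] * T.t[0] + inv.R[i][1] * T.t[1] + inv.R[i][2] * T.t[2]);
    return inv;
}
```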

consensus_fuse shipped with the Accumulator commit (one hash-grid pass over the frame-tagged cloud: >=K distinct-frame voxels kept, agreeing predictions averaged). This records its parity validation.
- Golden (test_consensus_fuse): exact counts on controlled synthetic support.
- Real-data parity vs fuse.py on the 13 acc dumps (voxel 0.02, K>=2): raw points bit-exact (2,633,725), per-point floater drop 14.0% vs 14%, raw->fused reduction 93.9% vs 94%, kept-voxel fraction 53.8% vs 54%. Sub-1% voxel-count delta is the chaining RANSAC-RNG, not the fusion math.
Reproduces the prototype's "remove the 14% single-frame edge-haze floaters" result.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Expose the now-ported pose pipeline from the C API and the CLI, with no Python runtime dependency (per CLAUDE.md):
C API (include/free_splatter.h):
- free_splatter_estimate_poses: recover per-view cam2world from an engine buffer.
- free_splatter_accumulator_{new,free,add_pair,frame_count,cloud,fuse,camera_path}: opaque sliding-window accumulator wrapping pose::Accumulator; add_pair takes each consecutive pair's [2,H,W,gc] engine output, returns the growing global cloud (free_splatter_point: xyz+rgb+frame), the consensus-fused cloud, and the global camera trajectory. FFI-friendly, malloc'd buffers freed with free_splatter_buf_free.
CLI (free_splatter-cli --accumulate):
- runs the engine over each consecutive image pair, chains the runs into one world, and writes PREFIX_<nframes>.splat after each pair (the evolving reconstruction) plus, with --fuse, a consensus-fused PREFIX_fused.splat. New write_cloud_splat emits the xyz+rgb cloud as small isotropic .splat gaussians.
Verified end-to-end on real frames (5 frames -> evolving 519K/779K/.. splats + 103K fused), sanitizer-clean under the debug preset; asset-free ctest tier green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per CLAUDE.md: the pose/ research prototype (focal/align/pnp + accumulate/loop/fuse + the re10k/T&T validation harnesses) proved the accumulating-reconstruction approach. That whole pipeline is now rewritten in C++ (src/pose.{h,cpp}), exposed via free_splatter-cli + include/free_splatter.h with no Python, and validated (asset-free golden tests + real-data parity recorded in the prior commits). The prototype was a throwaway, not a parallel implementation to maintain -- so it is removed. Git history preserves it and its layer-by-layer parity harnesses. Updates the dangling references (CLAUDE.md, src/linalg.h, src/pose.{h,cpp}) to point at git history instead of the deleted files.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fuzz the untrusted user-facing surfaces opened by the pose work (GGUF stays trusted, not fuzzed):
- fuzz_pose: the public pose C-API (free_splatter_estimate_poses + the accumulator add_pair/cloud/fuse/camera_path) on arbitrary float gaussian buffers (NaN/Inf/denormals) and fuzz-chosen geometry.
- fuzz_decode: the image-FILE path (arbitrary bytes -> stb_image -> crop/resize -> CHW), the surface a user photo crosses in the CLI/demo. stb is third-party; per CLAUDE.md we fuzz the boundary and would guard rather than patch a stb-internal trip -- none seen (31k+ runs clean).
Fixes found by fuzz_pose (our code, so fixed not guarded):
- SIGFPE: fit_similarity_ransac sampled `% N` with N=0 (an image pair with no overlapping valid pixels). Guard N<3 (RANSAC's minimal sample): all-inlier, plain fit when N>=1, else identity.
- Latent float-cast UB: consensus_fuse now skips non-finite cloud points and clamps the voxel-coordinate cast, so an Inf point can't make (int32)floor(NaN).
All four fuzzers clean; asset-free ctest tier (incl. test_pose fusion goldens) still green under ASan/UBSan.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
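
The N<3 guard can be sketched as a planning step that runs before any `% N` is evaluated. The three-way split (identity / plain fit / RANSAC) follows the commit text; the names and the distinct-index sampler are hypothetical, not fit_similarity_ransac's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

enum class FitPlan { Identity, PlainFit, Ransac };

// Decide how to fit a similarity from n shared-frame correspondences before any
// sampling happens (RANSAC's minimal sample for a similarity is 3 points).
FitPlan plan_similarity_fit(std::size_t n) {
    if (n == 0) return FitPlan::Identity;  // no overlap at all: fall back to identity
    if (n < 3) return FitPlan::PlainFit;   // too few to sample: treat all as inliers
    return FitPlan::Ransac;
}

// Draw k distinct indices in [0, n). Returns false instead of ever evaluating
// `rng() % n` with n == 0 (integer division by zero, i.e. the SIGFPE).
bool draw_minimal_sample(std::size_t n, std::size_t k, std::mt19937& rng,
                         std::vector<std::size_t>& out) {
    assert(k > 0);
    out.clear();
    if (n < k) return false;  // also covers n == 0
    while (out.size() < k) {
        const std::size_t idx = static_cast<std::size_t>(rng()) % n;
        if (std::find(out.begin(), out.end(), idx) == out.end()) out.push_back(idx);
    }
    return true;
}
```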

…dded)
A web demo of the live idea: feed a photo stream, watch one world assemble.
- web/accumulate.html: forks the index.html EWA splat renderer to play a SEQUENCE of clouds (the reconstruction from 2 photos, then 3, then 4, ...), with the input photos accumulating in the TOP-RIGHT filmstrip (newest highlighted) as each is folded in. Auto-advances + gentle auto-orbit; ?start=/auto=/ms=/spin= deep-link params. Seeds an unsorted draw order on each step so the cloud paints immediately (the depth-sort worker then refines it).
- scripts/make_accumulate_demo.sh: one engine pass over the frames via `free_splatter-cli --accumulate` -> acc_2.splat..acc_N.splat + input thumbnails + manifest.json + the viewer, a self-contained servable dir.
- CLI --splat-scale default 0.0015 -> 0.006 of extent (point clouds read as surfaces, not grains); documented in web/README.md.
Verified end-to-end on a RealEstate10K clip (8 frames -> 7 steps): headless chromium/SwiftShader renders a coherent dining-room reconstruction that grows 259,702 -> 413,517 splats across the steps, filmstrip populating as designed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… splats)
The accumulated cloud kept only position+color, so the demo rendered as an isotropic point cloud. A similarity x->s*(R@x)+t scales a gaussian's covariance by s^2 and rotates it by R, so the shape transforms cleanly: carry it.
- AccumPoint / free_splatter_point gain scale[3] + rotation quaternion (w,x,y,z).
- Accumulator::add_pair de-interleaves the engine's scale (ch16:19) and rotation (ch19:23); add_view sets scale_world = T.s * scale_local and q_world = quat(T.R) * q_local (new mat3_to_quat / quat_mul / quat_normalize helpers, with a zero/NaN-quaternion -> identity guard).
- consensus_fuse averages the scale and keeps a representative orientation.
- write_cloud_splat emits the real anisotropic scale + rotation (OpenCV->OpenGL remap, same as the single-run write_splat); --splat-scale is now a radius multiplier (default 1.0), not an isotropic fraction.
Verified: anisotropic scale spans the engine's [1e-4,0.02] (mean axis-ratio ~29) with per-gaussian orientations; golden test_pose green; fuzz_pose clean (the identity-quaternion guard keeps the new quat math UB-free on garbage input).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
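
Carrying one gaussian through x -> s R x + t needs a rotation-to-quaternion conversion, a Hamilton product and a normalize with an identity fallback. A sketch with hypothetical names that echo the helpers listed above (it is not their actual code); `mat3_to_quat` uses the common branch-on-largest-diagonal formulation to avoid dividing by a tiny number.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Quat = std::array<double, 4>;  // (w, x, y, z)

// Normalise, falling back to the identity for zero / NaN / Inf input.
Quat quat_normalize(const Quat& q) {
    const double n = std::sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
    if (!std::isfinite(n) || n < 1e-12) return {1.0, 0.0, 0.0, 0.0};
    return {q[0] / n, q[1] / n, q[2] / n, q[3] / n};
}

// Hamilton product a * b (rotation b applied first, then a).
Quat quat_mul(const Quat& a, const Quat& b) {
    return {a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3],
            a[0] * b[1] + a[1] * b[0] + a[2] * b[3] - a[3] * b[2],
            a[0] * b[2] - a[1] * b[3] + a[2] * b[0] + a[3] * b[1],
            a[0] * b[3] + a[1] * b[2] - a[2] * b[1] + a[3] * b[0]};
}

// Rotation matrix -> unit quaternion.
Quat mat3_to_quat(const Mat3& R) {
    const double tr = R[0][0] + R[1][1] + R[2][2];
    Quat q;
    if (tr > 0.0) {
        const double S = std::sqrt(tr + 1.0) * 2.0;  // S = 4w
        q = {0.25 * S, (R[2][1] - R[1][2]) / S, (R[0][2] - R[2][0]) / S, (R[1][0] - R[0][1]) / S};
    } else if (R[0][0] > R[1][1] && R[0][0] > R[2][2]) {
        const double S = std::sqrt(1.0 + R[0][0] - R[1][1] - R[2][2]) * 2.0;  // S = 4x
        q = {(R[2][1] - R[1][2]) / S, 0.25 * S, (R[0][1] + R[1][0]) / S, (R[0][2] + R[2][0]) / S};
    } else if (R[1][1] > R[2][2]) {
        const double S = std::sqrt(1.0 + R[1][1] - R[0][0] - R[2][2]) * 2.0;  // S = 4y
        q = {(R[0][2] - R[2][0]) / S, (R[0][1] + R[1][0]) / S, 0.25 * S, (R[1][2] + R[2][1]) / S};
    } else {
        const double S = std::sqrt(1.0 + R[2][2] - R[0][0] - R[1][1]) * 2.0;  // S = 4z
        q = {(R[1][0] - R[0][1]) / S, (R[0][2] + R[2][0]) / S, (R[1][2] + R[2][1]) / S, 0.25 * S};
    }
    return quat_normalize(q);
}

// Carry one gaussian through x -> s R x + t: axes scale by s, orientation is pre-multiplied by R.
void transform_gaussian(double s, const Mat3& R, std::array<double, 3>& scale, Quat& q) {
    for (double& a : scale) a *= s;
    q = quat_normalize(quat_mul(mat3_to_quat(R), q));
}
```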

- free_splatter-cli --accumulate now also accepts pre-computed [2,H,W,gc] .f32 pair dumps (each one pair) instead of images, skipping the engine -- for fast re-bakes and fusion sweeps off cached runs. Verified: on the same frames acc_8 is byte-identical to the image path's, and it runs in ~4s vs ~2min.
- make_accumulate_demo.sh passes --fuse and appends a final consensus-fused step to the manifest; the viewer shows a step's optional "label" (so the demo ends on "consensus-fused -- single-view floaters removed", 413k -> 149k splats, the edge-haze floaters gone).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The fused step (8th) showed the same "8 photos" panel as acc_8, so it read as a duplicate. Now a labelled step shows a prominent top-center banner ("consensus-fused -- single-view floaters removed -- N splats"), the stat reads "fused (8)", and manifest/splat fetches use cache:"no-store" so an updated demo dir is never served stale.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…truct blurry) The default demo clip was a near-pure forward dolly (~0.15% lateral baseline of scene depth), so two-view depth was unconstrained and the gaussians came out blurry. Re-baking from an orbiting clip with real sideways motion (stride chosen for ~9% baseline) reconstructs legibly; tighten the viewer framing (1.25x->0.9x of the cloud diagonal) so it fills the view, and document the lateral-baseline requirement in web/README. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…lurry cloud)
The accumulator dropped per-gaussian opacity, so write_cloud_splat emitted every splat fully opaque (alpha 255). With "over" alpha compositing that means the front-most splat at each pixel fully occludes the rest, so as the camera orbits the depth order flips and the hard anisotropic ellipsoids visibly swirl/rotate -- and the over-contribution reads as blur. The single-run write_splat instead uses the gaussian's activated opacity as the alpha (mean ~41/255), blending many low-alpha splats into a stable smooth surface.
Fix: AccumPoint / free_splatter_point carry `opacity`; add_view stores it, consensus_fuse averages it, and write_cloud_splat emits it as the splat alpha and caps by importance (opacity*volume) like write_splat.
On the same pair the cloud splat is now byte-identical in alpha (1/254/mean 41) to write_splat and renders as a clean alpha-blended surface instead of an opaque swirling soup. test_pose green; the C-API struct gains one float (emit_points copies it).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Prevent the class of regression that dropped first rotation/scale, then opacity, from the accumulated-cloud splat writer (a second copy of the encoder that drifted from the proven single-run one).
#1 Unify: new src/splat.h::encode_splat_record is the ONE definition of the OpenCV->OpenGL convention, quaternion remap, opacity->alpha and byte packing. Both write_splat (single-run) and write_cloud_splat (cloud) now build a (pos,scale,quat,rgb,opacity) tuple and call it, so they cannot diverge again. Verified byte-identical to the previous write_splat output (pure refactor).
#2 Pin: two asset-free tests in test_pose.cpp --
- test_splat_record: pins the encoder bytes, incl. the exact regressed field (opacity 0.5 -> alpha 127, NOT a forced 255) and the rotation remap.
- test_accumulate_channels: a one-pair (T=identity) accumulation must preserve every gaussian channel (xyz, SH->rgb, opacity, scale, rotation, frame); a dropped channel fails immediately with zero fixtures.
Both guards fail on either historical bug. ctest -LE model green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
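
A sketch of a 32-byte record encoder, assuming the widely used antimatter15 .splat layout (3 x f32 position, 3 x f32 linear scale, 4 x u8 RGBA, 4 x u8 quaternion (w,x,y,z) stored as q*128+128). The project's OpenCV->OpenGL convention and its quaternion remap are deliberately left out, since their exact form isn't given in this PR; `SplatGaussian` and `encode_splat_record_sketch` are hypothetical. Truncating the alpha byte reproduces the "opacity 0.5 -> alpha 127" behaviour that test_splat_record pins.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// One gaussian ready for export: linear (already exp'd) scale, activated opacity.
struct SplatGaussian {
    float pos[3];
    float scale[3];
    float quat_wxyz[4];
    float rgb[3];   // [0, 1]
    float opacity;  // [0, 1]
};

// Floats are copied in host byte order (little-endian assumed).
std::array<uint8_t, 32> encode_splat_record_sketch(const SplatGaussian& g) {
    std::array<uint8_t, 32> out{};
    std::memcpy(out.data(), g.pos, sizeof g.pos);          // bytes 0..11
    std::memcpy(out.data() + 12, g.scale, sizeof g.scale);  // bytes 12..23
    auto unit_to_byte = [](float v) -> uint8_t {
        if (!std::isfinite(v)) return 0;  // NaN/Inf guard: never cast a NaN
        // Truncating: opacity 0.5 -> 127, not 128 and never a forced 255.
        return static_cast<uint8_t>(std::clamp(v, 0.0f, 1.0f) * 255.0f);
    };
    for (int i = 0; i < 3; ++i) out[24 + i] = unit_to_byte(g.rgb[i]);
    out[27] = unit_to_byte(g.opacity);  // alpha carries the opacity
    float n = 0.0f;
    for (float c : g.quat_wxyz) n += c * c;
    n = std::sqrt(n);
    const bool ok = std::isfinite(n) && n > 1e-12f;
    for (int i = 0; i < 4; ++i) {
        const float c = ok ? g.quat_wxyz[i] / n : (i == 0 ? 1.0f : 0.0f);  // identity fallback
        out[28 + i] = static_cast<uint8_t>(std::clamp(c * 128.0f + 128.0f, 0.0f, 255.0f));
    }
    return out;
}
```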

Consensus fusion only keeps voxels seen by >= K frames; with few frames (a short arc) only a small fraction of the scene is multiply-observed, and the existing "averaged" output (one point per consensus voxel) then decimated that further -- so the final fused scene was very sparse (~38k of 1M points on the 4-frame demo). Add a "kept" mode (the prototype's fuse.py --ply-kept): keep every raw gaussian whose voxel is corroborated by >= K frames -- floaters still removed, but nothing averaged away. On the demo that's 446k vs 38k points (12x denser), a solid surface instead of a sparse scatter.
- consensus_fuse gains `keep_raw` (default false = averaged, unchanged); free_splatter_accumulator_fuse gains a `keep_raw` arg; CLI `--fuse-mode kept|averaged` (kept is the default for the cloud demo).
- Golden test pins both modes (averaged -> 5 voxel points, kept -> 15 raw points); fuzz_pose drives both modes; ctest -LE model green, fuzz clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ne + best-frame fusion
Pairwise Sim(3) chaining leaves per-object (non-rigid) misregistration: overlapping frames show doubled objects (two lamps, a bed end offset from its top). A rigid per-frame pose BA can't fix this (tried, measured: 0.6%->0.6%, no-op -- reverted) -- the residual isn't a camera-pose error, it's that each frame predicts an object's position slightly differently. Two fixes that DO work:
- consensus_refine (gaussian-level, non-rigid): each iteration moves every point a fraction toward the opacity-weighted consensus of the OTHER frames' points in its coarse-to-fine voxel neighbourhood. Local + non-rigid, so spatially-varying ghosting collapses. On the real strong-arc cloud: ghosting 1.2% -> 0.26% (~5x); golden 3.7% -> 0.0%. Exposed as Accumulator::refine, free_splatter_refine_cloud / _accumulator_refine, CLI --refine.
- consensus_fuse gains FUSE_BEST: per consensus voxel keep only the single most-confident frame's gaussians (dense AND de-ghosted, no stacked copies). keep_raw bool -> mode int {averaged,kept,best}; CLI --fuse-mode best.
Golden tests pin both (consensus_refine de-ghost, best=one-frame-per-voxel); fuzz_pose drives the refine functions + all three fuse modes on garbage. ctest -LE model green, fuzz clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
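
A single-scale sketch of the consensus_refine idea: on each iteration, every point moves a fraction `step` toward the opacity-weighted mean of the OTHER frames' points in its voxel. The simplifications are assumptions of this sketch: one voxel size and no neighbouring voxels (the commit describes a coarse-to-fine neighbourhood), and finite coordinates only. `RefinePoint` and `consensus_refine_sketch` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

struct RefinePoint {
    double x, y, z;
    double opacity;  // weight of this gaussian in the consensus
    int frame;       // which input frame predicted it
};

void consensus_refine_sketch(std::vector<RefinePoint>& pts, double voxel, double step, int iters) {
    assert(voxel > 0.0 && step >= 0.0 && step <= 1.0);
    using Key = std::tuple<long long, long long, long long>;
    struct Acc { double sx = 0.0, sy = 0.0, sz = 0.0, w = 0.0; };
    auto add = [](Acc& a, const RefinePoint& p) {
        a.sx += p.opacity * p.x;
        a.sy += p.opacity * p.y;
        a.sz += p.opacity * p.z;
        a.w += p.opacity;
    };
    for (int it = 0; it < iters; ++it) {
        std::map<Key, Acc> all;                  // every frame, per voxel
        std::map<std::pair<Key, int>, Acc> own;  // one frame, per voxel
        std::vector<Key> keys(pts.size());
        for (std::size_t i = 0; i < pts.size(); ++i) {
            const RefinePoint& p = pts[i];
            keys[i] = Key{static_cast<long long>(std::floor(p.x / voxel)),
                          static_cast<long long>(std::floor(p.y / voxel)),
                          static_cast<long long>(std::floor(p.z / voxel))};
            add(all[keys[i]], p);
            add(own[{keys[i], p.frame}], p);
        }
        for (std::size_t i = 0; i < pts.size(); ++i) {
            RefinePoint& p = pts[i];
            const Acc& a = all.at(keys[i]);
            const Acc& o = own.at({keys[i], p.frame});
            const double w = a.w - o.w;  // weight contributed by OTHER frames only
            if (w <= 1e-12) continue;    // no other frame saw this voxel
            p.x += step * ((a.sx - o.sx) / w - p.x);
            p.y += step * ((a.sy - o.sy) / w - p.y);
            p.z += step * ((a.sz - o.sz) / w - p.z);
        }
    }
}
```

Because each point only looks at other frames inside its own voxel, two slightly offset copies of the same object are pulled together locally, while points seen by a single frame stay put.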

…ine knobs make_accumulate_demo.sh now passes --refine (gaussian-level consensus de-ghost) by default (REFINE=0 to disable) and the live demo ends on a best-frame fused surface. web/README documents the three fuse modes (kept/best/averaged) and the de-ghosting (why gaussian-level consensus works where rigid pose-BA doesn't). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…v2 reference)
Quantifies how well a 2-view pair constrains depth, two independent ways, to flag pairs whose reconstruction can't be trusted (and to detect the model inventing depth the images don't support).
- pose::parallax_stats / pair_parallax (src/pose.{h,cpp}): from the model's OWN recovered geometry (estimate_poses, normalize=false so cameras+points share the view-0 frame) compute the median triangulation angle, the baseline angle off the optical axis (0=dolly/no-parallax, 90=strafe/ideal), and baseline/median-depth. All angles scale-invariant.
- C-API free_splatter_pair_parallax + free_splatter_parallax (include/free_splatter.h, src/free_splatter.cpp); CLI `--parallax MODEL {img0 img1 | pair.f32}`.
- tests/test_pose.cpp test_parallax_geometry: golden on synthetic cameras -- a strafe baseline gives lateral=90 / tri=atan(B/Z); a dolly of the SAME length gives ~0. Pins that the metric measures depth-resolving motion, not raw displacement.
- scripts/parallax_ref.py: dev-time INDEPENDENT reference (cv2, nix devShell only, never shipped). Feature matches -> ORB-SLAM homography-vs-fundamental R_H (calibration-free degeneracy) + essential-matrix median triangulation angle.
Validation (re10k ladder, focal matched to the model): on the well-conditioned pair the two agree to 0.3 deg (after 15.6 vs before 15.9); on near-degenerate pairs the model over-reports parallax 2-4x (after 7.0/4.5 vs independent 1.7/1.7), i.e. it hallucinates depth the images can't support -- exactly what the cross-check is for. Wide-baseline orbits (Truck) starve sparse matching, so the geometric reference is only reliable on well-textured moderate-baseline pairs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
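
The two geometric quantities can be sketched from camera centres and points expressed in one frame (as with normalize=false). The names are hypothetical, not pose::parallax_stats, and distinct points and centres are assumed. For a strafe of length B with a point straight ahead at depth Z, the rays meet at atan(B/Z) and the baseline sits at 90 degrees to the axis; a dolly of the same length gives roughly 0 for both, which mirrors the golden described above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
static double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

// Angle between two vectors in degrees (cosine clamped against rounding).
static double angle_deg(const Vec3& a, const Vec3& b) {
    const double c = dot(a, b) / std::sqrt(dot(a, a) * dot(b, b));
    return std::acos(std::clamp(c, -1.0, 1.0)) * 180.0 / std::acos(-1.0);
}

// Median over points X of the angle between the rays from camera centres c0 and c1.
// Small angles mean depth along the ray is poorly constrained.
double median_triangulation_deg(const std::vector<Vec3>& X, const Vec3& c0, const Vec3& c1) {
    assert(!X.empty());
    std::vector<double> ang;
    ang.reserve(X.size());
    for (const Vec3& x : X) ang.push_back(angle_deg(sub(x, c0), sub(x, c1)));
    auto mid = ang.begin() + ang.size() / 2;  // upper median for even counts
    std::nth_element(ang.begin(), mid, ang.end());
    return *mid;
}

// Baseline direction vs camera 0's optical axis, folded to [0, 90]:
// 0 = dolly along the axis, 90 = sideways strafe.
double lateral_angle_deg(const Vec3& c0, const Vec3& c1, const Vec3& axis0) {
    const Vec3 b = sub(c1, c0);
    const double c = std::fabs(dot(b, axis0)) / std::sqrt(dot(b, b) * dot(axis0, axis0));
    return std::acos(std::clamp(c, 0.0, 1.0)) * 180.0 / std::acos(-1.0);
}
```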

Fold a candidate frame into the world only if its triangulation angle against the last KEPT frame clears DEG degrees (free_splatter_pair_parallax); otherwise its depth is ill-conditioned and the model would invent it. This is implemented as keyframe selection: the next candidate is re-paired against the last kept frame (not its immediate predecessor), so skipping a frame re-anchors instead of breaking the Sim(3) chain. A threshold of 0 degenerates to consecutive pairs (no behavior change). Image mode only, because re-pairing needs the engine; .f32 dump mode warns and accumulates all frames, since fixed pairs can't be re-anchored. Verified on the re10k demo frames: (f0000,f0020) = 15.6 deg, kept; (f0020,f0040) = 7.0 deg, skipped; then f0060 re-paired against f0020 = 15.7 deg, kept. The degenerate middle frame was dropped, and the world was built from well-conditioned views only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

make_accumulate_demo.sh now passes --min-parallax (MIN_PARALLAX, default 8 deg; 0 disables) and --fuse-mode best. It builds the manifest from the frames the gate actually KEPT by parsing the keep/skip log: frame 0 is the anchor, each "keep frame J" adds input frame J, and step acc_n shows that step's n kept thumbnails. So you can feed in a long, dense frame stream and let the gate curate the well-conditioned subset, instead of folding in frames whose depth the model would invent. Verified end-to-end on the 7-frame re10k loop: the gate kept f0000/f0020/f0060 and dropped f0040 (7 deg), f0080 (2.4), f0100 (4.8) and f0120 (6.6). The walkthrough stops translating laterally after f0060, so only three views carry depth. web/README documents MIN_PARALLAX: it is the after-inference angle, which over-reports, so keep it well above COLMAP's 1-2 deg; parallax_ref.py is the independent cross-check. The README also folds in the earlier honest de-ghost rewrite (best-frame selection; --refine off, and why).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop a clip under demo-vids/<name>/ (mirroring the demo-photos/ convention), and this samples it to frames, lets the --min-parallax gate curate the well-conditioned keyframes, and bakes a growing-reconstruction demo (make_accumulate_demo.sh) to .cache/demo/<name>/. Time-lapse and slow-pan clips are the sweet spot: the gate drops the near-duplicate, tightly spaced frames and keeps steps of ~10-14 deg parallax. Verified on two clips: flower-bed (81 frames -> 21 sampled -> 11 kept, a clean 10-step growing flower bed) and office-corner (9 -> 5 kept, 4 steps). The demo-vids/ drop folder is gitignored like demo-photos/ (user media, not committed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A user could silently land on the ~50x slower CPU path. Now the library default device is "vulkan" (src/options.h), and an empty/unset device resolves to vulkan (free_splatter.cpp), so a caller who never sets one fails closed if no GPU is present rather than running on CPU. backend.cpp already errors for an explicit vulkan request with no device; that path is now the default.

- CLI forwards the library default (vulkan); on a load failure with no explicit --device it prints guidance ("pass --device cpu to run on CPU"). Usage updated.
- bench: the device label no longer assumes cpu when unset.
- make_accumulate_demo.sh: DEVICE defaults to vulkan and resolves/builds the vulkan CLI (build/vulkan/bin/free_splatter-cli), the same way serve.sh builds the .so; DEVICE=cpu with the release CLI stays as the explicit CPU escape hatch (headless/CI bakes).

Tests are unaffected (every test passes an explicit device; the default FREE_SPLATTER_DEVICE is cpu). Verified: a CPU build with no --device fails closed with guidance; --device cpu runs; ctest -LE model is green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…PU bake)

The accumulating-reconstruction demos were separate static dirs on separate ports; now the web app switches between them and can generate a new one from a video in the UI.

- /api/scenes lists the growing-world scenes (server/scenes.go scanScenes: any scenesDir subdir whose manifest.json has a "steps" array; the storyboard's "stations" manifest is excluded), served at /scenes-assets/. New `-scenes-dir` flag (defaults to -demo-dir). There is no in-memory registry: scenes are re-scanned per request, so externally baked and uploaded scenes appear without a restart.
- POST /api/scene/from-video (+ GET /api/scene/status/{job}): samples frames (ffmpeg), runs the keyframe parallax gate + accumulate + best-frame fuse IN-PROCESS on the loaded GPU engine, and writes the scene dir incrementally. Async job + polling; one bake at a time (429 if busy). engine.go binds the accumulator + pair_parallax C API (with point/parallax ABI-mirror structs and copy helpers); convert.go adds encodeCloudSplat (mirrors write_cloud_splat + splat.h; note that the cloud colour is already linear, NOT SH-transformed).
- server/web/accumulate.html: the viewer gains a scene dropdown and a "make scene from video" upload widget with live progress. It is now the SINGLE source (moved from web/; standalone baked dirs fall back to ./manifest.json when /api/scenes is absent). make_accumulate_demo.sh copies it from server/web. index.html links to it.

Verified end-to-end on Vulkan (AMD RADV): /api/scenes lists the 6 baked scenes; uploading office-corner1.mp4 baked a 5-step scene (gate kept 5/9, same as the offline bake-vids.sh) that renders identically to the CLI result and appears in the dropdown live. Go build + vet clean; ctest -LE model green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The pages were already served by one server, but / was the reconstruct page and the baked accumulating demos were being served as separate static instances on other ports. Now / is a landing menu and everything is reachable from this one server:

- server/web/index.html: new menu; cards link to /reconstruct.html (photos -> splat), /accumulate.html (accumulating scenes + video upload) and /demo.html (storyboard).
- the reconstruct page moves to /reconstruct.html (was index.html); each page gets a "‹ all demos" link back to the menu.
- the standalone per-scene static servers are obsolete: those scenes are listed by /api/scenes and switched in /accumulate.html, so there is nothing left to run on a separate port.

Verified on the single Vulkan server: /, /reconstruct.html, /accumulate.html, /demo.html and /api/scenes all return 200; the menu renders three cards linking the pages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a batch hierarchical (balanced binary tree) alternative to the linear Sim(3) accumulator, to cut the registration drift that smears the far end of a long photo stream. Adjacent submaps of `block` frames overlap by `overlap` frames, and each merge fits one Sim(3) over all shared frames at once, so the fit is over-determined and per-frame depth noise averages out; drift compounds over ~log2(N) hops instead of N. block=2, overlap=1 is the plain overlap-by-one tree.

- Engine (src/pose.{h,cpp}): tree_accumulate_overlap + consensus_fuse of an arbitrary frame-tagged cloud (best-frame mode de-ghosts the overlapping per-frame copies); shared fit_band_sim RANSAC registration primitive.
- C API: free_splatter_tree_overlap (full merge, or staged side-by-side via block/overlap/max_levels/layout_spacing/per_node_cap) + free_splatter_fuse_cloud.
- CLI: --accumulate-tree (--tree-block/--tree-overlap, --fuse) and --tree-stages, which bakes one laid-out .splat per merge level plus a consensus-fused final stage, along with a manifest the viewer steps through.
- Viewer (accumulate.html): vibrance/saturation slider (?sat= override), ported from reconstruct.html; reframes per stage for tree-stages manifests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

richiejp and others added 30 commits June 26, 2026 15:28

richiejp and others added 3 commits June 28, 2026 11:48

richiejp merged commit be8cb94 into master Jul 1, 2026
2 checks passed

feat: Fuse multiple splats from video #3

richiejp merged 33 commits into master from pose/pnp-cross-run-alignment

richiejp commented Jun 29, 2026


1 participant
