feat: Fuse multiple splats from video #3
Merged
…stream
A self-contained, pure-Python prototype DOWNSTREAM of the engine seam ([N,H,W,23]);
deliberately OUTSIDE the validated src/ and NOT wired into CMake/ctest. It prototypes
live, accumulating reconstruction from a moving camera: recover each view's camera
(PnP) and align successive runs.
- focal.py / pnp.py: faithful port of FreeSplatter's scene estimate_poses -- Weiszfeld
  shared focal (view 0 only, all pixels; use_first_focal), integer pixels,
  cv2.solvePnPRansac(SQPNP, reprojErr=5, iters=10), cam2world=inv(w2c), and the
  runner's 1/baseline camera rescale. numpy DLT+RANSAC fallback runs without cv2.
- align.py: Umeyama similarity fit + RANSAC + a residual ladder (diagnose) that
  detects whether cross-run mismatch is a uniform-scale similarity or a nonlinear
  warp; plus similarity chaining and loop-closure metrics.
Verification (all green):
- check_cv2_parity.py -- numpy solver == exact cv2.solvePnPRansac on synthetic ground
  truth (~1e-7 clean).
- check_upstream_parity.py -- our WHOLE orchestration vs upstream estimate_poses on
  REAL engine output. Caught and fixed five divergences (the costly one: focal
  averaged over a low-overlap 2nd view gave 507 vs the correct 596, ~15% -> 1.35 deg
  pose error). Now bit-exact (0.00 deg) on 2 scene + 2 object dumps; the only
  residual is upstream's float32 K vs our float64 (a RANSAC inlier-boundary precision
  effect, <=0.5 deg on near-degenerate object data, 0 on scene), root-caused and
  documented, not papered over.
- test_pose.py -- 28 asset-free golden tests (no model/fixtures/cv2).
Empirical finding: cross-run mismatch is a uniform-scale similarity (~11% scene scale
drift), no nonlinear warp -- a 7-DoF similarity is the right alignment model;
diagnose() flags it if that ever changes.
flake.nix: add opencv4 (cv2) for the exact-upstream PnP path; numpy-only fallback
otherwise. Not needed by the engine build/test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
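The shared-focal step in focal.py is a Weiszfeld fit. Below is a minimal C++ sketch of a Weiszfeld/IRLS focal estimate, written in the style of the DUSt3R-family `estimate_focal_knowing_depth`. That modelling is an assumption: the commit names the method but not its weighting or iteration count. Each pixel contributes two things: its offset (u, v) from the principal point, and the model's camera-frame point (x, y, z). The fit chooses f to minimize the sum of distances ||(u, v) - f (x/z, y/z)||. Every pass reweights each pixel by the inverse of its current distance. `FocalSample` and `weiszfeld_focal` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One pixel: offset (u, v) from the principal point and the predicted camera-frame point.
struct FocalSample { double u, v, x, y, z; };

// Hypothetical helper: robust shared focal via Weiszfeld / iteratively reweighted
// least squares on sum_i || (u_i, v_i) - f * (x_i/z_i, y_i/z_i) ||.
double weiszfeld_focal(const std::vector<FocalSample>& s, int iters = 10) {
    assert(!s.empty());
    // Weighted least-squares update: f = sum w <p, q> / sum w <q, q>, q = (x/z, y/z).
    auto weighted_fit = [&s](auto weight) {
        double num = 0.0, den = 0.0;
        for (const FocalSample& p : s) {
            const double qx = p.x / p.z, qy = p.y / p.z;
            const double w = weight(p, qx, qy);
            num += w * (qx * p.u + qy * p.v);
            den += w * (qx * qx + qy * qy);
        }
        return num / den;
    };
    // Plain least-squares start (all weights 1).
    double f = weighted_fit([](const FocalSample&, double, double) { return 1.0; });
    for (int it = 0; it < iters; ++it) {
        // Reweight by 1 / current reprojection distance so far-off pixels count less.
        f = weighted_fit([f](const FocalSample& p, double qx, double qy) {
            const double du = p.u - f * qx, dv = p.v - f * qy;
            return 1.0 / std::max(std::sqrt(du * du + dv * dv), 1e-8);
        });
    }
    return f;
}
```

The inverse-distance weights shrink for large residuals. As a result, a single gross outlier barely moves the estimate, which is the reason to prefer this over the plain least-squares start.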
…ates pipeline)
A known-good, ground-truth-posed control so a live-path failure can be attributed to
the data/model vs our code -- and a reusable engine-vs-GT harness.
- tt_control.py / tt_experiment.py: Tanks-and-Temples (NSVF) loader (OpenGL->OpenCV
  poses, baseline-vs-stride report) + engine-vs-GT check. Verdict: T&T is OUT OF
  DISTRIBUTION for FreeSplatter-scene (narrow-FOV object orbits) -- opacity confident
  on only 8-17% of pixels, pose error 28-145deg. Kept as harness + negative result.
- re10k_control.py / re10k_fetch.py / re10k_experiment.py: RealEstate10K loader
  (parser + GT geometry, GT focal_512 = fy*512), yt-dlp/ffmpeg frame fetch with a
  dead-video skip, and the engine-vs-GT check. IN DISTRIBUTION -> the control works:
  relative pose recovered to 0.4-1.5deg vs INDEPENDENT GT, opacity confident on
  68-75% of pixels. Validates our PnP beyond the upstream parity, and confirms the
  re10k camera convention.
Findings: (1) the model has a CONSTANT wide-FOV focal bias (recovers ~274 vs GT ~439,
~37%) -- benign for relative accumulation (consistent across runs), off for metric
scale; (2) pose error scales ~linearly (~25% of baseline rotation), no degeneracy even
at ~1.4deg baselines on texture-rich interiors.
Data (T&T ~1.5GB, re10k poses ~720MB, fetched frames) stays under .cache/ and the
scratchpad -- gitignored, never committed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
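The T&T loader converts OpenGL poses to OpenCV. This is a minimal sketch of the usual identity, assuming a row-major 4x4 camera-to-world matrix. OpenGL cameras look down -z with y up; OpenCV cameras look down +z with y down. The conversion therefore right-multiplies by diag(1, -1, -1, 1), which negates the camera's y and z axes and leaves the camera centre untouched. The loader's own code isn't shown in the commit, and `gl_to_cv_c2w` is a hypothetical name.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<std::array<double, 4>, 4>;  // row-major

// c2w_cv = c2w_gl * diag(1, -1, -1, 1): flip the camera's y (up -> down) and
// z (backward -> forward) axes; column 3 (the camera centre) is unchanged.
Mat4 gl_to_cv_c2w(const Mat4& c2w_gl) {
    Mat4 out = c2w_gl;
    for (int r = 0; r < 3; ++r) {
        out[r][1] = -c2w_gl[r][1];
        out[r][2] = -c2w_gl[r][2];
    }
    return out;
}
```

The map is its own inverse, so the same helper converts OpenCV poses back to OpenGL.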
…s clean)
re10k_crossrun.py: a frame shared by two overlapping pairs (m-s,m) and (m,m+s) is
reconstructed twice in two coordinate systems; fit a robust similarity between the two
and read consistency + the residual ladder, swept over baseline stride. Mirrors
empirical.py but on in-distribution data where geometry is good.
Result (shared frame 120): cross-run consistency is HIGH and best at small baseline --
stride 20 (0.41deg): 65% of pixels agree within 2% of scene extent, 98% within 10%;
stride 80 (6deg): 46% / 95%. Versus the OOD doll's 7% / 21%. The mismatch is a clean
uniform-scale similarity everywhere (sim->affine buys ~0.1%, verdict
similarity_plus_noise); per-step scale drift is small and grows with baseline (1.7% at
0.4deg, 11% at 6deg).
Takeaways for the live path:
(1) the small-baseline hypothesis holds -- the sliding window's small steps land in the
high-consistency regime, so accumulation is mostly clean;
(2) consensus fusion is a polish for the residual ~2% floaters, not a prerequisite;
(3) registration is a 7-DoF similarity (no nonlinear warp), with slow scale drift a
sim3 pose-graph can bound.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
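The "agree within 2% / 10% of scene extent" figures are a thresholded residual fraction. This minimal sketch makes two assumptions. First, the second run has already been mapped into the first run's frame by the fitted similarity. Second, "extent" means the bounding-box diagonal of the reference points; the commit does not define it. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

// Bounding-box diagonal of a point set (an assumed definition of "scene extent").
double bbox_diagonal(const std::vector<Vec3>& pts) {
    assert(!pts.empty());
    Vec3 lo = pts[0], hi = pts[0];
    for (const Vec3& p : pts)
        for (int k = 0; k < 3; ++k) {
            lo[k] = std::min(lo[k], p[k]);
            hi[k] = std::max(hi[k], p[k]);
        }
    double d2 = 0.0;
    for (int k = 0; k < 3; ++k) d2 += (hi[k] - lo[k]) * (hi[k] - lo[k]);
    return std::sqrt(d2);
}

// Fraction of per-pixel correspondences whose distance is <= tau * extent.
double agree_fraction(const std::vector<Vec3>& ref, const std::vector<Vec3>& aligned, double tau) {
    assert(ref.size() == aligned.size() && !ref.empty());
    const double thresh = tau * bbox_diagonal(ref);
    std::size_t ok = 0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        double d2 = 0.0;
        for (int k = 0; k < 3; ++k) d2 += (ref[i][k] - aligned[i][k]) * (ref[i][k] - aligned[i][k]);
        if (std::sqrt(d2) <= thresh) ++ok;
    }
    return static_cast<double>(ok) / static_cast<double>(ref.size());
}
```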
Policy update directed by the project owner:
- PnP pose recovery and cross-run Sim(3) alignment / accumulation are now IN scope
  (previously out of scope), alongside the engine.
- Everything ships in C++; Go only for the demo web server (purego -> C API -> Vulkan
  + WebGL). The CLI and C API must have NO Python dependency at runtime.
- Python is confined to (1) dev-time reference/conversion/validation in the CUDA
  docker (hf_dump/convert/compare_taps), never a runtime dep, and (2) the pose/
  research prototype TEMPORARILY -- continued in Python only until the approach is
  proven, then rewritten in C++ and the Python deleted.
- Per-component discipline now lists the C++ `pose` component, inheriting the parity
  discipline the Python prototype established (bit-exact to upstream estimate_poses;
  validated vs independent GT poses).
pose/README.md updated to match (temporary prototype, C++ port pending).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nd-to-end)
accumulate.py assembles the validated pieces into the live pipeline (minus realtime):
slide a window over a re10k clip, recover each pair's camera (PnP), fit a Sim(3)
between consecutive runs from their SHARED-frame per-pixel correspondences and compose
into a per-run global transform (align.compose), drop every frame's gaussians into one
world (frame f_0's), and measure the recovered camera trajectory vs ground truth.
Engine dumps are cached so analysis re-runs skip inference. render_ply.py projects the
colored cloud through a pinhole camera to a PNG.
Result (13 pairs, stride 20, frames 0..260):
- per-link Sim(3) registration clean: residual ~1.0-1.4% of scene extent.
- scale drift accumulates monotonically (forward pan, no revisit): 0.755 over 12
  links, ~2.3%/link -- the monocular 1/d drift compounding.
- recovered camera trajectory tracks GT to ATE ~11% of extent (single global Sim(3)
  align); drift grows 7%->13% first->second half, worst at the endpoint -- exactly
  what a Sim(3) pose-graph + loop closure bounds.
- the 2.6M-point accumulated cloud renders from camera 0 as a COHERENT room that
  matches the input frame (wider FOV due to the model's ~274-vs-439 focal bias).
The accumulating-reconstruction idea is proven; per CLAUDE.md the next implementation
step is the C++ port (CLI + C API), after which this Python goes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
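align.compose chains per-link similarities into per-run global transforms. This is a minimal sketch under the x -> s*R*x + t convention used elsewhere in this PR; the struct and function names are hypothetical. Composition multiplies scales, so a uniform per-link factor d compounds to d^n over n links. Working backwards, a cumulative 0.755 over 12 links gives 0.755^(1/12) ≈ 0.977 per link, which matches the ~2.3%/link quoted above.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Similarity x -> s * R * x + t.
struct Sim3 { double s; Mat3 R; Vec3 t; };

Vec3 mul(const Mat3& A, const Vec3& x) {
    Vec3 y{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) y[r] += A[r][c] * x[c];
    return y;
}

Mat3 mul(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            for (int k = 0; k < 3; ++k) C[r][c] += A[r][k] * B[k][c];
    return C;
}

Vec3 sim_apply(const Sim3& T, const Vec3& x) {
    Vec3 y = mul(T.R, x);
    for (int k = 0; k < 3; ++k) y[k] = T.s * y[k] + T.t[k];
    return y;
}

// (a o b)(x) = a(b(x)) = (s_a s_b)(R_a R_b) x + s_a R_a t_b + t_a
Sim3 compose(const Sim3& a, const Sim3& b) {
    Sim3 c{a.s * b.s, mul(a.R, b.R), mul(a.R, b.t)};
    for (int k = 0; k < 3; ++k) c.t[k] = a.s * c.t[k] + a.t[k];
    return c;
}

// Inverse: x = (1/s) R^T (y - t), i.e. (1/s, R^T, -(1/s) R^T t).
Sim3 invert(const Sim3& T) {
    assert(T.s != 0.0);
    Mat3 Rt{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) Rt[r][c] = T.R[c][r];
    Sim3 inv{1.0 / T.s, Rt, mul(Rt, T.t)};
    for (int k = 0; k < 3; ++k) inv.t[k] = -inv.s * inv.t[k];
    return inv;
}
```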
… the lever
find_loop.py searches re10k poses for a clip that revisits its start; loop_closure.py
chains open-loop, measures the loop error via a closing pair (f_0,f_n), distributes it
by even Sim(3) relaxation (D^(k/n)), and reports ATE before/after. Sim(3) 4x4 helpers
(sim_matrix, sim_frac_power) promoted into align.py; golden test added.
Honest result on a real loop clip (camera out to 2.29 and back to 0.23):
- the open-loop chain ALREADY closes the loop (loop error 4.4deg / scale 1.12 / 8%
  trans) -- there is almost no accumulated drift to distribute.
- the dominant ~34% ATE is per-link ODOMETRY NOISE (Sim(3) inlier% as low as 17-24% on
  the fast outbound leg) plus the model's focal-bias warp: self-consistent (the loop
  closes) but distorted vs GT. Loop closure can't fix that; naive distribution
  slightly hurts.
- diagnosis confirmed not a bug: the correction recovers SYNTHETIC uniform accumulated
  drift to ~0 (test_pose.py::test_loop_correction, ATE 1e-15).
Lesson: loop closure pays off on LONG trajectories with consistent accumulated drift
(cf. the forward clip's monotone 7%->13%); for short loops the lever is better
odometry -- smaller baselines, consensus fusion, and the focal bias.
Golden suite green (test_loop_correction added).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
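Even relaxation applies D^(k/n) to frame k, so it needs a fractional power of a Sim(3) transform. This is a minimal sketch of one closed form. The derivation is ours; the commits only say that sim_frac_power uses the closed-form one-parameter subgroup. Write R = exp(θK), with K the unit-axis skew matrix, and λ = ln s. Then D^f has:
- scale s^f;
- rotation exp(fθK);
- translation V(f) V(1)^(-1) t, where V(f) = ∫_0^f e^(λτ) exp(τθK) dτ = aI + SK + (a - C)K².

The coefficients are a = ∫e^(λτ) dτ, S = ∫e^(λτ) sin θτ dτ and C = ∫e^(λτ) cos θτ dτ. The sketch assumes s > 0 and θ < π; axis extraction degenerates at exactly 180°. Names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;
struct Sim3 { double s; Mat3 R; Vec3 t; };  // x -> s * R * x + t

Vec3 sim_apply(const Sim3& T, const Vec3& x) {
    Vec3 y{};
    for (int r = 0; r < 3; ++r)
        y[r] = T.s * (T.R[r][0] * x[0] + T.R[r][1] * x[1] + T.R[r][2] * x[2]) + T.t[r];
    return y;
}

// a*I + b*K + c*K^2 for a skew matrix K.
static Mat3 lin(double a, double b, double c, const Mat3& K) {
    Mat3 out{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            double k2 = 0.0;
            for (int m = 0; m < 3; ++m) k2 += K[i][m] * K[m][j];
            out[i][j] = (i == j ? a : 0.0) + b * K[i][j] + c * k2;
        }
    return out;
}

static double det3(const Mat3& A) {
    return A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1]) -
           A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0]) +
           A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]);
}

// Solve A x = b by Cramer's rule (adequate for a well-conditioned 3x3).
static Vec3 solve3(const Mat3& A, const Vec3& b) {
    const double d = det3(A);
    Vec3 x{};
    for (int c = 0; c < 3; ++c) {
        Mat3 Ac = A;
        for (int r = 0; r < 3; ++r) Ac[r][c] = b[r];
        x[c] = det3(Ac) / d;
    }
    return x;
}

// V(f) = integral_0^f e^(lam*tau) exp(tau*th*K) dtau = a I + S K + (a - C) K^2.
static Mat3 V_of(double f, double lam, double th, const Mat3& K) {
    const double e = std::exp(lam * f);
    const double a = std::fabs(lam) < 1e-12 ? f : std::expm1(lam * f) / lam;
    double S = 0.0, C = a;  // th == 0: exp(tau*th*K) = I, so V = a I
    if (th > 1e-12) {
        const double r2 = lam * lam + th * th;
        C = (e * (lam * std::cos(th * f) + th * std::sin(th * f)) - lam) / r2;
        S = (e * (lam * std::sin(th * f) - th * std::cos(th * f)) + th) / r2;
    }
    return lin(a, S, a - C, K);
}

// M^f along the one-parameter subgroup through M.
Sim3 sim_frac_power_sketch(const Sim3& M, double f) {
    assert(M.s > 0.0);
    const Mat3& R = M.R;
    const double th = std::acos(std::clamp((R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0, -1.0, 1.0));
    Mat3 K{};  // K = (R - R^T) / (2 sin th), the unit-axis skew matrix
    if (th > 1e-12)
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j) K[i][j] = (R[i][j] - R[j][i]) / (2.0 * std::sin(th));
    const double lam = std::log(M.s);
    const Vec3 u = solve3(V_of(1.0, lam, th, K), M.t);  // translation part of log(M)
    const Mat3 Vf = V_of(f, lam, th, K);
    Sim3 out{std::pow(M.s, f), lin(1.0, std::sin(f * th), 1.0 - std::cos(f * th), K), {}};
    for (int r = 0; r < 3; ++r) out.t[r] = Vf[r][0] * u[0] + Vf[r][1] * u[1] + Vf[r][2] * u[2];
    return out;
}
```

A quick self-check for any implementation: applying D^(1/2) twice must reproduce D, and D^1 must return D unchanged.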
…ccumulated cloud
fuse.py answers the question that started this thread ("around the edges there's a
lot of noise -- does accumulation remove it?"). Each per-pixel gaussian is
partner-view-dependent, so occlusion-edge / depth-ambiguous points are floaters; a
real surface point is reconstructed by several overlapping frames and they agree in
the global frame. So: voxelize the accumulated cloud at the consistency scale (~2%
of extent) and keep only voxels corroborated by >= K distinct frames, averaging the
agreeing predictions (which also denoises the surface). Reuses cached engine dumps
(no new inference).
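Here is a minimal sketch of the consensus filter just described. It assumes cubic voxels of side `voxel` and a per-point frame tag. It returns the plain mean of each surviving voxel's points; that is one reading of "averaging the agreeing predictions", not necessarily fuse.py's exact weighting. Names are hypothetical. The clamp before the integer cast is the kind of guard the later fuzzing commit adds to consensus_fuse.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <set>
#include <tuple>
#include <vector>

struct Pt { std::array<double, 3> xyz; int frame; };  // gaussian centre + source frame

// Voxel index along one axis, clamped before the cast so huge values can't overflow.
static std::int64_t cell(double v, double voxel) {
    return static_cast<std::int64_t>(std::clamp(std::floor(v / voxel), -1e15, 1e15));
}

// Keep voxels corroborated by >= min_frames distinct frames and return one averaged
// point per kept voxel; single-frame voxels (floaters) are dropped.
std::vector<std::array<double, 3>> consensus_average(const std::vector<Pt>& cloud,
                                                     double voxel, int min_frames) {
    assert(voxel > 0.0 && min_frames >= 1);
    using Key = std::tuple<std::int64_t, std::int64_t, std::int64_t>;
    struct Cell { std::set<int> frames; std::array<double, 3> sum{}; int n = 0; };
    std::map<Key, Cell> grid;
    for (const Pt& p : cloud) {
        if (!std::isfinite(p.xyz[0]) || !std::isfinite(p.xyz[1]) || !std::isfinite(p.xyz[2]))
            continue;  // skip NaN/Inf points
        Cell& c = grid[Key{cell(p.xyz[0], voxel), cell(p.xyz[1], voxel), cell(p.xyz[2], voxel)}];
        c.frames.insert(p.frame);
        for (int d = 0; d < 3; ++d) c.sum[d] += p.xyz[d];
        ++c.n;
    }
    std::vector<std::array<double, 3>> out;
    for (const auto& entry : grid) {
        const Cell& c = entry.second;
        if (static_cast<int>(c.frames.size()) < min_frames) continue;
        out.push_back({c.sum[0] / c.n, c.sum[1] / c.n, c.sum[2] / c.n});
    }
    return out;
}
```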
Result (forward clip, 14 frames, K>=2): 46% of voxels are single-frame, holding
14% of points -- these render as INCOHERENT EDGE-HAZE (floaters + swept-volume
periphery). The >=2-frame consensus (86% of points) renders as a CLEAN, CRISP room
with the haze gone. Definitive yes: accumulation + consensus fusion removes the edge
noise. Honest tradeoff: dropping single-frame points also trims the single-view
periphery (coverage vs cleanliness).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…golden-tested
Start porting the proven Python pose prototype (focal.py + align.py + pnp.py) to
shipped C++, dependency-free per CLAUDE.md (only a self-contained Jacobi
eigensolver -- no Eigen, no OpenCV), wired into the library and the asset-free
test tier.
src/linalg.h
    small dense linear algebra: symmetric cyclic-Jacobi eigensolver, 3x3 SVD (via
    MᵀM), det/inv, 4x4 rigid inverse. Everything the pose math needs reduces to the
    eigensolver.
src/pose.{h,cpp}
    estimate_focal (Weiszfeld); fit_similarity (Umeyama) + RANSAC +
    residual-ladder/diagnose; Sim(3) compose/invert/sim_matrix/loop_closure_error;
    sim_frac_power (closed-form one-parameter subgroup, no complex eig);
    solve_pnp_numpy (DLT via the 12x12 AᵀA nullspace + cheirality decode + RANSAC);
    estimate_poses (scene recipe: view-0 all-pixel focal, per-view opacity-masked
    PnP, optional baseline rescale).
tests/test_pose.cpp
    the asset-free mirror of pose/test_pose.py -- 9 golden tests (similarity
    roundtrip, scale/nonlinear detection, RANSAC outliers, loop correction/closure,
    focal, PnP recovery/outliers).
All green under the debug (ASan/UBSan) preset; ctest -LE model.
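src/linalg.h reduces the pose math to a symmetric cyclic-Jacobi eigensolver. This is a minimal fixed-size sketch of the classic cyclic Jacobi method:
1. Sweep over every off-diagonal pair.
2. Apply the plane rotation that zeroes that entry.
3. Repeat until the off-diagonal mass vanishes.

The rotations are accumulated into the eigenvector matrix as it goes. The sweep limit, tolerance and names are illustrative, not the PR's.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

template <int N>
using MatN = std::array<std::array<double, N>, N>;

// Cyclic Jacobi on a symmetric A: on return A is (numerically) diagonal with the
// eigenvalues on its diagonal and V's columns are the matching orthonormal eigenvectors.
template <int N>
void jacobi_eigen(MatN<N>& A, MatN<N>& V, int max_sweeps = 50) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) V[i][j] = (i == j);
    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        double off = 0.0;
        for (int p = 0; p < N; ++p)
            for (int q = p + 1; q < N; ++q) off += A[p][q] * A[p][q];
        if (off < 1e-30) return;
        for (int p = 0; p < N; ++p)
            for (int q = p + 1; q < N; ++q) {
                if (A[p][q] == 0.0) continue;
                // Rotation angle that zeroes A[p][q] (numerically stable tangent form).
                const double theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q]);
                const double t = (theta >= 0.0 ? 1.0 : -1.0) /
                                 (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
                const double c = 1.0 / std::sqrt(t * t + 1.0), s = t * c;
                for (int k = 0; k < N; ++k) {  // A <- A P (columns p, q)
                    const double akp = A[k][p], akq = A[k][q];
                    A[k][p] = c * akp - s * akq;
                    A[k][q] = s * akp + c * akq;
                }
                for (int k = 0; k < N; ++k) {  // A <- P^T A (rows p, q)
                    const double apk = A[p][k], aqk = A[q][k];
                    A[p][k] = c * apk - s * aqk;
                    A[q][k] = s * apk + c * aqk;
                }
                for (int k = 0; k < N; ++k) {  // V <- V P
                    const double vkp = V[k][p], vkq = V[k][q];
                    V[k][p] = c * vkp - s * vkq;
                    V[k][q] = s * vkp + c * vkq;
                }
            }
    }
}

// Convenience: eigenvalues of a symmetric matrix in ascending order.
template <int N>
std::array<double, N> sorted_eigenvalues(MatN<N> A) {
    MatN<N> V{};
    jacobi_eigen<N>(A, V);
    std::array<double, N> ev{};
    for (int i = 0; i < N; ++i) ev[i] = A[i][i];
    std::sort(ev.begin(), ev.end());
    return ev;
}
```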
Cross-checked against the Python reference on a real scene dump: focal is
bit-exact (596.408591886). PnP is correct on clean data, but on real scenes the
DLT solver inherits the textbook planar/mirror degeneracy (3/5 RANSAC seeds match
cv2's ~57deg relative rotation, 2/5 flip) -- the same instability that made the
prototype use cv2 (SQPNP) for all real-data results. Next: a robust in-house PnP
(EPnP/SQPNP + Gauss-Newton refine) for cv2-parity with no OpenCV dependency, then
the accumulation/loop-closure/fusion chaining and the CLI / C-API surface.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The DLT/RANSAC solver inherits the planar/mirror degeneracy on real dumps
(seed-dependent: ~3/5 seeds near cv2, ~2/5 a ~135-152deg flip). Add the shipped
real-data solver, solve_pnp:
* EPnP (Lepetit/Moreno-Noguer/Fua) for the init -- barycentric control points,
the camera-frame control points from the 12x12 MᵀM null space (the same Jacobi
eigensolver), beta solve for N=1,2,3 with cheirality sign-fix, R,t via the
rigid Umeyama on the 4 control points, best-N by reprojection. Non-iterative,
uses ALL points (no random minimal samples -> no seed-dependent flips),
planar-robust by construction.
* Huber-robust Gauss-Newton reprojection refine (6-DoF left perturbation,
J = [-[Xc]_x | I]) to polish to the reprojection minimum and downweight
outliers -- the deterministic analogue of cv2's RANSAC+SQPNP+refine.
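The refine's Jacobian follows from a first-order expansion of the left perturbation: X_c' = exp([δφ]_x) X_c + δt ≈ X_c + δφ × X_c + δt = X_c - [X_c]_x δφ + δt. Hence ∂X_c'/∂(δφ, δt) = [-[X_c]_x | I]. This minimal sketch builds three pieces:
- that 3x6 block;
- a Huber IRLS weight (1 inside the threshold k, k/r outside);
- a Rodrigues rotation, used only to check the Jacobian by finite differences.

The chain through the pinhole projection and the Gauss-Newton loop itself are omitted. The commit does not give the Huber threshold.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;
using Jac36 = std::array<std::array<double, 6>, 3>;

// [v]_x, so that skew(v) * w == cross(v, w).
Mat3 skew(const Vec3& v) {
    return Mat3{{{0.0, -v[2], v[1]}, {v[2], 0.0, -v[0]}, {-v[1], v[0], 0.0}}};
}

// dXc'/d(dphi, dt) for the left perturbation Xc' = exp([dphi]_x) Xc + dt: [-[Xc]_x | I].
Jac36 point_jacobian(const Vec3& Xc) {
    const Mat3 S = skew(Xc);
    Jac36 J{};
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) J[r][c] = -S[r][c];  // rotation block
        J[r][3 + r] = 1.0;                                 // translation block
    }
    return J;
}

// Huber IRLS weight for residual norm r and threshold k: quadratic inside, linear outside.
double huber_weight(double r, double k) {
    assert(k > 0.0);
    return r <= k ? 1.0 : k / r;
}

// Rodrigues: exp([w]_x) x, used here only to check the Jacobian numerically.
Vec3 rodrigues_rotate(const Vec3& w, const Vec3& x) {
    const double th = std::sqrt(w[0] * w[0] + w[1] * w[1] + w[2] * w[2]);
    if (th < 1e-15) return x;
    const Vec3 k{w[0] / th, w[1] / th, w[2] / th};
    const double kx = k[0] * x[0] + k[1] * x[1] + k[2] * x[2];
    const Vec3 kcx{k[1] * x[2] - k[2] * x[1], k[2] * x[0] - k[0] * x[2], k[0] * x[1] - k[1] * x[0]};
    const double c = std::cos(th), s = std::sin(th);
    Vec3 out{};
    for (int i = 0; i < 3; ++i) out[i] = x[i] * c + kcx[i] * s + k[i] * kx * (1.0 - c);
    return out;
}
```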
On the real scene dump (A_scn): deterministic across all RANSAC seeds, and within
0.73deg rotation / 0.74deg translation-direction of the upstream cv2/SQPNP -- vs
the numpy DLT's 175deg miss there. estimate_poses now uses solve_pnp; the DLT
solve_pnp_numpy stays as the asset-free golden reference.
Golden tests (all green under ASan/UBSan, ctest -LE model): exact recovery on
clean data, a near-planar slab (where the DLT's coplanar minimal samples flip),
and 15% gross-outlier rejection.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oto stream
Port accumulate.py to src/pose.{h,cpp}: the sliding-window accumulating-
reconstruction loop. Accumulator::add_pair takes each consecutive pair's
[2,H,W,gc] engine output, recovers the pair's cameras (estimate_poses), fits the
cross-run Sim(3) on the shared frame (fit_similarity_ransac), composes a global
chain, and drops every new frame's gaussians into one world. Exposes cloud(),
camera_path(), and per-link ChainLink diagnostics (scale/inlier/valid/resid).
Validation:
- Asset-free golden (test_accumulate_chain): synthetic pinhole clip with distinct
per-run scales -> trajectory ATE 7.6e-8 of extent, per-link scale to 3e-9,
fit residual ~1e-7. Green under ASan/UBSan.
- Real-data parity on the 13 cached pair_*.f32 dumps vs the numpy/cv2 prototype:
cloud size bit-exact (2,633,725), per-link valid% identical (deterministic
mask), per-link Sim(3) scale to mean 0.5% (11/12 links <1%), trajectory within
6.6% of the cv2 chain. Residual = known RANSAC-RNG + EPnP-vs-SQPNP delta.
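Trajectory ATE is quoted as a fraction of extent. This minimal sketch assumes the estimated camera positions are already aligned to ground truth; the Sim(3) alignment step is left out. The caller supplies the extent used for normalisation, since the commit doesn't define it. `ate_fraction` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// RMS camera-position error between an (already aligned) estimated trajectory and
// ground truth, reported as a fraction of a caller-supplied extent.
double ate_fraction(const std::vector<Vec3>& est, const std::vector<Vec3>& gt, double extent) {
    assert(est.size() == gt.size() && !gt.empty() && extent > 0.0);
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < gt.size(); ++i)
        for (int k = 0; k < 3; ++k) {
            const double d = est[i][k] - gt[i][k];
            sum_sq += d * d;
        }
    return std::sqrt(sum_sq / static_cast<double>(gt.size())) / extent;
}
```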
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Python
The loop-closure machinery (sim4_invert, distribute_drift, sim_frac_power) shipped with
the Accumulator commit; this records its parity validation.
- sim_frac_power C++ closed-form == numpy eig-based to 5e-10 across f in [-0.5,1.3]
  (so distribute_drift is bit-identical to the prototype's distribution).
- sim4_invert is an exact similarity inverse (1e-9); golden recovers a known drifted
  loop to 4e-16.
- Real-data parity on the loopcache (13 chain pairs + close_0_260): recovered drift
  matches the prototype's loop error (scale 1.09 vs 1.12, 4.6 vs 4.4 deg),
  deterministic valid% identical. The corrected-trajectory delta is the known
  EPnP-vs-cv2 PnP backend feeding D, not the distribution math.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
consensus_fuse shipped with the Accumulator commit (one hash-grid pass over the
frame-tagged cloud: >=K distinct-frame voxels kept, agreeing predictions averaged).
This records its parity validation.
- Golden (test_consensus_fuse): exact counts on controlled synthetic support.
- Real-data parity vs fuse.py on the 13 acc dumps (voxel 0.02, K>=2): raw points
  bit-exact (2,633,725), per-point floater drop 14.0% vs 14%, raw->fused reduction
  93.9% vs 94%, kept-voxel fraction 53.8% vs 54%. Sub-1% voxel-count delta is the
  chaining RANSAC-RNG, not the fusion math.
Reproduces the prototype's "remove the 14% single-frame edge-haze floaters" result.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Expose the now-ported pose pipeline from the C API and the CLI, with no Python
runtime dependency (per CLAUDE.md):
C API (include/free_splatter.h):
- free_splatter_estimate_poses: recover per-view cam2world from an engine buffer.
- free_splatter_accumulator_{new,free,add_pair,frame_count,cloud,fuse,
camera_path}: opaque sliding-window accumulator wrapping pose::Accumulator;
add_pair takes each consecutive pair's [2,H,W,gc] engine output, returns the
growing global cloud (free_splatter_point: xyz+rgb+frame), the consensus-fused
cloud, and the global camera trajectory. FFI-friendly, malloc'd buffers freed
with free_splatter_buf_free.
CLI (free_splatter-cli --accumulate):
- runs the engine over each consecutive image pair, chains the runs into one
world, and writes PREFIX_<nframes>.splat after each pair (the evolving
reconstruction) plus, with --fuse, a consensus-fused PREFIX_fused.splat. New
write_cloud_splat emits the xyz+rgb cloud as small isotropic .splat gaussians.
Verified end-to-end on real frames (5 frames -> evolving 519K/779K/.. splats +
103K fused), sanitizer-clean under the debug preset; asset-free ctest tier green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per CLAUDE.md: the pose/ research prototype (focal/align/pnp + accumulate/loop/
fuse + the re10k/T&T validation harnesses) proved the accumulating-reconstruction
approach. That whole pipeline is now rewritten in C++ (src/pose.{h,cpp}), exposed
via free_splatter-cli + include/free_splatter.h with no Python, and validated
(asset-free golden tests + real-data parity recorded in the prior commits). The
prototype was a throwaway, not a parallel implementation to maintain -- so it is
removed. Git history preserves it and its layer-by-layer parity harnesses.
Updates the dangling references (CLAUDE.md, src/linalg.h, src/pose.{h,cpp}) to
point at git history instead of the deleted files.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fuzz the untrusted user-facing surfaces opened by the pose work (GGUF stays trusted,
not fuzzed):
- fuzz_pose: the public pose C-API (free_splatter_estimate_poses + the accumulator
  add_pair/cloud/fuse/camera_path) on arbitrary float gaussian buffers
  (NaN/Inf/denormals) and fuzz-chosen geometry.
- fuzz_decode: the image-FILE path (arbitrary bytes -> stb_image -> crop/resize ->
  CHW), the surface a user photo crosses in the CLI/demo. stb is third-party; per
  CLAUDE.md we fuzz the boundary and would guard rather than patch a stb-internal
  trip -- none seen (31k+ runs clean).
Fixes found by fuzz_pose (our code, so fixed not guarded):
- SIGFPE: fit_similarity_ransac sampled `% N` with N=0 (an image pair with no
  overlapping valid pixels). Guard N<3 (RANSAC's minimal sample): all-inlier, plain
  fit when N>=1, else identity.
- Latent float-cast UB: consensus_fuse now skips non-finite cloud points and clamps
  the voxel-coordinate cast, so an Inf point can't make (int32)floor(NaN).
All four fuzzers clean; asset-free ctest tier (incl. test_pose fusion goldens) still
green under ASan/UBSan.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dded)
A web demo of the live idea: feed a photo stream, watch one world assemble.
- web/accumulate.html: forks the index.html EWA splat renderer to play a SEQUENCE of
  clouds (the reconstruction from 2 photos, then 3, then 4, ...), with the input
  photos accumulating in the TOP-RIGHT filmstrip (newest highlighted) as each is
  folded in. Auto-advances + gentle auto-orbit; ?start=/auto=/ms=/spin= deep-link
  params. Seeds an unsorted draw order on each step so the cloud paints immediately
  (the depth-sort worker then refines it).
- scripts/make_accumulate_demo.sh: one engine pass over the frames via
  `free_splatter-cli --accumulate` -> acc_2.splat..acc_N.splat + input thumbnails +
  manifest.json + the viewer, a self-contained servable dir.
- CLI --splat-scale default 0.0015 -> 0.006 of extent (point clouds read as surfaces,
  not grains); documented in web/README.md.
Verified end-to-end on a RealEstate10K clip (8 frames -> 7 steps): headless
chromium/SwiftShader renders a coherent dining-room reconstruction that grows 259,702
-> 413,517 splats across the steps, filmstrip populating as designed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
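"Over" compositing needs splats drawn back-to-front, which is what the viewer's depth-sort worker provides. This is a minimal C++ sketch of that ordering. The worker itself is browser code not shown here and may use a faster bucketed sort. The sketch projects each centre onto the camera's forward axis and sorts indices by decreasing depth; the function name is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Vec3f = std::array<float, 3>;

// Indices of splat centres ordered far-to-near for a camera at `eye` looking along
// the unit vector `fwd`, i.e. the order "over" compositing needs.
std::vector<std::size_t> back_to_front(const std::vector<Vec3f>& centres, const Vec3f& eye,
                                       const Vec3f& fwd) {
    std::vector<float> depth(centres.size());
    for (std::size_t i = 0; i < centres.size(); ++i)
        depth[i] = (centres[i][0] - eye[0]) * fwd[0] + (centres[i][1] - eye[1]) * fwd[1] +
                   (centres[i][2] - eye[2]) * fwd[2];
    std::vector<std::size_t> order(centres.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&depth](std::size_t a, std::size_t b) { return depth[a] > depth[b]; });
    return order;
}
```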
… splats)
The accumulated cloud kept only position+color, so the demo rendered as an isotropic
point cloud. A similarity x->s*(R@x)+t scales a gaussian's covariance by s^2 and
rotates it by R, so the shape transforms cleanly: carry it.
- AccumPoint / free_splatter_point gain scale[3] + rotation quaternion (w,x,y,z).
- Accumulator::add_pair de-interleaves the engine's scale (ch16:19) and rotation
  (ch19:23); add_view sets scale_world = T.s * scale_local and q_world = quat(T.R) *
  q_local (new mat3_to_quat / quat_mul / quat_normalize helpers, with a
  zero/NaN-quaternion -> identity guard).
- consensus_fuse averages the scale and keeps a representative orientation.
- write_cloud_splat emits the real anisotropic scale + rotation (OpenCV->OpenGL remap,
  same as the single-run write_splat); --splat-scale is now a radius multiplier
  (default 1.0), not an isotropic fraction.
Verified: anisotropic scale spans the engine's [1e-4,0.02] (mean axis-ratio ~29) with
per-gaussian orientations; golden test_pose green; fuzz_pose clean (the
identity-quaternion guard keeps the new quat math UB-free on garbage input).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
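add_view updates each gaussian as scale_world = T.s * scale_local and q_world = quat(T.R) * q_local. This is a minimal sketch of the three helpers the commit names. It uses (w,x,y,z) order, Shepperd's largest-pivot conversion, and a zero/NaN -> identity guard. The bodies are illustrative, not the PR's code.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Quat = std::array<double, 4>;  // (w, x, y, z)
using Mat3 = std::array<std::array<double, 3>, 3>;

// Unit quaternion; a zero or non-finite input maps to identity so garbage can't spread NaNs.
Quat quat_normalize(const Quat& q) {
    const double n = std::sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
    if (!std::isfinite(n) || n < 1e-12) return {1.0, 0.0, 0.0, 0.0};
    return {q[0] / n, q[1] / n, q[2] / n, q[3] / n};
}

// Hamilton product a*b (the rotation b is applied first, then a).
Quat quat_mul(const Quat& a, const Quat& b) {
    return {a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3],
            a[0] * b[1] + a[1] * b[0] + a[2] * b[3] - a[3] * b[2],
            a[0] * b[2] - a[1] * b[3] + a[2] * b[0] + a[3] * b[1],
            a[0] * b[3] + a[1] * b[2] - a[2] * b[1] + a[3] * b[0]};
}

// Rotation matrix -> unit quaternion (Shepperd: pivot on the largest term for stability).
Quat mat3_to_quat(const Mat3& R) {
    const double tr = R[0][0] + R[1][1] + R[2][2];
    Quat q;
    if (tr > 0.0) {
        const double s = 2.0 * std::sqrt(tr + 1.0);  // s = 4w
        q = {0.25 * s, (R[2][1] - R[1][2]) / s, (R[0][2] - R[2][0]) / s, (R[1][0] - R[0][1]) / s};
    } else if (R[0][0] > R[1][1] && R[0][0] > R[2][2]) {
        const double s = 2.0 * std::sqrt(1.0 + R[0][0] - R[1][1] - R[2][2]);  // s = 4x
        q = {(R[2][1] - R[1][2]) / s, 0.25 * s, (R[0][1] + R[1][0]) / s, (R[0][2] + R[2][0]) / s};
    } else if (R[1][1] > R[2][2]) {
        const double s = 2.0 * std::sqrt(1.0 + R[1][1] - R[0][0] - R[2][2]);  // s = 4y
        q = {(R[0][2] - R[2][0]) / s, (R[0][1] + R[1][0]) / s, 0.25 * s, (R[1][2] + R[2][1]) / s};
    } else {
        const double s = 2.0 * std::sqrt(1.0 + R[2][2] - R[0][0] - R[1][1]);  // s = 4z
        q = {(R[1][0] - R[0][1]) / s, (R[0][2] + R[2][0]) / s, (R[1][2] + R[2][1]) / s, 0.25 * s};
    }
    return quat_normalize(q);
}
```

With these helpers, the per-gaussian update is `quat_normalize(quat_mul(mat3_to_quat(T.R), q_local))`, and each scale axis is multiplied by T.s.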
- free_splatter-cli --accumulate now also accepts pre-computed [2,H,W,gc] .f32 pair
  dumps (each one pair) instead of images, skipping the engine -- for fast re-bakes
  and fusion sweeps off cached runs. Verified byte-identical acc_8 to the image path
  on the same frames, and it runs in ~4s vs ~2min.
- make_accumulate_demo.sh passes --fuse and appends a final consensus-fused step to
  the manifest; the viewer shows a step's optional "label" (so the demo ends on
  "consensus-fused -- single-view floaters removed", 413k -> 149k splats, the
  edge-haze floaters gone).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
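The .f32 pair dumps are [2,H,W,gc] float buffers. This minimal sketch indexes one pixel's channels. It assumes a headerless, C-order float32 layout, so a valid file holds 4*2*H*W*gc bytes. The helper names are hypothetical. The channel ranges (scale 16:19, rotation 19:23) come from the anisotropic-splats commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat offset of element (view, y, x, channel) in a C-order [2, H, W, gc] buffer.
std::size_t pair_offset(std::size_t H, std::size_t W, std::size_t gc,
                        std::size_t view, std::size_t y, std::size_t x, std::size_t c) {
    assert(view < 2 && y < H && x < W && c < gc);
    return ((view * H + y) * W + x) * gc + c;
}

// Copy channels [c0, c1) of one pixel, e.g. scale (16..19) or rotation (19..23).
std::vector<float> pixel_channels(const std::vector<float>& buf, std::size_t H, std::size_t W,
                                  std::size_t gc, std::size_t view, std::size_t y, std::size_t x,
                                  std::size_t c0, std::size_t c1) {
    assert(buf.size() == 2 * H * W * gc && c0 <= c1 && c1 <= gc);
    const std::size_t base = pair_offset(H, W, gc, view, y, x, 0);
    return std::vector<float>(buf.begin() + static_cast<std::ptrdiff_t>(base + c0),
                              buf.begin() + static_cast<std::ptrdiff_t>(base + c1));
}
```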
The fused step (8th) showed the same "8 photos" panel as acc_8, so it read as a
duplicate. Now a labelled step shows a prominent top-center banner ("consensus-
fused -- single-view floaters removed -- N splats"), the stat reads "fused (8)",
and manifest/splat fetches use cache:"no-store" so an updated demo dir is never
served stale.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…truct blurry)
The default demo clip was a near-pure forward dolly (~0.15% lateral baseline of scene
depth), so two-view depth was unconstrained and the gaussians came out blurry.
Re-bake from an orbiting clip with real sideways motion (stride chosen for ~9%
baseline), which reconstructs legibly; tighten the viewer framing (1.25x->0.9x of the
cloud diagonal) so it fills the view; and document the lateral-baseline requirement in
web/README.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lurry cloud)
The accumulator dropped per-gaussian opacity, so write_cloud_splat emitted every splat
fully opaque (alpha 255). With "over" alpha compositing that means the front-most
splat at each pixel fully occludes the rest, so as the camera orbits the depth order
flips and the hard anisotropic ellipsoids visibly swirl/rotate -- and the
over-contribution reads as blur. The single-run write_splat instead uses the
gaussian's activated opacity as the alpha (mean ~41/255), blending many low-alpha
splats into a stable smooth surface.
Fix: AccumPoint / free_splatter_point carry `opacity`; add_view stores it,
consensus_fuse averages it, and write_cloud_splat emits it as the splat alpha and caps
by importance (opacity*volume) like write_splat. On the same pair the cloud splat is
now byte-identical in alpha (1/254/mean 41) to write_splat and renders as a clean
alpha-blended surface instead of an opaque swirling soup. test_pose green; the C-API
struct gains one float (emit_points copies it).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
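Why alpha 255 breaks the blend: back-to-front "over" compositing updates a pixel as C <- α c + (1 - α) C. A fully opaque splat therefore overwrites everything behind it, while many α ≈ 41/255 splats mix. This is a minimal single-channel sketch. The EWA renderer also scales α by each gaussian's 2D footprint, which is omitted here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Frag { double colour; double alpha; };  // one splat's contribution at a pixel

// Composite fragments already sorted far-to-near with the "over" operator.
double composite_back_to_front(const std::vector<Frag>& frags, double background) {
    double c = background;
    for (const Frag& f : frags) c = f.alpha * f.colour + (1.0 - f.alpha) * c;
    return c;
}
```

With α = 1 only the nearest fragment survives. With two α = a fragments of colour 1 over black, the pixel reaches 1 - (1 - a)², a smooth partial blend.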
Prevent the class of regression that dropped first rotation/scale, then opacity, from
the accumulated-cloud splat writer (a second copy of the encoder that drifted from the
proven single-run one).
#1 Unify: new src/splat.h::encode_splat_record is the ONE definition of the
OpenCV->OpenGL convention, quaternion remap, opacity->alpha and byte packing. Both
write_splat (single-run) and write_cloud_splat (cloud) now build a
(pos,scale,quat,rgb,opacity) tuple and call it, so they cannot diverge again.
Verified byte-identical to the previous write_splat output (pure refactor).
#2 Pin: two asset-free tests in test_pose.cpp:
- test_splat_record: pins the encoder bytes, incl. the exact regressed field (opacity
  0.5 -> alpha 127, NOT a forced 255) and the rotation remap.
- test_accumulate_channels: a one-pair (T=identity) accumulation must preserve every
  gaussian channel (xyz, SH->rgb, opacity, scale, rotation, frame); a dropped channel
  fails immediately with zero fixtures.
Both guards fail on either historical bug. ctest -LE model green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
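This is a minimal sketch of what a single shared encoder can look like. It assumes the widely used antimatter15 `.splat` record, which is 32 bytes:
- bytes 0..11: position, 3 float32;
- bytes 12..23: scale, 3 float32;
- bytes 24..27: RGBA, 4 uint8;
- bytes 28..31: quaternion w,x,y,z, 4 uint8 stored as q*128+128.

It also assumes an OpenCV->OpenGL remap that negates y and z. Under that remap a rotation R becomes F R F with F = diag(1,-1,-1), i.e. quaternion (w,x,y,z) -> (w,x,-y,-z). Truncating opacity*255 yields the 127 that test_splat_record pins. None of this is taken from src/splat.h, and `encode_splat_record_sketch` is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

using SplatRecord = std::array<std::uint8_t, 32>;  // assumed antimatter15 .splat layout

// Clamp to [0, 255] then truncate, like NumPy's clip(0, 255).astype(np.uint8).
// Assumes a finite input (a NaN cast to an integer would be UB).
static std::uint8_t to_u8(double v) {
    return static_cast<std::uint8_t>(std::clamp(v, 0.0, 255.0));
}

SplatRecord encode_splat_record_sketch(std::array<float, 3> pos, std::array<float, 3> scale,
                                       std::array<double, 4> q, std::array<double, 3> rgb,
                                       double opacity) {
    // Assumed OpenCV (y down, z forward) -> OpenGL (y up, z backward): negate y and z;
    // the rotation becomes F R F, i.e. quaternion (w,x,y,z) -> (w,x,-y,-z).
    pos[1] = -pos[1];
    pos[2] = -pos[2];
    q[2] = -q[2];
    q[3] = -q[3];
    double n = std::sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
    if (!std::isfinite(n) || n < 1e-12) { q = {1.0, 0.0, 0.0, 0.0}; n = 1.0; }  // identity guard

    SplatRecord rec{};
    std::memcpy(rec.data(), pos.data(), 12);          // bytes 0..11: position (host float order)
    std::memcpy(rec.data() + 12, scale.data(), 12);   // bytes 12..23: per-axis scale
    for (int k = 0; k < 3; ++k) rec[24 + k] = to_u8(rgb[k] * 255.0);  // bytes 24..26: RGB
    rec[27] = to_u8(opacity * 255.0);  // byte 27: alpha; opacity 0.5 -> 127, not a forced 255
    for (int k = 0; k < 4; ++k) rec[28 + k] = to_u8(q[k] / n * 128.0 + 128.0);  // bytes 28..31
    return rec;
}
```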
Consensus fusion only keeps voxels seen by >= K frames; with few frames (a short arc)
only a small fraction of the scene is multiply-observed, and the existing "averaged"
output (one point per consensus voxel) then decimated that further -- so the final
fused scene was very sparse (~38k of 1M points on the 4-frame demo).
Add a "kept" mode (the prototype's fuse.py --ply-kept): keep every raw gaussian whose
voxel is corroborated by >= K frames -- floaters still removed, but nothing averaged
away. On the demo that's 446k vs 38k points (12x denser), a solid surface instead of a
sparse scatter.
- consensus_fuse gains `keep_raw` (default false = averaged, unchanged);
  free_splatter_accumulator_fuse gains a `keep_raw` arg; CLI `--fuse-mode
  kept|averaged` (kept is the default for the cloud demo).
- Golden test pins both modes (averaged -> 5 voxel points, kept -> 15 raw points);
  fuzz_pose drives both modes; ctest -LE model green, fuzz clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ne + best-frame fusion
Pairwise Sim(3) chaining leaves per-object (non-rigid) misregistration: overlapping
frames show doubled objects (two lamps, a bed end offset from its top). A rigid
per-frame pose BA can't fix this (tried, measured: 0.6%->0.6%, no-op -- reverted) --
the residual isn't a camera-pose error, it's that each frame predicts an object's
position slightly differently. Two fixes that DO work:
- consensus_refine (gaussian-level, non-rigid): each iteration moves every point a
fraction toward the opacity-weighted consensus of the OTHER frames' points in its
coarse-to-fine voxel neighbourhood. Local + non-rigid, so spatially-varying
ghosting collapses. On the real strong-arc cloud: ghosting 1.2% -> 0.26% (~5x);
golden 3.7% -> 0.0%. Exposed as Accumulator::refine, free_splatter_refine_cloud /
_accumulator_refine, CLI --refine.
- consensus_fuse gains FUSE_BEST: per consensus voxel keep only the single most-
confident frame's gaussians (dense AND de-ghosted, no stacked copies). keep_raw
bool -> mode int {averaged,kept,best}; CLI --fuse-mode best.
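consensus_refine pulls each point toward what the other frames predict. This minimal sketch is single-scale: one voxel grid, no coarse-to-fine schedule and no neighbouring-cell search. The step size and voxel are illustrative assumptions. Each pass moves every point a fraction `step` toward the opacity-weighted mean of OTHER frames' points in its voxel. With equal opacities, the gap between two frames' copies of the same object therefore shrinks by (1 - 2*step) per pass while their midpoint stays put. `refine_once` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct GPt { std::array<double, 3> xyz; double opacity; int frame; };

// One refine pass (assumes finite coordinates): move each point `step` of the way toward
// the opacity-weighted mean of OTHER frames' points sharing its voxel.
void refine_once(std::vector<GPt>& pts, double voxel, double step) {
    assert(voxel > 0.0 && step >= 0.0 && step <= 1.0);
    using Key = std::tuple<std::int64_t, std::int64_t, std::int64_t>;
    auto key = [voxel](const std::array<double, 3>& p) {
        return Key{static_cast<std::int64_t>(std::floor(p[0] / voxel)),
                   static_cast<std::int64_t>(std::floor(p[1] / voxel)),
                   static_cast<std::int64_t>(std::floor(p[2] / voxel))};
    };
    std::map<Key, std::vector<std::size_t>> cells;
    for (std::size_t i = 0; i < pts.size(); ++i) cells[key(pts[i].xyz)].push_back(i);

    std::vector<std::array<double, 3>> next(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i) {
        next[i] = pts[i].xyz;
        std::array<double, 3> acc{};
        double w = 0.0;
        for (std::size_t j : cells[key(pts[i].xyz)]) {
            if (pts[j].frame == pts[i].frame) continue;  // consensus of the OTHER frames only
            for (int d = 0; d < 3; ++d) acc[d] += pts[j].opacity * pts[j].xyz[d];
            w += pts[j].opacity;
        }
        if (w > 0.0)
            for (int d = 0; d < 3; ++d) next[i][d] += step * (acc[d] / w - pts[i].xyz[d]);
    }
    for (std::size_t i = 0; i < pts.size(); ++i) pts[i].xyz = next[i];  // synchronous update
}
```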
Golden tests pin both (consensus_refine de-ghost, best=one-frame-per-voxel);
fuzz_pose drives the refine functions + all three fuse modes on garbage. ctest
-LE model green, fuzz clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ine knobs
make_accumulate_demo.sh now passes --refine (gaussian-level consensus de-ghost) by
default (REFINE=0 to disable) and the live demo ends on a best-frame fused surface.
web/README documents the three fuse modes (kept/best/averaged) and the de-ghosting
(why gaussian-level consensus works where rigid pose-BA doesn't).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…v2 reference)
Quantifies how well a 2-view pair constrains depth, two independent ways, to flag
pairs whose reconstruction can't be trusted (and to detect the model inventing
depth the images don't support).
- pose::parallax_stats / pair_parallax (src/pose.{h,cpp}): from the model's OWN
recovered geometry (estimate_poses, normalize=false so cameras+points share the
view-0 frame) compute the median triangulation angle, the baseline angle off the
optical axis (0=dolly/no-parallax, 90=strafe/ideal), and baseline/median-depth.
All angles scale-invariant.
- C-API free_splatter_pair_parallax + free_splatter_parallax (include/free_splatter.h,
src/free_splatter.cpp); CLI `--parallax MODEL {img0 img1 | pair.f32}`.
- tests/test_pose.cpp test_parallax_geometry: golden on synthetic cameras — a
strafe baseline gives lateral=90 / tri=atan(B/Z); a dolly of the SAME length
gives ~0. Pins the metric measures depth-resolving motion, not raw displacement.
- scripts/parallax_ref.py: dev-time INDEPENDENT reference (cv2, nix devShell only,
never shipped). Feature matches -> ORB-SLAM homography-vs-fundamental R_H
(calibration-free degeneracy) + essential-matrix median triangulation angle.
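This is a minimal sketch of the two angles pair_parallax reports, computed for one point with camera centres in the same frame. The PR takes a median over points and also reports baseline/median-depth. The sketch assumes camera 0's optical axis is given as a vector. It folds the baseline angle into [0, 90], matching the golden's convention: a strafe reads 90 and a dolly of the same length reads ~0. Names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

static double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
static Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }

// Angle between two non-zero vectors in degrees (acos argument clamped for round-off).
static double angle_deg(const Vec3& a, const Vec3& b) {
    const double kPi = std::acos(-1.0);
    const double c = dot(a, b) / std::sqrt(dot(a, a) * dot(b, b));
    return std::acos(std::clamp(c, -1.0, 1.0)) * 180.0 / kPi;
}

// Triangulation angle at point X between the rays from camera centres C0 and C1.
double triangulation_deg(const Vec3& C0, const Vec3& C1, const Vec3& X) {
    return angle_deg(sub(X, C0), sub(X, C1));
}

// Baseline direction vs camera 0's optical axis, folded to [0, 90]:
// 0 = pure dolly (no depth-resolving parallax), 90 = pure strafe.
double lateral_deg(const Vec3& C0, const Vec3& C1, const Vec3& optical_axis0) {
    const double a = angle_deg(sub(C1, C0), optical_axis0);
    return std::min(a, 180.0 - a);
}
```

For an on-axis point at depth Z and a strafe of length B, the triangulation angle is atan(B/Z). A dolly of the same length gives 0 for that point, because both rays lie along the axis.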
Validation (re10k ladder, focal matched to the model): on the well-conditioned
pair the two agree to 0.3 deg (after 15.6 vs before 15.9); on near-degenerate
pairs the model over-reports parallax 2-4x (after 7.0/4.5 vs independent 1.7/1.7),
i.e. it hallucinates depth the images can't support -- exactly what the cross-check
is for. Wide-baseline orbits (Truck) starve sparse matching, so the geometric
reference is only reliable on well-textured moderate-baseline pairs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fold a candidate frame into the world only if its triangulation angle vs the last KEPT
frame clears DEG degrees (free_splatter_pair_parallax) -- else its depth is
ill-conditioned and the model would invent it. Implemented as keyframe selection: the
next candidate is re-paired against the last kept frame (not the immediate
predecessor), so skipping a frame re-anchors instead of breaking the Sim(3) chain.
Threshold 0 degenerates to consecutive pairs (no behavior change). Image mode only
(re-pairing needs the engine); .f32 dump mode warns and accumulates all, since fixed
pairs can't be re-anchored.
Verified on the re10k demo frames: (f0000,f0020)=15.6 deg kept, (f0020,f0040)=7.0
skipped, then f0060 re-paired against f0020 = 15.7 deg kept -- the degenerate middle
frame dropped, the world kept to well-conditioned views only.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
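The keyframe rule reduces to a small loop: each candidate is compared with the last KEPT frame, not with its predecessor. In this minimal sketch the parallax measurement is abstracted as a callback. In the PR it is an engine run plus free_splatter_pair_parallax; the callback signature here is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Indices of the frames kept by the parallax gate. Frame 0 is the anchor; each later
// candidate is measured against the last kept frame, so a skipped frame re-anchors the
// next comparison instead of breaking the chain.
std::vector<int> select_keyframes(int n_frames, double min_deg,
                                  const std::function<double(int, int)>& parallax_deg) {
    std::vector<int> kept;
    if (n_frames <= 0) return kept;
    kept.push_back(0);
    for (int cand = 1; cand < n_frames; ++cand) {
        // min_deg <= 0 keeps every frame, i.e. plain consecutive pairs.
        if (min_deg <= 0.0 || parallax_deg(kept.back(), cand) >= min_deg) kept.push_back(cand);
    }
    return kept;
}
```

Feed it the angles quoted above (15.6, then 7.0, then 15.7 for f0060 against f0020) with an 8-degree gate. It keeps frames 0, 1 and 3, dropping f0040 as the commit describes.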
make_accumulate_demo.sh now passes --min-parallax (MIN_PARALLAX, default 8 deg; 0
disables) and --fuse-mode best, and builds the manifest from the frames the gate
actually KEPT (parsing the keep/skip log): frame 0 is the anchor, each "keep frame J"
adds input frame J, and step acc_n shows that step's n kept thumbnails. So you can feed
a long dense frame stream and the gate curates the well-conditioned subset instead of
folding in frames whose depth the model would invent.
Verified end-to-end on the 7-frame re10k loop: gate kept f0000/f0020/f0060 and dropped
f0040 (7 deg), f0080 (2.4), f0100 (4.8), f0120 (6.6) -- the walkthrough stops
translating laterally after f0060, so only three views carry depth.
web/README documents MIN_PARALLAX (after-inference angle, over-reports -> keep well
above COLMAP's 1-2 deg; parallax_ref.py is the independent cross-check) and folds in
the earlier honest de-ghost rewrite (best-frame selection; --refine off, why).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop a clip under demo-vids/<name>/ (mirroring the demo-photos/ convention) and this
samples it to frames, lets the --min-parallax gate curate the well-conditioned
keyframes, and bakes a growing-reconstruction demo (make_accumulate_demo.sh) to
.cache/demo/<name>/. Time-lapse / slow-pan clips are the sweet spot: the gate drops the
near-duplicate tight frames and keeps ~10-14deg-parallax steps.
Verified on two clips: flower-bed (81 frames -> 21 sampled -> 11 kept, a clean 10-step
growing flower bed) and office-corner (9 -> 5 kept, 4 steps).
gitignore the demo-vids/ drop folder like demo-photos/ (user media, not committed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A user could silently land on the ~50x slower CPU path. Now the library default
device is "vulkan" (src/options.h) and an empty/unset device resolves to vulkan
(free_splatter.cpp), so a caller who never sets one fails-closed if no GPU is
present rather than running on CPU. backend.cpp already errors for an explicit
vulkan request with no device — that path is now the default.
- CLI forwards the library default (vulkan); on a load failure with no explicit
--device it prints guidance ("pass --device cpu to run on CPU"). Usage updated.
- bench: device label no longer assumes cpu when unset.
- make_accumulate_demo.sh: DEVICE defaults to vulkan and resolves/builds the vulkan
CLI (build/vulkan/bin/free_splatter-cli) like serve.sh builds the .so; DEVICE=cpu
+ the release CLI stays as the explicit CPU escape (headless/CI bakes).
Tests are unaffected (every test passes an explicit device; default FREE_SPLATTER_DEVICE
is cpu). Verified: CPU build with no --device fails closed with guidance; --device
cpu runs; ctest -LE model green.
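The fail-closed resolution reads roughly as follows. This is a Python sketch of the policy only; the actual logic is C++ (src/options.h, free_splatter.cpp, backend.cpp), and `resolve_device`, `load_engine`, `DeviceUnavailable` and the `available` set are hypothetical stand-ins.

```python
DEFAULT_DEVICE = "vulkan"  # library default per this commit


class DeviceUnavailable(RuntimeError):
    """Raised instead of silently falling back to another device."""


def resolve_device(requested):
    """Empty/unset resolves to the library default (vulkan), never to cpu."""
    return requested if requested else DEFAULT_DEVICE


def load_engine(requested, available):
    """Fail closed: an unavailable device is an error, not a fallback."""
    device = resolve_device(requested)
    if device not in available:
        msg = f"device '{device}' not available"
        if not requested:
            # Only callers who never chose a device get the CPU hint.
            msg += " (pass --device cpu to run on CPU)"
        raise DeviceUnavailable(msg)
    return device
```

CPU then stays an explicit opt-in: `load_engine("cpu", {"cpu"})` works, while `load_engine(None, {"cpu"})` raises with guidance.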
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…PU bake)
The accumulating-reconstruction demos were separate static dirs on separate ports;
now the web app switches between them and can generate a new one from a video in
the UI.
- /api/scenes lists the growing-world scenes (server/scenes.go scanScenes: any
scenesDir subdir whose manifest.json has a "steps" array — the storyboard's
"stations" manifest is excluded). Served at /scenes-assets/. New `-scenes-dir`
flag (defaults to -demo-dir). No in-memory registry: re-scanned per request, so
externally-baked and uploaded scenes appear with no restart.
- POST /api/scene/from-video (+ GET /api/scene/status/{job}): samples frames
(ffmpeg), runs the keyframe parallax gate + accumulate + best-frame fuse
IN-PROCESS on the loaded GPU engine, and writes the scene dir incrementally.
Async job + polling; one bake at a time (429 if busy). engine.go binds the
accumulator + pair_parallax C API (with point/parallax ABI-mirror structs and
copy helpers); convert.go adds encodeCloudSplat (mirrors write_cloud_splat +
splat.h — note the cloud colour is already linear, NOT SH-transformed).
- server/web/accumulate.html: the viewer gains a scene dropdown + a "make scene
from video" upload widget with live progress. It is now the SINGLE source (moved
from web/; standalone baked dirs fall back to ./manifest.json when /api/scenes is
absent). make_accumulate_demo.sh copies it from server/web. index.html links to it.
Verified end-to-end on Vulkan (AMD RADV): /api/scenes lists the 6 baked scenes;
uploading office-corner1.mp4 baked a 5-step scene (gate kept 5/9, same as the
offline bake-vids.sh) that renders identically to the CLI result and appears in the
dropdown live. Go build + vet clean; ctest -LE model green.
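A Python sketch of the scan rule behind /api/scenes (the real scanScenes is Go, in server/scenes.go): a subdirectory counts as a scene only if its manifest.json parses and has a "steps" array, so the storyboard's "stations" manifest is skipped. Sorting by name and silently skipping unreadable manifests are assumptions of this sketch.

```python
import json
from pathlib import Path


def scan_scenes(scenes_dir):
    """Names of subdirectories whose manifest.json has a "steps" array."""
    root = Path(scenes_dir)
    if not root.is_dir():
        return []
    scenes = []
    for sub in sorted(root.iterdir()):
        manifest = sub / "manifest.json"
        if not sub.is_dir() or not manifest.is_file():
            continue
        try:
            data = json.loads(manifest.read_text())
        except (OSError, ValueError):  # unreadable or not JSON: not a scene
            continue
        if isinstance(data, dict) and isinstance(data.get("steps"), list):
            scenes.append(sub.name)
    return scenes
```

Because nothing is cached, calling this on every request is what lets externally baked or uploaded scenes appear without a restart.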
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pages were already one server, but / was the reconstruct page and the baked accumulating demos were being served as separate static instances on other ports. Now / is a landing menu and everything is reachable from this one server:
- server/web/index.html: new menu whose cards link to /reconstruct.html (photos -> splat), /accumulate.html (accumulating scenes + video upload), /demo.html (storyboard).
- the reconstruct page moves to /reconstruct.html (was index.html); each page gets a "‹ all demos" link back to the menu.
- the standalone per-scene static servers are obsolete: those scenes are listed by /api/scenes and switched in /accumulate.html, so there's nothing left to run on a separate port.
Verified on the single Vulkan server: /, /reconstruct.html, /accumulate.html, /demo.html, /api/scenes all 200; the menu renders three cards linking the pages.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a batch hierarchical (balanced binary tree) alternative to the linear
Sim(3) accumulator, to cut the registration drift that smears the far end of a
long photo stream. Adjacent submaps of `block` frames overlap by `overlap`
frames and each merge fits one Sim(3) over all shared frames at once, so the
fit is over-determined and per-frame depth noise averages out — drift compounds
over ~log2(N) hops instead of N. block=2,overlap=1 is the plain overlap-by-one
tree.
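The tree shape can be sketched as follows, assuming leaves start every `block - overlap` frames and adjacent nodes are paired left to right, with an odd node carried up unmerged. `leaf_blocks` and `merge_levels` are hypothetical helpers, not tree_accumulate_overlap itself. The sketch shows why every merge has shared frames to fit on (a merged node contains its right child, which still overlaps the next leaf) and why the number of levels is ceil(log2(leaves)).

```python
def leaf_blocks(n_frames, block, overlap):
    """Leaves of `block` frames; adjacent leaves share `overlap` frames."""
    if n_frames < 1 or not 1 <= overlap < block:
        raise ValueError("need n_frames >= 1 and 1 <= overlap < block")
    stride = block - overlap
    leaves, start = [], 0
    while True:
        end = min(start + block, n_frames)
        leaves.append(list(range(start, end)))
        if end >= n_frames:
            return leaves
        start += stride


def merge_levels(leaves):
    """Merge adjacent nodes level by level.

    Returns (levels, root): levels[k] lists the shared frames used by each
    merge at level k; root is the frame set of the final node.
    """
    levels, nodes = [], leaves
    while len(nodes) > 1:
        merged, shared = [], []
        for i in range(0, len(nodes) - 1, 2):
            a, b = nodes[i], nodes[i + 1]
            shared.append(sorted(set(a) & set(b)))  # frames one Sim(3) is fitted on
            merged.append(sorted(set(a) | set(b)))
        if len(nodes) % 2:
            merged.append(nodes[-1])  # odd node carried up unmerged
        levels.append(shared)
        nodes = merged
    return levels, nodes[0]
```

For 17 frames with block=2, overlap=1 there are 16 leaves and 4 merge levels, so any frame is at most 4 Sim(3) hops from the root instead of up to 16 in a linear chain.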
- Engine (src/pose.{h,cpp}): tree_accumulate_overlap + consensus_fuse of an
arbitrary frame-tagged cloud (best-frame mode de-ghosts the overlapping
per-frame copies); shared fit_band_sim RANSAC registration primitive.
- C API: free_splatter_tree_overlap (full merge or staged side-by-side via
block/overlap/max_levels/layout_spacing/per_node_cap) + free_splatter_fuse_cloud.
- CLI: --accumulate-tree (--tree-block/--tree-overlap, --fuse) and --tree-stages,
which bakes one laid-out .splat per merge level + a consensus-fused final stage
and a manifest the viewer steps through.
- Viewer (accumulate.html): vibrance/saturation slider (?sat= override), ported
from reconstruct.html; reframes per stage for tree-stages manifests.
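Each merge's single Sim(3) can be fitted in closed form with the Umeyama least-squares similarity, stacking the correspondences from every shared frame into one fit so per-frame noise averages out. This sketch assumes the two submaps give pixel-aligned points for each shared frame (row i of both arrays is the same pixel) and omits the RANSAC outlier rejection that the engine's fit_band_sim performs; `umeyama_sim3` and `fit_merge` are illustrative names.

```python
import numpy as np


def umeyama_sim3(src, dst):
    """Least-squares similarity with dst ~= s * R @ src + t (Umeyama 1991).

    src, dst: (N, 3) corresponding points, N >= 3 and not collinear.
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    var_s = (xs ** 2).sum() / len(src)
    cov = xd.T @ xs / len(src)  # cross-covariance of dst with src
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # reflection guard: keep R a proper rotation
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


def fit_merge(cloud_a, cloud_b, shared):
    """One Sim(3) mapping submap B into submap A over ALL shared frames at once.

    cloud_a, cloud_b: dict frame -> (P, 3) pixel-aligned points of that frame.
    """
    src = np.concatenate([cloud_b[f] for f in shared])
    dst = np.concatenate([cloud_a[f] for f in shared])
    return umeyama_sim3(src, dst)
```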
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Recover camera poses for individual splats and merge overlapping splats using a shared
frame. For example, given three frames from a video where one frame overlaps with each of
the other two, we can build two splats that share that frame. Because the shared frame appears
in both splats, its camera pose can be used to bring both splats into a common coordinate system.
Unfortunately, the estimated depths of the Gaussians vary considerably between splats, so when
we try to merge them only a few Gaussians line up across the splats, and the rest are difficult
to reconcile. Various corrective measures have been tried with limited success.
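With camera-to-world poses, the shared-frame idea reduces to one matrix product: a point in splat B maps into the shared camera's coordinates through the inverse of that camera's B pose, then into splat A through its A pose. A minimal sketch, assuming 4x4 cam2world matrices of the same frame recovered in each splat; `align_via_shared_frame` and `transform_points` are hypothetical names.

```python
import numpy as np


def align_via_shared_frame(c2w_in_a, c2w_in_b):
    """Rigid 4x4 transform taking splat B's world coordinates into splat A's.

    B world -> shared camera: inv(c2w_in_b); shared camera -> A world: c2w_in_a.
    """
    return c2w_in_a @ np.linalg.inv(c2w_in_b)


def transform_points(T, pts):
    """Apply a 4x4 transform to (N, 3) points."""
    pts = np.asarray(pts, dtype=np.float64)
    return pts @ T[:3, :3].T + T[:3, 3]
```

This transform is rigid: one camera pose fixes rotation and translation but not the relative scale between the two splats, which has to come from the geometry itself (for example the shared frame's depths), so the depth variation between splats feeds directly into the merge.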