feat: Fuse multiple splats from video #3

Merged
richiejp merged 33 commits into master from pose/pnp-cross-run-alignment
Jul 1, 2026

@richiejp

Recover camera positions for individual splats and merge overlapping splats using a shared
frame. That is, if we have three frames from a video and one of them overlaps with both
of the other two, we can create two splats that share a frame. Because that frame appears in
both splats, its camera coordinates/pose can be used to merge the two splats' coordinate
systems.

Unfortunately, the estimated depth of the gaussians varies a lot between splats, so when we
try to merge them only a few gaussians coincide and the rest are difficult to reconcile.
Various corrective measures have been tried, with limited success.

  • pose/: downstream PnP + cross-run alignment prototype, verified vs upstream
  • pose/: dense GT-posed control (Tanks&Temples OOD, RealEstate10K validates pipeline)
  • pose/: cross-run consistency on in-distribution re10k (accumulation is clean)
  • CLAUDE.md: PnP now in scope; C++/Go-only policy, no-Python runtime
  • pose/: sliding-window accumulation prototype (the live idea, proven end-to-end)
  • pose/: loop closure -- machinery verified; real loop reveals it's not the lever
  • pose/: consensus fusion -- removes the edge-noise floaters from the accumulated cloud
  • pose/: begin the C++ port -- focal + Umeyama align + DLT/RANSAC PnP, golden-tested
  • pose/: robust C++ PnP (EPnP + Gauss-Newton) -- cv2-parity on real scenes
  • pose/: C++ accumulation chaining (Accumulator) -- one world from a photo stream
  • pose/: validate C++ loop closure (sim4_invert + distribute_drift) vs Python
  • pose/: validate C++ consensus fusion (consensus_fuse) vs Python fuse.py
  • CLI + C-API: pose recovery + accumulation surface (no Python)
  • pose/: delete the Python prototype -- fully ported to C++ and shipped
  • fuzz: pose C-API + image-decode harnesses; fix a SIGFPE the fuzzer found
  • demo: accumulating-reconstruction viewer (cloud grows as photos are added)
  • pose: carry gaussian scale + rotation through accumulation (render as splats)
  • demo: add consensus-fused step + accumulate from cached .f32 pair dumps
  • demo: make the consensus-fused step unmistakable + defeat stale caching
  • demo: tighter framing + baseline guidance (forward-dolly clips reconstruct blurry)
  • pose: carry opacity into the splat alpha (fix the opaque, swirling, blurry cloud)
  • splat: unify the .splat encoder + pin it with regression tests (#1, #2)
  • fuse: add dense "kept" mode (fix the sparse fused scene)
  • pose: de-ghost the accumulated cloud -- gaussian-level consensus_refine + best-frame fusion
  • demo: de-ghost the bake by default (--refine) + document the fuse/refine knobs
  • pose: parallax estimation (after-inference C++ metric + independent cv2 reference)
  • cli: --min-parallax keyframe gate for accumulate
  • demo: wire --min-parallax gate into the bake (+ honest de-ghost docs)
  • demo: auto-bake videos from demo-vids/ (scripts/demo/bake-vids.sh)
  • device: default to GPU/Vulkan, fail-closed; CPU is explicit opt-in
  • server: scene switcher + upload-a-video-to-make-a-scene (in-process GPU bake)
  • server: / is a menu of the demo pages (one server, one port)
  • Tree-merge accumulation + vibrance slider for the accumulate demo

richiejp and others added 30 commits June 26, 2026 15:28
…stream

A self-contained, pure-Python prototype DOWNSTREAM of the engine seam
([N,H,W,23]); deliberately OUTSIDE the validated src/ and NOT wired into
CMake/ctest. It prototypes live, accumulating reconstruction from a moving
camera: recover each view's camera (PnP) and align successive runs.

- focal.py / pnp.py: faithful port of FreeSplatter's scene estimate_poses --
  Weiszfeld shared focal (view 0 only, all pixels; use_first_focal), integer
  pixels, cv2.solvePnPRansac(SQPNP, reprojErr=5, iters=10), cam2world=inv(w2c),
  and the runner's 1/baseline camera rescale. numpy DLT+RANSAC fallback runs
  without cv2.
- align.py: Umeyama similarity fit + RANSAC + a residual ladder (diagnose)
  that detects whether cross-run mismatch is a uniform-scale similarity or a
  nonlinear warp; plus similarity chaining and loop-closure metrics.
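
The shared-focal step listed above can be sketched as follows. This is a minimal illustration that assumes the usual Weiszfeld formulation (iteratively reweighted least squares on (u, v) ~ f * (x/z, y/z)); `FocalSample` and `estimate_focal_weiszfeld` are hypothetical names, not the prototype's focal.py API, and the 10 iterations and the 1e-8 clip are illustrative defaults.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One correspondence: (u, v) is the pixel position measured from the principal
// point, (x, y, z) the predicted point in that view's camera frame.
struct FocalSample { double u, v, x, y, z; };

// Illustrative sketch only (not the project's estimate_focal): robust focal by
// Weiszfeld-style iteratively reweighted least squares on (u,v) ~ f * (x/z, y/z).
double estimate_focal_weiszfeld(const std::vector<FocalSample>& samples, int iters = 10) {
    struct Term { double px, py, u, v; };
    std::vector<Term> terms;
    for (const FocalSample& s : samples) {
        if (!(s.z > 0.0)) continue;  // drop points behind the camera (and NaN depth)
        terms.push_back({s.x / s.z, s.y / s.z, s.u, s.v});
    }
    assert(!terms.empty());

    // Weighted closed-form solve; with weighted == false it is plain least squares.
    auto solve = [&terms](double focal, bool weighted) {
        double num = 0.0, den = 0.0;
        for (const Term& t : terms) {
            double w = 1.0;
            if (weighted) {
                const double r = std::hypot(t.u - focal * t.px, t.v - focal * t.py);
                w = 1.0 / std::max(r, 1e-8);  // L1 reweighting, clipped near zero
            }
            num += w * (t.px * t.u + t.py * t.v);
            den += w * (t.px * t.px + t.py * t.py);
        }
        return den > 0.0 ? num / den : 0.0;  // 0 signals "no usable points"
    };

    double focal = solve(0.0, false);                            // least-squares start
    for (int i = 0; i < iters; ++i) focal = solve(focal, true);  // robust refinement
    return focal;
}
```

The reweighting by 1/residual is what makes the estimate tolerant of a minority of badly predicted pixels, which a plain least-squares focal is not.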

Verification (all green):
- check_cv2_parity.py  -- numpy solver == exact cv2.solvePnPRansac on synthetic
  ground truth (~1e-7 clean).
- check_upstream_parity.py -- our WHOLE orchestration vs upstream estimate_poses
  on REAL engine output. Caught and fixed five divergences (the costly one:
  focal averaged over a low-overlap 2nd view gave 507 vs the correct 596, ~15%
  -> 1.35 deg pose error). Now bit-exact (0.00 deg) on 2 scene + 2 object dumps;
  the only residual is upstream's float32 K vs our float64 (a RANSAC inlier-
  boundary precision effect, <=0.5 deg on near-degenerate object data, 0 on
  scene), root-caused and documented, not papered over.
- test_pose.py -- 28 asset-free golden tests (no model/fixtures/cv2).

Empirical finding: cross-run mismatch is a uniform-scale similarity (~11% scene
scale drift), no nonlinear warp -- a 7-DoF similarity is the right alignment
model; diagnose() flags it if that ever changes.

flake.nix: add opencv4 (cv2) for the exact-upstream PnP path; numpy-only
fallback otherwise. Not needed by the engine build/test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ates pipeline)

A known-good, ground-truth-posed control so a live-path failure can be attributed
to the data/model vs our code -- and a reusable engine-vs-GT harness.

- tt_control.py / tt_experiment.py: Tanks-and-Temples (NSVF) loader (OpenGL->OpenCV
  poses, baseline-vs-stride report) + engine-vs-GT check. Verdict: T&T is OUT OF
  DISTRIBUTION for FreeSplatter-scene (narrow-FOV object orbits) -- opacity confident
  on only 8-17% of pixels, pose error 28-145deg. Kept as harness + negative result.
- re10k_control.py / re10k_fetch.py / re10k_experiment.py: RealEstate10K loader
  (parser + GT geometry, GT focal_512 = fy*512), yt-dlp/ffmpeg frame fetch with a
  dead-video skip, and the engine-vs-GT check. IN DISTRIBUTION -> the control works:
  relative pose recovered to 0.4-1.5deg vs INDEPENDENT GT, opacity confident on
  68-75% of pixels. Validates our PnP beyond the upstream parity, and confirms the
  re10k camera convention.

Findings: (1) the model has a CONSTANT wide-FOV focal bias (recovers ~274 vs GT
~439, ~37%) -- benign for relative accumulation (consistent across runs), off for
metric scale; (2) pose error scales ~linearly (~25% of baseline rotation), no
degeneracy even at ~1.4deg baselines on texture-rich interiors.

Data (T&T ~1.5GB, re10k poses ~720MB, fetched frames) stays under .cache/ and the
scratchpad -- gitignored, never committed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s clean)

re10k_crossrun.py: a frame shared by two overlapping pairs (m-s,m) and (m,m+s) is
reconstructed twice in two coordinate systems; fit a robust similarity between the
two and read consistency + the residual ladder, swept over baseline stride. Mirrors
empirical.py but on in-distribution data where geometry is good.

Result (shared frame 120): cross-run consistency is HIGH and best at small
baseline -- stride 20 (0.41deg): 65% of pixels agree within 2% of scene extent,
98% within 10%; stride 80 (6deg): 46% / 95%. Versus the OOD doll's 7% / 21%. The
mismatch is a clean uniform-scale similarity everywhere (sim->affine buys ~0.1%,
verdict similarity_plus_noise); per-step scale drift is small and grows with
baseline (1.7% at 0.4deg, 11% at 6deg).
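
The "N% of pixels agree within X% of scene extent" figures above can be read as the following measurement. This is a sketch under assumptions: `agreement_fraction` is a hypothetical helper, the second cloud is taken to be already mapped into the first run's coordinates by the fitted similarity, and the scene extent is passed in rather than computed because its exact definition is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3 { double x, y, z; };

// Fraction of corresponding points that agree to within tol_fraction * extent.
// a[i] and b[i] are the same pixel reconstructed by two runs, with b already
// expressed in a's coordinate system.
double agreement_fraction(const std::vector<Point3>& a, const std::vector<Point3>& b,
                          double extent, double tol_fraction) {
    assert(a.size() == b.size() && extent > 0.0);
    if (a.empty()) return 0.0;
    const double tol = tol_fraction * extent;
    std::size_t agree = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double dx = a[i].x - b[i].x, dy = a[i].y - b[i].y, dz = a[i].z - b[i].z;
        if (std::sqrt(dx * dx + dy * dy + dz * dz) <= tol) ++agree;
    }
    return static_cast<double>(agree) / static_cast<double>(a.size());
}
```

Calling it with `tol_fraction` 0.02 and 0.10 would give the two numbers quoted per stride.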

Takeaways for the live path: (1) the small-baseline hypothesis holds -- the sliding
window's small steps land in the high-consistency regime, so accumulation is mostly
clean; (2) consensus fusion is a polish for the residual ~2% floaters, not a
prerequisite; (3) registration is a 7-DoF similarity (no nonlinear warp), with slow
scale drift a sim3 pose-graph can bound.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Policy update directed by the project owner:
- PnP pose recovery and cross-run Sim(3) alignment / accumulation are now IN
  scope (previously out of scope), alongside the engine.
- Everything ships in C++; Go only for the demo web server (purego -> C API ->
  Vulkan + WebGL). The CLI and C API must have NO Python dependency at runtime.
- Python is confined to (1) dev-time reference/conversion/validation in the CUDA
  docker (hf_dump/convert/compare_taps), never a runtime dep, and (2) the pose/
  research prototype TEMPORARILY -- continued in Python only until the approach is
  proven, then rewritten in C++ and the Python deleted.
- Per-component discipline now lists the C++ `pose` component, inheriting the
  parity discipline the Python prototype established (bit-exact to upstream
  estimate_poses; validated vs independent GT poses).

pose/README.md updated to match (temporary prototype, C++ port pending).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nd-to-end)

accumulate.py assembles the validated pieces into the live pipeline (minus
realtime): slide a window over a re10k clip, recover each pair's camera (PnP), fit
a Sim(3) between consecutive runs from their SHARED-frame per-pixel correspondences
and compose into a per-run global transform (align.compose), drop every frame's
gaussians into one world (frame f_0's), and measure the recovered camera trajectory
vs ground truth. Engine dumps are cached so analysis re-runs skip inference.
render_ply.py projects the colored cloud through a pinhole camera to a PNG.
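
The composition of per-link similarities into a per-run global transform can be sketched like this. It is an illustration of the standard Sim(3) composition rule only; `Sim3`, `apply` and `compose` are hypothetical names here and are not claimed to match align.compose.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major 3x3

// Similarity transform x -> s * (R * x) + t.
struct Sim3 {
    double s;
    Mat3 R;
    Vec3 t;
};

Vec3 mat_vec(const Mat3& R, const Vec3& x) {
    Vec3 y{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) y[i] += R[i][j] * x[j];
    return y;
}

Mat3 mat_mul(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) C[i][j] += A[i][k] * B[k][j];
    return C;
}

Vec3 apply(const Sim3& T, const Vec3& x) {
    const Vec3 Rx = mat_vec(T.R, x);
    return {T.s * Rx[0] + T.t[0], T.s * Rx[1] + T.t[1], T.s * Rx[2] + T.t[2]};
}

// compose(A, B) applies B first, then A:
//   A(B(x)) = sA*sB * (RA*RB) x + (sA * RA * tB + tA)
Sim3 compose(const Sim3& A, const Sim3& B) {
    assert(A.s > 0.0 && B.s > 0.0);
    const Vec3 RtB = mat_vec(A.R, B.t);
    Sim3 C;
    C.s = A.s * B.s;
    C.R = mat_mul(A.R, B.R);
    C.t = {A.s * RtB[0] + A.t[0], A.s * RtB[1] + A.t[1], A.s * RtB[2] + A.t[2]};
    return C;
}
```

If link k maps run k's coordinates into run k-1's, then the global transform of run k would be `compose(global[k-1], link[k])`, with the first run's global transform being the identity.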

Result (13 pairs, stride 20, frames 0..260):
- per-link Sim(3) registration clean: residual ~1.0-1.4% of scene extent.
- scale drift accumulates monotonically (forward pan, no revisit): 0.755 over 12
  links, ~2.3%/link -- the monocular 1/d drift compounding.
- recovered camera trajectory tracks GT to ATE ~11% of extent (single global
  Sim(3) align); drift grows 7%->13% first->second half, worst at the endpoint --
  exactly what a Sim(3) pose-graph + loop closure bounds.
- the 2.6M-point accumulated cloud renders from camera 0 as a COHERENT room that
  matches the input frame (wider FOV due to the model's ~274-vs-439 focal bias).
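
The per-link figure follows from the cumulative one if the drift is assumed to compound multiplicatively and evenly across links. A small sketch of that arithmetic (the function names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Per-link scale factor implied by a cumulative scale over `links` links,
// assuming every link contributes the same multiplicative factor.
double per_link_scale(double cumulative_scale, int links) {
    assert(cumulative_scale > 0.0 && links > 0);
    return std::pow(cumulative_scale, 1.0 / links);
}

// The same quantity expressed as a percentage change per link.
double per_link_drift_percent(double cumulative_scale, int links) {
    return 100.0 * std::fabs(1.0 - per_link_scale(cumulative_scale, links));
}
```

With the numbers above, `per_link_drift_percent(0.755, 12)` comes out at about 2.3, matching the quoted ~2.3%/link.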

The accumulating-reconstruction idea is proven; per CLAUDE.md the next
implementation step is the C++ port (CLI + C API), after which this Python goes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… the lever

find_loop.py searches re10k poses for a clip that revisits its start; loop_closure.py
chains open-loop, measures the loop error via a closing pair (f_0,f_n), distributes it
by even Sim(3) relaxation (D^(k/n)), and reports ATE before/after. Sim(3) 4x4 helpers
(sim_matrix, sim_frac_power) promoted into align.py; golden test added.

Honest result on a real loop clip (camera out to 2.29 and back to 0.23):
- the open-loop chain ALREADY closes the loop (loop error 4.4deg / scale 1.12 / 8%
  trans) -- there is almost no accumulated drift to distribute.
- the dominant ~34% ATE is per-link ODOMETRY NOISE (Sim(3) inlier% as low as 17-24%
  on the fast outbound leg) plus the model's focal-bias warp: self-consistent (the
  loop closes) but distorted vs GT. Loop closure can't fix that; naive distribution
  slightly hurts.
- diagnosis confirmed not a bug: the correction recovers SYNTHETIC uniform
  accumulated drift to ~0 (test_pose.py::test_loop_correction, ATE 1e-15).

Lesson: loop closure pays off on LONG trajectories with consistent accumulated drift
(cf. the forward clip's monotone 7%->13%); for short loops the lever is better
odometry -- smaller baselines, consensus fusion, and the focal bias.

Golden suite green (test_loop_correction added).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ccumulated cloud

fuse.py answers the question that started this thread ("around the edges there's a
lot of noise -- does accumulation remove it?"). Each per-pixel gaussian is
partner-view-dependent, so occlusion-edge / depth-ambiguous points are floaters; a
real surface point is reconstructed by several overlapping frames and they agree in
the global frame. So: voxelize the accumulated cloud at the consistency scale (~2%
of extent) and keep only voxels corroborated by >= K distinct frames, averaging the
agreeing predictions (which also denoises the surface). Reuses cached engine dumps
(no new inference).
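
The voxel-consensus idea described above can be sketched as below. This is an assumption-laden illustration, not fuse.py: `CloudPoint`, `FusedPoint` and `fuse_by_consensus` are hypothetical names, only positions are averaged, and an ordered map stands in for whatever grid structure the real code uses.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

struct CloudPoint { float x, y, z; int frame; };
struct FusedPoint { float x, y, z; int frames; };  // frames = distinct supporting frames

// Illustrative sketch: voxelize the cloud, keep voxels seen by at least
// min_frames distinct frames, and emit the mean position of each kept voxel.
std::vector<FusedPoint> fuse_by_consensus(const std::vector<CloudPoint>& cloud,
                                          float voxel, int min_frames) {
    assert(voxel > 0.0f && min_frames >= 1);
    struct Cell { double sx = 0, sy = 0, sz = 0; std::size_t n = 0; std::set<int> frames; };
    std::map<std::array<std::int64_t, 3>, Cell> grid;

    auto coord = [voxel](float v) {
        // Clamp in double before the cast so huge finite inputs stay well defined.
        const double c = std::floor(static_cast<double>(v) / voxel);
        return static_cast<std::int64_t>(std::max(-1e12, std::min(1e12, c)));
    };

    for (const CloudPoint& p : cloud) {
        if (!std::isfinite(p.x) || !std::isfinite(p.y) || !std::isfinite(p.z)) continue;
        const std::array<std::int64_t, 3> key = {coord(p.x), coord(p.y), coord(p.z)};
        Cell& cell = grid[key];
        cell.sx += p.x; cell.sy += p.y; cell.sz += p.z;
        cell.n += 1;
        cell.frames.insert(p.frame);
    }

    std::vector<FusedPoint> fused;
    for (const auto& entry : grid) {
        const Cell& cell = entry.second;
        const int support = static_cast<int>(cell.frames.size());
        if (support < min_frames) continue;  // single-frame voxels are treated as floaters
        const double inv = 1.0 / static_cast<double>(cell.n);
        fused.push_back({static_cast<float>(cell.sx * inv), static_cast<float>(cell.sy * inv),
                         static_cast<float>(cell.sz * inv), support});
    }
    return fused;
}
```

With a voxel size of about 2% of the scene extent and `min_frames` of 2, this corresponds to the K>=2 setting reported in the result below.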

Result (forward clip, 14 frames, K>=2): 46% of voxels are single-frame, holding
14% of points -- these render as INCOHERENT EDGE-HAZE (floaters + swept-volume
periphery). The >=2-frame consensus (86% of points) renders as a CLEAN, CRISP room
with the haze gone. Definitive yes: accumulation + consensus fusion removes the edge
noise. Honest tradeoff: dropping single-frame points also trims the single-view
periphery (coverage vs cleanliness).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…golden-tested

Start porting the proven Python pose prototype (focal.py + align.py + pnp.py) to
shipped C++, dependency-free per CLAUDE.md (only a self-contained Jacobi
eigensolver -- no Eigen, no OpenCV), wired into the library and the asset-free
test tier.

  src/linalg.h   small dense linear algebra: symmetric cyclic-Jacobi eigensolver,
                 3x3 SVD (via MᵀM), det/inv, 4x4 rigid inverse. Everything the
                 pose math needs reduces to the eigensolver.
  src/pose.{h,cpp}
                 estimate_focal (Weiszfeld); fit_similarity (Umeyama) + RANSAC +
                 residual-ladder/diagnose; Sim(3) compose/invert/sim_matrix/
                 loop_closure_error; sim_frac_power (closed-form one-parameter
                 subgroup, no complex eig); solve_pnp_numpy (DLT via the 12x12 AᵀA
                 nullspace + cheirality decode + RANSAC); estimate_poses (scene
                 recipe: view-0 all-pixel focal, per-view opacity-masked PnP,
                 optional baseline rescale).
  tests/test_pose.cpp
                 the asset-free mirror of pose/test_pose.py -- 9 golden tests
                 (similarity roundtrip, scale/nonlinear detection, RANSAC
                 outliers, loop correction/closure, focal, PnP recovery/outliers).
                 All green under the debug (ASan/UBSan) preset; ctest -LE model.

Cross-checked against the Python reference on a real scene dump: focal is
bit-exact (596.408591886). PnP is correct on clean data, but on real scenes the
DLT solver inherits the textbook planar/mirror degeneracy (3/5 RANSAC seeds match
cv2's ~57deg relative rotation, 2/5 flip) -- the same instability that made the
prototype use cv2 (SQPNP) for all real-data results. Next: a robust in-house PnP
(EPnP/SQPNP + Gauss-Newton refine) for cv2-parity with no OpenCV dependency, then
the accumulation/loop-closure/fusion chaining and the CLI / C-API surface.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The DLT/RANSAC solver inherits the planar/mirror degeneracy on real dumps
(seed-dependent: ~3/5 seeds near cv2, ~2/5 a ~135-152deg flip). Add the shipped
real-data solver, solve_pnp:

  * EPnP (Lepetit/Moreno-Noguer/Fua) for the init -- barycentric control points,
    the camera-frame control points from the 12x12 MᵀM null space (the same Jacobi
    eigensolver), beta solve for N=1,2,3 with cheirality sign-fix, R,t via the
    rigid Umeyama on the 4 control points, best-N by reprojection. Non-iterative,
    uses ALL points (no random minimal samples -> no seed-dependent flips),
    planar-robust by construction.
  * Huber-robust Gauss-Newton reprojection refine (6-DoF left perturbation,
    J = [-[Xc]_x | I]) to polish to the reprojection minimum and downweight
    outliers -- the deterministic analogue of cv2's RANSAC+SQPNP+refine.

On the real scene dump (A_scn): deterministic across all RANSAC seeds, and within
0.73deg rotation / 0.74deg translation-direction of the upstream cv2/SQPNP -- vs
the numpy DLT's 175deg miss there. estimate_poses now uses solve_pnp; the DLT
solve_pnp_numpy stays as the asset-free golden reference.

Golden tests (all green under ASan/UBSan, ctest -LE model): exact recovery on
clean data, a near-planar slab (where the DLT's coplanar minimal samples flip),
and 15% gross-outlier rejection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oto stream

Port accumulate.py to src/pose.{h,cpp}: the sliding-window accumulating-
reconstruction loop. Accumulator::add_pair takes each consecutive pair's
[2,H,W,gc] engine output, recovers the pair's cameras (estimate_poses), fits the
cross-run Sim(3) on the shared frame (fit_similarity_ransac), composes a global
chain, and drops every new frame's gaussians into one world. Exposes cloud(),
camera_path(), and per-link ChainLink diagnostics (scale/inlier/valid/resid).

Validation:
- Asset-free golden (test_accumulate_chain): synthetic pinhole clip with distinct
  per-run scales -> trajectory ATE 7.6e-8 of extent, per-link scale to 3e-9,
  fit residual ~1e-7. Green under ASan/UBSan.
- Real-data parity on the 13 cached pair_*.f32 dumps vs the numpy/cv2 prototype:
  cloud size bit-exact (2,633,725), per-link valid% identical (deterministic
  mask), per-link Sim(3) scale to mean 0.5% (11/12 links <1%), trajectory within
  6.6% of the cv2 chain. Residual = known RANSAC-RNG + EPnP-vs-SQPNP delta.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Python

The loop-closure machinery (sim4_invert, distribute_drift, sim_frac_power) shipped
with the Accumulator commit; this records its parity validation.

- sim_frac_power C++ closed-form == numpy eig-based to 5e-10 across f in [-0.5,1.3]
  (so distribute_drift matches the prototype's distribution to the same tolerance).
- sim4_invert is an exact similarity inverse (1e-9); golden recovers a known
  drifted loop to 4e-16.
- Real-data parity on the loopcache (13 chain pairs + close_0_260): recovered
  drift matches the prototype's loop error (scale 1.09 vs 1.12, 4.6 vs 4.4 deg),
  deterministic valid% identical. The corrected-trajectory delta is the known
  EPnP-vs-cv2 PnP backend feeding D, not the distribution math.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
consensus_fuse shipped with the Accumulator commit (one hash-grid pass over the
frame-tagged cloud: >=K distinct-frame voxels kept, agreeing predictions
averaged). This records its parity validation.

- Golden (test_consensus_fuse): exact counts on controlled synthetic support.
- Real-data parity vs fuse.py on the 13 acc dumps (voxel 0.02, K>=2): raw points
  bit-exact (2,633,725), per-point floater drop 14.0% vs 14%, raw->fused
  reduction 93.9% vs 94%, kept-voxel fraction 53.8% vs 54%. Sub-1% voxel-count
  delta is the chaining RANSAC-RNG, not the fusion math. Reproduces the
  prototype's "remove the 14% single-frame edge-haze floaters" result.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Expose the now-ported pose pipeline from the C API and the CLI, with no Python
runtime dependency (per CLAUDE.md):

C API (include/free_splatter.h):
- free_splatter_estimate_poses: recover per-view cam2world from an engine buffer.
- free_splatter_accumulator_{new,free,add_pair,frame_count,cloud,fuse,
  camera_path}: opaque sliding-window accumulator wrapping pose::Accumulator;
  add_pair takes each consecutive pair's [2,H,W,gc] engine output, returns the
  growing global cloud (free_splatter_point: xyz+rgb+frame), the consensus-fused
  cloud, and the global camera trajectory. FFI-friendly, malloc'd buffers freed
  with free_splatter_buf_free.

CLI (free_splatter-cli --accumulate):
- runs the engine over each consecutive image pair, chains the runs into one
  world, and writes PREFIX_<nframes>.splat after each pair (the evolving
  reconstruction) plus, with --fuse, a consensus-fused PREFIX_fused.splat. New
  write_cloud_splat emits the xyz+rgb cloud as small isotropic .splat gaussians.

Verified end-to-end on real frames (5 frames -> evolving 519K/779K/.. splats +
103K fused), sanitizer-clean under the debug preset; asset-free ctest tier green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per CLAUDE.md: the pose/ research prototype (focal/align/pnp + accumulate/loop/
fuse + the re10k/T&T validation harnesses) proved the accumulating-reconstruction
approach. That whole pipeline is now rewritten in C++ (src/pose.{h,cpp}), exposed
via free_splatter-cli + include/free_splatter.h with no Python, and validated
(asset-free golden tests + real-data parity recorded in the prior commits). The
prototype was a throwaway, not a parallel implementation to maintain -- so it is
removed. Git history preserves it and its layer-by-layer parity harnesses.

Updates the dangling references (CLAUDE.md, src/linalg.h, src/pose.{h,cpp}) to
point at git history instead of the deleted files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fuzz the untrusted user-facing surfaces opened by the pose work (GGUF stays
trusted, not fuzzed):

- fuzz_pose: the public pose C-API (free_splatter_estimate_poses + the
  accumulator add_pair/cloud/fuse/camera_path) on arbitrary float gaussian
  buffers (NaN/Inf/denormals) and fuzz-chosen geometry.
- fuzz_decode: the image-FILE path (arbitrary bytes -> stb_image -> crop/resize
  -> CHW), the surface a user photo crosses in the CLI/demo. stb is third-party;
  per CLAUDE.md we fuzz the boundary and would guard rather than patch a
  stb-internal trip -- none seen (31k+ runs clean).

Fixes found by fuzz_pose (our code, so fixed not guarded):
- SIGFPE: fit_similarity_ransac sampled `% N` with N=0 (an image pair with no
  overlapping valid pixels). Guard N<3 (RANSAC's minimal sample): all-inlier,
  plain fit when N>=1, else identity.
- Latent float-cast UB: consensus_fuse now skips non-finite cloud points and
  clamps the voxel-coordinate cast, so an Inf point can't make (int32)floor(NaN).
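
Both guards can be illustrated in isolation. The sketch below is not the shipped code: `FitPlan`, `plan_similarity_fit` and `voxel_index` are hypothetical names, and the choice to reject (rather than clamp) non-finite coordinates is an assumption consistent with "skips non-finite cloud points".

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// What to do with n correspondences before any random sampling happens.
enum class FitPlan { Identity, PlainFit, Ransac };

FitPlan plan_similarity_fit(std::size_t n) {
    if (n == 0) return FitPlan::Identity;  // nothing to fit, and `% n` would trap
    if (n < 3) return FitPlan::PlainFit;   // below the minimal sample: all points are inliers
    return FitPlan::Ransac;
}

// Writes the voxel index of v and returns true, or returns false for input that
// has no meaningful voxel (NaN/Inf coordinate, non-positive or NaN voxel size).
bool voxel_index(float v, float voxel, std::int32_t* out) {
    assert(out != nullptr);
    if (!std::isfinite(v) || !(voxel > 0.0f)) return false;
    const double lo = std::numeric_limits<std::int32_t>::min();
    const double hi = std::numeric_limits<std::int32_t>::max();
    // Clamp in double first: casting an out-of-range value to int32 is undefined.
    const double c = std::floor(static_cast<double>(v) / voxel);
    *out = static_cast<std::int32_t>(std::max(lo, std::min(hi, c)));
    return true;
}
```

Deciding the plan before sampling keeps the modulo out of reach of an empty correspondence set, and doing the clamp in double keeps the final cast inside the representable range.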

All four fuzzers clean; asset-free ctest tier (incl. test_pose fusion goldens)
still green under ASan/UBSan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dded)

A web demo of the live idea: feed a photo stream, watch one world assemble.

- web/accumulate.html: forks the index.html EWA splat renderer to play a SEQUENCE
  of clouds (the reconstruction from 2 photos, then 3, then 4, ...), with the
  input photos accumulating in the TOP-RIGHT filmstrip (newest highlighted) as
  each is folded in. Auto-advances + gentle auto-orbit; ?start=/auto=/ms=/spin=
  deep-link params. Seeds an unsorted draw order on each step so the cloud paints
  immediately (the depth-sort worker then refines it).
- scripts/make_accumulate_demo.sh: one engine pass over the frames via
  `free_splatter-cli --accumulate` -> acc_2.splat..acc_N.splat + input thumbnails
  + manifest.json + the viewer, a self-contained servable dir.
- CLI --splat-scale default 0.0015 -> 0.006 of extent (point clouds read as
  surfaces, not grains); documented in web/README.md.

Verified end-to-end on a RealEstate10K clip (8 frames -> 7 steps): headless
chromium/SwiftShader renders a coherent dining-room reconstruction that grows
259,702 -> 413,517 splats across the steps, filmstrip populating as designed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… splats)

The accumulated cloud kept only position+color, so the demo rendered as an
isotropic point cloud. A similarity x->s*(R@x)+t scales a gaussian's covariance by
s^2 and rotates it by R, so the shape transforms cleanly: carry it.

- AccumPoint / free_splatter_point gain scale[3] + rotation quaternion (w,x,y,z).
- Accumulator::add_pair de-interleaves the engine's scale (ch16:19) and rotation
  (ch19:23); add_view sets scale_world = T.s * scale_local and q_world =
  quat(T.R) * q_local (new mat3_to_quat / quat_mul / quat_normalize helpers, with
  a zero/NaN-quaternion -> identity guard).
- consensus_fuse averages the scale and keeps a representative orientation.
- write_cloud_splat emits the real anisotropic scale + rotation (OpenCV->OpenGL
  remap, same as the single-run write_splat); --splat-scale is now a radius
  multiplier (default 1.0), not an isotropic fraction.
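
The shape transform can be sketched as follows. For a similarity x -> s*(R@x)+t the covariance becomes s^2 * R * Sigma * R^T, so the per-axis scales (standard deviations) are multiplied by s and the orientation is pre-multiplied by R. The helper names echo the commit message, but the bodies are an illustrative sketch rather than the shipped code, and the rotation is assumed to be already available as a quaternion (the mat3_to_quat step is omitted).

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Quat = std::array<double, 4>;  // (w, x, y, z)

// Hamilton product a * b: rotate by b first, then by a.
Quat quat_mul(const Quat& a, const Quat& b) {
    return {a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3],
            a[0] * b[1] + a[1] * b[0] + a[2] * b[3] - a[3] * b[2],
            a[0] * b[2] - a[1] * b[3] + a[2] * b[0] + a[3] * b[1],
            a[0] * b[3] + a[1] * b[2] - a[2] * b[1] + a[3] * b[0]};
}

// Unit-normalize; a zero or NaN quaternion falls back to the identity.
Quat quat_normalize(const Quat& q) {
    const double n = std::sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
    if (!std::isfinite(n) || n < 1e-12) return {1.0, 0.0, 0.0, 0.0};
    return {q[0] / n, q[1] / n, q[2] / n, q[3] / n};
}

struct GaussianShape {
    std::array<double, 3> scale;  // per-axis standard deviations
    Quat rotation;                // orientation of the principal axes
};

// Carry a gaussian's shape through x -> s * (R x) + t, with R given as q_R.
// The translation t moves the mean only, so it does not appear here.
GaussianShape to_world(const GaussianShape& local, double s, const Quat& q_R) {
    assert(s > 0.0);
    GaussianShape world;
    for (int i = 0; i < 3; ++i) world.scale[i] = s * local.scale[i];
    world.rotation = quat_normalize(quat_mul(q_R, local.rotation));
    return world;
}
```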

Verified: anisotropic scale spans the engine's [1e-4,0.02] (mean axis-ratio ~29)
with per-gaussian orientations; golden test_pose green; fuzz_pose clean (the
identity-quaternion guard keeps the new quat math UB-free on garbage input).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- free_splatter-cli --accumulate now also accepts pre-computed [2,H,W,gc] .f32
  pair dumps (each one pair) instead of images, skipping the engine -- for fast
  re-bakes and fusion sweeps off cached runs. Verified byte-identical acc_8 to the
  image path on the same frames, and it runs in ~4s vs ~2min.
- make_accumulate_demo.sh passes --fuse and appends a final consensus-fused step
  to the manifest; the viewer shows a step's optional "label" (so the demo ends on
  "consensus-fused -- single-view floaters removed", 413k -> 149k splats, the
  edge-haze floaters gone).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The fused step (8th) showed the same "8 photos" panel as acc_8, so it read as a
duplicate. Now a labelled step shows a prominent top-center banner ("consensus-
fused -- single-view floaters removed -- N splats"), the stat reads "fused (8)",
and manifest/splat fetches use cache:"no-store" so an updated demo dir is never
served stale.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…truct blurry)

The default demo clip was a near-pure forward dolly (~0.15% lateral baseline of
scene depth), so two-view depth was unconstrained and the gaussians came out
blurry. Re-baking from an orbiting clip with real sideways motion (stride chosen
for ~9% baseline) reconstructs legibly; tighten the viewer framing (1.25x->0.9x of
the cloud diagonal) so it fills the view, and document the lateral-baseline
requirement in web/README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lurry cloud)

The accumulator dropped per-gaussian opacity, so write_cloud_splat emitted every
splat fully opaque (alpha 255). With "over" alpha compositing that means the
front-most splat at each pixel fully occludes the rest, so as the camera orbits
the depth order flips and the hard anisotropic ellipsoids visibly swirl/rotate --
and the over-contribution reads as blur. The single-run write_splat instead uses
the gaussian's activated opacity as the alpha (mean ~41/255), blending many low-
alpha splats into a stable smooth surface.

Fix: AccumPoint / free_splatter_point carry `opacity`; add_view stores it,
consensus_fuse averages it, and write_cloud_splat emits it as the splat alpha and
caps by importance (opacity*volume) like write_splat. On the same pair the cloud
splat is now byte-identical in alpha (1/254/mean 41) to write_splat and renders as
a clean alpha-blended surface instead of an opaque swirling soup.

test_pose green; the C-API struct gains one float (emit_points copies it).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prevent the class of regression that dropped first rotation/scale, then opacity,
from the accumulated-cloud splat writer (a second copy of the encoder that drifted
from the proven single-run one).

#1 Unify: new src/splat.h::encode_splat_record is the ONE definition of the
   OpenCV->OpenGL convention, quaternion remap, opacity->alpha and byte packing.
   Both write_splat (single-run) and write_cloud_splat (cloud) now build a
   (pos,scale,quat,rgb,opacity) tuple and call it, so they cannot diverge again.
   Verified byte-identical to the previous write_splat output (pure refactor).

#2 Pin: two asset-free tests in test_pose.cpp --
   - test_splat_record: pins the encoder bytes, incl. the exact regressed field
     (opacity 0.5 -> alpha 127, NOT a forced 255) and the rotation remap.
   - test_accumulate_channels: a one-pair (T=identity) accumulation must preserve
     every gaussian channel (xyz, SH->rgb, opacity, scale, rotation, frame); a
     dropped channel fails immediately with zero fixtures.

Both guards fail on either historical bug. ctest -LE model green.
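
The pinned opacity-to-alpha behaviour is consistent with a clamp followed by a truncating cast, which can be sketched as below. `opacity_to_alpha` is a hypothetical stand-in for the relevant part of encode_splat_record; truncation (rather than rounding) is inferred from the 0.5 -> 127 pin, and mapping NaN to 0 is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Map an activated opacity in [0, 1] to an 8-bit splat alpha.
std::uint8_t opacity_to_alpha(float opacity) {
    if (!std::isfinite(opacity)) return 0;  // assumption: garbage becomes transparent
    const float clamped = std::max(0.0f, std::min(1.0f, opacity));
    return static_cast<std::uint8_t>(clamped * 255.0f);  // truncates: 127.5 -> 127
}
```

A writer that ignores the opacity and always emits 255 is exactly the regression test_splat_record is described as catching.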

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Consensus fusion only keeps voxels seen by >= K frames; with few frames (a short
arc) only a small fraction of the scene is multiply-observed, and the existing
"averaged" output (one point per consensus voxel) then decimated that further --
so the final fused scene was very sparse (~38k of 1M points on the 4-frame demo).

Add a "kept" mode (the prototype's fuse.py --ply-kept): keep every raw gaussian
whose voxel is corroborated by >= K frames -- floaters still removed, but nothing
averaged away. On the demo that's 446k vs 38k points (12x denser), a solid surface
instead of a sparse scatter.

- consensus_fuse gains `keep_raw` (default false = averaged, unchanged);
  free_splatter_accumulator_fuse gains a `keep_raw` arg; CLI `--fuse-mode
  kept|averaged` (kept is the default for the cloud demo).
- Golden test pins both modes (averaged -> 5 voxel points, kept -> 15 raw points);
  fuzz_pose drives both modes; ctest -LE model green, fuzz clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ne + best-frame fusion

Pairwise Sim(3) chaining leaves per-object (non-rigid) misregistration: overlapping
frames show doubled objects (two lamps, a bed end offset from its top). A rigid
per-frame pose BA can't fix this (tried, measured: 0.6%->0.6%, no-op -- reverted) --
the residual isn't a camera-pose error, it's that each frame predicts an object's
position slightly differently. Two fixes that DO work:

- consensus_refine (gaussian-level, non-rigid): each iteration moves every point a
  fraction toward the opacity-weighted consensus of the OTHER frames' points in its
  coarse-to-fine voxel neighbourhood. Local + non-rigid, so spatially-varying
  ghosting collapses. On the real strong-arc cloud: ghosting 1.2% -> 0.26% (~5x);
  golden 3.7% -> 0.0%. Exposed as Accumulator::refine, free_splatter_refine_cloud /
  _accumulator_refine, CLI --refine.
- consensus_fuse gains FUSE_BEST: per consensus voxel keep only the single most-
  confident frame's gaussians (dense AND de-ghosted, no stacked copies). keep_raw
  bool -> mode int {averaged,kept,best}; CLI --fuse-mode best.

Golden tests pin both (consensus_refine de-ghost, best=one-frame-per-voxel);
fuzz_pose drives the refine functions + all three fuse modes on garbage. ctest
-LE model green, fuzz clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ine knobs

make_accumulate_demo.sh now passes --refine (gaussian-level consensus de-ghost) by
default (REFINE=0 to disable) and the live demo ends on a best-frame fused surface.
web/README documents the three fuse modes (kept/best/averaged) and the de-ghosting
(why gaussian-level consensus works where rigid pose-BA doesn't).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…v2 reference)

Quantifies how well a 2-view pair constrains depth, two independent ways, to flag
pairs whose reconstruction can't be trusted (and to detect the model inventing
depth the images don't support).

- pose::parallax_stats / pair_parallax (src/pose.{h,cpp}): from the model's OWN
  recovered geometry (estimate_poses, normalize=false so cameras+points share the
  view-0 frame) compute the median triangulation angle, the baseline angle off the
  optical axis (0=dolly/no-parallax, 90=strafe/ideal), and baseline/median-depth.
  All angles scale-invariant.
- C-API free_splatter_pair_parallax + free_splatter_parallax (include/free_splatter.h,
  src/free_splatter.cpp); CLI `--parallax MODEL {img0 img1 | pair.f32}`.
- tests/test_pose.cpp test_parallax_geometry: golden on synthetic cameras — a
  strafe baseline gives lateral=90 / tri=atan(B/Z); a dolly of the SAME length
  gives ~0. Pins the metric measures depth-resolving motion, not raw displacement.
- scripts/parallax_ref.py: dev-time INDEPENDENT reference (cv2, nix devShell only,
  never shipped). Feature matches -> ORB-SLAM homography-vs-fundamental R_H
  (calibration-free degeneracy) + essential-matrix median triangulation angle.
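
The three quantities listed above can be sketched for a 2-view pair as follows. This is a hedged illustration, not pose::parallax_stats: it assumes camera 0 sits at the origin of the view-0 frame looking down +z, takes camera 1's centre and the points in that same frame, and uses the hypothetical names `parallax_sketch` and `ParallaxSketch`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static double len(const Vec3& a) { return std::sqrt(dot(a, a)); }

// Angle between two vectors in degrees; 0 if either vector is zero-length.
static double angle_deg(const Vec3& a, const Vec3& b) {
    const double la = len(a), lb = len(b);
    if (la == 0.0 || lb == 0.0) return 0.0;
    const double c = std::max(-1.0, std::min(1.0, dot(a, b) / (la * lb)));
    return std::acos(c) * 180.0 / std::acos(-1.0);
}

// Median of a non-empty list (upper median for even sizes).
static double median(std::vector<double> v) {
    assert(!v.empty());
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    return v[mid];
}

struct ParallaxSketch {
    double tri_deg;              // median triangulation angle
    double lateral_deg;          // baseline angle off the optical axis, 0..90
    double baseline_over_depth;  // baseline length / median depth
};

ParallaxSketch parallax_sketch(const Vec3& cam1, const std::vector<Vec3>& pts) {
    const Vec3 cam0{0.0, 0.0, 0.0};
    const Vec3 axis{0.0, 0.0, 1.0};  // assumed optical axis of camera 0
    std::vector<double> tri, depth;
    for (const Vec3& p : pts) {
        tri.push_back(angle_deg(sub(p, cam0), sub(p, cam1)));  // angle between the two rays
        depth.push_back(p.z);
    }
    double lateral = angle_deg(cam1, axis);         // baseline = cam1 - cam0 = cam1
    if (lateral > 90.0) lateral = 180.0 - lateral;  // forward and backward dolly both map to 0
    const double md = median(depth);
    return {median(tri), lateral, md > 0.0 ? len(cam1) / md : 0.0};
}
```

With a single point at depth Z straight ahead, a strafe of length B gives a triangulation angle of atan(B/Z) and a lateral angle of 90, while a dolly of the same length gives 0 for both, which is the behaviour the golden test pins.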

Validation (re10k ladder, focal matched to the model): on the well-conditioned
pair the two agree to 0.3 deg (after-inference 15.6 vs independent 15.9); on
near-degenerate pairs the model over-reports parallax 2-4x (after-inference 7.0/4.5
vs independent 1.7/1.7), i.e. it hallucinates depth the images can't support --
exactly what the cross-check is for. Wide-baseline orbits (Truck) starve sparse
matching, so the geometric reference is only reliable on well-textured
moderate-baseline pairs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fold a candidate frame into the world only if its triangulation angle vs the last
KEPT frame clears DEG degrees (free_splatter_pair_parallax) -- else its depth is
ill-conditioned and the model would invent it. Implemented as keyframe selection:
the next candidate is re-paired against the last kept frame (not the immediate
predecessor), so skipping a frame re-anchors instead of breaking the Sim(3) chain.
Threshold 0 degenerates to consecutive pairs (no behavior change). Image mode only
(re-pairing needs the engine); .f32 dump mode warns and accumulates all, since
fixed pairs can't be re-anchored.

Verified on the re10k demo frames: (f0000,f0020)=15.6 deg kept, (f0020,f0040)=7.0
skipped, then f0060 re-paired against f0020 = 15.7 deg kept -- the degenerate
middle frame dropped, the world kept to well-conditioned views only.
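
The keyframe selection described above can be sketched like this. The `pair_angle` callback stands in for a call to free_splatter_pair_parallax, whose real signature is not shown here, and treating "clears" as `>=` is an assumption; `select_keyframes` is a hypothetical name.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Returns the indices of the frames the gate keeps. Frame 0 is always the anchor.
// Each candidate is paired against the last KEPT frame, not its immediate
// predecessor, so a skipped frame never becomes a link in the chain.
std::vector<int> select_keyframes(int n_frames, double min_deg,
                                  const std::function<double(int, int)>& pair_angle) {
    std::vector<int> kept;
    if (n_frames <= 0) return kept;
    kept.push_back(0);
    for (int j = 1; j < n_frames; ++j) {
        // With min_deg == 0 every candidate passes, so the pairs are consecutive.
        if (pair_angle(kept.back(), j) >= min_deg) kept.push_back(j);
    }
    return kept;
}
```

Fed the angles reported for the re10k demo frames (indices 0..3 for f0000, f0020, f0040, f0060) and an example threshold of 8 deg, the sketch keeps 0, 1 and 3: the second pair at 7.0 deg is skipped and f0060 is re-paired against f0020.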

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
make_accumulate_demo.sh now passes --min-parallax (MIN_PARALLAX, default 8 deg;
0 disables) and --fuse-mode best, and builds the manifest from the frames the gate
actually KEPT (parsing the keep/skip log): frame 0 is the anchor, each "keep frame
J" adds input frame J, and step acc_n shows that step's n kept thumbnails. So you
can feed a long dense frame stream and the gate curates the well-conditioned subset
instead of folding in frames whose depth the model would invent.

Verified end-to-end on the 7-frame re10k loop: gate kept f0000/f0020/f0060 and
dropped f0040 (7 deg), f0080 (2.4), f0100 (4.8), f0120 (6.6) -- the walkthrough
stops translating laterally after f0060, so only three views carry depth.

web/README documents MIN_PARALLAX (after-inference angle, over-reports -> keep well
above COLMAP's 1-2 deg; parallax_ref.py is the independent cross-check) and folds
in the earlier honest de-ghost rewrite (best-frame selection; --refine off, why).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop a clip under demo-vids/<name>/ (mirroring the demo-photos/ convention) and
this samples it to frames, lets the --min-parallax gate curate the well-conditioned
keyframes, and bakes a growing-reconstruction demo (make_accumulate_demo.sh) to
.cache/demo/<name>/. Time-lapse / slow-pan clips are the sweet spot: the gate drops
the near-duplicate tight frames and keeps ~10-14deg-parallax steps.

Verified on two clips: flower-bed (81 frames -> 21 sampled -> 11 kept, a clean
10-step growing flower bed) and office-corner (9 -> 5 kept, 4 steps). gitignore the
demo-vids/ drop folder like demo-photos/ (user media, not committed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A user could silently land on the ~50x slower CPU path. Now the library default
device is "vulkan" (src/options.h) and an empty/unset device resolves to vulkan
(free_splatter.cpp), so a caller who never sets one fails closed when no GPU is
present instead of silently running on CPU. backend.cpp already errors for an
explicit vulkan request with no device; that path is now the default.

- CLI forwards the library default (vulkan); on a load failure with no explicit
  --device it prints guidance ("pass --device cpu to run on CPU"). Usage updated.
- bench: device label no longer assumes cpu when unset.
- make_accumulate_demo.sh: DEVICE defaults to vulkan and resolves/builds the vulkan
  CLI (build/vulkan/bin/free_splatter-cli) like serve.sh builds the .so; DEVICE=cpu
  + the release CLI stays as the explicit CPU escape (headless/CI bakes).
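
A small sketch of the fail-closed resolution described above. The function names and the `has_vulkan` flag are hypothetical stand-ins for whatever src/options.h, free_splatter.cpp and backend.cpp actually do; only the behaviour (empty resolves to vulkan, vulkan without a device is an error, cpu must be asked for explicitly) is taken from the commit message.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// An empty/unset device resolves to the library default, "vulkan".
std::string resolve_device(const std::string& requested) {
    return requested.empty() ? std::string("vulkan") : requested;
}

// has_vulkan stands in for the backend's real device probe (an assumption).
// There is deliberately no silent fallback from vulkan to cpu.
std::string open_backend(const std::string& requested, bool has_vulkan) {
    const std::string dev = resolve_device(requested);
    if (dev == "vulkan" && !has_vulkan) {
        throw std::runtime_error("no Vulkan device; pass --device cpu to run on CPU");
    }
    return dev;
}
```

The point of the design is that the slow path is only ever reached by an explicit request, so a missing GPU surfaces as an error instead of a 50x slowdown.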

Tests are unaffected (every test passes an explicit device; default FREE_SPLATTER_DEVICE
is cpu). Verified: a CPU build with no --device fails closed with guidance; --device
cpu runs; ctest -LE model green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
richiejp and others added 3 commits June 28, 2026 11:48
…PU bake)

The accumulating-reconstruction demos were separate static dirs on separate ports;
now the web app switches between them and can generate a new one from a video in
the UI.

- /api/scenes lists the growing-world scenes (server/scenes.go scanScenes: any
  scenesDir subdir whose manifest.json has a "steps" array — the storyboard's
  "stations" manifest is excluded). Served at /scenes-assets/. New `-scenes-dir`
  flag (defaults to -demo-dir). No in-memory registry: re-scanned per request, so
  externally-baked and uploaded scenes appear with no restart.
- POST /api/scene/from-video (+ GET /api/scene/status/{job}): samples frames
  (ffmpeg), runs the keyframe parallax gate + accumulate + best-frame fuse
  IN-PROCESS on the loaded GPU engine, and writes the scene dir incrementally.
  Async job + polling; one bake at a time (429 if busy). engine.go binds the
  accumulator + pair_parallax C API (with point/parallax ABI-mirror structs and
  copy helpers); convert.go adds encodeCloudSplat (mirrors write_cloud_splat +
  splat.h — note the cloud colour is already linear, NOT SH-transformed).
- server/web/accumulate.html: the viewer gains a scene dropdown + a "make scene
  from video" upload widget with live progress. It is now the SINGLE source (moved
  from web/; standalone baked dirs fall back to ./manifest.json when /api/scenes is
  absent). make_accumulate_demo.sh copies it from server/web. index.html links to it.

Verified end-to-end on Vulkan (AMD RADV): /api/scenes lists the 6 baked scenes;
uploading office-corner1.mp4 baked a 5-step scene (gate kept 5/9, same as the
offline bake-vids.sh) that renders identically to the CLI result and appears in the
dropdown live. Go build + vet clean; ctest -LE model green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pages were already one server, but / was the reconstruct page and the baked
accumulating demos were being served as separate static instances on other ports.
Now / is a landing menu and everything is reachable from this one server:

- server/web/index.html: new menu — cards link to /reconstruct.html (photos -> splat),
  /accumulate.html (accumulating scenes + video upload), /demo.html (storyboard).
- the reconstruct page moves to /reconstruct.html (was index.html); each page gets a
  "‹ all demos" link back to the menu.
- the standalone per-scene static servers are obsolete: those scenes are listed by
  /api/scenes and switched in /accumulate.html, so there's nothing left to run on a
  separate port.

Verified on the single Vulkan server: /, /reconstruct.html, /accumulate.html,
/demo.html, /api/scenes all 200; the menu renders three cards linking the pages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a batch hierarchical (balanced binary tree) alternative to the linear
Sim(3) accumulator, to cut the registration drift that smears the far end of a
long photo stream. Adjacent submaps of `block` frames overlap by `overlap`
frames and each merge fits one Sim(3) over all shared frames at once, so the
fit is over-determined and per-frame depth noise averages out — drift compounds
over ~log2(N) hops instead of N. block=2,overlap=1 is the plain overlap-by-one
tree.
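
To make the block/overlap arithmetic concrete, here is a sketch of how leaf submaps and merge depth could be derived. It assumes leaves start every `block - overlap` frames and that each level pairs adjacent nodes, carrying an odd one up; `leaf_submaps` and `merge_levels` are hypothetical helpers, not tree_accumulate_overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Leaf submaps as inclusive [first, last] frame ranges. Adjacent leaves share
// `overlap` frames; the final leaf is clipped to the end of the stream.
std::vector<std::pair<int, int>> leaf_submaps(int n_frames, int block, int overlap) {
    assert(block >= 2 && overlap >= 1 && overlap < block);
    std::vector<std::pair<int, int>> leaves;
    if (n_frames <= 0) return leaves;
    const int stride = block - overlap;
    for (int first = 0;; first += stride) {
        const int last = std::min(first + block - 1, n_frames - 1);
        leaves.push_back(std::make_pair(first, last));
        if (last == n_frames - 1) break;
    }
    return leaves;
}

// Number of pairwise merge levels needed to reduce n_leaves nodes to one root.
int merge_levels(int n_leaves) {
    int levels = 0;
    while (n_leaves > 1) {
        n_leaves = (n_leaves + 1) / 2;  // pair adjacent nodes; an odd one carries up
        ++levels;
    }
    return levels;
}
```

For 8 frames with block=2, overlap=1 this yields 7 leaves and 3 merge levels, whereas chaining the same pairs linearly puts the last frame 7 registrations away from the first.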

- Engine (src/pose.{h,cpp}): tree_accumulate_overlap + consensus_fuse of an
  arbitrary frame-tagged cloud (best-frame mode de-ghosts the overlapping
  per-frame copies); shared fit_band_sim RANSAC registration primitive.
- C API: free_splatter_tree_overlap (full merge or staged side-by-side via
  block/overlap/max_levels/layout_spacing/per_node_cap) + free_splatter_fuse_cloud.
- CLI: --accumulate-tree (--tree-block/--tree-overlap, --fuse) and --tree-stages,
  which bakes one laid-out .splat per merge level + a consensus-fused final stage
  and a manifest the viewer steps through.
- Viewer (accumulate.html): vibrance/saturation slider (?sat= override), ported
  from reconstruct.html; reframes per stage for tree-stages manifests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@richiejp richiejp merged commit be8cb94 into master Jul 1, 2026
2 checks passed