diff --git a/.gitignore b/.gitignore
index 97a3841..15c3e90 100644
--- a/.gitignore
+++ b/.gitignore
@@ -20,6 +20,9 @@ __pycache__/
 # reference code (third-party, fetched on demand — never vendored here).
 .cache/
 
+# Throwaway working dir for ad-hoc harnesses / scratch (never committed).
+/scratchpad/
+
 # Model + data artifacts (generated; never committed)
 *.gguf
 *.safetensors
@@ -50,5 +53,6 @@ timeout-*
 /server/fsserver
 /server/server
 
-# Demo-video drop folder (user photos; baked into .cache/demo)
+# Demo drop folders (user photos / videos; baked into .cache/demo)
 /demo-photos/
+/demo-vids/
diff --git a/CLAUDE.md b/CLAUDE.md
index 1d1ec3e..15bc41a 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -2,19 +2,46 @@
 
 ## What this project is
 
-A GGML/C++ inference engine for the **neural-network front half** of
-TencentARC/FreeSplatter: image patch tokenizer → multi-view self-attention
-transformer → per-pixel 3D-Gaussian parameter head. Given N uncalibrated views
-it returns, per input pixel, the activated Gaussian parameters.
-
-**Scope:** pieces 1–3 only. The **rasterizer and the PnP pose solver are OUT OF
-SCOPE** — they are a downstream consumer of the `[N, H, W, gaussian_channels]`
-tensor this engine produces. Do not add them here.
+A GGML/C++ engine for the **neural-network front half** of
+TencentARC/FreeSplatter (image patch tokenizer → multi-view self-attention
+transformer → per-pixel 3D-Gaussian parameter head) **plus the downstream
+camera-pose recovery and cross-run registration that consume its output**. Given
+N uncalibrated views it returns, per input pixel, the activated Gaussian
+parameters; **PnP** then recovers each view's camera, and successive runs are
+aligned into one accumulating world — the path toward live reconstruction from a
+moving camera.
+
+**Scope (updated):** the engine (pieces 1–3), **PnP pose recovery (now IN
+scope)**, and the cross-run Sim(3) alignment / accumulation. Keep the seam at the
+`[N, H, W, gaussian_channels]` tensor clean — it is the contract between the
+engine and the pose consumer. Rendering itself stays in the demo viewer
+(Vulkan/WebGL), not the engine.
 
 Target checkpoint first: **`freesplatter-scene`** (`gaussian_channels=23`,
 `sh_residual=true`, black background). The transformer backbone is identical
 across all three variants, so object/object-2dgs are cheap follow-ons.
 
+## Language & dependency policy
+
+- **Everything ships in C++** (the engine, PnP, and the alignment/accumulation
+  once proven). **Go only for the demo web server** — the purego layer that drives
+  the C API → Vulkan + WebGL viewer.
+- **The CLI and the C API must have NO Python dependency at runtime.** Every piece
+  of current and future functionality is reachable from `free_splatter-cli` and
+  `include/free_splatter.h` without invoking Python.
+- **Python is confined to two places, neither shipped:**
+  1. **Dev-time reference / conversion / validation** that runs in
+     `docker/Dockerfile.cuda` (`scripts/hf_dump.py`, `convert.py`,
+     `compare_taps.py`, …) — the only place torch runs; never a runtime dependency.
+  2. **The `pose/` research prototype — now DONE and DELETED.** It was the
+     temporary Python (numpy + cv2) prototype that proved the
+     accumulating-reconstruction approach. That approach is proven, the whole
+     pipeline (focal, Sim(3) align, robust PnP, accumulation, loop closure,
+     consensus fusion) is **rewritten in C++** (`src/pose.{h,cpp}`), exposed via
+     `free_splatter-cli` + `include/free_splatter.h`, and the Python prototype has
+     been **removed** (see git history for it and its layer-by-layer parity
+     harnesses). No Python remains in the pose path.
+
 ## Validation is the backbone (non-negotiable)
 
 This is a numerical port. Correctness means matching the PyTorch reference
@@ -47,11 +74,16 @@ This is a numerical port. Correctness means matching the PyTorch reference
 
 ## Per-component discipline
 
-Each component (`image`, `gguf_loader`, `backend`, `model`, the head) has its own
+Each component (`image`, `gguf_loader`, `backend`, `model`, the head, and
+**`pose` = PnP + focal + Sim(3) alignment + accumulation/loop/fusion**) has its own
 unit test and is made independently green **before** cross-component parity.
 Component boundaries match the file layout. Keep the seam at the
-`[N,H,W,gaussian_channels]` tensor clean — that is the contract with any
-rasterizer.
+`[N,H,W,gaussian_channels]` tensor clean — that is the contract between the engine
+and the pose/rendering consumers. The C++ `pose` component (`tests/test_pose.cpp`,
+asset-free golden tier) carries the parity discipline the Python prototype
+established: **bit-exact to upstream `estimate_poses`** and **validated against
+independent ground-truth poses** — see git history for the prototype's
+`check_upstream_parity.py` / `re10k_experiment.py` harnesses.
 
 ## Debugging philosophy (mandatory sequence)
 
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 83fabbc..2e757af 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -122,6 +122,7 @@ add_library(free_splatter ${_fs_libtype}
     src/gguf_loader.cpp
     src/model.cpp
     src/image.cpp
+    src/pose.cpp
     src/free_splatter.cpp
 )
 target_include_directories(free_splatter PUBLIC include)
@@ -144,12 +145,19 @@ if (FREE_SPLATTER_BUILD_TOOLS AND NOT EMSCRIPTEN)
 endif()
 
 if (FREE_SPLATTER_FUZZ)
-    foreach(target fuzz_image fuzz_options)
+    foreach(target fuzz_image fuzz_options fuzz_pose)
         add_executable(${target} fuzz/${target}.cpp)
         target_link_libraries(${target} PRIVATE free_splatter)
         target_include_directories(${target} PRIVATE src)
         target_link_options(${target} PRIVATE -fsanitize=fuzzer)  # libFuzzer main
     endforeach()
+    # fuzz_decode also drives the vendored (third-party) stb decoder, so it needs
+    # the stb implementation TU + headers (built warning-quiet like the CLI).
+    add_executable(fuzz_decode fuzz/fuzz_decode.cpp tools/stb_impl.cpp)
+    target_link_libraries(fuzz_decode PRIVATE free_splatter)
+    target_include_directories(fuzz_decode PRIVATE src ${CMAKE_SOURCE_DIR}/third_party/stb)
+    target_link_options(fuzz_decode PRIVATE -fsanitize=fuzzer)
+    set_source_files_properties(tools/stb_impl.cpp PROPERTIES COMPILE_OPTIONS "-w")
 endif()
 
 # --- WASM (Emscripten) -------------------------------------------------------
diff --git a/bench/free_splatter-bench.cpp b/bench/free_splatter-bench.cpp
index 8f3b165..e505433 100644
--- a/bench/free_splatter-bench.cpp
+++ b/bench/free_splatter-bench.cpp
@@ -88,7 +88,7 @@ int main(int argc, char ** argv) {
     }
 
     std::printf("model: %s  device=%s  %dx%d  in=%d  gc=%d  views=%d  S=%lld  load=%.1f ms\n",
-                model, device ? device : "cpu", geo.image_width, geo.image_height,
+                model, device ? device : "vulkan", geo.image_width, geo.image_height,
                 geo.in_channels, geo.gaussian_channels, views,
                 (long long) ((int64_t) views * (geo.image_height/8) * (geo.image_width/8)), load_ms);
 
@@ -124,7 +124,7 @@ int main(int argc, char ** argv) {
                 "(%.2f views/s, %.2f scenes/s)\n", mn, med, mean, mx, vps, 1000.0 / med);
     std::printf("RESULT engine device=%s views=%d gc=%d load_ms=%.1f min_ms=%.1f "
                 "median_ms=%.1f mean_ms=%.1f max_ms=%.1f views_per_s=%.3f\n",
-                device ? device : "cpu", views, geo.gaussian_channels, load_ms,
+                device ? device : "vulkan", views, geo.gaussian_channels, load_ms,
                 mn, med, mean, mx, vps);
     return 0;
 }
diff --git a/flake.nix b/flake.nix
index 7976a20..5a252a5 100644
--- a/flake.nix
+++ b/flake.nix
@@ -11,7 +11,10 @@
       # hf_dump.py) run in docker/Dockerfile.cuda alongside the GPU model. The
       # M0 gate synthesizes its test GGUF in C++ (tests/test_loader.cpp), so the
       # build + fast test tier need no Python at all.
-      pyEnv = pkgs.python3.withPackages (ps: with ps; [ numpy ]);
+      # opencv4 (cv2): the downstream pose/ prototype mirrors FreeSplatter's
+      # cv2.solvePnPRansac for the exact-upstream PnP path; numpy-only fallback
+      # otherwise. Not needed by the engine build/test.
+      pyEnv = pkgs.python3.withPackages (ps: with ps; [ numpy opencv4 ]);
     in
     {
       devShells.${system} = {
diff --git a/fuzz/README.md b/fuzz/README.md
index 8789e0e..e36f47d 100644
--- a/fuzz/README.md
+++ b/fuzz/README.md
@@ -6,8 +6,10 @@ libFuzzer harnesses for the **untrusted** input surfaces, built with
 ```sh
 nix develop .#fuzz
 cmake --preset fuzz && cmake --build --preset fuzz
-mkdir -p corpus_img && ./build/fuzz/fuzz_image -max_total_time=60 corpus_img
-mkdir -p corpus_opt && ./build/fuzz/fuzz_options -max_total_time=60 corpus_opt
+mkdir -p corpus_img    && ./build/fuzz/fuzz_image   -max_total_time=60 corpus_img
+mkdir -p corpus_opt    && ./build/fuzz/fuzz_options -max_total_time=60 corpus_opt
+mkdir -p corpus_pose   && ./build/fuzz/fuzz_pose    -max_total_time=60 corpus_pose
+mkdir -p corpus_decode && ./build/fuzz/fuzz_decode  -max_total_time=60 corpus_decode
 ```
 
 ## Targets
@@ -18,9 +20,20 @@ mkdir -p corpus_opt && ./build/fuzz/fuzz_options -max_total_time=60 corpus_opt
   crashes, reads OOB, or returns a populated buffer on a rejected input.
 - **`fuzz_options`** — the options-builder / device-string setters with
   arbitrary byte strings; asserts no crash and NULL-safe frees.
+- **`fuzz_pose`** — the public pose C-API: `free_splatter_estimate_poses` and the
+  accumulator (`add_pair` → `cloud` / `fuse` / `camera_path`) driven with
+  arbitrary float gaussian buffers (NaN/Inf/denormals) and fuzz-chosen geometry.
+  Caught a real SIGFPE (the RANSAC sampler's `% N` with N=0 valid
+  correspondences) and motivated the non-finite guards in `consensus_fuse`
+  (the float→int voxel-coordinate cast).
+- **`fuzz_decode`** — the untrusted image-FILE path: arbitrary bytes →
+  `stb_image` → center-crop + resize → CHW (the surface a user photo crosses in
+  the CLI / demo). stb is vendored THIRD-PARTY; per CLAUDE.md we fuzz the boundary
+  and would guard (not patch) any stb-internal sanitizer trip — none seen so far.
 
 ## Trust boundary (intentional)
 
 The **GGUF model file is TRUSTED** and is deliberately **not** fuzzed: it is our
-own converter's output, loaded by us. Image inputs and the public C-API string
-arguments are **untrusted** and fuzzed. Keep this asymmetry — see `CLAUDE.md`.
+own converter's output, loaded by us. Image inputs (encoded files and decoded
+pixels) and the public C-API arguments are **untrusted** and fuzzed. Keep this
+asymmetry — see `CLAUDE.md`.
diff --git a/fuzz/fuzz_decode.cpp b/fuzz/fuzz_decode.cpp
new file mode 100644
index 0000000..dcb0a5c
--- /dev/null
+++ b/fuzz/fuzz_decode.cpp
@@ -0,0 +1,39 @@
+// libFuzzer harness for the UNTRUSTED image-FILE decode path: arbitrary bytes ->
+// stb_image -> the CLI's center-crop + resize -> [0,1] CHW. This is the actual
+// surface a user photo crosses in `free_splatter-cli` / the demo. stb_image is a
+// vendored THIRD-PARTY decoder; per CLAUDE.md we fuzz the boundary and, where stb
+// itself trips a sanitizer on malformed input, GUARD it rather than patch stb.
+#include "stb_image.h"
+#include "stb_image_resize2.h"
+
+#include <cstdint>
+#include <cstring>
+#include <vector>
+
+extern "C" int LLVMFuzzerTestOneInput(const uint8_t * data, size_t size) {
+    int w = 0, h = 0, n = 0;
+    // Decode arbitrary bytes as an image, forcing RGB (the CLI's contract).
+    unsigned char * px = stbi_load_from_memory(data, (int) size, &w, &h, &n, 3);
+    if (!px) return 0;                      // rejected -> nothing to do
+
+    // Guard against a decoded size that would overflow the crop/resize math (stb
+    // can report large dims on crafted headers); the CLI works on sane photos.
+    if ((int64_t) w * h > 64ll * 1024 * 1024 || w <= 0 || h <= 0) { stbi_image_free(px); return 0; }
+
+    // center-crop to a square, then resize to a small size (mirror the CLI).
+    const int s = w < h ? w : h, left = (w - s) / 2, top = (h - s) / 2;
+    std::vector<unsigned char> sq((size_t) s * s * 3);
+    for (int y = 0; y < s; y++)
+        std::memcpy(&sq[(size_t) y * s * 3], &px[((size_t)(top + y) * w + left) * 3], (size_t) s * 3);
+    stbi_image_free(px);
+
+    const int outsz = 32;
+    std::vector<unsigned char> rz((size_t) outsz * outsz * 3);
+    stbir_resize_uint8_linear(sq.data(), s, s, 0, rz.data(), outsz, outsz, 0, STBIR_RGB);
+
+    std::vector<float> chw((size_t) 3 * outsz * outsz);
+    for (int c = 0; c < 3; c++)
+        for (int i = 0; i < outsz * outsz; i++)
+            chw[(size_t) c * outsz * outsz + i] = rz[(size_t) i * 3 + c] / 255.0f;
+    return 0;
+}
diff --git a/fuzz/fuzz_pose.cpp b/fuzz/fuzz_pose.cpp
new file mode 100644
index 0000000..de8a016
--- /dev/null
+++ b/fuzz/fuzz_pose.cpp
@@ -0,0 +1,67 @@
+// libFuzzer harness for the public pose C-API surface (free_splatter_estimate_poses
+// + the accumulator). These now take caller-supplied float buffers across the ABI,
+// so a binding that feeds NaN/Inf/degenerate geometry must not crash, read OOB, or
+// trip UBSan (e.g. a non-finite float -> int voxel-coord cast). The gaussian buffer
+// is engine output, but the C-API is a public boundary; keep it robust.
+#include "free_splatter.h"
+
+#include <cmath>
+#include <cstdint>
+#include <cstring>
+#include <vector>
+
+extern "C" int LLVMFuzzerTestOneInput(const uint8_t * data, size_t size) {
+    if (size < 8) return 0;
+
+    // small, fuzz-driven geometry so many inputs are explored cheaply
+    const int gc = 23;
+    const int H = 4 + (data[0] % 5) * 4;          // 4,8,12,16,20
+    const int W = 4 + (data[1] % 5) * 4;
+    const int P = H * W;
+
+    auto fill = [&](std::vector<float> & buf, size_t off) {
+        // reinterpret the fuzz bytes as floats (NaN/Inf/denormals all appear),
+        // tiling deterministically to fill the whole byte range.
+        uint8_t * b = (uint8_t *) buf.data();
+        const size_t nbytes = buf.size() * sizeof(float);
+        for (size_t i = 0; i < nbytes; i++) b[i] = data[(off + i) % size];
+    };
+
+    // 1) estimate_poses on a 2-view buffer
+    {
+        std::vector<float> g((size_t) 2 * P * gc);
+        fill(g, 2);
+        std::vector<float> c2w((size_t) 2 * 16);
+        float focal = 0;
+        free_splatter_estimate_poses(g.data(), 2, H, W, gc, 0.05f, c2w.data(), &focal);
+    }
+
+    // 2) accumulator: add a few pairs, then cloud / fuse / camera_path
+    free_splatter_accumulator * acc = free_splatter_accumulator_new(H, W, 0.05f);
+    if (acc) {
+        const int npairs = 1 + (data[2] % 3);     // 1..3 pairs
+        std::vector<float> g((size_t) 2 * P * gc);
+        for (int k = 0; k < npairs; k++) {
+            fill(g, (size_t) 3 + k);
+            free_splatter_accumulator_add_pair(acc, g.data(), gc);
+        }
+        free_splatter_point * cloud = nullptr; size_t nc = 0;
+        if (free_splatter_accumulator_cloud(acc, &cloud, &nc) == 0) {
+            free_splatter_refine_cloud(cloud, nc, 0.01f + (data[6] % 8) * 0.02f, 1 + (data[7] % 4), 0.5f);
+            free_splatter_buf_free(cloud);
+        }
+        free_splatter_accumulator_refine(acc, 0.03f, 2, 0.5f);   // de-ghost on garbage
+
+        const float voxel = 0.005f + (data[3] % 16) * 0.01f;   // 0.005..0.155
+        const int k = 1 + (data[4] % 4);                       // 1..4
+        const int mode = data[5] % 3;                          // averaged / kept / best
+        free_splatter_point * fused = nullptr; size_t nf = 0;
+        if (free_splatter_accumulator_fuse(acc, voxel, k, mode, &fused, &nf) == 0) free_splatter_buf_free(fused);
+
+        float * path = nullptr; int32_t nfr = 0;
+        if (free_splatter_accumulator_camera_path(acc, &path, &nfr) == 0) free_splatter_buf_free(path);
+
+        free_splatter_accumulator_free(acc);
+    }
+    return 0;
+}
diff --git a/include/free_splatter.h b/include/free_splatter.h
index 4707170..22ea889 100644
--- a/include/free_splatter.h
+++ b/include/free_splatter.h
@@ -77,6 +77,132 @@ FREE_SPLATTER_API int  free_splatter_run(free_splatter_ctx * ctx, const float *
                        int32_t height, int32_t width, float ** out, size_t * n_out);
 FREE_SPLATTER_API void free_splatter_buf_free(void * buf);  // NULL-safe
 
+// ---- downstream pose recovery + accumulation ----
+// The pose consumer of the engine seam: given the engine's [N,H,W,gaussian_channels]
+// output it recovers each view's camera (PnP) and stitches successive runs into one
+// accumulating world (cross-run Sim(3)). Dependency-free (no Python, no OpenCV).
+
+// Recover each view's camera from an engine output buffer. gaussians:
+// n_views*height*width*gaussian_channels float32 (the layout free_splatter_run
+// returns). cam2world_out: caller-owned n_views*16 float32, filled with a
+// row-major 4x4 cam2world per view. If focal_out is non-NULL it receives the
+// estimated shared focal. opacity_threshold gates valid pixels (engine output is
+// already activated). Returns 0 on success, -1 on failure.
+FREE_SPLATTER_API int free_splatter_estimate_poses(
+    const float * gaussians, int32_t n_views, int32_t height, int32_t width,
+    int32_t gaussian_channels, float opacity_threshold,
+    float * cam2world_out, float * focal_out);
+
+// After-inference parallax: how well a 2-view pair constrains depth, measured
+// from the model's OWN recovered geometry. Angles are scale-invariant.
+typedef struct {
+    float tri_angle_deg;       // median triangulation angle over confident points
+    float lateral_angle_deg;   // baseline angle off view-0 optical axis (0=dolly, 90=strafe)
+    float baseline_over_depth; // ||C1-C0|| / median point depth from camera 0
+    float baseline;
+    float median_depth;
+    float focal;               // estimated (or supplied) focal, px
+    int32_t n_points;          // confident points used for the medians
+} free_splatter_parallax;
+
+// Estimate parallax for the first two views of an engine output buffer.
+// gaussians: n_views(>=2)*height*width*gaussian_channels f32. Returns 0 on
+// success, -1 on bad args. A low tri_angle_deg (< ~1-2 deg) or near-zero
+// lateral_angle_deg means depth is ill-conditioned (reconstruction unreliable).
+FREE_SPLATTER_API int free_splatter_pair_parallax(
+    const float * gaussians, int32_t n_views, int32_t height, int32_t width,
+    int32_t gaussian_channels, float opacity_threshold, free_splatter_parallax * out);
+
+// One accumulated 3D gaussian in the global frame: position, color in [0,1], the
+// anisotropic scale and rotation quaternion (w,x,y,z) — transformed through the
+// cross-run Sim(3) so the cloud renders as oriented splats — and the source frame
+// it came from (for consensus fusion).
+typedef struct {
+    float x, y, z;
+    float r, g, b, opacity;
+    float sx, sy, sz;
+    float qw, qx, qy, qz;
+    int32_t frame;
+} free_splatter_point;
+
+// Sliding-window accumulator: feed each consecutive pair's engine output and it
+// chains the runs into one world. Opaque handle, not thread-safe.
+typedef struct free_splatter_accumulator free_splatter_accumulator;
+FREE_SPLATTER_API free_splatter_accumulator * free_splatter_accumulator_new(
+    int32_t height, int32_t width, float opacity_threshold);
+FREE_SPLATTER_API void free_splatter_accumulator_free(free_splatter_accumulator * acc); // NULL-safe
+
+// Add one pair's engine output: 2*height*width*gaussian_channels float32, where
+// view 0 shares a frame with the previous pair's view 1. Returns 0 on success.
+FREE_SPLATTER_API int free_splatter_accumulator_add_pair(
+    free_splatter_accumulator * acc, const float * gaussians, int32_t gaussian_channels);
+
+// Number of frames accumulated so far (== pairs_added + 1, or 0 before any pair).
+FREE_SPLATTER_API int free_splatter_accumulator_frame_count(const free_splatter_accumulator * acc);
+
+// Copy the current accumulated cloud: *out is malloc'd free_splatter_point[*n_out]
+// (free with free_splatter_buf_free). Returns 0 on success, -1 on failure.
+FREE_SPLATTER_API int free_splatter_accumulator_cloud(
+    const free_splatter_accumulator * acc, free_splatter_point ** out, size_t * n_out);
+
+// Consensus-fuse the current cloud: keep only voxels (size voxel_frac * extent)
+// corroborated by >= k distinct source frames (single-frame floaters dropped). mode:
+// 0 = averaged (one denoised point per voxel; sparse), 1 = kept (every raw gaussian
+// in a consensus voxel; dense), 2 = best (only the most-confident frame's gaussians
+// per voxel; dense AND de-ghosted). *out malloc'd as above. Returns 0 on success.
+FREE_SPLATTER_API int free_splatter_accumulator_fuse(
+    const free_splatter_accumulator * acc, float voxel_frac, int32_t k, int32_t mode,
+    free_splatter_point ** out, size_t * n_out);
+
+// Copy the global camera trajectory: *out malloc'd (*n_frames)*16 float32, a
+// row-major 4x4 cam2world (similarity) per frame. Returns 0 on success.
+FREE_SPLATTER_API int free_splatter_accumulator_camera_path(
+    const free_splatter_accumulator * acc, float ** out, int32_t * n_frames);
+
+// Geometric de-ghost a cloud in place (gaussian-level consensus refinement): each
+// iteration non-rigidly pulls every gaussian a fraction `alpha` toward the
+// multi-frame consensus of its (coarse-to-fine voxel_frac) neighbourhood — removing
+// the doubled objects the pairwise pose chain leaves. Returns 0 on success.
+FREE_SPLATTER_API int free_splatter_refine_cloud(
+    free_splatter_point * points, size_t n, float voxel_frac, int32_t iters, float alpha);
+
+// Same, applied to the accumulator's internal cloud (so a subsequent _fuse also
+// operates on the de-ghosted cloud). Returns 0 on success, -1 on bad args.
+FREE_SPLATTER_API int free_splatter_accumulator_refine(
+    free_splatter_accumulator * acc, float voxel_frac, int32_t iters, float alpha);
+
+// Consensus-fuse an arbitrary point cloud (e.g. a tree root) using each point's
+// `frame` tag: keep voxels (size voxel_frac*extent) seen by >= k distinct frames,
+// emitting per `mode` (0 averaged, 1 kept, 2 best-frame — dense AND de-ghosted).
+// *out malloc'd (free with free_splatter_buf_free). Returns 0 on success.
+FREE_SPLATTER_API int free_splatter_fuse_cloud(
+    const free_splatter_point * pts, size_t n, float voxel_frac, int32_t k, int32_t mode,
+    free_splatter_point ** out, size_t * n_out);
+
+// Hierarchical (balanced binary tree) accumulation — batch alternative to the
+// linear accumulator. pairs[k] is one engine output [2*H*W*gc] (pair k = frames
+// k,k+1; pair k's view-1 and pair k+1's view-0 are the same frame). Adjacent
+// submaps of `block` frames overlap by `overlap` frames, and each merge fits its
+// Sim(3) over all shared frames at once (over-determined -> averages per-frame
+// noise) — so drift compounds over ~log2(N) hops, not N, and is spread evenly
+// instead of dumped on the last frame. block=2,overlap=1 is the plain
+// overlap-by-one (single shared boundary) tree.
+//
+// The remaining args are for visualising the merge side by side (pass max_levels=-1,
+// layout_spacing=0, per_node_cap=0, n_nodes=NULL for a plain full merge to the root):
+// max_levels stops after that many rounds (-1 = full), so 0 returns the independent
+// base submaps, 1 the first merge round, …; step 0..D to watch the tree collapse to
+// one scene. The remaining nodes are laid out side by side (layout_spacing: 0 = none,
+// <0 = auto, >0 = explicit units) and each capped to its most important
+// (opacity*radius) points (per_node_cap>0) so every scene renders at its own detail.
+// *n_nodes (optional) gets the remaining node count (1 once fully merged). *out
+// malloc'd as free_splatter_point[*n_out] (free with free_splatter_buf_free). 0 on success.
+FREE_SPLATTER_API int free_splatter_tree_overlap(
+    const float * const * pairs, int32_t n_pairs, int32_t gaussian_channels,
+    int32_t height, int32_t width, float opacity_threshold, int32_t block, int32_t overlap,
+    int32_t max_levels, float layout_spacing, int32_t per_node_cap,
+    free_splatter_point ** out, size_t * n_out, int32_t * n_nodes);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/scripts/demo/bake-vids.sh b/scripts/demo/bake-vids.sh
new file mode 100755
index 0000000..4c8635e
--- /dev/null
+++ b/scripts/demo/bake-vids.sh
@@ -0,0 +1,65 @@
+#!/usr/bin/env bash
+# Auto-discover videos and bake each into an accumulating-reconstruction demo.
+#
+# Mirrors the drop-a-folder convenience of demo-photos/ (scripts/demo/bake.sh),
+# but for VIDEO: drop one folder per scene under demo-vids/ with a clip inside —
+#   demo-vids/office-corner/clip.mp4
+#   demo-vids/flower-bed/clip.mp4
+# — then run this. Each clip is sampled into frames, the parallax gate curates the
+# well-conditioned subset (scripts/make_accumulate_demo.sh --min-parallax), and the
+# result is a self-contained growing-reconstruction demo (accumulate.html) under
+# .cache/demo/<name>/.
+#
+#   nix develop -c scripts/demo/bake-vids.sh                 # bake every demo-vids/ folder
+#   MIN_PARALLAX=6 MAXFRAMES=30 nix develop -c scripts/demo/bake-vids.sh
+#
+# Env: VIDS (demo-vids/), OUTBASE (.cache/demo), MAXFRAMES (frames sampled per
+# clip, default 24), MIN_PARALLAX (gate degrees, default 8; 0 disables).
+set -euo pipefail
+cd "$(dirname "$0")/../.."
+ROOT=$(pwd)
+VIDS=${VIDS:-$ROOT/demo-vids}
+OUTBASE=${OUTBASE:-$ROOT/.cache/demo}
+MAXFRAMES=${MAXFRAMES:-24}
+export MIN_PARALLAX=${MIN_PARALLAX:-8}
+
+command -v ffmpeg >/dev/null || { echo "ffmpeg required" >&2; exit 1; }
+mkdir -p "$VIDS"
+[ -f "$VIDS/README.txt" ] || cat >"$VIDS/README.txt" <<'TXT'
+Drop one folder per scene here with a video clip inside (a time-lapse / slow pan
+works best), then run scripts/demo/bake-vids.sh:
+  office-corner/  clip.mp4
+  flower-bed/     clip.mp4
+Each clip is sampled to frames, the parallax gate keeps the well-conditioned ones,
+and the result is a growing-reconstruction demo under .cache/demo/<name>/.
+TXT
+
+shopt -s nullglob
+baked=()
+for D in "$VIDS"/*/; do
+  D=${D%/}; name=$(basename "$D")
+  vid=$(find "$D" -maxdepth 1 -type f \( -iname '*.mp4' -o -iname '*.mov' -o -iname '*.webm' -o -iname '*.mkv' -o -iname '*.m4v' \) | sort | head -1)
+  [ -n "$vid" ] || { echo "  no video in $name/ — skipping"; continue; }
+  safe=$(printf '%s' "$name" | tr -c 'A-Za-z0-9_-' '_')
+
+  # Sample frames: decode all, then take an even stride down to ~MAXFRAMES. (Time-
+  # lapse clips are short, so decoding all is cheap and avoids frame-count guessing.)
+  fdir="$OUTBASE/_frames/$safe"; rm -rf "$fdir"; mkdir -p "$fdir"
+  ffmpeg -y -loglevel error -i "$vid" "$fdir/all%04d.png"
+  mapfile -t ALL < <(ls "$fdir"/all*.png | sort)
+  nb=${#ALL[@]}
+  [ "$nb" -ge 2 ] || { echo "  $name: <2 frames decoded — skipping"; continue; }
+  stride=$(( (nb + MAXFRAMES - 1) / MAXFRAMES )); [ "$stride" -lt 1 ] && stride=1
+  FR=(); for ((i=0;i<nb;i+=stride)); do FR+=("${ALL[$i]}"); done
+  echo "== $name: $(basename "$vid")  ($nb frames -> ${#FR[@]} sampled, stride $stride, gate ${MIN_PARALLAX}deg) =="
+
+  bash scripts/make_accumulate_demo.sh "$OUTBASE/$safe" "${FR[@]}"
+  baked+=("$safe")
+done
+
+if [ "${#baked[@]}" -eq 0 ]; then
+  echo "no video folders found under $VIDS (drop demo-vids/<name>/<clip>.mp4)"; exit 0
+fi
+echo
+echo "baked ${#baked[@]} demo(s): ${baked[*]}"
+echo "serve one with:  python3 -m http.server -d $OUTBASE/<name> 8080   # open http://localhost:8080/"
diff --git a/scripts/make_accumulate_demo.sh b/scripts/make_accumulate_demo.sh
new file mode 100755
index 0000000..e837418
--- /dev/null
+++ b/scripts/make_accumulate_demo.sh
@@ -0,0 +1,88 @@
+#!/usr/bin/env bash
+# Bake the accumulating-reconstruction demo: run the engine over a stream of
+# photos, fold them into one growing world (free_splatter-cli --accumulate), and
+# lay out a self-contained web demo — the cloud reconstructed from 2 photos, then
+# 3, then 4, …, plus the input thumbnails and the accumulate.html viewer.
+#
+#   MODEL=.cache/freesplatter-scene-f16.gguf \
+#     scripts/make_accumulate_demo.sh OUT_DIR FRAME0 FRAME1 FRAME2 ...
+#
+# Then serve it:  cd OUT_DIR && python3 -m http.server 8080   (open index.html)
+set -euo pipefail
+cd "$(dirname "$0")/.."
+ROOT=$(pwd)
+MODEL=${MODEL:-.cache/freesplatter-scene-f16.gguf}
+MAXSPLATS=${MAXSPLATS:-500000}
+# GPU by default (CPU is ~50x slower); DEVICE=cpu is the explicit opt-in. The CLI
+# is resolved to the matching build, and the vulkan CLI is built if missing (it
+# isn't part of the default CPU build) — mirroring scripts/serve.sh for the .so.
+DEVICE=${DEVICE:-vulkan}
+if [ "$DEVICE" = cpu ]; then
+  CLI=${CLI:-$ROOT/build/release/bin/free_splatter-cli}
+else
+  CLI=${CLI:-$ROOT/build/vulkan/bin/free_splatter-cli}
+  if [ ! -x "$CLI" ]; then
+    echo "building vulkan CLI ($CLI) ..." >&2
+    cmake -S "$ROOT" -B "$ROOT/build/vulkan" -DFREE_SPLATTER_VULKAN=ON -DFREE_SPLATTER_BUILD_SHARED=ON >&2
+    cmake --build "$ROOT/build/vulkan" --target free_splatter-cli >&2
+  fi
+fi
+
+[ "$#" -ge 3 ] || { echo "usage: $0 OUT_DIR FRAME0 FRAME1 [FRAME2 ...]  (>=2 frames)" >&2; exit 1; }
+OUT=$1; shift
+FRAMES=("$@")
+mkdir -p "$OUT"
+
+# 1) input thumbnails: center-crop square + resize, mirroring the engine's own
+#    preprocessing so the strip shows exactly the view that was reconstructed.
+for i in "${!FRAMES[@]}"; do
+  ffmpeg -y -loglevel error -i "${FRAMES[$i]}" \
+    -vf "crop='min(iw,ih)':'min(iw,ih)',scale=360:360" "$OUT/view_$i.jpg"
+done
+
+# 2) accumulate: one engine pass over the stream writes acc_2.splat, acc_3.splat,
+#    ... (the cloud after each KEPT photo) + a best-frame consensus acc_fused.splat.
+#    --min-parallax gates by keyframe: a candidate frame is folded in only if its
+#    triangulation angle vs the last KEPT frame clears MIN_PARALLAX deg (else its
+#    depth is ill-conditioned and the model would invent it); skipped frames re-
+#    anchor against the last kept one. This is the after-inference angle, which the
+#    model over-reports, so keep it well above COLMAP's 1-2 deg. MIN_PARALLAX=0
+#    folds in every frame. --refine (OFF) is the gaussian averaging pass (blurs).
+RFLAG=; [ "${REFINE:-0}" = 1 ] && RFLAG=--refine
+MINPAR=${MIN_PARALLAX:-8}
+PFLAG=; [ "$MINPAR" != "0" ] && PFLAG="--min-parallax $MINPAR"
+LOG="$OUT/accumulate.log"
+"$CLI" --device "$DEVICE" --accumulate $RFLAG $PFLAG --fuse --fuse-mode best \
+  --splat-prefix "$OUT/acc" --max-splats "$MAXSPLATS" "$MODEL" "${FRAMES[@]}" | tee "$LOG"
+
+# 3) manifest.json: one step per KEPT photo (acc_n.splat + its n kept thumbnails),
+#    then the consensus-fused step. With gating only the frames that carried enough
+#    parallax are kept: frame 0 is the anchor and each "keep frame J" line adds
+#    input frame J. Ungated (MIN_PARALLAX=0), every frame is kept.
+if [ -z "$PFLAG" ]; then
+  kept=(); for i in "${!FRAMES[@]}"; do kept+=("$i"); done
+else
+  kept=(0)
+  while read -r j; do kept+=("$j"); done < <(grep -oP '^keep frame \K[0-9]+' "$LOG")
+fi
+nkept=${#kept[@]}
+[ "$nkept" -ge 2 ] || { echo "only $nkept frame(s) kept (MIN_PARALLAX=$MINPAR too high?)" >&2; exit 1; }
+thumb() { printf '"view_%s.jpg"' "$1"; }       # thumbnail referenced by original index
+{
+  echo '{ "steps": ['
+  for ((k=2;k<=nkept;k++)); do
+    imgs=""
+    for ((j=0;j<k;j++)); do imgs+="$(thumb "${kept[$j]}")"; [ $j -lt $((k-1)) ] && imgs+=", "; done
+    printf '    {"splat":"acc_%d.splat","images":[%s],"n":%d},\n' "$k" "$imgs" "$k"
+  done
+  allimgs=""
+  for ((j=0;j<nkept;j++)); do allimgs+="$(thumb "${kept[$j]}")"; [ $j -lt $((nkept-1)) ] && allimgs+=", "; done
+  printf '    {"splat":"acc_fused.splat","images":[%s],"n":%d,"label":"consensus-fused — best-frame, parallax-gated"}\n' "$allimgs" "$nkept"
+  echo '  ] }'
+} > "$OUT/manifest.json"
+echo "kept frames (input indices): ${kept[*]}  of ${#FRAMES[@]}"
+
+# 4) the viewer, self-contained next to its assets
+cp "$ROOT/server/web/accumulate.html" "$OUT/index.html"   # single source; standalone-fallback to ./manifest.json
+echo "demo baked -> $OUT"
+echo "serve: (cd $OUT && python3 -m http.server 8080)  then open http://localhost:8080/"
diff --git a/scripts/parallax_ref.py b/scripts/parallax_ref.py
new file mode 100644
index 0000000..eab6bfe
--- /dev/null
+++ b/scripts/parallax_ref.py
@@ -0,0 +1,137 @@
+#!/usr/bin/env python3
+"""Independent (model-free) parallax reference for a frame pair.
+
+Dev-time validation ONLY (never shipped; runs in the nix devShell, which provides
+cv2+numpy per flake.nix). The C++ engine's after-inference `--parallax` measures
+depth conditioning from the model's OWN recovered geometry; this measures it from
+raw 2D image evidence (feature matches -> epipolar geometry), so the two can be
+cross-checked. Agreement on a known-good-depth scene validates the engine metric;
+a large gap (geometric angle tiny, model angle confident) flags the model
+hallucinating depth.
+
+Two outputs:
+  * R_H  -- ORB-SLAM homography-vs-fundamental score ratio. ~>0.45 => a homography
+            explains the motion => pure-rotation / planar => no usable parallax.
+            Calibration-free.
+  * median triangulation angle (deg) -- COLMAP's depth-conditioning number; below
+            ~1-2 deg depth is ill-posed. Uses an assumed/supplied focal.
+
+Images are center-cropped square and resized to 512 to mirror the engine's
+preprocessing, so a focal in 512-px units matches the model's estimated focal.
+
+  nix develop -c python3 scripts/parallax_ref.py IMG0 IMG1 [--focal-px F]
+"""
+import argparse, sys
+import numpy as np
+import cv2
+
+
+def preprocess(path, size=512):
+    img = cv2.imread(path, cv2.IMREAD_COLOR)
+    if img is None:
+        sys.exit(f"cannot read {path}")
+    h, w = img.shape[:2]
+    s = min(h, w)
+    img = img[(h - s) // 2:(h - s) // 2 + s, (w - s) // 2:(w - s) // 2 + s]
+    return cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA)
+
+
+def match(img0, img1):
+    g0, g1 = (cv2.cvtColor(i, cv2.COLOR_BGR2GRAY) for i in (img0, img1))
+    try:
+        det = cv2.SIFT_create(nfeatures=4000)
+        norm = cv2.NORM_L2
+    except AttributeError:
+        det = cv2.ORB_create(nfeatures=4000)
+        norm = cv2.NORM_HAMMING
+    k0, d0 = det.detectAndCompute(g0, None)
+    k1, d1 = det.detectAndCompute(g1, None)
+    if d0 is None or d1 is None or len(k0) < 8 or len(k1) < 8:
+        sys.exit("too few features")
+    bf = cv2.BFMatcher(norm)
+    good = [m for m, n in bf.knnMatch(d0, d1, k=2) if m.distance < 0.75 * n.distance]
+    p0 = np.float32([k0[m.queryIdx].pt for m in good])
+    p1 = np.float32([k1[m.trainIdx].pt for m in good])
+    return p0, p1
+
+
+def score_homography(H, p0, p1, sigma=1.0):
+    if H is None:
+        return 0.0
+    th, inv = 5.991, 1.0 / sigma ** 2
+    Hi = np.linalg.inv(H)
+    def transfer(M, a, b):
+        ah = np.c_[a, np.ones(len(a))]
+        q = (M @ ah.T).T
+        q = q[:, :2] / q[:, 2:3]
+        chi = np.sum((q - b) ** 2, axis=1) * inv
+        return np.where(chi < th, th - chi, 0.0)
+    return float(np.sum(transfer(H, p0, p1) + transfer(Hi, p1, p0)))
+
+
+def score_fundamental(F, p0, p1, sigma=1.0):
+    if F is None:
+        return 0.0
+    thF, thScore, inv = 3.841, 5.991, 1.0 / sigma ** 2
+    a0 = np.c_[p0, np.ones(len(p0))]
+    a1 = np.c_[p1, np.ones(len(p1))]
+    def epi(F, src, dst):                             # point-line chi in dst image
+        l = (F @ src.T).T                             # lines in dst
+        num = np.sum(l[:, :2] * dst, axis=1) + l[:, 2]
+        d2 = num ** 2 / (l[:, 0] ** 2 + l[:, 1] ** 2 + 1e-12)
+        chi = d2 * inv
+        return np.where(chi < thF, thScore - chi, 0.0)
+    s01 = epi(F, a0, p1)
+    s10 = epi(F.T, a1, p0)
+    return float(np.sum(s01 + s10))
+
+
+def triangulation_angle(p0, p1, f, cx=256.0, cy=256.0):
+    K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1]], np.float64)
+    E, _ = cv2.findEssentialMat(p0, p1, K, cv2.RANSAC, 0.999, 1.0)
+    if E is None or E.shape != (3, 3):
+        return None, 0
+    _, R, t, mP = cv2.recoverPose(E, p0, p1, K)
+    inl = (mP.ravel() > 0)
+    if inl.sum() < 8:
+        return None, int(inl.sum())
+    P0 = K @ np.eye(3, 4)
+    P1 = K @ np.hstack([R, t])
+    X4 = cv2.triangulatePoints(P0, P1, p0[inl].T, p1[inl].T)
+    X = (X4[:3] / X4[3]).T
+    C1 = (-R.T @ t).ravel()
+    r0 = X - 0.0
+    r1 = X - C1
+    n0 = np.linalg.norm(r0, axis=1)
+    n1 = np.linalg.norm(r1, axis=1)
+    ok = (n0 > 1e-9) & (n1 > 1e-9) & (X[:, 2] > 0)
+    cosang = np.sum(r0[ok] * r1[ok], axis=1) / (n0[ok] * n1[ok])
+    ang = np.degrees(np.arccos(np.clip(cosang, -1, 1)))
+    if len(ang) == 0:
+        return None, int(inl.sum())
+    return float(np.median(ang)), int(ok.sum())
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("img0"); ap.add_argument("img1")
+    ap.add_argument("--focal-px", type=float, default=None,
+                    help="focal in 512-px units; default from --fov-deg")
+    ap.add_argument("--fov-deg", type=float, default=60.0)
+    a = ap.parse_args()
+    f = a.focal_px if a.focal_px else 0.5 * 512 / np.tan(np.radians(a.fov_deg) / 2)
+
+    i0, i1 = preprocess(a.img0), preprocess(a.img1)
+    p0, p1 = match(i0, i1)
+    H, _ = cv2.findHomography(p0, p1, cv2.RANSAC, 3.0)
+    F, _ = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC, 3.0, 0.99)
+    sH, sF = score_homography(H, p0, p1), score_fundamental(F, p0, p1)
+    rH = sH / (sH + sF) if (sH + sF) > 0 else float("nan")
+    tri, ntri = triangulation_angle(p0, p1, f)
+    tri_s = f"{tri:.3f}" if tri is not None else "n/a"
+    print(f"parallax_ref: matches={len(p0)}  R_H={rH:.3f} (>0.45 => degenerate)  "
+          f"tri_angle={tri_s} deg (npts={ntri})  focal={f:.1f}px")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/server/README.md b/server/README.md
index e85b60d..ec424e4 100644
--- a/server/README.md
+++ b/server/README.md
@@ -9,6 +9,10 @@ This is a thin Go HTTP server that binds the `free_splatter` C API with
 engine — the numerical port, validation, and build details live in the repo root
 `CLAUDE.md`.
 
+One server, one port. `/` is a **menu** linking to the demo pages, all served from
+here: `/reconstruct.html` (photos → splat), `/accumulate.html` (accumulating scenes
++ video upload), `/demo.html` (storyboard).
+
 ## Run
 
 ```sh
@@ -26,7 +30,26 @@ CGO_ENABLED=0 go run . \
 `net/http` the pure-Go resolver so the server builds on a bare `PATH`.
 
 Flags: `-addr` `-lib` `-models` (comma list of `name=path.gguf`; the first is the
-default) `-device` (`cpu`|`vulkan`|`vulkan:N`) `-max-splats` `-opacity-threshold`.
+default) `-device` (`vulkan` by default and **fail-closed** — it errors if no GPU
+is present rather than silently using the slow CPU path; pass `cpu` to opt in)
+`-max-splats` `-opacity-threshold` `-demo-dir` `-scenes-dir`.
+
+## Accumulating scenes + video upload (`/accumulate.html`)
+
+A second page plays the **growing-world** demo: pick a scene from the dropdown and
+watch its reconstruction build up as photos are folded in. The scenes are the
+accumulating-reconstruction dirs under `-scenes-dir` (default: the `-demo-dir`
+value) — any subdir whose `manifest.json` has a `steps` array, produced by
+`scripts/demo/bake-vids.sh` or by an upload. `GET /api/scenes` lists them (no
+restart needed when new ones appear).
+
+**Upload a video → it makes the scene.** The page's "make scene from video" control
+POSTs a clip to `POST /api/scene/from-video`; the server samples frames, runs the
+keyframe **parallax gate** (`--min-parallax`, default 8°) and the accumulate +
+best-frame fuse **in-process on the loaded GPU engine** (no shell-out, no second
+model load), and writes the scene dir. Progress streams via
+`GET /api/scene/status/{job}`; one bake runs at a time (HTTP 429 if busy). The new
+scene then appears in the dropdown automatically. Needs `ffmpeg` on `PATH`.
 
 ## Models
 
diff --git a/server/convert.go b/server/convert.go
index 2655e2b..da0ca52 100644
--- a/server/convert.go
+++ b/server/convert.go
@@ -142,6 +142,52 @@ func encodeSplat(g []float32, opacThr float32, maxSplats int) ([]byte, int) {
 	return buf, m
 }
 
+// encodeCloudSplat converts an accumulator cloud ([]point) to antimatter15 .splat
+// bytes — mirroring tools/free_splatter-cli.cpp write_cloud_splat + src/splat.h
+// encode_splat_record. CRITICAL: unlike encodeSplat (raw gaussians), the cloud's
+// R/G/B are ALREADY activated linear colour in [0,1] — do NOT apply the SH-DC
+// transform. Importance = opacity*sx*sy*sz; sort+cap only when capping (else keep
+// order). Scale is multiplied by scaleMult; pos/quat get the OpenCV->OpenGL flip.
+func encodeCloudSplat(pts []point, maxSplats int, scaleMult float32) []byte {
+	n := len(pts)
+	idx := make([]int, n)
+	for i := range idx {
+		idx[i] = i
+	}
+	imp := func(i int) float64 {
+		p := pts[i]
+		vol := float64(maxf(p.Sx, 1e-9)) * float64(maxf(p.Sy, 1e-9)) * float64(maxf(p.Sz, 1e-9))
+		return float64(maxf(p.Opacity, 0)) * vol
+	}
+	m := n
+	if maxSplats > 0 && n > maxSplats {
+		sort.Slice(idx, func(a, b int) bool { return imp(idx[a]) > imp(idx[b]) })
+		m = maxSplats
+		idx = idx[:m]
+	}
+	buf := make([]byte, m*32)
+	for k, i := range idx {
+		p := pts[i]
+		o := k * 32
+		putf(buf[o:], p.X)
+		putf(buf[o+4:], -p.Y)
+		putf(buf[o+8:], -p.Z)
+		putf(buf[o+12:], scaleMult*p.Sx)
+		putf(buf[o+16:], scaleMult*p.Sy)
+		putf(buf[o+20:], scaleMult*p.Sz)
+		buf[o+24] = u8(clamp01(float64(p.R)) * 255.0)
+		buf[o+25] = u8(clamp01(float64(p.G)) * 255.0)
+		buf[o+26] = u8(clamp01(float64(p.B)) * 255.0)
+		buf[o+27] = u8(clamp01(float64(p.Opacity)) * 255.0)
+		q := [4]float64{-float64(p.Qx), float64(p.Qw), -float64(p.Qz), float64(p.Qy)}
+		nrm := math.Sqrt(q[0]*q[0]+q[1]*q[1]+q[2]*q[2]+q[3]*q[3]) + 1e-12
+		for c := 0; c < 4; c++ {
+			buf[o+28+c] = u8(q[c]/nrm*128.0 + 128.0)
+		}
+	}
+	return buf
+}
+
 func putf(b []byte, v float32) { binary.LittleEndian.PutUint32(b, math.Float32bits(v)) }
 
 func maxf(a, b float32) float32 {
diff --git a/server/engine.go b/server/engine.go
index 6166005..010b675 100644
--- a/server/engine.go
+++ b/server/engine.go
@@ -22,6 +22,19 @@ type geometry struct {
 	gaussianChannels int32
 }
 
+// point mirrors free_splatter_point exactly: 14 float32 + 1 int32 = 60 bytes, all
+// 4-byte aligned (no padding). Field ORDER is load-bearing across the C ABI.
+type point struct {
+	X, Y, Z, R, G, B, Opacity, Sx, Sy, Sz, Qw, Qx, Qy, Qz float32
+	Frame                                                 int32
+}
+
+// parallax mirrors free_splatter_parallax: 6 float32 + 1 int32 = 28 bytes.
+type parallax struct {
+	TriAngleDeg, LateralAngleDeg, BaselineOverDepth, Baseline, MedianDepth, Focal float32
+	NPoints                                                                       int32
+}
+
 // Lib is the dlopen'd shared library with the C API bound by name.
 type Lib struct {
 	handle uintptr
@@ -36,6 +49,15 @@ type Lib struct {
 	run        func(uintptr, unsafe.Pointer, int32, int32, int32, unsafe.Pointer, unsafe.Pointer) int32
 	bufFree    func(uintptr)
 	abiVersion func() int32
+
+	// accumulator + parallax (for in-process video->scene baking)
+	accNew      func(int32, int32, float32) uintptr
+	accFree     func(uintptr)
+	accAddPair  func(uintptr, unsafe.Pointer, int32) int32
+	accFrameCnt func(uintptr) int32
+	accCloud    func(uintptr, unsafe.Pointer, unsafe.Pointer) int32
+	accFuse     func(uintptr, float32, int32, int32, unsafe.Pointer, unsafe.Pointer) int32
+	parallaxFn  func(unsafe.Pointer, int32, int32, int32, int32, float32, unsafe.Pointer) int32
 }
 
 // Engine is one loaded model (context) belonging to a Lib.
@@ -79,6 +101,13 @@ func OpenLib(libPath string) (*Lib, error) {
 	purego.RegisterLibFunc(&l.run, handle, "free_splatter_run")
 	purego.RegisterLibFunc(&l.bufFree, handle, "free_splatter_buf_free")
 	purego.RegisterLibFunc(&l.abiVersion, handle, "free_splatter_abi_version")
+	purego.RegisterLibFunc(&l.accNew, handle, "free_splatter_accumulator_new")
+	purego.RegisterLibFunc(&l.accFree, handle, "free_splatter_accumulator_free")
+	purego.RegisterLibFunc(&l.accAddPair, handle, "free_splatter_accumulator_add_pair")
+	purego.RegisterLibFunc(&l.accFrameCnt, handle, "free_splatter_accumulator_frame_count")
+	purego.RegisterLibFunc(&l.accCloud, handle, "free_splatter_accumulator_cloud")
+	purego.RegisterLibFunc(&l.accFuse, handle, "free_splatter_accumulator_fuse")
+	purego.RegisterLibFunc(&l.parallaxFn, handle, "free_splatter_pair_parallax")
 	if v := l.abiVersion(); v != 1 {
 		purego.Dlclose(handle)
 		return nil, fmt.Errorf("ABI mismatch: library reports %d, expected 1", v)
@@ -149,3 +178,55 @@ func (e *Engine) Close() {
 		e.ctx = 0
 	}
 }
+
+// --- accumulator + parallax helpers (in-process video->scene baking) ---
+
+// pointsFromC copies a malloc'd free_splatter_point[n] to a Go slice and frees it.
+func (l *Lib) pointsFromC(ptr uintptr, n uint64) []point {
+	if ptr == 0 {
+		return nil
+	}
+	out := make([]point, int(n))
+	if n > 0 {
+		copy(out, unsafe.Slice((*point)(unsafe.Pointer(ptr)), int(n)))
+	}
+	l.bufFree(ptr)
+	return out
+}
+
+// accCloudCopy returns a copy of the accumulator's current cloud.
+func (l *Lib) accCloudCopy(acc uintptr) []point {
+	var ptr uintptr
+	var n uint64
+	if l.accCloud(acc, unsafe.Pointer(&ptr), unsafe.Pointer(&n)) != 0 {
+		return nil
+	}
+	return l.pointsFromC(ptr, n)
+}
+
+// accFuseCopy returns the consensus-fused cloud (voxelFrac, k, mode: 0 avg/1 kept/2 best).
+func (l *Lib) accFuseCopy(acc uintptr, voxelFrac float32, k, mode int32) []point {
+	var ptr uintptr
+	var n uint64
+	if l.accFuse(acc, voxelFrac, k, mode, unsafe.Pointer(&ptr), unsafe.Pointer(&n)) != 0 {
+		return nil
+	}
+	return l.pointsFromC(ptr, n)
+}
+
+// pairParallax computes the after-inference parallax of a 2-view gaussian buffer
+// (the layout Engine.Run returns for nViews=2).
+func (l *Lib) pairParallax(g []float32, h, w, gc int32, thr float32) (parallax, bool) {
+	var px parallax
+	rc := l.parallaxFn(unsafe.Pointer(&g[0]), 2, h, w, gc, thr, unsafe.Pointer(&px))
+	runtime.KeepAlive(g)
+	return px, rc == 0
+}
+
+// accAddPairSlice folds one pair's gaussians into the accumulator (the C side
+// copies them, so g may be reused after). Returns 0 on success.
+func (l *Lib) accAddPairSlice(acc uintptr, g []float32, gc int32) int32 {
+	rc := l.accAddPair(acc, unsafe.Pointer(&g[0]), gc)
+	runtime.KeepAlive(g)
+	return rc
+}
diff --git a/server/main.go b/server/main.go
index 8afee0b..981f2a0 100644
--- a/server/main.go
+++ b/server/main.go
@@ -6,10 +6,14 @@
 //   cd server && go run . [-models scene=A.gguf,object=B.gguf] [-device vulkan] [-addr :8080]
 //
 // REST API:
-//   GET  /api/models         -> [{name,label,hint,views,size}, ...]
-//   POST /api/reconstruct     multipart "images" (>=1) + optional "model" -> {id,model,n_views,n_splats,size,seconds}
-//   GET  /api/splat/{id}      -> the .splat bytes (importance-sorted, 32B/splat)
-//   GET  /                    -> embedded frontend (web/)
+//   GET  /api/models          -> [{name,label,hint,views,size}, ...]
+//   POST /api/reconstruct      multipart "images" (>=1) + optional "model" -> {id,model,n_views,n_splats,size,seconds}
+//   GET  /api/splat/{id}       -> the .splat bytes (importance-sorted, 32B/splat)
+//   GET  /api/scenes           -> accumulating-reconstruction scenes [{name,label,steps,thumb,source}, ...]
+//   POST /api/scene/from-video  multipart "video" + "name" -> {job}  (gated in-process accumulate)
+//   GET  /api/scene/status/{job} -> {state,total,done,kept,scene,error}
+//   GET  /scenes-assets/...    -> a scene's manifest.json + .splat + thumbnails
+//   GET  /                     -> menu; /reconstruct.html, /accumulate.html, /demo.html are the pages
 package main
 
 import (
@@ -53,12 +57,17 @@ type server struct {
 	opts        options
 	bgRemoveCmd string // external matting command ("" = feature disabled)
 	demoDir     string // static dir served at /demo-assets/ (manifest + splats + images); "" = off
+	scenesDir   string // accumulating-reconstruction scenes (per-subdir steps-manifest); "" = off
 	workDir     string // scratch root for demo video frames + encoded MP4s
 
 	mu      sync.Mutex
 	results map[string][]byte
 	order2  []string
 	counter int64
+
+	jobsMu  sync.Mutex
+	jobs    map[string]*sceneJob
+	bakeSem chan struct{} // capacity 1: at most one video->scene bake at a time
 }
 
 func respondJSON(w http.ResponseWriter, v any) {
@@ -255,6 +264,7 @@ func main() {
 	bgCmd := flag.String("bgremove-cmd", "",
 		"external background-removal command for the object path ({in}/{out} are batch dirs), e.g. 'rembg p -m u2netp {in} {out}'")
 	demoDir := flag.String("demo-dir", "", "directory of baked demo assets (manifest.json + .splat + images) served at /demo-assets/")
+	scenesDir := flag.String("scenes-dir", "", "root of accumulating-reconstruction scenes (per-subdir steps-manifest); listed at /api/scenes, served at /scenes-assets/, written by video uploads. Default: the -demo-dir value")
 	workDir := flag.String("work-dir", filepath.Join(os.TempDir(), "freesplatter-demo"), "scratch root for demo video frames + encoded MP4s")
 	flag.Parse()
 
@@ -274,13 +284,20 @@ func main() {
 	if *demoDir != "" {
 		demoAbs, _ = filepath.Abs(*demoDir)
 	}
+	scenesAbs := demoAbs // default: scenes live alongside the baked demo assets
+	if *scenesDir != "" {
+		scenesAbs, _ = filepath.Abs(*scenesDir)
+	}
 	s := &server{
 		models:      map[string]*Engine{},
 		opts:        options{maxSplats: *maxSplats, opacThr: float32(*opacThr)},
 		bgRemoveCmd: *bgCmd,
 		demoDir:     demoAbs,
+		scenesDir:   scenesAbs,
 		workDir:     *workDir,
 		results:     map[string][]byte{},
+		jobs:        map[string]*sceneJob{},
+		bakeSem:     make(chan struct{}, 1),
 	}
 	if *bgCmd != "" {
 		log.Printf("background removal enabled: %s", *bgCmd)
@@ -309,10 +326,17 @@ func main() {
 	mux.HandleFunc("/api/demo/frame", s.handleDemoFrame)
 	mux.HandleFunc("/api/demo/encode", s.handleDemoEncode)
 	mux.HandleFunc("/api/demo/video/", s.handleDemoVideo)
+	mux.HandleFunc("/api/scenes", s.handleScenes)
+	mux.HandleFunc("/api/scene/from-video", s.handleSceneFromVideo)
+	mux.HandleFunc("/api/scene/status/", s.handleSceneStatus)
 	if s.demoDir != "" {
 		mux.Handle("/demo-assets/", http.StripPrefix("/demo-assets/", http.FileServer(http.Dir(s.demoDir))))
 		log.Printf("serving demo assets from %s at /demo-assets/ (work dir %s)", s.demoDir, s.workDir)
 	}
+	if s.scenesDir != "" {
+		mux.Handle("/scenes-assets/", http.StripPrefix("/scenes-assets/", http.FileServer(http.Dir(s.scenesDir))))
+		log.Printf("serving scenes from %s at /scenes-assets/ (list: /api/scenes)", s.scenesDir)
+	}
 	mux.Handle("/", http.FileServer(http.FS(sub)))
 
 	log.Printf("serving %d model(s) %v on http://localhost%s", len(s.order), s.order, *addr)
diff --git a/server/scenes.go b/server/scenes.go
new file mode 100644
index 0000000..36dd3de
--- /dev/null
+++ b/server/scenes.go
@@ -0,0 +1,364 @@
+// scenes.go — the accumulating-reconstruction side of the web app: list the baked
+// "growing world" scenes, and generate a new one from an uploaded video in-process
+// (reuse the already-loaded GPU engine, no shell-out, no second model load).
+//
+//   GET  /api/scenes                 -> [{name,label,steps,thumb,source}, ...]
+//   POST /api/scene/from-video        multipart "video" + "name" -> {job}
+//   GET  /api/scene/status/{job}      -> {state,total,done,kept,scene,error}
+//   GET  /scenes-assets/<name>/...    static scene files (manifest.json, *.splat, view_*.jpg)
+//
+// A scene is a subdir of scenesDir whose manifest.json has a top-level "steps"
+// array (acc_2.splat, acc_3.splat, ... + a fused step) — the format produced by
+// scripts/make_accumulate_demo.sh and by handleSceneFromVideo here. The viewer
+// (server/web/accumulate.html) plays the steps as the reconstruction grows.
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"io"
+	"net/http"
+	"os"
+	"os/exec"
+	"path/filepath"
+	"sort"
+	"strconv"
+	"strings"
+)
+
+// --- manifest format (shared with server/web/accumulate.html + make_accumulate_demo.sh) ---
+
+type manifestStep struct {
+	Splat  string   `json:"splat"`
+	Images []string `json:"images"`
+	N      int      `json:"n"`
+	Label  string   `json:"label,omitempty"`
+}
+type sceneManifest struct {
+	Steps []manifestStep `json:"steps"`
+}
+
+// --- listing -------------------------------------------------------------------
+
+type sceneInfo struct {
+	Name   string `json:"name"`
+	Label  string `json:"label"`
+	Steps  int    `json:"steps"`
+	Thumb  string `json:"thumb"` // /scenes-assets/<name>/<first image>, for the picker
+	Source string `json:"source"`
+}
+
+// scanScenes lists the accumulating-reconstruction scenes under scenesDir: each
+// immediate subdir with a manifest.json that has a non-empty "steps" array. Cheap
+// enough to run per request, so externally-baked and uploaded scenes appear with
+// no restart.
+func (s *server) scanScenes() []sceneInfo {
+	if s.scenesDir == "" {
+		return nil
+	}
+	ents, err := os.ReadDir(s.scenesDir)
+	if err != nil {
+		return nil
+	}
+	out := []sceneInfo{}
+	for _, e := range ents {
+		if !e.IsDir() {
+			continue
+		}
+		name := e.Name()
+		raw, err := os.ReadFile(filepath.Join(s.scenesDir, name, "manifest.json"))
+		if err != nil {
+			continue
+		}
+		var m sceneManifest
+		if json.Unmarshal(raw, &m) != nil || len(m.Steps) == 0 {
+			continue // storyboard manifests ("stations") and malformed dirs are skipped
+		}
+		thumb := ""
+		if len(m.Steps[0].Images) > 0 {
+			thumb = "/scenes-assets/" + name + "/" + m.Steps[0].Images[0]
+		}
+		label := strings.ReplaceAll(name, "-", " ")
+		out = append(out, sceneInfo{
+			Name: name, Label: label, Steps: len(m.Steps), Thumb: thumb, Source: "baked",
+		})
+	}
+	sort.Slice(out, func(a, b int) bool { return out[a].Name < out[b].Name })
+	return out
+}
+
+func (s *server) handleScenes(w http.ResponseWriter, r *http.Request) {
+	respondJSON(w, map[string]any{"scenes": s.scanScenes()})
+}
+
+// --- video -> scene (async) ----------------------------------------------------
+
+type sceneJob struct {
+	State string `json:"state"` // "running" | "done" | "error"
+	Total int    `json:"total"` // candidate frames
+	Done  int    `json:"done"`  // candidates processed
+	Kept  int    `json:"kept"`  // frames folded in so far
+	Scene string `json:"scene"` // slug, set on done
+	Err   string `json:"error,omitempty"`
+}
+
+func (s *server) putJob(id string, mutate func(*sceneJob)) {
+	s.jobsMu.Lock()
+	defer s.jobsMu.Unlock()
+	j := s.jobs[id]
+	if j == nil {
+		j = &sceneJob{}
+		s.jobs[id] = j
+	}
+	mutate(j)
+}
+
+func (s *server) handleSceneStatus(w http.ResponseWriter, r *http.Request) {
+	id := strings.TrimPrefix(r.URL.Path, "/api/scene/status/")
+	s.jobsMu.Lock()
+	j := s.jobs[id]
+	var cp sceneJob
+	if j != nil {
+		cp = *j
+	}
+	s.jobsMu.Unlock()
+	if j == nil {
+		http.Error(w, "no such job", http.StatusNotFound)
+		return
+	}
+	respondJSON(w, cp)
+}
+
+// handleSceneFromVideo accepts a clip, returns a job id immediately, and bakes the
+// gated accumulating scene in a background goroutine (max one at a time).
+func (s *server) handleSceneFromVideo(w http.ResponseWriter, r *http.Request) {
+	if r.Method != http.MethodPost {
+		http.Error(w, "POST only", http.StatusMethodNotAllowed)
+		return
+	}
+	if s.scenesDir == "" {
+		http.Error(w, "no scenes dir configured (-scenes-dir / -demo-dir)", http.StatusServiceUnavailable)
+		return
+	}
+	if err := r.ParseMultipartForm(256 << 20); err != nil {
+		http.Error(w, "bad multipart form: "+err.Error(), http.StatusBadRequest)
+		return
+	}
+	name, ok := safeToken(r.FormValue("name"))
+	if !ok {
+		http.Error(w, "bad or missing scene name (a-z0-9-_ only)", http.StatusBadRequest)
+		return
+	}
+	minParallax := float32(8)
+	if v, err := strconv.ParseFloat(r.FormValue("min_parallax"), 32); err == nil {
+		minParallax = float32(v)
+	}
+	maxFrames := 24
+	if v, err := strconv.Atoi(r.FormValue("max_frames")); err == nil && v >= 2 && v <= 200 {
+		maxFrames = v
+	}
+	file, _, err := r.FormFile("video")
+	if err != nil {
+		http.Error(w, `no video (expected multipart field "video")`, http.StatusBadRequest)
+		return
+	}
+	defer file.Close()
+
+	// one bake at a time — a bake monopolizes the single engine for many GPU-seconds.
+	select {
+	case s.bakeSem <- struct{}{}:
+	default:
+		http.Error(w, "a scene bake is already running; try again shortly", http.StatusTooManyRequests)
+		return
+	}
+
+	jobDir := filepath.Join(s.workDir, "scene", name)
+	if err := os.MkdirAll(jobDir, 0o755); err != nil {
+		<-s.bakeSem
+		http.Error(w, "mkdir: "+err.Error(), http.StatusInternalServerError)
+		return
+	}
+	inPath := filepath.Join(jobDir, "input")
+	in, err := os.Create(inPath)
+	if err != nil {
+		<-s.bakeSem
+		http.Error(w, "save upload: "+err.Error(), http.StatusInternalServerError)
+		return
+	}
+	if _, err := io.Copy(in, file); err != nil {
+		in.Close()
+		<-s.bakeSem
+		http.Error(w, "save upload: "+err.Error(), http.StatusInternalServerError)
+		return
+	}
+	in.Close()
+
+	jobID := name // one bake at a time + scene name as slug -> stable status URL
+	s.putJob(jobID, func(j *sceneJob) { *j = sceneJob{State: "running"} })
+
+	go func() {
+		defer func() { <-s.bakeSem }()
+		scene, err := s.bakeSceneFromVideo(name, inPath, jobDir, minParallax, maxFrames)
+		s.putJob(jobID, func(j *sceneJob) {
+			if err != nil {
+				j.State, j.Err = "error", err.Error()
+				return
+			}
+			j.State, j.Scene = "done", scene
+		})
+	}()
+
+	respondJSON(w, map[string]any{"job": jobID, "name": name})
+}
+
+// bakeSceneFromVideo: sample frames -> gated in-process accumulate -> scene dir.
+func (s *server) bakeSceneFromVideo(name, inPath, jobDir string, minParallax float32, maxFrames int) (string, error) {
+	eng := s.models[s.order[0]] // scene model is the default/first
+	if eng == nil {
+		return "", fmt.Errorf("no scene model loaded")
+	}
+	sz := int(eng.Geo.imageWidth)
+	gc := int(eng.Geo.gaussianChannels)
+
+	// 1) decode all frames, then even-stride down to <= maxFrames (time-lapse clips
+	//    are short; decoding all avoids frame-count guessing — mirrors bake-vids.sh).
+	frameDir := filepath.Join(jobDir, "frames")
+	os.RemoveAll(frameDir)
+	if err := os.MkdirAll(frameDir, 0o755); err != nil {
+		return "", err
+	}
+	if err := runFFmpeg("-i", inPath, filepath.Join(frameDir, "all%04d.png")); err != nil {
+		return "", fmt.Errorf("frame extraction: %w", err)
+	}
+	all, _ := filepath.Glob(filepath.Join(frameDir, "all*.png"))
+	sort.Strings(all)
+	if len(all) < 2 {
+		return "", fmt.Errorf("video decoded to %d frames (need >= 2)", len(all))
+	}
+	stride := (len(all) + maxFrames - 1) / maxFrames
+	if stride < 1 {
+		stride = 1
+	}
+	var pngs []string
+	for i := 0; i < len(all); i += stride {
+		pngs = append(pngs, all[i])
+	}
+
+	// 2) preprocess sampled frames to model input (CHW [0,1]).
+	frames := make([][]float32, len(pngs))
+	for i, p := range pngs {
+		data, err := os.ReadFile(p)
+		if err != nil {
+			return "", err
+		}
+		px, err := preprocessImage(data, sz)
+		if err != nil {
+			return "", fmt.Errorf("preprocess %s: %w", filepath.Base(p), err)
+		}
+		frames[i] = px
+	}
+	s.putJob(name, func(j *sceneJob) { j.Total = len(frames) - 1 })
+
+	// 3) gated in-process accumulate (mirrors free_splatter-cli.cpp keyframe loop).
+	acc := eng.lib.accNew(int32(sz), int32(sz), s.opts.opacThr)
+	if acc == 0 {
+		return "", fmt.Errorf("accumulator alloc failed")
+	}
+	defer eng.lib.accFree(acc)
+
+	sceneDir := filepath.Join(s.scenesDir, name)
+	if err := os.MkdirAll(sceneDir, 0o755); err != nil {
+		return "", err
+	}
+	// frame 0 is the anchor; thumbnail it as view_0.jpg.
+	if err := thumbnail(pngs[0], filepath.Join(sceneDir, "view_0.jpg")); err != nil {
+		return "", err
+	}
+
+	keptIdx := []int{0}
+	lastKept := 0
+	for j := 1; j < len(frames); j++ {
+		pair := make([]float32, 0, 2*len(frames[0]))
+		pair = append(pair, frames[lastKept]...)
+		pair = append(pair, frames[j]...)
+		g, err := eng.Run(pair, 2)
+		if err != nil {
+			return "", fmt.Errorf("inference pair (%d,%d): %w", lastKept, j, err)
+		}
+		if minParallax > 0 {
+			if px, ok := eng.lib.pairParallax(g, int32(sz), int32(sz), int32(gc), s.opts.opacThr); ok && px.TriAngleDeg < minParallax {
+				s.putJob(name, func(jb *sceneJob) { jb.Done = j })
+				continue // re-anchor: lastKept unchanged, try the next candidate
+			}
+		}
+		if eng.lib.accAddPairSlice(acc, g, int32(gc)) != 0 {
+			return "", fmt.Errorf("add_pair (%d,%d) failed", lastKept, j)
+		}
+		nframes := int(eng.lib.accFrameCnt(acc))
+		cloud := eng.lib.accCloudCopy(acc)
+		splat := encodeCloudSplat(cloud, s.opts.maxSplats, 1.0)
+		if err := os.WriteFile(filepath.Join(sceneDir, fmt.Sprintf("acc_%d.splat", nframes)), splat, 0o644); err != nil {
+			return "", err
+		}
+		keptIdx = append(keptIdx, j)
+		if err := thumbnail(pngs[j], filepath.Join(sceneDir, fmt.Sprintf("view_%d.jpg", len(keptIdx)-1))); err != nil {
+			return "", err
+		}
+		lastKept = j
+		s.putJob(name, func(jb *sceneJob) { jb.Done, jb.Kept = j, len(keptIdx) })
+	}
+	if len(keptIdx) < 2 {
+		return "", fmt.Errorf("only %d frame kept (min-parallax %.0f too high for this clip?)", len(keptIdx), minParallax)
+	}
+
+	// 4) consensus-fused surface (best-frame, k=2, voxel 0.02 — same as the CLI bake).
+	fused := encodeCloudSplat(eng.lib.accFuseCopy(acc, 0.02, 2, 2), s.opts.maxSplats, 1.0)
+	if err := os.WriteFile(filepath.Join(sceneDir, "acc_fused.splat"), fused, 0o644); err != nil {
+		return "", err
+	}
+
+	// 5) manifest: one step per kept frame (acc_2..acc_N) + the fused step.
+	if err := writeSceneManifest(sceneDir, len(keptIdx)); err != nil {
+		return "", err
+	}
+	return name, nil
+}
+
+// writeSceneManifest emits the steps manifest from `nkept` kept frames, referencing
+// view_0.jpg .. view_<k-1>.jpg per step — identical shape to make_accumulate_demo.sh.
+func writeSceneManifest(dir string, nkept int) error {
+	var m sceneManifest
+	imgs := func(k int) []string {
+		out := make([]string, k)
+		for j := 0; j < k; j++ {
+			out[j] = fmt.Sprintf("view_%d.jpg", j)
+		}
+		return out
+	}
+	for k := 2; k <= nkept; k++ {
+		m.Steps = append(m.Steps, manifestStep{Splat: fmt.Sprintf("acc_%d.splat", k), Images: imgs(k), N: k})
+	}
+	m.Steps = append(m.Steps, manifestStep{
+		Splat: "acc_fused.splat", Images: imgs(nkept), N: nkept,
+		Label: "consensus-fused — best-frame, parallax-gated",
+	})
+	raw, err := json.MarshalIndent(m, "", "  ")
+	if err != nil {
+		return err
+	}
+	return os.WriteFile(filepath.Join(dir, "manifest.json"), raw, 0o644)
+}
+
+// thumbnail center-crops src to a square and scales to 360px, mirroring the engine
+// preprocessing (so the filmstrip shows exactly the reconstructed view).
+func thumbnail(src, dst string) error {
+	return runFFmpeg("-i", src, "-vf", "crop='min(iw,ih)':'min(iw,ih)',scale=360:360", dst)
+}
+
+func runFFmpeg(args ...string) error {
+	full := append([]string{"-y", "-loglevel", "error"}, args...)
+	if out, err := exec.Command("ffmpeg", full...).CombinedOutput(); err != nil {
+		return fmt.Errorf("ffmpeg: %v: %s", err, strings.TrimSpace(string(out)))
+	}
+	return nil
+}
diff --git a/server/web/accumulate.html b/server/web/accumulate.html
new file mode 100644
index 0000000..5b6692e
--- /dev/null
+++ b/server/web/accumulate.html
@@ -0,0 +1,489 @@
+<!doctype html>
+<!--
+  free-splatter.cpp — ACCUMULATING reconstruction demo.
+  Plays a sequence of splats produced by `free_splatter-cli --accumulate`: the
+  cloud reconstructed from 2 photos, then 3, then 4, … as each new view is folded
+  into one world (cross-run Sim(3) alignment). The input photos appear, one per
+  step, in the top-right filmstrip — the newest highlighted. The WebGL splat
+  renderer is the same EWA formulation as index.html (3DGS / antimatter15).
+
+  Assets come from a manifest.json next to this page:
+    { "steps": [ {"splat":"acc_2.splat","images":["f0.jpg","f1.jpg"],"n":2}, … ] }
+  Each step's splat is the COMPLETE cloud at n frames; images lists every photo
+  folded in so far (the last entry is the one just added).
+-->
+<html lang="en">
+<head>
+<meta charset="utf-8"/>
+<meta name="viewport" content="width=device-width, initial-scale=1"/>
+<title>free-splatter.cpp — accumulating reconstruction</title>
+<style>
+  :root { color-scheme: dark; }
+  * { margin: 0; box-sizing: border-box; }
+  html, body { height: 100%; background: #0b0d10; overflow: hidden; font-family: ui-sans-serif, system-ui, sans-serif; }
+  #gl { position: fixed; inset: 0; width: 100%; height: 100%; display: block; }
+  #panel {
+    position: fixed; top: 16px; left: 16px; z-index: 10; width: 264px;
+    background: rgba(16,19,24,.82); backdrop-filter: blur(8px);
+    border: 1px solid rgba(255,255,255,.08); border-radius: 12px; padding: 14px 16px;
+    color: #e8eaed; font-size: 13px; line-height: 1.45; box-shadow: 0 8px 30px rgba(0,0,0,.4);
+  }
+  #panel h1 { font-size: 14px; font-weight: 650; letter-spacing: .2px; }
+  #panel .sub { color: #8b93a1; font-size: 11.5px; margin-top: 2px; }
+  #bar { height: 6px; background: rgba(255,255,255,.08); border-radius: 99px; overflow: hidden; margin: 12px 0 8px; }
+  #bar > div { height: 100%; width: 0; background: linear-gradient(90deg,#5b8cff,#8a5bff); transition: width .35s ease; }
+  .stat { display: flex; justify-content: space-between; font-variant-numeric: tabular-nums; }
+  .stat span:first-child { color: #8b93a1; }
+  #status { margin-top: 8px; color: #9fd3a0; font-size: 11.5px; min-height: 1.2em; }
+  #status.err { color: #ff8a8a; }
+  #ctl { display: flex; gap: 6px; margin-top: 10px; }
+  #ctl button { flex: 1; padding: 5px 8px; font-size: 12px; color: #e8eaed; cursor: pointer;
+    background: rgba(255,255,255,.06); border: 1px solid rgba(255,255,255,.12); border-radius: 8px; }
+  #ctl button:hover { background: rgba(255,255,255,.12); }
+  /* top-right filmstrip of the input photos, newest on top */
+  #strip { position: fixed; top: 16px; right: 16px; z-index: 10; width: 168px;
+    display: flex; flex-direction: column-reverse; gap: 8px; align-items: flex-end; }
+  #strip .thumb { width: 156px; border-radius: 8px; overflow: hidden; position: relative;
+    border: 1px solid rgba(255,255,255,.12); box-shadow: 0 6px 18px rgba(0,0,0,.45);
+    opacity: .55; transform: scale(.92); transition: opacity .4s ease, transform .4s ease, filter .4s ease;
+    filter: saturate(.8); background: #11141a; }
+  #strip .thumb img { width: 100%; display: block; }
+  #strip .thumb .tag { position: absolute; left: 6px; bottom: 5px; font-size: 10px; padding: 1px 6px;
+    border-radius: 6px; background: rgba(0,0,0,.55); color: #cdd3dc; letter-spacing: .3px; }
+  #strip .thumb.new { opacity: 1; transform: scale(1); filter: saturate(1.05);
+    border-color: rgba(122,162,255,.9); box-shadow: 0 0 0 2px rgba(122,162,255,.5), 0 8px 22px rgba(0,0,0,.5); }
+  #strip .thumb.new .tag { background: rgba(91,140,255,.92); color: #fff; }
+  #hint { position: fixed; bottom: 14px; left: 50%; transform: translateX(-50%); z-index: 10;
+    color: #6b7280; font-size: 11.5px; background: rgba(16,19,24,.7); padding: 6px 12px; border-radius: 8px; }
+  /* prominent banner shown only on a labelled (consensus-fused) step */
+  #banner { position: fixed; top: 18px; left: 50%; transform: translateX(-50%) translateY(-12px); z-index: 11;
+    display: none; opacity: 0; transition: opacity .4s ease, transform .4s ease;
+    padding: 9px 18px; border-radius: 999px; font-size: 13.5px; font-weight: 600; letter-spacing: .2px;
+    color: #eafff0; background: linear-gradient(90deg, rgba(34,160,90,.95), rgba(46,120,210,.95));
+    border: 1px solid rgba(255,255,255,.25); box-shadow: 0 8px 26px rgba(0,0,0,.5); }
+  #banner.show { display: block; opacity: 1; transform: translateX(-50%) translateY(0); }
+  #banner b { font-weight: 750; }
+  /* scene switcher + video upload (server mode only) */
+  #server-ui { display: none; margin-top: 10px; border-top: 1px solid rgba(255,255,255,.08); padding-top: 10px; }
+  #server-ui .srow { display: flex; align-items: center; gap: 8px; margin-bottom: 8px; }
+  #server-ui label { color: #8b93a1; font-size: 11.5px; }
+  #server-ui select, #server-ui input[type=text] { flex: 1; min-width: 0; background: rgba(255,255,255,.06);
+    color: #e8eaed; border: 1px solid rgba(255,255,255,.12); border-radius: 8px; padding: 4px 8px; font-size: 12px; }
+  #server-ui .up { display: flex; flex-direction: column; gap: 6px; }
+  #server-ui input[type=file] { font-size: 11px; color: #9aa3b2; }
+  #server-ui #vid-go { padding: 5px 8px; font-size: 12px; color: #e8eaed; cursor: pointer;
+    background: rgba(91,140,255,.18); border: 1px solid rgba(122,162,255,.5); border-radius: 8px; }
+  #server-ui #vid-go:hover { background: rgba(91,140,255,.3); }
+  #server-ui #vid-go:disabled { opacity: .5; cursor: default; }
+  #vid-bar { height: 5px; background: rgba(255,255,255,.08); border-radius: 99px; overflow: hidden; display: none; }
+  #vid-bar > div { height: 100%; width: 0; background: linear-gradient(90deg,#5b8cff,#22a05a); transition: width .3s ease; }
+</style>
+</head>
+<body>
+<canvas id="gl"></canvas>
+<div id="panel">
+  <a href="/" style="display:inline-block;color:#7aa2ff;text-decoration:none;font-size:11.5px;margin-bottom:6px">‹ all demos</a>
+  <h1>accumulating reconstruction</h1>
+  <div class="sub">free-splatter.cpp — one world from a growing photo stream.</div>
+  <div id="bar"><div></div></div>
+  <div class="stat"><span>photos folded in</span><span id="s-frames">0</span></div>
+  <div class="stat"><span>splats</span><span id="s-splats">0</span></div>
+  <div class="stat"><span>fps</span><span id="s-fps">—</span></div>
+  <div id="status">starting…</div>
+  <div id="ctl"><button id="prev">‹ prev</button><button id="play">⏸ pause</button><button id="next">next ›</button></div>
+  <label id="vibrow" style="display:flex;align-items:center;gap:8px;font-size:11.5px;color:#8b93a1;margin-top:10px;">
+    <span>vibrance</span><input id="vibrance" type="range" min="0.6" max="2.2" step="0.05" value="1.45" style="flex:1;accent-color:#6f9bff"/>
+  </label>
+  <div id="server-ui">
+    <div class="srow"><label>scene</label><select id="scene-sel"></select></div>
+    <div class="up">
+      <input type="file" id="vid" accept="video/*"/>
+      <input type="text" id="vid-name" placeholder="scene name (a-z0-9-)"/>
+      <button id="vid-go">＋ make scene from video</button>
+      <div id="vid-bar"><div></div></div>
+    </div>
+  </div>
+</div>
+<div id="strip"></div>
+<div id="banner"></div>
+<div id="hint">drag to orbit · scroll to zoom · the cloud grows as each photo is added</div>
+
+<script type="module">
+const ROW = 32;                       // bytes per splat
+const _q = new URLSearchParams(location.search);
+// BASE is the dir holding manifest.json + splats. Set by ?dir= (standalone baked
+// dir), else chosen at runtime: server mode = /scenes-assets/<scene>, standalone
+// fallback = "." (manifest.json next to this page).
+let BASE = (_q.get("dir") || "").replace(/\/$/, "");
+let serverMode = false;
+const STEP_MS = +(_q.get("ms") || 2600);
+const START_STEP = parseInt(_q.get("start") || "0", 10);   // begin at this step
+const AUTOPLAY = _q.get("auto") !== "0";                    // ?auto=0 = no advance
+const $ = (id) => document.getElementById(id);
+const setStatus = (t, err=false) => { const s=$("status"); s.textContent=t; s.classList.toggle("err",err); };
+
+const canvas = $("gl");
+const gl = canvas.getContext("webgl2", { antialias: false, premultipliedAlpha: true, alpha: false });
+if (!gl) { setStatus("WebGL2 not available in this browser.", true); throw new Error("no webgl2"); }
+
+// ---------- shaders (EWA splatting — identical to index.html) ----------
+const VERT = `#version 300 es
+precision highp float; precision highp int;
+uniform highp usampler2D u_tex;
+uniform mat4 u_proj, u_view;
+uniform vec2 u_focal, u_viewport;
+in vec2 a_corner;
+in uint a_index;
+out vec4 v_color;
+out vec2 v_pos;
+void main() {
+  uint t = a_index * 2u;
+  ivec2 wh = textureSize(u_tex, 0);
+  uvec4 A = texelFetch(u_tex, ivec2(int(t % uint(wh.x)), int(t / uint(wh.x))), 0);
+  uvec4 B = texelFetch(u_tex, ivec2(int((t+1u) % uint(wh.x)), int((t+1u) / uint(wh.x))), 0);
+  vec3 center = uintBitsToFloat(A.xyz);
+  vec3 scale  = vec3(uintBitsToFloat(A.w), uintBitsToFloat(B.x), uintBitsToFloat(B.y));
+  uint rgba = B.z, rotp = B.w;
+  v_color = vec4(float(rgba & 0xffu), float((rgba>>8)&0xffu), float((rgba>>16)&0xffu), float((rgba>>24)&0xffu)) / 255.0;
+  vec4 q = (vec4(float(rotp&0xffu), float((rotp>>8)&0xffu), float((rotp>>16)&0xffu), float((rotp>>24)&0xffu)) - 128.0) / 128.0;
+  q = normalize(q);
+  float w=q.x, x=q.y, y=q.z, z=q.w;
+  mat3 R = mat3(
+    1.-2.*(y*y+z*z), 2.*(x*y+w*z),   2.*(x*z-w*y),
+    2.*(x*y-w*z),    1.-2.*(x*x+z*z),2.*(y*z+w*x),
+    2.*(x*z+w*y),    2.*(y*z-w*x),   1.-2.*(x*x+y*y));
+  mat3 S = mat3(scale.x,0.,0., 0.,scale.y,0., 0.,0.,scale.z);
+  mat3 M = R * S;
+  mat3 cov3d = M * transpose(M);
+  vec4 cam = u_view * vec4(center, 1.0);
+  if (cam.z >= -0.2) { gl_Position = vec4(0.0,0.0,2.0,1.0); return; }
+  vec4 clip = u_proj * cam;
+  if (clip.w <= 0.0 || abs(clip.x) > 1.3*clip.w || abs(clip.y) > 1.3*clip.w) {
+    gl_Position = vec4(0.0,0.0,2.0,1.0); return;
+  }
+  mat3 J = mat3(
+    u_focal.x/cam.z, 0.0, -(u_focal.x*cam.x)/(cam.z*cam.z),
+    0.0, -u_focal.y/cam.z, (u_focal.y*cam.y)/(cam.z*cam.z),
+    0.0, 0.0, 0.0);
+  mat3 W = transpose(mat3(u_view));
+  mat3 T = W * J;
+  mat3 cov2 = transpose(T) * cov3d * T;
+  float a = cov2[0][0] + 0.3, b = cov2[0][1], c = cov2[1][1] + 0.3;
+  float det = a*c - b*b;
+  if (det <= 0.0) { gl_Position = vec4(0.0,0.0,2.0,1.0); return; }
+  float mid = 0.5*(a+c);
+  float rad = sqrt(max(0.0, mid*mid - det));
+  float l1 = mid + rad;
+  float l2 = max(mid - rad, 0.1);
+  vec2 d = vec2(b, l1 - a);
+  vec2 e1 = (dot(d,d) > 1e-9) ? normalize(d) : vec2(1.0, 0.0);
+  float MAXPX = 0.3 * min(u_viewport.x, u_viewport.y);
+  vec2 v1 = min(sqrt(2.0*l1), MAXPX) * e1;
+  vec2 v2 = min(sqrt(2.0*l2), MAXPX) * vec2(e1.y, -e1.x);
+  vec2 ndc = clip.xy / clip.w;
+  vec2 offset = (a_corner.x*v1 + a_corner.y*v2) / u_viewport * 2.0;
+  v_pos = a_corner;
+  gl_Position = vec4(ndc + offset, clip.z/clip.w, 1.0);
+}`;
+const FRAG = `#version 300 es
+precision highp float;
+uniform float u_sat;     // saturation/vibrance boost (1.0 = as-decoded)
+in vec4 v_color; in vec2 v_pos;
+out vec4 o;
+void main() {
+  float p = -dot(v_pos, v_pos);
+  if (p < -4.0) discard;
+  float alpha = exp(p) * v_color.a;
+  // SH-DC-only color is flat/desaturated; push each splat's color away from its
+  // luma before the premultiplied blend (the blend itself also greys overlaps).
+  vec3 rgb = v_color.rgb;
+  float luma = dot(rgb, vec3(0.2126, 0.7152, 0.0722));
+  rgb = clamp(mix(vec3(luma), rgb, u_sat), 0.0, 1.0);
+  o = vec4(rgb * alpha, alpha);
+}`;
+function shader(type, src){ const s=gl.createShader(type); gl.shaderSource(s,src); gl.compileShader(s);
+  if(!gl.getShaderParameter(s,gl.COMPILE_STATUS)) throw new Error(gl.getShaderInfoLog(s)); return s; }
+const prog = gl.createProgram();
+gl.attachShader(prog, shader(gl.VERTEX_SHADER, VERT));
+gl.attachShader(prog, shader(gl.FRAGMENT_SHADER, FRAG));
+gl.linkProgram(prog);
+if(!gl.getProgramParameter(prog, gl.LINK_STATUS)) { setStatus(gl.getProgramInfoLog(prog), true); throw new Error("link"); }
+gl.useProgram(prog);
+
+const quad = new Float32Array([-2,-2, 2,-2, 2,2, -2,-2, 2,2, -2,2]);
+const quadBuf = gl.createBuffer(); gl.bindBuffer(gl.ARRAY_BUFFER, quadBuf); gl.bufferData(gl.ARRAY_BUFFER, quad, gl.STATIC_DRAW);
+const aCorner = gl.getAttribLocation(prog, "a_corner");
+gl.enableVertexAttribArray(aCorner); gl.vertexAttribPointer(aCorner, 2, gl.FLOAT, false, 0, 0);
+const idxBuf = gl.createBuffer();
+const aIndex = gl.getAttribLocation(prog, "a_index");
+const tex = gl.createTexture();
+const TEX_W = 2048;
+const U = {
+  proj: gl.getUniformLocation(prog,"u_proj"), view: gl.getUniformLocation(prog,"u_view"),
+  focal: gl.getUniformLocation(prog,"u_focal"), viewport: gl.getUniformLocation(prog,"u_viewport"),
+  sat: gl.getUniformLocation(prog,"u_sat"),
+};
+gl.uniform1i(gl.getUniformLocation(prog,"u_tex"), 0);
+gl.disable(gl.DEPTH_TEST);
+gl.enable(gl.BLEND);
+gl.blendFuncSeparate(gl.ONE, gl.ONE_MINUS_SRC_ALPHA, gl.ONE, gl.ONE_MINUS_SRC_ALPHA);
+
+// ---------- splat store ----------
+let count = 0;
+let positions = new Float32Array(0);
+let texData = new Uint32Array(0);
+function ensureCapacity(n){ if (n*8 <= texData.length) return;
+  const cap = Math.max(n, Math.ceil(texData.length/8*1.5), 65536);
+  const td = new Uint32Array(cap*8); td.set(texData); texData = td;
+  const ps = new Float32Array(cap*3); ps.set(positions); positions = ps; }
+function uploadTexture(){
+  const rows = Math.max(1, Math.ceil((count*2)/TEX_W));
+  const buf = new Uint32Array(TEX_W*rows*4);
+  buf.set(texData.subarray(0, count*8));
+  gl.activeTexture(gl.TEXTURE0); gl.bindTexture(gl.TEXTURE_2D, tex);
+  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA32UI, TEX_W, rows, 0, gl.RGBA_INTEGER, gl.UNSIGNED_INT, buf);
+  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
+  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
+}
+// load a whole step splat, replacing the store
+function setSplat(bytes){
+  const n = Math.floor(bytes.byteLength / ROW);
+  ensureCapacity(n);
+  const f32 = new Float32Array(bytes.buffer, bytes.byteOffset, n*8);
+  const u32 = new Uint32Array(bytes.buffer, bytes.byteOffset, n*8);
+  texData.set(u32.subarray(0, n*8), 0);
+  for (let i=0;i<n;i++){ positions[i*3]=f32[i*8]; positions[i*3+1]=f32[i*8+1]; positions[i*3+2]=f32[i*8+2]; }
+  count = n;
+  uploadTexture();
+  // Seed an unsorted draw order so the cloud paints immediately; the depth-sort
+  // worker refines it to proper back-to-front asynchronously. Without this there
+  // is a blank gap each step until the worker returns (and a single headless
+  // frame would capture nothing).
+  const seq = new Uint32Array(count);
+  for (let i=0;i<count;i++) seq[i]=i;
+  gl.bindBuffer(gl.ARRAY_BUFFER, idxBuf); gl.bufferData(gl.ARRAY_BUFFER, seq, gl.DYNAMIC_DRAW);
+  sortedIndex = seq; sortedCount = count;
+  $("s-splats").textContent = count.toLocaleString();
+  sortDirty = true;
+}
+
+// ---------- depth-sort worker (identical to index.html) ----------
+const worker = new Worker(URL.createObjectURL(new Blob([`
+let pos=null, n=0;
+onmessage = (e) => {
+  if (e.data.pos) { pos = e.data.pos; n = e.data.n; return; }
+  if (!pos) return;
+  const view = e.data.view, N = Math.min(n, e.data.n);
+  const d = new Float32Array(N); let dmin=Infinity, dmax=-Infinity;
+  for (let i=0;i<N;i++){ const z = view[2]*pos[i*3] + view[6]*pos[i*3+1] + view[10]*pos[i*3+2];
+    d[i]=z; if(z<dmin)dmin=z; if(z>dmax)dmax=z; }
+  const B=65536, cnt=new Uint32Array(B);
+  const scale = dmax>dmin ? (B-1)/(dmax-dmin) : 0;
+  const key=new Uint16Array(N);
+  for(let i=0;i<N;i++){ let k=((d[i]-dmin)*scale)|0; k=k<0?0:k>65535?65535:k; key[i]=k; cnt[k]++; }
+  for(let i=1;i<B;i++) cnt[i]+=cnt[i-1];
+  const order=new Uint32Array(N);
+  for(let i=N-1;i>=0;i--) order[--cnt[key[i]]]=i;
+  postMessage({order, n:N}, [order.buffer]);
+};
+`], {type:"text/javascript"})));
+let sortedIndex = new Uint32Array(0), sortedCount = 0, sorting = false, sortDirty = true;
+worker.onmessage = (e) => { sortedIndex = e.data.order; sortedCount = e.data.n; sorting = false;
+  gl.bindBuffer(gl.ARRAY_BUFFER, idxBuf); gl.bufferData(gl.ARRAY_BUFFER, sortedIndex, gl.DYNAMIC_DRAW); };
+function requestSort(view){ if (sorting || count === 0) return; sorting = true;
+  worker.postMessage({ pos: positions.slice(0, count*3), n: count }); worker.postMessage({ view, n: count }); }
+
+// ---------- camera (orbit + zoom + gentle auto-spin) ----------
+const _p = new URLSearchParams(location.search);
+let target = [0,0,0], yaw = parseFloat(_p.get("yaw") ?? "0.0"), pitch = parseFloat(_p.get("pitch") ?? "0.05"), dist = 2.0;
+let framed = false, autospin = _p.get("spin") !== "0";
+function frameToData(mult){
+  if (count === 0) return;
+  let lo=[1e30,1e30,1e30], hi=[-1e30,-1e30,-1e30];
+  for (let i=0;i<count;i++) for (let k=0;k<3;k++){ const v=positions[i*3+k]; if(v<lo[k])lo[k]=v; if(v>hi[k])hi[k]=v; }
+  const diag = Math.hypot(hi[0]-lo[0], hi[1]-lo[1], hi[2]-lo[2]);
+  target = [(lo[0]+hi[0])/2, (lo[1]+hi[1])/2, (lo[2]+hi[2])/2];
+  dist = Math.max(0.5, (mult||1.0)*diag);
+  framed = true; sortDirty = true;
+}
+let drag=false, lx=0, ly=0;
+canvas.addEventListener("pointerdown", e => { drag=true; autospin=false; lx=e.clientX; ly=e.clientY; canvas.setPointerCapture(e.pointerId); });
+canvas.addEventListener("pointerup",   () => drag=false);
+canvas.addEventListener("pointermove", e => { if(!drag)return; yaw -= (e.clientX-lx)*0.005; pitch -= (e.clientY-ly)*0.005;
+  pitch = Math.max(-1.5, Math.min(1.5, pitch)); lx=e.clientX; ly=e.clientY; sortDirty=true; });
+canvas.addEventListener("wheel", e => { dist *= Math.exp(e.deltaY*0.001); dist=Math.max(0.2,Math.min(10,dist)); sortDirty=true; e.preventDefault(); }, {passive:false});
+
+function perspective(fovy, aspect, n, f){ const t=1/Math.tan(fovy/2); return new Float32Array([t/aspect,0,0,0, 0,t,0,0, 0,0,(f+n)/(n-f),-1, 0,0,(2*f*n)/(n-f),0]); }
+function lookAt(eye, c){ const z=norm(sub(eye,c)), x=norm(cross([0,1,0],z)), y=cross(z,x);
+  return new Float32Array([x[0],y[0],z[0],0, x[1],y[1],z[1],0, x[2],y[2],z[2],0, -dot(x,eye),-dot(y,eye),-dot(z,eye),1]); }
+const sub=(a,b)=>[a[0]-b[0],a[1]-b[1],a[2]-b[2]], dot=(a,b)=>a[0]*b[0]+a[1]*b[1]+a[2]*b[2];
+const cross=(a,b)=>[a[1]*b[2]-a[2]*b[1],a[2]*b[0]-a[0]*b[2],a[0]*b[1]-a[1]*b[0]];
+const norm=(a)=>{const l=Math.hypot(...a)||1;return [a[0]/l,a[1]/l,a[2]/l];};
+function resize(){ const dpr=Math.min(devicePixelRatio,2); canvas.width=innerWidth*dpr; canvas.height=innerHeight*dpr; gl.viewport(0,0,canvas.width,canvas.height); }
+addEventListener("resize", resize); resize();
+
+// vibrance (saturation) — cosmetic, demo-layer only; ?sat= overrides so headless/
+// video renders match what you dialed in. Same control as reconstruct.html.
+let saturation = parseFloat(_q.get("sat") ?? "1.45");
+{ const vib = $("vibrance"); vib.value = saturation;
+  vib.addEventListener("input", e => { saturation = +e.target.value; }); }
+
+let frames=0, fpsT=performance.now(), lastT=performance.now();
+function render(){
+  const now=performance.now(), dt=(now-lastT)/1000; lastT=now;
+  if (autospin && !drag) { yaw += dt*0.18; sortDirty=true; }
+  const eye=[target[0]+dist*Math.sin(yaw)*Math.cos(pitch), target[1]+dist*Math.sin(pitch), target[2]+dist*Math.cos(yaw)*Math.cos(pitch)];
+  const view=lookAt(eye,target);
+  const aspect=canvas.width/canvas.height, fovy=1.0;
+  const proj=perspective(fovy, aspect, 0.05, 100);
+  const fy=canvas.height/(2*Math.tan(fovy/2)), fx=fy;
+  if(sortDirty){ requestSort(view); sortDirty=false; }
+  gl.clearColor(0.043,0.051,0.063,1); gl.clear(gl.COLOR_BUFFER_BIT);
+  gl.useProgram(prog);
+  gl.uniformMatrix4fv(U.view,false,view); gl.uniformMatrix4fv(U.proj,false,proj);
+  gl.uniform2f(U.focal,fx,fy); gl.uniform2f(U.viewport,canvas.width,canvas.height);
+  gl.uniform1f(U.sat, saturation);
+  gl.bindBuffer(gl.ARRAY_BUFFER,quadBuf); gl.enableVertexAttribArray(aCorner); gl.vertexAttribPointer(aCorner,2,gl.FLOAT,false,0,0);
+  if(sortedCount>0){
+    gl.bindBuffer(gl.ARRAY_BUFFER,idxBuf); gl.enableVertexAttribArray(aIndex);
+    gl.vertexAttribIPointer(aIndex,1,gl.UNSIGNED_INT,0,0); gl.vertexAttribDivisor(aIndex,1);
+    gl.drawArraysInstanced(gl.TRIANGLES,0,6,sortedCount);
+  }
+  frames++; if(now-fpsT>500){ $("s-fps").textContent=Math.round(frames*1000/(now-fpsT)); frames=0; fpsT=now; }
+  requestAnimationFrame(render);
+}
+
+// ---------- step playback ----------
+let manifest=null, cur=-1, playing=AUTOPLAY, timer=null;
+function renderStrip(step){
+  const strip=$("strip"); strip.innerHTML="";
+  const imgs=step.images||[];
+  const start=Math.max(0, imgs.length-6);            // show at most the latest 6
+  for (let i=start;i<imgs.length;i++){
+    const d=document.createElement("div"); d.className="thumb"+(i===imgs.length-1?" new":"");
+    const im=document.createElement("img"); im.src=`${BASE}/${imgs[i]}`; im.alt="view "+(i+1);
+    const tag=document.createElement("div"); tag.className="tag"; tag.textContent=(i===imgs.length-1?"+ view ":"view ")+(i+1);
+    d.appendChild(im); d.appendChild(tag); strip.appendChild(d);
+  }
+}
+async function loadStep(i){
+  if (!manifest || i<0 || i>=manifest.steps.length) return;
+  cur=i; const step=manifest.steps[i];
+  $("bar").firstElementChild.style.width = Math.round((i+1)/manifest.steps.length*100)+"%";
+  $("s-frames").textContent = step.label ? "fused (" + step.n + ")" : step.n;
+  const banner = $("banner");
+  banner.classList.toggle("show", !!step.label);     // prominent only on the fused step
+  setStatus(`folding view ${step.n} — ${step.splat}`);
+  renderStrip(step);
+  try {
+    const resp = await fetch(`${BASE}/${step.splat}`, { cache: "no-store" });
+    if(!resp.ok) throw new Error("HTTP "+resp.status+" for "+step.splat);
+    const bytes = new Uint8Array(await resp.arrayBuffer());
+    setSplat(bytes);
+    // frame once (camera stays put as the cloud grows); but a tree-stages manifest
+    // sets "reframe":true so each stage (very different extents) refits the view.
+    if(!framed || (manifest && manifest.reframe)) frameToData(0.9);
+    if (step.label) {
+      banner.innerHTML = `<b>✓ ${step.label}</b> · ${count.toLocaleString()} splats`;
+      setStatus(`${step.label} · ${count.toLocaleString()} splats`);
+    } else {
+      setStatus(`${step.n} photos · ${count.toLocaleString()} splats`);
+    }
+  } catch(err){ setStatus(err.message, true); console.error(err); }
+}
+function schedule(){ clearTimeout(timer); if(!playing) return;
+  timer=setTimeout(async ()=>{ await loadStep((cur+1)%manifest.steps.length); schedule(); }, STEP_MS); }
+$("prev").onclick = ()=>{ playing=false; $("play").textContent="▶ play"; loadStep(Math.max(0,cur-1)); };
+$("next").onclick = ()=>{ playing=false; $("play").textContent="▶ play"; loadStep(Math.min(manifest.steps.length-1,cur+1)); };
+$("play").onclick = ()=>{ playing=!playing; $("play").textContent = playing?"⏸ pause":"▶ play"; schedule(); };
+
+// load the manifest at the current BASE and start (re-entrant: resets per scene).
+async function loadManifestAndPlay(){
+  clearTimeout(timer);
+  manifest=null; cur=-1; framed=false;
+  try {
+    setStatus("loading manifest…");
+    manifest = await (await fetch(`${BASE}/manifest.json`, { cache: "no-store" })).json();
+    if(!manifest.steps || !manifest.steps.length) throw new Error("manifest has no steps");
+    const s0 = Math.max(0, Math.min(manifest.steps.length-1, START_STEP));
+    playing = AUTOPLAY;
+    await loadStep(s0);
+    $("play").textContent = playing ? "⏸ pause" : "▶ play";
+    schedule();
+  } catch(err){ setStatus("manifest.json not found — "+err.message, true); console.error(err); }
+}
+
+// ---------- scene switching (server mode) ----------
+function populateScenes(scenes){
+  const sel=$("scene-sel"); sel.innerHTML="";
+  for (const sc of scenes){
+    const o=document.createElement("option"); o.value=sc.name;
+    o.textContent=`${sc.label} · ${sc.steps} steps`; sel.appendChild(o);
+  }
+}
+async function selectScene(name){
+  if(!name) return;
+  $("scene-sel").value = name;
+  BASE = "/scenes-assets/" + name;
+  await loadManifestAndPlay();
+}
+$("scene-sel").onchange = () => selectScene($("scene-sel").value);
+
+// ---------- upload a video -> bake a scene ----------
+function slug(s){ return (s||"").toLowerCase().replace(/[^a-z0-9_-]+/g,"-").replace(/^-+|-+$/g,"").slice(0,48); }
+$("vid").onchange = () => { if(!$("vid-name").value && $("vid").files[0])
+  $("vid-name").value = slug($("vid").files[0].name.replace(/\.[^.]+$/,"")); };
+$("vid-go").onclick = async () => {
+  const f = $("vid").files[0];
+  const nm = slug($("vid-name").value || (f && f.name.replace(/\.[^.]+$/,"")));
+  if(!f){ setStatus("pick a video first", true); return; }
+  if(!nm){ setStatus("enter a scene name", true); return; }
+  const fd = new FormData(); fd.append("video", f); fd.append("name", nm);
+  const go=$("vid-go"), bar=$("vid-bar"); go.disabled=true; bar.style.display="block"; bar.firstElementChild.style.width="0%";
+  setStatus(`uploading ${f.name}…`);
+  let resp;
+  try { resp = await fetch("/api/scene/from-video", { method:"POST", body:fd }); }
+  catch(e){ setStatus("upload failed: "+e.message, true); go.disabled=false; return; }
+  if(!resp.ok){ setStatus("rejected: "+(await resp.text()).trim(), true); go.disabled=false; bar.style.display="none"; return; }
+  const { job } = await resp.json();
+  const poll = setInterval(async () => {
+    let st; try { st = await (await fetch("/api/scene/status/"+encodeURIComponent(job), {cache:"no-store"})).json(); }
+    catch(e){ return; }
+    if(st.state==="running"){
+      setStatus(`baking ${nm}: ${st.done}/${st.total} frames · ${st.kept} kept`);
+      bar.firstElementChild.style.width = st.total ? Math.round(st.done/st.total*100)+"%" : "0%";
+    } else if(st.state==="done"){
+      clearInterval(poll); go.disabled=false; bar.firstElementChild.style.width="100%";
+      const scenes=(await (await fetch("/api/scenes",{cache:"no-store"})).json()).scenes;
+      populateScenes(scenes); await selectScene(st.scene);
+      setTimeout(()=>{ bar.style.display="none"; }, 1200);
+    } else if(st.state==="error"){
+      clearInterval(poll); go.disabled=false; bar.style.display="none";
+      setStatus("bake failed: "+st.error, true);
+    }
+  }, 1200);
+};
+
+// ---------- init: server mode (scene list) vs standalone (?dir / local manifest) ----------
+(async function init(){
+  requestAnimationFrame(render);
+  if (BASE) { await loadManifestAndPlay(); return; }   // ?dir= : explicit standalone dir
+  let scenes = null;
+  try { const r = await fetch("/api/scenes", { cache:"no-store" }); if(r.ok) scenes = (await r.json()).scenes; } catch(e){}
+  if (scenes !== null) {                                // served by the Go app
+    serverMode = true;
+    $("server-ui").style.display = "block";
+    populateScenes(scenes);
+    if (scenes.length) {
+      const want = _q.get("scene");
+      const pick = scenes.find(s => s.name === want) ? want : scenes[0].name;
+      await selectScene(pick);
+    } else setStatus("no scenes yet — upload a video to make one");
+    return;
+  }
+  BASE = ".";                                           // standalone baked dir: manifest.json next to this page
+  await loadManifestAndPlay();
+})();
+</script>
+</body>
+</html>
diff --git a/server/web/demo.html b/server/web/demo.html
index f0549c2..ab78579 100644
--- a/server/web/demo.html
+++ b/server/web/demo.html
@@ -45,6 +45,7 @@
 <canvas id="gl"></canvas>
 
 <div class="card" id="ui">
+  <a href="/" style="display:inline-block;color:#7aa2ff;text-decoration:none;font-size:11.5px;margin-bottom:6px">‹ all demos</a>
   <h1>free-splatter.cpp · demo render</h1>
   <div class="sub" id="sub">loading storyboard…</div>
   <div class="row">
diff --git a/server/web/index.html b/server/web/index.html
index 2c0cb09..cc73877 100644
--- a/server/web/index.html
+++ b/server/web/index.html
@@ -1,554 +1,60 @@
 <!doctype html>
 <!--
-  free-splatter.cpp web app.
-  Drop photos -> POST /api/reconstruct (Go server, purego -> Vulkan engine) ->
-  fetch the .splat -> render in WebGL2 with depth-sorted alpha blending.
-  The .splat is importance-sorted, so the "build-up" reveal slider/animation
-  shows the scene assembling from its most important gaussians first — the honest
-  analogue of a "step count" for a feed-forward (one-pass) reconstruction.
-  EWA splat shaders follow 3DGS (Kerbl et al.) / antimatter15's WebGL formulation.
+  free-splatter.cpp — landing menu. One server, one port: this page links to the
+  demo pages (each its own route on this same server). Add a card here when you add
+  a page.
 -->
 <html lang="en">
 <head>
 <meta charset="utf-8"/>
 <meta name="viewport" content="width=device-width, initial-scale=1"/>
-<title>free-splatter.cpp — photos to gaussian splat</title>
+<title>free-splatter.cpp — demos</title>
 <style>
   :root { color-scheme: dark; }
   * { margin: 0; box-sizing: border-box; }
-  html, body { height: 100%; background: #0b0d10; overflow: hidden; font-family: ui-sans-serif, system-ui, sans-serif; color: #e8eaed; }
-  #gl { position: fixed; inset: 0; width: 100%; height: 100%; display: block; }
-
-  .card { background: rgba(16,19,24,.86); backdrop-filter: blur(10px);
-    border: 1px solid rgba(255,255,255,.08); border-radius: 14px; box-shadow: 0 10px 40px rgba(0,0,0,.5); }
-
-  /* upload overlay (initial + "new photos") */
-  #drop { position: fixed; inset: 0; z-index: 20; display: flex; align-items: center; justify-content: center; }
-  #drop .card { width: min(440px, 92vw); padding: 26px 26px 22px; text-align: center; }
-  #drop h1 { font-size: 18px; font-weight: 680; letter-spacing: .2px; }
-  #model { width: 100%; margin-top: 12px; padding: 7px 9px; font-size: 12.5px; color: #e8eaed;
-    background: rgba(255,255,255,.06); border: 1px solid rgba(255,255,255,.12); border-radius: 8px; cursor: pointer; }
-  #model:hover { background: rgba(255,255,255,.1); }
-  #bgrow { display: flex; align-items: center; gap: 6px; font-size: 12px; color: #b9c0cc; margin-top: 12px; cursor: pointer; }
-  #bgrow input { accent-color: #6f9bff; }
-  #bgrow .bgnote { color: #8b93a1; font-size: 11px; }
-  #drop .sub { color: #8b93a1; font-size: 12.5px; margin-top: 4px; line-height: 1.5; }
-  #zone { margin-top: 18px; border: 1.5px dashed rgba(255,255,255,.18); border-radius: 12px;
-    padding: 26px 16px; cursor: pointer; transition: border-color .15s, background .15s; }
-  #zone.hot { border-color: #6f9bff; background: rgba(91,140,255,.08); }
-  #zone .big { font-size: 13.5px; }
-  #zone .small { color: #8b93a1; font-size: 11.5px; margin-top: 4px; }
-  #thumbs { display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin-top: 14px; }
-  #thumbs img { width: 62px; height: 62px; object-fit: cover; border-radius: 8px; border: 1px solid rgba(255,255,255,.12); }
-  .btn { margin-top: 16px; width: 100%; padding: 11px; font-size: 13.5px; font-weight: 640; color: #fff; cursor: pointer;
-    border: none; border-radius: 10px; background: linear-gradient(90deg,#5b8cff,#8a5bff); }
-  .btn:disabled { opacity: .45; cursor: not-allowed; }
-  .btn.ghost { background: rgba(255,255,255,.08); }
-  #drop .err { color: #ff8a8a; font-size: 12px; margin-top: 10px; min-height: 1em; }
-
-  /* viewer panel (after reconstruction) */
-  #panel { position: fixed; top: 16px; left: 16px; z-index: 10; width: 264px; padding: 14px 16px; font-size: 13px; line-height: 1.45; display: none; }
-  #panel h1 { font-size: 14px; font-weight: 650; }
-  #panel .sub { color: #8b93a1; font-size: 11.5px; margin-top: 2px; }
-  .stat { display: flex; justify-content: space-between; font-variant-numeric: tabular-nums; margin-top: 3px; }
-  .stat span:first-child { color: #8b93a1; }
-  #status { margin-top: 9px; color: #9fd3a0; font-size: 11.5px; min-height: 1.2em; }
-  #status.err { color: #ff8a8a; }
-  #autores { display: flex; align-items: center; gap: 6px; font-size: 11.5px; color: #8b93a1; margin-top: 7px; cursor: pointer; }
-  #autores input { accent-color: #6f9bff; }
-  .rev { margin-top: 12px; border-top: 1px solid rgba(255,255,255,.08); padding-top: 11px; }
-  .rev .lab { display: flex; justify-content: space-between; font-size: 11.5px; color: #8b93a1; }
-  #reveal { width: 100%; margin: 7px 0 4px; accent-color: #6f9bff; }
-  .row { display: flex; gap: 8px; margin-top: 8px; }
-  .row .btn { margin-top: 0; padding: 8px; font-size: 12px; }
-  #hint { position: fixed; bottom: 14px; left: 50%; transform: translateX(-50%); z-index: 10;
-    color: #6b7280; font-size: 11.5px; background: rgba(16,19,24,.7); padding: 6px 12px; border-radius: 8px; display: none; }
+  body { min-height: 100vh; background: radial-gradient(1200px 620px at 50% -8%, #171b24, #0b0d10);
+    color: #e8eaed; font-family: ui-sans-serif, system-ui, sans-serif;
+    display: flex; flex-direction: column; align-items: center; justify-content: center;
+    padding: 48px 20px; gap: 30px; }
+  .head { text-align: center; }
+  .head h1 { font-size: 27px; font-weight: 680; letter-spacing: .2px; }
+  .head p { color: #8b93a1; font-size: 14px; margin-top: 7px; }
+  .grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(244px, 1fr));
+    gap: 18px; width: 100%; max-width: 880px; }
+  a.card { display: block; text-decoration: none; color: inherit; background: rgba(16,19,24,.72);
+    border: 1px solid rgba(255,255,255,.08); border-radius: 14px; padding: 22px;
+    box-shadow: 0 8px 30px rgba(0,0,0,.35);
+    transition: transform .15s ease, border-color .15s ease, background .15s ease; }
+  a.card:hover { transform: translateY(-3px); border-color: rgba(122,162,255,.6); background: rgba(20,24,32,.92); }
+  a.card .ico { font-size: 27px; line-height: 1; }
+  a.card h2 { font-size: 16px; font-weight: 640; margin-top: 12px; }
+  a.card p { color: #8b93a1; font-size: 12.5px; line-height: 1.5; margin-top: 7px; }
+  .foot { color: #5b6473; font-size: 11.5px; }
+  .foot a { color: #6b7787; }
 </style>
 </head>
 <body>
-<canvas id="gl"></canvas>
-
-<div id="drop">
-  <div class="card">
-    <h1>Photos → 3D gaussian splat</h1>
-    <select id="model" title="reconstruction model"></select>
-    <div class="sub" id="sub">Drop photos of the same scene from different angles — they must overlap. One feed-forward pass on the GPU (Vulkan).</div>
-    <div id="zone">
-      <div class="big">📷 Drag photos here, or click to choose</div>
-      <div class="small" id="zone-hint">JPG / PNG · overlapping views</div>
-    </div>
-    <label id="bgrow" style="display:none"><input type="checkbox" id="bgremove" checked/> remove background <span class="bgnote">(recommended; ~2s/photo)</span></label>
-    <div id="thumbs"></div>
-    <button id="go" class="btn" disabled>Reconstruct</button>
-    <div id="err" class="err"></div>
-    <input id="file" type="file" accept="image/*" multiple hidden/>
+  <div class="head">
+    <h1>free-splatter.cpp</h1>
+    <p>FreeSplatter, in C++/GGML on the GPU — pick a demo.</p>
   </div>
-</div>
-
-<div id="panel" class="card">
-  <h1>free-splatter.cpp</h1>
-  <div class="sub">3D gaussian scene from your photos.</div>
-  <div class="stat"><span>views</span><span id="s-views">—</span></div>
-  <div class="stat"><span>splats</span><span id="s-splats">0</span></div>
-  <div class="stat"><span>reconstruct</span><span id="s-time">—</span></div>
-  <div class="stat"><span>fps</span><span id="s-fps">—</span></div>
-  <div class="stat"><span>render</span><span id="s-render">100%</span></div>
-  <label id="autores"><input type="checkbox" id="autores-cb" checked/> auto quality (smooth navigation)</label>
-  <label id="vibrow" style="display:flex;align-items:center;gap:8px;font-size:11.5px;color:#8b93a1;margin-top:7px;">
-    <span>vibrance</span><input id="vibrance" type="range" min="0.6" max="2.2" step="0.05" value="1.45" style="flex:1;accent-color:#6f9bff"/>
-  </label>
-  <div id="status">…</div>
-  <div class="rev">
-    <div class="lab"><span>build-up</span><span id="s-reveal">100%</span></div>
-    <input id="reveal" type="range" min="0" max="1000" value="1000"/>
-    <div class="row">
-      <button id="replay" class="btn ghost">▶ replay</button>
-      <button id="again" class="btn ghost">＋ new photos</button>
-    </div>
+  <div class="grid">
+    <a class="card" href="/reconstruct.html">
+      <div class="ico">📷</div>
+      <h2>Photos → 3D splat</h2>
+      <p>Drop a few overlapping photos of a scene; one feed-forward GPU pass returns a 3D gaussian splat you can orbit.</p>
+    </a>
+    <a class="card" href="/accumulate.html">
+      <div class="ico">🌱</div>
+      <h2>Accumulating scenes</h2>
+      <p>Watch one world grow as photos are folded in — switch between scenes, or upload a video and it builds the scene for you.</p>
+    </a>
+    <a class="card" href="/demo.html">
+      <div class="ico">🎬</div>
+      <h2>Storyboard</h2>
+      <p>A guided tour through reconstructions — also the offline, deterministic video-render harness.</p>
+    </a>
   </div>
-</div>
-
-<div id="hint">drag to orbit · scroll to zoom · WASD/arrows to move</div>
-
-<script type="module">
-const ROW = 32;                       // bytes per splat
-const $ = (id) => document.getElementById(id);
-const P = new URLSearchParams(location.search);   // ?id=&yaw=&pitch=&dist=&reveal=
-const setStatus = (t, err=false) => { const s=$("status"); s.textContent=t; s.classList.toggle("err",err); };
-
-const canvas = $("gl");
-const gl = canvas.getContext("webgl2", { antialias: false, premultipliedAlpha: true, alpha: false });
-if (!gl) { $("err").textContent = "WebGL2 not available in this browser."; throw new Error("no webgl2"); }
-
-// ---------- shaders (EWA splatting) ----------
-const VERT = `#version 300 es
-precision highp float; precision highp int;
-uniform highp usampler2D u_tex;
-uniform mat4 u_proj, u_view;
-uniform vec2 u_focal, u_viewport;
-in vec2 a_corner;      // quad corner in [-2,2]
-in uint a_index;       // sorted splat index
-out vec4 v_color;
-out vec2 v_pos;
-void main() {
-  uint t = a_index * 2u;
-  ivec2 wh = textureSize(u_tex, 0);
-  uvec4 A = texelFetch(u_tex, ivec2(int(t % uint(wh.x)), int(t / uint(wh.x))), 0);
-  uvec4 B = texelFetch(u_tex, ivec2(int((t+1u) % uint(wh.x)), int((t+1u) / uint(wh.x))), 0);
-  vec3 center = uintBitsToFloat(A.xyz);
-  vec3 scale  = vec3(uintBitsToFloat(A.w), uintBitsToFloat(B.x), uintBitsToFloat(B.y));
-  uint rgba = B.z, rotp = B.w;
-  v_color = vec4(float(rgba & 0xffu), float((rgba>>8)&0xffu), float((rgba>>16)&0xffu), float((rgba>>24)&0xffu)) / 255.0;
-  vec4 q = (vec4(float(rotp&0xffu), float((rotp>>8)&0xffu), float((rotp>>16)&0xffu), float((rotp>>24)&0xffu)) - 128.0) / 128.0;
-  q = normalize(q);                    // (w,x,y,z)
-  float w=q.x, x=q.y, y=q.z, z=q.w;
-  mat3 R = mat3(
-    1.-2.*(y*y+z*z), 2.*(x*y+w*z),   2.*(x*z-w*y),
-    2.*(x*y-w*z),    1.-2.*(x*x+z*z),2.*(y*z+w*x),
-    2.*(x*z+w*y),    2.*(y*z-w*x),   1.-2.*(x*x+y*y));
-  mat3 S = mat3(scale.x,0.,0., 0.,scale.y,0., 0.,0.,scale.z);
-  mat3 M = R * S;
-  mat3 cov3d = M * transpose(M);
-
-  vec4 cam = u_view * vec4(center, 1.0);
-  if (cam.z >= -0.2) { gl_Position = vec4(0.0,0.0,2.0,1.0); return; }
-  vec4 clip = u_proj * cam;
-  if (clip.w <= 0.0 || abs(clip.x) > 1.3*clip.w || abs(clip.y) > 1.3*clip.w) {
-    gl_Position = vec4(0.0,0.0,2.0,1.0); return;
-  }
-  mat3 J = mat3(
-    u_focal.x/cam.z, 0.0, -(u_focal.x*cam.x)/(cam.z*cam.z),
-    0.0, -u_focal.y/cam.z, (u_focal.y*cam.y)/(cam.z*cam.z),
-    0.0, 0.0, 0.0);
-  mat3 W = transpose(mat3(u_view));
-  mat3 T = W * J;
-  mat3 cov2 = transpose(T) * cov3d * T;
-  float a = cov2[0][0] + 0.3, b = cov2[0][1], c = cov2[1][1] + 0.3;
-  float det = a*c - b*b;
-  if (det <= 0.0) { gl_Position = vec4(0.0,0.0,2.0,1.0); return; }
-  float mid = 0.5*(a+c);
-  float rad = sqrt(max(0.0, mid*mid - det));
-  float l1 = mid + rad;
-  float l2 = max(mid - rad, 0.1);
-  vec2 d = vec2(b, l1 - a);
-  vec2 e1 = (dot(d,d) > 1e-9) ? normalize(d) : vec2(1.0, 0.0);
-  float MAXPX = 0.20 * min(u_viewport.x, u_viewport.y);
-  vec2 v1 = min(sqrt(2.0*l1), MAXPX) * e1;
-  vec2 v2 = min(sqrt(2.0*l2), MAXPX) * vec2(e1.y, -e1.x);
-
-  vec2 ndc = clip.xy / clip.w;
-  vec2 offset = (a_corner.x*v1 + a_corner.y*v2) / u_viewport * 2.0;
-  v_pos = a_corner;
-  gl_Position = vec4(ndc + offset, clip.z/clip.w, 1.0);
-}`;
-
-const FRAG = `#version 300 es
-precision highp float;
-uniform float u_sat;     // saturation/vibrance boost (1.0 = as-decoded)
-in vec4 v_color; in vec2 v_pos;
-out vec4 o;
-void main() {
-  float p = -dot(v_pos, v_pos);
-  if (p < -4.0) discard;
-  float alpha = exp(p) * v_color.a;
-  // SH-DC-only color is flat/desaturated; push each splat's color away from its
-  // luma before the premultiplied blend (the blend itself also greys overlaps).
-  vec3 rgb = v_color.rgb;
-  float luma = dot(rgb, vec3(0.2126, 0.7152, 0.0722));
-  rgb = clamp(mix(vec3(luma), rgb, u_sat), 0.0, 1.0);
-  o = vec4(rgb * alpha, alpha);
-}`;
-
-function shader(type, src){ const s=gl.createShader(type); gl.shaderSource(s,src); gl.compileShader(s);
-  if(!gl.getShaderParameter(s,gl.COMPILE_STATUS)) throw new Error(gl.getShaderInfoLog(s)); return s; }
-const prog = gl.createProgram();
-gl.attachShader(prog, shader(gl.VERTEX_SHADER, VERT));
-gl.attachShader(prog, shader(gl.FRAGMENT_SHADER, FRAG));
-gl.linkProgram(prog);
-if(!gl.getProgramParameter(prog, gl.LINK_STATUS)) throw new Error(gl.getProgramInfoLog(prog));
-gl.useProgram(prog);
-
-const quad = new Float32Array([-2,-2, 2,-2, 2,2, -2,-2, 2,2, -2,2]);
-const quadBuf = gl.createBuffer(); gl.bindBuffer(gl.ARRAY_BUFFER, quadBuf); gl.bufferData(gl.ARRAY_BUFFER, quad, gl.STATIC_DRAW);
-const aCorner = gl.getAttribLocation(prog, "a_corner");
-const idxBuf = gl.createBuffer();
-const aIndex = gl.getAttribLocation(prog, "a_index");
-const tex = gl.createTexture();
-const TEX_W = 2048;
-const U = {
-  proj: gl.getUniformLocation(prog,"u_proj"), view: gl.getUniformLocation(prog,"u_view"),
-  focal: gl.getUniformLocation(prog,"u_focal"), viewport: gl.getUniformLocation(prog,"u_viewport"),
-  sat: gl.getUniformLocation(prog,"u_sat"),
-};
-gl.uniform1i(gl.getUniformLocation(prog,"u_tex"), 0);
-gl.disable(gl.DEPTH_TEST);
-gl.enable(gl.BLEND);
-gl.blendFuncSeparate(gl.ONE, gl.ONE_MINUS_SRC_ALPHA, gl.ONE, gl.ONE_MINUS_SRC_ALPHA);
-
-// ---------- splat store ----------
-let count = 0;                         // total splats ingested
-let revealCount = 0;                   // top-K (importance-sorted) actually shown
-let followFull = true;                 // reveal tracks count as data streams in
-// default render ceiling: show the top 300k for smooth navigation even when the
-// splat holds up to ~1M; the slider still reaches the full count. ?maxsplats= overrides.
-const VIEW_CAP = parseInt(P.get("maxsplats") ?? "300000", 10);
-// ease-in reveal: front-loaded importance-sorted splats fill fast then stall, so
-// reveal count slowly at first to even out the visible build-up. ?buildcurve= tunes.
-const BUILD_CURVE = parseFloat(P.get("buildcurve") ?? "3");
-let posSent = false;                   // positions uploaded to the sort worker?
-let positions = new Float32Array(0);   // xyz per splat (for sorting)
-let texData = new Uint32Array(0);      // 8 uint32 per splat
-
-function ensureCapacity(n) {
-  if (n*8 <= texData.length) return;
-  const cap = Math.max(n, Math.ceil(texData.length/8*1.5), 65536);
-  const td = new Uint32Array(cap*8); td.set(texData); texData = td;
-  const ps = new Float32Array(cap*3); ps.set(positions); positions = ps;
-}
-function uploadTexture() {
-  const rows = Math.ceil((count*2)/TEX_W);
-  const buf = new Uint32Array(TEX_W*rows*4);
-  buf.set(texData.subarray(0, count*8));
-  gl.activeTexture(gl.TEXTURE0); gl.bindTexture(gl.TEXTURE_2D, tex);
-  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA32UI, TEX_W, rows, 0, gl.RGBA_INTEGER, gl.UNSIGNED_INT, buf);
-  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
-  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
-}
-function ingest(bytes) {
-  const n = Math.floor(bytes.byteLength / ROW);
-  ensureCapacity(n);
-  const f32 = new Float32Array(bytes.buffer, bytes.byteOffset, n*8);
-  const u32 = new Uint32Array(bytes.buffer, bytes.byteOffset, n*8);
-  texData.set(u32.subarray(0, n*8), 0);
-  for (let i=0;i<n;i++){ positions[i*3]=f32[i*8]; positions[i*3+1]=f32[i*8+1]; positions[i*3+2]=f32[i*8+2]; }
-  count = n;
-  posSent = false;                     // new positions -> resend to the sort worker
-  if (followFull) revealCount = Math.min(count, VIEW_CAP);
-  uploadTexture();
-  $("s-splats").textContent = count.toLocaleString();
-  syncRevealUI();
-  if (count > 5000) frameToData();
-}
-
-// ---------- reveal ----------
-function setReveal(k) {
-  revealCount = Math.max(0, Math.min(count, k|0));
-  followFull = revealCount >= count;
-  sortDirty = true;
-  syncRevealUI();
-}
-function syncRevealUI() {
-  const r = $("reveal");
-  r.max = Math.max(1, count);
-  r.value = revealCount;
-  const pct = count ? Math.round(revealCount/count*100) : 0;
-  $("s-reveal").textContent = revealCount.toLocaleString() + " · " + pct + "%";
-}
-$("reveal").addEventListener("input", e => setReveal(+e.target.value));
-
-let buildupRAF = 0;
-function playBuildup(ms=3500) {
-  if (count === 0) return;
-  cancelAnimationFrame(buildupRAF);
-  const cap = Math.min(count, VIEW_CAP);         // build up to the render ceiling, not the full 1M
-  const t0 = performance.now(), start = Math.max(1, Math.floor(cap*0.012));
-  const step = (now) => {
-    const u = Math.min(1, (now - t0) / ms);
-    const e = Math.pow(u, BUILD_CURVE);          // ease-in (even visible build-up)
-    setReveal(Math.floor(start + (cap-start)*e));
-    if (u < 1) buildupRAF = requestAnimationFrame(step); else setReveal(cap);
-  };
-  buildupRAF = requestAnimationFrame(step);
-}
-$("replay").addEventListener("click", () => playBuildup());
-
-// ---------- depth-sort worker ----------
-const worker = new Worker(URL.createObjectURL(new Blob([`
-let pos=null, n=0;
-onmessage = (e) => {
-  if (e.data.pos) { pos = e.data.pos; n = e.data.n; return; }
-  if (!pos) return;
-  const view = e.data.view, N = Math.min(n, e.data.n);
-  const d = new Float32Array(N); let dmin=Infinity, dmax=-Infinity;
-  for (let i=0;i<N;i++){
-    const z = view[2]*pos[i*3] + view[6]*pos[i*3+1] + view[10]*pos[i*3+2];
-    d[i]=z; if(z<dmin)dmin=z; if(z>dmax)dmax=z;
-  }
-  const B=65536, cnt=new Uint32Array(B);
-  const scale = dmax>dmin ? (B-1)/(dmax-dmin) : 0;
-  const key=new Uint16Array(N);
-  for(let i=0;i<N;i++){ let k=((d[i]-dmin)*scale)|0; k=k<0?0:k>65535?65535:k; key[i]=k; cnt[k]++; }
-  for(let i=1;i<B;i++) cnt[i]+=cnt[i-1];
-  const order=new Uint32Array(N);
-  for(let i=N-1;i>=0;i--) order[--cnt[key[i]]]=i;
-  postMessage({order, n:N}, [order.buffer]);
-};
-`], {type:"text/javascript"})));
-
-let sortedIndex = new Uint32Array(0), sortedCount = 0, sorting = false, sortDirty = true;
-worker.onmessage = (e) => {
-  sortedIndex = e.data.order; sortedCount = e.data.n; sorting = false;
-  gl.bindBuffer(gl.ARRAY_BUFFER, idxBuf);
-  gl.bufferData(gl.ARRAY_BUFFER, sortedIndex, gl.DYNAMIC_DRAW);
-};
-function requestSort(view) {
-  if (sorting || revealCount === 0) return;
-  sorting = true;
-  // positions change only on ingest; resend them then, not every camera move
-  // (the worker keeps the full set and sorts the first revealCount of them).
-  if (!posSent) { worker.postMessage({ pos: positions.slice(0, count*3), n: count }); posSent = true; }
-  worker.postMessage({ view, n: revealCount });
-}
-
-// ---------- camera ----------
-// yaw/pitch/dist overridable via query (shareable views + headless screenshots).
-let target = [0,0,0], framed = false;
-let yaw   = parseFloat(P.get("yaw")   ?? "0.0");
-let pitch = parseFloat(P.get("pitch") ?? "0.05");
-let dist  = parseFloat(P.get("dist")  ?? "2.0");
-function frameToData() {
-  if (framed || count === 0) return;
-  let lo = [1e30,1e30,1e30], hi = [-1e30,-1e30,-1e30];
-  for (let i=0;i<count;i++) for (let k=0;k<3;k++){ const v=positions[i*3+k]; if(v<lo[k])lo[k]=v; if(v>hi[k])hi[k]=v; }
-  const diag = Math.hypot(hi[0]-lo[0], hi[1]-lo[1], hi[2]-lo[2]);
-  target = [(lo[0]+hi[0])/2, (lo[1]+hi[1])/2, (lo[2]+hi[2])/2];
-  if (!P.has("dist")) dist = Math.max(0.5, 0.7*diag);
-  framed = true; sortDirty = true;
-}
-const keys = {};
-addEventListener("keydown", e => { if (drop.style.display==="none") keys[e.key.toLowerCase()] = true; });
-addEventListener("keyup",   e => keys[e.key.toLowerCase()] = false);
-let drag=false, lx=0, ly=0;
-canvas.addEventListener("pointerdown", e => { drag=true; lx=e.clientX; ly=e.clientY; canvas.setPointerCapture(e.pointerId); });
-canvas.addEventListener("pointerup",   e => drag=false);
-canvas.addEventListener("pointermove", e => { if(!drag)return; yaw -= (e.clientX-lx)*0.005; pitch -= (e.clientY-ly)*0.005;
-  pitch = Math.max(-1.5, Math.min(1.5, pitch)); lx=e.clientX; ly=e.clientY; sortDirty=true; });
-canvas.addEventListener("wheel", e => { dist *= Math.exp(e.deltaY*0.001); dist=Math.max(0.2,Math.min(10,dist)); sortDirty=true; e.preventDefault(); }, {passive:false});
-
-function mat4Mul(a,b){ const o=new Float32Array(16); for(let i=0;i<4;i++)for(let j=0;j<4;j++){let s=0;for(let k=0;k<4;k++)s+=a[k*4+j]*b[i*4+k];o[i*4+j]=s;} return o; }
-function perspective(fovy, aspect, n, f){ const t=1/Math.tan(fovy/2); return new Float32Array([t/aspect,0,0,0, 0,t,0,0, 0,0,(f+n)/(n-f),-1, 0,0,(2*f*n)/(n-f),0]); }
-function lookAt(eye, c){ const z=norm(sub(eye,c)), x=norm(cross([0,1,0],z)), y=cross(z,x);
-  return new Float32Array([x[0],y[0],z[0],0, x[1],y[1],z[1],0, x[2],y[2],z[2],0, -dot(x,eye),-dot(y,eye),-dot(z,eye),1]); }
-const sub=(a,b)=>[a[0]-b[0],a[1]-b[1],a[2]-b[2]], dot=(a,b)=>a[0]*b[0]+a[1]*b[1]+a[2]*b[2];
-const cross=(a,b)=>[a[1]*b[2]-a[2]*b[1],a[2]*b[0]-a[0]*b[2],a[0]*b[1]-a[1]*b[0]];
-const norm=(a)=>{const l=Math.hypot(...a)||1;return [a[0]/l,a[1]/l,a[2]/l];};
-
-// adaptive resolution: overdraw (not geometry) is the bottleneck when you move
-// into the cloud, so trade internal render scale for framerate and restore it
-// when FPS recovers. The CSS size stays full; only the GL backing shrinks.
-let renderScale = 1.0, autoRes = true;
-// vibrance (saturation) — cosmetic, in the demo layer only; ?sat= overrides so
-// headless/video renders match what you dialed in.
-let saturation = parseFloat(P.get("sat") ?? "1.45");
-{ const vib = $("vibrance"); vib.value = saturation;
-  vib.addEventListener("input", e => { saturation = +e.target.value; }); }
-function resize(){ const dpr=Math.min(devicePixelRatio,2)*renderScale;
-  canvas.width=Math.max(1,Math.round(innerWidth*dpr)); canvas.height=Math.max(1,Math.round(innerHeight*dpr));
-  gl.viewport(0,0,canvas.width,canvas.height); }
-addEventListener("resize", resize); resize();
-$("autores-cb").addEventListener("change", e => {
-  autoRes = e.target.checked;
-  if (!autoRes) { renderScale = 1.0; resize(); $("s-render").textContent = "100%"; }
-});
-
-let frames=0, fpsT=performance.now();
-function render(){
-  const speed = dist*0.02;
-  const fwd=[Math.sin(yaw)*Math.cos(pitch),Math.sin(pitch),Math.cos(yaw)*Math.cos(pitch)];
-  const right=norm(cross([0,1,0],fwd));
-  if(keys["w"]||keys["arrowup"]) { for(let i=0;i<3;i++) target[i]-=fwd[i]*speed; sortDirty=true; }
-  if(keys["s"]||keys["arrowdown"]){ for(let i=0;i<3;i++) target[i]+=fwd[i]*speed; sortDirty=true; }
-  if(keys["a"]||keys["arrowleft"]){ for(let i=0;i<3;i++) target[i]-=right[i]*speed; sortDirty=true; }
-  if(keys["d"]||keys["arrowright"]){for(let i=0;i<3;i++) target[i]+=right[i]*speed; sortDirty=true; }
-
-  const eye=[target[0]+dist*Math.sin(yaw)*Math.cos(pitch), target[1]+dist*Math.sin(pitch), target[2]+dist*Math.cos(yaw)*Math.cos(pitch)];
-  const view=lookAt(eye,target);
-  const aspect=canvas.width/canvas.height, fovy=1.0;
-  const proj=perspective(fovy, aspect, 0.05, 100);
-  const fy=canvas.height/(2*Math.tan(fovy/2)), fx=fy;
-
-  if(sortDirty){ requestSort(view); sortDirty=false; }
-
-  gl.clearColor(0.043,0.051,0.063,1); gl.clear(gl.COLOR_BUFFER_BIT);
-  gl.useProgram(prog);
-  gl.uniformMatrix4fv(U.view,false,view); gl.uniformMatrix4fv(U.proj,false,proj);
-  gl.uniform2f(U.focal,fx,fy); gl.uniform2f(U.viewport,canvas.width,canvas.height);
-  gl.uniform1f(U.sat, saturation);
-  gl.bindBuffer(gl.ARRAY_BUFFER,quadBuf); gl.enableVertexAttribArray(aCorner); gl.vertexAttribPointer(aCorner,2,gl.FLOAT,false,0,0);
-  if(sortedCount>0){
-    gl.bindBuffer(gl.ARRAY_BUFFER,idxBuf); gl.enableVertexAttribArray(aIndex);
-    gl.vertexAttribIPointer(aIndex,1,gl.UNSIGNED_INT,0,0); gl.vertexAttribDivisor(aIndex,1);
-    gl.drawArraysInstanced(gl.TRIANGLES,0,6,sortedCount);
-  }
-  frames++; const now=performance.now();
-  if(now-fpsT>500){
-    const fps=Math.round(frames*1000/(now-fpsT)); frames=0; fpsT=now;
-    $("s-fps").textContent=fps;
-    if(autoRes){                       // hold ~30fps by trading resolution
-      if(fps<26 && renderScale>0.35){ renderScale=Math.max(0.35,renderScale-0.12); resize(); }
-      else if(fps>52 && renderScale<1.0){ renderScale=Math.min(1.0,renderScale+0.12); resize(); }
-    }
-    $("s-render").textContent=Math.round(renderScale*100)+"%";
-  }
-  requestAnimationFrame(render);
-}
-requestAnimationFrame(render);
-
-// ---------- splat load (progressive) ----------
-async function loadSplat(url){
-  framed = false; followFull = true; count = 0; revealCount = 0; posSent = false;
-  const resp = await fetch(url);
-  if(!resp.ok) throw new Error("HTTP "+resp.status+" for splat");
-  const total = +resp.headers.get("content-length") || 0;
-  const reader = resp.body.getReader();
-  let chunks=[], received=0, lastIngest=0;
-  while(true){
-    const {done,value}=await reader.read();
-    if(done) break;
-    chunks.push(value); received+=value.byteLength;
-    if(received-lastIngest > 4096*ROW || !total){
-      const all=new Uint8Array(received); let o=0; for(const c of chunks){all.set(c,o); o+=c.byteLength;}
-      ingest(all); sortDirty=true; lastIngest=received;
-    }
-  }
-  const all=new Uint8Array(received); let o=0; for(const c of chunks){all.set(c,o); o+=c.byteLength;}
-  ingest(all); sortDirty=true;
-}
-
-// ---------- upload UI ----------
-const drop = $("drop");
-let selected = [];
-function showThumbs() {
-  const t = $("thumbs"); t.innerHTML = "";
-  for (const f of selected) { const im = document.createElement("img"); im.src = URL.createObjectURL(f); t.appendChild(im); }
-  $("go").disabled = selected.length === 0;
-}
-function pick(files) {
-  selected = [...files].filter(f => f.type.startsWith("image/")).slice(0, 8);
-  $("err").textContent = "";
-  showThumbs();
-}
-// model selector: populate from /api/models; update the hint per model. Hidden
-// when only one model is loaded.
-let MODELS = [], BG_AVAILABLE = false;
-(async () => {
-  const sel = $("model");
-  try {
-    const data = await (await fetch("/api/models")).json();
-    MODELS = data.models || []; BG_AVAILABLE = !!data.bg_remove;
-  } catch { MODELS = []; }
-  if (MODELS.length <= 1) sel.style.display = "none";
-  for (const m of MODELS) {
-    const o = document.createElement("option");
-    o.value = m.name; o.textContent = `${m.label} · ${m.views} views`;
-    sel.appendChild(o);
-  }
-  const applyHint = () => {
-    const m = MODELS.find(x => x.name === sel.value) || MODELS[0];
-    if (!m) return;
-    $("sub").textContent = m.hint;
-    $("zone-hint").textContent = `JPG / PNG · best with ${m.views} overlapping views`;
-    // background removal only helps the object path (isolate one object on white)
-    $("bgrow").style.display = (BG_AVAILABLE && sel.value === "object") ? "flex" : "none";
-  };
-  sel.addEventListener("change", applyHint);
-  applyHint();
-})();
-
-$("zone").addEventListener("click", () => $("file").click());
-$("file").addEventListener("change", e => pick(e.target.files));
-$("zone").addEventListener("dragover", e => { e.preventDefault(); $("zone").classList.add("hot"); });
-$("zone").addEventListener("dragleave", () => $("zone").classList.remove("hot"));
-$("zone").addEventListener("drop", e => { e.preventDefault(); $("zone").classList.remove("hot"); pick(e.dataTransfer.files); });
-addEventListener("dragover", e => e.preventDefault());
-addEventListener("drop", e => e.preventDefault());
-
-$("again").addEventListener("click", () => { drop.style.display = "flex"; $("panel").style.display = "none"; $("hint").style.display = "none"; });
-
-// deep-link: ?id=<splat> skips the uploader (shareable result + headless render).
-(async () => {
-  const id = P.get("id");
-  if (!id) return;
-  drop.style.display = "none"; $("panel").style.display = "block"; $("hint").style.display = "block";
-  try {
-    await loadSplat("/api/splat/" + id);
-    const rv = P.get("reveal");
-    if (rv != null) setReveal(Math.round(parseFloat(rv) * count)); else playBuildup();
-    setStatus(`ready · ${count.toLocaleString()} splats`);
-    $("s-splats").textContent = count.toLocaleString();
-  } catch (e) { setStatus(String(e.message || e), true); }
-})();
-
-$("go").addEventListener("click", async () => {
-  if (selected.length === 0) return;
-  $("go").disabled = true; $("err").textContent = "";
-  $("go").textContent = "Uploading…";
-  try {
-    const removeBg = $("bgremove").checked && $("bgrow").style.display !== "none";
-    const fd = new FormData();
-    for (const f of selected) fd.append("images", f);
-    fd.append("model", $("model").value || "");
-    if (removeBg) fd.append("remove_bg", "1");
-    const t0 = performance.now();
-    $("go").textContent = removeBg ? "Removing background + reconstructing…" : "Reconstructing… (Vulkan)";
-    const resp = await fetch("/api/reconstruct", { method: "POST", body: fd });
-    if (!resp.ok) throw new Error(await resp.text());
-    const meta = await resp.json();
-    // reveal the viewer
-    drop.style.display = "none"; $("panel").style.display = "block"; $("hint").style.display = "block";
-    $("s-views").textContent = meta.n_views;
-    $("s-time").textContent = meta.seconds.toFixed(2) + "s";
-    setStatus(`reconstructed ${meta.n_splats.toLocaleString()} gaussians`);
-    await loadSplat("/api/splat/" + meta.id);
-    setStatus(`ready · ${count.toLocaleString()} splats`);
-    playBuildup();
-  } catch (err) {
-    $("err").textContent = String(err.message || err);
-    console.error(err);
-  } finally {
-    $("go").disabled = selected.length === 0; $("go").textContent = "Reconstruct";
-  }
-});
-</script>
+  <div class="foot">one server · Vulkan engine · WebGL render</div>
 </body>
 </html>
diff --git a/server/web/reconstruct.html b/server/web/reconstruct.html
new file mode 100644
index 0000000..9ab2a73
--- /dev/null
+++ b/server/web/reconstruct.html
@@ -0,0 +1,556 @@
+<!doctype html>
+<!--
+  free-splatter.cpp web app.
+  Drop photos -> POST /api/reconstruct (Go server, purego -> Vulkan engine) ->
+  fetch the .splat -> render in WebGL2 with depth-sorted alpha blending.
+  The .splat is importance-sorted, so the "build-up" reveal slider/animation
+  shows the scene assembling from its most important gaussians first — the honest
+  analogue of a "step count" for a feed-forward (one-pass) reconstruction.
+  EWA splat shaders follow 3DGS (Kerbl et al.) / antimatter15's WebGL formulation.
+-->
+<html lang="en">
+<head>
+<meta charset="utf-8"/>
+<meta name="viewport" content="width=device-width, initial-scale=1"/>
+<title>free-splatter.cpp — photos to gaussian splat</title>
+<style>
+  :root { color-scheme: dark; }
+  * { margin: 0; box-sizing: border-box; }
+  html, body { height: 100%; background: #0b0d10; overflow: hidden; font-family: ui-sans-serif, system-ui, sans-serif; color: #e8eaed; }
+  #gl { position: fixed; inset: 0; width: 100%; height: 100%; display: block; }
+
+  .card { background: rgba(16,19,24,.86); backdrop-filter: blur(10px);
+    border: 1px solid rgba(255,255,255,.08); border-radius: 14px; box-shadow: 0 10px 40px rgba(0,0,0,.5); }
+
+  /* upload overlay (initial + "new photos") */
+  #drop { position: fixed; inset: 0; z-index: 20; display: flex; align-items: center; justify-content: center; }
+  #drop .card { width: min(440px, 92vw); padding: 26px 26px 22px; text-align: center; }
+  #drop h1 { font-size: 18px; font-weight: 680; letter-spacing: .2px; }
+  #model { width: 100%; margin-top: 12px; padding: 7px 9px; font-size: 12.5px; color: #e8eaed;
+    background: rgba(255,255,255,.06); border: 1px solid rgba(255,255,255,.12); border-radius: 8px; cursor: pointer; }
+  #model:hover { background: rgba(255,255,255,.1); }
+  #bgrow { display: flex; align-items: center; gap: 6px; font-size: 12px; color: #b9c0cc; margin-top: 12px; cursor: pointer; }
+  #bgrow input { accent-color: #6f9bff; }
+  #bgrow .bgnote { color: #8b93a1; font-size: 11px; }
+  #drop .sub { color: #8b93a1; font-size: 12.5px; margin-top: 4px; line-height: 1.5; }
+  #zone { margin-top: 18px; border: 1.5px dashed rgba(255,255,255,.18); border-radius: 12px;
+    padding: 26px 16px; cursor: pointer; transition: border-color .15s, background .15s; }
+  #zone.hot { border-color: #6f9bff; background: rgba(91,140,255,.08); }
+  #zone .big { font-size: 13.5px; }
+  #zone .small { color: #8b93a1; font-size: 11.5px; margin-top: 4px; }
+  #thumbs { display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin-top: 14px; }
+  #thumbs img { width: 62px; height: 62px; object-fit: cover; border-radius: 8px; border: 1px solid rgba(255,255,255,.12); }
+  .btn { margin-top: 16px; width: 100%; padding: 11px; font-size: 13.5px; font-weight: 640; color: #fff; cursor: pointer;
+    border: none; border-radius: 10px; background: linear-gradient(90deg,#5b8cff,#8a5bff); }
+  .btn:disabled { opacity: .45; cursor: not-allowed; }
+  .btn.ghost { background: rgba(255,255,255,.08); }
+  #drop .err { color: #ff8a8a; font-size: 12px; margin-top: 10px; min-height: 1em; }
+
+  /* viewer panel (after reconstruction) */
+  #panel { position: fixed; top: 16px; left: 16px; z-index: 10; width: 264px; padding: 14px 16px; font-size: 13px; line-height: 1.45; display: none; }
+  #panel h1 { font-size: 14px; font-weight: 650; }
+  #panel .sub { color: #8b93a1; font-size: 11.5px; margin-top: 2px; }
+  .stat { display: flex; justify-content: space-between; font-variant-numeric: tabular-nums; margin-top: 3px; }
+  .stat span:first-child { color: #8b93a1; }
+  #status { margin-top: 9px; color: #9fd3a0; font-size: 11.5px; min-height: 1.2em; }
+  #status.err { color: #ff8a8a; }
+  #autores { display: flex; align-items: center; gap: 6px; font-size: 11.5px; color: #8b93a1; margin-top: 7px; cursor: pointer; }
+  #autores input { accent-color: #6f9bff; }
+  .rev { margin-top: 12px; border-top: 1px solid rgba(255,255,255,.08); padding-top: 11px; }
+  .rev .lab { display: flex; justify-content: space-between; font-size: 11.5px; color: #8b93a1; }
+  #reveal { width: 100%; margin: 7px 0 4px; accent-color: #6f9bff; }
+  .row { display: flex; gap: 8px; margin-top: 8px; }
+  .row .btn { margin-top: 0; padding: 8px; font-size: 12px; }
+  #hint { position: fixed; bottom: 14px; left: 50%; transform: translateX(-50%); z-index: 10;
+    color: #6b7280; font-size: 11.5px; background: rgba(16,19,24,.7); padding: 6px 12px; border-radius: 8px; display: none; }
+</style>
+</head>
+<body>
+<canvas id="gl"></canvas>
+
+<div id="drop">
+  <div class="card">
+    <a href="/" style="display:inline-block;color:#7aa2ff;text-decoration:none;font-size:11.5px;margin-bottom:8px">‹ all demos</a>
+    <h1>Photos → 3D gaussian splat</h1>
+    <select id="model" title="reconstruction model"></select>
+    <div class="sub" id="sub">Drop photos of the same scene from different angles — they must overlap. One feed-forward pass on the GPU (Vulkan).</div>
+    <div id="zone">
+      <div class="big">📷 Drag photos here, or click to choose</div>
+      <div class="small" id="zone-hint">JPG / PNG · overlapping views</div>
+    </div>
+    <label id="bgrow" style="display:none"><input type="checkbox" id="bgremove" checked/> remove background <span class="bgnote">(recommended; ~2s/photo)</span></label>
+    <div id="thumbs"></div>
+    <button id="go" class="btn" disabled>Reconstruct</button>
+    <div id="err" class="err"></div>
+    <input id="file" type="file" accept="image/*" multiple hidden/>
+  </div>
+</div>
+
+<div id="panel" class="card">
+  <a href="/" style="display:inline-block;color:#7aa2ff;text-decoration:none;font-size:11.5px;margin-bottom:6px">‹ all demos</a>
+  <h1>free-splatter.cpp</h1>
+  <div class="sub">3D gaussian scene from your photos.</div>
+  <div class="stat"><span>views</span><span id="s-views">—</span></div>
+  <div class="stat"><span>splats</span><span id="s-splats">0</span></div>
+  <div class="stat"><span>reconstruct</span><span id="s-time">—</span></div>
+  <div class="stat"><span>fps</span><span id="s-fps">—</span></div>
+  <div class="stat"><span>render</span><span id="s-render">100%</span></div>
+  <label id="autores"><input type="checkbox" id="autores-cb" checked/> auto quality (smooth navigation)</label>
+  <label id="vibrow" style="display:flex;align-items:center;gap:8px;font-size:11.5px;color:#8b93a1;margin-top:7px;">
+    <span>vibrance</span><input id="vibrance" type="range" min="0.6" max="2.2" step="0.05" value="1.45" style="flex:1;accent-color:#6f9bff"/>
+  </label>
+  <div id="status">…</div>
+  <div class="rev">
+    <div class="lab"><span>build-up</span><span id="s-reveal">100%</span></div>
+    <input id="reveal" type="range" min="0" max="1000" value="1000"/>
+    <div class="row">
+      <button id="replay" class="btn ghost">▶ replay</button>
+      <button id="again" class="btn ghost">＋ new photos</button>
+    </div>
+  </div>
+</div>
+
+<div id="hint">drag to orbit · scroll to zoom · WASD/arrows to move</div>
+
+<script type="module">
+const ROW = 32;                       // bytes per splat
+const $ = (id) => document.getElementById(id);
+const P = new URLSearchParams(location.search);   // ?id=&yaw=&pitch=&dist=&reveal=
+const setStatus = (t, err=false) => { const s=$("status"); s.textContent=t; s.classList.toggle("err",err); };
+
+const canvas = $("gl");
+const gl = canvas.getContext("webgl2", { antialias: false, premultipliedAlpha: true, alpha: false });
+if (!gl) { $("err").textContent = "WebGL2 not available in this browser."; throw new Error("no webgl2"); }
+
+// ---------- shaders (EWA splatting) ----------
+const VERT = `#version 300 es
+precision highp float; precision highp int;
+uniform highp usampler2D u_tex;
+uniform mat4 u_proj, u_view;
+uniform vec2 u_focal, u_viewport;
+in vec2 a_corner;      // quad corner in [-2,2]
+in uint a_index;       // sorted splat index
+out vec4 v_color;
+out vec2 v_pos;
+void main() {
+  uint t = a_index * 2u;
+  ivec2 wh = textureSize(u_tex, 0);
+  uvec4 A = texelFetch(u_tex, ivec2(int(t % uint(wh.x)), int(t / uint(wh.x))), 0);
+  uvec4 B = texelFetch(u_tex, ivec2(int((t+1u) % uint(wh.x)), int((t+1u) / uint(wh.x))), 0);
+  vec3 center = uintBitsToFloat(A.xyz);
+  vec3 scale  = vec3(uintBitsToFloat(A.w), uintBitsToFloat(B.x), uintBitsToFloat(B.y));
+  uint rgba = B.z, rotp = B.w;
+  v_color = vec4(float(rgba & 0xffu), float((rgba>>8)&0xffu), float((rgba>>16)&0xffu), float((rgba>>24)&0xffu)) / 255.0;
+  vec4 q = (vec4(float(rotp&0xffu), float((rotp>>8)&0xffu), float((rotp>>16)&0xffu), float((rotp>>24)&0xffu)) - 128.0) / 128.0;
+  q = normalize(q);                    // (w,x,y,z)
+  float w=q.x, x=q.y, y=q.z, z=q.w;
+  mat3 R = mat3(
+    1.-2.*(y*y+z*z), 2.*(x*y+w*z),   2.*(x*z-w*y),
+    2.*(x*y-w*z),    1.-2.*(x*x+z*z),2.*(y*z+w*x),
+    2.*(x*z+w*y),    2.*(y*z-w*x),   1.-2.*(x*x+y*y));
+  mat3 S = mat3(scale.x,0.,0., 0.,scale.y,0., 0.,0.,scale.z);
+  mat3 M = R * S;
+  mat3 cov3d = M * transpose(M);
+
+  vec4 cam = u_view * vec4(center, 1.0);
+  if (cam.z >= -0.2) { gl_Position = vec4(0.0,0.0,2.0,1.0); return; }
+  vec4 clip = u_proj * cam;
+  if (clip.w <= 0.0 || abs(clip.x) > 1.3*clip.w || abs(clip.y) > 1.3*clip.w) {
+    gl_Position = vec4(0.0,0.0,2.0,1.0); return;
+  }
+  mat3 J = mat3(
+    u_focal.x/cam.z, 0.0, -(u_focal.x*cam.x)/(cam.z*cam.z),
+    0.0, -u_focal.y/cam.z, (u_focal.y*cam.y)/(cam.z*cam.z),
+    0.0, 0.0, 0.0);
+  mat3 W = transpose(mat3(u_view));
+  mat3 T = W * J;
+  mat3 cov2 = transpose(T) * cov3d * T;
+  float a = cov2[0][0] + 0.3, b = cov2[0][1], c = cov2[1][1] + 0.3;
+  float det = a*c - b*b;
+  if (det <= 0.0) { gl_Position = vec4(0.0,0.0,2.0,1.0); return; }
+  float mid = 0.5*(a+c);
+  float rad = sqrt(max(0.0, mid*mid - det));
+  float l1 = mid + rad;
+  float l2 = max(mid - rad, 0.1);
+  vec2 d = vec2(b, l1 - a);
+  vec2 e1 = (dot(d,d) > 1e-9) ? normalize(d) : vec2(1.0, 0.0);
+  float MAXPX = 0.20 * min(u_viewport.x, u_viewport.y);
+  vec2 v1 = min(sqrt(2.0*l1), MAXPX) * e1;
+  vec2 v2 = min(sqrt(2.0*l2), MAXPX) * vec2(e1.y, -e1.x);
+
+  vec2 ndc = clip.xy / clip.w;
+  vec2 offset = (a_corner.x*v1 + a_corner.y*v2) / u_viewport * 2.0;
+  v_pos = a_corner;
+  gl_Position = vec4(ndc + offset, clip.z/clip.w, 1.0);
+}`;
+
+const FRAG = `#version 300 es
+precision highp float;
+uniform float u_sat;     // saturation/vibrance boost (1.0 = as-decoded)
+in vec4 v_color; in vec2 v_pos;
+out vec4 o;
+void main() {
+  float p = -dot(v_pos, v_pos);
+  if (p < -4.0) discard;
+  float alpha = exp(p) * v_color.a;
+  // SH-DC-only color is flat/desaturated; push each splat's color away from its
+  // luma before the premultiplied blend (the blend itself also greys overlaps).
+  vec3 rgb = v_color.rgb;
+  float luma = dot(rgb, vec3(0.2126, 0.7152, 0.0722));
+  rgb = clamp(mix(vec3(luma), rgb, u_sat), 0.0, 1.0);
+  o = vec4(rgb * alpha, alpha);
+}`;
+
+function shader(type, src){ const s=gl.createShader(type); gl.shaderSource(s,src); gl.compileShader(s);
+  if(!gl.getShaderParameter(s,gl.COMPILE_STATUS)) throw new Error(gl.getShaderInfoLog(s)); return s; }
+const prog = gl.createProgram();
+gl.attachShader(prog, shader(gl.VERTEX_SHADER, VERT));
+gl.attachShader(prog, shader(gl.FRAGMENT_SHADER, FRAG));
+gl.linkProgram(prog);
+if(!gl.getProgramParameter(prog, gl.LINK_STATUS)) throw new Error(gl.getProgramInfoLog(prog));
+gl.useProgram(prog);
+
+const quad = new Float32Array([-2,-2, 2,-2, 2,2, -2,-2, 2,2, -2,2]);
+const quadBuf = gl.createBuffer(); gl.bindBuffer(gl.ARRAY_BUFFER, quadBuf); gl.bufferData(gl.ARRAY_BUFFER, quad, gl.STATIC_DRAW);
+const aCorner = gl.getAttribLocation(prog, "a_corner");
+const idxBuf = gl.createBuffer();
+const aIndex = gl.getAttribLocation(prog, "a_index");
+const tex = gl.createTexture();
+const TEX_W = 2048;
+const U = {
+  proj: gl.getUniformLocation(prog,"u_proj"), view: gl.getUniformLocation(prog,"u_view"),
+  focal: gl.getUniformLocation(prog,"u_focal"), viewport: gl.getUniformLocation(prog,"u_viewport"),
+  sat: gl.getUniformLocation(prog,"u_sat"),
+};
+gl.uniform1i(gl.getUniformLocation(prog,"u_tex"), 0);
+gl.disable(gl.DEPTH_TEST);
+gl.enable(gl.BLEND);
+gl.blendFuncSeparate(gl.ONE, gl.ONE_MINUS_SRC_ALPHA, gl.ONE, gl.ONE_MINUS_SRC_ALPHA);
+
+// ---------- splat store ----------
+let count = 0;                         // total splats ingested
+let revealCount = 0;                   // top-K (importance-sorted) actually shown
+let followFull = true;                 // reveal tracks count as data streams in
+// default render ceiling: show the top 300k for smooth navigation even when the
+// splat holds up to ~1M; the slider still reaches the full count. ?maxsplats= overrides.
+const VIEW_CAP = parseInt(P.get("maxsplats") ?? "300000", 10);
+// ease-in reveal: front-loaded importance-sorted splats fill fast then stall, so
+// reveal count slowly at first to even out the visible build-up. ?buildcurve= tunes.
+const BUILD_CURVE = parseFloat(P.get("buildcurve") ?? "3");
+let posSent = false;                   // positions uploaded to the sort worker?
+let positions = new Float32Array(0);   // xyz per splat (for sorting)
+let texData = new Uint32Array(0);      // 8 uint32 per splat
+
+function ensureCapacity(n) {
+  if (n*8 <= texData.length) return;
+  const cap = Math.max(n, Math.ceil(texData.length/8*1.5), 65536);
+  const td = new Uint32Array(cap*8); td.set(texData); texData = td;
+  const ps = new Float32Array(cap*3); ps.set(positions); positions = ps;
+}
+function uploadTexture() {
+  const rows = Math.ceil((count*2)/TEX_W);
+  const buf = new Uint32Array(TEX_W*rows*4);
+  buf.set(texData.subarray(0, count*8));
+  gl.activeTexture(gl.TEXTURE0); gl.bindTexture(gl.TEXTURE_2D, tex);
+  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA32UI, TEX_W, rows, 0, gl.RGBA_INTEGER, gl.UNSIGNED_INT, buf);
+  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
+  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
+}
+function ingest(bytes) {
+  const n = Math.floor(bytes.byteLength / ROW);
+  ensureCapacity(n);
+  const f32 = new Float32Array(bytes.buffer, bytes.byteOffset, n*8);
+  const u32 = new Uint32Array(bytes.buffer, bytes.byteOffset, n*8);
+  texData.set(u32.subarray(0, n*8), 0);
+  for (let i=0;i<n;i++){ positions[i*3]=f32[i*8]; positions[i*3+1]=f32[i*8+1]; positions[i*3+2]=f32[i*8+2]; }
+  count = n;
+  posSent = false;                     // new positions -> resend to the sort worker
+  if (followFull) revealCount = Math.min(count, VIEW_CAP);
+  uploadTexture();
+  $("s-splats").textContent = count.toLocaleString();
+  syncRevealUI();
+  if (count > 5000) frameToData();
+}
+
+// ---------- reveal ----------
+function setReveal(k) {
+  revealCount = Math.max(0, Math.min(count, k|0));
+  followFull = revealCount >= count;
+  sortDirty = true;
+  syncRevealUI();
+}
+function syncRevealUI() {
+  const r = $("reveal");
+  r.max = Math.max(1, count);
+  r.value = revealCount;
+  const pct = count ? Math.round(revealCount/count*100) : 0;
+  $("s-reveal").textContent = revealCount.toLocaleString() + " · " + pct + "%";
+}
+$("reveal").addEventListener("input", e => setReveal(+e.target.value));
+
+let buildupRAF = 0;
+function playBuildup(ms=3500) {
+  if (count === 0) return;
+  cancelAnimationFrame(buildupRAF);
+  const cap = Math.min(count, VIEW_CAP);         // build up to the render ceiling, not the full 1M
+  const t0 = performance.now(), start = Math.max(1, Math.floor(cap*0.012));
+  const step = (now) => {
+    const u = Math.min(1, (now - t0) / ms);
+    const e = Math.pow(u, BUILD_CURVE);          // ease-in (even visible build-up)
+    setReveal(Math.floor(start + (cap-start)*e));
+    if (u < 1) buildupRAF = requestAnimationFrame(step); else setReveal(cap);
+  };
+  buildupRAF = requestAnimationFrame(step);
+}
+$("replay").addEventListener("click", () => playBuildup());
+
+// ---------- depth-sort worker ----------
+const worker = new Worker(URL.createObjectURL(new Blob([`
+let pos=null, n=0;
+onmessage = (e) => {
+  if (e.data.pos) { pos = e.data.pos; n = e.data.n; return; }
+  if (!pos) return;
+  const view = e.data.view, N = Math.min(n, e.data.n);
+  const d = new Float32Array(N); let dmin=Infinity, dmax=-Infinity;
+  for (let i=0;i<N;i++){
+    const z = view[2]*pos[i*3] + view[6]*pos[i*3+1] + view[10]*pos[i*3+2];
+    d[i]=z; if(z<dmin)dmin=z; if(z>dmax)dmax=z;
+  }
+  const B=65536, cnt=new Uint32Array(B);
+  const scale = dmax>dmin ? (B-1)/(dmax-dmin) : 0;
+  const key=new Uint16Array(N);
+  for(let i=0;i<N;i++){ let k=((d[i]-dmin)*scale)|0; k=k<0?0:k>65535?65535:k; key[i]=k; cnt[k]++; }
+  for(let i=1;i<B;i++) cnt[i]+=cnt[i-1];
+  const order=new Uint32Array(N);
+  for(let i=N-1;i>=0;i--) order[--cnt[key[i]]]=i;
+  postMessage({order, n:N}, [order.buffer]);
+};
+`], {type:"text/javascript"})));
+
+let sortedIndex = new Uint32Array(0), sortedCount = 0, sorting = false, sortDirty = true;
+worker.onmessage = (e) => {
+  sortedIndex = e.data.order; sortedCount = e.data.n; sorting = false;
+  gl.bindBuffer(gl.ARRAY_BUFFER, idxBuf);
+  gl.bufferData(gl.ARRAY_BUFFER, sortedIndex, gl.DYNAMIC_DRAW);
+};
+function requestSort(view) {
+  if (sorting || revealCount === 0) return;
+  sorting = true;
+  // positions change only on ingest; resend them then, not every camera move
+  // (the worker keeps the full set and sorts the first revealCount of them).
+  if (!posSent) { worker.postMessage({ pos: positions.slice(0, count*3), n: count }); posSent = true; }
+  worker.postMessage({ view, n: revealCount });
+}
+
+// ---------- camera ----------
+// yaw/pitch/dist overridable via query (shareable views + headless screenshots).
+let target = [0,0,0], framed = false;
+let yaw   = parseFloat(P.get("yaw")   ?? "0.0");
+let pitch = parseFloat(P.get("pitch") ?? "0.05");
+let dist  = parseFloat(P.get("dist")  ?? "2.0");
+function frameToData() {
+  if (framed || count === 0) return;
+  let lo = [1e30,1e30,1e30], hi = [-1e30,-1e30,-1e30];
+  for (let i=0;i<count;i++) for (let k=0;k<3;k++){ const v=positions[i*3+k]; if(v<lo[k])lo[k]=v; if(v>hi[k])hi[k]=v; }
+  const diag = Math.hypot(hi[0]-lo[0], hi[1]-lo[1], hi[2]-lo[2]);
+  target = [(lo[0]+hi[0])/2, (lo[1]+hi[1])/2, (lo[2]+hi[2])/2];
+  if (!P.has("dist")) dist = Math.max(0.5, 0.7*diag);
+  framed = true; sortDirty = true;
+}
+const keys = {};
+addEventListener("keydown", e => { if (drop.style.display==="none") keys[e.key.toLowerCase()] = true; });
+addEventListener("keyup",   e => keys[e.key.toLowerCase()] = false);
+let drag=false, lx=0, ly=0;
+canvas.addEventListener("pointerdown", e => { drag=true; lx=e.clientX; ly=e.clientY; canvas.setPointerCapture(e.pointerId); });
+canvas.addEventListener("pointerup",   e => drag=false);
+canvas.addEventListener("pointermove", e => { if(!drag)return; yaw -= (e.clientX-lx)*0.005; pitch -= (e.clientY-ly)*0.005;
+  pitch = Math.max(-1.5, Math.min(1.5, pitch)); lx=e.clientX; ly=e.clientY; sortDirty=true; });
+canvas.addEventListener("wheel", e => { dist *= Math.exp(e.deltaY*0.001); dist=Math.max(0.2,Math.min(10,dist)); sortDirty=true; e.preventDefault(); }, {passive:false});
+
+function mat4Mul(a,b){ const o=new Float32Array(16); for(let i=0;i<4;i++)for(let j=0;j<4;j++){let s=0;for(let k=0;k<4;k++)s+=a[k*4+j]*b[i*4+k];o[i*4+j]=s;} return o; }
+function perspective(fovy, aspect, n, f){ const t=1/Math.tan(fovy/2); return new Float32Array([t/aspect,0,0,0, 0,t,0,0, 0,0,(f+n)/(n-f),-1, 0,0,(2*f*n)/(n-f),0]); }
+function lookAt(eye, c){ const z=norm(sub(eye,c)), x=norm(cross([0,1,0],z)), y=cross(z,x);
+  return new Float32Array([x[0],y[0],z[0],0, x[1],y[1],z[1],0, x[2],y[2],z[2],0, -dot(x,eye),-dot(y,eye),-dot(z,eye),1]); }
+const sub=(a,b)=>[a[0]-b[0],a[1]-b[1],a[2]-b[2]], dot=(a,b)=>a[0]*b[0]+a[1]*b[1]+a[2]*b[2];
+const cross=(a,b)=>[a[1]*b[2]-a[2]*b[1],a[2]*b[0]-a[0]*b[2],a[0]*b[1]-a[1]*b[0]];
+const norm=(a)=>{const l=Math.hypot(...a)||1;return [a[0]/l,a[1]/l,a[2]/l];};
+
+// adaptive resolution: overdraw (not geometry) is the bottleneck when you move
+// into the cloud, so trade internal render scale for framerate and restore it
+// when FPS recovers. The CSS size stays full; only the GL backing shrinks.
+let renderScale = 1.0, autoRes = true;
+// vibrance (saturation) — cosmetic, in the demo layer only; ?sat= overrides so
+// headless/video renders match what you dialed in.
+let saturation = parseFloat(P.get("sat") ?? "1.45");
+{ const vib = $("vibrance"); vib.value = saturation;
+  vib.addEventListener("input", e => { saturation = +e.target.value; }); }
+function resize(){ const dpr=Math.min(devicePixelRatio,2)*renderScale;
+  canvas.width=Math.max(1,Math.round(innerWidth*dpr)); canvas.height=Math.max(1,Math.round(innerHeight*dpr));
+  gl.viewport(0,0,canvas.width,canvas.height); }
+addEventListener("resize", resize); resize();
+$("autores-cb").addEventListener("change", e => {
+  autoRes = e.target.checked;
+  if (!autoRes) { renderScale = 1.0; resize(); $("s-render").textContent = "100%"; }
+});
+
+let frames=0, fpsT=performance.now();
+function render(){
+  const speed = dist*0.02;
+  const fwd=[Math.sin(yaw)*Math.cos(pitch),Math.sin(pitch),Math.cos(yaw)*Math.cos(pitch)];
+  const right=norm(cross([0,1,0],fwd));
+  if(keys["w"]||keys["arrowup"]) { for(let i=0;i<3;i++) target[i]-=fwd[i]*speed; sortDirty=true; }
+  if(keys["s"]||keys["arrowdown"]){ for(let i=0;i<3;i++) target[i]+=fwd[i]*speed; sortDirty=true; }
+  if(keys["a"]||keys["arrowleft"]){ for(let i=0;i<3;i++) target[i]-=right[i]*speed; sortDirty=true; }
+  if(keys["d"]||keys["arrowright"]){for(let i=0;i<3;i++) target[i]+=right[i]*speed; sortDirty=true; }
+
+  const eye=[target[0]+dist*Math.sin(yaw)*Math.cos(pitch), target[1]+dist*Math.sin(pitch), target[2]+dist*Math.cos(yaw)*Math.cos(pitch)];
+  const view=lookAt(eye,target);
+  const aspect=canvas.width/canvas.height, fovy=1.0;
+  const proj=perspective(fovy, aspect, 0.05, 100);
+  const fy=canvas.height/(2*Math.tan(fovy/2)), fx=fy;
+
+  if(sortDirty){ requestSort(view); sortDirty=false; }
+
+  gl.clearColor(0.043,0.051,0.063,1); gl.clear(gl.COLOR_BUFFER_BIT);
+  gl.useProgram(prog);
+  gl.uniformMatrix4fv(U.view,false,view); gl.uniformMatrix4fv(U.proj,false,proj);
+  gl.uniform2f(U.focal,fx,fy); gl.uniform2f(U.viewport,canvas.width,canvas.height);
+  gl.uniform1f(U.sat, saturation);
+  gl.bindBuffer(gl.ARRAY_BUFFER,quadBuf); gl.enableVertexAttribArray(aCorner); gl.vertexAttribPointer(aCorner,2,gl.FLOAT,false,0,0);
+  if(sortedCount>0){
+    gl.bindBuffer(gl.ARRAY_BUFFER,idxBuf); gl.enableVertexAttribArray(aIndex);
+    gl.vertexAttribIPointer(aIndex,1,gl.UNSIGNED_INT,0,0); gl.vertexAttribDivisor(aIndex,1);
+    gl.drawArraysInstanced(gl.TRIANGLES,0,6,sortedCount);
+  }
+  frames++; const now=performance.now();
+  if(now-fpsT>500){
+    const fps=Math.round(frames*1000/(now-fpsT)); frames=0; fpsT=now;
+    $("s-fps").textContent=fps;
+    if(autoRes){                       // hold ~30fps by trading resolution
+      if(fps<26 && renderScale>0.35){ renderScale=Math.max(0.35,renderScale-0.12); resize(); }
+      else if(fps>52 && renderScale<1.0){ renderScale=Math.min(1.0,renderScale+0.12); resize(); }
+    }
+    $("s-render").textContent=Math.round(renderScale*100)+"%";
+  }
+  requestAnimationFrame(render);
+}
+requestAnimationFrame(render);
+
+// ---------- splat load (progressive) ----------
+async function loadSplat(url){
+  framed = false; followFull = true; count = 0; revealCount = 0; posSent = false;
+  const resp = await fetch(url);
+  if(!resp.ok) throw new Error("HTTP "+resp.status+" for splat");
+  const total = +resp.headers.get("content-length") || 0;
+  const reader = resp.body.getReader();
+  let chunks=[], received=0, lastIngest=0;
+  while(true){
+    const {done,value}=await reader.read();
+    if(done) break;
+    chunks.push(value); received+=value.byteLength;
+    if(received-lastIngest > 4096*ROW || !total){
+      const all=new Uint8Array(received); let o=0; for(const c of chunks){all.set(c,o); o+=c.byteLength;}
+      ingest(all); sortDirty=true; lastIngest=received;
+    }
+  }
+  const all=new Uint8Array(received); let o=0; for(const c of chunks){all.set(c,o); o+=c.byteLength;}
+  ingest(all); sortDirty=true;
+}
+
+// ---------- upload UI ----------
+const drop = $("drop");
+let selected = [];
+function showThumbs() {
+  const t = $("thumbs"); t.innerHTML = "";
+  for (const f of selected) { const im = document.createElement("img"); im.src = URL.createObjectURL(f); t.appendChild(im); }
+  $("go").disabled = selected.length === 0;
+}
+function pick(files) {
+  selected = [...files].filter(f => f.type.startsWith("image/")).slice(0, 8);
+  $("err").textContent = "";
+  showThumbs();
+}
+// model selector: populate from /api/models; update the hint per model. Hidden
+// when only one model is loaded.
+let MODELS = [], BG_AVAILABLE = false;
+(async () => {
+  const sel = $("model");
+  try {
+    const data = await (await fetch("/api/models")).json();
+    MODELS = data.models || []; BG_AVAILABLE = !!data.bg_remove;
+  } catch { MODELS = []; }
+  if (MODELS.length <= 1) sel.style.display = "none";
+  for (const m of MODELS) {
+    const o = document.createElement("option");
+    o.value = m.name; o.textContent = `${m.label} · ${m.views} views`;
+    sel.appendChild(o);
+  }
+  const applyHint = () => {
+    const m = MODELS.find(x => x.name === sel.value) || MODELS[0];
+    if (!m) return;
+    $("sub").textContent = m.hint;
+    $("zone-hint").textContent = `JPG / PNG · best with ${m.views} overlapping views`;
+    // background removal only helps the object path (isolate one object on white)
+    $("bgrow").style.display = (BG_AVAILABLE && sel.value === "object") ? "flex" : "none";
+  };
+  sel.addEventListener("change", applyHint);
+  applyHint();
+})();
+
+$("zone").addEventListener("click", () => $("file").click());
+$("file").addEventListener("change", e => pick(e.target.files));
+$("zone").addEventListener("dragover", e => { e.preventDefault(); $("zone").classList.add("hot"); });
+$("zone").addEventListener("dragleave", () => $("zone").classList.remove("hot"));
+$("zone").addEventListener("drop", e => { e.preventDefault(); $("zone").classList.remove("hot"); pick(e.dataTransfer.files); });
+addEventListener("dragover", e => e.preventDefault());
+addEventListener("drop", e => e.preventDefault());
+
+$("again").addEventListener("click", () => { drop.style.display = "flex"; $("panel").style.display = "none"; $("hint").style.display = "none"; });
+
+// deep-link: ?id=<splat> skips the uploader (shareable result + headless render).
+(async () => {
+  const id = P.get("id");
+  if (!id) return;
+  drop.style.display = "none"; $("panel").style.display = "block"; $("hint").style.display = "block";
+  try {
+    await loadSplat("/api/splat/" + id);
+    const rv = P.get("reveal");
+    if (rv != null) setReveal(Math.round(parseFloat(rv) * count)); else playBuildup();
+    setStatus(`ready · ${count.toLocaleString()} splats`);
+    $("s-splats").textContent = count.toLocaleString();
+  } catch (e) { setStatus(String(e.message || e), true); }
+})();
+
+$("go").addEventListener("click", async () => {
+  if (selected.length === 0) return;
+  $("go").disabled = true; $("err").textContent = "";
+  $("go").textContent = "Uploading…";
+  try {
+    const removeBg = $("bgremove").checked && $("bgrow").style.display !== "none";
+    const fd = new FormData();
+    for (const f of selected) fd.append("images", f);
+    fd.append("model", $("model").value || "");
+    if (removeBg) fd.append("remove_bg", "1");
+    const t0 = performance.now();
+    $("go").textContent = removeBg ? "Removing background + reconstructing…" : "Reconstructing… (Vulkan)";
+    const resp = await fetch("/api/reconstruct", { method: "POST", body: fd });
+    if (!resp.ok) throw new Error(await resp.text());
+    const meta = await resp.json();
+    // reveal the viewer
+    drop.style.display = "none"; $("panel").style.display = "block"; $("hint").style.display = "block";
+    $("s-views").textContent = meta.n_views;
+    $("s-time").textContent = meta.seconds.toFixed(2) + "s";
+    setStatus(`reconstructed ${meta.n_splats.toLocaleString()} gaussians`);
+    await loadSplat("/api/splat/" + meta.id);
+    setStatus(`ready · ${count.toLocaleString()} splats`);
+    playBuildup();
+  } catch (err) {
+    $("err").textContent = String(err.message || err);
+    console.error(err);
+  } finally {
+    $("go").disabled = selected.length === 0; $("go").textContent = "Reconstruct";
+  }
+});
+</script>
+</body>
+</html>
diff --git a/src/free_splatter.cpp b/src/free_splatter.cpp
index 3927622..14b8595 100644
--- a/src/free_splatter.cpp
+++ b/src/free_splatter.cpp
@@ -3,12 +3,16 @@
 #include "image.h"
 #include "model.h"
 #include "options.h"
+#include "pose.h"
 
+#include <cmath>
+#include <cstdio>
 #include <cstdlib>
 #include <cstring>
 #include <filesystem>
 #include <fstream>
 #include <string>
+#include <vector>
 
 // ---- opaque types ----------------------------------------------------------
 
@@ -90,7 +94,10 @@ free_splatter_ctx * free_splatter_load(const char * gguf_path, const free_splatt
     if (!ctx) return nullptr;
     if (opts) ctx->opts = opts->o;
 
-    if (!ctx->m.load(gguf_path, ctx->opts.device.empty() ? "cpu" : ctx->opts.device,
+    // Default to GPU and fail-closed: an unset/empty device resolves to vulkan, so
+    // a caller who never set one can't silently land on the (~50x slower) CPU path.
+    // CPU is opt-in only (explicit "cpu"). backend.cpp errors if no GPU is present.
+    if (!ctx->m.load(gguf_path, ctx->opts.device.empty() ? "vulkan" : ctx->opts.device,
                      ctx->opts.n_threads)) {
         ctx->error = ctx->m.error;
     }
@@ -156,3 +163,220 @@ int free_splatter_run(free_splatter_ctx * ctx, const float * images, int32_t n_v
 }
 
 void free_splatter_buf_free(void * buf) { free(buf); }
+
+// ---- downstream pose recovery + accumulation -------------------------------
+
+namespace {
+
+// De-interleave per-view contiguous points (3*P) + opacity (P) from a
+// [n_views,H,W,gc] engine buffer. Returns per-view base pointers into `store`.
+void deinterleave(const float * g, int nv, int P, int gc,
+                  std::vector<std::vector<float>> & pts,
+                  std::vector<std::vector<float>> & ops,
+                  std::vector<const float *> & pptr,
+                  std::vector<const float *> & optr) {
+    pts.assign(nv, {}); ops.assign(nv, {});
+    pptr.resize(nv); optr.resize(nv);
+    for (int v = 0; v < nv; v++) {
+        pts[v].resize((size_t) 3 * P); ops[v].resize(P);
+        for (int i = 0; i < P; i++) {
+            const float * a = &g[(size_t) (v * P + i) * gc];
+            pts[v][3*i] = a[0]; pts[v][3*i+1] = a[1]; pts[v][3*i+2] = a[2];
+            ops[v][i] = a[15];
+        }
+        pptr[v] = pts[v].data(); optr[v] = ops[v].data();
+    }
+}
+
+// Copy a pose vector's cloud into a malloc'd C array (ABI-stable layout).
+int emit_points(const std::vector<free_splatter::pose::AccumPoint> & src,
+                free_splatter_point ** out, size_t * n_out) {
+    *out = (free_splatter_point *) malloc(src.size() * sizeof(free_splatter_point));
+    if (!*out && !src.empty()) return -1;
+    for (size_t i = 0; i < src.size(); i++) {
+        (*out)[i].x = src[i].x; (*out)[i].y = src[i].y; (*out)[i].z = src[i].z;
+        (*out)[i].r = src[i].r; (*out)[i].g = src[i].g; (*out)[i].b = src[i].b; (*out)[i].opacity = src[i].opacity;
+        (*out)[i].sx = src[i].sx; (*out)[i].sy = src[i].sy; (*out)[i].sz = src[i].sz;
+        (*out)[i].qw = src[i].qw; (*out)[i].qx = src[i].qx; (*out)[i].qy = src[i].qy; (*out)[i].qz = src[i].qz;
+        (*out)[i].frame = src[i].frame;
+    }
+    *n_out = src.size();
+    return 0;
+}
+
+} // namespace
+
+struct free_splatter_accumulator {
+    free_splatter::pose::Accumulator acc;
+    int gc = 0;
+    free_splatter_accumulator(int H, int W, double thr) : acc(H, W, thr) {}
+};
+
+int free_splatter_estimate_poses(const float * gaussians, int32_t n_views, int32_t height,
+                                 int32_t width, int32_t gaussian_channels,
+                                 float opacity_threshold, float * cam2world_out, float * focal_out) {
+    if (!gaussians || !cam2world_out || n_views < 1 || height < 1 || width < 1 || gaussian_channels < 16)
+        return -1;
+    const int P = height * width;
+    std::vector<std::vector<float>> pts, ops;
+    std::vector<const float *> pptr, optr;
+    deinterleave(gaussians, n_views, P, gaussian_channels, pts, ops, pptr, optr);
+    free_splatter::pose::PoseResult pr =
+        free_splatter::pose::estimate_poses(pptr, optr, height, width, opacity_threshold);
+    for (int v = 0; v < n_views; v++)
+        for (int i = 0; i < 4; i++) for (int j = 0; j < 4; j++)
+            cam2world_out[v*16 + i*4 + j] = (float) pr.cam2world[v](i, j);
+    if (focal_out) *focal_out = (float) pr.focal;
+    return 0;
+}
+
+int free_splatter_pair_parallax(const float * gaussians, int32_t n_views, int32_t height,
+                                int32_t width, int32_t gaussian_channels,
+                                float opacity_threshold, free_splatter_parallax * out) {
+    if (!gaussians || !out || n_views < 2 || height < 1 || width < 1 || gaussian_channels < 16)
+        return -1;
+    const int P = height * width;
+    std::vector<std::vector<float>> pts, ops;
+    std::vector<const float *> pptr, optr;
+    deinterleave(gaussians, n_views, P, gaussian_channels, pts, ops, pptr, optr);
+    free_splatter::pose::Parallax px =
+        free_splatter::pose::pair_parallax(pptr, optr, height, width, opacity_threshold);
+    out->tri_angle_deg       = (float) px.tri_angle_deg;
+    out->lateral_angle_deg   = (float) px.lateral_angle_deg;
+    out->baseline_over_depth = (float) px.baseline_over_depth;
+    out->baseline            = (float) px.baseline;
+    out->median_depth        = (float) px.median_depth;
+    out->focal               = (float) px.focal;
+    out->n_points            = px.n_points;
+    return 0;
+}
+
+free_splatter_accumulator * free_splatter_accumulator_new(int32_t height, int32_t width,
+                                                          float opacity_threshold) {
+    if (height < 1 || width < 1) return nullptr;
+    return new (std::nothrow) free_splatter_accumulator(height, width, opacity_threshold);
+}
+
+void free_splatter_accumulator_free(free_splatter_accumulator * acc) { delete acc; }
+
+int free_splatter_accumulator_add_pair(free_splatter_accumulator * acc, const float * gaussians,
+                                       int32_t gaussian_channels) {
+    if (!acc || !gaussians || gaussian_channels < 16) return -1;
+    acc->gc = gaussian_channels;
+    const free_splatter::pose::ChainLink link = acc->acc.add_pair(gaussians, gaussian_channels);
+    if (std::getenv("FREE_SPLATTER_DEBUG_LINKS")) {
+        // Surface the per-link registration diagnostics + the COMPOUNDED global drift
+        // (open-loop chain T_k = T_{k-1}.sim, so per-link error accumulates).
+        const auto & g = link.global;
+        const double tn = std::sqrt(g.t[0]*g.t[0] + g.t[1]*g.t[1] + g.t[2]*g.t[2]);
+        double c = (g.R(0,0) + g.R(1,1) + g.R(2,2) - 1.0) * 0.5;
+        c = c > 1.0 ? 1.0 : (c < -1.0 ? -1.0 : c);
+        std::fprintf(stderr,
+            "[link] pair=%d  local: s=%.4f inlier=%.3f valid=%.3f resid=%.4f | "
+            "global: s=%.4f rot=%.2fdeg |t|=%.4f\n",
+            acc->acc.pair_count(), link.sim.s, link.inlier_frac, link.valid_frac,
+            link.resid_frac, g.s, std::acos(c) * 57.29577951308232, tn);
+    }
+    return 0;
+}
+
+int free_splatter_accumulator_frame_count(const free_splatter_accumulator * acc) {
+    return acc ? acc->acc.frame_count() : 0;
+}
+
+int free_splatter_accumulator_cloud(const free_splatter_accumulator * acc,
+                                    free_splatter_point ** out, size_t * n_out) {
+    if (out) *out = nullptr;
+    if (n_out) *n_out = 0;
+    if (!acc || !out || !n_out) return -1;
+    return emit_points(acc->acc.cloud(), out, n_out);
+}
+
+int free_splatter_accumulator_fuse(const free_splatter_accumulator * acc, float voxel_frac,
+                                   int32_t k, int32_t mode, free_splatter_point ** out, size_t * n_out) {
+    if (out) *out = nullptr;
+    if (n_out) *n_out = 0;
+    if (!acc || !out || !n_out || k < 1 || !(voxel_frac > 0)) return -1;
+    std::vector<free_splatter::pose::AccumPoint> fused;
+    free_splatter::pose::consensus_fuse(acc->acc.cloud(), voxel_frac, k, fused, mode);
+    return emit_points(fused, out, n_out);
+}
+
+int free_splatter_refine_cloud(free_splatter_point * points, size_t n,
+                               float voxel_frac, int32_t iters, float alpha) {
+    if (!points || n == 0 || !(voxel_frac > 0) || iters < 1) return -1;
+    // free_splatter_point and pose::AccumPoint are the same trivially-copyable layout
+    // (xyz, rgba/opacity, scale, quat, frame); copy into the typed vector and back.
+    std::vector<free_splatter::pose::AccumPoint> cloud(n);
+    std::memcpy(cloud.data(), points, n * sizeof(free_splatter_point));
+    free_splatter::pose::consensus_refine(cloud, voxel_frac, iters, alpha);
+    std::memcpy(points, cloud.data(), n * sizeof(free_splatter_point));
+    return 0;
+}
+
+int free_splatter_accumulator_refine(free_splatter_accumulator * acc, float voxel_frac,
+                                     int32_t iters, float alpha) {
+    if (!acc || !(voxel_frac > 0) || iters < 1) return -1;
+    acc->acc.refine(voxel_frac, iters, alpha);
+    return 0;
+}
+
+int free_splatter_fuse_cloud(const free_splatter_point * pts, size_t n, float voxel_frac,
+                             int32_t k, int32_t mode, free_splatter_point ** out, size_t * n_out) {
+    if (out) *out = nullptr;
+    if (n_out) *n_out = 0;
+    if (!pts || !out || !n_out || k < 1 || !(voxel_frac > 0)) return -1;
+    std::vector<free_splatter::pose::AccumPoint> cloud(n);
+    for (size_t i = 0; i < n; i++) {
+        const free_splatter_point & p = pts[i]; free_splatter::pose::AccumPoint & a = cloud[i];
+        a.x = p.x; a.y = p.y; a.z = p.z; a.r = p.r; a.g = p.g; a.b = p.b; a.opacity = p.opacity;
+        a.sx = p.sx; a.sy = p.sy; a.sz = p.sz; a.qw = p.qw; a.qx = p.qx; a.qy = p.qy; a.qz = p.qz;
+        a.frame = p.frame;
+    }
+    std::vector<free_splatter::pose::AccumPoint> fused;
+    free_splatter::pose::consensus_fuse(cloud, voxel_frac, k, fused, mode);
+    return emit_points(fused, out, n_out);
+}
+
+int free_splatter_tree_overlap(const float * const * pairs, int32_t n_pairs, int32_t gaussian_channels,
+                               int32_t height, int32_t width, float opacity_threshold,
+                               int32_t block, int32_t overlap,
+                               int32_t max_levels, float layout_spacing, int32_t per_node_cap,
+                               free_splatter_point ** out, size_t * n_out, int32_t * n_nodes) {
+    if (out) *out = nullptr;
+    if (n_out) *n_out = 0;
+    if (n_nodes) *n_nodes = 0;
+    if (!pairs || n_pairs < 1 || gaussian_channels < 16 || height < 1 || width < 1 || !out || !n_out)
+        return -1;
+    for (int32_t i = 0; i < n_pairs; i++) if (!pairs[i]) return -1;
+    std::vector<const float *> ps(pairs, pairs + n_pairs);
+    std::vector<free_splatter::pose::TreeMerge> merges;
+    int nn = 0;
+    std::vector<free_splatter::pose::AccumPoint> cloud =
+        free_splatter::pose::tree_accumulate_overlap(ps, height, width, gaussian_channels, opacity_threshold,
+                                                     block, overlap, 0.02, 300, 0, &merges,
+                                                     max_levels, layout_spacing, &nn, per_node_cap);
+    if (n_nodes) *n_nodes = nn;
+    if (std::getenv("FREE_SPLATTER_DEBUG_LINKS"))
+        for (const auto & m : merges)
+            std::fprintf(stderr,
+                "[tree] L%d range[%d..%d] sharedframes=%d  s=%.4f inlier=%.3f valid=%.3f resid=%.4f\n",
+                m.level, m.lo_frame, m.hi_frame, m.shared_frame,
+                m.scale, m.inlier_frac, m.valid_frac, m.resid_frac);
+    return emit_points(cloud, out, n_out);
+}
+
+int free_splatter_accumulator_camera_path(const free_splatter_accumulator * acc,
+                                          float ** out, int32_t * n_frames) {
+    if (out) *out = nullptr;
+    if (n_frames) *n_frames = 0;
+    if (!acc || !out || !n_frames) return -1;
+    std::vector<fsla::Mat4> path = acc->acc.camera_path();
+    *out = (float *) malloc(path.size() * 16 * sizeof(float));
+    if (!*out && !path.empty()) return -1;
+    for (size_t f = 0; f < path.size(); f++)
+        for (int i = 0; i < 4; i++) for (int j = 0; j < 4; j++)
+            (*out)[f*16 + i*4 + j] = (float) path[f](i, j);
+    *n_frames = (int32_t) path.size();
+    return 0;
+}
diff --git a/src/linalg.h b/src/linalg.h
new file mode 100644
index 0000000..a510471
--- /dev/null
+++ b/src/linalg.h
@@ -0,0 +1,213 @@
+// linalg.h — self-contained small dense linear algebra for the pose component.
+//
+// The downstream pose math (Umeyama similarity, DLT PnP, pose decode) needs only
+// a handful of small dense operations. Rather than pull in Eigen/OpenCV — neither
+// of which we want as a runtime dependency (see CLAUDE.md: everything ships in
+// C++ with no extra runtime deps beyond ggml) — every routine here reduces to one
+// primitive: a symmetric cyclic-Jacobi eigensolver. SVD of a 3x3 is built as the
+// eigendecomposition of MᵀM; the DLT nullspace is the smallest eigenvector of
+// AᵀA. This mirrored the dependency-free numpy reference solver of the (now
+// removed; see git history) pose/ prototype, itself verified bit-for-bit (~1e-7)
+// against cv2.
+//
+// Everything is f64, row-major, and header-only. Sizes are tiny (<=12) so the
+// O(n^3) Jacobi sweeps are negligible next to the engine forward pass.
+#ifndef FREE_SPLATTER_LINALG_H
+#define FREE_SPLATTER_LINALG_H
+
+#include <algorithm>
+#include <array>
+#include <cmath>
+#include <cstdint>
+#include <vector>
+
+namespace fsla {
+
+using Vec3 = std::array<double, 3>;
+
+// Row-major fixed matrices. m[r][c] lives at a[r*N + c].
+struct Mat3 { double a[9]; double  operator()(int r, int c) const { return a[r * 3 + c]; }
+                          double & operator()(int r, int c)       { return a[r * 3 + c]; } };
+struct Mat4 { double a[16]; double  operator()(int r, int c) const { return a[r * 4 + c]; }
+                           double & operator()(int r, int c)       { return a[r * 4 + c]; } };
+
+inline Mat3 mat3_identity() { return Mat3{ {1,0,0, 0,1,0, 0,0,1} }; }
+inline Mat4 mat4_identity() { return Mat4{ {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1} }; }
+
+inline Mat3 mat3_mul(const Mat3 & A, const Mat3 & B) {
+    Mat3 C{};
+    for (int i = 0; i < 3; i++)
+        for (int j = 0; j < 3; j++) {
+            double s = 0;
+            for (int k = 0; k < 3; k++) s += A(i, k) * B(k, j);
+            C(i, j) = s;
+        }
+    return C;
+}
+
+inline Mat3 mat3_transpose(const Mat3 & A) {
+    Mat3 T{};
+    for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) T(i, j) = A(j, i);
+    return T;
+}
+
+inline Vec3 mat3_apply(const Mat3 & A, const Vec3 & x) {
+    return { A(0,0)*x[0] + A(0,1)*x[1] + A(0,2)*x[2],
+             A(1,0)*x[0] + A(1,1)*x[1] + A(1,2)*x[2],
+             A(2,0)*x[0] + A(2,1)*x[1] + A(2,2)*x[2] };
+}
+
+inline double det3(const Mat3 & M) {
+    return M(0,0) * (M(1,1)*M(2,2) - M(1,2)*M(2,1))
+         - M(0,1) * (M(1,0)*M(2,2) - M(1,2)*M(2,0))
+         + M(0,2) * (M(1,0)*M(2,1) - M(1,1)*M(2,0));
+}
+
+inline Mat3 inv3(const Mat3 & M) {
+    const double d = det3(M);
+    const double id = (std::fabs(d) > 0) ? 1.0 / d : 0.0;
+    Mat3 R{};
+    R(0,0) =  (M(1,1)*M(2,2) - M(1,2)*M(2,1)) * id;
+    R(0,1) = -(M(0,1)*M(2,2) - M(0,2)*M(2,1)) * id;
+    R(0,2) =  (M(0,1)*M(1,2) - M(0,2)*M(1,1)) * id;
+    R(1,0) = -(M(1,0)*M(2,2) - M(1,2)*M(2,0)) * id;
+    R(1,1) =  (M(0,0)*M(2,2) - M(0,2)*M(2,0)) * id;
+    R(1,2) = -(M(0,0)*M(1,2) - M(0,2)*M(1,0)) * id;
+    R(2,0) =  (M(1,0)*M(2,1) - M(1,1)*M(2,0)) * id;
+    R(2,1) = -(M(0,0)*M(2,1) - M(0,1)*M(2,0)) * id;
+    R(2,2) =  (M(0,0)*M(1,1) - M(0,1)*M(1,0)) * id;
+    return R;
+}
+
+// Inverse of a rigid 4x4 [[R,t],[0,1]] (R orthonormal): [[Rᵀ, -Rᵀt],[0,1]]. Exact
+// and cheap; used for cam2world = inv(world2cam) where world2cam is rigid.
+inline Mat4 inv_rigid4(const Mat4 & M) {
+    Mat4 R = mat4_identity();
+    for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) R(i, j) = M(j, i);   // Rᵀ
+    for (int i = 0; i < 3; i++)
+        R(i, 3) = -(R(i,0)*M(0,3) + R(i,1)*M(1,3) + R(i,2)*M(2,3));
+    return R;
+}
+
+inline Mat4 mat4_mul(const Mat4 & A, const Mat4 & B) {
+    Mat4 C{};
+    for (int i = 0; i < 4; i++)
+        for (int j = 0; j < 4; j++) {
+            double s = 0;
+            for (int k = 0; k < 4; k++) s += A(i, k) * B(k, j);
+            C(i, j) = s;
+        }
+    return C;
+}
+
+// --- symmetric eigensolver -------------------------------------------------
+//
+// Cyclic Jacobi for a symmetric n x n matrix (Golub & Van Loan, Alg. 8.4.x).
+// Ain is row-major and assumed symmetric. Writes n eigenvalues to `eval` and the
+// corresponding eigenvectors as the COLUMNS of `evec` (evec[i*n+j] = component i
+// of eigenvector j). Output is NOT sorted. Converges in a handful of sweeps for
+// n <= 12; we cap at 100 as a safety net.
+inline void jacobi_eigh(const double * Ain, int n, double * eval, double * evec) {
+    std::vector<double> A(Ain, Ain + (size_t) n * n);
+    for (int i = 0; i < n * n; i++) evec[i] = 0.0;
+    for (int i = 0; i < n; i++)     evec[i * n + i] = 1.0;
+
+    for (int sweep = 0; sweep < 100; sweep++) {
+        double off = 0.0;
+        for (int p = 0; p < n; p++)
+            for (int q = p + 1; q < n; q++) off += A[p*n+q] * A[p*n+q];
+        if (off <= 1e-300) break;
+
+        for (int p = 0; p < n; p++) {
+            for (int q = p + 1; q < n; q++) {
+                const double apq = A[p*n+q];
+                if (std::fabs(apq) <= 1e-300) continue;
+                const double phi = 0.5 * (A[q*n+q] - A[p*n+p]) / apq;   // = tau
+                const double t = (phi >= 0 ? 1.0 : -1.0) /
+                                 (std::fabs(phi) + std::sqrt(phi*phi + 1.0));
+                const double c = 1.0 / std::sqrt(t*t + 1.0);
+                const double s = t * c;
+                // A <- Jᵀ A J : rotate columns p,q then rows p,q (keeps symmetry).
+                for (int i = 0; i < n; i++) {
+                    const double aip = A[i*n+p], aiq = A[i*n+q];
+                    A[i*n+p] = c*aip - s*aiq;
+                    A[i*n+q] = s*aip + c*aiq;
+                }
+                for (int i = 0; i < n; i++) {
+                    const double api = A[p*n+i], aqi = A[q*n+i];
+                    A[p*n+i] = c*api - s*aqi;
+                    A[q*n+i] = s*api + c*aqi;
+                }
+                for (int i = 0; i < n; i++) {     // accumulate eigenvectors V <- V J
+                    const double vip = evec[i*n+p], viq = evec[i*n+q];
+                    evec[i*n+p] = c*vip - s*viq;
+                    evec[i*n+q] = s*vip + c*viq;
+                }
+            }
+        }
+    }
+    for (int i = 0; i < n; i++) eval[i] = A[i*n+i];
+}
+
+// Smallest-eigenvalue eigenvector of a symmetric n x n matrix (the homogeneous
+// least-squares / DLT nullspace). Returns the n-vector.
+inline std::vector<double> smallest_eigenvector(const double * A, int n) {
+    std::vector<double> eval(n), evec((size_t) n * n);
+    jacobi_eigh(A, n, eval.data(), evec.data());
+    int k = 0;
+    for (int i = 1; i < n; i++) if (eval[i] < eval[k]) k = i;
+    std::vector<double> v(n);
+    for (int i = 0; i < n; i++) v[i] = evec[i*n + k];
+    return v;
+}
+
+// --- 3x3 SVD ---------------------------------------------------------------
+//
+// M = U diag(s) Vᵀ with s sorted DESCENDING (numpy/LAPACK convention). Built from
+// the eigendecomposition of the symmetric S = MᵀM: its eigenvectors are V, the
+// singular values are sqrt(eigenvalues), and U_i = M v_i / s_i. A (near-)zero
+// singular direction is completed by Gram-Schmidt so U stays orthonormal. This is
+// all the SVD the similarity fit and the PnP decode need.
+inline void svd3(const Mat3 & M, Mat3 & U, Vec3 & s, Mat3 & V) {
+    Mat3 S = mat3_mul(mat3_transpose(M), M);            // 3x3 symmetric PSD
+    double eval[3], evec[9];
+    jacobi_eigh(S.a, 3, eval, evec);
+
+    int ord[3] = { 0, 1, 2 };                           // sort eigenpairs descending
+    std::sort(ord, ord + 3, [&](int x, int y) { return eval[x] > eval[y]; });
+
+    for (int j = 0; j < 3; j++) {
+        const int e = ord[j];
+        s[j] = std::sqrt(std::max(eval[e], 0.0));
+        for (int i = 0; i < 3; i++) V(i, j) = evec[i*3 + e];
+    }
+    // Force a right-handed V (proper basis); flip the last column if reflected.
+    if (det3(V) < 0) for (int i = 0; i < 3; i++) V(i, 2) = -V(i, 2);
+
+    for (int j = 0; j < 3; j++) {
+        Vec3 vj = { V(0,j), V(1,j), V(2,j) };
+        Vec3 uj = mat3_apply(M, vj);
+        const double nrm = std::sqrt(uj[0]*uj[0] + uj[1]*uj[1] + uj[2]*uj[2]);
+        if (nrm > 1e-12) for (int i = 0; i < 3; i++) U(i, j) = uj[i] / nrm;
+        else             for (int i = 0; i < 3; i++) U(i, j) = 0.0;   // fixed up below
+    }
+    // Complete any degenerate U columns via Gram-Schmidt against the filled ones.
+    for (int j = 0; j < 3; j++) {
+        double nrm = std::sqrt(U(0,j)*U(0,j) + U(1,j)*U(1,j) + U(2,j)*U(2,j));
+        if (nrm > 1e-9) continue;
+        Vec3 cand[3] = { {1,0,0}, {0,1,0}, {0,0,1} };
+        for (auto & e : cand) {
+            for (int k = 0; k < 3; k++) {
+                if (k == j) continue;
+                const double d = e[0]*U(0,k) + e[1]*U(1,k) + e[2]*U(2,k);
+                e[0] -= d*U(0,k); e[1] -= d*U(1,k); e[2] -= d*U(2,k);
+            }
+            const double en = std::sqrt(e[0]*e[0] + e[1]*e[1] + e[2]*e[2]);
+            if (en > 1e-6) { U(0,j)=e[0]/en; U(1,j)=e[1]/en; U(2,j)=e[2]/en; break; }
+        }
+    }
+}
+
+} // namespace fsla
+
+#endif // FREE_SPLATTER_LINALG_H
diff --git a/src/options.h b/src/options.h
index e531e29..4bf8d1f 100644
--- a/src/options.h
+++ b/src/options.h
@@ -7,7 +7,7 @@
 namespace free_splatter {
 
 struct options {
-    std::string device       = "cpu";  // "" | cpu | gpu | cuda | vulkan [:N]
+    std::string device       = "vulkan";  // "" | cpu | gpu | cuda | vulkan [:N]; default GPU, fail-closed
     int         n_threads    = 0;       // <= 0 => auto (CPU)
     std::string dump_taps_dir;          // empty => tap dumping disabled
 };
diff --git a/src/pose.cpp b/src/pose.cpp
new file mode 100644
index 0000000..7ba152e
--- /dev/null
+++ b/src/pose.cpp
@@ -0,0 +1,1512 @@
+// pose.cpp — C++ port of the (now removed) pose/ research prototype: focal +
+// Umeyama/Sim(3) align + robust PnP + sliding-window accumulation / loop closure
+// / consensus fusion. See git history for the Python prototype it was validated
+// against, layer by layer.
+//
+// Faithful to FreeSplatter's scene recipe; the dependency-free solver is what
+// ships here (the prototype's numpy reference was verified ~1e-7 vs cv2 and
+// bit-exact vs upstream estimate_poses on real engine output). All linear algebra
+// goes through the self-contained Jacobi eigensolver in linalg.h — no Eigen, no
+// OpenCV.
+#include "pose.h"
+
+#include <algorithm>
+#include <cmath>
+#include <cstring>
+#include <unordered_map>
+
+namespace free_splatter {
+namespace pose {
+
+using fsla::smallest_eigenvector;
+using fsla::svd3;
+using fsla::det3;
+using fsla::inv3;
+using fsla::mat3_mul;
+using fsla::mat3_transpose;
+using fsla::mat3_apply;
+using fsla::mat3_identity;
+using fsla::mat4_identity;
+using fsla::inv_rigid4;
+
+namespace {
+
+// Deterministic SplitMix64 — so RANSAC is reproducible across platforms (we don't
+// need numpy's PCG64 bitstream, only a fixed inlier-converging sampler).
+struct Rng {
+    uint64_t s;
+    explicit Rng(uint64_t seed) : s(seed) {}
+    uint64_t next() {
+        uint64_t z = (s += 0x9E3779B97F4A7C15ULL);
+        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
+        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
+        return z ^ (z >> 31);
+    }
+    int bounded(int n) { return (int) (next() % (uint64_t) n); }
+};
+
+void sample_distinct(Rng & rng, int N, int k, int * out) {
+    for (int i = 0; i < k; i++) {
+        int v;
+        bool dup;
+        do {
+            v = rng.bounded(N);
+            dup = false;
+            for (int j = 0; j < i; j++) if (out[j] == v) { dup = true; break; }
+        } while (dup);
+        out[i] = v;
+    }
+}
+
+double rms3(const double * r, int N) {            // sqrt(mean over rows of ||row||^2)
+    double s = 0;
+    for (int i = 0; i < N; i++) s += r[3*i]*r[3*i] + r[3*i+1]*r[3*i+1] + r[3*i+2]*r[3*i+2];
+    return std::sqrt(s / (double) N);
+}
+
+void mean3(const double * X, int N, double * m) {
+    m[0] = m[1] = m[2] = 0;
+    for (int i = 0; i < N; i++) { m[0] += X[3*i]; m[1] += X[3*i+1]; m[2] += X[3*i+2]; }
+    m[0] /= N; m[1] /= N; m[2] /= N;
+}
+
+// Solve A X = B in place (A n x n row-major, B n x m row-major) by Gaussian
+// elimination with partial pivoting. Result overwrites B. Used for the affine
+// normal equations in the residual ladder.
+void solve_lin(double * A, int n, double * B, int m) {
+    for (int col = 0; col < n; col++) {
+        int piv = col;
+        for (int r = col + 1; r < n; r++)
+            if (std::fabs(A[r*n+col]) > std::fabs(A[piv*n+col])) piv = r;
+        if (piv != col) {
+            for (int c = 0; c < n; c++) std::swap(A[col*n+c], A[piv*n+c]);
+            for (int c = 0; c < m; c++) std::swap(B[col*m+c], B[piv*m+c]);
+        }
+        const double d = A[col*n+col];
+        if (std::fabs(d) < 1e-300) continue;
+        for (int r = 0; r < n; r++) {
+            if (r == col) continue;
+            const double f = A[r*n+col] / d;
+            if (f == 0) continue;
+            for (int c = col; c < n; c++) A[r*n+c] -= f * A[col*n+c];
+            for (int c = 0; c < m; c++)   B[r*m+c] -= f * B[col*m+c];
+        }
+    }
+    for (int col = 0; col < n; col++) {
+        const double d = A[col*n+col];
+        if (std::fabs(d) > 1e-300) for (int c = 0; c < m; c++) B[col*m+c] /= d;
+    }
+}
+
+Mat3 rodrigues(const Vec3 & axis, double phi) {     // rotation about unit axis by phi
+    const double cphi = std::cos(phi), sphi = std::sin(phi), v = 1.0 - cphi;
+    const double x = axis[0], y = axis[1], z = axis[2];
+    Mat3 R{};
+    R(0,0) = cphi + x*x*v;   R(0,1) = x*y*v - z*sphi; R(0,2) = x*z*v + y*sphi;
+    R(1,0) = y*x*v + z*sphi; R(1,1) = cphi + y*y*v;   R(1,2) = y*z*v - x*sphi;
+    R(2,0) = z*x*v - y*sphi; R(2,1) = z*y*v + x*sphi; R(2,2) = cphi + z*z*v;
+    return R;
+}
+
+} // namespace
+
+// ---- focal ----------------------------------------------------------------
+
+double estimate_focal(const double * pts3d, const double * pixels, int N,
+                      double ppx, double ppy, int weiszfeld_iters) {
+    std::vector<double> P(2 * N), U(2 * N), pu(N), uu(N);
+    for (int i = 0; i < N; i++) {
+        const double X = pts3d[3*i], Y = pts3d[3*i+1], Z = pts3d[3*i+2];
+        double ux = (Z != 0.0) ? X / Z : 0.0;
+        double uy = (Z != 0.0) ? Y / Z : 0.0;
+        if (!std::isfinite(ux)) ux = 0.0;       // mirror np.nan_to_num(posinf/neginf->0)
+        if (!std::isfinite(uy)) uy = 0.0;
+        const double Px = pixels[2*i]   - ppx;
+        const double Py = pixels[2*i+1] - ppy;
+        P[2*i] = Px; P[2*i+1] = Py;
+        U[2*i] = ux; U[2*i+1] = uy;
+        pu[i] = Px*ux + Py*uy;
+        uu[i] = ux*ux + uy*uy;
+    }
+    double spu = 0, suu = 0;
+    for (int i = 0; i < N; i++) { spu += pu[i]; suu += uu[i]; }
+    double f = (suu != 0.0) ? spu / suu : 0.0;
+    for (int it = 0; it < weiszfeld_iters; it++) {
+        double num = 0, den = 0;
+        for (int i = 0; i < N; i++) {
+            const double dx = P[2*i] - f*U[2*i], dy = P[2*i+1] - f*U[2*i+1];
+            const double r = std::sqrt(dx*dx + dy*dy);
+            const double w = 1.0 / std::max(r, 1e-8);
+            num += w * pu[i];
+            den += w * uu[i];
+        }
+        f = (den != 0.0) ? num / den : f;
+    }
+    return f;
+}
+
+// ---- similarity -----------------------------------------------------------
+
+Sim3 sim_identity() { return { 1.0, mat3_identity(), {0,0,0} }; }
+
+Vec3 sim_apply(const Sim3 & T, const Vec3 & x) {
+    Vec3 Rx = mat3_apply(T.R, x);
+    return { T.s*Rx[0] + T.t[0], T.s*Rx[1] + T.t[1], T.s*Rx[2] + T.t[2] };
+}
+
+Sim3 sim_compose(const Sim3 & T2, const Sim3 & T1) {   // apply T1 then T2
+    Sim3 r;
+    r.s = T2.s * T1.s;
+    r.R = mat3_mul(T2.R, T1.R);
+    Vec3 R2t1 = mat3_apply(T2.R, T1.t);
+    r.t = { T2.s*R2t1[0] + T2.t[0], T2.s*R2t1[1] + T2.t[1], T2.s*R2t1[2] + T2.t[2] };
+    return r;
+}
+
+Sim3 sim_invert(const Sim3 & T) {
+    Sim3 r;
+    r.s = 1.0 / T.s;
+    r.R = mat3_transpose(T.R);
+    Vec3 Rit = mat3_apply(r.R, T.t);
+    r.t = { -r.s*Rit[0], -r.s*Rit[1], -r.s*Rit[2] };
+    return r;
+}
+
+Mat4 sim_matrix(const Sim3 & T) {
+    Mat4 M = mat4_identity();
+    for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) M(i, j) = T.s * T.R(i, j);
+    for (int i = 0; i < 3; i++) M(i, 3) = T.t[i];
+    return M;
+}
+
+Sim3 fit_similarity(const double * X, const double * Y, int N, bool with_scale) {
+    double mx[3], my[3];
+    mean3(X, N, mx);
+    mean3(Y, N, my);
+    // Sigma = (Ycᵀ Xc)/n, cross-covariance mapping X -> Y; var_x = mean ||Xc||^2.
+    Mat3 Sigma{};
+    double var_x = 0;
+    for (int i = 0; i < N; i++) {
+        const double xc[3] = { X[3*i]-mx[0], X[3*i+1]-mx[1], X[3*i+2]-mx[2] };
+        const double yc[3] = { Y[3*i]-my[0], Y[3*i+1]-my[1], Y[3*i+2]-my[2] };
+        for (int r = 0; r < 3; r++) for (int c = 0; c < 3; c++) Sigma(r, c) += yc[r]*xc[c];
+        var_x += xc[0]*xc[0] + xc[1]*xc[1] + xc[2]*xc[2];
+    }
+    for (int i = 0; i < 9; i++) Sigma.a[i] /= N;
+    var_x /= N;
+
+    Mat3 U, V; Vec3 D;
+    svd3(Sigma, U, D, V);                          // Sigma = U diag(D) Vᵀ
+    double sgn = (det3(U) * det3(V) < 0) ? -1.0 : 1.0;
+    // R = U diag(1,1,sgn) Vᵀ
+    Mat3 US = U;
+    for (int r = 0; r < 3; r++) US(r, 2) *= sgn;
+    Mat3 R = mat3_mul(US, mat3_transpose(V));
+
+    double s = 1.0;
+    if (with_scale && var_x > 0)
+        s = (D[0] + D[1] + sgn * D[2]) / var_x;
+
+    Vec3 mxv = { mx[0], mx[1], mx[2] };
+    Vec3 Rmx = mat3_apply(R, mxv);
+    Sim3 T;
+    T.s = s; T.R = R;
+    T.t = { my[0] - s*Rmx[0], my[1] - s*Rmx[1], my[2] - s*Rmx[2] };
+    return T;
+}
+
+Sim3 fit_similarity_ransac(const double * X, const double * Y, int N, double thresh,
+                           int iters, std::vector<char> & inliers,
+                           bool with_scale, uint64_t seed) {
+    // RANSAC's minimal sample is 3 pairs; with fewer there is nothing to sample
+    // (and the sampler would divide by N). Degenerate but valid at the public
+    // boundary (e.g. an image pair with no overlapping valid pixels): take all
+    // points as inliers, and a plain fit when there is at least one.
+    if (N < 3) {
+        inliers.assign(N, 1);
+        return (N >= 1) ? fit_similarity(X, Y, N, with_scale) : sim_identity();
+    }
+    Rng rng(seed);
+    std::vector<char> best(N, 0);
+    int best_cnt = 0;
+    std::vector<double> xs(9), ys(9);              // 3-point minimal samples
+    for (int it = 0; it < iters; it++) {
+        int idx[3];
+        sample_distinct(rng, N, 3, idx);
+        for (int j = 0; j < 3; j++)
+            for (int c = 0; c < 3; c++) { xs[3*j+c] = X[3*idx[j]+c]; ys[3*j+c] = Y[3*idx[j]+c]; }
+        Sim3 T = fit_similarity(xs.data(), ys.data(), 3, with_scale);
+        int cnt = 0;
+        for (int i = 0; i < N; i++) {
+            Vec3 xi = { X[3*i], X[3*i+1], X[3*i+2] };
+            Vec3 p = sim_apply(T, xi);
+            const double dx = p[0]-Y[3*i], dy = p[1]-Y[3*i+1], dz = p[2]-Y[3*i+2];
+            if (std::sqrt(dx*dx + dy*dy + dz*dz) < thresh) cnt++;
+        }
+        if (cnt > best_cnt) { best_cnt = cnt; best.assign(N, 0);
+            for (int i = 0; i < N; i++) {
+                Vec3 xi = { X[3*i], X[3*i+1], X[3*i+2] };
+                Vec3 p = sim_apply(T, xi);
+                const double dx = p[0]-Y[3*i], dy = p[1]-Y[3*i+1], dz = p[2]-Y[3*i+2];
+                best[i] = (std::sqrt(dx*dx + dy*dy + dz*dz) < thresh) ? 1 : 0;
+            }
+        }
+    }
+    if (best_cnt < 3) std::fill(best.begin(), best.end(), 1);
+    // refit on the inlier set
+    std::vector<double> xi, yi;
+    for (int i = 0; i < N; i++) if (best[i]) {
+        for (int c = 0; c < 3; c++) { xi.push_back(X[3*i+c]); yi.push_back(Y[3*i+c]); }
+    }
+    inliers = best;
+    return fit_similarity(xi.data(), yi.data(), (int) (xi.size()/3), with_scale);
+}
+
+// affine Y ~= A X + b (12-DoF), returns prediction rms only via caller; here we
+// fill `pred` (N*3). Normal equations on Xh=[X,1].
+static double affine_residual(const double * X, const double * Y, int N) {
+    double G[16] = {0};       // XhᵀXh (4x4)
+    double B[12] = {0};       // XhᵀY  (4x3)
+    for (int i = 0; i < N; i++) {
+        const double xh[4] = { X[3*i], X[3*i+1], X[3*i+2], 1.0 };
+        for (int r = 0; r < 4; r++) {
+            for (int c = 0; c < 4; c++) G[r*4+c] += xh[r]*xh[c];
+            for (int c = 0; c < 3; c++) B[r*3+c] += xh[r]*Y[3*i+c];
+        }
+    }
+    solve_lin(G, 4, B, 3);    // B now holds M (4x3): rows are coeffs
+    double s = 0;
+    for (int i = 0; i < N; i++) {
+        const double xh[4] = { X[3*i], X[3*i+1], X[3*i+2], 1.0 };
+        for (int c = 0; c < 3; c++) {
+            double p = 0;
+            for (int r = 0; r < 4; r++) p += xh[r]*B[r*3+c];
+            const double d = p - Y[3*i+c];
+            s += d*d;
+        }
+    }
+    return std::sqrt(s / (double) N);
+}
+
+Ladder diagnose(const double * X, const double * Y, int N, double tol, double corr_tol) {
+    Ladder L{};
+    double my[3]; mean3(Y, N, my);
+    std::vector<double> yc(3 * N);
+    for (int i = 0; i < N; i++) { yc[3*i]=Y[3*i]-my[0]; yc[3*i+1]=Y[3*i+1]-my[1]; yc[3*i+2]=Y[3*i+2]-my[2]; }
+    L.scene = rms3(yc.data(), N);
+
+    Sim3 Tr = fit_similarity(X, Y, N, /*with_scale=*/false);
+    Sim3 Ts = fit_similarity(X, Y, N, /*with_scale=*/true);
+    std::vector<double> res_r(3*N), res_s(3*N);
+    for (int i = 0; i < N; i++) {
+        Vec3 xi = { X[3*i], X[3*i+1], X[3*i+2] };
+        Vec3 pr = sim_apply(Tr, xi), ps = sim_apply(Ts, xi);
+        for (int c = 0; c < 3; c++) { res_r[3*i+c] = pr[c]-Y[3*i+c]; res_s[3*i+c] = ps[c]-Y[3*i+c]; }
+    }
+    L.rigid = rms3(res_r.data(), N);
+    L.similarity = rms3(res_s.data(), N);
+    L.affine = affine_residual(X, Y, N);
+    L.scale = Ts.s;
+
+    // depth_corr: Pearson(||X-mean||, ||similarity residual||)
+    double mx[3]; mean3(X, N, mx);
+    std::vector<double> depth(N), rmag(N);
+    for (int i = 0; i < N; i++) {
+        const double dx = X[3*i]-mx[0], dy = X[3*i+1]-mx[1], dz = X[3*i+2]-mx[2];
+        depth[i] = std::sqrt(dx*dx + dy*dy + dz*dz);
+        rmag[i] = std::sqrt(res_s[3*i]*res_s[3*i] + res_s[3*i+1]*res_s[3*i+1] + res_s[3*i+2]*res_s[3*i+2]);
+    }
+    double md = 0, mr = 0;
+    for (int i = 0; i < N; i++) { md += depth[i]; mr += rmag[i]; }
+    md /= N; mr /= N;
+    double cov = 0, vd = 0, vr = 0;
+    for (int i = 0; i < N; i++) { cov += (depth[i]-md)*(rmag[i]-mr); vd += (depth[i]-md)*(depth[i]-md); vr += (rmag[i]-mr)*(rmag[i]-mr); }
+    L.depth_corr = (vr > 1e-24) ? cov / std::sqrt(vd*vr) : 0.0;
+
+    const double sc = (L.scene > 0) ? L.scene : 1.0;
+    const double rr = L.rigid/sc, rs = L.similarity/sc, ra = L.affine/sc;
+    L.aff_gain = (L.similarity > 0) ? (L.similarity - L.affine)/L.similarity : 0.0;
+    L.structured = std::fabs(L.depth_corr) > corr_tol || L.aff_gain > 0.25;
+    if (rr < tol)         L.verdict = "rigid_ok";
+    else if (rs < tol)    L.verdict = "needs_scale";
+    else if (ra < tol)    L.verdict = "needs_affine";
+    else if (L.structured) L.verdict = "nonlinear";
+    else                  L.verdict = "similarity_plus_noise";
+    return L;
+}
+
+LoopError loop_closure_error(const std::vector<Sim3> & links) {
+    Sim3 T = sim_identity();
+    for (const Sim3 & Ti : links) T = sim_compose(Ti, T);
+    LoopError e;
+    e.scale_err = std::fabs(std::log(T.s));
+    const double tr = T.R(0,0) + T.R(1,1) + T.R(2,2);
+    e.rot_deg = std::acos(std::max(-1.0, std::min(1.0, (tr - 1.0)/2.0))) * 180.0 / M_PI;
+    e.trans = std::sqrt(T.t[0]*T.t[0] + T.t[1]*T.t[1] + T.t[2]*T.t[2]);
+    return e;
+}
+
+// M^f for a Sim(3) 4x4 [[sR,t],[0,1]], closed form via the group's one-parameter
+// subgroup: A^f = s^f R^f (R^f = same axis, f*angle), translation by
+// (A^f - I)(A - I)^{-1} t (the analytic continuation of the integer-power
+// geometric series). Equals numpy's eig-based principal value while angle < 180.
+Mat4 sim_frac_power(const Mat4 & M, double f) {
+    Mat3 A{};
+    Vec3 t{};
+    for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) A(i, j) = M(i, j); t[i] = M(i, 3); }
+    const double s = std::cbrt(det3(A));
+    Mat3 R = A;
+    for (int i = 0; i < 9; i++) R.a[i] /= s;
+    const double tr = R(0,0) + R(1,1) + R(2,2);
+    const double ang = std::acos(std::max(-1.0, std::min(1.0, (tr - 1.0)/2.0)));
+
+    Mat3 Rf;
+    if (ang < 1e-12) {
+        Rf = mat3_identity();
+    } else {
+        const double sa = std::sin(ang);
+        Vec3 axis = { (R(2,1)-R(1,2))/(2*sa), (R(0,2)-R(2,0))/(2*sa), (R(1,0)-R(0,1))/(2*sa) };
+        Rf = rodrigues(axis, f * ang);
+    }
+    const double sf = std::pow(s, f);
+    Mat3 Af = Rf;
+    for (int i = 0; i < 9; i++) Af.a[i] *= sf;
+
+    Mat4 out = mat4_identity();
+    for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) out(i, j) = Af(i, j);
+
+    // translation: (Af - I)(A - I)^{-1} t, with the A->I limit handled as f*t.
+    Mat3 AmI = A;
+    for (int i = 0; i < 3; i++) AmI(i, i) -= 1.0;
+    if (std::fabs(det3(AmI)) < 1e-12) {
+        for (int i = 0; i < 3; i++) out(i, 3) = f * t[i];
+    } else {
+        Mat3 AfmI = Af;
+        for (int i = 0; i < 3; i++) AfmI(i, i) -= 1.0;
+        Mat3 K = mat3_mul(AfmI, inv3(AmI));
+        Vec3 tf = mat3_apply(K, t);
+        for (int i = 0; i < 3; i++) out(i, 3) = tf[i];
+    }
+    return out;
+}
+
+// ---- PnP ------------------------------------------------------------------
+
+namespace {
+
+// DLT: smallest right singular vector of the 2N-row system, via the smallest
+// eigenvector of AᵀA (12x12). idx selects the k points used. Returns the 3x4
+// matrix p (row-major) up to sign/scale.
+std::array<double,12> dlt(const double * Xw, const double * xn, const int * idx, int k) {
+    double AtA[144] = {0};
+    for (int j = 0; j < k; j++) {
+        const int i = idx[j];
+        const double X = Xw[3*i], Y = Xw[3*i+1], Z = Xw[3*i+2];
+        const double u = xn[2*i], v = xn[2*i+1];
+        const double ra[12] = { X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u };
+        const double rb[12] = { 0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v };
+        for (int r = 0; r < 12; r++)
+            for (int c = 0; c < 12; c++) AtA[r*12+c] += ra[r]*ra[c] + rb[r]*rb[c];
+    }
+    std::vector<double> v = smallest_eigenvector(AtA, 12);
+    std::array<double,12> p;
+    for (int i = 0; i < 12; i++) p[i] = v[i];
+    return p;
+}
+
+struct RT { Mat3 R; Vec3 t; double err; bool ok; };
+
+// Resolve the DLT sign and recover a proper (R,t) world2cam, picking the sign with
+// lower reprojection error among cheirality-valid candidates (points in front).
+RT decode(const std::array<double,12> & p, const double * Xw, const double * xn,
+          const int * idx, int k) {
+    RT best; best.ok = false; best.err = 1e300;
+    for (double sgn : { 1.0, -1.0 }) {
+        Mat3 M{};
+        Vec3 tau{};
+        for (int r = 0; r < 3; r++) { for (int c = 0; c < 3; c++) M(r, c) = sgn*p[r*4+c]; tau[r] = sgn*p[r*4+3]; }
+        Mat3 U, V; Vec3 Sv;
+        svd3(M, U, Sv, V);
+        const double d = (det3(U) * det3(V) < 0) ? -1.0 : 1.0;
+        Mat3 US = U;
+        for (int r = 0; r < 3; r++) US(r, 2) *= d;
+        Mat3 R = mat3_mul(US, mat3_transpose(V));      // nearest proper rotation
+        const double lam = (Sv[0] + Sv[1] + Sv[2]) / 3.0;
+        if (lam < 1e-12) continue;
+        Vec3 t = { tau[0]/lam, tau[1]/lam, tau[2]/lam };
+
+        double err = 0; bool behind = false;
+        for (int j = 0; j < k; j++) {
+            const int i = idx[j];
+            Vec3 xi = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] };
+            Vec3 Xc = mat3_apply(R, xi);
+            Xc[0] += t[0]; Xc[1] += t[1]; Xc[2] += t[2];
+            if (Xc[2] <= 0) { behind = true; break; }
+            const double px = Xc[0]/Xc[2], py = Xc[1]/Xc[2];
+            const double dx = px - xn[2*i], dy = py - xn[2*i+1];
+            err += std::sqrt(dx*dx + dy*dy);
+        }
+        if (behind) continue;
+        err /= k;
+        if (err < best.err) { best.err = err; best.R = R; best.t = t; best.ok = true; }
+    }
+    return best;
+}
+
+} // namespace
+
+void pixel_grid(int H, int W, std::vector<double> & out) {
+    out.resize((size_t) 2 * H * W);
+    for (int r = 0; r < H; r++)
+        for (int c = 0; c < W; c++) {
+            const size_t k = (size_t) r * W + c;
+            out[2*k] = c; out[2*k+1] = r;
+        }
+}
+
+Mat4 solve_pnp_numpy(const double * Xw, const double * pixels, int N, const Mat3 & K,
+                     std::vector<char> & inliers, double thresh_px, int iters, uint64_t seed) {
+    const Mat3 Kinv = inv3(K);
+    std::vector<double> xn((size_t) 2 * N);
+    for (int i = 0; i < N; i++) {
+        Vec3 uv = { pixels[2*i], pixels[2*i+1], 1.0 };
+        Vec3 n = mat3_apply(Kinv, uv);
+        xn[2*i] = n[0]; xn[2*i+1] = n[1];
+    }
+    const double thresh = thresh_px / K(0, 0);
+
+    Rng rng(seed);
+    std::vector<char> best(N, 0);
+    int best_cnt = 0;
+    for (int it = 0; it < iters; it++) {
+        int idx[6];
+        sample_distinct(rng, N, 6, idx);
+        RT rt = decode(dlt(Xw, xn.data(), idx, 6), Xw, xn.data(), idx, 6);
+        if (!rt.ok) continue;
+        int cnt = 0;
+        for (int i = 0; i < N; i++) {
+            Vec3 xi = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] };
+            Vec3 Xc = mat3_apply(rt.R, xi);
+            Xc[2] += rt.t[2];
+            if (Xc[2] <= 0) continue;
+            Xc[0] += rt.t[0]; Xc[1] += rt.t[1];
+            const double dx = Xc[0]/Xc[2] - xn[2*i], dy = Xc[1]/Xc[2] - xn[2*i+1];
+            if (std::sqrt(dx*dx + dy*dy) < thresh) cnt++;
+        }
+        if (cnt > best_cnt) {
+            best_cnt = cnt;
+            best.assign(N, 0);
+            for (int i = 0; i < N; i++) {
+                Vec3 xi = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] };
+                Vec3 Xc = mat3_apply(rt.R, xi);
+                Xc[0] += rt.t[0]; Xc[1] += rt.t[1]; Xc[2] += rt.t[2];
+                if (Xc[2] <= 0) continue;
+                const double dx = Xc[0]/Xc[2] - xn[2*i], dy = Xc[1]/Xc[2] - xn[2*i+1];
+                best[i] = (std::sqrt(dx*dx + dy*dy) < thresh) ? 1 : 0;
+            }
+        }
+    }
+    std::vector<int> sel;
+    for (int i = 0; i < N; i++) if (best[i]) sel.push_back(i);
+    if ((int) sel.size() < 6) { sel.clear(); for (int i = 0; i < N; i++) sel.push_back(i); }
+    RT rt = decode(dlt(Xw, xn.data(), sel.data(), (int) sel.size()),
+                   Xw, xn.data(), sel.data(), (int) sel.size());
+
+    Mat4 world2cam = mat4_identity();
+    for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) world2cam(i, j) = rt.R(i, j); world2cam(i, 3) = rt.t[i]; }
+    inliers = best;
+    return inv_rigid4(world2cam);
+}
+
+// ---- robust PnP: EPnP + Gauss-Newton --------------------------------------
+//
+// EPnP (Lepetit/Moreno-Noguer/Fua 2009): express each world point as a
+// barycentric combination of 4 control points; the camera-frame control points
+// live in the null space of a 2n x 12 system, recovered from the eigenvectors of
+// the 12x12 MᵀM (the same Jacobi eigensolver as everywhere else). It is
+// non-iterative, uses ALL points (no random minimal samples), and handles
+// near-planar configs — so it does not suffer the DLT's seed-dependent mirror
+// flips. We try N=1,2,3 null vectors, pick the lowest reprojection error, then
+// polish (R,t) with a Huber-robust Gauss-Newton on the reprojection residual.
+
+namespace {
+
+Mat3 skew_neg(const Vec3 & x) {     // -[x]_×  = ∂Xc/∂ω for left perturbation
+    return Mat3{ { 0,  x[2], -x[1],
+                  -x[2], 0,  x[0],
+                   x[1], -x[0], 0 } };
+}
+
+Mat3 so3_exp(const Vec3 & w) {      // rotation from an axis-angle (rotation vector)
+    const double th = std::sqrt(w[0]*w[0] + w[1]*w[1] + w[2]*w[2]);
+    if (th < 1e-12) {               // I + [w]_× to first order
+        Mat3 R = mat3_identity();
+        R(0,1) = -w[2]; R(0,2) = w[1]; R(1,0) = w[2]; R(1,2) = -w[0]; R(2,0) = -w[1]; R(2,1) = w[0];
+        return R;
+    }
+    return rodrigues({ w[0]/th, w[1]/th, w[2]/th }, th);
+}
+
+double reproj_rms(const double * Xw, const double * px, int N, const Mat3 & K,
+                  const Mat3 & R, const Vec3 & t) {
+    const double fu = K(0,0), fv = K(1,1), cu = K(0,2), cv = K(1,2);
+    double s = 0; int m = 0;
+    for (int i = 0; i < N; i++) {
+        Vec3 p = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] };
+        Vec3 Xc = mat3_apply(R, p); Xc[0]+=t[0]; Xc[1]+=t[1]; Xc[2]+=t[2];
+        if (Xc[2] <= 1e-6) { s += 1e6; m++; continue; }
+        const double dx = fu*Xc[0]/Xc[2] + cu - px[2*i];
+        const double dy = fv*Xc[1]/Xc[2] + cv - px[2*i+1];
+        s += dx*dx + dy*dy; m++;
+    }
+    return std::sqrt(s / std::max(m, 1));
+}
+
+// Recover camera-frame control points from the null-space vector x (12), fix the
+// global sign by cheirality (scene in front), and get world2cam (R,t) from the
+// rigid fit between the 4 world and 4 camera control points.
+void recover_Rt(const double * x12, const double cw[4][3], const double * alpha, int N,
+                Mat3 & R, Vec3 & t) {
+    double cc[12];
+    for (int i = 0; i < 12; i++) cc[i] = x12[i];
+    // mean camera-frame depth over the points (pc = sum_j alpha_ij cc_j)
+    double meanz = 0;
+    for (int i = 0; i < N; i++) {
+        double z = 0;
+        for (int j = 0; j < 4; j++) z += alpha[4*i+j] * cc[3*j+2];
+        meanz += z;
+    }
+    if (meanz < 0) for (int i = 0; i < 12; i++) cc[i] = -cc[i];
+    double cwf[12];
+    for (int j = 0; j < 4; j++) for (int k = 0; k < 3; k++) cwf[3*j+k] = cw[j][k];
+    Sim3 T = fit_similarity(cwf, cc, 4, /*with_scale=*/false);     // rigid world->camera
+    R = T.R; t = T.t;
+}
+
+// EPnP core: returns the best (R,t) world2cam over N=1,2,3 by reprojection error.
+void epnp(const double * Xw, const double * px, int N, const Mat3 & K, Mat3 & Rout, Vec3 & tout) {
+    // 1. control points: centroid + principal axes (sqrt-eigenvalue scaled).
+    double c0[3] = {0,0,0};
+    for (int i = 0; i < N; i++) { c0[0]+=Xw[3*i]; c0[1]+=Xw[3*i+1]; c0[2]+=Xw[3*i+2]; }
+    c0[0]/=N; c0[1]/=N; c0[2]/=N;
+    double Cov[9] = {0};
+    for (int i = 0; i < N; i++) {
+        const double d[3] = { Xw[3*i]-c0[0], Xw[3*i+1]-c0[1], Xw[3*i+2]-c0[2] };
+        for (int r = 0; r < 3; r++) for (int c = 0; c < 3; c++) Cov[r*3+c] += d[r]*d[c];
+    }
+    for (int i = 0; i < 9; i++) Cov[i] /= N;
+    double ev[3], evec[9];
+    fsla::jacobi_eigh(Cov, 3, ev, evec);
+    double lam_max = std::max({ev[0], ev[1], ev[2], 1e-12});
+    double cw[4][3];
+    for (int k = 0; k < 3; k++) cw[0][k] = c0[k];
+    for (int j = 0; j < 3; j++) {
+        // floor the axis length so a near-planar scene keeps 4 affinely-independent
+        // control points (an invertible barycentric matrix).
+        const double s = std::sqrt(std::max(ev[j], 1e-6 * lam_max));
+        for (int k = 0; k < 3; k++) cw[j+1][k] = c0[k] + s * evec[k*3 + j];
+    }
+
+    // 2. barycentric coordinates: alpha_i = C^{-1} [p_i; 1], C = [[cw^T];[1...]].
+    double C[16];
+    for (int j = 0; j < 4; j++) { for (int k = 0; k < 3; k++) C[k*4+j] = cw[j][k]; C[3*4+j] = 1.0; }
+    double Cinv[16] = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1};
+    solve_lin(C, 4, Cinv, 4);                       // Cinv now holds C^{-1}
+    std::vector<double> alpha((size_t) 4 * N);
+    for (int i = 0; i < N; i++) {
+        const double p[4] = { Xw[3*i], Xw[3*i+1], Xw[3*i+2], 1.0 };
+        for (int j = 0; j < 4; j++) {
+            double a = 0; for (int k = 0; k < 4; k++) a += Cinv[j*4+k]*p[k];
+            alpha[4*i+j] = a;
+        }
+    }
+
+    // 3. M (2n x 12) -> MtM (12 x 12).
+    const double fu = K(0,0), fv = K(1,1), cu = K(0,2), cv = K(1,2);
+    double MtM[144] = {0};
+    for (int i = 0; i < N; i++) {
+        double r1[12] = {0}, r2[12] = {0};
+        const double uu = px[2*i], vv = px[2*i+1];
+        for (int j = 0; j < 4; j++) {
+            const double a = alpha[4*i+j];
+            r1[3*j+0] = a*fu; r1[3*j+2] = a*(cu - uu);
+            r2[3*j+1] = a*fv; r2[3*j+2] = a*(cv - vv);
+        }
+        for (int r = 0; r < 12; r++) for (int c = 0; c < 12; c++) MtM[r*12+c] += r1[r]*r1[c] + r2[r]*r2[c];
+    }
+    double mev[12], mevec[144];
+    fsla::jacobi_eigh(MtM, 12, mev, mevec);
+    int ord[12]; for (int i = 0; i < 12; i++) ord[i] = i;
+    std::sort(ord, ord + 12, [&](int a, int b){ return mev[a] < mev[b]; });
+    double V[3][12];                                // the 3 smallest-eigenvalue vectors
+    for (int n = 0; n < 3; n++) for (int i = 0; i < 12; i++) V[n][i] = mevec[i*12 + ord[n]];
+
+    // control-point pairs and their world distances
+    const int pr[6][2] = {{0,1},{0,2},{0,3},{1,2},{1,3},{2,3}};
+    double dw2[6];
+    for (int p = 0; p < 6; p++) {
+        const int i = pr[p][0], j = pr[p][1];
+        double d = 0; for (int k = 0; k < 3; k++) { const double e = cw[i][k]-cw[j][k]; d += e*e; }
+        dw2[p] = d;
+    }
+    auto blkdiff = [&](int n, int p, double out[3]) {
+        const int i = pr[p][0], j = pr[p][1];
+        for (int k = 0; k < 3; k++) out[k] = V[n][3*i+k] - V[n][3*j+k];
+    };
+    auto dot3 = [](const double a[3], const double b[3]){ return a[0]*b[0]+a[1]*b[1]+a[2]*b[2]; };
+
+    Mat3 bestR{}; Vec3 bestT{}; double best_err = 1e300; bool have = false;
+    auto consider = [&](const double x12[12]) {
+        Mat3 R; Vec3 t;
+        recover_Rt(x12, cw, alpha.data(), N, R, t);
+        const double e = reproj_rms(Xw, px, N, K, R, t);
+        if (e < best_err) { best_err = e; bestR = R; bestT = t; have = true; }
+    };
+
+    // N=1: x = beta * v0, beta from the 6 distance constraints (least squares).
+    {
+        double num = 0, den = 0;
+        for (int p = 0; p < 6; p++) { double d0[3]; blkdiff(0,p,d0);
+            const double nd = std::sqrt(dot3(d0,d0)); num += nd*std::sqrt(dw2[p]); den += nd*nd; }
+        const double beta = (den > 0) ? num/den : 0.0;
+        double x[12]; for (int i = 0; i < 12; i++) x[i] = beta*V[0][i];
+        consider(x);
+    }
+    // N=2: x = b1 v0 + b2 v1; distance constraints linear in (b1^2,b1b2,b2^2).
+    {
+        double L[18], rho[6];
+        for (int p = 0; p < 6; p++) {
+            double d0[3], d1[3]; blkdiff(0,p,d0); blkdiff(1,p,d1);
+            L[p*3+0] = dot3(d0,d0); L[p*3+1] = 2*dot3(d0,d1); L[p*3+2] = dot3(d1,d1);
+            rho[p] = dw2[p];
+        }
+        double G[9] = {0}, g[3] = {0};               // normal equations (3x3)
+        for (int p = 0; p < 6; p++) { for (int r = 0; r < 3; r++) { for (int c = 0; c < 3; c++) G[r*3+c] += L[p*3+r]*L[p*3+c]; g[r] += L[p*3+r]*rho[p]; } }
+        solve_lin(G, 3, g, 1);                        // g = [b11,b12,b22]
+        const double b1 = std::sqrt(std::fabs(g[0]));
+        double b2 = std::sqrt(std::fabs(g[2]));
+        if (g[1] < 0) b2 = -b2;
+        double x[12]; for (int i = 0; i < 12; i++) x[i] = b1*V[0][i] + b2*V[1][i];
+        consider(x);
+    }
+    // N=3: x = b1 v0 + b2 v1 + b3 v2; 6 constraints in 6 quadratic unknowns.
+    {
+        double L[36], rho[6];
+        for (int p = 0; p < 6; p++) {
+            double d0[3], d1[3], d2[3]; blkdiff(0,p,d0); blkdiff(1,p,d1); blkdiff(2,p,d2);
+            L[p*6+0]=dot3(d0,d0); L[p*6+1]=2*dot3(d0,d1); L[p*6+2]=2*dot3(d0,d2);
+            L[p*6+3]=dot3(d1,d1); L[p*6+4]=2*dot3(d1,d2); L[p*6+5]=dot3(d2,d2);
+            rho[p]=dw2[p];
+        }
+        double Lc[36], rc[6];
+        for (int i = 0; i < 36; i++) Lc[i] = L[i];
+        for (int i = 0; i < 6; i++) rc[i] = rho[i];
+        solve_lin(Lc, 6, rc, 1);                      // rc = [b11,b12,b13,b22,b23,b33]
+        const double b1 = std::sqrt(std::fabs(rc[0]));
+        const double b2 = (b1 > 1e-12) ? rc[1]/b1 : 0.0;
+        const double b3 = (b1 > 1e-12) ? rc[2]/b1 : 0.0;
+        double x[12]; for (int i = 0; i < 12; i++) x[i] = b1*V[0][i] + b2*V[1][i] + b3*V[2][i];
+        consider(x);
+    }
+
+    if (!have) { Rout = mat3_identity(); tout = {0,0,0}; return; }
+    Rout = bestR; tout = bestT;
+}
+
+// Huber-robust Gauss-Newton on the reprojection residual; 6-DoF left perturbation
+// (R <- exp(dw) R, t += dt). Refines an EPnP init to the reprojection minimum and
+// downweights gross outliers, so the deterministic EPnP+GN matches cv2's
+// RANSAC+SQPNP+refine without random sampling.
+void gauss_newton_pnp(const double * Xw, const double * px, int N, const Mat3 & K,
+                      Mat3 & R, Vec3 & t, int iters, double huber_px) {
+    const double fu = K(0,0), fv = K(1,1), cu = K(0,2), cv = K(1,2);
+    for (int it = 0; it < iters; it++) {
+        double H[36] = {0}, b[6] = {0};
+        for (int i = 0; i < N; i++) {
+            Vec3 p = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] };
+            Vec3 Xc = mat3_apply(R, p); Xc[0]+=t[0]; Xc[1]+=t[1]; Xc[2]+=t[2];
+            if (Xc[2] <= 1e-6) continue;
+            const double iz = 1.0/Xc[2];
+            const double rx = fu*Xc[0]*iz + cu - px[2*i];
+            const double ry = fv*Xc[1]*iz + cv - px[2*i+1];
+            const double rn = std::sqrt(rx*rx + ry*ry);
+            const double w = (rn <= huber_px || rn < 1e-12) ? 1.0 : huber_px/rn;
+            // dproj/dXc (2x3)
+            const double Jp[2][3] = { { fu*iz, 0, -fu*Xc[0]*iz*iz },
+                                      { 0, fv*iz, -fv*Xc[1]*iz*iz } };
+            // dXc/d(delta) (3x6) = [ -[Xc]_x | I ]
+            const Mat3 S = skew_neg(Xc);
+            double Jd[3][6];
+            for (int r = 0; r < 3; r++) { Jd[r][0]=S(r,0); Jd[r][1]=S(r,1); Jd[r][2]=S(r,2);
+                                          Jd[r][3]=(r==0); Jd[r][4]=(r==1); Jd[r][5]=(r==2); }
+            double J[2][6];
+            for (int a = 0; a < 2; a++) for (int c = 0; c < 6; c++) {
+                double s = 0; for (int k = 0; k < 3; k++) s += Jp[a][k]*Jd[k][c]; J[a][c] = s; }
+            const double r2[2] = { rx, ry };
+            for (int a = 0; a < 2; a++) {
+                for (int c = 0; c < 6; c++) {
+                    b[c] += w * J[a][c] * r2[a];
+                    for (int d = 0; d < 6; d++) H[c*6+d] += w * J[a][c] * J[a][d];
+                }
+            }
+        }
+        double rhs[6]; for (int i = 0; i < 6; i++) rhs[i] = -b[i];
+        solve_lin(H, 6, rhs, 1);                      // solve H delta = -b
+        Vec3 dw = { rhs[0], rhs[1], rhs[2] };
+        R = mat3_mul(so3_exp(dw), R);
+        t[0]+=rhs[3]; t[1]+=rhs[4]; t[2]+=rhs[5];
+        if (std::fabs(rhs[0])+std::fabs(rhs[1])+std::fabs(rhs[2])+
+            std::fabs(rhs[3])+std::fabs(rhs[4])+std::fabs(rhs[5]) < 1e-12) break;
+    }
+}
+
+} // namespace
+
+Mat4 solve_pnp_epnp(const double * Xw, const double * pixels, int N, const Mat3 & K) {
+    Mat3 R; Vec3 t;
+    epnp(Xw, pixels, N, K, R, t);
+    Mat4 w2c = mat4_identity();
+    for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) w2c(i,j) = R(i,j); w2c(i,3) = t[i]; }
+    return inv_rigid4(w2c);
+}
+
+Mat4 solve_pnp(const double * Xw, const double * pixels, int N, const Mat3 & K,
+               std::vector<char> & inliers, double thresh_px, int gn_iters) {
+    Mat3 R; Vec3 t;
+    epnp(Xw, pixels, N, K, R, t);
+    gauss_newton_pnp(Xw, pixels, N, K, R, t, gn_iters, thresh_px);
+
+    inliers.assign(N, 0);
+    const double fu = K(0,0), fv = K(1,1), cu = K(0,2), cv = K(1,2);
+    for (int i = 0; i < N; i++) {
+        Vec3 p = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] };
+        Vec3 Xc = mat3_apply(R, p); Xc[0]+=t[0]; Xc[1]+=t[1]; Xc[2]+=t[2];
+        if (Xc[2] <= 1e-6) continue;
+        const double dx = fu*Xc[0]/Xc[2] + cu - pixels[2*i];
+        const double dy = fv*Xc[1]/Xc[2] + cv - pixels[2*i+1];
+        if (std::sqrt(dx*dx + dy*dy) < thresh_px) inliers[i] = 1;
+    }
+    Mat4 w2c = mat4_identity();
+    for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) w2c(i,j) = R(i,j); w2c(i,3) = t[i]; }
+    return inv_rigid4(w2c);
+}
+
+// ---- estimate_poses orchestration -----------------------------------------
+
+PoseResult estimate_poses(const std::vector<const float *> & points,
+                          const std::vector<const float *> & opacities,
+                          int H, int W, double opacity_threshold,
+                          double focal, int pnp_iter, bool normalize, uint64_t seed) {
+    (void) pnp_iter; (void) seed;        // robust PnP is deterministic (EPnP+GN)
+    const int Nv = (int) points.size();
+    const int P = H * W;
+    const double ppx = W / 2.0, ppy = H / 2.0;
+    std::vector<double> grid;
+    pixel_grid(H, W, grid);
+
+    // per-view opacity masks
+    std::vector<std::vector<char>> masks(Nv);
+    for (int v = 0; v < Nv; v++) {
+        masks[v].assign(P, 1);
+        if (v < (int) opacities.size() && opacities[v]) {
+            for (int i = 0; i < P; i++) masks[v][i] = (opacities[v][i] > opacity_threshold) ? 1 : 0;
+        }
+    }
+
+    // focal: view 0 only, ALL pixels (use_first_focal, scene recipe)
+    if (focal <= 0.0) {
+        std::vector<double> pts(3 * P), pix(2 * P);
+        for (int i = 0; i < P; i++) {
+            pts[3*i] = points[0][3*i]; pts[3*i+1] = points[0][3*i+1]; pts[3*i+2] = points[0][3*i+2];
+            pix[2*i] = grid[2*i]; pix[2*i+1] = grid[2*i+1];
+        }
+        focal = estimate_focal(pts.data(), pix.data(), P, ppx, ppy);
+    }
+    Mat3 K = mat3_identity();
+    K(0,0) = focal; K(1,1) = focal; K(0,2) = ppx; K(1,2) = ppy;
+
+    PoseResult res;
+    res.focal = focal;
+    res.scale = 1.0;
+    for (int v = 0; v < Nv; v++) {
+        std::vector<double> Xw, px;
+        for (int i = 0; i < P; i++) if (masks[v][i]) {
+            Xw.push_back(points[v][3*i]); Xw.push_back(points[v][3*i+1]); Xw.push_back(points[v][3*i+2]);
+            px.push_back(grid[2*i]); px.push_back(grid[2*i+1]);
+        }
+        std::vector<char> inl;
+        Mat4 c2w = solve_pnp(Xw.data(), px.data(), (int) (Xw.size()/3), K, inl, 5.0, 10);
+        res.cam2world.push_back(c2w);
+    }
+
+    if (normalize && Nv >= 2) {
+        Vec3 t0 = { res.cam2world.front()(0,3), res.cam2world.front()(1,3), res.cam2world.front()(2,3) };
+        Vec3 tl = { res.cam2world.back()(0,3),  res.cam2world.back()(1,3),  res.cam2world.back()(2,3) };
+        const double baseline = std::sqrt((tl[0]-t0[0])*(tl[0]-t0[0]) + (tl[1]-t0[1])*(tl[1]-t0[1]) + (tl[2]-t0[2])*(tl[2]-t0[2])) + 1e-2;
+        res.scale = 1.0 / baseline;
+        for (auto & c : res.cam2world) for (int i = 0; i < 3; i++) c(i, 3) *= res.scale;
+    }
+    return res;
+}
+
+// ---- after-inference parallax ---------------------------------------------
+
+Parallax parallax_stats(const Vec3 & C0, const Vec3 & C1, const Vec3 & axis0,
+                        const float * pts, const float * mask, int N, double mask_thr) {
+    constexpr double PI = 3.14159265358979323846;
+    auto sub = [](const Vec3 & a, const Vec3 & b) -> Vec3 { return { a[0]-b[0], a[1]-b[1], a[2]-b[2] }; };
+    auto dot = [](const Vec3 & a, const Vec3 & b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; };
+    auto nrm = [&](const Vec3 & a) { return std::sqrt(dot(a, a)); };
+
+    Parallax px{};
+    // unit optical axis of view 0
+    Vec3 ax = axis0; const double an = nrm(ax);
+    if (an > 1e-12) { ax[0]/=an; ax[1]/=an; ax[2]/=an; }
+
+    // baseline split: forward (dolly, no parallax) vs lateral (strafe, parallax)
+    const Vec3 b = sub(C1, C0);
+    px.baseline = nrm(b);
+    const double fwd = dot(b, ax);
+    const Vec3 blat = { b[0]-fwd*ax[0], b[1]-fwd*ax[1], b[2]-fwd*ax[2] };
+    px.lateral_angle_deg = std::atan2(nrm(blat), std::fabs(fwd)) * 180.0 / PI;
+
+    // per confident point: depth from cam0 + triangulation angle (rays from C0,C1)
+    std::vector<double> depths, angles;
+    depths.reserve(N); angles.reserve(N);
+    for (int i = 0; i < N; i++) {
+        if (mask && !(mask[i] > mask_thr)) continue;
+        const Vec3 X = { (double)pts[3*i], (double)pts[3*i+1], (double)pts[3*i+2] };
+        if (!std::isfinite(X[0]) || !std::isfinite(X[1]) || !std::isfinite(X[2])) continue;
+        const Vec3 r0 = sub(X, C0), r1 = sub(X, C1);
+        const double n0 = nrm(r0), n1 = nrm(r1);
+        if (n0 < 1e-9 || n1 < 1e-9) continue;
+        const double depth = dot(r0, ax);
+        if (depth > 0) depths.push_back(depth);
+        double c = dot(r0, r1) / (n0 * n1);
+        c = c < -1.0 ? -1.0 : (c > 1.0 ? 1.0 : c);
+        angles.push_back(std::acos(c) * 180.0 / PI);
+    }
+    auto median = [](std::vector<double> & v) -> double {
+        if (v.empty()) return 0.0;
+        const size_t m = v.size() / 2;
+        std::nth_element(v.begin(), v.begin() + m, v.end());
+        return v[m];
+    };
+    px.median_depth = median(depths);
+    px.tri_angle_deg = median(angles);
+    px.baseline_over_depth = (px.median_depth > 1e-9) ? px.baseline / px.median_depth : 0.0;
+    px.n_points = (int) angles.size();
+    return px;
+}
+
+Parallax pair_parallax(const std::vector<const float *> & points,
+                       const std::vector<const float *> & opacities,
+                       int H, int W, double opacity_threshold, double focal) {
+    Parallax px{};
+    if (points.size() < 2) return px;
+    PoseResult pr = estimate_poses(points, opacities, H, W, opacity_threshold,
+                                   focal, 100, /*normalize=*/false, /*seed=*/0);
+    if (pr.cam2world.size() < 2) return px;
+    const Mat4 & c0 = pr.cam2world[0];
+    const Mat4 & c1 = pr.cam2world[1];
+    const Vec3 C0 = { c0(0,3), c0(1,3), c0(2,3) };
+    const Vec3 C1 = { c1(0,3), c1(1,3), c1(2,3) };
+    const Vec3 axis0 = { c0(0,2), c0(1,2), c0(2,2) };   // camera +z (optical axis) in world
+    px = parallax_stats(C0, C1, axis0, points[0],
+                        opacities.empty() ? nullptr : opacities[0], H * W, opacity_threshold);
+    px.focal = pr.focal;
+    return px;
+}
+
+// ---- accumulation ---------------------------------------------------------
+
+namespace {
+inline float clamp01(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }
+const double SH_C0 = 0.28209479177387814;   // SH degree-0 basis (DC -> rgb)
+
+// quaternion (w,x,y,z) from a rotation matrix (Shepperd's method)
+std::array<double,4> mat3_to_quat(const Mat3 & R) {
+    const double tr = R(0,0) + R(1,1) + R(2,2);
+    double w,x,y,z;
+    if (tr > 0) {
+        double s = std::sqrt(tr + 1.0) * 2;
+        w = 0.25*s; x = (R(2,1)-R(1,2))/s; y = (R(0,2)-R(2,0))/s; z = (R(1,0)-R(0,1))/s;
+    } else if (R(0,0) > R(1,1) && R(0,0) > R(2,2)) {
+        double s = std::sqrt(1.0 + R(0,0) - R(1,1) - R(2,2)) * 2;
+        w = (R(2,1)-R(1,2))/s; x = 0.25*s; y = (R(0,1)+R(1,0))/s; z = (R(0,2)+R(2,0))/s;
+    } else if (R(1,1) > R(2,2)) {
+        double s = std::sqrt(1.0 + R(1,1) - R(0,0) - R(2,2)) * 2;
+        w = (R(0,2)-R(2,0))/s; x = (R(0,1)+R(1,0))/s; y = 0.25*s; z = (R(1,2)+R(2,1))/s;
+    } else {
+        double s = std::sqrt(1.0 + R(2,2) - R(0,0) - R(1,1)) * 2;
+        w = (R(1,0)-R(0,1))/s; x = (R(0,2)+R(2,0))/s; y = (R(1,2)+R(2,1))/s; z = 0.25*s;
+    }
+    return {w,x,y,z};
+}
+// Hamilton product (w,x,y,z): compose rotation a then... q = a ⊗ b applies b first.
+std::array<double,4> quat_mul(const std::array<double,4> & a, const std::array<double,4> & b) {
+    return { a[0]*b[0] - a[1]*b[1] - a[2]*b[2] - a[3]*b[3],
+             a[0]*b[1] + a[1]*b[0] + a[2]*b[3] - a[3]*b[2],
+             a[0]*b[2] - a[1]*b[3] + a[2]*b[0] + a[3]*b[1],
+             a[0]*b[3] + a[1]*b[2] - a[2]*b[1] + a[3]*b[0] };
+}
+std::array<double,4> quat_normalize(std::array<double,4> q) {
+    const double n = std::sqrt(q[0]*q[0]+q[1]*q[1]+q[2]*q[2]+q[3]*q[3]);
+    if (n > 1e-12) for (auto & c : q) c /= n; else q = {1,0,0,0};
+    return q;
+}
+} // namespace
+
+Accumulator::Accumulator(int H, int W, double opacity_threshold,
+                         double ransac_thresh_frac, int ransac_iters, uint64_t seed)
+    : H_(H), W_(W), thr_(opacity_threshold), rthr_(ransac_thresh_frac),
+      riters_(ransac_iters), seed_(seed), T_(sim_identity()), final_cam_(mat4_identity()) {}
+
+void Accumulator::add_view(const float * pts, const float * op, const float * rgb,
+                           const float * scl, const float * rot, const Sim3 & T, int frame) {
+    const int P = H_ * W_;
+    // A similarity x -> s*(R@x)+t scales each gaussian's covariance by s^2 and
+    // rotates it by R: so the world-frame axis lengths are s*scale and the world
+    // rotation is qT ⊗ q_local. Compute qT (from T.R) once for the whole view.
+    const std::array<double,4> qT = mat3_to_quat(T.R);
+    for (int i = 0; i < P; i++) {
+        if (op[i] <= thr_) continue;
+        Vec3 x = { pts[3*i], pts[3*i+1], pts[3*i+2] };
+        Vec3 w = sim_apply(T, x);
+        AccumPoint p;
+        p.x = (float) w[0]; p.y = (float) w[1]; p.z = (float) w[2];
+        p.r = rgb[3*i]; p.g = rgb[3*i+1]; p.b = rgb[3*i+2]; p.opacity = op[i];
+        p.sx = (float) (T.s * scl[3*i]); p.sy = (float) (T.s * scl[3*i+1]); p.sz = (float) (T.s * scl[3*i+2]);
+        std::array<double,4> ql = quat_normalize({ rot[4*i], rot[4*i+1], rot[4*i+2], rot[4*i+3] });
+        std::array<double,4> qw = quat_normalize(quat_mul(qT, ql));
+        p.qw = (float) qw[0]; p.qx = (float) qw[1]; p.qy = (float) qw[2]; p.qz = (float) qw[3];
+        p.frame = frame;
+        cloud_.push_back(p);
+    }
+}
+
+ChainLink Accumulator::add_pair(const float * g, int gc, double focal) {
+    const int P = H_ * W_;
+    // de-interleave the two views: points (ch 0:3), opacity (ch 15, already
+    // sigmoid-activated), rgb = clip(SH_DC * C0 + 0.5, 0, 1) (ch 3:6).
+    std::vector<float> pts0(3*P), pts1(3*P), op0(P), op1(P), rgb0(3*P), rgb1(3*P);
+    std::vector<float> scl0(3*P), scl1(3*P), rot0(4*P), rot1(4*P);
+    for (int i = 0; i < P; i++) {
+        const float * a0 = g + (size_t) i * gc;
+        const float * a1 = g + (size_t) (P + i) * gc;
+        for (int c = 0; c < 3; c++) { pts0[3*i+c] = a0[c]; pts1[3*i+c] = a1[c]; }
+        op0[i] = a0[15]; op1[i] = a1[15];
+        for (int c = 0; c < 3; c++) {
+            rgb0[3*i+c] = clamp01((float) (a0[3+c] * SH_C0 + 0.5));
+            rgb1[3*i+c] = clamp01((float) (a1[3+c] * SH_C0 + 0.5));
+        }
+        // gaussian shape: scale ch16:19, rotation quaternion (w,x,y,z) ch19:23
+        for (int c = 0; c < 3; c++) { scl0[3*i+c] = a0[16+c]; scl1[3*i+c] = a1[16+c]; }
+        for (int c = 0; c < 4; c++) { rot0[4*i+c] = a0[19+c]; rot1[4*i+c] = a1[19+c]; }
+    }
+
+    // recover this pair's cameras (view-0 frame, use_first_focal)
+    std::vector<const float *> pts = { pts0.data(), pts1.data() };
+    std::vector<const float *> ops = { op0.data(),  op1.data()  };
+    PoseResult pr = estimate_poses(pts, ops, H_, W_, thr_, focal, 100, false, seed_);
+    const Mat4 c2w0 = pr.cam2world[0], c2w1 = pr.cam2world[1];
+
+    ChainLink link{};
+    if (!have_prev_) {
+        T_ = sim_identity();
+        link.sim = sim_identity();
+        link.global = T_;
+        link.scale = 1.0; link.inlier_frac = 1.0; link.valid_frac = 1.0; link.resid_frac = 0.0;
+        add_view(pts0.data(), op0.data(), rgb0.data(), scl0.data(), rot0.data(), T_, next_frame_++);   // f_0
+        add_view(pts1.data(), op1.data(), rgb1.data(), scl1.data(), rot1.data(), T_, next_frame_++);   // f_1
+    } else {
+        // fit Sim(3) from this run's view 0 (the shared frame, run-k coords) to the
+        // previous run's view 1 (same frame, run-(k-1) coords). Mask = valid in both.
+        std::vector<double> XA, XB;
+        for (int i = 0; i < P; i++) if (op0[i] > thr_ && prev_mask1_[i]) {
+            XA.push_back(pts0[3*i]); XA.push_back(pts0[3*i+1]); XA.push_back(pts0[3*i+2]);
+            XB.push_back(prev_pts1_[3*i]); XB.push_back(prev_pts1_[3*i+1]); XB.push_back(prev_pts1_[3*i+2]);
+        }
+        const int n = (int) (XA.size() / 3);
+        double myB[3]; mean3(XB.data(), n, myB);
+        std::vector<double> ycB(3*n);
+        for (int i = 0; i < n; i++) for (int c = 0; c < 3; c++) ycB[3*i+c] = XB[3*i+c] - myB[c];
+        const double scene = rms3(ycB.data(), n);
+
+        std::vector<char> inl;
+        Sim3 S = fit_similarity_ransac(XA.data(), XB.data(), n, rthr_ * scene, riters_, inl, true, seed_);
+        T_ = sim_compose(T_, S);                 // run k -> global
+
+        int n_inl = 0;
+        std::vector<double> rA, rB;
+        for (int i = 0; i < n; i++) if (inl[i]) {
+            n_inl++;
+            Vec3 a = { XA[3*i], XA[3*i+1], XA[3*i+2] };
+            Vec3 sa = sim_apply(S, a);
+            for (int c = 0; c < 3; c++) { rA.push_back(sa[c] - XB[3*i+c]); }
+            (void) rB;
+        }
+        link.sim = S;
+        link.global = T_;
+        link.scale = S.s;
+        link.inlier_frac = (n > 0) ? (double) n_inl / n : 0.0;
+        link.valid_frac  = (double) n / P;
+        link.resid_frac  = (scene > 0 && n_inl > 0) ? rms3(rA.data(), n_inl) / scene : 0.0;
+
+        add_view(pts1.data(), op1.data(), rgb1.data(), scl1.data(), rot1.data(), T_, next_frame_++);   // f_{k+1}
+    }
+
+    cams_.push_back(mat4_mul(sim_matrix(T_), c2w0));     // frame f_k camera (view 0)
+    final_cam_ = mat4_mul(sim_matrix(T_), c2w1);         // latest view-1 camera
+
+    // this run's view 1 becomes the next run's shared frame
+    prev_pts1_.assign(pts1.begin(), pts1.end());
+    prev_mask1_.assign(P, 0);
+    for (int i = 0; i < P; i++) prev_mask1_[i] = (op1[i] > thr_) ? 1 : 0;
+    have_prev_ = true;
+    links_.push_back(link);
+    return link;
+}
+
+double Accumulator::refine(double voxel_frac, int iters, double alpha) {
+    return consensus_refine(cloud_, voxel_frac, iters, alpha);
+}
+
+std::vector<Mat4> Accumulator::camera_path() const {
+    std::vector<Mat4> path = cams_;
+    if (!cams_.empty()) path.push_back(final_cam_);
+    return path;
+}
+
+// ---- hierarchical tree accumulation ---------------------------------------
+
+// One gaussian (gc-channel engine layout) -> AccumPoint at identity (leaf frame).
+static AccumPoint tree_leaf_point(const float * a, int frame) {
+    AccumPoint p;
+    p.x = a[0]; p.y = a[1]; p.z = a[2];
+    p.r = clamp01((float) (a[3] * SH_C0 + 0.5));
+    p.g = clamp01((float) (a[4] * SH_C0 + 0.5));
+    p.b = clamp01((float) (a[5] * SH_C0 + 0.5));
+    p.opacity = a[15];
+    p.sx = a[16]; p.sy = a[17]; p.sz = a[18];
+    std::array<double,4> q = quat_normalize({ a[19], a[20], a[21], a[22] });
+    p.qw = (float) q[0]; p.qx = (float) q[1]; p.qy = (float) q[2]; p.qz = (float) q[3];
+    p.frame = frame;
+    return p;
+}
+
+// Apply a Sim(3) to a whole cloud (position s*(R@x)+t, scale *s, orientation qS⊗q).
+static void tree_apply_sim_cloud(std::vector<AccumPoint> & c, const Sim3 & S) {
+    const std::array<double,4> qS = mat3_to_quat(S.R);
+    for (auto & p : c) {
+        Vec3 w = sim_apply(S, { p.x, p.y, p.z });
+        p.x = (float) w[0]; p.y = (float) w[1]; p.z = (float) w[2];
+        p.sx = (float) (S.s * p.sx); p.sy = (float) (S.s * p.sy); p.sz = (float) (S.s * p.sz);
+        std::array<double,4> q = quat_normalize(quat_mul(qS, { p.qw, p.qx, p.qy, p.qz }));
+        p.qw = (float) q[0]; p.qx = (float) q[1]; p.qy = (float) q[2]; p.qz = (float) q[3];
+    }
+}
+static void tree_apply_sim_pts(std::vector<float> & pts, const Sim3 & S) {
+    for (size_t i = 0; i + 3 <= pts.size(); i += 3) {
+        Vec3 w = sim_apply(S, { pts[i], pts[i+1], pts[i+2] });
+        pts[i] = (float) w[0]; pts[i+1] = (float) w[1]; pts[i+2] = (float) w[2];
+    }
+}
+
+// Robustly register one band of shared-frame correspondences (XA -> XB, both flat
+// 3*n): RANSAC similarity at a scene-relative threshold, plus the inlier residual.
+// The single registration primitive for both tree merges (caller derives the
+// inlier/valid fractions, which it normalises differently per tree).
+struct BandFit { Sim3 S; int n_inl; double scene; double resid; };
+static BandFit fit_band_sim(const std::vector<double> & XA, const std::vector<double> & XB,
+                            double rthr, int riters, uint64_t seed) {
+    const int n = (int) (XA.size() / 3);
+    BandFit f{ sim_identity(), 0, 0.0, 0.0 };
+    if (n < 3) return f;
+    double mB[3]; mean3(XB.data(), n, mB);
+    std::vector<double> yc(3 * n);
+    for (int k = 0; k < n; k++) for (int c = 0; c < 3; c++) yc[3*k+c] = XB[3*k+c] - mB[c];
+    f.scene = rms3(yc.data(), n);
+    std::vector<char> inl;
+    f.S = fit_similarity_ransac(XA.data(), XB.data(), n, rthr * f.scene, riters, inl, true, seed);
+    std::vector<double> rA;
+    for (int k = 0; k < n; k++) if (inl[k]) {
+        f.n_inl++;
+        Vec3 sa = sim_apply(f.S, { XA[3*k], XA[3*k+1], XA[3*k+2] });
+        for (int c = 0; c < 3; c++) rA.push_back(sa[c] - XB[3*k+c]);
+    }
+    f.resid = (f.scene > 0 && f.n_inl > 0) ? rms3(rA.data(), f.n_inl) / f.scene : 0.0;
+    return f;
+}
+
+// ---- multi-frame-overlap tree ---------------------------------------------
+
+// A submap covering a contiguous global frame range, carrying the per-pixel point
+// arrays of every frame it spans (in the submap's local frame) so a neighbour can
+// align on the whole band of frames they share, not just one.
+struct ONode {
+    std::vector<AccumPoint> cloud;
+    int lo, hi;                              // inclusive global frame range
+    std::vector<std::vector<float>> fpts;    // [f-lo] -> P*3 per-pixel points
+    std::vector<std::vector<char>>  fmsk;    // [f-lo] -> P opacity mask
+};
+
+// One pair (frames a, a+1) as a 2-frame submap (identity local frame).
+static ONode opair(const float * g, int a, int P, int gc, double thr) {
+    ONode nd; nd.lo = a; nd.hi = a + 1;
+    nd.fpts.assign(2, {}); nd.fmsk.assign(2, {});
+    nd.fpts[0].resize(3*P); nd.fpts[1].resize(3*P);
+    nd.fmsk[0].assign(P, 0); nd.fmsk[1].assign(P, 0);
+    nd.cloud.reserve(2*P);
+    for (int i = 0; i < P; i++) {
+        const float * a0 = g + (size_t) i * gc;
+        const float * a1 = g + (size_t) (P + i) * gc;
+        for (int c = 0; c < 3; c++) { nd.fpts[0][3*i+c] = a0[c]; nd.fpts[1][3*i+c] = a1[c]; }
+        nd.fmsk[0][i] = (a0[15] > thr) ? 1 : 0;
+        nd.fmsk[1][i] = (a1[15] > thr) ? 1 : 0;
+        if (a0[15] > thr) nd.cloud.push_back(tree_leaf_point(a0, a));
+        if (a1[15] > thr) nd.cloud.push_back(tree_leaf_point(a1, a + 1));
+    }
+    return nd;
+}
+
+// Merge R into L's frame, fitting one Sim(3) over EVERY frame they share. Returns
+// the combined submap; optionally records the merge diagnostics.
+static ONode omerge(ONode & L, ONode & R, int P, double rthr, int riters, uint64_t seed,
+                    std::vector<TreeMerge> * merges, int level) {
+    const int so = std::max(L.lo, R.lo), sh = std::min(L.hi, R.hi);   // shared frame range
+    std::vector<double> XA, XB;
+    for (int f = so; f <= sh; f++) {
+        const std::vector<float> & ra = R.fpts[f - R.lo]; const std::vector<char> & rm = R.fmsk[f - R.lo];
+        const std::vector<float> & la = L.fpts[f - L.lo]; const std::vector<char> & lm = L.fmsk[f - L.lo];
+        for (int i = 0; i < P; i++) if (rm[i] && lm[i]) {
+            XA.push_back(ra[3*i]); XA.push_back(ra[3*i+1]); XA.push_back(ra[3*i+2]);
+            XB.push_back(la[3*i]); XB.push_back(la[3*i+1]); XB.push_back(la[3*i+2]);
+        }
+    }
+    const int n = (int) (XA.size() / 3);
+    const int shared = sh - so + 1;
+    BandFit fit = fit_band_sim(XA, XB, rthr, riters, seed);
+    const Sim3 & S = fit.S;
+    if (merges) merges->push_back(TreeMerge{ level, std::min(L.lo,R.lo), std::max(L.hi,R.hi), shared,
+        S.s, (n > 0) ? (double) fit.n_inl / n : 0.0, (double) n / ((double) shared * P), fit.resid });
+
+    tree_apply_sim_cloud(R.cloud, S);
+    for (auto & a : R.fpts) tree_apply_sim_pts(a, S);
+    ONode m; m.lo = std::min(L.lo, R.lo); m.hi = std::max(L.hi, R.hi);
+    const int F = m.hi - m.lo + 1;
+    m.fpts.assign(F, {}); m.fmsk.assign(F, {});
+    for (int f = L.lo; f <= L.hi; f++) { m.fpts[f-m.lo] = std::move(L.fpts[f-L.lo]); m.fmsk[f-m.lo] = std::move(L.fmsk[f-L.lo]); }
+    for (int f = R.lo; f <= R.hi; f++) if (m.fpts[f-m.lo].empty()) { m.fpts[f-m.lo] = std::move(R.fpts[f-R.lo]); m.fmsk[f-m.lo] = std::move(R.fmsk[f-R.lo]); }
+    m.cloud = std::move(L.cloud);
+    m.cloud.insert(m.cloud.end(), R.cloud.begin(), R.cloud.end());
+    return m;
+}
+
+std::vector<AccumPoint> tree_accumulate_overlap(const std::vector<const float *> & pairs,
+                                                int H, int W, int gc, double thr,
+                                                int block, int overlap,
+                                                double rthr, int riters, uint64_t seed,
+                                                std::vector<TreeMerge> * merges,
+                                                int max_levels, double layout_spacing,
+                                                int * n_nodes_out, int per_node_cap) {
+    const int P = H * W;
+    const int M = (int) pairs.size();
+    if (n_nodes_out) *n_nodes_out = 0;
+    if (M < 1) return {};
+    const int nframes = M + 1;
+    overlap = std::max(1, overlap);
+    block = std::max(block, overlap + 1);
+    block = std::min(block, nframes);
+    overlap = std::min(overlap, block - 1);
+    const int step = std::max(1, block - overlap);
+
+    // overlapping base submaps: each chains `block-1` consecutive pairs (single-frame
+    // internal links), adjacent submaps share `overlap` frames.
+    std::vector<ONode> nodes;
+    for (int s = 0; s < M; s += step) {
+        const int w = std::min(block, nframes - s);          // frames in this submap
+        if (w < 2) break;
+        ONode nd = opair(pairs[s], s, P, gc, thr);           // frames s, s+1
+        for (int p = s + 1; p <= s + w - 2; p++) {           // fold the rest of the block
+            ONode nx = opair(pairs[p], p, P, gc, thr);
+            nd = omerge(nd, nx, P, rthr, riters, seed, nullptr, -1);
+        }
+        nodes.push_back(std::move(nd));
+        if (s + block >= nframes) break;                     // last submap reached the end
+    }
+
+    // median base-submap extent — the auto layout pitch for side-by-side stages.
+    // Only the auto-layout (layout_spacing<0, staged demo) path reads it, so the
+    // full-merge callers skip these two passes over every submap.
+    double leaf_extent = 1.0;
+    if (layout_spacing < 0.0) {
+        std::vector<double> ex;
+        for (auto & nd : nodes) {
+            double m[3] = {0,0,0}; int64_t nf = 0;
+            for (auto & p : nd.cloud) if (std::isfinite(p.x) && std::isfinite(p.y) && std::isfinite(p.z)) {
+                m[0]+=p.x; m[1]+=p.y; m[2]+=p.z; nf++; }
+            if (nf == 0) continue;
+            for (int c = 0; c < 3; c++) m[c] /= nf;
+            double ss = 0;
+            for (auto & p : nd.cloud) if (std::isfinite(p.x) && std::isfinite(p.y) && std::isfinite(p.z)) {
+                double dx=p.x-m[0], dy=p.y-m[1], dz=p.z-m[2]; ss += dx*dx+dy*dy+dz*dz; }
+            ex.push_back(std::sqrt(ss / nf));
+        }
+        if (!ex.empty()) { std::sort(ex.begin(), ex.end()); leaf_extent = ex[ex.size()/2]; }
+    }
+
+    // balanced tree-merge of submaps; each merge fits over the overlap band. Stop
+    // after max_levels rounds (-1 = fully to the root) for the staged demo.
+    int level = 0;
+    while (nodes.size() > 1 && (max_levels < 0 || level < max_levels)) {
+        std::vector<ONode> nxt;
+        nxt.reserve(nodes.size()/2 + 1);
+        for (size_t i = 0; i + 1 < nodes.size(); i += 2)
+            nxt.push_back(omerge(nodes[i], nodes[i+1], P, rthr, riters, seed, merges, level));
+        if (nodes.size() % 2 == 1) nxt.push_back(std::move(nodes.back()));
+        nodes = std::move(nxt); level++;
+    }
+    if (n_nodes_out) *n_nodes_out = (int) nodes.size();
+    if (nodes.empty()) return {};
+
+    // concatenate remaining nodes, laid out side by side (layout_spacing != 0) and
+    // capped per node (opacity*radius) — same as tree_accumulate.
+    const double sigma = (layout_spacing < 0.0) ? 4.0 * leaf_extent : layout_spacing;
+    const double gcentre = 0.5 * M;
+    auto importance = [](const AccumPoint & p) -> double {
+        const double vol = (double) std::max(p.sx,1e-9f) * std::max(p.sy,1e-9f) * std::max(p.sz,1e-9f);
+        return (double) std::max(p.opacity, 0.0f) * std::cbrt(vol);
+    };
+    std::vector<AccumPoint> out;
+    size_t total = 0;
+    for (auto & nd : nodes)
+        total += (per_node_cap > 0) ? std::min((size_t) per_node_cap, nd.cloud.size()) : nd.cloud.size();
+    out.reserve(total);
+    for (auto & nd : nodes) {
+        const double dx = (sigma != 0.0) ? (0.5 * (nd.lo + nd.hi) - gcentre) * sigma : 0.0;
+        if (per_node_cap > 0 && (int) nd.cloud.size() > per_node_cap) {  // keep each scene's own detail
+            std::vector<size_t> idx(nd.cloud.size());
+            for (size_t i = 0; i < idx.size(); i++) idx[i] = i;
+            std::partial_sort(idx.begin(), idx.begin() + per_node_cap, idx.end(),
+                [&](size_t a, size_t b){ return importance(nd.cloud[a]) > importance(nd.cloud[b]); });
+            for (int i = 0; i < per_node_cap; i++) { AccumPoint p = nd.cloud[idx[i]]; p.x = (float) (p.x + dx); out.push_back(p); }
+        } else {
+            for (const auto & src : nd.cloud) { AccumPoint p = src; p.x = (float) (p.x + dx); out.push_back(p); }
+        }
+    }
+    return out;
+}
+
+// ---- consensus fusion -----------------------------------------------------
+
+FuseStats consensus_fuse(const std::vector<AccumPoint> & cloud, double voxel_frac, int k,
+                         std::vector<AccumPoint> & fused, int mode) {
+    fused.clear();
+    FuseStats st{};
+    st.raw_points = (int64_t) cloud.size();
+    if (cloud.empty()) return st;
+
+    // A non-finite point (NaN/Inf — possible since the cloud is engine output and
+    // this is a public boundary) would poison the extent and, worse, make the
+    // float->int voxel-coord cast UB. Skip them throughout; fuse only finite points.
+    auto finite = [](const AccumPoint & p) {
+        return std::isfinite(p.x) && std::isfinite(p.y) && std::isfinite(p.z);
+    };
+    // extent = rms distance to centroid; voxel = voxel_frac * extent; coords from min.
+    double mean[3] = {0,0,0}, lo[3] = {1e300,1e300,1e300};
+    int64_t nf = 0;
+    for (const auto & p : cloud) {
+        if (!finite(p)) continue;
+        mean[0]+=p.x; mean[1]+=p.y; mean[2]+=p.z;
+        lo[0]=std::min(lo[0],(double)p.x); lo[1]=std::min(lo[1],(double)p.y); lo[2]=std::min(lo[2],(double)p.z);
+        nf++;
+    }
+    if (nf == 0) return st;
+    for (int c = 0; c < 3; c++) mean[c] /= nf;
+    double ss = 0;
+    for (const auto & p : cloud) {
+        if (!finite(p)) continue;
+        const double dx=p.x-mean[0], dy=p.y-mean[1], dz=p.z-mean[2];
+        ss += dx*dx + dy*dy + dz*dz;
+    }
+    const double ext = std::sqrt(ss / nf);
+    const double v = voxel_frac * ext;
+    if (!(v > 0)) return st;
+
+    struct Vox { double sx=0,sy=0,sz=0,sr=0,sg=0,sb=0,sop=0; double ssx=0,ssy=0,ssz=0; int64_t cnt=0;
+                 float q[4]={1,0,0,0}; bool has_q=false;
+                 std::vector<std::pair<int32_t,double>> frames;   // (frame, opacity weight)
+                 int best_frame() const {
+                     int32_t b=-1; double bw=-1;
+                     for (auto & fr : frames) if (fr.second > bw) { bw=fr.second; b=fr.first; }
+                     return b; } };
+    struct Key { int32_t i,j,k; bool operator==(const Key&o) const { return i==o.i&&j==o.j&&k==o.k; } };
+    struct KeyHash { size_t operator()(const Key&q) const {
+        uint64_t h = (uint64_t)(uint32_t)q.i * 0x9E3779B97F4A7C15ULL;
+        h ^= (uint64_t)(uint32_t)q.j + 0x9E3779B9U + (h<<6) + (h>>2);
+        h ^= (uint64_t)(uint32_t)q.k + 0x85EBCA6BU + (h<<6) + (h>>2);
+        return (size_t) h; } };
+    std::unordered_map<Key, Vox, KeyHash> grid;
+    grid.reserve(cloud.size() / 2 + 1);
+
+    auto vcoord = [&](double x, double lov) -> int32_t {
+        const double c = std::floor((x - lov) / v);     // finite (x finite, v>0)
+        // clamp to int32 so an extreme-but-finite coordinate can't overflow the cast
+        if (c < -2147483640.0) return -2147483640;
+        if (c >  2147483640.0) return  2147483640;
+        return (int32_t) c;
+    };
+    for (const auto & p : cloud) {
+        if (!finite(p)) continue;
+        Key key{ vcoord(p.x, lo[0]), vcoord(p.y, lo[1]), vcoord(p.z, lo[2]) };
+        Vox & vx = grid[key];
+        vx.sx+=p.x; vx.sy+=p.y; vx.sz+=p.z; vx.sr+=p.r; vx.sg+=p.g; vx.sb+=p.b; vx.sop+=p.opacity; vx.cnt++;
+        vx.ssx+=p.sx; vx.ssy+=p.sy; vx.ssz+=p.sz;
+        if (!vx.has_q) { vx.q[0]=p.qw; vx.q[1]=p.qx; vx.q[2]=p.qy; vx.q[3]=p.qz; vx.has_q=true; } // representative orientation
+        const double w = std::max((double) p.opacity, 1e-3);
+        bool seen = false;
+        for (auto & fr : vx.frames) if (fr.first == p.frame) { fr.second += w; seen = true; break; }
+        if (!seen) vx.frames.push_back({ p.frame, w });
+    }
+
+    st.voxels = (int64_t) grid.size();
+    for (const auto & kv : grid) {
+        if ((int) kv.second.frames.size() < k) continue;
+        st.kept_voxels++;
+        st.points_kept += kv.second.cnt;
+    }
+    st.points_dropped = st.raw_points - st.points_kept;
+
+    if (mode == FUSE_KEPT) {
+        // DENSE: keep every raw gaussian whose voxel is corroborated by >= k frames
+        // (floaters dropped, nothing averaged/decimated away).
+        fused.reserve((size_t) st.points_kept);
+        for (const auto & p : cloud) {
+            if (!finite(p)) continue;
+            Key key{ vcoord(p.x, lo[0]), vcoord(p.y, lo[1]), vcoord(p.z, lo[2]) };
+            auto it = grid.find(key);
+            if (it != grid.end() && (int) it->second.frames.size() >= k) fused.push_back(p);
+        }
+    } else if (mode == FUSE_BEST) {
+        // DENSE + DE-GHOSTED: in each consensus voxel keep only the single most-
+        // confident frame's gaussians, so overlapping (disagreeing) copies aren't
+        // stacked — one frame represents each region.
+        for (const auto & p : cloud) {
+            if (!finite(p)) continue;
+            Key key{ vcoord(p.x, lo[0]), vcoord(p.y, lo[1]), vcoord(p.z, lo[2]) };
+            auto it = grid.find(key);
+            if (it != grid.end() && (int) it->second.frames.size() >= k && p.frame == it->second.best_frame())
+                fused.push_back(p);
+        }
+    } else {
+        // DENOISED: one averaged gaussian per consensus voxel.
+        for (const auto & kv : grid) {
+            const Vox & vx = kv.second;
+            if ((int) vx.frames.size() < k) continue;
+            AccumPoint p;
+            p.x = (float) (vx.sx / vx.cnt); p.y = (float) (vx.sy / vx.cnt); p.z = (float) (vx.sz / vx.cnt);
+            p.r = (float) (vx.sr / vx.cnt); p.g = (float) (vx.sg / vx.cnt); p.b = (float) (vx.sb / vx.cnt);
+            p.opacity = (float) (vx.sop / vx.cnt);
+            p.sx = (float) (vx.ssx / vx.cnt); p.sy = (float) (vx.ssy / vx.cnt); p.sz = (float) (vx.ssz / vx.cnt);
+            p.qw = vx.q[0]; p.qx = vx.q[1]; p.qy = vx.q[2]; p.qz = vx.q[3];
+            p.frame = (int32_t) vx.frames.size();   // support count (informational)
+            fused.push_back(p);
+        }
+    }
+    return st;
+}
+
+// ---- gaussian-level geometric de-ghost -------------------------------------
+
+double consensus_refine(std::vector<AccumPoint> & cloud, double voxel_frac, int iters, double alpha) {
+    if (cloud.size() < 2 || iters < 1) return 0.0;
+    auto finite = [](const AccumPoint & p) {
+        return std::isfinite(p.x) && std::isfinite(p.y) && std::isfinite(p.z);
+    };
+    struct Key { int32_t i,j,k; bool operator==(const Key&o) const { return i==o.i&&j==o.j&&k==o.k; } };
+    struct KeyHash { size_t operator()(const Key&q) const {
+        uint64_t h = (uint64_t)(uint32_t)q.i * 0x9E3779B97F4A7C15ULL;
+        h ^= (uint64_t)(uint32_t)q.j + 0x9E3779B9U + (h<<6) + (h>>2);
+        h ^= (uint64_t)(uint32_t)q.k + 0x85EBCA6BU + (h<<6) + (h>>2);
+        return (size_t) h; } };
+    struct FC { int32_t f; double sx,sy,sz,w; };   // per-frame opacity-weighted centroid
+
+    double disagree = 0.0;
+    for (int it = 0; it < iters; it++) {
+        double mean[3]={0,0,0}, lo[3]={1e300,1e300,1e300}; int64_t nf=0;
+        for (const auto & p : cloud) { if (!finite(p)) continue;
+            mean[0]+=p.x; mean[1]+=p.y; mean[2]+=p.z;
+            lo[0]=std::min(lo[0],(double)p.x); lo[1]=std::min(lo[1],(double)p.y); lo[2]=std::min(lo[2],(double)p.z); nf++; }
+        if (nf == 0) return disagree;
+        for (int c=0;c<3;c++) mean[c]/=nf;
+        double ss=0; for (const auto & p : cloud) { if (!finite(p)) continue;
+            const double dx=p.x-mean[0], dy=p.y-mean[1], dz=p.z-mean[2]; ss+=dx*dx+dy*dy+dz*dz; }
+        const double ext = std::sqrt(ss / nf);
+        const double frac = (iters > 1)
+            ? voxel_frac * std::pow(2.0, 1.0 - (double) it / (iters - 1))   // 2*vf -> vf (coarse->fine)
+            : voxel_frac;
+        const double v = frac * ext;
+        if (!(v > 0)) return disagree;
+        auto vc = [&](double x, double l) -> int32_t {
+            const double c = std::floor((x - l) / v);
+            if (c < -2.1e9) return -2100000000;
+            if (c >  2.1e9) return  2100000000;
+            return (int32_t) c; };
+
+        // pass 1: per-voxel, per-frame opacity-weighted centroid (a snapshot)
+        std::unordered_map<Key, std::vector<FC>, KeyHash> grid;
+        grid.reserve(cloud.size()/2 + 1);
+        for (const auto & p : cloud) { if (!finite(p)) continue;
+            Key key{ vc(p.x,lo[0]), vc(p.y,lo[1]), vc(p.z,lo[2]) };
+            const double w = std::max((double)p.opacity, 1e-3);
+            auto & vec = grid[key]; bool hit=false;
+            for (auto & fc : vec) if (fc.f==p.frame) { fc.sx+=w*p.x; fc.sy+=w*p.y; fc.sz+=w*p.z; fc.w+=w; hit=true; break; }
+            if (!hit) vec.push_back({ p.frame, w*(double)p.x, w*(double)p.y, w*(double)p.z, w });
+        }
+        // pass 2: move each point toward the OTHER frames' consensus in its voxel
+        double dsum=0; int64_t dcnt=0;
+        for (auto & p : cloud) { if (!finite(p)) continue;
+            Key key{ vc(p.x,lo[0]), vc(p.y,lo[1]), vc(p.z,lo[2]) };
+            auto git = grid.find(key);
+            if (git == grid.end() || git->second.size() < 2) continue;
+            double tx=0,ty=0,tz=0,tw=0;
+            for (const auto & fc : git->second) { if (fc.f==p.frame) continue; tx+=fc.sx; ty+=fc.sy; tz+=fc.sz; tw+=fc.w; }
+            if (tw <= 0) continue;
+            tx/=tw; ty/=tw; tz/=tw;
+            const double dx=tx-p.x, dy=ty-p.y, dz=tz-p.z;
+            dsum += dx*dx+dy*dy+dz*dz; dcnt++;
+            p.x += (float)(alpha*dx); p.y += (float)(alpha*dy); p.z += (float)(alpha*dz);
+        }
+        disagree = (dcnt && ext > 0) ? std::sqrt(dsum/dcnt)/ext : 0.0;
+    }
+    return disagree;
+}
+
+// ---- loop closure ---------------------------------------------------------
+
+Mat4 sim4_invert(const Mat4 & M) {
+    Mat3 A{}; Vec3 t{};
+    for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) A(i,j) = M(i,j); t[i] = M(i,3); }
+    const double s = std::cbrt(det3(A));
+    Mat3 R = A;
+    for (int i = 0; i < 9; i++) R.a[i] /= s;
+    Sim3 inv = sim_invert(Sim3{ s, R, t });
+    return sim_matrix(inv);
+}
+
+std::vector<Vec3> distribute_drift(const Mat4 & D, const std::vector<Mat4> & global_poses) {
+    const int m = (int) global_poses.size();
+    std::vector<Vec3> out(m);
+    const int n = std::max(m - 1, 1);
+    for (int k = 0; k < m; k++) {
+        Mat4 Dk = sim_frac_power(D, (double) k / n);
+        Vec3 c = { global_poses[k](0,3), global_poses[k](1,3), global_poses[k](2,3) };
+        Vec3 r = mat3_apply(Mat3{ { Dk(0,0),Dk(0,1),Dk(0,2), Dk(1,0),Dk(1,1),Dk(1,2), Dk(2,0),Dk(2,1),Dk(2,2) } }, c);
+        out[k] = { r[0] + Dk(0,3), r[1] + Dk(1,3), r[2] + Dk(2,3) };
+    }
+    return out;
+}
+
+} // namespace pose
+} // namespace free_splatter
diff --git a/src/pose.h b/src/pose.h
new file mode 100644
index 0000000..dd0e50a
--- /dev/null
+++ b/src/pose.h
@@ -0,0 +1,288 @@
+// pose.h — downstream camera-pose recovery + cross-run alignment (C++ port).
+//
+// This is the C++ port of the (now removed; see git history) proven `pose/`
+// research prototype: given the engine's per-pixel gaussians it recovers each
+// view's camera (PnP), and aligns successive runs into one accumulating world
+// (Sim(3)). It is dependency-free — only the self-contained linalg.h — so it
+// ships from the CLI / C API with no Python and no Eigen/OpenCV (see CLAUDE.md).
+//
+// Faithful to FreeSplatter's scene recipe (estimate_poses -> DUSt3R fast_pnp):
+// the prototype's numpy reference solver was verified ~1e-7 against cv2 and
+// bit-exact to upstream estimate_poses on real engine output.
+//
+// Conventions: f64 throughout; a similarity acts as x -> s*(R@x)+t; gaussian
+// channel layout (scene, 23ch) is xyz[0:3] SH[3:15] opacity[15] scale[16:19]
+// rotation[19:23], opacity already activated (sigmoid) in [0,1].
+#ifndef FREE_SPLATTER_POSE_H
+#define FREE_SPLATTER_POSE_H
+
+#include "linalg.h"
+
+#include <string>
+#include <vector>
+
+namespace free_splatter {
+namespace pose {
+
+using fsla::Vec3;
+using fsla::Mat3;
+using fsla::Mat4;
+
+// ---- focal (Weiszfeld shared-focal, mirrors pose/focal.py) -----------------
+// Robust (L1) focal from camera-frame points and their pixels, principal point
+// pp. N points, pts3d is N*3 (x,y,z), pixels is N*2 (col,row), all row-major.
+double estimate_focal(const double * pts3d, const double * pixels, int N,
+                      double ppx, double ppy, int weiszfeld_iters = 10);
+
+// ---- similarity alignment (mirrors pose/align.py) --------------------------
+struct Sim3 { double s; Mat3 R; Vec3 t; };           // x -> s*(R@x)+t
+
+Sim3 sim_identity();
+Vec3 sim_apply(const Sim3 & T, const Vec3 & x);
+Sim3 sim_compose(const Sim3 & T2, const Sim3 & T1);  // apply T1 then T2
+Sim3 sim_invert(const Sim3 & T);
+Mat4 sim_matrix(const Sim3 & T);                     // 4x4 [[sR,t],[0,1]]
+
+// Umeyama closed-form (s,R,t) minimizing ||sRX+t - Y||^2. with_scale=false -> rigid.
+// X, Y are N*3 row-major.
+Sim3 fit_similarity(const double * X, const double * Y, int N, bool with_scale = true);
+
+// RANSAC similarity robust to gross outliers; fills `inliers` (size N, 0/1).
+Sim3 fit_similarity_ransac(const double * X, const double * Y, int N, double thresh,
+                           int iters, std::vector<char> & inliers,
+                           bool with_scale = true, uint64_t seed = 0);
+
+// Residual ladder (rigid -> similarity -> affine) + depth correlation, and the
+// verdict that classifies the A<->B mismatch. Mirrors align.diagnose.
+struct Ladder {
+    double scene, rigid, similarity, affine, scale, depth_corr, aff_gain;
+    bool structured;
+    std::string verdict;   // rigid_ok | needs_scale | needs_affine | nonlinear | similarity_plus_noise
+};
+Ladder diagnose(const double * X, const double * Y, int N,
+                double tol = 1e-3, double corr_tol = 0.3);
+
+// Loop-closure metrics for a closed chain of similarities (deviation from I).
+struct LoopError { double scale_err, rot_deg, trans; };
+LoopError loop_closure_error(const std::vector<Sim3> & links);
+
+// Fractional power M^f of a Sim(3) 4x4 (even loop-closure relaxation). Real part
+// of the eigendecomposition; valid while rotation < 180deg. Mirrors sim_frac_power.
+Mat4 sim_frac_power(const Mat4 & M, double f);
+
+// ---- PnP -------------------------------------------------------------------
+// RANSAC DLT PnP with known intrinsics K (the prototype's numpy backend).
+// Xw is N*3 world points, pixels is N*2 (col,row). Returns cam2world (4x4) and
+// fills `inliers` (size N, 0/1). Kept as the asset-free golden-test reference; on
+// real scenes the DLT inherits the planar/mirror degeneracy — use solve_pnp.
+Mat4 solve_pnp_numpy(const double * Xw, const double * pixels, int N, const Mat3 & K,
+                     std::vector<char> & inliers,
+                     double thresh_px = 5.0, int iters = 100, uint64_t seed = 0);
+
+// Robust PnP: EPnP (planar-robust, deterministic — uses all points, no random
+// minimal samples) for the initial pose, then a Gauss-Newton / Huber reprojection
+// refine. Reaches cv2/SQPNP-grade poses on real scenes with no OpenCV dependency.
+// Fills `inliers` (reprojection < thresh_px). This is what estimate_poses uses.
+Mat4 solve_pnp(const double * Xw, const double * pixels, int N, const Mat3 & K,
+               std::vector<char> & inliers, double thresh_px = 5.0, int gn_iters = 10);
+
+// EPnP-only (no refine) — exposed for testing the planar-robust initialization.
+Mat4 solve_pnp_epnp(const double * Xw, const double * pixels, int N, const Mat3 & K);
+
+// Integer pixel grid (col,row), row-major over H*W — matches upstream xy_grid
+// (no half-pixel offset). Fills `out` with 2*H*W doubles.
+void pixel_grid(int H, int W, std::vector<double> & out);
+
+// ---- estimate_poses orchestration (scene recipe, mirrors pnp.estimate_poses) -
+struct PoseResult {
+    std::vector<Mat4> cam2world;
+    double focal;
+    double scale;
+};
+// points: N views, each H*W*3 (gaussian centers, view-0 frame). opacities: N
+// views, each H*W (activated; pass nullptr entries for no mask). focal<=0 ->
+// estimate from view 0 over all pixels (use_first_focal). normalize -> the
+// runner's 1/baseline camera rescale.
+PoseResult estimate_poses(const std::vector<const float *> & points,
+                          const std::vector<const float *> & opacities,
+                          int H, int W,
+                          double opacity_threshold = 0.05,
+                          double focal = -1.0, int pnp_iter = 100,
+                          bool normalize = false, uint64_t seed = 0);
+
+// ---- after-inference parallax (depth conditioning of a recovered pair) -------
+// Quantifies how well a 2-view pair constrains depth, from the model's OWN
+// recovered geometry. All angle fields are scale-invariant; B/Z is consistent
+// because cameras and points share the (un-normalized) view-0 frame.
+struct Parallax {
+    double tri_angle_deg;       // median triangulation angle over confident points
+    double lateral_angle_deg;   // baseline angle off view-0 optical axis (0=dolly, 90=strafe)
+    double baseline_over_depth; // ||C1-C0|| / median point depth from camera 0
+    double baseline;
+    double median_depth;
+    double focal;               // estimated (or supplied) focal, px
+    int    n_points;            // confident points used for the medians
+};
+
+// Pure geometry core (no PnP) — directly testable against hand-computed angles.
+// pts: 3*N view-0-frame points; mask: per-point weight (>thr kept), or nullptr.
+// axis0 need not be unit; it is normalized internally.
+Parallax parallax_stats(const Vec3 & C0, const Vec3 & C1, const Vec3 & axis0,
+                        const float * pts, const float * mask, int N, double mask_thr);
+
+// Full path: recover the pair's cameras (normalize=false), then parallax_stats.
+// points/opacities: 2 views (view-0-frame xyz + activated opacity), as estimate_poses.
+Parallax pair_parallax(const std::vector<const float *> & points,
+                       const std::vector<const float *> & opacities,
+                       int H, int W, double opacity_threshold = 0.05, double focal = -1.0);
+
+// ---- accumulation: chain successive runs into one world (mirrors accumulate.py)
+// One accumulated 3D gaussian in the global frame, tagged with the source frame
+// it came from (consensus fusion needs the frame id). Carries the full anisotropic
+// shape (scale + rotation quaternion w,x,y,z), transformed through the cross-run
+// Sim(3), so the cloud renders as oriented splats, not isotropic points. rgb [0,1].
+struct AccumPoint {
+    float x, y, z;
+    float r, g, b, opacity;  // color [0,1] and activated opacity [0,1] (the splat alpha)
+    float sx, sy, sz;        // anisotropic gaussian scale (global frame)
+    float qw, qx, qy, qz;    // gaussian rotation quaternion (w,x,y,z), global frame
+    int32_t frame;
+};
+
+// Diagnostics for chaining one run into the global frame.
+struct ChainLink {
+    Sim3   sim;          // run k -> run k-1 (local similarity); identity for run 0
+    Sim3   global;       // run k -> global frame (T_k = T_{k-1} . sim)
+    double scale;        // sim.s — per-link scale drift
+    double inlier_frac;  // RANSAC inliers / shared-frame valid points
+    double valid_frac;   // pixels valid (opacity) in BOTH shared views
+    double resid_frac;   // inlier RANSAC residual / target scene extent
+};
+
+// Sliding-window accumulator: feed it each consecutive pair's engine output
+// ([2,H,W,gc], where view 0 shares a frame with the previous pair's view 1) and
+// it recovers each pair's cameras (estimate_poses), fits the cross-run Sim(3) on
+// the shared frame, composes a global chain, and drops every new frame's gaussians
+// into one world. This is the live accumulating-reconstruction loop, in C++.
+class Accumulator {
+public:
+    Accumulator(int H, int W, double opacity_threshold = 0.05,
+                double ransac_thresh_frac = 0.02, int ransac_iters = 300,
+                uint64_t seed = 0);
+
+    // gaussians: 2*H*W*gc floats (two views, engine channel layout). Recovers the
+    // pair's cameras, chains the Sim(3), accumulates the new frame(s). focal<=0 ->
+    // estimate per pair (use_first_focal). Returns the chain link just added.
+    ChainLink add_pair(const float * gaussians, int gaussian_channels, double focal = -1.0);
+
+    const std::vector<AccumPoint> & cloud() const { return cloud_; }
+    const std::vector<ChainLink>  & links() const { return links_; }
+    // count+1 global cam2world (similarity) matrices: frame f_k = T_k . c2w_k[view0],
+    // plus the final frame f_n = T_{n-1} . c2w_{n-1}[view1]. Mirrors accumulate.py's `rec`.
+    std::vector<Mat4> camera_path() const;
+    int  frame_count() const { return next_frame_; }
+    int  pair_count()  const { return (int) links_.size(); }
+    const Sim3 & global_transform() const { return T_; }
+    // Geometric de-ghost the accumulated cloud in place (consensus_refine).
+    double refine(double voxel_frac = 0.03, int iters = 8, double alpha = 0.5);
+
+private:
+    void add_view(const float * pts, const float * op, const float * rgb,
+                  const float * scl, const float * rot,    // P*3 scale, P*4 quat (w,x,y,z)
+                  const Sim3 & T, int frame);
+
+    int H_, W_;
+    double thr_, rthr_;
+    int riters_;
+    uint64_t seed_;
+    Sim3 T_;
+    bool have_prev_ = false;
+    std::vector<float> prev_pts1_;   // 3*P previous view-1 points (the shared frame)
+    std::vector<char>  prev_mask1_;  // P   previous view-1 opacity mask
+    std::vector<AccumPoint> cloud_;
+    std::vector<Mat4>       cams_;       // per-pair view-0 global cam2world
+    Mat4                    final_cam_;  // latest view-1 global cam2world (appended last)
+    std::vector<ChainLink>  links_;
+    int next_frame_ = 0;
+};
+
+// ---- hierarchical tree accumulation (balanced binary merge) ----------------
+// Linear chaining composes N Sim(3)s in series, so registration drift compounds
+// over N hops and piles onto the last frame (the far end smears). This instead
+// builds short submaps and merges them pairwise up a balanced binary tree — so
+// drift compounds over only ~log2(N) hops and is spread evenly rather than dumped
+// on the end.
+struct TreeMerge {                 // diagnostics per merge (defs as ChainLink)
+    int level, lo_frame, hi_frame, shared_frame;
+    double scale, inlier_frac, valid_frac, resid_frac;
+};
+
+// Multi-frame-overlap tree. pairs[k]: one engine output [2*H*W*gc] (pair k = frames
+// k,k+1; pair k's view-1 and pair k+1's view-0 are the same physical frame).
+// Adjacent submaps of `block` frames overlap by `overlap` frames and each merge fits
+// its Sim(3) over ALL shared frames at once — the fit is over-determined, so the
+// per-frame depth-inconsistency noise (the reason a single-frame alignment is only
+// ~13% rigid-consistent) averages out and the compounded drift drops. Base submaps
+// are short windows of `block` frames (block > overlap), chained internally by
+// single-frame links, then merged in a balanced tree on their overlap bands. The
+// degenerate block=2,overlap=1 is the plain overlap-by-one (single shared boundary)
+// tree. Returns the merged root cloud in the first submap's frame; `merges`
+// (optional) gets one entry per merge.
+//
+// For the staged side-by-side demo: max_levels caps the merge rounds (-1 = full):
+// 0 returns the independent base submaps, 1 the first merge round, ... . The
+// remaining (>1) nodes are shifted apart along X by their frame-range midpoint when
+// layout_spacing != 0 (0 = none, <0 = auto 4x median submap extent, >0 = that
+// spacing), each capped to its top per_node_cap points by opacity*radius (0 =
+// uncapped) so every laid-out scene renders at its own detail. n_nodes_out
+// (optional) reports how many nodes remain (1 once fully merged).
+std::vector<AccumPoint> tree_accumulate_overlap(const std::vector<const float *> & pairs,
+                                                int H, int W, int gaussian_channels,
+                                                double opacity_threshold = 0.05,
+                                                int block = 4, int overlap = 2,
+                                                double ransac_thresh_frac = 0.02,
+                                                int ransac_iters = 300, uint64_t seed = 0,
+                                                std::vector<TreeMerge> * merges = nullptr,
+                                                int max_levels = -1, double layout_spacing = 0.0,
+                                                int * n_nodes_out = nullptr, int per_node_cap = 0);
+
+// ---- consensus fusion (mirrors fuse.py) -----------------------------------
+struct FuseStats {
+    int64_t raw_points, voxels, kept_voxels, points_kept, points_dropped;
+};
+// Voxelize the cloud at voxel_frac * cloud-extent and keep only voxels corroborated
+// by >= k DISTINCT source frames (single-frame floaters dropped). `mode` selects how
+// a consensus voxel is emitted:
+//   FUSE_AVERAGED (0): one averaged gaussian per voxel — denoised but decimated
+//                      (sparse unless many frames overlap).
+//   FUSE_KEPT (1):     every raw gaussian in the voxel — dense, nothing averaged.
+//   FUSE_BEST (2):     only the single most-confident frame's gaussians per voxel —
+//                      dense AND de-ghosted (no overlapping copies stacked).
+enum FuseMode { FUSE_AVERAGED = 0, FUSE_KEPT = 1, FUSE_BEST = 2 };
+FuseStats consensus_fuse(const std::vector<AccumPoint> & cloud, double voxel_frac, int k,
+                         std::vector<AccumPoint> & fused, int mode = FUSE_AVERAGED);
+
+// ---- gaussian-level geometric de-ghost (multi-view consensus refinement) ----
+// Pairwise Sim(3) chaining leaves per-OBJECT (non-rigid) misregistration that a
+// per-frame pose fit can't remove, so overlapping frames show doubled objects.
+// This refines the cloud IN PLACE at the gaussian level: each iteration, for every
+// point, moves it a fraction `alpha` toward the opacity-weighted consensus of the
+// OTHER frames' points in its (coarse-to-fine) voxel neighbourhood. Non-rigid and
+// local, so spatially-varying ghosting collapses to one surface; single-frame
+// regions (no other frame nearby) are left untouched. Returns the RMS point-to-
+// consensus distance after refinement, as a fraction of cloud extent.
+double consensus_refine(std::vector<AccumPoint> & cloud, double voxel_frac = 0.03,
+                        int iters = 8, double alpha = 0.5);
+
+// ---- loop closure (mirrors loop_closure.py) -------------------------------
+// Inverse of a similarity 4x4 [[sR,t],[0,1]] (sR = s*rotation): [[(1/s)Rᵀ,...],[0,1]].
+Mat4 sim4_invert(const Mat4 & M);
+// Distribute an accumulated Sim(3) drift D over a chain of n+1 nodes by applying
+// D^(k/n) at node k (even relaxation, sim_frac_power). Returns the corrected camera
+// centers. global_poses are the open-loop global cam2world (similarity) matrices.
+std::vector<Vec3> distribute_drift(const Mat4 & D, const std::vector<Mat4> & global_poses);
+
+} // namespace pose
+} // namespace free_splatter
+
+#endif // FREE_SPLATTER_POSE_H
diff --git a/src/splat.h b/src/splat.h
new file mode 100644
index 0000000..f5cc2e3
--- /dev/null
+++ b/src/splat.h
@@ -0,0 +1,44 @@
+// splat.h — the single, shared encoder for an antimatter15 .splat record.
+//
+// There are two producers of .splat output: the single-run path (write_splat,
+// straight from the engine's gaussians) and the accumulated-cloud path
+// (write_cloud_splat, from pose::AccumPoint). They MUST encode a gaussian
+// identically — same OpenCV->OpenGL convention, same quaternion remap, same byte
+// packing, same opacity->alpha. They used to be two copies and drifted: the cloud
+// path silently dropped first the rotation/scale, then the opacity (rendering a
+// fully-opaque, swirling, blurry soup instead of a proper alpha-blended surface).
+//
+// This is the ONE definition both call, so they cannot diverge again. It is pinned
+// byte-for-byte by tests/test_pose.cpp::test_splat_record.
+#ifndef FREE_SPLATTER_SPLAT_H
+#define FREE_SPLATTER_SPLAT_H
+
+#include <cmath>
+#include <cstring>
+
+namespace free_splatter {
+
+// One 32-byte .splat record: pos[3]f32, scale[3]f32, rgba[4]u8, rot[4]u8 (w,x,y,z).
+// Inputs are in the engine's OpenCV frame (y down, z forward); `rgb` and `opacity`
+// in [0,1]; `quat_wxyz` as (w,x,y,z). Writes exactly 32 bytes to `out`.
+inline void encode_splat_record(unsigned char out[32], const float pos[3], const float scale[3],
+                                const float quat_wxyz[4], const float rgb[3], float opacity) {
+    auto u8  = [](float v) -> unsigned char { float t = v < 0 ? 0 : (v > 255 ? 255 : v); return (unsigned char) t; };
+    auto c01 = [](float v) -> float { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); };
+    // OpenCV (y down, z forward) -> viewer OpenGL (y up): 180deg about X = diag(1,-1,-1):
+    // position.yz negate, quaternion (w,x,y,z) -> (-x, w, -z, y).
+    const float p[3] = { pos[0], -pos[1], -pos[2] };
+    float q[4] = { -quat_wxyz[1], quat_wxyz[0], -quat_wxyz[3], quat_wxyz[2] };
+    const float nrm = std::sqrt(q[0]*q[0] + q[1]*q[1] + q[2]*q[2] + q[3]*q[3]) + 1e-12f;
+    std::memcpy(out + 0,  p,     12);
+    std::memcpy(out + 12, scale, 12);
+    out[24] = u8(c01(rgb[0]) * 255.0f);
+    out[25] = u8(c01(rgb[1]) * 255.0f);
+    out[26] = u8(c01(rgb[2]) * 255.0f);
+    out[27] = u8(c01(opacity) * 255.0f);                  // opacity -> alpha (NOT forced opaque)
+    for (int c = 0; c < 4; c++) out[28 + c] = u8(q[c] / nrm * 128.0f + 128.0f);
+}
+
+} // namespace free_splatter
+
+#endif // FREE_SPLATTER_SPLAT_H
diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt
index 7f9f3dc..49ddd7a 100644
--- a/tests/CMakeLists.txt
+++ b/tests/CMakeLists.txt
@@ -13,6 +13,7 @@ endfunction()
 free_splatter_add_test(test_loader)        # GGUF KV/hparams round-trip
 free_splatter_add_test(test_graph_blocks)  # golden-op pins (ggml ops we depend on)
 free_splatter_add_test(test_image)         # untrusted image-input validation
+free_splatter_add_test(test_pose)          # pose math golden tests (focal/align/PnP)
 
 # Per-layer parity vs the float64 PyTorch reference. Labelled "model": runs only
 # under `ctest -L model` and self-skips (77) unless FREE_SPLATTER_GGUF and
diff --git a/tests/test_pose.cpp b/tests/test_pose.cpp
new file mode 100644
index 0000000..a4816f5
--- /dev/null
+++ b/tests/test_pose.cpp
@@ -0,0 +1,774 @@
+// Asset-free golden tests for the C++ pose component (src/pose.{h,cpp}) — the
+// direct mirror of pose/test_pose.py. No model, no fixtures, no cv2: synthetic
+// geometry with KNOWN ground truth, the way CLAUDE.md wants new ops verified
+// ("pinned to hand-computed references... a wrong op should fail with zero
+// fixtures"). Runs in the fast tier (ctest -LE model).
+//
+// The RNG differs from numpy's (the checks are tolerance-based on recovered
+// quantities, not bitstream-matched), so this validates the C++ port to the same
+// correctness bar the Python prototype meets, independently of numpy.
+#include "pose.h"
+#include "splat.h"
+
+#include <algorithm>
+#include <cmath>
+#include <cstdio>
+#include <cstring>
+#include <random>
+#include <vector>
+
+using namespace free_splatter::pose;
+using fsla::Vec3;
+using fsla::Mat3;
+using fsla::Mat4;
+
+static int failures = 0;
+static std::mt19937_64 RNG(12345);
+
+static void check(const char * name, bool cond, const char * detail = "") {
+    std::printf("  [%s] %s%s%s\n", cond ? "PASS" : "FAIL", name,
+                detail[0] ? "  " : "", detail);
+    if (!cond) failures++;
+}
+
+// --- helpers ---------------------------------------------------------------
+
+static Mat3 rand_rotation() {
+    std::normal_distribution<double> g(0, 1);
+    Vec3 ax = { g(RNG), g(RNG), g(RNG) };
+    double n = std::sqrt(ax[0]*ax[0] + ax[1]*ax[1] + ax[2]*ax[2]);
+    ax = { ax[0]/n, ax[1]/n, ax[2]/n };
+    std::uniform_real_distribution<double> a(0.2, M_PI - 0.2);
+    const double ang = a(RNG), c = std::cos(ang), s = std::sin(ang), v = 1 - c;
+    const double x = ax[0], y = ax[1], z = ax[2];
+    return Mat3{ { c+x*x*v, x*y*v-z*s, x*z*v+y*s,
+                   y*x*v+z*s, c+y*y*v, y*z*v-x*s,
+                   z*x*v-y*s, z*y*v+x*s, c+z*z*v } };
+}
+
+static double rot_angle(const Mat3 & R) {
+    const double tr = R(0,0) + R(1,1) + R(2,2);
+    return std::acos(std::max(-1.0, std::min(1.0, (tr-1)/2))) * 180.0 / M_PI;
+}
+
+static Mat3 mul(const Mat3 & A, const Mat3 & B) { return fsla::mat3_mul(A, B); }
+static Mat3 T(const Mat3 & A) { return fsla::mat3_transpose(A); }
+
+// apply (s,R,t) to a row-major N*3 array -> out
+static void apply_sim_arr(double s, const Mat3 & R, const Vec3 & t,
+                          const double * X, int N, double * out) {
+    for (int i = 0; i < N; i++) {
+        Vec3 x = { X[3*i], X[3*i+1], X[3*i+2] };
+        Vec3 Rx = fsla::mat3_apply(R, x);
+        out[3*i]   = s*Rx[0] + t[0];
+        out[3*i+1] = s*Rx[1] + t[1];
+        out[3*i+2] = s*Rx[2] + t[2];
+    }
+}
+
+static double rms3(const double * a, const double * b, int N) {
+    double s = 0;
+    for (int i = 0; i < 3*N; i++) { const double d = a[i]-b[i]; s += d*d; }
+    return std::sqrt(s / N);
+}
+
+static Vec3 apply_mat4(const Mat4 & M, const Vec3 & p) {
+    return { M(0,0)*p[0]+M(0,1)*p[1]+M(0,2)*p[2]+M(0,3),
+             M(1,0)*p[0]+M(1,1)*p[1]+M(1,2)*p[2]+M(1,3),
+             M(2,0)*p[0]+M(2,1)*p[1]+M(2,2)*p[2]+M(2,3) };
+}
+
+// ===========================================================================
+// Coordinate-system normalization (the scale question)
+// ===========================================================================
+
+static void test_similarity_roundtrip() {
+    std::printf("test_similarity_roundtrip\n");
+    const int N = 200;
+    std::normal_distribution<double> g(0, 1);
+    std::vector<double> X(3*N), Y(3*N);
+    for (int i = 0; i < N; i++) { X[3*i]=g(RNG)*3; X[3*i+1]=g(RNG)*2; X[3*i+2]=g(RNG)*5; }
+    const double s_t = 1.7; Mat3 R_t = rand_rotation(); Vec3 t_t = { 4.0, -1.0, 2.0 };
+    apply_sim_arr(s_t, R_t, t_t, X.data(), N, Y.data());
+
+    Sim3 T_ = fit_similarity(X.data(), Y.data(), N);
+    char buf[64];
+    std::snprintf(buf, sizeof buf, "%.12f vs %g", T_.s, s_t);
+    check("scale", std::fabs(T_.s - s_t) < 1e-9, buf);
+    check("rotation", rot_angle(mul(T_.R, T(R_t))) < 1e-7);
+    check("translation", std::sqrt((T_.t[0]-t_t[0])*(T_.t[0]-t_t[0]) +
+        (T_.t[1]-t_t[1])*(T_.t[1]-t_t[1]) + (T_.t[2]-t_t[2])*(T_.t[2]-t_t[2])) < 1e-7);
+    std::vector<double> Yp(3*N);
+    apply_sim_arr(T_.s, T_.R, T_.t, X.data(), N, Yp.data());
+    check("residual~0", rms3(Yp.data(), Y.data(), N) < 1e-9);
+}
+
+static void test_scale_detection() {
+    std::printf("test_scale_detection\n");
+    const int N = 300;
+    std::normal_distribution<double> g(0, 1);
+    std::vector<double> X(3*N), Y(3*N);
+    for (int i = 0; i < 3*N; i++) X[i] = g(RNG)*2;
+    const double s_t = 2.5;
+    apply_sim_arr(s_t, rand_rotation(), {1.0, 2.0, -3.0}, X.data(), N, Y.data());
+    Ladder L = diagnose(X.data(), Y.data(), N);
+    check("recovered scale", std::fabs(L.scale - s_t) < 1e-8);
+    check("rigid residual large", L.rigid / L.scene > 0.1);
+    check("similarity residual ~0", L.similarity / L.scene < 1e-9);
+    check("verdict needs_scale", L.verdict == "needs_scale", L.verdict.c_str());
+}
+
+static void test_nonlinear_detection() {
+    std::printf("test_nonlinear_detection\n");
+    const int N = 400;
+    std::normal_distribution<double> g(0, 1);
+    std::vector<double> X(3*N);
+    for (int i = 0; i < N; i++) { X[3*i]=g(RNG)*2; X[3*i+1]=g(RNG)*2; X[3*i+2]=g(RNG)*2 + 6.0; }
+
+    // (a) anisotropic (affine) warp
+    Mat3 A = mul(Mat3{ {1.0,0,0, 0,1.6,0, 0,0,0.7} }, rand_rotation());
+    std::vector<double> Yaff(3*N);
+    for (int i = 0; i < N; i++) {
+        Vec3 x = { X[3*i], X[3*i+1], X[3*i+2] };
+        Vec3 Ax = fsla::mat3_apply(A, x);
+        Yaff[3*i]=Ax[0]+1.0; Yaff[3*i+1]=Ax[1]; Yaff[3*i+2]=Ax[2]+2.0;
+    }
+    Ladder La = diagnose(X.data(), Yaff.data(), N);
+    check("affine: similarity fails", La.similarity / La.scene > 1e-3);
+    check("affine: affine fits", La.affine / La.scene < 1e-9);
+    check("affine: verdict needs_affine", La.verdict == "needs_affine", La.verdict.c_str());
+
+    // (b) depth-quadratic (focal-error fingerprint)
+    double zmean = 0; for (int i = 0; i < N; i++) zmean += X[3*i+2]; zmean /= N;
+    double zvar = 0; for (int i = 0; i < N; i++) zvar += (X[3*i+2]-zmean)*(X[3*i+2]-zmean); zvar /= N;
+    std::vector<double> Ynl(X);
+    for (int i = 0; i < N; i++) {
+        const double Z = X[3*i+2];
+        Ynl[3*i] *= (1.0 + 0.15 * (Z-zmean)*(Z-zmean) / zvar);
+    }
+    Ladder Ln = diagnose(X.data(), Ynl.data(), N);
+    check("nonlinear: affine fails", Ln.affine / Ln.scene > 1e-3);
+    check("nonlinear: verdict nonlinear", Ln.verdict == "nonlinear", Ln.verdict.c_str());
+    check("nonlinear: depth-correlated residual", Ln.structured);
+}
+
+static void test_outlier_robustness() {
+    std::printf("test_outlier_robustness\n");
+    const int N = 400;
+    std::normal_distribution<double> g(0, 1);
+    std::vector<double> X(3*N), Y(3*N);
+    for (int i = 0; i < 3*N; i++) X[i] = g(RNG)*2;
+    const double s_t = 1.3; Mat3 R_t = rand_rotation(); Vec3 t_t = { 0.5, -2.0, 1.0 };
+    apply_sim_arr(s_t, R_t, t_t, X.data(), N, Y.data());
+    const int n_out = (int)(0.3 * N);
+    std::vector<int> oidx(N); for (int i = 0; i < N; i++) oidx[i] = i;
+    std::shuffle(oidx.begin(), oidx.end(), RNG);
+    for (int k = 0; k < n_out; k++) { int i = oidx[k]; Y[3*i]+=g(RNG)*10; Y[3*i+1]+=g(RNG)*10; Y[3*i+2]+=g(RNG)*10; }
+    std::vector<char> inl;
+    Sim3 T_ = fit_similarity_ransac(X.data(), Y.data(), N, 0.05, 300, inl);
+    check("scale recovered", std::fabs(T_.s - s_t) < 1e-3);
+    check("rotation recovered", rot_angle(mul(T_.R, T(R_t))) < 0.1);
+    check("translation recovered", std::sqrt((T_.t[0]-t_t[0])*(T_.t[0]-t_t[0]) +
+        (T_.t[1]-t_t[1])*(T_.t[1]-t_t[1]) + (T_.t[2]-t_t[2])*(T_.t[2]-t_t[2])) < 1e-2);
+    int rejected = 0; for (int k = 0; k < n_out; k++) if (!inl[oidx[k]]) rejected++;
+    check("outliers excluded", rejected > 0.9 * n_out);
+}
+
+static void test_loop_correction() {
+    std::printf("test_loop_correction\n");
+    Mat3 R = rand_rotation();
+    Mat4 M = sim_matrix({ 1.07, R, {0.3, -0.2, 0.1} });
+    Mat4 m0 = sim_frac_power(M, 0.0), m1 = sim_frac_power(M, 1.0), mh = sim_frac_power(M, 0.5);
+    bool id_ok = true, m1_ok = true;
+    for (int i = 0; i < 16; i++) { id_ok &= std::fabs(m0.a[i] - fsla::mat4_identity().a[i]) < 1e-9;
+                                   m1_ok &= std::fabs(m1.a[i] - M.a[i]) < 1e-9; }
+    Mat4 mh2 = fsla::mat4_mul(mh, mh);
+    bool half_ok = true; for (int i = 0; i < 16; i++) half_ok &= std::fabs(mh2.a[i] - M.a[i]) < 1e-9;
+    check("M^0 == I", id_ok);
+    check("M^1 == M", m1_ok);
+    check("(M^.5)^2 == M", half_ok);
+
+    const int n = 12;
+    Mat4 D = sim_matrix({ 1.15, R, {0.4, -0.2, 0.1} });
+    std::vector<Vec3> clean(n+1), drift(n+1), corr(n+1);
+    for (int k = 0; k <= n; k++) {
+        const double tt = 2*M_PI * k / n;
+        clean[k] = { std::cos(tt), std::sin(tt), 0.1*tt };
+        drift[k] = apply_mat4(sim_frac_power(D, -(double)k/n), clean[k]);   // D^{-k/n}
+        corr[k]  = apply_mat4(sim_frac_power(D,  (double)k/n), drift[k]);   // D^{ k/n}
+    }
+    double pre = 0, post = 0;
+    for (int k = 0; k <= n; k++) {
+        for (int c = 0; c < 3; c++) { pre += (drift[k][c]-clean[k][c])*(drift[k][c]-clean[k][c]);
+                                      post += (corr[k][c]-clean[k][c])*(corr[k][c]-clean[k][c]); }
+    }
+    check("drift present before", std::sqrt(pre/(n+1)) > 0.1);
+    char buf[64]; std::snprintf(buf, sizeof buf, "ATE %.1e", std::sqrt(post/(n+1)));
+    check("recovers clean loop", std::sqrt(post/(n+1)) < 1e-9, buf);
+}
+
+static void test_loop_closure() {
+    std::printf("test_loop_closure\n");
+    std::uniform_real_distribution<double> us(0.8, 1.2);
+    std::normal_distribution<double> g(0, 1);
+    std::vector<Sim3> links;
+    for (int i = 0; i < 4; i++) links.push_back({ us(RNG), rand_rotation(), {g(RNG), g(RNG), g(RNG)} });
+    Sim3 acc = sim_identity();
+    for (int i = 0; i < 3; i++) acc = sim_compose(links[i], acc);
+    std::vector<Sim3> closed = { links[0], links[1], links[2], sim_invert(acc) };
+    LoopError e = loop_closure_error(closed);
+    // rot_deg has a ~sqrt(eps) (~1e-6 deg) floor: it is acos(~1) of RᵀR, whose
+    // deviation from I is at the fp roundoff level — scale_err and trans are
+    // exactly 0 here, so 1e-5 deg cleanly separates "closed" from any real drift.
+    check("closed loop ~ identity", e.scale_err < 1e-9 && e.rot_deg < 1e-5 && e.trans < 1e-9);
+    std::vector<Sim3> bad = closed;
+    bad[0].s *= 1.12;
+    LoopError e2 = loop_closure_error(bad);
+    check("scale drift detected", std::fabs(e2.scale_err - std::log(1.12)) < 1e-9);
+}
+
+// ===========================================================================
+// PnP (verified against analytic ground truth)
+// ===========================================================================
+
+static Mat3 make_K(double f, double W, double H) {
+    Mat3 K = fsla::mat3_identity();
+    K(0,0) = f; K(1,1) = f; K(0,2) = W/2; K(1,2) = H/2;
+    return K;
+}
+
+// project world points by (R,t) world2cam through K; fill px (N*2) and z (N)
+static void project(const double * Pw, int N, const Mat3 & R, const Vec3 & t,
+                    const Mat3 & K, double * px, double * z) {
+    for (int i = 0; i < N; i++) {
+        Vec3 p = { Pw[3*i], Pw[3*i+1], Pw[3*i+2] };
+        Vec3 Xc = fsla::mat3_apply(R, p);
+        Xc[0]+=t[0]; Xc[1]+=t[1]; Xc[2]+=t[2];
+        px[2*i]   = K(0,0)*Xc[0]/Xc[2] + K(0,2);
+        px[2*i+1] = K(1,1)*Xc[1]/Xc[2] + K(1,2);
+        z[i] = Xc[2];
+    }
+}
+
+static void test_focal_recovery() {
+    std::printf("test_focal_recovery\n");
+    const int N = 500; const double f = 437.0, W = 512, H = 512;
+    std::uniform_real_distribution<double> ux(-2,2), uy(-2,2), uz(4,9);
+    std::vector<double> Xc(3*N), px(2*N);
+    for (int i = 0; i < N; i++) {
+        Xc[3*i]=ux(RNG); Xc[3*i+1]=uy(RNG); Xc[3*i+2]=uz(RNG);
+        px[2*i]   = f*Xc[3*i]  /Xc[3*i+2] + W/2;
+        px[2*i+1] = f*Xc[3*i+1]/Xc[3*i+2] + H/2;
+    }
+    double fe = estimate_focal(Xc.data(), px.data(), N, W/2, H/2);
+    char buf[64]; std::snprintf(buf, sizeof buf, "%.9f vs %g", fe, f);
+    check("exact focal", std::fabs(fe - f) < 1e-6, buf);
+    std::normal_distribution<double> g(0, 0.5);
+    std::vector<double> pn(px);
+    for (int i = 0; i < 2*N; i++) pn[i] += g(RNG);
+    double fn = estimate_focal(Xc.data(), pn.data(), N, W/2, H/2);
+    check("focal under noise", std::fabs(fn - f)/f < 0.02);
+}
+
+static void test_pnp_recovery() {
+    std::printf("test_pnp_recovery\n");
+    const int N = 500; Mat3 K = make_K(400, 512, 512);
+    std::uniform_real_distribution<double> ux(-2,2), uy(-2,2), uz(4,8);
+    std::vector<double> Pw(3*N);
+    for (int i = 0; i < N; i++) { Pw[3*i]=ux(RNG); Pw[3*i+1]=uy(RNG); Pw[3*i+2]=uz(RNG); }
+    std::normal_distribution<double> g(0, 1);
+    std::uniform_real_distribution<double> ut(-0.6, 0.6), ua(3, 20);
+    double worst_R = 0, worst_t = 0;
+    for (int trial = 0; trial < 8; trial++) {
+        Vec3 ax = { g(RNG), g(RNG), g(RNG) };
+        double n = std::sqrt(ax[0]*ax[0]+ax[1]*ax[1]+ax[2]*ax[2]);
+        ax = { ax[0]/n, ax[1]/n, ax[2]/n };
+        const double ang = ua(RNG) * M_PI/180.0, c = std::cos(ang), s = std::sin(ang), v = 1-c;
+        Mat3 R{ { c+ax[0]*ax[0]*v, ax[0]*ax[1]*v-ax[2]*s, ax[0]*ax[2]*v+ax[1]*s,
+                  ax[1]*ax[0]*v+ax[2]*s, c+ax[1]*ax[1]*v, ax[1]*ax[2]*v-ax[0]*s,
+                  ax[2]*ax[0]*v-ax[1]*s, ax[2]*ax[1]*v+ax[0]*s, c+ax[2]*ax[2]*v } };
+        Vec3 t = { ut(RNG), ut(RNG), ut(RNG) };
+        std::vector<double> px(2*N), z(N);
+        project(Pw.data(), N, R, t, K, px.data(), z.data());
+        bool behind = false; for (int i = 0; i < N; i++) if (z[i] <= 0) behind = true;
+        if (behind) continue;
+        std::vector<char> inl;
+        Mat4 c2w = solve_pnp_numpy(Pw.data(), px.data(), N, K, inl);
+        Mat4 w2c = fsla::inv_rigid4(c2w);
+        Mat3 Rr{}; Vec3 tr{};
+        for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) Rr(i,j)=w2c(i,j); tr[i]=w2c(i,3); }
+        worst_R = std::max(worst_R, (double) rot_angle(mul(Rr, T(R))));
+        worst_t = std::max(worst_t, std::sqrt((tr[0]-t[0])*(tr[0]-t[0])+(tr[1]-t[1])*(tr[1]-t[1])+(tr[2]-t[2])*(tr[2]-t[2])));
+    }
+    char buf[64]; std::snprintf(buf, sizeof buf, "worst %.2e deg", worst_R);
+    check("rotation recovered", worst_R < 1e-3, buf);
+    check("translation recovered", worst_t < 1e-4);
+}
+
+static void test_pnp_outliers() {
+    std::printf("test_pnp_outliers\n");
+    const int N = 600; Mat3 K = make_K(400, 512, 512);
+    std::uniform_real_distribution<double> ux(-2,2), uy(-2,2), uz(4,8);
+    std::vector<double> Pw(3*N);
+    for (int i = 0; i < N; i++) { Pw[3*i]=ux(RNG); Pw[3*i+1]=uy(RNG); Pw[3*i+2]=uz(RNG); }
+    Mat3 R = rand_rotation();
+    // shrink towards identity so all points stay in front
+    for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) R(i,j) = (i==j?1.0:0.0)*0.85 + R(i,j)*0.15;
+    { fsla::Mat3 U,V; Vec3 s; fsla::svd3(R, U, s, V); R = mul(U, T(V)); if (fsla::det3(R)<0) for(int i=0;i<3;i++) R(i,2)=-R(i,2); }
+    Vec3 t = { 0.3, -0.2, 0.4 };
+    std::vector<double> px(2*N), z(N);
+    project(Pw.data(), N, R, t, K, px.data(), z.data());
+    // keep only points in front
+    std::vector<double> Pk, pk;
+    for (int i = 0; i < N; i++) if (z[i] > 0) { for(int c=0;c<3;c++) Pk.push_back(Pw[3*i+c]); pk.push_back(px[2*i]); pk.push_back(px[2*i+1]); }
+    const int M = (int) pk.size()/2;
+    const int n_out = (int)(0.25 * M);
+    std::vector<int> oidx(M); for (int i = 0; i < M; i++) oidx[i]=i;
+    std::shuffle(oidx.begin(), oidx.end(), RNG);
+    std::uniform_real_distribution<double> corrupt(-150, 150);
+    for (int k = 0; k < n_out; k++) { int i = oidx[k]; pk[2*i]+=corrupt(RNG); pk[2*i+1]+=corrupt(RNG); }
+    std::vector<char> inl;
+    Mat4 c2w = solve_pnp_numpy(Pk.data(), pk.data(), M, K, inl, 2.0, 300);
+    Mat4 w2c = fsla::inv_rigid4(c2w);
+    Mat3 Rr{}; Vec3 tr{};
+    for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) Rr(i,j)=w2c(i,j); tr[i]=w2c(i,3); }
+    char buf[64]; std::snprintf(buf, sizeof buf, "%.3f deg", (double) rot_angle(mul(Rr, T(R))));
+    check("rotation under outliers", rot_angle(mul(Rr, T(R))) < 0.2, buf);
+    check("translation under outliers", std::sqrt((tr[0]-t[0])*(tr[0]-t[0])+(tr[1]-t[1])*(tr[1]-t[1])+(tr[2]-t[2])*(tr[2]-t[2])) < 0.02);
+    int rejected = 0; for (int k = 0; k < n_out; k++) if (!inl[oidx[k]]) rejected++;
+    check("outliers rejected", rejected > 0.85 * n_out);
+}
+
+// ===========================================================================
+// Robust PnP: EPnP + Gauss-Newton (the shipped real-data solver)
+// ===========================================================================
+
+static void test_pnp_robust_recovery() {
+    std::printf("test_pnp_robust_recovery\n");
+    const int N = 500; Mat3 K = make_K(400, 512, 512);
+    std::uniform_real_distribution<double> ux(-2,2), uy(-2,2), uz(4,8);
+    std::vector<double> Pw(3*N);
+    for (int i = 0; i < N; i++) { Pw[3*i]=ux(RNG); Pw[3*i+1]=uy(RNG); Pw[3*i+2]=uz(RNG); }
+    std::normal_distribution<double> g(0, 1);
+    std::uniform_real_distribution<double> ut(-0.6, 0.6);
+    double worst_R = 0, worst_t = 0;
+    for (int trial = 0; trial < 8; trial++) {
+        Mat3 R = rand_rotation();
+        for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) R(i,j) = (i==j?1.0:0.0)*0.8 + R(i,j)*0.2;
+        { fsla::Mat3 U,V; Vec3 s; fsla::svd3(R,U,s,V); R = mul(U,T(V)); if (fsla::det3(R)<0) for(int i=0;i<3;i++) R(i,2)=-R(i,2); }
+        Vec3 t = { ut(RNG), ut(RNG), ut(RNG) };
+        std::vector<double> px(2*N), z(N);
+        project(Pw.data(), N, R, t, K, px.data(), z.data());
+        bool behind = false; for (int i = 0; i < N; i++) if (z[i] <= 0) behind = true;
+        if (behind) continue;
+        std::vector<char> inl;
+        Mat4 c2w = solve_pnp(Pw.data(), px.data(), N, K, inl);
+        Mat4 w2c = fsla::inv_rigid4(c2w);
+        Mat3 Rr{}; Vec3 tr{};
+        for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) Rr(i,j)=w2c(i,j); tr[i]=w2c(i,3); }
+        worst_R = std::max(worst_R, (double) rot_angle(mul(Rr, T(R))));
+        worst_t = std::max(worst_t, std::sqrt((tr[0]-t[0])*(tr[0]-t[0])+(tr[1]-t[1])*(tr[1]-t[1])+(tr[2]-t[2])*(tr[2]-t[2])));
+    }
+    char buf[64]; std::snprintf(buf, sizeof buf, "worst %.2e deg", worst_R);
+    check("rotation recovered", worst_R < 1e-3, buf);
+    check("translation recovered", worst_t < 1e-5);
+}
+
+static void test_pnp_robust_planar() {
+    // Near-planar scene (a thin slab) — the config where the DLT's coplanar
+    // minimal samples flip. EPnP's all-point control basis stays well-posed.
+    std::printf("test_pnp_robust_planar\n");
+    const int N = 500; Mat3 K = make_K(400, 512, 512);
+    std::uniform_real_distribution<double> ux(-2,2), uy(-2,2);
+    std::normal_distribution<double> thin(0, 0.02);              // ~planar, slight thickness
+    std::vector<double> Pw(3*N);
+    for (int i = 0; i < N; i++) { Pw[3*i]=ux(RNG); Pw[3*i+1]=uy(RNG); Pw[3*i+2]=6.0 + thin(RNG); }
+    Mat3 R = rand_rotation();
+    for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) R(i,j) = (i==j?1.0:0.0)*0.85 + R(i,j)*0.15;
+    { fsla::Mat3 U,V; Vec3 s; fsla::svd3(R,U,s,V); R = mul(U,T(V)); if (fsla::det3(R)<0) for(int i=0;i<3;i++) R(i,2)=-R(i,2); }
+    Vec3 t = { 0.2, -0.1, 0.3 };
+    std::vector<double> px(2*N), z(N);
+    project(Pw.data(), N, R, t, K, px.data(), z.data());
+    std::vector<char> inl;
+    Mat4 w2c = fsla::inv_rigid4(solve_pnp(Pw.data(), px.data(), N, K, inl));
+    Mat3 Rr{}; Vec3 tr{};
+    for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) Rr(i,j)=w2c(i,j); tr[i]=w2c(i,3); }
+    char buf[64]; std::snprintf(buf, sizeof buf, "%.4f deg", (double) rot_angle(mul(Rr, T(R))));
+    check("planar rotation recovered", rot_angle(mul(Rr, T(R))) < 0.05, buf);
+    check("planar translation recovered", std::sqrt((tr[0]-t[0])*(tr[0]-t[0])+(tr[1]-t[1])*(tr[1]-t[1])+(tr[2]-t[2])*(tr[2]-t[2])) < 0.01);
+}
+
+static void test_pnp_robust_outliers() {
+    std::printf("test_pnp_robust_outliers\n");
+    const int N = 600; Mat3 K = make_K(400, 512, 512);
+    std::uniform_real_distribution<double> ux(-2,2), uy(-2,2), uz(4,8);
+    std::vector<double> Pw(3*N);
+    for (int i = 0; i < N; i++) { Pw[3*i]=ux(RNG); Pw[3*i+1]=uy(RNG); Pw[3*i+2]=uz(RNG); }
+    Mat3 R = rand_rotation();
+    for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) R(i,j) = (i==j?1.0:0.0)*0.85 + R(i,j)*0.15;
+    { fsla::Mat3 U,V; Vec3 s; fsla::svd3(R,U,s,V); R = mul(U,T(V)); if (fsla::det3(R)<0) for(int i=0;i<3;i++) R(i,2)=-R(i,2); }
+    Vec3 t = { 0.3, -0.2, 0.4 };
+    std::vector<double> px(2*N), z(N);
+    project(Pw.data(), N, R, t, K, px.data(), z.data());
+    std::vector<double> Pk, pk;
+    for (int i = 0; i < N; i++) if (z[i] > 0) { for(int c=0;c<3;c++) Pk.push_back(Pw[3*i+c]); pk.push_back(px[2*i]); pk.push_back(px[2*i+1]); }
+    const int M = (int) pk.size()/2;
+    const int n_out = (int)(0.15 * M);                          // 15% gross pixel corruption
+    std::vector<int> oidx(M); for (int i = 0; i < M; i++) oidx[i]=i;
+    std::shuffle(oidx.begin(), oidx.end(), RNG);
+    std::uniform_real_distribution<double> corrupt(-120, 120);
+    for (int k = 0; k < n_out; k++) { int i = oidx[k]; pk[2*i]+=corrupt(RNG); pk[2*i+1]+=corrupt(RNG); }
+    std::vector<char> inl;
+    Mat4 w2c = fsla::inv_rigid4(solve_pnp(Pk.data(), pk.data(), M, K, inl, 3.0));
+    Mat3 Rr{}; Vec3 tr{};
+    for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) Rr(i,j)=w2c(i,j); tr[i]=w2c(i,3); }
+    char buf[64]; std::snprintf(buf, sizeof buf, "%.3f deg", (double) rot_angle(mul(Rr, T(R))));
+    check("rotation under outliers", rot_angle(mul(Rr, T(R))) < 0.5, buf);
+    int rejected = 0; for (int k = 0; k < n_out; k++) if (!inl[oidx[k]]) rejected++;
+    check("outliers rejected", rejected > 0.85 * n_out);
+}
+
+// ===========================================================================
+// Accumulation / fusion / loop closure (the orchestration on top of the
+// primitives, mirrors accumulate.py / fuse.py / loop_closure.py)
+// ===========================================================================
+
+// Synthetic multi-frame clip with KNOWN geometry: a pinhole camera on a smooth
+// trajectory; each pair's gaussians are exact back-projections (engine channel
+// layout), and consecutive runs are rescaled by a distinct per-run factor (the
+// 1/baseline normalization that makes cross-run alignment a *similarity*). The
+// accumulator must recover the camera trajectory (up to a global Sim(3)) and the
+// per-link scale ratios.
+static Mat4 cam2world_frame(int i) {
+    const double ang = i * 4.0 * M_PI / 180.0;                 // small yaw per frame
+    const double c = std::cos(ang), s = std::sin(ang);
+    Mat4 M = fsla::mat4_identity();
+    M(0,0)=c; M(0,2)=s; M(2,0)=-s; M(2,2)=c;
+    M(0,3) = i * 0.4;                                          // translate along x
+    return M;
+}
+static double depth_at(int i, int r, int c) {
+    return 5.0 + 0.3*std::sin(0.7*i) + 0.05*r - 0.03*c
+         + 0.4 * (((r*7 + c*13 + i*5) % 11) / 11.0);           // non-coplanar spread
+}
+
+static void test_accumulate_chain() {
+    std::printf("test_accumulate_chain\n");
+    const int H = 16, W = 16, gc = 23, P = H*W, F = 5;        // 5 frames -> 4 pairs
+    const double f = 18.0;
+    Mat3 K = make_K(f, W, H), Kinv = fsla::inv3(K);
+    auto backproj = [&](int i, int r, int c) -> Vec3 {
+        const double d = depth_at(i, r, c);
+        Vec3 uv = { (double) c, (double) r, 1.0 };
+        Vec3 n = fsla::mat3_apply(Kinv, uv);
+        return { n[0]*d, n[1]*d, n[2]*d };
+    };
+
+    Accumulator acc(H, W, /*opacity_threshold=*/0.05);
+    std::vector<double> sigma(F);
+    for (int k = 0; k < F-1; k++) {                           // pair (k, k+1) -> run k
+        const double sk = 1.0 + 0.1*k;                        // distinct per-run scale
+        sigma[k] = sk;
+        Mat4 w2c_k = fsla::inv_rigid4(cam2world_frame(k));
+        Mat4 rel = fsla::mat4_mul(w2c_k, cam2world_frame(k+1));   // cam_k <- cam_{k+1}
+        std::vector<float> buf((size_t) 2 * P * gc, 0.0f);
+        for (int r = 0; r < H; r++) for (int c = 0; c < W; c++) {
+            const int i = r*W + c;
+            Vec3 X0 = backproj(k, r, c);                      // camera-k frame
+            Vec3 Xn = backproj(k+1, r, c);                    // camera-(k+1) frame
+            Vec3 X1 = apply_mat4(rel, Xn);                    // -> camera-k frame
+            float * v0 = &buf[(size_t) i * gc];
+            float * v1 = &buf[(size_t) (P + i) * gc];
+            for (int t = 0; t < 3; t++) { v0[t] = (float)(sk*X0[t]); v1[t] = (float)(sk*X1[t]); }
+            v0[15] = v1[15] = 0.9f;                           // opacity (activated)
+            for (int t = 0; t < 3; t++) { v0[3+t] = 0.1f*t; v1[3+t] = 0.1f*t; } // color
+        }
+        acc.add_pair(buf.data(), gc);
+    }
+
+    check("pair count", acc.pair_count() == F-1);
+    check("frame count", acc.frame_count() == F);
+    check("cloud non-empty", acc.cloud().size() == (size_t) F * P);   // all opacities valid
+
+    // per-link scale recovers sigma_{k-1}/sigma_k
+    bool scales_ok = true; double worst_scale = 0;
+    for (int k = 1; k < F-1; k++) {
+        const double expect = sigma[k-1]/sigma[k];
+        worst_scale = std::max(worst_scale, std::fabs(acc.links()[k].scale - expect));
+        scales_ok &= std::fabs(acc.links()[k].scale - expect) < 1e-6;
+    }
+    { char buf[64]; std::snprintf(buf, sizeof buf, "worst |Δs| %.2e", worst_scale);
+      check("per-link scale recovered", scales_ok, buf); }
+    double worst_resid = 0;
+    for (int k = 1; k < F-1; k++) worst_resid = std::max(worst_resid, acc.links()[k].resid_frac);
+    { char buf[64]; std::snprintf(buf, sizeof buf, "%.2e", worst_resid);
+      check("per-link residual ~0", worst_resid < 1e-6, buf); }
+
+    // recovered trajectory vs GT, after a global Sim(3) alignment (ATE)
+    std::vector<Mat4> path = acc.camera_path();
+    std::vector<double> rec(3*F), gt(3*F);
+    for (int i = 0; i < F; i++) {
+        rec[3*i]=path[i](0,3); rec[3*i+1]=path[i](1,3); rec[3*i+2]=path[i](2,3);
+        Mat4 g = cam2world_frame(i);
+        gt[3*i]=g(0,3); gt[3*i+1]=g(1,3); gt[3*i+2]=g(2,3);
+    }
+    Sim3 A = fit_similarity(rec.data(), gt.data(), F);
+    std::vector<double> rec_al(3*F);
+    apply_sim_arr(A.s, A.R, A.t, rec.data(), F, rec_al.data());
+    double mg[3]={0,0,0}; for (int i=0;i<F;i++) for(int c=0;c<3;c++) mg[c]+=gt[3*i+c];
+    for (int c=0;c<3;c++) mg[c]/=F;
+    double ext=0; for(int i=0;i<F;i++) for(int c=0;c<3;c++){double d=gt[3*i+c]-mg[c]; ext+=d*d;}
+    ext = std::sqrt(ext/F);
+    const double ate = rms3(rec_al.data(), gt.data(), F) / ext;
+    { char buf[64]; std::snprintf(buf, sizeof buf, "ATE %.2e of extent", ate);
+      check("trajectory recovered", ate < 1e-3, buf); }
+}
+
+static void test_consensus_fuse() {
+    std::printf("test_consensus_fuse\n");
+    std::vector<AccumPoint> cloud;
+    // 5 surface points, each corroborated by frames {0,1,2} (tight cluster -> one
+    // voxel, support 3); 20 isolated single-frame floaters far apart (support 1).
+    // AccumPoint: {xyz, rgb, opacity, scale[3], quat(w,x,y,z), frame}; opacity/
+    // scale/quat are unused by fusion's counts here, set to a unit opaque gaussian.
+    for (int loc = 0; loc < 5; loc++)
+        for (int fr = 0; fr < 3; fr++)
+            cloud.push_back({ (float)(2.0*loc) + 0.001f*fr, 0.0f, 0.0f, 0.5f, 0.5f, 0.5f,
+                              1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, fr });
+    for (int j = 0; j < 20; j++)
+        cloud.push_back({ 40.0f + 2.0f*j, 0.0f, 0.0f, 0.2f, 0.2f, 0.2f,
+                          1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, j % 4 });
+
+    std::vector<AccumPoint> fused;
+    FuseStats st = consensus_fuse(cloud, /*voxel_frac=*/0.02, /*k=*/2, fused, FUSE_AVERAGED);
+    check("raw points", st.raw_points == 35);
+    check("kept voxels (the surface)", st.kept_voxels == 5);
+    check("fused size (averaged: one per voxel)", fused.size() == 5);
+    check("points kept", st.points_kept == 15);
+    check("floaters dropped", st.points_dropped == 20);
+    // FUSE_KEPT: dense — every raw point in a consensus voxel, floaters dropped
+    std::vector<AccumPoint> kept;
+    consensus_fuse(cloud, /*voxel_frac=*/0.02, /*k=*/2, kept, FUSE_KEPT);
+    check("kept mode: dense (all 15 consensus raw points)", kept.size() == 15);
+    // FUSE_BEST: per voxel only the most-confident frame. Each surface voxel has 3
+    // frames with equal opacity (0.5) so frame 0 (first seen) wins the tie -> 5 points.
+    std::vector<AccumPoint> best;
+    consensus_fuse(cloud, /*voxel_frac=*/0.02, /*k=*/2, best, FUSE_BEST);
+    check("best mode: one frame per voxel (5 points)", best.size() == 5);
+    // a fused point sits at its cluster centroid (~2*loc), color preserved
+    bool centroid_ok = true;
+    for (const auto & p : fused) {
+        const double loc = std::round(p.x / 2.0);
+        centroid_ok &= std::fabs(p.x - 2.0*loc) < 0.01 && std::fabs(p.r - 0.5) < 1e-6;
+    }
+    check("fused centroid + color", centroid_ok);
+}
+
+// Gaussian-level de-ghost: a base scene seen by 3 frames through small per-object
+// misalignments ghosts; consensus_refine (non-rigid, per-point) must pull the
+// overlapping copies together.
+static void test_consensus_refine() {
+    std::printf("test_consensus_refine\n");
+    // well-separated base points (a grid, spacing 1) so each object's 3 copies share
+    // a voxel but distinct objects don't — the real (2D surface) regime.
+    const int side = 4, M = side*side*side, Fr = 3;
+    std::normal_distribution<double> g(0, 1);
+    std::vector<double> base(3*M);
+    { int idx=0; for (int a=0;a<side;a++) for (int b=0;b<side;b++) for (int c=0;c<side;c++) {
+        base[3*idx]=a; base[3*idx+1]=b; base[3*idx+2]=c; idx++; } }
+    auto small_rot = [&](double deg) {
+        Vec3 ax = { g(RNG), g(RNG), g(RNG) };
+        double n = std::sqrt(ax[0]*ax[0]+ax[1]*ax[1]+ax[2]*ax[2]); ax={ax[0]/n,ax[1]/n,ax[2]/n};
+        const double a=deg*M_PI/180, c=std::cos(a), s=std::sin(a), v=1-c, x=ax[0], y=ax[1], z=ax[2];
+        return Mat3{ { c+x*x*v, x*y*v-z*s, x*z*v+y*s,  y*x*v+z*s, c+y*y*v, y*z*v-x*s,  z*x*v-y*s, z*y*v+x*s, c+z*z*v } };
+    };
+    Sim3 P[3] = { sim_identity(),
+                  { 1.0, small_rot(1.0), { 0.04, -0.03, 0.035 } },
+                  { 1.0, small_rot(1.0), { -0.035, 0.04, -0.03 } } };
+    std::vector<AccumPoint> cloud(Fr*M);
+    for (int f = 0; f < Fr; f++) for (int i = 0; i < M; i++) {
+        Vec3 b = { base[3*i], base[3*i+1], base[3*i+2] };
+        Vec3 w = sim_apply(P[f], b);
+        AccumPoint p{}; p.x=(float)w[0]; p.y=(float)w[1]; p.z=(float)w[2];
+        p.opacity=1.0f; p.sx=p.sy=p.sz=0.01f; p.qw=1.0f; p.frame=f;
+        cloud[(size_t)f*M + i] = p;
+    }
+    double mean[3]={0,0,0}; for (auto&p:cloud){ mean[0]+=p.x;mean[1]+=p.y;mean[2]+=p.z; }
+    for (int c=0;c<3;c++) mean[c]/=cloud.size();
+    double ss=0; for (auto&p:cloud){ double dx=p.x-mean[0],dy=p.y-mean[1],dz=p.z-mean[2]; ss+=dx*dx+dy*dy+dz*dz; }
+    const double ext = std::sqrt(ss/cloud.size());
+    auto spread = [&](const std::vector<AccumPoint>& c) {
+        double s=0;
+        for (int i=0;i<M;i++) { double cx=0,cy=0,cz=0;
+            for (int f=0;f<Fr;f++){ cx+=c[(size_t)f*M+i].x; cy+=c[(size_t)f*M+i].y; cz+=c[(size_t)f*M+i].z; }
+            cx/=Fr; cy/=Fr; cz/=Fr;
+            for (int f=0;f<Fr;f++){ const auto&p=c[(size_t)f*M+i]; s += (p.x-cx)*(p.x-cx)+(p.y-cy)*(p.y-cy)+(p.z-cz)*(p.z-cz); } }
+        return std::sqrt(s/(M*Fr))/ext;
+    };
+    const double pre = spread(cloud);
+    consensus_refine(cloud, /*voxel_frac=*/0.1, /*iters=*/12, /*alpha=*/0.6);
+    const double post = spread(cloud);
+    char buf[80]; std::snprintf(buf, sizeof buf, "ghosting %.3f%% -> %.3f%% of extent", 100*pre, 100*post);
+    check("ghosting present before", pre > 0.01);
+    check("consensus_refine de-ghosts", post < 0.4 * pre, buf);
+}
+
+static void test_loop_distribute() {
+    std::printf("test_loop_distribute\n");
+    Mat3 R = rand_rotation();
+    Mat4 D = sim_matrix({ 1.15, R, {0.4, -0.2, 0.1} });
+    // sim4_invert is a true inverse of a similarity 4x4
+    Mat4 S = sim_matrix({ 1.3, rand_rotation(), {1.0, -2.0, 0.5} });
+    Mat4 SI = fsla::mat4_mul(sim4_invert(S), S);
+    bool inv_ok = true; for (int i = 0; i < 16; i++) inv_ok &= std::fabs(SI.a[i] - fsla::mat4_identity().a[i]) < 1e-9;
+    check("sim4_invert is an inverse", inv_ok);
+
+    // a clean loop, drifted by D^{-k/n}; distribute_drift(D) must cancel it.
+    const int n = 8;
+    std::vector<Mat4> poses(n+1);
+    std::vector<Vec3> clean(n+1);
+    for (int k = 0; k <= n; k++) {
+        const double tt = 2*M_PI * k / n;
+        clean[k] = { std::cos(tt), std::sin(tt), 0.1*tt };
+        Vec3 drifted = apply_mat4(sim_frac_power(D, -(double)k/n), clean[k]);
+        Mat4 Pk = fsla::mat4_identity();
+        Pk(0,3)=drifted[0]; Pk(1,3)=drifted[1]; Pk(2,3)=drifted[2];
+        poses[k] = Pk;
+    }
+    std::vector<Vec3> corr = distribute_drift(D, poses);
+    double post = 0;
+    for (int k = 0; k <= n; k++) for (int c = 0; c < 3; c++) post += (corr[k][c]-clean[k][c])*(corr[k][c]-clean[k][c]);
+    char buf[64]; std::snprintf(buf, sizeof buf, "ATE %.1e", std::sqrt(post/(n+1)));
+    check("distribute_drift recovers clean loop", std::sqrt(post/(n+1)) < 1e-9, buf);
+}
+
+// ===========================================================================
+// Regression guards for the gaussian -> .splat seam (the class of bug that
+// dropped rotation, then opacity, from the accumulated cloud writer)
+// ===========================================================================
+
+// #1 guard: pin the shared encoder's byte output. Both writers (single-run and
+// accumulated cloud) call this; if it ever drops a channel or changes the
+// convention the bytes change here. The opacity->alpha byte is the exact field
+// the regression got wrong (it had been forced to 255).
+static void test_splat_record() {
+    std::printf("test_splat_record\n");
+    const float pos[3]   = { 1.0f, 2.0f, 3.0f };
+    const float scale[3] = { 0.01f, 0.02f, 0.03f };
+    const float quat[4]  = { 0.70710678f, 0.70710678f, 0.0f, 0.0f };  // (w,x,y,z), unit
+    const float rgb[3]   = { 0.2f, 0.4f, 0.6f };
+    unsigned char rec[32];
+    free_splatter::encode_splat_record(rec, pos, scale, quat, rgb, 0.5f);
+
+    float p[3], sc[3]; std::memcpy(p, rec, 12); std::memcpy(sc, rec+12, 12);
+    check("pos: OpenCV->GL flips y,z", p[0]==1.0f && p[1]==-2.0f && p[2]==-3.0f);
+    check("scale: passthrough", sc[0]==0.01f && sc[1]==0.02f && sc[2]==0.03f);
+    check("rgb bytes", rec[24]==(unsigned char)(0.2f*255.0f) &&
+                       rec[25]==(unsigned char)(0.4f*255.0f) &&
+                       rec[26]==(unsigned char)(0.6f*255.0f));
+    // THE regression field: opacity must become the alpha, not a forced 255.
+    char buf[48]; std::snprintf(buf, sizeof buf, "alpha=%d (must be 127, not 255)", rec[27]);
+    check("opacity -> alpha (not forced opaque)", rec[27] == 127, buf);
+    // rotation must be encoded with the (w,x,y,z)->(-x,w,-z,y) remap, not dropped.
+    // q=(0.7071,0.7071,0,0) -> (-0.7071,0.7071,0,0) -> bytes (37,218,128,128).
+    check("rotation: remap encoded (not identity)",
+          rec[28]==37 && rec[29]==218 && rec[30]==128 && rec[31]==128);
+    // boundary opacities
+    unsigned char r1[32], r0[32];
+    free_splatter::encode_splat_record(r1, pos, scale, quat, rgb, 1.0f);
+    free_splatter::encode_splat_record(r0, pos, scale, quat, rgb, 0.0f);
+    check("alpha boundaries (1->255, 0->0)", r1[27]==255 && r0[27]==0);
+}
+
+// #2 guard: run-0 accumulation must preserve EVERY gaussian channel. With one pair
+// the global transform is identity, so each AccumPoint should equal the engine's
+// gaussian for that pixel (xyz, SH->rgb, opacity, scale, rotation, frame). A
+// dropped channel — the exact bug — fails here, asset-free.
+static void test_accumulate_channels() {
+    std::printf("test_accumulate_channels\n");
+    const int H = 4, W = 4, gc = 23, P = H*W;
+    std::vector<float> g((size_t) 2 * P * gc, 0.0f);
+    for (int v = 0; v < 2; v++) for (int i = 0; i < P; i++) {
+        float * a = &g[(size_t)(v*P + i) * gc];
+        const float b = (float)(v*100 + i);
+        a[0]=b+0.11f; a[1]=b+0.22f; a[2]=b+0.33f;               // xyz (distinct)
+        a[3]=0.5f; a[4]=-0.5f; a[5]=1.0f;                       // SH-DC -> in-range rgb
+        a[15]=0.2f + 0.5f*((i%3)/2.0f);                         // opacity in {0.2,0.45,0.7} > thr
+        a[16]=0.01f+0.001f*i; a[17]=0.02f; a[18]=0.03f;         // scale (distinct)
+        a[19]=0.8f; a[20]=0.6f; a[21]=0.0f; a[22]=0.0f;         // unit quat (w,x,y,z)
+    }
+    Accumulator acc(H, W, /*opacity_threshold=*/0.05);
+    acc.add_pair(g.data(), gc);
+    const auto & cloud = acc.cloud();
+    check("run-0 cloud = all pixels (both views kept)", cloud.size() == (size_t) 2 * P);
+
+    const double C0 = 0.28209479177387814;
+    bool ok = cloud.size() == (size_t) 2 * P;
+    auto cl = [](float x, float y) { return std::fabs(x - y) < 1e-5f; };
+    for (int v = 0; v < 2 && ok; v++) for (int i = 0; i < P && ok; i++) {
+        const float * a = &g[(size_t)(v*P + i) * gc];
+        const AccumPoint & p = cloud[(size_t) v*P + i];
+        ok &= cl(p.x,a[0]) && cl(p.y,a[1]) && cl(p.z,a[2]);                 // xyz (identity sim)
+        float er=(float)(a[3]*C0+0.5), eg=(float)(a[4]*C0+0.5), eb=(float)(a[5]*C0+0.5);
+        ok &= cl(p.r,er) && cl(p.g,eg) && cl(p.b,eb);                       // SH -> rgb
+        ok &= cl(p.opacity,a[15]);                                          // opacity (the dropped one)
+        ok &= cl(p.sx,a[16]) && cl(p.sy,a[17]) && cl(p.sz,a[18]);           // scale (T.s=1)
+        ok &= cl(p.qw,a[19]) && cl(p.qx,a[20]) && cl(p.qy,a[21]) && cl(p.qz,a[22]); // rotation
+        ok &= (p.frame == v);                                              // source frame
+    }
+    check("every gaussian channel survives run-0 accumulation", ok);
+}
+
+// Golden: parallax_stats on synthetic cameras with hand-computed angles. A
+// lateral (strafe) baseline yields a large triangulation/lateral angle; a pure
+// forward (dolly) baseline of the SAME length yields ~0 — the whole point of the
+// metric (it measures depth-resolving motion, not raw camera displacement).
+static void test_parallax_geometry() {
+    std::printf("test_parallax_geometry\n");
+    const double DEG = 180.0 / 3.14159265358979323846;
+    const Vec3 axis0 = { 0, 0, 1 };                 // view-0 looks down +z
+    // points at depth 1 (a few, all same depth -> clean medians)
+    const int N = 5;
+    std::vector<float> pts(3 * N);
+    for (int i = 0; i < N; i++) { pts[3*i]=0.0f; pts[3*i+1]=0.0f; pts[3*i+2]=1.0f; }
+
+    // (1) pure strafe: baseline 0.1 along x, perpendicular to the optical axis
+    Parallax s = parallax_stats({0,0,0}, {0.1,0,0}, axis0, pts.data(), nullptr, N, 0.0);
+    const double tri_expect = std::atan(0.1) * DEG;            // 5.7106 deg
+    char b1[96]; std::snprintf(b1, sizeof b1, "strafe lateral=%.3f tri=%.3f (expect 90, %.3f)", s.lateral_angle_deg, s.tri_angle_deg, tri_expect);
+    check("strafe -> lateral angle ~90 deg", std::fabs(s.lateral_angle_deg - 90.0) < 1e-3, b1);
+    check("strafe -> tri angle = atan(B/Z)", std::fabs(s.tri_angle_deg - tri_expect) < 1e-3, b1);
+    check("strafe -> B/Z = 0.1", std::fabs(s.baseline_over_depth - 0.1) < 1e-4);
+    check("strafe -> median depth = 1", std::fabs(s.median_depth - 1.0) < 1e-4);
+
+    // (2) pure dolly: SAME baseline length 0.1 but along the optical axis
+    Parallax d = parallax_stats({0,0,0}, {0,0,0.1}, axis0, pts.data(), nullptr, N, 0.0);
+    char b2[96]; std::snprintf(b2, sizeof b2, "dolly lateral=%.3f tri=%.3f (expect ~0)", d.lateral_angle_deg, d.tri_angle_deg);
+    check("dolly -> lateral angle ~0 deg", std::fabs(d.lateral_angle_deg) < 1e-3, b2);
+    check("dolly -> tri angle ~0 deg (no parallax)", std::fabs(d.tri_angle_deg) < 1e-3, b2);
+    check("dolly baseline equals strafe baseline", std::fabs(d.baseline - s.baseline) < 1e-6);
+}
+
+int main() {
+    test_similarity_roundtrip();
+    test_parallax_geometry();
+    test_scale_detection();
+    test_nonlinear_detection();
+    test_outlier_robustness();
+    test_loop_correction();
+    test_loop_closure();
+    test_focal_recovery();
+    test_pnp_recovery();
+    test_pnp_outliers();
+    test_pnp_robust_recovery();
+    test_pnp_robust_planar();
+    test_pnp_robust_outliers();
+    test_accumulate_chain();
+    test_consensus_fuse();
+    test_consensus_refine();
+    test_loop_distribute();
+    test_splat_record();
+    test_accumulate_channels();
+    std::printf(failures ? "\ntest_pose: %d FAILURES\n" : "\ntest_pose: ok\n", failures);
+    return failures ? 1 : 0;
+}
diff --git a/tools/free_splatter-cli.cpp b/tools/free_splatter-cli.cpp
index 495f15a..36aee40 100644
--- a/tools/free_splatter-cli.cpp
+++ b/tools/free_splatter-cli.cpp
@@ -8,6 +8,7 @@
 //                     [--opacity-threshold T] [--max-splats N] [--dump-taps DIR]
 //                     MODEL.gguf  (IMAGES... | INPUT.f32)
 #include "free_splatter.h"
+#include "splat.h"              // the single shared .splat record encoder
 
 #include "stb_image.h"          // implementation in tools/stb_impl.cpp
 #include "stb_image_resize2.h"
@@ -70,42 +71,93 @@ static bool write_splat(const float * g, size_t n, int gc, float opac_thr,
 
     std::ofstream f(path, std::ios::binary);
     if (!f) { std::fprintf(stderr, "cannot write %s\n", path); return false; }
-    auto u8 = [](float v) -> unsigned char { float t = v < 0 ? 0 : v > 255 ? 255 : v; return (unsigned char) t; };
     for (size_t k = 0; k < m; k++) {
         const float * x = &g[keep[k].second * gc];
-        // FreeSplatter's reference frame is OpenCV (y down, z forward); convert
-        // to the viewer's OpenGL convention (y up) via a 180deg rotation about X
-        // = diag(1,-1,-1): position.yz negate, quaternion (w,x,y,z)->(-x,w,-z,y).
-        float pos[3]   = { x[0], -x[1], -x[2] };
-        float scale[3] = { x[16], x[17], x[18] };
-        unsigned char rgba[4], rot[4];
-        for (int c = 0; c < 3; c++) { float v = 0.5f + (float) C0 * x[3+c]; rgba[c] = u8((v<0?0:v>1?1:v) * 255.0f); }
-        rgba[3] = u8(std::min(std::max(x[15], 0.0f), 1.0f) * 255.0f);
-        float q[4] = { -x[20], x[19], -x[22], x[21] };
-        float nrm = std::sqrt(q[0]*q[0]+q[1]*q[1]+q[2]*q[2]+q[3]*q[3]) + 1e-12f;
-        for (int c = 0; c < 4; c++) rot[c] = u8(q[c]/nrm * 128.0f + 128.0f);
-        f.write((const char *) pos, 12);
-        f.write((const char *) scale, 12);
-        f.write((const char *) rgba, 4);
-        f.write((const char *) rot, 4);
+        const float pos[3]   = { x[0], x[1], x[2] };           // OpenCV; the encoder flips y,z
+        const float scale[3] = { x[16], x[17], x[18] };
+        const float quat[4]  = { x[19], x[20], x[21], x[22] };  // (w,x,y,z)
+        const float rgb[3]   = { 0.5f + (float) C0 * x[3], 0.5f + (float) C0 * x[4], 0.5f + (float) C0 * x[5] };
+        unsigned char rec[32];
+        free_splatter::encode_splat_record(rec, pos, scale, quat, rgb, x[15]);
+        f.write((const char *) rec, 32);
     }
     std::printf("wrote %s: %zu splats (pruned/cap of %zu kept)\n", path, m, keep.size());
     return true;
 }
 
+// Write an accumulated gaussian cloud as an antimatter15 .splat, emitting each
+// point's true anisotropic scale + rotation (carried through the Sim(3)) AND its
+// activated opacity as the splat alpha — so it renders exactly like the single-run
+// write_splat (proper alpha blending), not a fully-opaque, swirling, blurry soup.
+// scale_mult scales the radii (1.0 = as-predicted). When capping, keep the most
+// important splats (opacity * volume), matching write_splat (not a uniform stride).
+static bool write_cloud_splat(const free_splatter_point * pts, size_t n, size_t max_splats,
+                              float scale_mult, const char * path) {
+    auto importance = [&](size_t i) -> double {
+        const free_splatter_point & p = pts[i];
+        const double vol = (double) std::max(p.sx,1e-9f) * std::max(p.sy,1e-9f) * std::max(p.sz,1e-9f);
+        return (double) std::max(p.opacity, 0.0f) * vol;
+    };
+    std::vector<size_t> idx(n);
+    for (size_t i = 0; i < n; i++) idx[i] = i;
+    if (max_splats > 0 && n > max_splats) {
+        std::partial_sort(idx.begin(), idx.begin() + max_splats, idx.end(),
+                          [&](size_t a, size_t b){ return importance(a) > importance(b); });
+        idx.resize(max_splats);
+    }
+
+    std::ofstream f(path, std::ios::binary);
+    if (!f) { std::fprintf(stderr, "cannot write %s\n", path); return false; }
+    for (size_t i : idx) {
+        const free_splatter_point & p = pts[i];
+        const float pos[3]   = { p.x, p.y, p.z };               // OpenCV; the encoder flips y,z
+        const float scale[3] = { scale_mult*p.sx, scale_mult*p.sy, scale_mult*p.sz };
+        const float quat[4]  = { p.qw, p.qx, p.qy, p.qz };      // (w,x,y,z)
+        const float rgb[3]   = { p.r, p.g, p.b };
+        unsigned char rec[32];
+        free_splatter::encode_splat_record(rec, pos, scale, quat, rgb, p.opacity);
+        f.write((const char *) rec, 32);
+    }
+    std::printf("wrote %s: %zu splats (of %zu cloud points)\n", path, idx.size(), n);
+    return true;
+}
+
 static int usage(const char * a0) {
     std::fprintf(stderr,
         "usage: %s [--device DEV] [--splat OUT.splat] [--out OUT.f32]\n"
         "          [--opacity-threshold T] [--max-splats N] [--dump-taps DIR]\n"
-        "          MODEL.gguf (IMAGES... | INPUT.f32)\n", a0);
+        "          MODEL.gguf (IMAGES... | INPUT.f32)\n"
+        "  --device defaults to GPU/vulkan and FAILS if none is present; pass\n"
+        "  --device cpu to run on CPU explicitly.\n"
+        "\n"
+        "accumulate mode (>=2 images -> one world from the photo stream):\n"
+        "  %s --accumulate [--splat-prefix P] [--fuse] [--voxel V] [--fuse-k K]\n"
+        "     [--min-parallax DEG] [--splat-scale S] [--max-splats N] MODEL.gguf IMG0 IMG1 ...\n"
+        "  runs the engine on each consecutive pair, chains the runs (Sim(3)) into\n"
+        "  one accumulating cloud; writes P_<nframes>.splat after each pair (the\n"
+        "  evolving reconstruction) and, with --fuse, a consensus-fused P_fused.splat.\n"
+        "  --min-parallax gates by keyframe: a candidate frame is folded in only if its\n"
+        "  triangulation angle vs the last kept frame is >= DEG (else its depth is\n"
+        "  ill-conditioned); image mode only.\n"
+        "\n"
+        "parallax (depth conditioning of one pair, after-inference):\n"
+        "  %s --parallax MODEL.gguf (IMG0 IMG1 | PAIR.f32)\n",
+        a0, a0, a0);
     return 2;
 }
 
 int main(int argc, char ** argv) {
     const char * device = nullptr, * taps_dir = nullptr, * out_path = nullptr, * splat_path = nullptr;
-    const char * model = nullptr;
+    const char * model = nullptr, * splat_prefix = nullptr;
     float opac_thr = 5e-3f;
     long  max_splats = 0;
+    bool  accumulate = false, fuse = false, parallax = false, tree = false, tree_stages = false;
+    int   tov_block = 4, tov_overlap = 2;        // tree: submap width / shared frames (2/1 = overlap-by-one)
+    int   fuse_mode = 1;                         // 0 averaged, 1 kept, 2 best-frame
+    float voxel = 0.02f, splat_scale = 1.0f;   // multiplier on the predicted gaussian radii
+    int   fuse_k = 2;
+    bool  refine = false; int refine_iters = 8; float refine_voxel = 0.03f, refine_alpha = 0.5f;  // geometric de-ghost
+    float min_parallax = 0.0f;                   // deg; >0 gates accumulate by keyframe parallax
     std::vector<std::string> inputs;
 
     for (int i = 1; i < argc; i++) {
@@ -116,6 +168,24 @@ int main(int argc, char ** argv) {
         else if (a == "--splat" && i+1 < argc)          splat_path = argv[++i];
         else if (a == "--opacity-threshold" && i+1<argc) opac_thr = (float) atof(argv[++i]);
         else if (a == "--max-splats" && i+1 < argc)     max_splats = atol(argv[++i]);
+        else if (a == "--accumulate")                   accumulate = true;
+        else if (a == "--accumulate-tree")            { accumulate = true; tree = true; }
+        else if (a == "--tree-stages")                { accumulate = true; tree_stages = true; }
+        else if (a == "--tree-block" && i+1 < argc)     tov_block = atoi(argv[++i]);
+        else if (a == "--tree-overlap" && i+1 < argc)   tov_overlap = atoi(argv[++i]);
+        else if (a == "--parallax")                     parallax = true;
+        else if (a == "--splat-prefix" && i+1 < argc)   splat_prefix = argv[++i];
+        else if (a == "--fuse")                         fuse = true;
+        else if (a == "--voxel" && i+1 < argc)          voxel = (float) atof(argv[++i]);
+        else if (a == "--fuse-k" && i+1 < argc)         fuse_k = atoi(argv[++i]);
+        else if (a == "--fuse-mode" && i+1 < argc) { std::string m = argv[++i];
+            fuse_mode = (m=="averaged") ? 0 : (m=="best") ? 2 : 1; }
+        else if (a == "--refine")                       refine = true;
+        else if (a == "--refine-iters" && i+1 < argc)   refine_iters = atoi(argv[++i]);
+        else if (a == "--refine-voxel" && i+1 < argc)   refine_voxel = (float) atof(argv[++i]);
+        else if (a == "--refine-alpha" && i+1 < argc)   refine_alpha = (float) atof(argv[++i]);
+        else if (a == "--min-parallax" && i+1 < argc)   min_parallax = (float) atof(argv[++i]);
+        else if (a == "--splat-scale" && i+1 < argc)    splat_scale = (float) atof(argv[++i]);
         else if (a == "-h" || a == "--help")            return usage(argv[0]);
         else if (!model)                                model = argv[i];
         else                                            inputs.push_back(argv[i]);
@@ -129,7 +199,12 @@ int main(int argc, char ** argv) {
     free_splatter_options_free(opts);
     if (!ctx) { std::fprintf(stderr, "load: out of memory\n"); return 1; }
     if (const char * err = free_splatter_last_error(ctx)) {
-        std::fprintf(stderr, "load failed: %s\n", err); free_splatter_free(ctx); return 1;
+        std::fprintf(stderr, "load failed: %s\n", err);
+        // Default device is GPU/Vulkan (fail-closed): if none was requested
+        // explicitly, point the user at the CPU opt-in instead of guessing.
+        if (!device) std::fprintf(stderr,
+            "  (no GPU device — pass --device cpu to run on CPU, or build with FREE_SPLATTER_VULKAN)\n");
+        free_splatter_free(ctx); return 1;
     }
 
     free_splatter_geometry geo;
@@ -138,6 +213,247 @@ int main(int argc, char ** argv) {
                 geo.image_width, geo.image_height, geo.in_channels, geo.gaussian_channels);
     if (inputs.empty()) { free_splatter_free(ctx); return 0; }
 
+    // ---- parallax mode: depth-conditioning of a pair (after-inference) --------
+    // Input: exactly two images (engine runs the pair) or one [2,H,W,gc] .f32
+    // gaussian-output dump. Prints the recovered-geometry parallax stats.
+    if (parallax) {
+        const int gc = geo.gaussian_channels;
+        const size_t pair_floats = (size_t) 2 * geo.image_height * geo.image_width * gc;
+        std::vector<float> gbuf;
+        if (inputs.size() == 1 && ends_with(inputs[0], ".f32")) {
+            std::ifstream f(inputs[0], std::ios::binary | std::ios::ate);
+            if (!f) { std::fprintf(stderr, "cannot open %s\n", inputs[0].c_str()); free_splatter_free(ctx); return 1; }
+            const std::streamsize bytes = f.tellg(); f.seekg(0);
+            if ((size_t)(bytes / sizeof(float)) != pair_floats) {
+                std::fprintf(stderr, "dump %s: %zu floats, expected 2*%d*%d*%d\n", inputs[0].c_str(),
+                             (size_t)(bytes/sizeof(float)), geo.image_height, geo.image_width, gc);
+                free_splatter_free(ctx); return 1; }
+            gbuf.resize(pair_floats); f.read((char *) gbuf.data(), bytes);
+        } else if (inputs.size() == 2) {
+            std::vector<float> img;
+            for (const std::string & p : inputs)
+                if (!load_image_chw(p.c_str(), geo.image_width, img)) { free_splatter_free(ctx); return 1; }
+            float * out = nullptr; size_t n_out = 0;
+            if (free_splatter_run(ctx, img.data(), 2, geo.image_height, geo.image_width, &out, &n_out) != 0) {
+                std::fprintf(stderr, "run failed: %s\n", free_splatter_last_error(ctx)); free_splatter_free(ctx); return 1; }
+            gbuf.assign(out, out + n_out); free_splatter_buf_free(out);
+        } else {
+            std::fprintf(stderr, "--parallax needs exactly 2 images or one .f32 pair dump\n");
+            free_splatter_free(ctx); return 2;
+        }
+        free_splatter_parallax px;
+        if (free_splatter_pair_parallax(gbuf.data(), 2, geo.image_height, geo.image_width, gc, opac_thr, &px) != 0) {
+            std::fprintf(stderr, "parallax failed\n"); free_splatter_free(ctx); return 1; }
+        std::printf("parallax: tri_angle=%.3f deg  lateral_angle=%.2f deg  B/Z=%.4f  baseline=%.4f  median_depth=%.4f  focal=%.1f  npts=%d\n",
+                    px.tri_angle_deg, px.lateral_angle_deg, px.baseline_over_depth,
+                    px.baseline, px.median_depth, px.focal, px.n_points);
+        free_splatter_free(ctx);
+        return 0;
+    }
+
+    // ---- accumulate mode: chain a photo stream into one world -----------------
+    // Two input forms: a stream of IMAGES (the engine runs on each consecutive
+    // pair), or pre-computed `.f32` pair dumps (each a [2,H,W,gc] engine output —
+    // the engine is skipped, for fast re-bakes / fusion sweeps off cached runs).
+    if (accumulate) {
+        const int sz = geo.image_width, gc = geo.gaussian_channels;
+        const bool dump_mode = std::all_of(inputs.begin(), inputs.end(),
+                                           [](const std::string & s){ return ends_with(s, ".f32"); });
+        if (!dump_mode && inputs.size() < 2) {
+            std::fprintf(stderr, "--accumulate needs >=2 images (or N .f32 pair dumps)\n"); free_splatter_free(ctx); return 2; }
+        const size_t npairs = dump_mode ? inputs.size() : inputs.size() - 1;
+        const int64_t per_view = (int64_t) geo.in_channels * sz * sz;
+        const size_t  pair_floats = (size_t) 2 * sz * sz * gc;
+
+        // image mode: decode every frame once (CHW per view)
+        std::vector<std::vector<float>> frames;
+        if (!dump_mode)
+            for (const std::string & p : inputs) {
+                std::vector<float> img;
+                if (!load_image_chw(p.c_str(), sz, img)) { free_splatter_free(ctx); return 1; }
+                frames.push_back(std::move(img));
+            }
+
+        free_splatter_accumulator * acc = free_splatter_accumulator_new(sz, sz, opac_thr);
+        if (!acc) { std::fprintf(stderr, "accumulator alloc failed\n"); free_splatter_free(ctx); return 1; }
+
+        // --accumulate-tree: collect the (gated) pair engine outputs and merge them
+        // up a balanced binary tree instead of chaining linearly (less compounded drift).
+        std::vector<std::vector<float>> tree_pairs;
+
+        // emit the accumulated cloud after a pair is folded in (optional de-ghost)
+        auto emit_step = [&](int nframes) {
+            if (!splat_prefix) return;
+            free_splatter_point * cloud = nullptr; size_t nc = 0;
+            free_splatter_accumulator_cloud(acc, &cloud, &nc);
+            if (refine && nframes >= 2)               // geometric de-ghost a copy (internal untouched)
+                free_splatter_refine_cloud(cloud, nc, refine_voxel, refine_iters, refine_alpha);
+            char path[1024];
+            std::snprintf(path, sizeof path, "%s_%d.splat", splat_prefix, nframes);
+            write_cloud_splat(cloud, nc, (size_t) max_splats, splat_scale, path);
+            free_splatter_buf_free(cloud);
+        };
+
+        if (dump_mode) {
+            // fixed pre-computed pairs: cannot re-anchor, so the parallax gate
+            // (which works by re-pairing keyframes) does not apply here.
+            if (min_parallax > 0.0f)
+                std::fprintf(stderr, "note: --min-parallax re-pairs keyframes; ignored for "
+                                     ".f32 dumps (fixed pairs) -- accumulating all.\n");
+            for (size_t k = 0; k < npairs; k++) {
+                std::ifstream f(inputs[k], std::ios::binary | std::ios::ate);
+                if (!f) { std::fprintf(stderr, "cannot open %s\n", inputs[k].c_str()); free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; }
+                const std::streamsize bytes = f.tellg(); f.seekg(0);
+                if ((size_t)(bytes / sizeof(float)) != pair_floats) {
+                    std::fprintf(stderr, "dump %s: %zu floats, expected 2*%d*%d*%d\n", inputs[k].c_str(),
+                                 (size_t)(bytes/sizeof(float)), sz, sz, gc);
+                    free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; }
+                std::vector<float> dumpbuf(pair_floats); f.read((char *) dumpbuf.data(), bytes);
+                if (tree || tree_stages) { tree_pairs.push_back(std::move(dumpbuf)); std::printf("pair %zu collected\n", k); continue; }
+                free_splatter_accumulator_add_pair(acc, dumpbuf.data(), gc);
+                const int nframes = free_splatter_accumulator_frame_count(acc);
+                std::printf("pair %zu -> %d frames\n", k, nframes);
+                emit_step(nframes);
+            }
+        } else {
+            // image mode: keyframe gating. Pair the next candidate against the last
+            // KEPT frame; fold it in only if its parallax clears --min-parallax
+            // (else its depth is ill-conditioned). Threshold 0 -> every consecutive
+            // pair, identical to the ungated path.
+            size_t last_kept = 0; int skipped = 0;
+            for (size_t j = 1; j < frames.size(); j++) {
+                std::vector<float> pair(2 * per_view);
+                std::memcpy(&pair[0],        frames[last_kept].data(), per_view * sizeof(float));
+                std::memcpy(&pair[per_view], frames[j].data(),         per_view * sizeof(float));
+                float * runout = nullptr; size_t ng = 0;
+                if (free_splatter_run(ctx, pair.data(), 2, sz, sz, &runout, &ng) != 0) {
+                    std::fprintf(stderr, "run pair (%zu,%zu) failed: %s\n", last_kept, j, free_splatter_last_error(ctx));
+                    free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; }
+                if (min_parallax > 0.0f) {
+                    free_splatter_parallax px;
+                    free_splatter_pair_parallax(runout, 2, sz, sz, gc, opac_thr, &px);
+                    if (px.tri_angle_deg < min_parallax) {
+                        std::printf("skip frame %zu: parallax %.1f deg < %.1f (vs kept frame %zu)\n",
+                                    j, px.tri_angle_deg, min_parallax, last_kept);
+                        free_splatter_buf_free(runout); skipped++; continue;
+                    }
+                    std::printf("keep frame %zu: parallax %.1f deg (vs kept frame %zu)\n", j, px.tri_angle_deg, last_kept);
+                }
+                if (tree || tree_stages) {
+                    tree_pairs.emplace_back(runout, runout + ng);
+                    free_splatter_buf_free(runout);
+                    std::printf("pair (%zu,%zu) collected\n", last_kept, j);
+                    last_kept = j; continue;
+                }
+                free_splatter_accumulator_add_pair(acc, runout, gc);
+                free_splatter_buf_free(runout);
+                const int nframes = free_splatter_accumulator_frame_count(acc);
+                std::printf("pair (%zu,%zu) -> %d frames\n", last_kept, j, nframes);
+                emit_step(nframes);
+                last_kept = j;
+            }
+            if (skipped) std::printf("gated: skipped %d low-parallax frame(s)\n", skipped);
+        }
+        if (tree || tree_stages) {
+            std::vector<const float *> ps; ps.reserve(tree_pairs.size());
+            for (auto & v : tree_pairs) ps.push_back(v.data());
+            if (ps.empty()) { std::fprintf(stderr, "tree: need >=1 pair\n");
+                free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; }
+            const std::string pfx = splat_prefix ? splat_prefix : "stage";
+
+            if (tree_stages) {
+                // write one laid-out .splat per merge level (stage 0 = the independent
+                // leaf scenes side by side, ... then the single merged world, then a
+                // consensus-fused clean scene) + a manifest the viewer steps through.
+                std::string dir = ".", base = pfx;
+                const size_t slash = pfx.find_last_of('/');
+                if (slash != std::string::npos) { dir = pfx.substr(0, slash); base = pfx.substr(slash + 1); }
+                // per-scene budget: each laid-out scene keeps its own detail (a single
+                // shared cap would starve every scene to ~1/N and fill it with floaters).
+                const int per_node = (max_splats > 0 && max_splats < 200000) ? (int) max_splats : 150000;
+                std::vector<std::string> labels;
+                auto stage_path = [&](char * p, size_t cap) { std::snprintf(p, cap, "%s_%zu.splat", pfx.c_str(), labels.size()); };
+                // multi-scene stages of the multi-frame-overlap tree: each laid-out
+                // scene gets its own per_node budget.
+                for (int L = 0; L <= 64; L++) {
+                    free_splatter_point * sc = nullptr; size_t nsc = 0; int nnodes = 0;
+                    if (free_splatter_tree_overlap(ps.data(), (int) ps.size(), gc, sz, sz,
+                            opac_thr, tov_block, tov_overlap, L, -1.0f /*auto layout*/, per_node, &sc, &nsc, &nnodes) != 0) {
+                        std::fprintf(stderr, "tree-stages failed at level %d\n", L);
+                        free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; }
+                    if (nnodes <= 1) { free_splatter_buf_free(sc); break; }   // single scene: full budget below
+                    char path[1024]; stage_path(path, sizeof path);
+                    write_cloud_splat(sc, nsc, 0 /*already per-node capped*/, splat_scale, path);
+                    free_splatter_buf_free(sc);
+                    std::printf("stage %zu: %d scenes\n", labels.size(), nnodes);
+                    labels.push_back(L == 0 ? (std::to_string(nnodes) + " independent scenes")
+                                   : ("merge " + std::to_string(L) + " — " + std::to_string(nnodes) + " scenes"));
+                }
+                // the single merged world, at full budget: first the raw union (the
+                // overlapping per-frame copies, ghosted), then the consensus-fused
+                // (best-frame) clean scene — same contrast as the linear demo's acc/fused.
+                free_splatter_point * root = nullptr; size_t nroot = 0;
+                if (free_splatter_tree_overlap(ps.data(), (int) ps.size(), gc, sz, sz, opac_thr,
+                                               tov_block, tov_overlap, -1, 0.0f, 0, &root, &nroot, nullptr) == 0) {
+                    char path[1024]; stage_path(path, sizeof path);
+                    write_cloud_splat(root, nroot, (size_t) max_splats, splat_scale, path);
+                    std::printf("stage %zu: merged raw union (%zu pts)\n", labels.size(), nroot);
+                    labels.push_back("merged — one scene (raw union)");
+                    free_splatter_point * fz = nullptr; size_t nfz = 0;
+                    if (free_splatter_fuse_cloud(root, nroot, voxel, fuse_k, 2 /*best-frame*/, &fz, &nfz) == 0 && nfz > 0) {
+                        char path2[1024]; stage_path(path2, sizeof path2);
+                        write_cloud_splat(fz, nfz, (size_t) max_splats, splat_scale, path2);
+                        std::printf("stage %zu: consensus-fused (%zu -> %zu pts)\n", labels.size(), nroot, nfz);
+                        labels.push_back("consensus-fused — one clean scene");
+                        free_splatter_buf_free(fz);
+                    }
+                    free_splatter_buf_free(root);
+                }
+                std::ofstream mf(dir + "/manifest.json");
+                mf << "{ \"reframe\": true, \"steps\": [\n";
+                for (size_t L = 0; L < labels.size(); L++)
+                    mf << "    {\"splat\":\"" << base << "_" << L << ".splat\",\"images\":[],\"n\":" << (L + 1)
+                       << ",\"label\":\"" << labels[L] << "\"}" << (L + 1 < labels.size() ? "," : "") << "\n";
+                mf << "  ] }\n";
+                std::printf("tree-stages: %zu stages -> %s/manifest.json\n", labels.size(), dir.c_str());
+            } else {
+                // one merged world from the overlap tree (block=tov_block, overlap=tov_overlap;
+                // 2/1 = the plain overlap-by-one tree). With --fuse, also the de-ghosted scene.
+                std::printf("tree-accumulate: %zu pairs, block=%d overlap=%d\n", ps.size(), tov_block, tov_overlap);
+                free_splatter_point * tc = nullptr; size_t ntc = 0;
+                if (free_splatter_tree_overlap(ps.data(), (int) ps.size(), gc, sz, sz, opac_thr,
+                                               tov_block, tov_overlap, -1, 0.0f, 0, &tc, &ntc, nullptr) != 0) {
+                    std::fprintf(stderr, "tree accumulate failed\n");
+                    free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; }
+                char path[1024];
+                std::snprintf(path, sizeof path, "%s_tree.splat", pfx.c_str());
+                write_cloud_splat(tc, ntc, (size_t) max_splats, splat_scale, path);
+                if (fuse) {                                  // best-frame consensus, de-ghosted
+                    free_splatter_point * fz = nullptr; size_t nfz = 0;
+                    if (free_splatter_fuse_cloud(tc, ntc, voxel, fuse_k, 2, &fz, &nfz) == 0 && nfz > 0) {
+                        std::snprintf(path, sizeof path, "%s_tree_fused.splat", pfx.c_str());
+                        write_cloud_splat(fz, nfz, (size_t) max_splats, splat_scale, path);
+                        free_splatter_buf_free(fz);
+                    }
+                }
+                free_splatter_buf_free(tc);
+            }
+            free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 0;
+        }
+        if (fuse && splat_prefix) {
+            if (refine) free_splatter_accumulator_refine(acc, refine_voxel, refine_iters, refine_alpha);
+            free_splatter_point * fc = nullptr; size_t nf = 0;
+            free_splatter_accumulator_fuse(acc, voxel, fuse_k, fuse_mode, &fc, &nf);
+            char path[1024];
+            std::snprintf(path, sizeof path, "%s_fused.splat", splat_prefix);
+            write_cloud_splat(fc, nf, (size_t) max_splats, splat_scale, path);
+            free_splatter_buf_free(fc);
+        }
+        free_splatter_accumulator_free(acc);
+        free_splatter_free(ctx);
+        return 0;
+    }
+
     // assemble input: one raw .f32, or several decoded images
     std::vector<float> buf;
     int32_t n_views = 0;
diff --git a/web/README.md b/web/README.md
index 7ca0ec3..8884613 100644
--- a/web/README.md
+++ b/web/README.md
@@ -24,6 +24,66 @@ python3 -m http.server -d web 8000      # then open http://localhost:8000
 
 Load a different file with `?splat=other.splat`.
 
+## Accumulating-reconstruction demo (`accumulate.html`)
+
+A second viewer that plays the **growing world** the pose pipeline builds: the
+cloud reconstructed from 2 photos, then 3, then 4, … as each new view is folded
+in (`free_splatter-cli --accumulate` → cross-run Sim(3) alignment). The input
+photos appear one-per-step in the top-right filmstrip, newest highlighted; the
+WebGL renderer is the same EWA splatting as `index.html`.
+
+```sh
+# bake the demo (engine pass over a photo stream -> a self-contained demo dir)
+nix develop -c scripts/make_accumulate_demo.sh .cache/demo/accumulate \
+    frames/f0000.png frames/f0020.png frames/f0040.png frames/f0060.png ...
+
+# serve it (the bake copies accumulate.html in as index.html)
+python3 -m http.server -d .cache/demo/accumulate 8080   # open http://localhost:8080
+```
+
+URL params: `?ms=2600` step interval, `?start=N` begin at step N, `?auto=0` no
+auto-advance, `?spin=0` no auto-orbit. The bake also writes a consensus-fused
+`acc_fused.splat` (single-view floaters removed — only voxels seen by ≥ K frames
+survive), shown as a final step. `--fuse-mode kept` keeps every raw gaussian in a
+consensus voxel (dense); `best` keeps only the most-confident frame per voxel
+(dense AND de-ghosted); `averaged` collapses each voxel to one denoised point
+(cleaner but sparse — only worthwhile with many overlapping frames).
+
+**De-ghosting.** Pairwise chaining overlays each frame's reconstruction in world
+space, so where two frames disagree about a surface you see doubled ("ghosted")
+copies. The fused step removes them by **best-frame selection** (`--fuse-mode
+best`): within each voxel it keeps only the single most-confident frame's
+gaussians, so disagreeing copies aren't stacked — one frame represents each
+region. This de-ghosts by *selection* (no averaging → no blur), at the cost of
+visible seams where adjacent voxels pick different frames. The two artifact-free
+alternatives are more expensive: fixing the registration (only helps when the
+disagreement is a *rigid* pose drift — usually it isn't, it's per-pair stereo
+depth disagreement baked into the network output, which no rigid alignment can
+undo) or photometric 3DGS re-optimization. A gaussian-level averaging pass
+(`--refine`, **off** by default) exists but is **not recommended**: it averages
+points that already share a voxel — blurring the in-register surface — while
+leaving ghosts that are more than a voxel apart untouched, so it trades sharpness
+for no real de-ghosting.
+
+**Pick frames with lateral baseline.** Two-view depth needs the camera to
+*translate sideways* — a pure forward dolly (the camera moving along the direction
+it faces) gives near-zero parallax and reconstructs blurry. Frames a few apart
+from an orbiting / strafing clip work far better than tightly-spaced frames of a
+walk-forward clip. Rule of thumb: the per-pair lateral baseline should be a few %
+of scene depth or more.
+
+**…or let the gate pick them (`MIN_PARALLAX`).** The bake passes
+`--min-parallax DEG` (default 8°, `MIN_PARALLAX=0` to disable): a candidate frame
+is folded in only if its triangulation angle vs the last *kept* frame clears the
+threshold — otherwise its depth is ill-conditioned and the model invents it.
+Skipped frames re-anchor against the last kept one (keyframe selection), so you can
+feed a long, dense frame stream and let the gate curate the well-conditioned
+subset. The threshold is the engine's *after-inference* angle
+(`free_splatter-cli --parallax …`), which over-reports, so keep it well above
+COLMAP's 1–2°. `scripts/parallax_ref.py` (cv2, dev-shell only) is the independent
+geometric cross-check — agreement on a pair validates it; a large gap (geometric
+angle tiny, model angle confident) flags the model hallucinating depth.
+
 ## Sanity-checking the data separately
 
 The `.splat` uses the antimatter15 layout, so you can verify the *data* (apart