diff --git a/.gitignore b/.gitignore index 97a3841..15c3e90 100644 --- a/.gitignore +++ b/.gitignore @@ -20,6 +20,9 @@ __pycache__/ # reference code (third-party, fetched on demand — never vendored here). .cache/ +# Throwaway working dir for ad-hoc harnesses / scratch (never committed). +/scratchpad/ + # Model + data artifacts (generated; never committed) *.gguf *.safetensors @@ -50,5 +53,6 @@ timeout-* /server/fsserver /server/server -# Demo-video drop folder (user photos; baked into .cache/demo) +# Demo drop folders (user photos / videos; baked into .cache/demo) /demo-photos/ +/demo-vids/ diff --git a/CLAUDE.md b/CLAUDE.md index 1d1ec3e..15bc41a 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,19 +2,46 @@ ## What this project is -A GGML/C++ inference engine for the **neural-network front half** of -TencentARC/FreeSplatter: image patch tokenizer → multi-view self-attention -transformer → per-pixel 3D-Gaussian parameter head. Given N uncalibrated views -it returns, per input pixel, the activated Gaussian parameters. - -**Scope:** pieces 1–3 only. The **rasterizer and the PnP pose solver are OUT OF -SCOPE** — they are a downstream consumer of the `[N, H, W, gaussian_channels]` -tensor this engine produces. Do not add them here. +A GGML/C++ engine for the **neural-network front half** of +TencentARC/FreeSplatter (image patch tokenizer → multi-view self-attention +transformer → per-pixel 3D-Gaussian parameter head) **plus the downstream +camera-pose recovery and cross-run registration that consume its output**. Given +N uncalibrated views it returns, per input pixel, the activated Gaussian +parameters; **PnP** then recovers each view's camera, and successive runs are +aligned into one accumulating world — the path toward live reconstruction from a +moving camera. + +**Scope (updated):** the engine (pieces 1–3), **PnP pose recovery (now IN +scope)**, and the cross-run Sim(3) alignment / accumulation. Keep the seam at the +`[N, H, W, gaussian_channels]` tensor clean — it is the contract between the +engine and the pose consumer. Rendering itself stays in the demo viewer +(Vulkan/WebGL), not the engine. Target checkpoint first: **`freesplatter-scene`** (`gaussian_channels=23`, `sh_residual=true`, black background). The transformer backbone is identical across all three variants, so object/object-2dgs are cheap follow-ons. +## Language & dependency policy + +- **Everything ships in C++** (the engine, PnP, and the alignment/accumulation + once proven). **Go only for the demo web server** — the purego layer that drives + the C API → Vulkan + WebGL viewer. +- **The CLI and the C API must have NO Python dependency at runtime.** Every piece + of current and future functionality is reachable from `free_splatter-cli` and + `include/free_splatter.h` without invoking Python. +- **Python is confined to two places, neither shipped:** + 1. **Dev-time reference / conversion / validation** that runs in + `docker/Dockerfile.cuda` (`scripts/hf_dump.py`, `convert.py`, + `compare_taps.py`, …) — the only place torch runs; never a runtime dependency. + 2. **The `pose/` research prototype — now DONE and DELETED.** It was the + temporary Python (numpy + cv2) prototype that proved the + accumulating-reconstruction approach. That approach is proven, the whole + pipeline (focal, Sim(3) align, robust PnP, accumulation, loop closure, + consensus fusion) is **rewritten in C++** (`src/pose.{h,cpp}`), exposed via + `free_splatter-cli` + `include/free_splatter.h`, and the Python prototype has + been **removed** (see git history for it and its layer-by-layer parity + harnesses). No Python remains in the pose path. + ## Validation is the backbone (non-negotiable) This is a numerical port. Correctness means matching the PyTorch reference @@ -47,11 +74,16 @@ This is a numerical port. Correctness means matching the PyTorch reference ## Per-component discipline -Each component (`image`, `gguf_loader`, `backend`, `model`, the head) has its own +Each component (`image`, `gguf_loader`, `backend`, `model`, the head, and +**`pose` = PnP + focal + Sim(3) alignment + accumulation/loop/fusion**) has its own unit test and is made independently green **before** cross-component parity. Component boundaries match the file layout. Keep the seam at the -`[N,H,W,gaussian_channels]` tensor clean — that is the contract with any -rasterizer. +`[N,H,W,gaussian_channels]` tensor clean — that is the contract between the engine +and the pose/rendering consumers. The C++ `pose` component (`tests/test_pose.cpp`, +asset-free golden tier) carries the parity discipline the Python prototype +established: **bit-exact to upstream `estimate_poses`** and **validated against +independent ground-truth poses** — see git history for the prototype's +`check_upstream_parity.py` / `re10k_experiment.py` harnesses. ## Debugging philosophy (mandatory sequence) diff --git a/CMakeLists.txt b/CMakeLists.txt index 83fabbc..2e757af 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -122,6 +122,7 @@ add_library(free_splatter ${_fs_libtype} src/gguf_loader.cpp src/model.cpp src/image.cpp + src/pose.cpp src/free_splatter.cpp ) target_include_directories(free_splatter PUBLIC include) @@ -144,12 +145,19 @@ if (FREE_SPLATTER_BUILD_TOOLS AND NOT EMSCRIPTEN) endif() if (FREE_SPLATTER_FUZZ) - foreach(target fuzz_image fuzz_options) + foreach(target fuzz_image fuzz_options fuzz_pose) add_executable(${target} fuzz/${target}.cpp) target_link_libraries(${target} PRIVATE free_splatter) target_include_directories(${target} PRIVATE src) target_link_options(${target} PRIVATE -fsanitize=fuzzer) # libFuzzer main endforeach() + # fuzz_decode also drives the vendored (third-party) stb decoder, so it needs + # the stb implementation TU + headers (built warning-quiet like the CLI). + add_executable(fuzz_decode fuzz/fuzz_decode.cpp tools/stb_impl.cpp) + target_link_libraries(fuzz_decode PRIVATE free_splatter) + target_include_directories(fuzz_decode PRIVATE src ${CMAKE_SOURCE_DIR}/third_party/stb) + target_link_options(fuzz_decode PRIVATE -fsanitize=fuzzer) + set_source_files_properties(tools/stb_impl.cpp PROPERTIES COMPILE_OPTIONS "-w") endif() # --- WASM (Emscripten) ------------------------------------------------------- diff --git a/bench/free_splatter-bench.cpp b/bench/free_splatter-bench.cpp index 8f3b165..e505433 100644 --- a/bench/free_splatter-bench.cpp +++ b/bench/free_splatter-bench.cpp @@ -88,7 +88,7 @@ int main(int argc, char ** argv) { } std::printf("model: %s device=%s %dx%d in=%d gc=%d views=%d S=%lld load=%.1f ms\n", - model, device ? device : "cpu", geo.image_width, geo.image_height, + model, device ? device : "vulkan", geo.image_width, geo.image_height, geo.in_channels, geo.gaussian_channels, views, (long long) ((int64_t) views * (geo.image_height/8) * (geo.image_width/8)), load_ms); @@ -124,7 +124,7 @@ int main(int argc, char ** argv) { "(%.2f views/s, %.2f scenes/s)\n", mn, med, mean, mx, vps, 1000.0 / med); std::printf("RESULT engine device=%s views=%d gc=%d load_ms=%.1f min_ms=%.1f " "median_ms=%.1f mean_ms=%.1f max_ms=%.1f views_per_s=%.3f\n", - device ? device : "cpu", views, geo.gaussian_channels, load_ms, + device ? device : "vulkan", views, geo.gaussian_channels, load_ms, mn, med, mean, mx, vps); return 0; } diff --git a/flake.nix b/flake.nix index 7976a20..5a252a5 100644 --- a/flake.nix +++ b/flake.nix @@ -11,7 +11,10 @@ # hf_dump.py) run in docker/Dockerfile.cuda alongside the GPU model. The # M0 gate synthesizes its test GGUF in C++ (tests/test_loader.cpp), so the # build + fast test tier need no Python at all. - pyEnv = pkgs.python3.withPackages (ps: with ps; [ numpy ]); + # opencv4 (cv2): the downstream pose/ prototype mirrors FreeSplatter's + # cv2.solvePnPRansac for the exact-upstream PnP path; numpy-only fallback + # otherwise. Not needed by the engine build/test. + pyEnv = pkgs.python3.withPackages (ps: with ps; [ numpy opencv4 ]); in { devShells.${system} = { diff --git a/fuzz/README.md b/fuzz/README.md index 8789e0e..e36f47d 100644 --- a/fuzz/README.md +++ b/fuzz/README.md @@ -6,8 +6,10 @@ libFuzzer harnesses for the **untrusted** input surfaces, built with ```sh nix develop .#fuzz cmake --preset fuzz && cmake --build --preset fuzz -mkdir -p corpus_img && ./build/fuzz/fuzz_image -max_total_time=60 corpus_img -mkdir -p corpus_opt && ./build/fuzz/fuzz_options -max_total_time=60 corpus_opt +mkdir -p corpus_img && ./build/fuzz/fuzz_image -max_total_time=60 corpus_img +mkdir -p corpus_opt && ./build/fuzz/fuzz_options -max_total_time=60 corpus_opt +mkdir -p corpus_pose && ./build/fuzz/fuzz_pose -max_total_time=60 corpus_pose +mkdir -p corpus_decode && ./build/fuzz/fuzz_decode -max_total_time=60 corpus_decode ``` ## Targets @@ -18,9 +20,20 @@ mkdir -p corpus_opt && ./build/fuzz/fuzz_options -max_total_time=60 corpus_opt crashes, reads OOB, or returns a populated buffer on a rejected input. - **`fuzz_options`** — the options-builder / device-string setters with arbitrary byte strings; asserts no crash and NULL-safe frees. +- **`fuzz_pose`** — the public pose C-API: `free_splatter_estimate_poses` and the + accumulator (`add_pair` → `cloud` / `fuse` / `camera_path`) driven with + arbitrary float gaussian buffers (NaN/Inf/denormals) and fuzz-chosen geometry. + Caught a real SIGFPE (the RANSAC sampler's `% N` with N=0 valid + correspondences) and motivated the non-finite guards in `consensus_fuse` + (the float→int voxel-coordinate cast). +- **`fuzz_decode`** — the untrusted image-FILE path: arbitrary bytes → + `stb_image` → center-crop + resize → CHW (the surface a user photo crosses in + the CLI / demo). stb is vendored THIRD-PARTY; per CLAUDE.md we fuzz the boundary + and would guard (not patch) any stb-internal sanitizer trip — none seen so far. ## Trust boundary (intentional) The **GGUF model file is TRUSTED** and is deliberately **not** fuzzed: it is our -own converter's output, loaded by us. Image inputs and the public C-API string -arguments are **untrusted** and fuzzed. Keep this asymmetry — see `CLAUDE.md`. +own converter's output, loaded by us. Image inputs (encoded files and decoded +pixels) and the public C-API arguments are **untrusted** and fuzzed. Keep this +asymmetry — see `CLAUDE.md`. diff --git a/fuzz/fuzz_decode.cpp b/fuzz/fuzz_decode.cpp new file mode 100644 index 0000000..dcb0a5c --- /dev/null +++ b/fuzz/fuzz_decode.cpp @@ -0,0 +1,39 @@ +// libFuzzer harness for the UNTRUSTED image-FILE decode path: arbitrary bytes -> +// stb_image -> the CLI's center-crop + resize -> [0,1] CHW. This is the actual +// surface a user photo crosses in `free_splatter-cli` / the demo. stb_image is a +// vendored THIRD-PARTY decoder; per CLAUDE.md we fuzz the boundary and, where stb +// itself trips a sanitizer on malformed input, GUARD it rather than patch stb. +#include "stb_image.h" +#include "stb_image_resize2.h" + +#include +#include +#include + +extern "C" int LLVMFuzzerTestOneInput(const uint8_t * data, size_t size) { + int w = 0, h = 0, n = 0; + // Decode arbitrary bytes as an image, forcing RGB (the CLI's contract). + unsigned char * px = stbi_load_from_memory(data, (int) size, &w, &h, &n, 3); + if (!px) return 0; // rejected -> nothing to do + + // Guard against a decoded size that would overflow the crop/resize math (stb + // can report large dims on crafted headers); the CLI works on sane photos. + if ((int64_t) w * h > 64ll * 1024 * 1024 || w <= 0 || h <= 0) { stbi_image_free(px); return 0; } + + // center-crop to a square, then resize to a small size (mirror the CLI). + const int s = w < h ? w : h, left = (w - s) / 2, top = (h - s) / 2; + std::vector sq((size_t) s * s * 3); + for (int y = 0; y < s; y++) + std::memcpy(&sq[(size_t) y * s * 3], &px[((size_t)(top + y) * w + left) * 3], (size_t) s * 3); + stbi_image_free(px); + + const int outsz = 32; + std::vector rz((size_t) outsz * outsz * 3); + stbir_resize_uint8_linear(sq.data(), s, s, 0, rz.data(), outsz, outsz, 0, STBIR_RGB); + + std::vector chw((size_t) 3 * outsz * outsz); + for (int c = 0; c < 3; c++) + for (int i = 0; i < outsz * outsz; i++) + chw[(size_t) c * outsz * outsz + i] = rz[(size_t) i * 3 + c] / 255.0f; + return 0; +} diff --git a/fuzz/fuzz_pose.cpp b/fuzz/fuzz_pose.cpp new file mode 100644 index 0000000..de8a016 --- /dev/null +++ b/fuzz/fuzz_pose.cpp @@ -0,0 +1,67 @@ +// libFuzzer harness for the public pose C-API surface (free_splatter_estimate_poses +// + the accumulator). These now take caller-supplied float buffers across the ABI, +// so a binding that feeds NaN/Inf/degenerate geometry must not crash, read OOB, or +// trip UBSan (e.g. a non-finite float -> int voxel-coord cast). The gaussian buffer +// is engine output, but the C-API is a public boundary; keep it robust. +#include "free_splatter.h" + +#include +#include +#include +#include + +extern "C" int LLVMFuzzerTestOneInput(const uint8_t * data, size_t size) { + if (size < 8) return 0; + + // small, fuzz-driven geometry so many inputs are explored cheaply + const int gc = 23; + const int H = 4 + (data[0] % 5) * 4; // 4,8,12,16,20 + const int W = 4 + (data[1] % 5) * 4; + const int P = H * W; + + auto fill = [&](std::vector & buf, size_t off) { + // reinterpret the fuzz bytes as floats (NaN/Inf/denormals all appear), + // tiling deterministically to fill the whole byte range. + uint8_t * b = (uint8_t *) buf.data(); + const size_t nbytes = buf.size() * sizeof(float); + for (size_t i = 0; i < nbytes; i++) b[i] = data[(off + i) % size]; + }; + + // 1) estimate_poses on a 2-view buffer + { + std::vector g((size_t) 2 * P * gc); + fill(g, 2); + std::vector c2w((size_t) 2 * 16); + float focal = 0; + free_splatter_estimate_poses(g.data(), 2, H, W, gc, 0.05f, c2w.data(), &focal); + } + + // 2) accumulator: add a few pairs, then cloud / fuse / camera_path + free_splatter_accumulator * acc = free_splatter_accumulator_new(H, W, 0.05f); + if (acc) { + const int npairs = 1 + (data[2] % 3); // 1..3 pairs + std::vector g((size_t) 2 * P * gc); + for (int k = 0; k < npairs; k++) { + fill(g, (size_t) 3 + k); + free_splatter_accumulator_add_pair(acc, g.data(), gc); + } + free_splatter_point * cloud = nullptr; size_t nc = 0; + if (free_splatter_accumulator_cloud(acc, &cloud, &nc) == 0) { + free_splatter_refine_cloud(cloud, nc, 0.01f + (data[6] % 8) * 0.02f, 1 + (data[7] % 4), 0.5f); + free_splatter_buf_free(cloud); + } + free_splatter_accumulator_refine(acc, 0.03f, 2, 0.5f); // de-ghost on garbage + + const float voxel = 0.005f + (data[3] % 16) * 0.01f; // 0.005..0.155 + const int k = 1 + (data[4] % 4); // 1..4 + const int mode = data[5] % 3; // averaged / kept / best + free_splatter_point * fused = nullptr; size_t nf = 0; + if (free_splatter_accumulator_fuse(acc, voxel, k, mode, &fused, &nf) == 0) free_splatter_buf_free(fused); + + float * path = nullptr; int32_t nfr = 0; + if (free_splatter_accumulator_camera_path(acc, &path, &nfr) == 0) free_splatter_buf_free(path); + + free_splatter_accumulator_free(acc); + } + return 0; +} diff --git a/include/free_splatter.h b/include/free_splatter.h index 4707170..22ea889 100644 --- a/include/free_splatter.h +++ b/include/free_splatter.h @@ -77,6 +77,132 @@ FREE_SPLATTER_API int free_splatter_run(free_splatter_ctx * ctx, const float * int32_t height, int32_t width, float ** out, size_t * n_out); FREE_SPLATTER_API void free_splatter_buf_free(void * buf); // NULL-safe +// ---- downstream pose recovery + accumulation ---- +// The pose consumer of the engine seam: given the engine's [N,H,W,gaussian_channels] +// output it recovers each view's camera (PnP) and stitches successive runs into one +// accumulating world (cross-run Sim(3)). Dependency-free (no Python, no OpenCV). + +// Recover each view's camera from an engine output buffer. gaussians: +// n_views*height*width*gaussian_channels float32 (the layout free_splatter_run +// returns). cam2world_out: caller-owned n_views*16 float32, filled with a +// row-major 4x4 cam2world per view. If focal_out is non-NULL it receives the +// estimated shared focal. opacity_threshold gates valid pixels (engine output is +// already activated). Returns 0 on success, -1 on failure. +FREE_SPLATTER_API int free_splatter_estimate_poses( + const float * gaussians, int32_t n_views, int32_t height, int32_t width, + int32_t gaussian_channels, float opacity_threshold, + float * cam2world_out, float * focal_out); + +// After-inference parallax: how well a 2-view pair constrains depth, measured +// from the model's OWN recovered geometry. Angles are scale-invariant. +typedef struct { + float tri_angle_deg; // median triangulation angle over confident points + float lateral_angle_deg; // baseline angle off view-0 optical axis (0=dolly, 90=strafe) + float baseline_over_depth; // ||C1-C0|| / median point depth from camera 0 + float baseline; + float median_depth; + float focal; // estimated (or supplied) focal, px + int32_t n_points; // confident points used for the medians +} free_splatter_parallax; + +// Estimate parallax for the first two views of an engine output buffer. +// gaussians: n_views(>=2)*height*width*gaussian_channels f32. Returns 0 on +// success, -1 on bad args. A low tri_angle_deg (< ~1-2 deg) or near-zero +// lateral_angle_deg means depth is ill-conditioned (reconstruction unreliable). +FREE_SPLATTER_API int free_splatter_pair_parallax( + const float * gaussians, int32_t n_views, int32_t height, int32_t width, + int32_t gaussian_channels, float opacity_threshold, free_splatter_parallax * out); + +// One accumulated 3D gaussian in the global frame: position, color in [0,1], the +// anisotropic scale and rotation quaternion (w,x,y,z) — transformed through the +// cross-run Sim(3) so the cloud renders as oriented splats — and the source frame +// it came from (for consensus fusion). +typedef struct { + float x, y, z; + float r, g, b, opacity; + float sx, sy, sz; + float qw, qx, qy, qz; + int32_t frame; +} free_splatter_point; + +// Sliding-window accumulator: feed each consecutive pair's engine output and it +// chains the runs into one world. Opaque handle, not thread-safe. +typedef struct free_splatter_accumulator free_splatter_accumulator; +FREE_SPLATTER_API free_splatter_accumulator * free_splatter_accumulator_new( + int32_t height, int32_t width, float opacity_threshold); +FREE_SPLATTER_API void free_splatter_accumulator_free(free_splatter_accumulator * acc); // NULL-safe + +// Add one pair's engine output: 2*height*width*gaussian_channels float32, where +// view 0 shares a frame with the previous pair's view 1. Returns 0 on success. +FREE_SPLATTER_API int free_splatter_accumulator_add_pair( + free_splatter_accumulator * acc, const float * gaussians, int32_t gaussian_channels); + +// Number of frames accumulated so far (== pairs_added + 1, or 0 before any pair). +FREE_SPLATTER_API int free_splatter_accumulator_frame_count(const free_splatter_accumulator * acc); + +// Copy the current accumulated cloud: *out is malloc'd free_splatter_point[*n_out] +// (free with free_splatter_buf_free). Returns 0 on success, -1 on failure. +FREE_SPLATTER_API int free_splatter_accumulator_cloud( + const free_splatter_accumulator * acc, free_splatter_point ** out, size_t * n_out); + +// Consensus-fuse the current cloud: keep only voxels (size voxel_frac * extent) +// corroborated by >= k distinct source frames (single-frame floaters dropped). mode: +// 0 = averaged (one denoised point per voxel; sparse), 1 = kept (every raw gaussian +// in a consensus voxel; dense), 2 = best (only the most-confident frame's gaussians +// per voxel; dense AND de-ghosted). *out malloc'd as above. Returns 0 on success. +FREE_SPLATTER_API int free_splatter_accumulator_fuse( + const free_splatter_accumulator * acc, float voxel_frac, int32_t k, int32_t mode, + free_splatter_point ** out, size_t * n_out); + +// Copy the global camera trajectory: *out malloc'd (*n_frames)*16 float32, a +// row-major 4x4 cam2world (similarity) per frame. Returns 0 on success. +FREE_SPLATTER_API int free_splatter_accumulator_camera_path( + const free_splatter_accumulator * acc, float ** out, int32_t * n_frames); + +// Geometric de-ghost a cloud in place (gaussian-level consensus refinement): each +// iteration non-rigidly pulls every gaussian a fraction `alpha` toward the +// multi-frame consensus of its (coarse-to-fine voxel_frac) neighbourhood — removing +// the doubled objects the pairwise pose chain leaves. Returns 0 on success. +FREE_SPLATTER_API int free_splatter_refine_cloud( + free_splatter_point * points, size_t n, float voxel_frac, int32_t iters, float alpha); + +// Same, applied to the accumulator's internal cloud (so a subsequent _fuse also +// operates on the de-ghosted cloud). Returns 0 on success, -1 on bad args. +FREE_SPLATTER_API int free_splatter_accumulator_refine( + free_splatter_accumulator * acc, float voxel_frac, int32_t iters, float alpha); + +// Consensus-fuse an arbitrary point cloud (e.g. a tree root) using each point's +// `frame` tag: keep voxels (size voxel_frac*extent) seen by >= k distinct frames, +// emitting per `mode` (0 averaged, 1 kept, 2 best-frame — dense AND de-ghosted). +// *out malloc'd (free with free_splatter_buf_free). Returns 0 on success. +FREE_SPLATTER_API int free_splatter_fuse_cloud( + const free_splatter_point * pts, size_t n, float voxel_frac, int32_t k, int32_t mode, + free_splatter_point ** out, size_t * n_out); + +// Hierarchical (balanced binary tree) accumulation — batch alternative to the +// linear accumulator. pairs[k] is one engine output [2*H*W*gc] (pair k = frames +// k,k+1; pair k's view-1 and pair k+1's view-0 are the same frame). Adjacent +// submaps of `block` frames overlap by `overlap` frames, and each merge fits its +// Sim(3) over all shared frames at once (over-determined -> averages per-frame +// noise) — so drift compounds over ~log2(N) hops, not N, and is spread evenly +// instead of dumped on the last frame. block=2,overlap=1 is the plain +// overlap-by-one (single shared boundary) tree. +// +// The remaining args are for visualising the merge side by side (pass max_levels=-1, +// layout_spacing=0, per_node_cap=0, n_nodes=NULL for a plain full merge to the root): +// max_levels stops after that many rounds (-1 = full), so 0 returns the independent +// base submaps, 1 the first merge round, …; step 0..D to watch the tree collapse to +// one scene. The remaining nodes are laid out side by side (layout_spacing: 0 = none, +// <0 = auto, >0 = explicit units) and each capped to its most important +// (opacity*radius) points (per_node_cap>0) so every scene renders at its own detail. +// *n_nodes (optional) gets the remaining node count (1 once fully merged). *out +// malloc'd as free_splatter_point[*n_out] (free with free_splatter_buf_free). 0 on success. +FREE_SPLATTER_API int free_splatter_tree_overlap( + const float * const * pairs, int32_t n_pairs, int32_t gaussian_channels, + int32_t height, int32_t width, float opacity_threshold, int32_t block, int32_t overlap, + int32_t max_levels, float layout_spacing, int32_t per_node_cap, + free_splatter_point ** out, size_t * n_out, int32_t * n_nodes); + #ifdef __cplusplus } #endif diff --git a/scripts/demo/bake-vids.sh b/scripts/demo/bake-vids.sh new file mode 100755 index 0000000..4c8635e --- /dev/null +++ b/scripts/demo/bake-vids.sh @@ -0,0 +1,65 @@ +#!/usr/bin/env bash +# Auto-discover videos and bake each into an accumulating-reconstruction demo. +# +# Mirrors the drop-a-folder convenience of demo-photos/ (scripts/demo/bake.sh), +# but for VIDEO: drop one folder per scene under demo-vids/ with a clip inside — +# demo-vids/office-corner/clip.mp4 +# demo-vids/flower-bed/clip.mp4 +# — then run this. Each clip is sampled into frames, the parallax gate curates the +# well-conditioned subset (scripts/make_accumulate_demo.sh --min-parallax), and the +# result is a self-contained growing-reconstruction demo (accumulate.html) under +# .cache/demo//. +# +# nix develop -c scripts/demo/bake-vids.sh # bake every demo-vids/ folder +# MIN_PARALLAX=6 MAXFRAMES=30 nix develop -c scripts/demo/bake-vids.sh +# +# Env: VIDS (demo-vids/), OUTBASE (.cache/demo), MAXFRAMES (frames sampled per +# clip, default 24), MIN_PARALLAX (gate degrees, default 8; 0 disables). +set -euo pipefail +cd "$(dirname "$0")/../.." +ROOT=$(pwd) +VIDS=${VIDS:-$ROOT/demo-vids} +OUTBASE=${OUTBASE:-$ROOT/.cache/demo} +MAXFRAMES=${MAXFRAMES:-24} +export MIN_PARALLAX=${MIN_PARALLAX:-8} + +command -v ffmpeg >/dev/null || { echo "ffmpeg required" >&2; exit 1; } +mkdir -p "$VIDS" +[ -f "$VIDS/README.txt" ] || cat >"$VIDS/README.txt" <<'TXT' +Drop one folder per scene here with a video clip inside (a time-lapse / slow pan +works best), then run scripts/demo/bake-vids.sh: + office-corner/ clip.mp4 + flower-bed/ clip.mp4 +Each clip is sampled to frames, the parallax gate keeps the well-conditioned ones, +and the result is a growing-reconstruction demo under .cache/demo//. +TXT + +shopt -s nullglob +baked=() +for D in "$VIDS"/*/; do + D=${D%/}; name=$(basename "$D") + vid=$(find "$D" -maxdepth 1 -type f \( -iname '*.mp4' -o -iname '*.mov' -o -iname '*.webm' -o -iname '*.mkv' -o -iname '*.m4v' \) | sort | head -1) + [ -n "$vid" ] || { echo " no video in $name/ — skipping"; continue; } + safe=$(printf '%s' "$name" | tr -c 'A-Za-z0-9_-' '_') + + # Sample frames: decode all, then take an even stride down to ~MAXFRAMES. (Time- + # lapse clips are short, so decoding all is cheap and avoids frame-count guessing.) + fdir="$OUTBASE/_frames/$safe"; rm -rf "$fdir"; mkdir -p "$fdir" + ffmpeg -y -loglevel error -i "$vid" "$fdir/all%04d.png" + mapfile -t ALL < <(ls "$fdir"/all*.png | sort) + nb=${#ALL[@]} + [ "$nb" -ge 2 ] || { echo " $name: <2 frames decoded — skipping"; continue; } + stride=$(( (nb + MAXFRAMES - 1) / MAXFRAMES )); [ "$stride" -lt 1 ] && stride=1 + FR=(); for ((i=0;i ${#FR[@]} sampled, stride $stride, gate ${MIN_PARALLAX}deg) ==" + + bash scripts/make_accumulate_demo.sh "$OUTBASE/$safe" "${FR[@]}" + baked+=("$safe") +done + +if [ "${#baked[@]}" -eq 0 ]; then + echo "no video folders found under $VIDS (drop demo-vids//.mp4)"; exit 0 +fi +echo +echo "baked ${#baked[@]} demo(s): ${baked[*]}" +echo "serve one with: python3 -m http.server -d $OUTBASE/ 8080 # open http://localhost:8080/" diff --git a/scripts/make_accumulate_demo.sh b/scripts/make_accumulate_demo.sh new file mode 100755 index 0000000..e837418 --- /dev/null +++ b/scripts/make_accumulate_demo.sh @@ -0,0 +1,88 @@ +#!/usr/bin/env bash +# Bake the accumulating-reconstruction demo: run the engine over a stream of +# photos, fold them into one growing world (free_splatter-cli --accumulate), and +# lay out a self-contained web demo — the cloud reconstructed from 2 photos, then +# 3, then 4, …, plus the input thumbnails and the accumulate.html viewer. +# +# MODEL=.cache/freesplatter-scene-f16.gguf \ +# scripts/make_accumulate_demo.sh OUT_DIR FRAME0 FRAME1 FRAME2 ... +# +# Then serve it: cd OUT_DIR && python3 -m http.server 8080 (open index.html) +set -euo pipefail +cd "$(dirname "$0")/.." +ROOT=$(pwd) +MODEL=${MODEL:-.cache/freesplatter-scene-f16.gguf} +MAXSPLATS=${MAXSPLATS:-500000} +# GPU by default (CPU is ~50x slower); DEVICE=cpu is the explicit opt-in. The CLI +# is resolved to the matching build, and the vulkan CLI is built if missing (it +# isn't part of the default CPU build) — mirroring scripts/serve.sh for the .so. +DEVICE=${DEVICE:-vulkan} +if [ "$DEVICE" = cpu ]; then + CLI=${CLI:-$ROOT/build/release/bin/free_splatter-cli} +else + CLI=${CLI:-$ROOT/build/vulkan/bin/free_splatter-cli} + if [ ! -x "$CLI" ]; then + echo "building vulkan CLI ($CLI) ..." >&2 + cmake -S "$ROOT" -B "$ROOT/build/vulkan" -DFREE_SPLATTER_VULKAN=ON -DFREE_SPLATTER_BUILD_SHARED=ON >&2 + cmake --build "$ROOT/build/vulkan" --target free_splatter-cli >&2 + fi +fi + +[ "$#" -ge 3 ] || { echo "usage: $0 OUT_DIR FRAME0 FRAME1 [FRAME2 ...] (>=2 frames)" >&2; exit 1; } +OUT=$1; shift +FRAMES=("$@") +mkdir -p "$OUT" + +# 1) input thumbnails: center-crop square + resize, mirroring the engine's own +# preprocessing so the strip shows exactly the view that was reconstructed. +for i in "${!FRAMES[@]}"; do + ffmpeg -y -loglevel error -i "${FRAMES[$i]}" \ + -vf "crop='min(iw,ih)':'min(iw,ih)',scale=360:360" "$OUT/view_$i.jpg" +done + +# 2) accumulate: one engine pass over the stream writes acc_2.splat, acc_3.splat, +# ... (the cloud after each KEPT photo) + a best-frame consensus acc_fused.splat. +# --min-parallax gates by keyframe: a candidate frame is folded in only if its +# triangulation angle vs the last KEPT frame clears MIN_PARALLAX deg (else its +# depth is ill-conditioned and the model would invent it); skipped frames re- +# anchor against the last kept one. This is the after-inference angle, which the +# model over-reports, so keep it well above COLMAP's 1-2 deg. MIN_PARALLAX=0 +# folds in every frame. --refine (OFF) is the gaussian averaging pass (blurs). +RFLAG=; [ "${REFINE:-0}" = 1 ] && RFLAG=--refine +MINPAR=${MIN_PARALLAX:-8} +PFLAG=; [ "$MINPAR" != "0" ] && PFLAG="--min-parallax $MINPAR" +LOG="$OUT/accumulate.log" +"$CLI" --device "$DEVICE" --accumulate $RFLAG $PFLAG --fuse --fuse-mode best \ + --splat-prefix "$OUT/acc" --max-splats "$MAXSPLATS" "$MODEL" "${FRAMES[@]}" | tee "$LOG" + +# 3) manifest.json: one step per KEPT photo (acc_n.splat + its n kept thumbnails), +# then the consensus-fused step. With gating only the frames that carried enough +# parallax are kept: frame 0 is the anchor and each "keep frame J" line adds +# input frame J. Ungated (MIN_PARALLAX=0), every frame is kept. +if [ -z "$PFLAG" ]; then + kept=(); for i in "${!FRAMES[@]}"; do kept+=("$i"); done +else + kept=(0) + while read -r j; do kept+=("$j"); done < <(grep -oP '^keep frame \K[0-9]+' "$LOG") +fi +nkept=${#kept[@]} +[ "$nkept" -ge 2 ] || { echo "only $nkept frame(s) kept (MIN_PARALLAX=$MINPAR too high?)" >&2; exit 1; } +thumb() { printf '"view_%s.jpg"' "$1"; } # thumbnail referenced by original index +{ + echo '{ "steps": [' + for ((k=2;k<=nkept;k++)); do + imgs="" + for ((j=0;j "$OUT/manifest.json" +echo "kept frames (input indices): ${kept[*]} of ${#FRAMES[@]}" + +# 4) the viewer, self-contained next to its assets +cp "$ROOT/server/web/accumulate.html" "$OUT/index.html" # single source; standalone-fallback to ./manifest.json +echo "demo baked -> $OUT" +echo "serve: (cd $OUT && python3 -m http.server 8080) then open http://localhost:8080/" diff --git a/scripts/parallax_ref.py b/scripts/parallax_ref.py new file mode 100644 index 0000000..eab6bfe --- /dev/null +++ b/scripts/parallax_ref.py @@ -0,0 +1,137 @@ +#!/usr/bin/env python3 +"""Independent (model-free) parallax reference for a frame pair. + +Dev-time validation ONLY (never shipped; runs in the nix devShell, which provides +cv2+numpy per flake.nix). The C++ engine's after-inference `--parallax` measures +depth conditioning from the model's OWN recovered geometry; this measures it from +raw 2D image evidence (feature matches -> epipolar geometry), so the two can be +cross-checked. Agreement on a known-good-depth scene validates the engine metric; +a large gap (geometric angle tiny, model angle confident) flags the model +hallucinating depth. + +Two outputs: + * R_H -- ORB-SLAM homography-vs-fundamental score ratio. ~>0.45 => a homography + explains the motion => pure-rotation / planar => no usable parallax. + Calibration-free. + * median triangulation angle (deg) -- COLMAP's depth-conditioning number; below + ~1-2 deg depth is ill-posed. Uses an assumed/supplied focal. + +Images are center-cropped square and resized to 512 to mirror the engine's +preprocessing, so a focal in 512-px units matches the model's estimated focal. + + nix develop -c python3 scripts/parallax_ref.py IMG0 IMG1 [--focal-px F] +""" +import argparse, sys +import numpy as np +import cv2 + + +def preprocess(path, size=512): + img = cv2.imread(path, cv2.IMREAD_COLOR) + if img is None: + sys.exit(f"cannot read {path}") + h, w = img.shape[:2] + s = min(h, w) + img = img[(h - s) // 2:(h - s) // 2 + s, (w - s) // 2:(w - s) // 2 + s] + return cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA) + + +def match(img0, img1): + g0, g1 = (cv2.cvtColor(i, cv2.COLOR_BGR2GRAY) for i in (img0, img1)) + try: + det = cv2.SIFT_create(nfeatures=4000) + norm = cv2.NORM_L2 + except AttributeError: + det = cv2.ORB_create(nfeatures=4000) + norm = cv2.NORM_HAMMING + k0, d0 = det.detectAndCompute(g0, None) + k1, d1 = det.detectAndCompute(g1, None) + if d0 is None or d1 is None or len(k0) < 8 or len(k1) < 8: + sys.exit("too few features") + bf = cv2.BFMatcher(norm) + good = [m for m, n in bf.knnMatch(d0, d1, k=2) if m.distance < 0.75 * n.distance] + p0 = np.float32([k0[m.queryIdx].pt for m in good]) + p1 = np.float32([k1[m.trainIdx].pt for m in good]) + return p0, p1 + + +def score_homography(H, p0, p1, sigma=1.0): + if H is None: + return 0.0 + th, inv = 5.991, 1.0 / sigma ** 2 + Hi = np.linalg.inv(H) + def transfer(M, a, b): + ah = np.c_[a, np.ones(len(a))] + q = (M @ ah.T).T + q = q[:, :2] / q[:, 2:3] + chi = np.sum((q - b) ** 2, axis=1) * inv + return np.where(chi < th, th - chi, 0.0) + return float(np.sum(transfer(H, p0, p1) + transfer(Hi, p1, p0))) + + +def score_fundamental(F, p0, p1, sigma=1.0): + if F is None: + return 0.0 + thF, thScore, inv = 3.841, 5.991, 1.0 / sigma ** 2 + a0 = np.c_[p0, np.ones(len(p0))] + a1 = np.c_[p1, np.ones(len(p1))] + def epi(F, src, dst): # point-line chi in dst image + l = (F @ src.T).T # lines in dst + num = np.sum(l[:, :2] * dst, axis=1) + l[:, 2] + d2 = num ** 2 / (l[:, 0] ** 2 + l[:, 1] ** 2 + 1e-12) + chi = d2 * inv + return np.where(chi < thF, thScore - chi, 0.0) + s01 = epi(F, a0, p1) + s10 = epi(F.T, a1, p0) + return float(np.sum(s01 + s10)) + + +def triangulation_angle(p0, p1, f, cx=256.0, cy=256.0): + K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1]], np.float64) + E, _ = cv2.findEssentialMat(p0, p1, K, cv2.RANSAC, 0.999, 1.0) + if E is None or E.shape != (3, 3): + return None, 0 + _, R, t, mP = cv2.recoverPose(E, p0, p1, K) + inl = (mP.ravel() > 0) + if inl.sum() < 8: + return None, int(inl.sum()) + P0 = K @ np.eye(3, 4) + P1 = K @ np.hstack([R, t]) + X4 = cv2.triangulatePoints(P0, P1, p0[inl].T, p1[inl].T) + X = (X4[:3] / X4[3]).T + C1 = (-R.T @ t).ravel() + r0 = X - 0.0 + r1 = X - C1 + n0 = np.linalg.norm(r0, axis=1) + n1 = np.linalg.norm(r1, axis=1) + ok = (n0 > 1e-9) & (n1 > 1e-9) & (X[:, 2] > 0) + cosang = np.sum(r0[ok] * r1[ok], axis=1) / (n0[ok] * n1[ok]) + ang = np.degrees(np.arccos(np.clip(cosang, -1, 1))) + if len(ang) == 0: + return None, int(inl.sum()) + return float(np.median(ang)), int(ok.sum()) + + +def main(): + ap = argparse.ArgumentParser() + ap.add_argument("img0"); ap.add_argument("img1") + ap.add_argument("--focal-px", type=float, default=None, + help="focal in 512-px units; default from --fov-deg") + ap.add_argument("--fov-deg", type=float, default=60.0) + a = ap.parse_args() + f = a.focal_px if a.focal_px else 0.5 * 512 / np.tan(np.radians(a.fov_deg) / 2) + + i0, i1 = preprocess(a.img0), preprocess(a.img1) + p0, p1 = match(i0, i1) + H, _ = cv2.findHomography(p0, p1, cv2.RANSAC, 3.0) + F, _ = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC, 3.0, 0.99) + sH, sF = score_homography(H, p0, p1), score_fundamental(F, p0, p1) + rH = sH / (sH + sF) if (sH + sF) > 0 else float("nan") + tri, ntri = triangulation_angle(p0, p1, f) + tri_s = f"{tri:.3f}" if tri is not None else "n/a" + print(f"parallax_ref: matches={len(p0)} R_H={rH:.3f} (>0.45 => degenerate) " + f"tri_angle={tri_s} deg (npts={ntri}) focal={f:.1f}px") + + +if __name__ == "__main__": + main() diff --git a/server/README.md b/server/README.md index e85b60d..ec424e4 100644 --- a/server/README.md +++ b/server/README.md @@ -9,6 +9,10 @@ This is a thin Go HTTP server that binds the `free_splatter` C API with engine — the numerical port, validation, and build details live in the repo root `CLAUDE.md`. +One server, one port. `/` is a **menu** linking to the demo pages, all served from +here: `/reconstruct.html` (photos → splat), `/accumulate.html` (accumulating scenes ++ video upload), `/demo.html` (storyboard). + ## Run ```sh @@ -26,7 +30,26 @@ CGO_ENABLED=0 go run . \ `net/http` the pure-Go resolver so the server builds on a bare `PATH`. Flags: `-addr` `-lib` `-models` (comma list of `name=path.gguf`; the first is the -default) `-device` (`cpu`|`vulkan`|`vulkan:N`) `-max-splats` `-opacity-threshold`. +default) `-device` (`vulkan` by default and **fail-closed** — it errors if no GPU +is present rather than silently using the slow CPU path; pass `cpu` to opt in) +`-max-splats` `-opacity-threshold` `-demo-dir` `-scenes-dir`. + +## Accumulating scenes + video upload (`/accumulate.html`) + +A second page plays the **growing-world** demo: pick a scene from the dropdown and +watch its reconstruction build up as photos are folded in. The scenes are the +accumulating-reconstruction dirs under `-scenes-dir` (default: the `-demo-dir` +value) — any subdir whose `manifest.json` has a `steps` array, produced by +`scripts/demo/bake-vids.sh` or by an upload. `GET /api/scenes` lists them (no +restart needed when new ones appear). + +**Upload a video → it makes the scene.** The page's "make scene from video" control +POSTs a clip to `POST /api/scene/from-video`; the server samples frames, runs the +keyframe **parallax gate** (`--min-parallax`, default 8°) and the accumulate + +best-frame fuse **in-process on the loaded GPU engine** (no shell-out, no second +model load), and writes the scene dir. Progress streams via +`GET /api/scene/status/{job}`; one bake runs at a time (HTTP 429 if busy). The new +scene then appears in the dropdown automatically. Needs `ffmpeg` on `PATH`. ## Models diff --git a/server/convert.go b/server/convert.go index 2655e2b..da0ca52 100644 --- a/server/convert.go +++ b/server/convert.go @@ -142,6 +142,52 @@ func encodeSplat(g []float32, opacThr float32, maxSplats int) ([]byte, int) { return buf, m } +// encodeCloudSplat converts an accumulator cloud ([]point) to antimatter15 .splat +// bytes — mirroring tools/free_splatter-cli.cpp write_cloud_splat + src/splat.h +// encode_splat_record. CRITICAL: unlike encodeSplat (raw gaussians), the cloud's +// R/G/B are ALREADY activated linear colour in [0,1] — do NOT apply the SH-DC +// transform. Importance = opacity*sx*sy*sz; sort+cap only when capping (else keep +// order). Scale is multiplied by scaleMult; pos/quat get the OpenCV->OpenGL flip. +func encodeCloudSplat(pts []point, maxSplats int, scaleMult float32) []byte { + n := len(pts) + idx := make([]int, n) + for i := range idx { + idx[i] = i + } + imp := func(i int) float64 { + p := pts[i] + vol := float64(maxf(p.Sx, 1e-9)) * float64(maxf(p.Sy, 1e-9)) * float64(maxf(p.Sz, 1e-9)) + return float64(maxf(p.Opacity, 0)) * vol + } + m := n + if maxSplats > 0 && n > maxSplats { + sort.Slice(idx, func(a, b int) bool { return imp(idx[a]) > imp(idx[b]) }) + m = maxSplats + idx = idx[:m] + } + buf := make([]byte, m*32) + for k, i := range idx { + p := pts[i] + o := k * 32 + putf(buf[o:], p.X) + putf(buf[o+4:], -p.Y) + putf(buf[o+8:], -p.Z) + putf(buf[o+12:], scaleMult*p.Sx) + putf(buf[o+16:], scaleMult*p.Sy) + putf(buf[o+20:], scaleMult*p.Sz) + buf[o+24] = u8(clamp01(float64(p.R)) * 255.0) + buf[o+25] = u8(clamp01(float64(p.G)) * 255.0) + buf[o+26] = u8(clamp01(float64(p.B)) * 255.0) + buf[o+27] = u8(clamp01(float64(p.Opacity)) * 255.0) + q := [4]float64{-float64(p.Qx), float64(p.Qw), -float64(p.Qz), float64(p.Qy)} + nrm := math.Sqrt(q[0]*q[0]+q[1]*q[1]+q[2]*q[2]+q[3]*q[3]) + 1e-12 + for c := 0; c < 4; c++ { + buf[o+28+c] = u8(q[c]/nrm*128.0 + 128.0) + } + } + return buf +} + func putf(b []byte, v float32) { binary.LittleEndian.PutUint32(b, math.Float32bits(v)) } func maxf(a, b float32) float32 { diff --git a/server/engine.go b/server/engine.go index 6166005..010b675 100644 --- a/server/engine.go +++ b/server/engine.go @@ -22,6 +22,19 @@ type geometry struct { gaussianChannels int32 } +// point mirrors free_splatter_point exactly: 14 float32 + 1 int32 = 60 bytes, all +// 4-byte aligned (no padding). Field ORDER is load-bearing across the C ABI. +type point struct { + X, Y, Z, R, G, B, Opacity, Sx, Sy, Sz, Qw, Qx, Qy, Qz float32 + Frame int32 +} + +// parallax mirrors free_splatter_parallax: 6 float32 + 1 int32 = 28 bytes. +type parallax struct { + TriAngleDeg, LateralAngleDeg, BaselineOverDepth, Baseline, MedianDepth, Focal float32 + NPoints int32 +} + // Lib is the dlopen'd shared library with the C API bound by name. type Lib struct { handle uintptr @@ -36,6 +49,15 @@ type Lib struct { run func(uintptr, unsafe.Pointer, int32, int32, int32, unsafe.Pointer, unsafe.Pointer) int32 bufFree func(uintptr) abiVersion func() int32 + + // accumulator + parallax (for in-process video->scene baking) + accNew func(int32, int32, float32) uintptr + accFree func(uintptr) + accAddPair func(uintptr, unsafe.Pointer, int32) int32 + accFrameCnt func(uintptr) int32 + accCloud func(uintptr, unsafe.Pointer, unsafe.Pointer) int32 + accFuse func(uintptr, float32, int32, int32, unsafe.Pointer, unsafe.Pointer) int32 + parallaxFn func(unsafe.Pointer, int32, int32, int32, int32, float32, unsafe.Pointer) int32 } // Engine is one loaded model (context) belonging to a Lib. @@ -79,6 +101,13 @@ func OpenLib(libPath string) (*Lib, error) { purego.RegisterLibFunc(&l.run, handle, "free_splatter_run") purego.RegisterLibFunc(&l.bufFree, handle, "free_splatter_buf_free") purego.RegisterLibFunc(&l.abiVersion, handle, "free_splatter_abi_version") + purego.RegisterLibFunc(&l.accNew, handle, "free_splatter_accumulator_new") + purego.RegisterLibFunc(&l.accFree, handle, "free_splatter_accumulator_free") + purego.RegisterLibFunc(&l.accAddPair, handle, "free_splatter_accumulator_add_pair") + purego.RegisterLibFunc(&l.accFrameCnt, handle, "free_splatter_accumulator_frame_count") + purego.RegisterLibFunc(&l.accCloud, handle, "free_splatter_accumulator_cloud") + purego.RegisterLibFunc(&l.accFuse, handle, "free_splatter_accumulator_fuse") + purego.RegisterLibFunc(&l.parallaxFn, handle, "free_splatter_pair_parallax") if v := l.abiVersion(); v != 1 { purego.Dlclose(handle) return nil, fmt.Errorf("ABI mismatch: library reports %d, expected 1", v) @@ -149,3 +178,55 @@ func (e *Engine) Close() { e.ctx = 0 } } + +// --- accumulator + parallax helpers (in-process video->scene baking) --- + +// pointsFromC copies a malloc'd free_splatter_point[n] to a Go slice and frees it. +func (l *Lib) pointsFromC(ptr uintptr, n uint64) []point { + if ptr == 0 { + return nil + } + out := make([]point, int(n)) + if n > 0 { + copy(out, unsafe.Slice((*point)(unsafe.Pointer(ptr)), int(n))) + } + l.bufFree(ptr) + return out +} + +// accCloudCopy returns a copy of the accumulator's current cloud. +func (l *Lib) accCloudCopy(acc uintptr) []point { + var ptr uintptr + var n uint64 + if l.accCloud(acc, unsafe.Pointer(&ptr), unsafe.Pointer(&n)) != 0 { + return nil + } + return l.pointsFromC(ptr, n) +} + +// accFuseCopy returns the consensus-fused cloud (voxelFrac, k, mode: 0 avg/1 kept/2 best). +func (l *Lib) accFuseCopy(acc uintptr, voxelFrac float32, k, mode int32) []point { + var ptr uintptr + var n uint64 + if l.accFuse(acc, voxelFrac, k, mode, unsafe.Pointer(&ptr), unsafe.Pointer(&n)) != 0 { + return nil + } + return l.pointsFromC(ptr, n) +} + +// pairParallax computes the after-inference parallax of a 2-view gaussian buffer +// (the layout Engine.Run returns for nViews=2). +func (l *Lib) pairParallax(g []float32, h, w, gc int32, thr float32) (parallax, bool) { + var px parallax + rc := l.parallaxFn(unsafe.Pointer(&g[0]), 2, h, w, gc, thr, unsafe.Pointer(&px)) + runtime.KeepAlive(g) + return px, rc == 0 +} + +// accAddPairSlice folds one pair's gaussians into the accumulator (the C side +// copies them, so g may be reused after). Returns 0 on success. +func (l *Lib) accAddPairSlice(acc uintptr, g []float32, gc int32) int32 { + rc := l.accAddPair(acc, unsafe.Pointer(&g[0]), gc) + runtime.KeepAlive(g) + return rc +} diff --git a/server/main.go b/server/main.go index 8afee0b..981f2a0 100644 --- a/server/main.go +++ b/server/main.go @@ -6,10 +6,14 @@ // cd server && go run . [-models scene=A.gguf,object=B.gguf] [-device vulkan] [-addr :8080] // // REST API: -// GET /api/models -> [{name,label,hint,views,size}, ...] -// POST /api/reconstruct multipart "images" (>=1) + optional "model" -> {id,model,n_views,n_splats,size,seconds} -// GET /api/splat/{id} -> the .splat bytes (importance-sorted, 32B/splat) -// GET / -> embedded frontend (web/) +// GET /api/models -> [{name,label,hint,views,size}, ...] +// POST /api/reconstruct multipart "images" (>=1) + optional "model" -> {id,model,n_views,n_splats,size,seconds} +// GET /api/splat/{id} -> the .splat bytes (importance-sorted, 32B/splat) +// GET /api/scenes -> accumulating-reconstruction scenes [{name,label,steps,thumb,source}, ...] +// POST /api/scene/from-video multipart "video" + "name" -> {job} (gated in-process accumulate) +// GET /api/scene/status/{job} -> {state,total,done,kept,scene,error} +// GET /scenes-assets/... -> a scene's manifest.json + .splat + thumbnails +// GET / -> menu; /reconstruct.html, /accumulate.html, /demo.html are the pages package main import ( @@ -53,12 +57,17 @@ type server struct { opts options bgRemoveCmd string // external matting command ("" = feature disabled) demoDir string // static dir served at /demo-assets/ (manifest + splats + images); "" = off + scenesDir string // accumulating-reconstruction scenes (per-subdir steps-manifest); "" = off workDir string // scratch root for demo video frames + encoded MP4s mu sync.Mutex results map[string][]byte order2 []string counter int64 + + jobsMu sync.Mutex + jobs map[string]*sceneJob + bakeSem chan struct{} // capacity 1: at most one video->scene bake at a time } func respondJSON(w http.ResponseWriter, v any) { @@ -255,6 +264,7 @@ func main() { bgCmd := flag.String("bgremove-cmd", "", "external background-removal command for the object path ({in}/{out} are batch dirs), e.g. 'rembg p -m u2netp {in} {out}'") demoDir := flag.String("demo-dir", "", "directory of baked demo assets (manifest.json + .splat + images) served at /demo-assets/") + scenesDir := flag.String("scenes-dir", "", "root of accumulating-reconstruction scenes (per-subdir steps-manifest); listed at /api/scenes, served at /scenes-assets/, written by video uploads. Default: the -demo-dir value") workDir := flag.String("work-dir", filepath.Join(os.TempDir(), "freesplatter-demo"), "scratch root for demo video frames + encoded MP4s") flag.Parse() @@ -274,13 +284,20 @@ func main() { if *demoDir != "" { demoAbs, _ = filepath.Abs(*demoDir) } + scenesAbs := demoAbs // default: scenes live alongside the baked demo assets + if *scenesDir != "" { + scenesAbs, _ = filepath.Abs(*scenesDir) + } s := &server{ models: map[string]*Engine{}, opts: options{maxSplats: *maxSplats, opacThr: float32(*opacThr)}, bgRemoveCmd: *bgCmd, demoDir: demoAbs, + scenesDir: scenesAbs, workDir: *workDir, results: map[string][]byte{}, + jobs: map[string]*sceneJob{}, + bakeSem: make(chan struct{}, 1), } if *bgCmd != "" { log.Printf("background removal enabled: %s", *bgCmd) @@ -309,10 +326,17 @@ func main() { mux.HandleFunc("/api/demo/frame", s.handleDemoFrame) mux.HandleFunc("/api/demo/encode", s.handleDemoEncode) mux.HandleFunc("/api/demo/video/", s.handleDemoVideo) + mux.HandleFunc("/api/scenes", s.handleScenes) + mux.HandleFunc("/api/scene/from-video", s.handleSceneFromVideo) + mux.HandleFunc("/api/scene/status/", s.handleSceneStatus) if s.demoDir != "" { mux.Handle("/demo-assets/", http.StripPrefix("/demo-assets/", http.FileServer(http.Dir(s.demoDir)))) log.Printf("serving demo assets from %s at /demo-assets/ (work dir %s)", s.demoDir, s.workDir) } + if s.scenesDir != "" { + mux.Handle("/scenes-assets/", http.StripPrefix("/scenes-assets/", http.FileServer(http.Dir(s.scenesDir)))) + log.Printf("serving scenes from %s at /scenes-assets/ (list: /api/scenes)", s.scenesDir) + } mux.Handle("/", http.FileServer(http.FS(sub))) log.Printf("serving %d model(s) %v on http://localhost%s", len(s.order), s.order, *addr) diff --git a/server/scenes.go b/server/scenes.go new file mode 100644 index 0000000..36dd3de --- /dev/null +++ b/server/scenes.go @@ -0,0 +1,364 @@ +// scenes.go — the accumulating-reconstruction side of the web app: list the baked +// "growing world" scenes, and generate a new one from an uploaded video in-process +// (reuse the already-loaded GPU engine, no shell-out, no second model load). +// +// GET /api/scenes -> [{name,label,steps,thumb,source}, ...] +// POST /api/scene/from-video multipart "video" + "name" -> {job} +// GET /api/scene/status/{job} -> {state,total,done,kept,scene,error} +// GET /scenes-assets//... static scene files (manifest.json, *.splat, view_*.jpg) +// +// A scene is a subdir of scenesDir whose manifest.json has a top-level "steps" +// array (acc_2.splat, acc_3.splat, ... + a fused step) — the format produced by +// scripts/make_accumulate_demo.sh and by handleSceneFromVideo here. The viewer +// (server/web/accumulate.html) plays the steps as the reconstruction grows. +package main + +import ( + "encoding/json" + "fmt" + "io" + "net/http" + "os" + "os/exec" + "path/filepath" + "sort" + "strconv" + "strings" +) + +// --- manifest format (shared with server/web/accumulate.html + make_accumulate_demo.sh) --- + +type manifestStep struct { + Splat string `json:"splat"` + Images []string `json:"images"` + N int `json:"n"` + Label string `json:"label,omitempty"` +} +type sceneManifest struct { + Steps []manifestStep `json:"steps"` +} + +// --- listing ------------------------------------------------------------------- + +type sceneInfo struct { + Name string `json:"name"` + Label string `json:"label"` + Steps int `json:"steps"` + Thumb string `json:"thumb"` // /scenes-assets//, for the picker + Source string `json:"source"` +} + +// scanScenes lists the accumulating-reconstruction scenes under scenesDir: each +// immediate subdir with a manifest.json that has a non-empty "steps" array. Cheap +// enough to run per request, so externally-baked and uploaded scenes appear with +// no restart. +func (s *server) scanScenes() []sceneInfo { + if s.scenesDir == "" { + return nil + } + ents, err := os.ReadDir(s.scenesDir) + if err != nil { + return nil + } + out := []sceneInfo{} + for _, e := range ents { + if !e.IsDir() { + continue + } + name := e.Name() + raw, err := os.ReadFile(filepath.Join(s.scenesDir, name, "manifest.json")) + if err != nil { + continue + } + var m sceneManifest + if json.Unmarshal(raw, &m) != nil || len(m.Steps) == 0 { + continue // storyboard manifests ("stations") and malformed dirs are skipped + } + thumb := "" + if len(m.Steps[0].Images) > 0 { + thumb = "/scenes-assets/" + name + "/" + m.Steps[0].Images[0] + } + label := strings.ReplaceAll(name, "-", " ") + out = append(out, sceneInfo{ + Name: name, Label: label, Steps: len(m.Steps), Thumb: thumb, Source: "baked", + }) + } + sort.Slice(out, func(a, b int) bool { return out[a].Name < out[b].Name }) + return out +} + +func (s *server) handleScenes(w http.ResponseWriter, r *http.Request) { + respondJSON(w, map[string]any{"scenes": s.scanScenes()}) +} + +// --- video -> scene (async) ---------------------------------------------------- + +type sceneJob struct { + State string `json:"state"` // "running" | "done" | "error" + Total int `json:"total"` // candidate frames + Done int `json:"done"` // candidates processed + Kept int `json:"kept"` // frames folded in so far + Scene string `json:"scene"` // slug, set on done + Err string `json:"error,omitempty"` +} + +func (s *server) putJob(id string, mutate func(*sceneJob)) { + s.jobsMu.Lock() + defer s.jobsMu.Unlock() + j := s.jobs[id] + if j == nil { + j = &sceneJob{} + s.jobs[id] = j + } + mutate(j) +} + +func (s *server) handleSceneStatus(w http.ResponseWriter, r *http.Request) { + id := strings.TrimPrefix(r.URL.Path, "/api/scene/status/") + s.jobsMu.Lock() + j := s.jobs[id] + var cp sceneJob + if j != nil { + cp = *j + } + s.jobsMu.Unlock() + if j == nil { + http.Error(w, "no such job", http.StatusNotFound) + return + } + respondJSON(w, cp) +} + +// handleSceneFromVideo accepts a clip, returns a job id immediately, and bakes the +// gated accumulating scene in a background goroutine (max one at a time). +func (s *server) handleSceneFromVideo(w http.ResponseWriter, r *http.Request) { + if r.Method != http.MethodPost { + http.Error(w, "POST only", http.StatusMethodNotAllowed) + return + } + if s.scenesDir == "" { + http.Error(w, "no scenes dir configured (-scenes-dir / -demo-dir)", http.StatusServiceUnavailable) + return + } + if err := r.ParseMultipartForm(256 << 20); err != nil { + http.Error(w, "bad multipart form: "+err.Error(), http.StatusBadRequest) + return + } + name, ok := safeToken(r.FormValue("name")) + if !ok { + http.Error(w, "bad or missing scene name (a-z0-9-_ only)", http.StatusBadRequest) + return + } + minParallax := float32(8) + if v, err := strconv.ParseFloat(r.FormValue("min_parallax"), 32); err == nil { + minParallax = float32(v) + } + maxFrames := 24 + if v, err := strconv.Atoi(r.FormValue("max_frames")); err == nil && v >= 2 && v <= 200 { + maxFrames = v + } + file, _, err := r.FormFile("video") + if err != nil { + http.Error(w, `no video (expected multipart field "video")`, http.StatusBadRequest) + return + } + defer file.Close() + + // one bake at a time — a bake monopolizes the single engine for many GPU-seconds. + select { + case s.bakeSem <- struct{}{}: + default: + http.Error(w, "a scene bake is already running; try again shortly", http.StatusTooManyRequests) + return + } + + jobDir := filepath.Join(s.workDir, "scene", name) + if err := os.MkdirAll(jobDir, 0o755); err != nil { + <-s.bakeSem + http.Error(w, "mkdir: "+err.Error(), http.StatusInternalServerError) + return + } + inPath := filepath.Join(jobDir, "input") + in, err := os.Create(inPath) + if err != nil { + <-s.bakeSem + http.Error(w, "save upload: "+err.Error(), http.StatusInternalServerError) + return + } + if _, err := io.Copy(in, file); err != nil { + in.Close() + <-s.bakeSem + http.Error(w, "save upload: "+err.Error(), http.StatusInternalServerError) + return + } + in.Close() + + jobID := name // one bake at a time + scene name as slug -> stable status URL + s.putJob(jobID, func(j *sceneJob) { *j = sceneJob{State: "running"} }) + + go func() { + defer func() { <-s.bakeSem }() + scene, err := s.bakeSceneFromVideo(name, inPath, jobDir, minParallax, maxFrames) + s.putJob(jobID, func(j *sceneJob) { + if err != nil { + j.State, j.Err = "error", err.Error() + return + } + j.State, j.Scene = "done", scene + }) + }() + + respondJSON(w, map[string]any{"job": jobID, "name": name}) +} + +// bakeSceneFromVideo: sample frames -> gated in-process accumulate -> scene dir. +func (s *server) bakeSceneFromVideo(name, inPath, jobDir string, minParallax float32, maxFrames int) (string, error) { + eng := s.models[s.order[0]] // scene model is the default/first + if eng == nil { + return "", fmt.Errorf("no scene model loaded") + } + sz := int(eng.Geo.imageWidth) + gc := int(eng.Geo.gaussianChannels) + + // 1) decode all frames, then even-stride down to <= maxFrames (time-lapse clips + // are short; decoding all avoids frame-count guessing — mirrors bake-vids.sh). + frameDir := filepath.Join(jobDir, "frames") + os.RemoveAll(frameDir) + if err := os.MkdirAll(frameDir, 0o755); err != nil { + return "", err + } + if err := runFFmpeg("-i", inPath, filepath.Join(frameDir, "all%04d.png")); err != nil { + return "", fmt.Errorf("frame extraction: %w", err) + } + all, _ := filepath.Glob(filepath.Join(frameDir, "all*.png")) + sort.Strings(all) + if len(all) < 2 { + return "", fmt.Errorf("video decoded to %d frames (need >= 2)", len(all)) + } + stride := (len(all) + maxFrames - 1) / maxFrames + if stride < 1 { + stride = 1 + } + var pngs []string + for i := 0; i < len(all); i += stride { + pngs = append(pngs, all[i]) + } + + // 2) preprocess sampled frames to model input (CHW [0,1]). + frames := make([][]float32, len(pngs)) + for i, p := range pngs { + data, err := os.ReadFile(p) + if err != nil { + return "", err + } + px, err := preprocessImage(data, sz) + if err != nil { + return "", fmt.Errorf("preprocess %s: %w", filepath.Base(p), err) + } + frames[i] = px + } + s.putJob(name, func(j *sceneJob) { j.Total = len(frames) - 1 }) + + // 3) gated in-process accumulate (mirrors free_splatter-cli.cpp keyframe loop). + acc := eng.lib.accNew(int32(sz), int32(sz), s.opts.opacThr) + if acc == 0 { + return "", fmt.Errorf("accumulator alloc failed") + } + defer eng.lib.accFree(acc) + + sceneDir := filepath.Join(s.scenesDir, name) + if err := os.MkdirAll(sceneDir, 0o755); err != nil { + return "", err + } + // frame 0 is the anchor; thumbnail it as view_0.jpg. + if err := thumbnail(pngs[0], filepath.Join(sceneDir, "view_0.jpg")); err != nil { + return "", err + } + + keptIdx := []int{0} + lastKept := 0 + for j := 1; j < len(frames); j++ { + pair := make([]float32, 0, 2*len(frames[0])) + pair = append(pair, frames[lastKept]...) + pair = append(pair, frames[j]...) + g, err := eng.Run(pair, 2) + if err != nil { + return "", fmt.Errorf("inference pair (%d,%d): %w", lastKept, j, err) + } + if minParallax > 0 { + if px, ok := eng.lib.pairParallax(g, int32(sz), int32(sz), int32(gc), s.opts.opacThr); ok && px.TriAngleDeg < minParallax { + s.putJob(name, func(jb *sceneJob) { jb.Done = j }) + continue // re-anchor: lastKept unchanged, try the next candidate + } + } + if eng.lib.accAddPairSlice(acc, g, int32(gc)) != 0 { + return "", fmt.Errorf("add_pair (%d,%d) failed", lastKept, j) + } + nframes := int(eng.lib.accFrameCnt(acc)) + cloud := eng.lib.accCloudCopy(acc) + splat := encodeCloudSplat(cloud, s.opts.maxSplats, 1.0) + if err := os.WriteFile(filepath.Join(sceneDir, fmt.Sprintf("acc_%d.splat", nframes)), splat, 0o644); err != nil { + return "", err + } + keptIdx = append(keptIdx, j) + if err := thumbnail(pngs[j], filepath.Join(sceneDir, fmt.Sprintf("view_%d.jpg", len(keptIdx)-1))); err != nil { + return "", err + } + lastKept = j + s.putJob(name, func(jb *sceneJob) { jb.Done, jb.Kept = j, len(keptIdx) }) + } + if len(keptIdx) < 2 { + return "", fmt.Errorf("only %d frame kept (min-parallax %.0f too high for this clip?)", len(keptIdx), minParallax) + } + + // 4) consensus-fused surface (best-frame, k=2, voxel 0.02 — same as the CLI bake). + fused := encodeCloudSplat(eng.lib.accFuseCopy(acc, 0.02, 2, 2), s.opts.maxSplats, 1.0) + if err := os.WriteFile(filepath.Join(sceneDir, "acc_fused.splat"), fused, 0o644); err != nil { + return "", err + } + + // 5) manifest: one step per kept frame (acc_2..acc_N) + the fused step. + if err := writeSceneManifest(sceneDir, len(keptIdx)); err != nil { + return "", err + } + return name, nil +} + +// writeSceneManifest emits the steps manifest from `nkept` kept frames, referencing +// view_0.jpg .. view_.jpg per step — identical shape to make_accumulate_demo.sh. +func writeSceneManifest(dir string, nkept int) error { + var m sceneManifest + imgs := func(k int) []string { + out := make([]string, k) + for j := 0; j < k; j++ { + out[j] = fmt.Sprintf("view_%d.jpg", j) + } + return out + } + for k := 2; k <= nkept; k++ { + m.Steps = append(m.Steps, manifestStep{Splat: fmt.Sprintf("acc_%d.splat", k), Images: imgs(k), N: k}) + } + m.Steps = append(m.Steps, manifestStep{ + Splat: "acc_fused.splat", Images: imgs(nkept), N: nkept, + Label: "consensus-fused — best-frame, parallax-gated", + }) + raw, err := json.MarshalIndent(m, "", " ") + if err != nil { + return err + } + return os.WriteFile(filepath.Join(dir, "manifest.json"), raw, 0o644) +} + +// thumbnail center-crops src to a square and scales to 360px, mirroring the engine +// preprocessing (so the filmstrip shows exactly the reconstructed view). +func thumbnail(src, dst string) error { + return runFFmpeg("-i", src, "-vf", "crop='min(iw,ih)':'min(iw,ih)',scale=360:360", dst) +} + +func runFFmpeg(args ...string) error { + full := append([]string{"-y", "-loglevel", "error"}, args...) + if out, err := exec.Command("ffmpeg", full...).CombinedOutput(); err != nil { + return fmt.Errorf("ffmpeg: %v: %s", err, strings.TrimSpace(string(out))) + } + return nil +} diff --git a/server/web/accumulate.html b/server/web/accumulate.html new file mode 100644 index 0000000..5b6692e --- /dev/null +++ b/server/web/accumulate.html @@ -0,0 +1,489 @@ + + + + + + +free-splatter.cpp — accumulating reconstruction + + + + +
+ ‹ all demos +

accumulating reconstruction

+
free-splatter.cpp — one world from a growing photo stream.
+
+
photos folded in0
+
splats0
+
fps
+
starting…
+
+ +
+
+
+ + + +
+
+
+
+
+ +
drag to orbit · scroll to zoom · the cloud grows as each photo is added
+ + + + diff --git a/server/web/demo.html b/server/web/demo.html index f0549c2..ab78579 100644 --- a/server/web/demo.html +++ b/server/web/demo.html @@ -45,6 +45,7 @@
+ ‹ all demos

free-splatter.cpp · demo render

loading storyboard…
diff --git a/server/web/index.html b/server/web/index.html index 2c0cb09..cc73877 100644 --- a/server/web/index.html +++ b/server/web/index.html @@ -1,554 +1,60 @@ -free-splatter.cpp — photos to gaussian splat +free-splatter.cpp — demos - - -
-
-

Photos → 3D gaussian splat

- -
Drop photos of the same scene from different angles — they must overlap. One feed-forward pass on the GPU (Vulkan).
-
-
📷 Drag photos here, or click to choose
-
JPG / PNG · overlapping views
-
- -
- -
- +
+

free-splatter.cpp

+

FreeSplatter, in C++/GGML on the GPU — pick a demo.

-
- -
-

free-splatter.cpp

-
3D gaussian scene from your photos.
-
views
-
splats0
-
reconstruct
-
fps
-
render100%
- - -
- - -
drag to orbit · scroll to zoom · WASD/arrows to move
- - +
one server · Vulkan engine · WebGL render
diff --git a/server/web/reconstruct.html b/server/web/reconstruct.html new file mode 100644 index 0000000..9ab2a73 --- /dev/null +++ b/server/web/reconstruct.html @@ -0,0 +1,556 @@ + + + + + + +free-splatter.cpp — photos to gaussian splat + + + + + +
+
+ ‹ all demos +

Photos → 3D gaussian splat

+ +
Drop photos of the same scene from different angles — they must overlap. One feed-forward pass on the GPU (Vulkan).
+
+
📷 Drag photos here, or click to choose
+
JPG / PNG · overlapping views
+
+ +
+ +
+ +
+
+ +
+ ‹ all demos +

free-splatter.cpp

+
3D gaussian scene from your photos.
+
views
+
splats0
+
reconstruct
+
fps
+
render100%
+ + +
+
+
build-up100%
+ +
+ + +
+
+
+ +
drag to orbit · scroll to zoom · WASD/arrows to move
+ + + + diff --git a/src/free_splatter.cpp b/src/free_splatter.cpp index 3927622..14b8595 100644 --- a/src/free_splatter.cpp +++ b/src/free_splatter.cpp @@ -3,12 +3,16 @@ #include "image.h" #include "model.h" #include "options.h" +#include "pose.h" +#include +#include #include #include #include #include #include +#include // ---- opaque types ---------------------------------------------------------- @@ -90,7 +94,10 @@ free_splatter_ctx * free_splatter_load(const char * gguf_path, const free_splatt if (!ctx) return nullptr; if (opts) ctx->opts = opts->o; - if (!ctx->m.load(gguf_path, ctx->opts.device.empty() ? "cpu" : ctx->opts.device, + // Default to GPU and fail-closed: an unset/empty device resolves to vulkan, so + // a caller who never set one can't silently land on the (~50x slower) CPU path. + // CPU is opt-in only (explicit "cpu"). backend.cpp errors if no GPU is present. + if (!ctx->m.load(gguf_path, ctx->opts.device.empty() ? "vulkan" : ctx->opts.device, ctx->opts.n_threads)) { ctx->error = ctx->m.error; } @@ -156,3 +163,220 @@ int free_splatter_run(free_splatter_ctx * ctx, const float * images, int32_t n_v } void free_splatter_buf_free(void * buf) { free(buf); } + +// ---- downstream pose recovery + accumulation ------------------------------- + +namespace { + +// De-interleave per-view contiguous points (3*P) + opacity (P) from a +// [n_views,H,W,gc] engine buffer. Returns per-view base pointers into `store`. +void deinterleave(const float * g, int nv, int P, int gc, + std::vector> & pts, + std::vector> & ops, + std::vector & pptr, + std::vector & optr) { + pts.assign(nv, {}); ops.assign(nv, {}); + pptr.resize(nv); optr.resize(nv); + for (int v = 0; v < nv; v++) { + pts[v].resize((size_t) 3 * P); ops[v].resize(P); + for (int i = 0; i < P; i++) { + const float * a = &g[(size_t) (v * P + i) * gc]; + pts[v][3*i] = a[0]; pts[v][3*i+1] = a[1]; pts[v][3*i+2] = a[2]; + ops[v][i] = a[15]; + } + pptr[v] = pts[v].data(); optr[v] = ops[v].data(); + } +} + +// Copy a pose vector's cloud into a malloc'd C array (ABI-stable layout). +int emit_points(const std::vector & src, + free_splatter_point ** out, size_t * n_out) { + *out = (free_splatter_point *) malloc(src.size() * sizeof(free_splatter_point)); + if (!*out && !src.empty()) return -1; + for (size_t i = 0; i < src.size(); i++) { + (*out)[i].x = src[i].x; (*out)[i].y = src[i].y; (*out)[i].z = src[i].z; + (*out)[i].r = src[i].r; (*out)[i].g = src[i].g; (*out)[i].b = src[i].b; (*out)[i].opacity = src[i].opacity; + (*out)[i].sx = src[i].sx; (*out)[i].sy = src[i].sy; (*out)[i].sz = src[i].sz; + (*out)[i].qw = src[i].qw; (*out)[i].qx = src[i].qx; (*out)[i].qy = src[i].qy; (*out)[i].qz = src[i].qz; + (*out)[i].frame = src[i].frame; + } + *n_out = src.size(); + return 0; +} + +} // namespace + +struct free_splatter_accumulator { + free_splatter::pose::Accumulator acc; + int gc = 0; + free_splatter_accumulator(int H, int W, double thr) : acc(H, W, thr) {} +}; + +int free_splatter_estimate_poses(const float * gaussians, int32_t n_views, int32_t height, + int32_t width, int32_t gaussian_channels, + float opacity_threshold, float * cam2world_out, float * focal_out) { + if (!gaussians || !cam2world_out || n_views < 1 || height < 1 || width < 1 || gaussian_channels < 16) + return -1; + const int P = height * width; + std::vector> pts, ops; + std::vector pptr, optr; + deinterleave(gaussians, n_views, P, gaussian_channels, pts, ops, pptr, optr); + free_splatter::pose::PoseResult pr = + free_splatter::pose::estimate_poses(pptr, optr, height, width, opacity_threshold); + for (int v = 0; v < n_views; v++) + for (int i = 0; i < 4; i++) for (int j = 0; j < 4; j++) + cam2world_out[v*16 + i*4 + j] = (float) pr.cam2world[v](i, j); + if (focal_out) *focal_out = (float) pr.focal; + return 0; +} + +int free_splatter_pair_parallax(const float * gaussians, int32_t n_views, int32_t height, + int32_t width, int32_t gaussian_channels, + float opacity_threshold, free_splatter_parallax * out) { + if (!gaussians || !out || n_views < 2 || height < 1 || width < 1 || gaussian_channels < 16) + return -1; + const int P = height * width; + std::vector> pts, ops; + std::vector pptr, optr; + deinterleave(gaussians, n_views, P, gaussian_channels, pts, ops, pptr, optr); + free_splatter::pose::Parallax px = + free_splatter::pose::pair_parallax(pptr, optr, height, width, opacity_threshold); + out->tri_angle_deg = (float) px.tri_angle_deg; + out->lateral_angle_deg = (float) px.lateral_angle_deg; + out->baseline_over_depth = (float) px.baseline_over_depth; + out->baseline = (float) px.baseline; + out->median_depth = (float) px.median_depth; + out->focal = (float) px.focal; + out->n_points = px.n_points; + return 0; +} + +free_splatter_accumulator * free_splatter_accumulator_new(int32_t height, int32_t width, + float opacity_threshold) { + if (height < 1 || width < 1) return nullptr; + return new (std::nothrow) free_splatter_accumulator(height, width, opacity_threshold); +} + +void free_splatter_accumulator_free(free_splatter_accumulator * acc) { delete acc; } + +int free_splatter_accumulator_add_pair(free_splatter_accumulator * acc, const float * gaussians, + int32_t gaussian_channels) { + if (!acc || !gaussians || gaussian_channels < 16) return -1; + acc->gc = gaussian_channels; + const free_splatter::pose::ChainLink link = acc->acc.add_pair(gaussians, gaussian_channels); + if (std::getenv("FREE_SPLATTER_DEBUG_LINKS")) { + // Surface the per-link registration diagnostics + the COMPOUNDED global drift + // (open-loop chain T_k = T_{k-1}.sim, so per-link error accumulates). + const auto & g = link.global; + const double tn = std::sqrt(g.t[0]*g.t[0] + g.t[1]*g.t[1] + g.t[2]*g.t[2]); + double c = (g.R(0,0) + g.R(1,1) + g.R(2,2) - 1.0) * 0.5; + c = c > 1.0 ? 1.0 : (c < -1.0 ? -1.0 : c); + std::fprintf(stderr, + "[link] pair=%d local: s=%.4f inlier=%.3f valid=%.3f resid=%.4f | " + "global: s=%.4f rot=%.2fdeg |t|=%.4f\n", + acc->acc.pair_count(), link.sim.s, link.inlier_frac, link.valid_frac, + link.resid_frac, g.s, std::acos(c) * 57.29577951308232, tn); + } + return 0; +} + +int free_splatter_accumulator_frame_count(const free_splatter_accumulator * acc) { + return acc ? acc->acc.frame_count() : 0; +} + +int free_splatter_accumulator_cloud(const free_splatter_accumulator * acc, + free_splatter_point ** out, size_t * n_out) { + if (out) *out = nullptr; + if (n_out) *n_out = 0; + if (!acc || !out || !n_out) return -1; + return emit_points(acc->acc.cloud(), out, n_out); +} + +int free_splatter_accumulator_fuse(const free_splatter_accumulator * acc, float voxel_frac, + int32_t k, int32_t mode, free_splatter_point ** out, size_t * n_out) { + if (out) *out = nullptr; + if (n_out) *n_out = 0; + if (!acc || !out || !n_out || k < 1 || !(voxel_frac > 0)) return -1; + std::vector fused; + free_splatter::pose::consensus_fuse(acc->acc.cloud(), voxel_frac, k, fused, mode); + return emit_points(fused, out, n_out); +} + +int free_splatter_refine_cloud(free_splatter_point * points, size_t n, + float voxel_frac, int32_t iters, float alpha) { + if (!points || n == 0 || !(voxel_frac > 0) || iters < 1) return -1; + // free_splatter_point and pose::AccumPoint are the same trivially-copyable layout + // (xyz, rgba/opacity, scale, quat, frame); copy into the typed vector and back. + std::vector cloud(n); + std::memcpy(cloud.data(), points, n * sizeof(free_splatter_point)); + free_splatter::pose::consensus_refine(cloud, voxel_frac, iters, alpha); + std::memcpy(points, cloud.data(), n * sizeof(free_splatter_point)); + return 0; +} + +int free_splatter_accumulator_refine(free_splatter_accumulator * acc, float voxel_frac, + int32_t iters, float alpha) { + if (!acc || !(voxel_frac > 0) || iters < 1) return -1; + acc->acc.refine(voxel_frac, iters, alpha); + return 0; +} + +int free_splatter_fuse_cloud(const free_splatter_point * pts, size_t n, float voxel_frac, + int32_t k, int32_t mode, free_splatter_point ** out, size_t * n_out) { + if (out) *out = nullptr; + if (n_out) *n_out = 0; + if (!pts || !out || !n_out || k < 1 || !(voxel_frac > 0)) return -1; + std::vector cloud(n); + for (size_t i = 0; i < n; i++) { + const free_splatter_point & p = pts[i]; free_splatter::pose::AccumPoint & a = cloud[i]; + a.x = p.x; a.y = p.y; a.z = p.z; a.r = p.r; a.g = p.g; a.b = p.b; a.opacity = p.opacity; + a.sx = p.sx; a.sy = p.sy; a.sz = p.sz; a.qw = p.qw; a.qx = p.qx; a.qy = p.qy; a.qz = p.qz; + a.frame = p.frame; + } + std::vector fused; + free_splatter::pose::consensus_fuse(cloud, voxel_frac, k, fused, mode); + return emit_points(fused, out, n_out); +} + +int free_splatter_tree_overlap(const float * const * pairs, int32_t n_pairs, int32_t gaussian_channels, + int32_t height, int32_t width, float opacity_threshold, + int32_t block, int32_t overlap, + int32_t max_levels, float layout_spacing, int32_t per_node_cap, + free_splatter_point ** out, size_t * n_out, int32_t * n_nodes) { + if (out) *out = nullptr; + if (n_out) *n_out = 0; + if (n_nodes) *n_nodes = 0; + if (!pairs || n_pairs < 1 || gaussian_channels < 16 || height < 1 || width < 1 || !out || !n_out) + return -1; + for (int32_t i = 0; i < n_pairs; i++) if (!pairs[i]) return -1; + std::vector ps(pairs, pairs + n_pairs); + std::vector merges; + int nn = 0; + std::vector cloud = + free_splatter::pose::tree_accumulate_overlap(ps, height, width, gaussian_channels, opacity_threshold, + block, overlap, 0.02, 300, 0, &merges, + max_levels, layout_spacing, &nn, per_node_cap); + if (n_nodes) *n_nodes = nn; + if (std::getenv("FREE_SPLATTER_DEBUG_LINKS")) + for (const auto & m : merges) + std::fprintf(stderr, + "[tree] L%d range[%d..%d] sharedframes=%d s=%.4f inlier=%.3f valid=%.3f resid=%.4f\n", + m.level, m.lo_frame, m.hi_frame, m.shared_frame, + m.scale, m.inlier_frac, m.valid_frac, m.resid_frac); + return emit_points(cloud, out, n_out); +} + +int free_splatter_accumulator_camera_path(const free_splatter_accumulator * acc, + float ** out, int32_t * n_frames) { + if (out) *out = nullptr; + if (n_frames) *n_frames = 0; + if (!acc || !out || !n_frames) return -1; + std::vector path = acc->acc.camera_path(); + *out = (float *) malloc(path.size() * 16 * sizeof(float)); + if (!*out && !path.empty()) return -1; + for (size_t f = 0; f < path.size(); f++) + for (int i = 0; i < 4; i++) for (int j = 0; j < 4; j++) + (*out)[f*16 + i*4 + j] = (float) path[f](i, j); + *n_frames = (int32_t) path.size(); + return 0; +} diff --git a/src/linalg.h b/src/linalg.h new file mode 100644 index 0000000..a510471 --- /dev/null +++ b/src/linalg.h @@ -0,0 +1,213 @@ +// linalg.h — self-contained small dense linear algebra for the pose component. +// +// The downstream pose math (Umeyama similarity, DLT PnP, pose decode) needs only +// a handful of small dense operations. Rather than pull in Eigen/OpenCV — neither +// of which we want as a runtime dependency (see CLAUDE.md: everything ships in +// C++ with no extra runtime deps beyond ggml) — every routine here reduces to one +// primitive: a symmetric cyclic-Jacobi eigensolver. SVD of a 3x3 is built as the +// eigendecomposition of MᵀM; the DLT nullspace is the smallest eigenvector of +// AᵀA. This mirrored the dependency-free numpy reference solver of the (now +// removed; see git history) pose/ prototype, itself verified bit-for-bit (~1e-7) +// against cv2. +// +// Everything is f64, row-major, and header-only. Sizes are tiny (<=12) so the +// O(n^3) Jacobi sweeps are negligible next to the engine forward pass. +#ifndef FREE_SPLATTER_LINALG_H +#define FREE_SPLATTER_LINALG_H + +#include +#include +#include +#include +#include + +namespace fsla { + +using Vec3 = std::array; + +// Row-major fixed matrices. m[r][c] lives at a[r*N + c]. +struct Mat3 { double a[9]; double operator()(int r, int c) const { return a[r * 3 + c]; } + double & operator()(int r, int c) { return a[r * 3 + c]; } }; +struct Mat4 { double a[16]; double operator()(int r, int c) const { return a[r * 4 + c]; } + double & operator()(int r, int c) { return a[r * 4 + c]; } }; + +inline Mat3 mat3_identity() { return Mat3{ {1,0,0, 0,1,0, 0,0,1} }; } +inline Mat4 mat4_identity() { return Mat4{ {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1} }; } + +inline Mat3 mat3_mul(const Mat3 & A, const Mat3 & B) { + Mat3 C{}; + for (int i = 0; i < 3; i++) + for (int j = 0; j < 3; j++) { + double s = 0; + for (int k = 0; k < 3; k++) s += A(i, k) * B(k, j); + C(i, j) = s; + } + return C; +} + +inline Mat3 mat3_transpose(const Mat3 & A) { + Mat3 T{}; + for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) T(i, j) = A(j, i); + return T; +} + +inline Vec3 mat3_apply(const Mat3 & A, const Vec3 & x) { + return { A(0,0)*x[0] + A(0,1)*x[1] + A(0,2)*x[2], + A(1,0)*x[0] + A(1,1)*x[1] + A(1,2)*x[2], + A(2,0)*x[0] + A(2,1)*x[1] + A(2,2)*x[2] }; +} + +inline double det3(const Mat3 & M) { + return M(0,0) * (M(1,1)*M(2,2) - M(1,2)*M(2,1)) + - M(0,1) * (M(1,0)*M(2,2) - M(1,2)*M(2,0)) + + M(0,2) * (M(1,0)*M(2,1) - M(1,1)*M(2,0)); +} + +inline Mat3 inv3(const Mat3 & M) { + const double d = det3(M); + const double id = (std::fabs(d) > 0) ? 1.0 / d : 0.0; + Mat3 R{}; + R(0,0) = (M(1,1)*M(2,2) - M(1,2)*M(2,1)) * id; + R(0,1) = -(M(0,1)*M(2,2) - M(0,2)*M(2,1)) * id; + R(0,2) = (M(0,1)*M(1,2) - M(0,2)*M(1,1)) * id; + R(1,0) = -(M(1,0)*M(2,2) - M(1,2)*M(2,0)) * id; + R(1,1) = (M(0,0)*M(2,2) - M(0,2)*M(2,0)) * id; + R(1,2) = -(M(0,0)*M(1,2) - M(0,2)*M(1,0)) * id; + R(2,0) = (M(1,0)*M(2,1) - M(1,1)*M(2,0)) * id; + R(2,1) = -(M(0,0)*M(2,1) - M(0,1)*M(2,0)) * id; + R(2,2) = (M(0,0)*M(1,1) - M(0,1)*M(1,0)) * id; + return R; +} + +// Inverse of a rigid 4x4 [[R,t],[0,1]] (R orthonormal): [[Rᵀ, -Rᵀt],[0,1]]. Exact +// and cheap; used for cam2world = inv(world2cam) where world2cam is rigid. +inline Mat4 inv_rigid4(const Mat4 & M) { + Mat4 R = mat4_identity(); + for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) R(i, j) = M(j, i); // Rᵀ + for (int i = 0; i < 3; i++) + R(i, 3) = -(R(i,0)*M(0,3) + R(i,1)*M(1,3) + R(i,2)*M(2,3)); + return R; +} + +inline Mat4 mat4_mul(const Mat4 & A, const Mat4 & B) { + Mat4 C{}; + for (int i = 0; i < 4; i++) + for (int j = 0; j < 4; j++) { + double s = 0; + for (int k = 0; k < 4; k++) s += A(i, k) * B(k, j); + C(i, j) = s; + } + return C; +} + +// --- symmetric eigensolver ------------------------------------------------- +// +// Cyclic Jacobi for a symmetric n x n matrix (Golub & Van Loan, Alg. 8.4.x). +// Ain is row-major and assumed symmetric. Writes n eigenvalues to `eval` and the +// corresponding eigenvectors as the COLUMNS of `evec` (evec[i*n+j] = component i +// of eigenvector j). Output is NOT sorted. Converges in a handful of sweeps for +// n <= 12; we cap at 100 as a safety net. +inline void jacobi_eigh(const double * Ain, int n, double * eval, double * evec) { + std::vector A(Ain, Ain + (size_t) n * n); + for (int i = 0; i < n * n; i++) evec[i] = 0.0; + for (int i = 0; i < n; i++) evec[i * n + i] = 1.0; + + for (int sweep = 0; sweep < 100; sweep++) { + double off = 0.0; + for (int p = 0; p < n; p++) + for (int q = p + 1; q < n; q++) off += A[p*n+q] * A[p*n+q]; + if (off <= 1e-300) break; + + for (int p = 0; p < n; p++) { + for (int q = p + 1; q < n; q++) { + const double apq = A[p*n+q]; + if (std::fabs(apq) <= 1e-300) continue; + const double phi = 0.5 * (A[q*n+q] - A[p*n+p]) / apq; // = tau + const double t = (phi >= 0 ? 1.0 : -1.0) / + (std::fabs(phi) + std::sqrt(phi*phi + 1.0)); + const double c = 1.0 / std::sqrt(t*t + 1.0); + const double s = t * c; + // A <- Jᵀ A J : rotate columns p,q then rows p,q (keeps symmetry). + for (int i = 0; i < n; i++) { + const double aip = A[i*n+p], aiq = A[i*n+q]; + A[i*n+p] = c*aip - s*aiq; + A[i*n+q] = s*aip + c*aiq; + } + for (int i = 0; i < n; i++) { + const double api = A[p*n+i], aqi = A[q*n+i]; + A[p*n+i] = c*api - s*aqi; + A[q*n+i] = s*api + c*aqi; + } + for (int i = 0; i < n; i++) { // accumulate eigenvectors V <- V J + const double vip = evec[i*n+p], viq = evec[i*n+q]; + evec[i*n+p] = c*vip - s*viq; + evec[i*n+q] = s*vip + c*viq; + } + } + } + } + for (int i = 0; i < n; i++) eval[i] = A[i*n+i]; +} + +// Smallest-eigenvalue eigenvector of a symmetric n x n matrix (the homogeneous +// least-squares / DLT nullspace). Returns the n-vector. +inline std::vector smallest_eigenvector(const double * A, int n) { + std::vector eval(n), evec((size_t) n * n); + jacobi_eigh(A, n, eval.data(), evec.data()); + int k = 0; + for (int i = 1; i < n; i++) if (eval[i] < eval[k]) k = i; + std::vector v(n); + for (int i = 0; i < n; i++) v[i] = evec[i*n + k]; + return v; +} + +// --- 3x3 SVD --------------------------------------------------------------- +// +// M = U diag(s) Vᵀ with s sorted DESCENDING (numpy/LAPACK convention). Built from +// the eigendecomposition of the symmetric S = MᵀM: its eigenvectors are V, the +// singular values are sqrt(eigenvalues), and U_i = M v_i / s_i. A (near-)zero +// singular direction is completed by Gram-Schmidt so U stays orthonormal. This is +// all the SVD the similarity fit and the PnP decode need. +inline void svd3(const Mat3 & M, Mat3 & U, Vec3 & s, Mat3 & V) { + Mat3 S = mat3_mul(mat3_transpose(M), M); // 3x3 symmetric PSD + double eval[3], evec[9]; + jacobi_eigh(S.a, 3, eval, evec); + + int ord[3] = { 0, 1, 2 }; // sort eigenpairs descending + std::sort(ord, ord + 3, [&](int x, int y) { return eval[x] > eval[y]; }); + + for (int j = 0; j < 3; j++) { + const int e = ord[j]; + s[j] = std::sqrt(std::max(eval[e], 0.0)); + for (int i = 0; i < 3; i++) V(i, j) = evec[i*3 + e]; + } + // Force a right-handed V (proper basis); flip the last column if reflected. + if (det3(V) < 0) for (int i = 0; i < 3; i++) V(i, 2) = -V(i, 2); + + for (int j = 0; j < 3; j++) { + Vec3 vj = { V(0,j), V(1,j), V(2,j) }; + Vec3 uj = mat3_apply(M, vj); + const double nrm = std::sqrt(uj[0]*uj[0] + uj[1]*uj[1] + uj[2]*uj[2]); + if (nrm > 1e-12) for (int i = 0; i < 3; i++) U(i, j) = uj[i] / nrm; + else for (int i = 0; i < 3; i++) U(i, j) = 0.0; // fixed up below + } + // Complete any degenerate U columns via Gram-Schmidt against the filled ones. + for (int j = 0; j < 3; j++) { + double nrm = std::sqrt(U(0,j)*U(0,j) + U(1,j)*U(1,j) + U(2,j)*U(2,j)); + if (nrm > 1e-9) continue; + Vec3 cand[3] = { {1,0,0}, {0,1,0}, {0,0,1} }; + for (auto & e : cand) { + for (int k = 0; k < 3; k++) { + if (k == j) continue; + const double d = e[0]*U(0,k) + e[1]*U(1,k) + e[2]*U(2,k); + e[0] -= d*U(0,k); e[1] -= d*U(1,k); e[2] -= d*U(2,k); + } + const double en = std::sqrt(e[0]*e[0] + e[1]*e[1] + e[2]*e[2]); + if (en > 1e-6) { U(0,j)=e[0]/en; U(1,j)=e[1]/en; U(2,j)=e[2]/en; break; } + } + } +} + +} // namespace fsla + +#endif // FREE_SPLATTER_LINALG_H diff --git a/src/options.h b/src/options.h index e531e29..4bf8d1f 100644 --- a/src/options.h +++ b/src/options.h @@ -7,7 +7,7 @@ namespace free_splatter { struct options { - std::string device = "cpu"; // "" | cpu | gpu | cuda | vulkan [:N] + std::string device = "vulkan"; // "" | cpu | gpu | cuda | vulkan [:N]; default GPU, fail-closed int n_threads = 0; // <= 0 => auto (CPU) std::string dump_taps_dir; // empty => tap dumping disabled }; diff --git a/src/pose.cpp b/src/pose.cpp new file mode 100644 index 0000000..7ba152e --- /dev/null +++ b/src/pose.cpp @@ -0,0 +1,1512 @@ +// pose.cpp — C++ port of the (now removed) pose/ research prototype: focal + +// Umeyama/Sim(3) align + robust PnP + sliding-window accumulation / loop closure +// / consensus fusion. See git history for the Python prototype it was validated +// against, layer by layer. +// +// Faithful to FreeSplatter's scene recipe; the dependency-free solver is what +// ships here (the prototype's numpy reference was verified ~1e-7 vs cv2 and +// bit-exact vs upstream estimate_poses on real engine output). All linear algebra +// goes through the self-contained Jacobi eigensolver in linalg.h — no Eigen, no +// OpenCV. +#include "pose.h" + +#include +#include +#include +#include + +namespace free_splatter { +namespace pose { + +using fsla::smallest_eigenvector; +using fsla::svd3; +using fsla::det3; +using fsla::inv3; +using fsla::mat3_mul; +using fsla::mat3_transpose; +using fsla::mat3_apply; +using fsla::mat3_identity; +using fsla::mat4_identity; +using fsla::inv_rigid4; + +namespace { + +// Deterministic SplitMix64 — so RANSAC is reproducible across platforms (we don't +// need numpy's PCG64 bitstream, only a fixed inlier-converging sampler). +struct Rng { + uint64_t s; + explicit Rng(uint64_t seed) : s(seed) {} + uint64_t next() { + uint64_t z = (s += 0x9E3779B97F4A7C15ULL); + z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL; + z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL; + return z ^ (z >> 31); + } + int bounded(int n) { return (int) (next() % (uint64_t) n); } +}; + +void sample_distinct(Rng & rng, int N, int k, int * out) { + for (int i = 0; i < k; i++) { + int v; + bool dup; + do { + v = rng.bounded(N); + dup = false; + for (int j = 0; j < i; j++) if (out[j] == v) { dup = true; break; } + } while (dup); + out[i] = v; + } +} + +double rms3(const double * r, int N) { // sqrt(mean over rows of ||row||^2) + double s = 0; + for (int i = 0; i < N; i++) s += r[3*i]*r[3*i] + r[3*i+1]*r[3*i+1] + r[3*i+2]*r[3*i+2]; + return std::sqrt(s / (double) N); +} + +void mean3(const double * X, int N, double * m) { + m[0] = m[1] = m[2] = 0; + for (int i = 0; i < N; i++) { m[0] += X[3*i]; m[1] += X[3*i+1]; m[2] += X[3*i+2]; } + m[0] /= N; m[1] /= N; m[2] /= N; +} + +// Solve A X = B in place (A n x n row-major, B n x m row-major) by Gaussian +// elimination with partial pivoting. Result overwrites B. Used for the affine +// normal equations in the residual ladder. +void solve_lin(double * A, int n, double * B, int m) { + for (int col = 0; col < n; col++) { + int piv = col; + for (int r = col + 1; r < n; r++) + if (std::fabs(A[r*n+col]) > std::fabs(A[piv*n+col])) piv = r; + if (piv != col) { + for (int c = 0; c < n; c++) std::swap(A[col*n+c], A[piv*n+c]); + for (int c = 0; c < m; c++) std::swap(B[col*m+c], B[piv*m+c]); + } + const double d = A[col*n+col]; + if (std::fabs(d) < 1e-300) continue; + for (int r = 0; r < n; r++) { + if (r == col) continue; + const double f = A[r*n+col] / d; + if (f == 0) continue; + for (int c = col; c < n; c++) A[r*n+c] -= f * A[col*n+c]; + for (int c = 0; c < m; c++) B[r*m+c] -= f * B[col*m+c]; + } + } + for (int col = 0; col < n; col++) { + const double d = A[col*n+col]; + if (std::fabs(d) > 1e-300) for (int c = 0; c < m; c++) B[col*m+c] /= d; + } +} + +Mat3 rodrigues(const Vec3 & axis, double phi) { // rotation about unit axis by phi + const double cphi = std::cos(phi), sphi = std::sin(phi), v = 1.0 - cphi; + const double x = axis[0], y = axis[1], z = axis[2]; + Mat3 R{}; + R(0,0) = cphi + x*x*v; R(0,1) = x*y*v - z*sphi; R(0,2) = x*z*v + y*sphi; + R(1,0) = y*x*v + z*sphi; R(1,1) = cphi + y*y*v; R(1,2) = y*z*v - x*sphi; + R(2,0) = z*x*v - y*sphi; R(2,1) = z*y*v + x*sphi; R(2,2) = cphi + z*z*v; + return R; +} + +} // namespace + +// ---- focal ---------------------------------------------------------------- + +double estimate_focal(const double * pts3d, const double * pixels, int N, + double ppx, double ppy, int weiszfeld_iters) { + std::vector P(2 * N), U(2 * N), pu(N), uu(N); + for (int i = 0; i < N; i++) { + const double X = pts3d[3*i], Y = pts3d[3*i+1], Z = pts3d[3*i+2]; + double ux = (Z != 0.0) ? X / Z : 0.0; + double uy = (Z != 0.0) ? Y / Z : 0.0; + if (!std::isfinite(ux)) ux = 0.0; // mirror np.nan_to_num(posinf/neginf->0) + if (!std::isfinite(uy)) uy = 0.0; + const double Px = pixels[2*i] - ppx; + const double Py = pixels[2*i+1] - ppy; + P[2*i] = Px; P[2*i+1] = Py; + U[2*i] = ux; U[2*i+1] = uy; + pu[i] = Px*ux + Py*uy; + uu[i] = ux*ux + uy*uy; + } + double spu = 0, suu = 0; + for (int i = 0; i < N; i++) { spu += pu[i]; suu += uu[i]; } + double f = (suu != 0.0) ? spu / suu : 0.0; + for (int it = 0; it < weiszfeld_iters; it++) { + double num = 0, den = 0; + for (int i = 0; i < N; i++) { + const double dx = P[2*i] - f*U[2*i], dy = P[2*i+1] - f*U[2*i+1]; + const double r = std::sqrt(dx*dx + dy*dy); + const double w = 1.0 / std::max(r, 1e-8); + num += w * pu[i]; + den += w * uu[i]; + } + f = (den != 0.0) ? num / den : f; + } + return f; +} + +// ---- similarity ----------------------------------------------------------- + +Sim3 sim_identity() { return { 1.0, mat3_identity(), {0,0,0} }; } + +Vec3 sim_apply(const Sim3 & T, const Vec3 & x) { + Vec3 Rx = mat3_apply(T.R, x); + return { T.s*Rx[0] + T.t[0], T.s*Rx[1] + T.t[1], T.s*Rx[2] + T.t[2] }; +} + +Sim3 sim_compose(const Sim3 & T2, const Sim3 & T1) { // apply T1 then T2 + Sim3 r; + r.s = T2.s * T1.s; + r.R = mat3_mul(T2.R, T1.R); + Vec3 R2t1 = mat3_apply(T2.R, T1.t); + r.t = { T2.s*R2t1[0] + T2.t[0], T2.s*R2t1[1] + T2.t[1], T2.s*R2t1[2] + T2.t[2] }; + return r; +} + +Sim3 sim_invert(const Sim3 & T) { + Sim3 r; + r.s = 1.0 / T.s; + r.R = mat3_transpose(T.R); + Vec3 Rit = mat3_apply(r.R, T.t); + r.t = { -r.s*Rit[0], -r.s*Rit[1], -r.s*Rit[2] }; + return r; +} + +Mat4 sim_matrix(const Sim3 & T) { + Mat4 M = mat4_identity(); + for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) M(i, j) = T.s * T.R(i, j); + for (int i = 0; i < 3; i++) M(i, 3) = T.t[i]; + return M; +} + +Sim3 fit_similarity(const double * X, const double * Y, int N, bool with_scale) { + double mx[3], my[3]; + mean3(X, N, mx); + mean3(Y, N, my); + // Sigma = (Ycᵀ Xc)/n, cross-covariance mapping X -> Y; var_x = mean ||Xc||^2. + Mat3 Sigma{}; + double var_x = 0; + for (int i = 0; i < N; i++) { + const double xc[3] = { X[3*i]-mx[0], X[3*i+1]-mx[1], X[3*i+2]-mx[2] }; + const double yc[3] = { Y[3*i]-my[0], Y[3*i+1]-my[1], Y[3*i+2]-my[2] }; + for (int r = 0; r < 3; r++) for (int c = 0; c < 3; c++) Sigma(r, c) += yc[r]*xc[c]; + var_x += xc[0]*xc[0] + xc[1]*xc[1] + xc[2]*xc[2]; + } + for (int i = 0; i < 9; i++) Sigma.a[i] /= N; + var_x /= N; + + Mat3 U, V; Vec3 D; + svd3(Sigma, U, D, V); // Sigma = U diag(D) Vᵀ + double sgn = (det3(U) * det3(V) < 0) ? -1.0 : 1.0; + // R = U diag(1,1,sgn) Vᵀ + Mat3 US = U; + for (int r = 0; r < 3; r++) US(r, 2) *= sgn; + Mat3 R = mat3_mul(US, mat3_transpose(V)); + + double s = 1.0; + if (with_scale && var_x > 0) + s = (D[0] + D[1] + sgn * D[2]) / var_x; + + Vec3 mxv = { mx[0], mx[1], mx[2] }; + Vec3 Rmx = mat3_apply(R, mxv); + Sim3 T; + T.s = s; T.R = R; + T.t = { my[0] - s*Rmx[0], my[1] - s*Rmx[1], my[2] - s*Rmx[2] }; + return T; +} + +Sim3 fit_similarity_ransac(const double * X, const double * Y, int N, double thresh, + int iters, std::vector & inliers, + bool with_scale, uint64_t seed) { + // RANSAC's minimal sample is 3 pairs; with fewer there is nothing to sample + // (and the sampler would divide by N). Degenerate but valid at the public + // boundary (e.g. an image pair with no overlapping valid pixels): take all + // points as inliers, and a plain fit when there is at least one. + if (N < 3) { + inliers.assign(N, 1); + return (N >= 1) ? fit_similarity(X, Y, N, with_scale) : sim_identity(); + } + Rng rng(seed); + std::vector best(N, 0); + int best_cnt = 0; + std::vector xs(9), ys(9); // 3-point minimal samples + for (int it = 0; it < iters; it++) { + int idx[3]; + sample_distinct(rng, N, 3, idx); + for (int j = 0; j < 3; j++) + for (int c = 0; c < 3; c++) { xs[3*j+c] = X[3*idx[j]+c]; ys[3*j+c] = Y[3*idx[j]+c]; } + Sim3 T = fit_similarity(xs.data(), ys.data(), 3, with_scale); + int cnt = 0; + for (int i = 0; i < N; i++) { + Vec3 xi = { X[3*i], X[3*i+1], X[3*i+2] }; + Vec3 p = sim_apply(T, xi); + const double dx = p[0]-Y[3*i], dy = p[1]-Y[3*i+1], dz = p[2]-Y[3*i+2]; + if (std::sqrt(dx*dx + dy*dy + dz*dz) < thresh) cnt++; + } + if (cnt > best_cnt) { best_cnt = cnt; best.assign(N, 0); + for (int i = 0; i < N; i++) { + Vec3 xi = { X[3*i], X[3*i+1], X[3*i+2] }; + Vec3 p = sim_apply(T, xi); + const double dx = p[0]-Y[3*i], dy = p[1]-Y[3*i+1], dz = p[2]-Y[3*i+2]; + best[i] = (std::sqrt(dx*dx + dy*dy + dz*dz) < thresh) ? 1 : 0; + } + } + } + if (best_cnt < 3) std::fill(best.begin(), best.end(), 1); + // refit on the inlier set + std::vector xi, yi; + for (int i = 0; i < N; i++) if (best[i]) { + for (int c = 0; c < 3; c++) { xi.push_back(X[3*i+c]); yi.push_back(Y[3*i+c]); } + } + inliers = best; + return fit_similarity(xi.data(), yi.data(), (int) (xi.size()/3), with_scale); +} + +// affine Y ~= A X + b (12-DoF), returns prediction rms only via caller; here we +// fill `pred` (N*3). Normal equations on Xh=[X,1]. +static double affine_residual(const double * X, const double * Y, int N) { + double G[16] = {0}; // XhᵀXh (4x4) + double B[12] = {0}; // XhᵀY (4x3) + for (int i = 0; i < N; i++) { + const double xh[4] = { X[3*i], X[3*i+1], X[3*i+2], 1.0 }; + for (int r = 0; r < 4; r++) { + for (int c = 0; c < 4; c++) G[r*4+c] += xh[r]*xh[c]; + for (int c = 0; c < 3; c++) B[r*3+c] += xh[r]*Y[3*i+c]; + } + } + solve_lin(G, 4, B, 3); // B now holds M (4x3): rows are coeffs + double s = 0; + for (int i = 0; i < N; i++) { + const double xh[4] = { X[3*i], X[3*i+1], X[3*i+2], 1.0 }; + for (int c = 0; c < 3; c++) { + double p = 0; + for (int r = 0; r < 4; r++) p += xh[r]*B[r*3+c]; + const double d = p - Y[3*i+c]; + s += d*d; + } + } + return std::sqrt(s / (double) N); +} + +Ladder diagnose(const double * X, const double * Y, int N, double tol, double corr_tol) { + Ladder L{}; + double my[3]; mean3(Y, N, my); + std::vector yc(3 * N); + for (int i = 0; i < N; i++) { yc[3*i]=Y[3*i]-my[0]; yc[3*i+1]=Y[3*i+1]-my[1]; yc[3*i+2]=Y[3*i+2]-my[2]; } + L.scene = rms3(yc.data(), N); + + Sim3 Tr = fit_similarity(X, Y, N, /*with_scale=*/false); + Sim3 Ts = fit_similarity(X, Y, N, /*with_scale=*/true); + std::vector res_r(3*N), res_s(3*N); + for (int i = 0; i < N; i++) { + Vec3 xi = { X[3*i], X[3*i+1], X[3*i+2] }; + Vec3 pr = sim_apply(Tr, xi), ps = sim_apply(Ts, xi); + for (int c = 0; c < 3; c++) { res_r[3*i+c] = pr[c]-Y[3*i+c]; res_s[3*i+c] = ps[c]-Y[3*i+c]; } + } + L.rigid = rms3(res_r.data(), N); + L.similarity = rms3(res_s.data(), N); + L.affine = affine_residual(X, Y, N); + L.scale = Ts.s; + + // depth_corr: Pearson(||X-mean||, ||similarity residual||) + double mx[3]; mean3(X, N, mx); + std::vector depth(N), rmag(N); + for (int i = 0; i < N; i++) { + const double dx = X[3*i]-mx[0], dy = X[3*i+1]-mx[1], dz = X[3*i+2]-mx[2]; + depth[i] = std::sqrt(dx*dx + dy*dy + dz*dz); + rmag[i] = std::sqrt(res_s[3*i]*res_s[3*i] + res_s[3*i+1]*res_s[3*i+1] + res_s[3*i+2]*res_s[3*i+2]); + } + double md = 0, mr = 0; + for (int i = 0; i < N; i++) { md += depth[i]; mr += rmag[i]; } + md /= N; mr /= N; + double cov = 0, vd = 0, vr = 0; + for (int i = 0; i < N; i++) { cov += (depth[i]-md)*(rmag[i]-mr); vd += (depth[i]-md)*(depth[i]-md); vr += (rmag[i]-mr)*(rmag[i]-mr); } + L.depth_corr = (vr > 1e-24) ? cov / std::sqrt(vd*vr) : 0.0; + + const double sc = (L.scene > 0) ? L.scene : 1.0; + const double rr = L.rigid/sc, rs = L.similarity/sc, ra = L.affine/sc; + L.aff_gain = (L.similarity > 0) ? (L.similarity - L.affine)/L.similarity : 0.0; + L.structured = std::fabs(L.depth_corr) > corr_tol || L.aff_gain > 0.25; + if (rr < tol) L.verdict = "rigid_ok"; + else if (rs < tol) L.verdict = "needs_scale"; + else if (ra < tol) L.verdict = "needs_affine"; + else if (L.structured) L.verdict = "nonlinear"; + else L.verdict = "similarity_plus_noise"; + return L; +} + +LoopError loop_closure_error(const std::vector & links) { + Sim3 T = sim_identity(); + for (const Sim3 & Ti : links) T = sim_compose(Ti, T); + LoopError e; + e.scale_err = std::fabs(std::log(T.s)); + const double tr = T.R(0,0) + T.R(1,1) + T.R(2,2); + e.rot_deg = std::acos(std::max(-1.0, std::min(1.0, (tr - 1.0)/2.0))) * 180.0 / M_PI; + e.trans = std::sqrt(T.t[0]*T.t[0] + T.t[1]*T.t[1] + T.t[2]*T.t[2]); + return e; +} + +// M^f for a Sim(3) 4x4 [[sR,t],[0,1]], closed form via the group's one-parameter +// subgroup: A^f = s^f R^f (R^f = same axis, f*angle), translation by +// (A^f - I)(A - I)^{-1} t (the analytic continuation of the integer-power +// geometric series). Equals numpy's eig-based principal value while angle < 180. +Mat4 sim_frac_power(const Mat4 & M, double f) { + Mat3 A{}; + Vec3 t{}; + for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) A(i, j) = M(i, j); t[i] = M(i, 3); } + const double s = std::cbrt(det3(A)); + Mat3 R = A; + for (int i = 0; i < 9; i++) R.a[i] /= s; + const double tr = R(0,0) + R(1,1) + R(2,2); + const double ang = std::acos(std::max(-1.0, std::min(1.0, (tr - 1.0)/2.0))); + + Mat3 Rf; + if (ang < 1e-12) { + Rf = mat3_identity(); + } else { + const double sa = std::sin(ang); + Vec3 axis = { (R(2,1)-R(1,2))/(2*sa), (R(0,2)-R(2,0))/(2*sa), (R(1,0)-R(0,1))/(2*sa) }; + Rf = rodrigues(axis, f * ang); + } + const double sf = std::pow(s, f); + Mat3 Af = Rf; + for (int i = 0; i < 9; i++) Af.a[i] *= sf; + + Mat4 out = mat4_identity(); + for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) out(i, j) = Af(i, j); + + // translation: (Af - I)(A - I)^{-1} t, with the A->I limit handled as f*t. + Mat3 AmI = A; + for (int i = 0; i < 3; i++) AmI(i, i) -= 1.0; + if (std::fabs(det3(AmI)) < 1e-12) { + for (int i = 0; i < 3; i++) out(i, 3) = f * t[i]; + } else { + Mat3 AfmI = Af; + for (int i = 0; i < 3; i++) AfmI(i, i) -= 1.0; + Mat3 K = mat3_mul(AfmI, inv3(AmI)); + Vec3 tf = mat3_apply(K, t); + for (int i = 0; i < 3; i++) out(i, 3) = tf[i]; + } + return out; +} + +// ---- PnP ------------------------------------------------------------------ + +namespace { + +// DLT: smallest right singular vector of the 2N-row system, via the smallest +// eigenvector of AᵀA (12x12). idx selects the k points used. Returns the 3x4 +// matrix p (row-major) up to sign/scale. +std::array dlt(const double * Xw, const double * xn, const int * idx, int k) { + double AtA[144] = {0}; + for (int j = 0; j < k; j++) { + const int i = idx[j]; + const double X = Xw[3*i], Y = Xw[3*i+1], Z = Xw[3*i+2]; + const double u = xn[2*i], v = xn[2*i+1]; + const double ra[12] = { X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u }; + const double rb[12] = { 0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v }; + for (int r = 0; r < 12; r++) + for (int c = 0; c < 12; c++) AtA[r*12+c] += ra[r]*ra[c] + rb[r]*rb[c]; + } + std::vector v = smallest_eigenvector(AtA, 12); + std::array p; + for (int i = 0; i < 12; i++) p[i] = v[i]; + return p; +} + +struct RT { Mat3 R; Vec3 t; double err; bool ok; }; + +// Resolve the DLT sign and recover a proper (R,t) world2cam, picking the sign with +// lower reprojection error among cheirality-valid candidates (points in front). +RT decode(const std::array & p, const double * Xw, const double * xn, + const int * idx, int k) { + RT best; best.ok = false; best.err = 1e300; + for (double sgn : { 1.0, -1.0 }) { + Mat3 M{}; + Vec3 tau{}; + for (int r = 0; r < 3; r++) { for (int c = 0; c < 3; c++) M(r, c) = sgn*p[r*4+c]; tau[r] = sgn*p[r*4+3]; } + Mat3 U, V; Vec3 Sv; + svd3(M, U, Sv, V); + const double d = (det3(U) * det3(V) < 0) ? -1.0 : 1.0; + Mat3 US = U; + for (int r = 0; r < 3; r++) US(r, 2) *= d; + Mat3 R = mat3_mul(US, mat3_transpose(V)); // nearest proper rotation + const double lam = (Sv[0] + Sv[1] + Sv[2]) / 3.0; + if (lam < 1e-12) continue; + Vec3 t = { tau[0]/lam, tau[1]/lam, tau[2]/lam }; + + double err = 0; bool behind = false; + for (int j = 0; j < k; j++) { + const int i = idx[j]; + Vec3 xi = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] }; + Vec3 Xc = mat3_apply(R, xi); + Xc[0] += t[0]; Xc[1] += t[1]; Xc[2] += t[2]; + if (Xc[2] <= 0) { behind = true; break; } + const double px = Xc[0]/Xc[2], py = Xc[1]/Xc[2]; + const double dx = px - xn[2*i], dy = py - xn[2*i+1]; + err += std::sqrt(dx*dx + dy*dy); + } + if (behind) continue; + err /= k; + if (err < best.err) { best.err = err; best.R = R; best.t = t; best.ok = true; } + } + return best; +} + +} // namespace + +void pixel_grid(int H, int W, std::vector & out) { + out.resize((size_t) 2 * H * W); + for (int r = 0; r < H; r++) + for (int c = 0; c < W; c++) { + const size_t k = (size_t) r * W + c; + out[2*k] = c; out[2*k+1] = r; + } +} + +Mat4 solve_pnp_numpy(const double * Xw, const double * pixels, int N, const Mat3 & K, + std::vector & inliers, double thresh_px, int iters, uint64_t seed) { + const Mat3 Kinv = inv3(K); + std::vector xn((size_t) 2 * N); + for (int i = 0; i < N; i++) { + Vec3 uv = { pixels[2*i], pixels[2*i+1], 1.0 }; + Vec3 n = mat3_apply(Kinv, uv); + xn[2*i] = n[0]; xn[2*i+1] = n[1]; + } + const double thresh = thresh_px / K(0, 0); + + Rng rng(seed); + std::vector best(N, 0); + int best_cnt = 0; + for (int it = 0; it < iters; it++) { + int idx[6]; + sample_distinct(rng, N, 6, idx); + RT rt = decode(dlt(Xw, xn.data(), idx, 6), Xw, xn.data(), idx, 6); + if (!rt.ok) continue; + int cnt = 0; + for (int i = 0; i < N; i++) { + Vec3 xi = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] }; + Vec3 Xc = mat3_apply(rt.R, xi); + Xc[2] += rt.t[2]; + if (Xc[2] <= 0) continue; + Xc[0] += rt.t[0]; Xc[1] += rt.t[1]; + const double dx = Xc[0]/Xc[2] - xn[2*i], dy = Xc[1]/Xc[2] - xn[2*i+1]; + if (std::sqrt(dx*dx + dy*dy) < thresh) cnt++; + } + if (cnt > best_cnt) { + best_cnt = cnt; + best.assign(N, 0); + for (int i = 0; i < N; i++) { + Vec3 xi = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] }; + Vec3 Xc = mat3_apply(rt.R, xi); + Xc[0] += rt.t[0]; Xc[1] += rt.t[1]; Xc[2] += rt.t[2]; + if (Xc[2] <= 0) continue; + const double dx = Xc[0]/Xc[2] - xn[2*i], dy = Xc[1]/Xc[2] - xn[2*i+1]; + best[i] = (std::sqrt(dx*dx + dy*dy) < thresh) ? 1 : 0; + } + } + } + std::vector sel; + for (int i = 0; i < N; i++) if (best[i]) sel.push_back(i); + if ((int) sel.size() < 6) { sel.clear(); for (int i = 0; i < N; i++) sel.push_back(i); } + RT rt = decode(dlt(Xw, xn.data(), sel.data(), (int) sel.size()), + Xw, xn.data(), sel.data(), (int) sel.size()); + + Mat4 world2cam = mat4_identity(); + for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) world2cam(i, j) = rt.R(i, j); world2cam(i, 3) = rt.t[i]; } + inliers = best; + return inv_rigid4(world2cam); +} + +// ---- robust PnP: EPnP + Gauss-Newton -------------------------------------- +// +// EPnP (Lepetit/Moreno-Noguer/Fua 2009): express each world point as a +// barycentric combination of 4 control points; the camera-frame control points +// live in the null space of a 2n x 12 system, recovered from the eigenvectors of +// the 12x12 MᵀM (the same Jacobi eigensolver as everywhere else). It is +// non-iterative, uses ALL points (no random minimal samples), and handles +// near-planar configs — so it does not suffer the DLT's seed-dependent mirror +// flips. We try N=1,2,3 null vectors, pick the lowest reprojection error, then +// polish (R,t) with a Huber-robust Gauss-Newton on the reprojection residual. + +namespace { + +Mat3 skew_neg(const Vec3 & x) { // -[x]_× = ∂Xc/∂ω for left perturbation + return Mat3{ { 0, x[2], -x[1], + -x[2], 0, x[0], + x[1], -x[0], 0 } }; +} + +Mat3 so3_exp(const Vec3 & w) { // rotation from an axis-angle (rotation vector) + const double th = std::sqrt(w[0]*w[0] + w[1]*w[1] + w[2]*w[2]); + if (th < 1e-12) { // I + [w]_× to first order + Mat3 R = mat3_identity(); + R(0,1) = -w[2]; R(0,2) = w[1]; R(1,0) = w[2]; R(1,2) = -w[0]; R(2,0) = -w[1]; R(2,1) = w[0]; + return R; + } + return rodrigues({ w[0]/th, w[1]/th, w[2]/th }, th); +} + +double reproj_rms(const double * Xw, const double * px, int N, const Mat3 & K, + const Mat3 & R, const Vec3 & t) { + const double fu = K(0,0), fv = K(1,1), cu = K(0,2), cv = K(1,2); + double s = 0; int m = 0; + for (int i = 0; i < N; i++) { + Vec3 p = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] }; + Vec3 Xc = mat3_apply(R, p); Xc[0]+=t[0]; Xc[1]+=t[1]; Xc[2]+=t[2]; + if (Xc[2] <= 1e-6) { s += 1e6; m++; continue; } + const double dx = fu*Xc[0]/Xc[2] + cu - px[2*i]; + const double dy = fv*Xc[1]/Xc[2] + cv - px[2*i+1]; + s += dx*dx + dy*dy; m++; + } + return std::sqrt(s / std::max(m, 1)); +} + +// Recover camera-frame control points from the null-space vector x (12), fix the +// global sign by cheirality (scene in front), and get world2cam (R,t) from the +// rigid fit between the 4 world and 4 camera control points. +void recover_Rt(const double * x12, const double cw[4][3], const double * alpha, int N, + Mat3 & R, Vec3 & t) { + double cc[12]; + for (int i = 0; i < 12; i++) cc[i] = x12[i]; + // mean camera-frame depth over the points (pc = sum_j alpha_ij cc_j) + double meanz = 0; + for (int i = 0; i < N; i++) { + double z = 0; + for (int j = 0; j < 4; j++) z += alpha[4*i+j] * cc[3*j+2]; + meanz += z; + } + if (meanz < 0) for (int i = 0; i < 12; i++) cc[i] = -cc[i]; + double cwf[12]; + for (int j = 0; j < 4; j++) for (int k = 0; k < 3; k++) cwf[3*j+k] = cw[j][k]; + Sim3 T = fit_similarity(cwf, cc, 4, /*with_scale=*/false); // rigid world->camera + R = T.R; t = T.t; +} + +// EPnP core: returns the best (R,t) world2cam over N=1,2,3 by reprojection error. +void epnp(const double * Xw, const double * px, int N, const Mat3 & K, Mat3 & Rout, Vec3 & tout) { + // 1. control points: centroid + principal axes (sqrt-eigenvalue scaled). + double c0[3] = {0,0,0}; + for (int i = 0; i < N; i++) { c0[0]+=Xw[3*i]; c0[1]+=Xw[3*i+1]; c0[2]+=Xw[3*i+2]; } + c0[0]/=N; c0[1]/=N; c0[2]/=N; + double Cov[9] = {0}; + for (int i = 0; i < N; i++) { + const double d[3] = { Xw[3*i]-c0[0], Xw[3*i+1]-c0[1], Xw[3*i+2]-c0[2] }; + for (int r = 0; r < 3; r++) for (int c = 0; c < 3; c++) Cov[r*3+c] += d[r]*d[c]; + } + for (int i = 0; i < 9; i++) Cov[i] /= N; + double ev[3], evec[9]; + fsla::jacobi_eigh(Cov, 3, ev, evec); + double lam_max = std::max({ev[0], ev[1], ev[2], 1e-12}); + double cw[4][3]; + for (int k = 0; k < 3; k++) cw[0][k] = c0[k]; + for (int j = 0; j < 3; j++) { + // floor the axis length so a near-planar scene keeps 4 affinely-independent + // control points (an invertible barycentric matrix). + const double s = std::sqrt(std::max(ev[j], 1e-6 * lam_max)); + for (int k = 0; k < 3; k++) cw[j+1][k] = c0[k] + s * evec[k*3 + j]; + } + + // 2. barycentric coordinates: alpha_i = C^{-1} [p_i; 1], C = [[cw^T];[1...]]. + double C[16]; + for (int j = 0; j < 4; j++) { for (int k = 0; k < 3; k++) C[k*4+j] = cw[j][k]; C[3*4+j] = 1.0; } + double Cinv[16] = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1}; + solve_lin(C, 4, Cinv, 4); // Cinv now holds C^{-1} + std::vector alpha((size_t) 4 * N); + for (int i = 0; i < N; i++) { + const double p[4] = { Xw[3*i], Xw[3*i+1], Xw[3*i+2], 1.0 }; + for (int j = 0; j < 4; j++) { + double a = 0; for (int k = 0; k < 4; k++) a += Cinv[j*4+k]*p[k]; + alpha[4*i+j] = a; + } + } + + // 3. M (2n x 12) -> MtM (12 x 12). + const double fu = K(0,0), fv = K(1,1), cu = K(0,2), cv = K(1,2); + double MtM[144] = {0}; + for (int i = 0; i < N; i++) { + double r1[12] = {0}, r2[12] = {0}; + const double uu = px[2*i], vv = px[2*i+1]; + for (int j = 0; j < 4; j++) { + const double a = alpha[4*i+j]; + r1[3*j+0] = a*fu; r1[3*j+2] = a*(cu - uu); + r2[3*j+1] = a*fv; r2[3*j+2] = a*(cv - vv); + } + for (int r = 0; r < 12; r++) for (int c = 0; c < 12; c++) MtM[r*12+c] += r1[r]*r1[c] + r2[r]*r2[c]; + } + double mev[12], mevec[144]; + fsla::jacobi_eigh(MtM, 12, mev, mevec); + int ord[12]; for (int i = 0; i < 12; i++) ord[i] = i; + std::sort(ord, ord + 12, [&](int a, int b){ return mev[a] < mev[b]; }); + double V[3][12]; // the 3 smallest-eigenvalue vectors + for (int n = 0; n < 3; n++) for (int i = 0; i < 12; i++) V[n][i] = mevec[i*12 + ord[n]]; + + // control-point pairs and their world distances + const int pr[6][2] = {{0,1},{0,2},{0,3},{1,2},{1,3},{2,3}}; + double dw2[6]; + for (int p = 0; p < 6; p++) { + const int i = pr[p][0], j = pr[p][1]; + double d = 0; for (int k = 0; k < 3; k++) { const double e = cw[i][k]-cw[j][k]; d += e*e; } + dw2[p] = d; + } + auto blkdiff = [&](int n, int p, double out[3]) { + const int i = pr[p][0], j = pr[p][1]; + for (int k = 0; k < 3; k++) out[k] = V[n][3*i+k] - V[n][3*j+k]; + }; + auto dot3 = [](const double a[3], const double b[3]){ return a[0]*b[0]+a[1]*b[1]+a[2]*b[2]; }; + + Mat3 bestR{}; Vec3 bestT{}; double best_err = 1e300; bool have = false; + auto consider = [&](const double x12[12]) { + Mat3 R; Vec3 t; + recover_Rt(x12, cw, alpha.data(), N, R, t); + const double e = reproj_rms(Xw, px, N, K, R, t); + if (e < best_err) { best_err = e; bestR = R; bestT = t; have = true; } + }; + + // N=1: x = beta * v0, beta from the 6 distance constraints (least squares). + { + double num = 0, den = 0; + for (int p = 0; p < 6; p++) { double d0[3]; blkdiff(0,p,d0); + const double nd = std::sqrt(dot3(d0,d0)); num += nd*std::sqrt(dw2[p]); den += nd*nd; } + const double beta = (den > 0) ? num/den : 0.0; + double x[12]; for (int i = 0; i < 12; i++) x[i] = beta*V[0][i]; + consider(x); + } + // N=2: x = b1 v0 + b2 v1; distance constraints linear in (b1^2,b1b2,b2^2). + { + double L[18], rho[6]; + for (int p = 0; p < 6; p++) { + double d0[3], d1[3]; blkdiff(0,p,d0); blkdiff(1,p,d1); + L[p*3+0] = dot3(d0,d0); L[p*3+1] = 2*dot3(d0,d1); L[p*3+2] = dot3(d1,d1); + rho[p] = dw2[p]; + } + double G[9] = {0}, g[3] = {0}; // normal equations (3x3) + for (int p = 0; p < 6; p++) { for (int r = 0; r < 3; r++) { for (int c = 0; c < 3; c++) G[r*3+c] += L[p*3+r]*L[p*3+c]; g[r] += L[p*3+r]*rho[p]; } } + solve_lin(G, 3, g, 1); // g = [b11,b12,b22] + const double b1 = std::sqrt(std::fabs(g[0])); + double b2 = std::sqrt(std::fabs(g[2])); + if (g[1] < 0) b2 = -b2; + double x[12]; for (int i = 0; i < 12; i++) x[i] = b1*V[0][i] + b2*V[1][i]; + consider(x); + } + // N=3: x = b1 v0 + b2 v1 + b3 v2; 6 constraints in 6 quadratic unknowns. + { + double L[36], rho[6]; + for (int p = 0; p < 6; p++) { + double d0[3], d1[3], d2[3]; blkdiff(0,p,d0); blkdiff(1,p,d1); blkdiff(2,p,d2); + L[p*6+0]=dot3(d0,d0); L[p*6+1]=2*dot3(d0,d1); L[p*6+2]=2*dot3(d0,d2); + L[p*6+3]=dot3(d1,d1); L[p*6+4]=2*dot3(d1,d2); L[p*6+5]=dot3(d2,d2); + rho[p]=dw2[p]; + } + double Lc[36], rc[6]; + for (int i = 0; i < 36; i++) Lc[i] = L[i]; + for (int i = 0; i < 6; i++) rc[i] = rho[i]; + solve_lin(Lc, 6, rc, 1); // rc = [b11,b12,b13,b22,b23,b33] + const double b1 = std::sqrt(std::fabs(rc[0])); + const double b2 = (b1 > 1e-12) ? rc[1]/b1 : 0.0; + const double b3 = (b1 > 1e-12) ? rc[2]/b1 : 0.0; + double x[12]; for (int i = 0; i < 12; i++) x[i] = b1*V[0][i] + b2*V[1][i] + b3*V[2][i]; + consider(x); + } + + if (!have) { Rout = mat3_identity(); tout = {0,0,0}; return; } + Rout = bestR; tout = bestT; +} + +// Huber-robust Gauss-Newton on the reprojection residual; 6-DoF left perturbation +// (R <- exp(dw) R, t += dt). Refines an EPnP init to the reprojection minimum and +// downweights gross outliers, so the deterministic EPnP+GN matches cv2's +// RANSAC+SQPNP+refine without random sampling. +void gauss_newton_pnp(const double * Xw, const double * px, int N, const Mat3 & K, + Mat3 & R, Vec3 & t, int iters, double huber_px) { + const double fu = K(0,0), fv = K(1,1), cu = K(0,2), cv = K(1,2); + for (int it = 0; it < iters; it++) { + double H[36] = {0}, b[6] = {0}; + for (int i = 0; i < N; i++) { + Vec3 p = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] }; + Vec3 Xc = mat3_apply(R, p); Xc[0]+=t[0]; Xc[1]+=t[1]; Xc[2]+=t[2]; + if (Xc[2] <= 1e-6) continue; + const double iz = 1.0/Xc[2]; + const double rx = fu*Xc[0]*iz + cu - px[2*i]; + const double ry = fv*Xc[1]*iz + cv - px[2*i+1]; + const double rn = std::sqrt(rx*rx + ry*ry); + const double w = (rn <= huber_px || rn < 1e-12) ? 1.0 : huber_px/rn; + // dproj/dXc (2x3) + const double Jp[2][3] = { { fu*iz, 0, -fu*Xc[0]*iz*iz }, + { 0, fv*iz, -fv*Xc[1]*iz*iz } }; + // dXc/d(delta) (3x6) = [ -[Xc]_x | I ] + const Mat3 S = skew_neg(Xc); + double Jd[3][6]; + for (int r = 0; r < 3; r++) { Jd[r][0]=S(r,0); Jd[r][1]=S(r,1); Jd[r][2]=S(r,2); + Jd[r][3]=(r==0); Jd[r][4]=(r==1); Jd[r][5]=(r==2); } + double J[2][6]; + for (int a = 0; a < 2; a++) for (int c = 0; c < 6; c++) { + double s = 0; for (int k = 0; k < 3; k++) s += Jp[a][k]*Jd[k][c]; J[a][c] = s; } + const double r2[2] = { rx, ry }; + for (int a = 0; a < 2; a++) { + for (int c = 0; c < 6; c++) { + b[c] += w * J[a][c] * r2[a]; + for (int d = 0; d < 6; d++) H[c*6+d] += w * J[a][c] * J[a][d]; + } + } + } + double rhs[6]; for (int i = 0; i < 6; i++) rhs[i] = -b[i]; + solve_lin(H, 6, rhs, 1); // solve H delta = -b + Vec3 dw = { rhs[0], rhs[1], rhs[2] }; + R = mat3_mul(so3_exp(dw), R); + t[0]+=rhs[3]; t[1]+=rhs[4]; t[2]+=rhs[5]; + if (std::fabs(rhs[0])+std::fabs(rhs[1])+std::fabs(rhs[2])+ + std::fabs(rhs[3])+std::fabs(rhs[4])+std::fabs(rhs[5]) < 1e-12) break; + } +} + +} // namespace + +Mat4 solve_pnp_epnp(const double * Xw, const double * pixels, int N, const Mat3 & K) { + Mat3 R; Vec3 t; + epnp(Xw, pixels, N, K, R, t); + Mat4 w2c = mat4_identity(); + for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) w2c(i,j) = R(i,j); w2c(i,3) = t[i]; } + return inv_rigid4(w2c); +} + +Mat4 solve_pnp(const double * Xw, const double * pixels, int N, const Mat3 & K, + std::vector & inliers, double thresh_px, int gn_iters) { + Mat3 R; Vec3 t; + epnp(Xw, pixels, N, K, R, t); + gauss_newton_pnp(Xw, pixels, N, K, R, t, gn_iters, thresh_px); + + inliers.assign(N, 0); + const double fu = K(0,0), fv = K(1,1), cu = K(0,2), cv = K(1,2); + for (int i = 0; i < N; i++) { + Vec3 p = { Xw[3*i], Xw[3*i+1], Xw[3*i+2] }; + Vec3 Xc = mat3_apply(R, p); Xc[0]+=t[0]; Xc[1]+=t[1]; Xc[2]+=t[2]; + if (Xc[2] <= 1e-6) continue; + const double dx = fu*Xc[0]/Xc[2] + cu - pixels[2*i]; + const double dy = fv*Xc[1]/Xc[2] + cv - pixels[2*i+1]; + if (std::sqrt(dx*dx + dy*dy) < thresh_px) inliers[i] = 1; + } + Mat4 w2c = mat4_identity(); + for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) w2c(i,j) = R(i,j); w2c(i,3) = t[i]; } + return inv_rigid4(w2c); +} + +// ---- estimate_poses orchestration ----------------------------------------- + +PoseResult estimate_poses(const std::vector & points, + const std::vector & opacities, + int H, int W, double opacity_threshold, + double focal, int pnp_iter, bool normalize, uint64_t seed) { + (void) pnp_iter; (void) seed; // robust PnP is deterministic (EPnP+GN) + const int Nv = (int) points.size(); + const int P = H * W; + const double ppx = W / 2.0, ppy = H / 2.0; + std::vector grid; + pixel_grid(H, W, grid); + + // per-view opacity masks + std::vector> masks(Nv); + for (int v = 0; v < Nv; v++) { + masks[v].assign(P, 1); + if (v < (int) opacities.size() && opacities[v]) { + for (int i = 0; i < P; i++) masks[v][i] = (opacities[v][i] > opacity_threshold) ? 1 : 0; + } + } + + // focal: view 0 only, ALL pixels (use_first_focal, scene recipe) + if (focal <= 0.0) { + std::vector pts(3 * P), pix(2 * P); + for (int i = 0; i < P; i++) { + pts[3*i] = points[0][3*i]; pts[3*i+1] = points[0][3*i+1]; pts[3*i+2] = points[0][3*i+2]; + pix[2*i] = grid[2*i]; pix[2*i+1] = grid[2*i+1]; + } + focal = estimate_focal(pts.data(), pix.data(), P, ppx, ppy); + } + Mat3 K = mat3_identity(); + K(0,0) = focal; K(1,1) = focal; K(0,2) = ppx; K(1,2) = ppy; + + PoseResult res; + res.focal = focal; + res.scale = 1.0; + for (int v = 0; v < Nv; v++) { + std::vector Xw, px; + for (int i = 0; i < P; i++) if (masks[v][i]) { + Xw.push_back(points[v][3*i]); Xw.push_back(points[v][3*i+1]); Xw.push_back(points[v][3*i+2]); + px.push_back(grid[2*i]); px.push_back(grid[2*i+1]); + } + std::vector inl; + Mat4 c2w = solve_pnp(Xw.data(), px.data(), (int) (Xw.size()/3), K, inl, 5.0, 10); + res.cam2world.push_back(c2w); + } + + if (normalize && Nv >= 2) { + Vec3 t0 = { res.cam2world.front()(0,3), res.cam2world.front()(1,3), res.cam2world.front()(2,3) }; + Vec3 tl = { res.cam2world.back()(0,3), res.cam2world.back()(1,3), res.cam2world.back()(2,3) }; + const double baseline = std::sqrt((tl[0]-t0[0])*(tl[0]-t0[0]) + (tl[1]-t0[1])*(tl[1]-t0[1]) + (tl[2]-t0[2])*(tl[2]-t0[2])) + 1e-2; + res.scale = 1.0 / baseline; + for (auto & c : res.cam2world) for (int i = 0; i < 3; i++) c(i, 3) *= res.scale; + } + return res; +} + +// ---- after-inference parallax --------------------------------------------- + +Parallax parallax_stats(const Vec3 & C0, const Vec3 & C1, const Vec3 & axis0, + const float * pts, const float * mask, int N, double mask_thr) { + constexpr double PI = 3.14159265358979323846; + auto sub = [](const Vec3 & a, const Vec3 & b) -> Vec3 { return { a[0]-b[0], a[1]-b[1], a[2]-b[2] }; }; + auto dot = [](const Vec3 & a, const Vec3 & b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }; + auto nrm = [&](const Vec3 & a) { return std::sqrt(dot(a, a)); }; + + Parallax px{}; + // unit optical axis of view 0 + Vec3 ax = axis0; const double an = nrm(ax); + if (an > 1e-12) { ax[0]/=an; ax[1]/=an; ax[2]/=an; } + + // baseline split: forward (dolly, no parallax) vs lateral (strafe, parallax) + const Vec3 b = sub(C1, C0); + px.baseline = nrm(b); + const double fwd = dot(b, ax); + const Vec3 blat = { b[0]-fwd*ax[0], b[1]-fwd*ax[1], b[2]-fwd*ax[2] }; + px.lateral_angle_deg = std::atan2(nrm(blat), std::fabs(fwd)) * 180.0 / PI; + + // per confident point: depth from cam0 + triangulation angle (rays from C0,C1) + std::vector depths, angles; + depths.reserve(N); angles.reserve(N); + for (int i = 0; i < N; i++) { + if (mask && !(mask[i] > mask_thr)) continue; + const Vec3 X = { (double)pts[3*i], (double)pts[3*i+1], (double)pts[3*i+2] }; + if (!std::isfinite(X[0]) || !std::isfinite(X[1]) || !std::isfinite(X[2])) continue; + const Vec3 r0 = sub(X, C0), r1 = sub(X, C1); + const double n0 = nrm(r0), n1 = nrm(r1); + if (n0 < 1e-9 || n1 < 1e-9) continue; + const double depth = dot(r0, ax); + if (depth > 0) depths.push_back(depth); + double c = dot(r0, r1) / (n0 * n1); + c = c < -1.0 ? -1.0 : (c > 1.0 ? 1.0 : c); + angles.push_back(std::acos(c) * 180.0 / PI); + } + auto median = [](std::vector & v) -> double { + if (v.empty()) return 0.0; + const size_t m = v.size() / 2; + std::nth_element(v.begin(), v.begin() + m, v.end()); + return v[m]; + }; + px.median_depth = median(depths); + px.tri_angle_deg = median(angles); + px.baseline_over_depth = (px.median_depth > 1e-9) ? px.baseline / px.median_depth : 0.0; + px.n_points = (int) angles.size(); + return px; +} + +Parallax pair_parallax(const std::vector & points, + const std::vector & opacities, + int H, int W, double opacity_threshold, double focal) { + Parallax px{}; + if (points.size() < 2) return px; + PoseResult pr = estimate_poses(points, opacities, H, W, opacity_threshold, + focal, 100, /*normalize=*/false, /*seed=*/0); + if (pr.cam2world.size() < 2) return px; + const Mat4 & c0 = pr.cam2world[0]; + const Mat4 & c1 = pr.cam2world[1]; + const Vec3 C0 = { c0(0,3), c0(1,3), c0(2,3) }; + const Vec3 C1 = { c1(0,3), c1(1,3), c1(2,3) }; + const Vec3 axis0 = { c0(0,2), c0(1,2), c0(2,2) }; // camera +z (optical axis) in world + px = parallax_stats(C0, C1, axis0, points[0], + opacities.empty() ? nullptr : opacities[0], H * W, opacity_threshold); + px.focal = pr.focal; + return px; +} + +// ---- accumulation --------------------------------------------------------- + +namespace { +inline float clamp01(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); } +const double SH_C0 = 0.28209479177387814; // SH degree-0 basis (DC -> rgb) + +// quaternion (w,x,y,z) from a rotation matrix (Shepperd's method) +std::array mat3_to_quat(const Mat3 & R) { + const double tr = R(0,0) + R(1,1) + R(2,2); + double w,x,y,z; + if (tr > 0) { + double s = std::sqrt(tr + 1.0) * 2; + w = 0.25*s; x = (R(2,1)-R(1,2))/s; y = (R(0,2)-R(2,0))/s; z = (R(1,0)-R(0,1))/s; + } else if (R(0,0) > R(1,1) && R(0,0) > R(2,2)) { + double s = std::sqrt(1.0 + R(0,0) - R(1,1) - R(2,2)) * 2; + w = (R(2,1)-R(1,2))/s; x = 0.25*s; y = (R(0,1)+R(1,0))/s; z = (R(0,2)+R(2,0))/s; + } else if (R(1,1) > R(2,2)) { + double s = std::sqrt(1.0 + R(1,1) - R(0,0) - R(2,2)) * 2; + w = (R(0,2)-R(2,0))/s; x = (R(0,1)+R(1,0))/s; y = 0.25*s; z = (R(1,2)+R(2,1))/s; + } else { + double s = std::sqrt(1.0 + R(2,2) - R(0,0) - R(1,1)) * 2; + w = (R(1,0)-R(0,1))/s; x = (R(0,2)+R(2,0))/s; y = (R(1,2)+R(2,1))/s; z = 0.25*s; + } + return {w,x,y,z}; +} +// Hamilton product (w,x,y,z): compose rotation a then... q = a ⊗ b applies b first. +std::array quat_mul(const std::array & a, const std::array & b) { + return { a[0]*b[0] - a[1]*b[1] - a[2]*b[2] - a[3]*b[3], + a[0]*b[1] + a[1]*b[0] + a[2]*b[3] - a[3]*b[2], + a[0]*b[2] - a[1]*b[3] + a[2]*b[0] + a[3]*b[1], + a[0]*b[3] + a[1]*b[2] - a[2]*b[1] + a[3]*b[0] }; +} +std::array quat_normalize(std::array q) { + const double n = std::sqrt(q[0]*q[0]+q[1]*q[1]+q[2]*q[2]+q[3]*q[3]); + if (n > 1e-12) for (auto & c : q) c /= n; else q = {1,0,0,0}; + return q; +} +} // namespace + +Accumulator::Accumulator(int H, int W, double opacity_threshold, + double ransac_thresh_frac, int ransac_iters, uint64_t seed) + : H_(H), W_(W), thr_(opacity_threshold), rthr_(ransac_thresh_frac), + riters_(ransac_iters), seed_(seed), T_(sim_identity()), final_cam_(mat4_identity()) {} + +void Accumulator::add_view(const float * pts, const float * op, const float * rgb, + const float * scl, const float * rot, const Sim3 & T, int frame) { + const int P = H_ * W_; + // A similarity x -> s*(R@x)+t scales each gaussian's covariance by s^2 and + // rotates it by R: so the world-frame axis lengths are s*scale and the world + // rotation is qT ⊗ q_local. Compute qT (from T.R) once for the whole view. + const std::array qT = mat3_to_quat(T.R); + for (int i = 0; i < P; i++) { + if (op[i] <= thr_) continue; + Vec3 x = { pts[3*i], pts[3*i+1], pts[3*i+2] }; + Vec3 w = sim_apply(T, x); + AccumPoint p; + p.x = (float) w[0]; p.y = (float) w[1]; p.z = (float) w[2]; + p.r = rgb[3*i]; p.g = rgb[3*i+1]; p.b = rgb[3*i+2]; p.opacity = op[i]; + p.sx = (float) (T.s * scl[3*i]); p.sy = (float) (T.s * scl[3*i+1]); p.sz = (float) (T.s * scl[3*i+2]); + std::array ql = quat_normalize({ rot[4*i], rot[4*i+1], rot[4*i+2], rot[4*i+3] }); + std::array qw = quat_normalize(quat_mul(qT, ql)); + p.qw = (float) qw[0]; p.qx = (float) qw[1]; p.qy = (float) qw[2]; p.qz = (float) qw[3]; + p.frame = frame; + cloud_.push_back(p); + } +} + +ChainLink Accumulator::add_pair(const float * g, int gc, double focal) { + const int P = H_ * W_; + // de-interleave the two views: points (ch 0:3), opacity (ch 15, already + // sigmoid-activated), rgb = clip(SH_DC * C0 + 0.5, 0, 1) (ch 3:6). + std::vector pts0(3*P), pts1(3*P), op0(P), op1(P), rgb0(3*P), rgb1(3*P); + std::vector scl0(3*P), scl1(3*P), rot0(4*P), rot1(4*P); + for (int i = 0; i < P; i++) { + const float * a0 = g + (size_t) i * gc; + const float * a1 = g + (size_t) (P + i) * gc; + for (int c = 0; c < 3; c++) { pts0[3*i+c] = a0[c]; pts1[3*i+c] = a1[c]; } + op0[i] = a0[15]; op1[i] = a1[15]; + for (int c = 0; c < 3; c++) { + rgb0[3*i+c] = clamp01((float) (a0[3+c] * SH_C0 + 0.5)); + rgb1[3*i+c] = clamp01((float) (a1[3+c] * SH_C0 + 0.5)); + } + // gaussian shape: scale ch16:19, rotation quaternion (w,x,y,z) ch19:23 + for (int c = 0; c < 3; c++) { scl0[3*i+c] = a0[16+c]; scl1[3*i+c] = a1[16+c]; } + for (int c = 0; c < 4; c++) { rot0[4*i+c] = a0[19+c]; rot1[4*i+c] = a1[19+c]; } + } + + // recover this pair's cameras (view-0 frame, use_first_focal) + std::vector pts = { pts0.data(), pts1.data() }; + std::vector ops = { op0.data(), op1.data() }; + PoseResult pr = estimate_poses(pts, ops, H_, W_, thr_, focal, 100, false, seed_); + const Mat4 c2w0 = pr.cam2world[0], c2w1 = pr.cam2world[1]; + + ChainLink link{}; + if (!have_prev_) { + T_ = sim_identity(); + link.sim = sim_identity(); + link.global = T_; + link.scale = 1.0; link.inlier_frac = 1.0; link.valid_frac = 1.0; link.resid_frac = 0.0; + add_view(pts0.data(), op0.data(), rgb0.data(), scl0.data(), rot0.data(), T_, next_frame_++); // f_0 + add_view(pts1.data(), op1.data(), rgb1.data(), scl1.data(), rot1.data(), T_, next_frame_++); // f_1 + } else { + // fit Sim(3) from this run's view 0 (the shared frame, run-k coords) to the + // previous run's view 1 (same frame, run-(k-1) coords). Mask = valid in both. + std::vector XA, XB; + for (int i = 0; i < P; i++) if (op0[i] > thr_ && prev_mask1_[i]) { + XA.push_back(pts0[3*i]); XA.push_back(pts0[3*i+1]); XA.push_back(pts0[3*i+2]); + XB.push_back(prev_pts1_[3*i]); XB.push_back(prev_pts1_[3*i+1]); XB.push_back(prev_pts1_[3*i+2]); + } + const int n = (int) (XA.size() / 3); + double myB[3]; mean3(XB.data(), n, myB); + std::vector ycB(3*n); + for (int i = 0; i < n; i++) for (int c = 0; c < 3; c++) ycB[3*i+c] = XB[3*i+c] - myB[c]; + const double scene = rms3(ycB.data(), n); + + std::vector inl; + Sim3 S = fit_similarity_ransac(XA.data(), XB.data(), n, rthr_ * scene, riters_, inl, true, seed_); + T_ = sim_compose(T_, S); // run k -> global + + int n_inl = 0; + std::vector rA, rB; + for (int i = 0; i < n; i++) if (inl[i]) { + n_inl++; + Vec3 a = { XA[3*i], XA[3*i+1], XA[3*i+2] }; + Vec3 sa = sim_apply(S, a); + for (int c = 0; c < 3; c++) { rA.push_back(sa[c] - XB[3*i+c]); } + (void) rB; + } + link.sim = S; + link.global = T_; + link.scale = S.s; + link.inlier_frac = (n > 0) ? (double) n_inl / n : 0.0; + link.valid_frac = (double) n / P; + link.resid_frac = (scene > 0 && n_inl > 0) ? rms3(rA.data(), n_inl) / scene : 0.0; + + add_view(pts1.data(), op1.data(), rgb1.data(), scl1.data(), rot1.data(), T_, next_frame_++); // f_{k+1} + } + + cams_.push_back(mat4_mul(sim_matrix(T_), c2w0)); // frame f_k camera (view 0) + final_cam_ = mat4_mul(sim_matrix(T_), c2w1); // latest view-1 camera + + // this run's view 1 becomes the next run's shared frame + prev_pts1_.assign(pts1.begin(), pts1.end()); + prev_mask1_.assign(P, 0); + for (int i = 0; i < P; i++) prev_mask1_[i] = (op1[i] > thr_) ? 1 : 0; + have_prev_ = true; + links_.push_back(link); + return link; +} + +double Accumulator::refine(double voxel_frac, int iters, double alpha) { + return consensus_refine(cloud_, voxel_frac, iters, alpha); +} + +std::vector Accumulator::camera_path() const { + std::vector path = cams_; + if (!cams_.empty()) path.push_back(final_cam_); + return path; +} + +// ---- hierarchical tree accumulation --------------------------------------- + +// One gaussian (gc-channel engine layout) -> AccumPoint at identity (leaf frame). +static AccumPoint tree_leaf_point(const float * a, int frame) { + AccumPoint p; + p.x = a[0]; p.y = a[1]; p.z = a[2]; + p.r = clamp01((float) (a[3] * SH_C0 + 0.5)); + p.g = clamp01((float) (a[4] * SH_C0 + 0.5)); + p.b = clamp01((float) (a[5] * SH_C0 + 0.5)); + p.opacity = a[15]; + p.sx = a[16]; p.sy = a[17]; p.sz = a[18]; + std::array q = quat_normalize({ a[19], a[20], a[21], a[22] }); + p.qw = (float) q[0]; p.qx = (float) q[1]; p.qy = (float) q[2]; p.qz = (float) q[3]; + p.frame = frame; + return p; +} + +// Apply a Sim(3) to a whole cloud (position s*(R@x)+t, scale *s, orientation qS⊗q). +static void tree_apply_sim_cloud(std::vector & c, const Sim3 & S) { + const std::array qS = mat3_to_quat(S.R); + for (auto & p : c) { + Vec3 w = sim_apply(S, { p.x, p.y, p.z }); + p.x = (float) w[0]; p.y = (float) w[1]; p.z = (float) w[2]; + p.sx = (float) (S.s * p.sx); p.sy = (float) (S.s * p.sy); p.sz = (float) (S.s * p.sz); + std::array q = quat_normalize(quat_mul(qS, { p.qw, p.qx, p.qy, p.qz })); + p.qw = (float) q[0]; p.qx = (float) q[1]; p.qy = (float) q[2]; p.qz = (float) q[3]; + } +} +static void tree_apply_sim_pts(std::vector & pts, const Sim3 & S) { + for (size_t i = 0; i + 3 <= pts.size(); i += 3) { + Vec3 w = sim_apply(S, { pts[i], pts[i+1], pts[i+2] }); + pts[i] = (float) w[0]; pts[i+1] = (float) w[1]; pts[i+2] = (float) w[2]; + } +} + +// Robustly register one band of shared-frame correspondences (XA -> XB, both flat +// 3*n): RANSAC similarity at a scene-relative threshold, plus the inlier residual. +// The single registration primitive for both tree merges (caller derives the +// inlier/valid fractions, which it normalises differently per tree). +struct BandFit { Sim3 S; int n_inl; double scene; double resid; }; +static BandFit fit_band_sim(const std::vector & XA, const std::vector & XB, + double rthr, int riters, uint64_t seed) { + const int n = (int) (XA.size() / 3); + BandFit f{ sim_identity(), 0, 0.0, 0.0 }; + if (n < 3) return f; + double mB[3]; mean3(XB.data(), n, mB); + std::vector yc(3 * n); + for (int k = 0; k < n; k++) for (int c = 0; c < 3; c++) yc[3*k+c] = XB[3*k+c] - mB[c]; + f.scene = rms3(yc.data(), n); + std::vector inl; + f.S = fit_similarity_ransac(XA.data(), XB.data(), n, rthr * f.scene, riters, inl, true, seed); + std::vector rA; + for (int k = 0; k < n; k++) if (inl[k]) { + f.n_inl++; + Vec3 sa = sim_apply(f.S, { XA[3*k], XA[3*k+1], XA[3*k+2] }); + for (int c = 0; c < 3; c++) rA.push_back(sa[c] - XB[3*k+c]); + } + f.resid = (f.scene > 0 && f.n_inl > 0) ? rms3(rA.data(), f.n_inl) / f.scene : 0.0; + return f; +} + +// ---- multi-frame-overlap tree --------------------------------------------- + +// A submap covering a contiguous global frame range, carrying the per-pixel point +// arrays of every frame it spans (in the submap's local frame) so a neighbour can +// align on the whole band of frames they share, not just one. +struct ONode { + std::vector cloud; + int lo, hi; // inclusive global frame range + std::vector> fpts; // [f-lo] -> P*3 per-pixel points + std::vector> fmsk; // [f-lo] -> P opacity mask +}; + +// One pair (frames a, a+1) as a 2-frame submap (identity local frame). +static ONode opair(const float * g, int a, int P, int gc, double thr) { + ONode nd; nd.lo = a; nd.hi = a + 1; + nd.fpts.assign(2, {}); nd.fmsk.assign(2, {}); + nd.fpts[0].resize(3*P); nd.fpts[1].resize(3*P); + nd.fmsk[0].assign(P, 0); nd.fmsk[1].assign(P, 0); + nd.cloud.reserve(2*P); + for (int i = 0; i < P; i++) { + const float * a0 = g + (size_t) i * gc; + const float * a1 = g + (size_t) (P + i) * gc; + for (int c = 0; c < 3; c++) { nd.fpts[0][3*i+c] = a0[c]; nd.fpts[1][3*i+c] = a1[c]; } + nd.fmsk[0][i] = (a0[15] > thr) ? 1 : 0; + nd.fmsk[1][i] = (a1[15] > thr) ? 1 : 0; + if (a0[15] > thr) nd.cloud.push_back(tree_leaf_point(a0, a)); + if (a1[15] > thr) nd.cloud.push_back(tree_leaf_point(a1, a + 1)); + } + return nd; +} + +// Merge R into L's frame, fitting one Sim(3) over EVERY frame they share. Returns +// the combined submap; optionally records the merge diagnostics. +static ONode omerge(ONode & L, ONode & R, int P, double rthr, int riters, uint64_t seed, + std::vector * merges, int level) { + const int so = std::max(L.lo, R.lo), sh = std::min(L.hi, R.hi); // shared frame range + std::vector XA, XB; + for (int f = so; f <= sh; f++) { + const std::vector & ra = R.fpts[f - R.lo]; const std::vector & rm = R.fmsk[f - R.lo]; + const std::vector & la = L.fpts[f - L.lo]; const std::vector & lm = L.fmsk[f - L.lo]; + for (int i = 0; i < P; i++) if (rm[i] && lm[i]) { + XA.push_back(ra[3*i]); XA.push_back(ra[3*i+1]); XA.push_back(ra[3*i+2]); + XB.push_back(la[3*i]); XB.push_back(la[3*i+1]); XB.push_back(la[3*i+2]); + } + } + const int n = (int) (XA.size() / 3); + const int shared = sh - so + 1; + BandFit fit = fit_band_sim(XA, XB, rthr, riters, seed); + const Sim3 & S = fit.S; + if (merges) merges->push_back(TreeMerge{ level, std::min(L.lo,R.lo), std::max(L.hi,R.hi), shared, + S.s, (n > 0) ? (double) fit.n_inl / n : 0.0, (double) n / ((double) shared * P), fit.resid }); + + tree_apply_sim_cloud(R.cloud, S); + for (auto & a : R.fpts) tree_apply_sim_pts(a, S); + ONode m; m.lo = std::min(L.lo, R.lo); m.hi = std::max(L.hi, R.hi); + const int F = m.hi - m.lo + 1; + m.fpts.assign(F, {}); m.fmsk.assign(F, {}); + for (int f = L.lo; f <= L.hi; f++) { m.fpts[f-m.lo] = std::move(L.fpts[f-L.lo]); m.fmsk[f-m.lo] = std::move(L.fmsk[f-L.lo]); } + for (int f = R.lo; f <= R.hi; f++) if (m.fpts[f-m.lo].empty()) { m.fpts[f-m.lo] = std::move(R.fpts[f-R.lo]); m.fmsk[f-m.lo] = std::move(R.fmsk[f-R.lo]); } + m.cloud = std::move(L.cloud); + m.cloud.insert(m.cloud.end(), R.cloud.begin(), R.cloud.end()); + return m; +} + +std::vector tree_accumulate_overlap(const std::vector & pairs, + int H, int W, int gc, double thr, + int block, int overlap, + double rthr, int riters, uint64_t seed, + std::vector * merges, + int max_levels, double layout_spacing, + int * n_nodes_out, int per_node_cap) { + const int P = H * W; + const int M = (int) pairs.size(); + if (n_nodes_out) *n_nodes_out = 0; + if (M < 1) return {}; + const int nframes = M + 1; + overlap = std::max(1, overlap); + block = std::max(block, overlap + 1); + block = std::min(block, nframes); + overlap = std::min(overlap, block - 1); + const int step = std::max(1, block - overlap); + + // overlapping base submaps: each chains `block-1` consecutive pairs (single-frame + // internal links), adjacent submaps share `overlap` frames. + std::vector nodes; + for (int s = 0; s < M; s += step) { + const int w = std::min(block, nframes - s); // frames in this submap + if (w < 2) break; + ONode nd = opair(pairs[s], s, P, gc, thr); // frames s, s+1 + for (int p = s + 1; p <= s + w - 2; p++) { // fold the rest of the block + ONode nx = opair(pairs[p], p, P, gc, thr); + nd = omerge(nd, nx, P, rthr, riters, seed, nullptr, -1); + } + nodes.push_back(std::move(nd)); + if (s + block >= nframes) break; // last submap reached the end + } + + // median base-submap extent — the auto layout pitch for side-by-side stages. + // Only the auto-layout (layout_spacing<0, staged demo) path reads it, so the + // full-merge callers skip these two passes over every submap. + double leaf_extent = 1.0; + if (layout_spacing < 0.0) { + std::vector ex; + for (auto & nd : nodes) { + double m[3] = {0,0,0}; int64_t nf = 0; + for (auto & p : nd.cloud) if (std::isfinite(p.x) && std::isfinite(p.y) && std::isfinite(p.z)) { + m[0]+=p.x; m[1]+=p.y; m[2]+=p.z; nf++; } + if (nf == 0) continue; + for (int c = 0; c < 3; c++) m[c] /= nf; + double ss = 0; + for (auto & p : nd.cloud) if (std::isfinite(p.x) && std::isfinite(p.y) && std::isfinite(p.z)) { + double dx=p.x-m[0], dy=p.y-m[1], dz=p.z-m[2]; ss += dx*dx+dy*dy+dz*dz; } + ex.push_back(std::sqrt(ss / nf)); + } + if (!ex.empty()) { std::sort(ex.begin(), ex.end()); leaf_extent = ex[ex.size()/2]; } + } + + // balanced tree-merge of submaps; each merge fits over the overlap band. Stop + // after max_levels rounds (-1 = fully to the root) for the staged demo. + int level = 0; + while (nodes.size() > 1 && (max_levels < 0 || level < max_levels)) { + std::vector nxt; + nxt.reserve(nodes.size()/2 + 1); + for (size_t i = 0; i + 1 < nodes.size(); i += 2) + nxt.push_back(omerge(nodes[i], nodes[i+1], P, rthr, riters, seed, merges, level)); + if (nodes.size() % 2 == 1) nxt.push_back(std::move(nodes.back())); + nodes = std::move(nxt); level++; + } + if (n_nodes_out) *n_nodes_out = (int) nodes.size(); + if (nodes.empty()) return {}; + + // concatenate remaining nodes, laid out side by side (layout_spacing != 0) and + // capped per node (opacity*radius) — same as tree_accumulate. + const double sigma = (layout_spacing < 0.0) ? 4.0 * leaf_extent : layout_spacing; + const double gcentre = 0.5 * M; + auto importance = [](const AccumPoint & p) -> double { + const double vol = (double) std::max(p.sx,1e-9f) * std::max(p.sy,1e-9f) * std::max(p.sz,1e-9f); + return (double) std::max(p.opacity, 0.0f) * std::cbrt(vol); + }; + std::vector out; + size_t total = 0; + for (auto & nd : nodes) + total += (per_node_cap > 0) ? std::min((size_t) per_node_cap, nd.cloud.size()) : nd.cloud.size(); + out.reserve(total); + for (auto & nd : nodes) { + const double dx = (sigma != 0.0) ? (0.5 * (nd.lo + nd.hi) - gcentre) * sigma : 0.0; + if (per_node_cap > 0 && (int) nd.cloud.size() > per_node_cap) { // keep each scene's own detail + std::vector idx(nd.cloud.size()); + for (size_t i = 0; i < idx.size(); i++) idx[i] = i; + std::partial_sort(idx.begin(), idx.begin() + per_node_cap, idx.end(), + [&](size_t a, size_t b){ return importance(nd.cloud[a]) > importance(nd.cloud[b]); }); + for (int i = 0; i < per_node_cap; i++) { AccumPoint p = nd.cloud[idx[i]]; p.x = (float) (p.x + dx); out.push_back(p); } + } else { + for (const auto & src : nd.cloud) { AccumPoint p = src; p.x = (float) (p.x + dx); out.push_back(p); } + } + } + return out; +} + +// ---- consensus fusion ----------------------------------------------------- + +FuseStats consensus_fuse(const std::vector & cloud, double voxel_frac, int k, + std::vector & fused, int mode) { + fused.clear(); + FuseStats st{}; + st.raw_points = (int64_t) cloud.size(); + if (cloud.empty()) return st; + + // A non-finite point (NaN/Inf — possible since the cloud is engine output and + // this is a public boundary) would poison the extent and, worse, make the + // float->int voxel-coord cast UB. Skip them throughout; fuse only finite points. + auto finite = [](const AccumPoint & p) { + return std::isfinite(p.x) && std::isfinite(p.y) && std::isfinite(p.z); + }; + // extent = rms distance to centroid; voxel = voxel_frac * extent; coords from min. + double mean[3] = {0,0,0}, lo[3] = {1e300,1e300,1e300}; + int64_t nf = 0; + for (const auto & p : cloud) { + if (!finite(p)) continue; + mean[0]+=p.x; mean[1]+=p.y; mean[2]+=p.z; + lo[0]=std::min(lo[0],(double)p.x); lo[1]=std::min(lo[1],(double)p.y); lo[2]=std::min(lo[2],(double)p.z); + nf++; + } + if (nf == 0) return st; + for (int c = 0; c < 3; c++) mean[c] /= nf; + double ss = 0; + for (const auto & p : cloud) { + if (!finite(p)) continue; + const double dx=p.x-mean[0], dy=p.y-mean[1], dz=p.z-mean[2]; + ss += dx*dx + dy*dy + dz*dz; + } + const double ext = std::sqrt(ss / nf); + const double v = voxel_frac * ext; + if (!(v > 0)) return st; + + struct Vox { double sx=0,sy=0,sz=0,sr=0,sg=0,sb=0,sop=0; double ssx=0,ssy=0,ssz=0; int64_t cnt=0; + float q[4]={1,0,0,0}; bool has_q=false; + std::vector> frames; // (frame, opacity weight) + int best_frame() const { + int32_t b=-1; double bw=-1; + for (auto & fr : frames) if (fr.second > bw) { bw=fr.second; b=fr.first; } + return b; } }; + struct Key { int32_t i,j,k; bool operator==(const Key&o) const { return i==o.i&&j==o.j&&k==o.k; } }; + struct KeyHash { size_t operator()(const Key&q) const { + uint64_t h = (uint64_t)(uint32_t)q.i * 0x9E3779B97F4A7C15ULL; + h ^= (uint64_t)(uint32_t)q.j + 0x9E3779B9U + (h<<6) + (h>>2); + h ^= (uint64_t)(uint32_t)q.k + 0x85EBCA6BU + (h<<6) + (h>>2); + return (size_t) h; } }; + std::unordered_map grid; + grid.reserve(cloud.size() / 2 + 1); + + auto vcoord = [&](double x, double lov) -> int32_t { + const double c = std::floor((x - lov) / v); // finite (x finite, v>0) + // clamp to int32 so an extreme-but-finite coordinate can't overflow the cast + if (c < -2147483640.0) return -2147483640; + if (c > 2147483640.0) return 2147483640; + return (int32_t) c; + }; + for (const auto & p : cloud) { + if (!finite(p)) continue; + Key key{ vcoord(p.x, lo[0]), vcoord(p.y, lo[1]), vcoord(p.z, lo[2]) }; + Vox & vx = grid[key]; + vx.sx+=p.x; vx.sy+=p.y; vx.sz+=p.z; vx.sr+=p.r; vx.sg+=p.g; vx.sb+=p.b; vx.sop+=p.opacity; vx.cnt++; + vx.ssx+=p.sx; vx.ssy+=p.sy; vx.ssz+=p.sz; + if (!vx.has_q) { vx.q[0]=p.qw; vx.q[1]=p.qx; vx.q[2]=p.qy; vx.q[3]=p.qz; vx.has_q=true; } // representative orientation + const double w = std::max((double) p.opacity, 1e-3); + bool seen = false; + for (auto & fr : vx.frames) if (fr.first == p.frame) { fr.second += w; seen = true; break; } + if (!seen) vx.frames.push_back({ p.frame, w }); + } + + st.voxels = (int64_t) grid.size(); + for (const auto & kv : grid) { + if ((int) kv.second.frames.size() < k) continue; + st.kept_voxels++; + st.points_kept += kv.second.cnt; + } + st.points_dropped = st.raw_points - st.points_kept; + + if (mode == FUSE_KEPT) { + // DENSE: keep every raw gaussian whose voxel is corroborated by >= k frames + // (floaters dropped, nothing averaged/decimated away). + fused.reserve((size_t) st.points_kept); + for (const auto & p : cloud) { + if (!finite(p)) continue; + Key key{ vcoord(p.x, lo[0]), vcoord(p.y, lo[1]), vcoord(p.z, lo[2]) }; + auto it = grid.find(key); + if (it != grid.end() && (int) it->second.frames.size() >= k) fused.push_back(p); + } + } else if (mode == FUSE_BEST) { + // DENSE + DE-GHOSTED: in each consensus voxel keep only the single most- + // confident frame's gaussians, so overlapping (disagreeing) copies aren't + // stacked — one frame represents each region. + for (const auto & p : cloud) { + if (!finite(p)) continue; + Key key{ vcoord(p.x, lo[0]), vcoord(p.y, lo[1]), vcoord(p.z, lo[2]) }; + auto it = grid.find(key); + if (it != grid.end() && (int) it->second.frames.size() >= k && p.frame == it->second.best_frame()) + fused.push_back(p); + } + } else { + // DENOISED: one averaged gaussian per consensus voxel. + for (const auto & kv : grid) { + const Vox & vx = kv.second; + if ((int) vx.frames.size() < k) continue; + AccumPoint p; + p.x = (float) (vx.sx / vx.cnt); p.y = (float) (vx.sy / vx.cnt); p.z = (float) (vx.sz / vx.cnt); + p.r = (float) (vx.sr / vx.cnt); p.g = (float) (vx.sg / vx.cnt); p.b = (float) (vx.sb / vx.cnt); + p.opacity = (float) (vx.sop / vx.cnt); + p.sx = (float) (vx.ssx / vx.cnt); p.sy = (float) (vx.ssy / vx.cnt); p.sz = (float) (vx.ssz / vx.cnt); + p.qw = vx.q[0]; p.qx = vx.q[1]; p.qy = vx.q[2]; p.qz = vx.q[3]; + p.frame = (int32_t) vx.frames.size(); // support count (informational) + fused.push_back(p); + } + } + return st; +} + +// ---- gaussian-level geometric de-ghost ------------------------------------- + +double consensus_refine(std::vector & cloud, double voxel_frac, int iters, double alpha) { + if (cloud.size() < 2 || iters < 1) return 0.0; + auto finite = [](const AccumPoint & p) { + return std::isfinite(p.x) && std::isfinite(p.y) && std::isfinite(p.z); + }; + struct Key { int32_t i,j,k; bool operator==(const Key&o) const { return i==o.i&&j==o.j&&k==o.k; } }; + struct KeyHash { size_t operator()(const Key&q) const { + uint64_t h = (uint64_t)(uint32_t)q.i * 0x9E3779B97F4A7C15ULL; + h ^= (uint64_t)(uint32_t)q.j + 0x9E3779B9U + (h<<6) + (h>>2); + h ^= (uint64_t)(uint32_t)q.k + 0x85EBCA6BU + (h<<6) + (h>>2); + return (size_t) h; } }; + struct FC { int32_t f; double sx,sy,sz,w; }; // per-frame opacity-weighted centroid + + double disagree = 0.0; + for (int it = 0; it < iters; it++) { + double mean[3]={0,0,0}, lo[3]={1e300,1e300,1e300}; int64_t nf=0; + for (const auto & p : cloud) { if (!finite(p)) continue; + mean[0]+=p.x; mean[1]+=p.y; mean[2]+=p.z; + lo[0]=std::min(lo[0],(double)p.x); lo[1]=std::min(lo[1],(double)p.y); lo[2]=std::min(lo[2],(double)p.z); nf++; } + if (nf == 0) return disagree; + for (int c=0;c<3;c++) mean[c]/=nf; + double ss=0; for (const auto & p : cloud) { if (!finite(p)) continue; + const double dx=p.x-mean[0], dy=p.y-mean[1], dz=p.z-mean[2]; ss+=dx*dx+dy*dy+dz*dz; } + const double ext = std::sqrt(ss / nf); + const double frac = (iters > 1) + ? voxel_frac * std::pow(2.0, 1.0 - (double) it / (iters - 1)) // 2*vf -> vf (coarse->fine) + : voxel_frac; + const double v = frac * ext; + if (!(v > 0)) return disagree; + auto vc = [&](double x, double l) -> int32_t { + const double c = std::floor((x - l) / v); + if (c < -2.1e9) return -2100000000; + if (c > 2.1e9) return 2100000000; + return (int32_t) c; }; + + // pass 1: per-voxel, per-frame opacity-weighted centroid (a snapshot) + std::unordered_map, KeyHash> grid; + grid.reserve(cloud.size()/2 + 1); + for (const auto & p : cloud) { if (!finite(p)) continue; + Key key{ vc(p.x,lo[0]), vc(p.y,lo[1]), vc(p.z,lo[2]) }; + const double w = std::max((double)p.opacity, 1e-3); + auto & vec = grid[key]; bool hit=false; + for (auto & fc : vec) if (fc.f==p.frame) { fc.sx+=w*p.x; fc.sy+=w*p.y; fc.sz+=w*p.z; fc.w+=w; hit=true; break; } + if (!hit) vec.push_back({ p.frame, w*(double)p.x, w*(double)p.y, w*(double)p.z, w }); + } + // pass 2: move each point toward the OTHER frames' consensus in its voxel + double dsum=0; int64_t dcnt=0; + for (auto & p : cloud) { if (!finite(p)) continue; + Key key{ vc(p.x,lo[0]), vc(p.y,lo[1]), vc(p.z,lo[2]) }; + auto git = grid.find(key); + if (git == grid.end() || git->second.size() < 2) continue; + double tx=0,ty=0,tz=0,tw=0; + for (const auto & fc : git->second) { if (fc.f==p.frame) continue; tx+=fc.sx; ty+=fc.sy; tz+=fc.sz; tw+=fc.w; } + if (tw <= 0) continue; + tx/=tw; ty/=tw; tz/=tw; + const double dx=tx-p.x, dy=ty-p.y, dz=tz-p.z; + dsum += dx*dx+dy*dy+dz*dz; dcnt++; + p.x += (float)(alpha*dx); p.y += (float)(alpha*dy); p.z += (float)(alpha*dz); + } + disagree = (dcnt && ext > 0) ? std::sqrt(dsum/dcnt)/ext : 0.0; + } + return disagree; +} + +// ---- loop closure --------------------------------------------------------- + +Mat4 sim4_invert(const Mat4 & M) { + Mat3 A{}; Vec3 t{}; + for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) A(i,j) = M(i,j); t[i] = M(i,3); } + const double s = std::cbrt(det3(A)); + Mat3 R = A; + for (int i = 0; i < 9; i++) R.a[i] /= s; + Sim3 inv = sim_invert(Sim3{ s, R, t }); + return sim_matrix(inv); +} + +std::vector distribute_drift(const Mat4 & D, const std::vector & global_poses) { + const int m = (int) global_poses.size(); + std::vector out(m); + const int n = std::max(m - 1, 1); + for (int k = 0; k < m; k++) { + Mat4 Dk = sim_frac_power(D, (double) k / n); + Vec3 c = { global_poses[k](0,3), global_poses[k](1,3), global_poses[k](2,3) }; + Vec3 r = mat3_apply(Mat3{ { Dk(0,0),Dk(0,1),Dk(0,2), Dk(1,0),Dk(1,1),Dk(1,2), Dk(2,0),Dk(2,1),Dk(2,2) } }, c); + out[k] = { r[0] + Dk(0,3), r[1] + Dk(1,3), r[2] + Dk(2,3) }; + } + return out; +} + +} // namespace pose +} // namespace free_splatter diff --git a/src/pose.h b/src/pose.h new file mode 100644 index 0000000..dd0e50a --- /dev/null +++ b/src/pose.h @@ -0,0 +1,288 @@ +// pose.h — downstream camera-pose recovery + cross-run alignment (C++ port). +// +// This is the C++ port of the (now removed; see git history) proven `pose/` +// research prototype: given the engine's per-pixel gaussians it recovers each +// view's camera (PnP), and aligns successive runs into one accumulating world +// (Sim(3)). It is dependency-free — only the self-contained linalg.h — so it +// ships from the CLI / C API with no Python and no Eigen/OpenCV (see CLAUDE.md). +// +// Faithful to FreeSplatter's scene recipe (estimate_poses -> DUSt3R fast_pnp): +// the prototype's numpy reference solver was verified ~1e-7 against cv2 and +// bit-exact to upstream estimate_poses on real engine output. +// +// Conventions: f64 throughout; a similarity acts as x -> s*(R@x)+t; gaussian +// channel layout (scene, 23ch) is xyz[0:3] SH[3:15] opacity[15] scale[16:19] +// rotation[19:23], opacity already activated (sigmoid) in [0,1]. +#ifndef FREE_SPLATTER_POSE_H +#define FREE_SPLATTER_POSE_H + +#include "linalg.h" + +#include +#include + +namespace free_splatter { +namespace pose { + +using fsla::Vec3; +using fsla::Mat3; +using fsla::Mat4; + +// ---- focal (Weiszfeld shared-focal, mirrors pose/focal.py) ----------------- +// Robust (L1) focal from camera-frame points and their pixels, principal point +// pp. N points, pts3d is N*3 (x,y,z), pixels is N*2 (col,row), all row-major. +double estimate_focal(const double * pts3d, const double * pixels, int N, + double ppx, double ppy, int weiszfeld_iters = 10); + +// ---- similarity alignment (mirrors pose/align.py) -------------------------- +struct Sim3 { double s; Mat3 R; Vec3 t; }; // x -> s*(R@x)+t + +Sim3 sim_identity(); +Vec3 sim_apply(const Sim3 & T, const Vec3 & x); +Sim3 sim_compose(const Sim3 & T2, const Sim3 & T1); // apply T1 then T2 +Sim3 sim_invert(const Sim3 & T); +Mat4 sim_matrix(const Sim3 & T); // 4x4 [[sR,t],[0,1]] + +// Umeyama closed-form (s,R,t) minimizing ||sRX+t - Y||^2. with_scale=false -> rigid. +// X, Y are N*3 row-major. +Sim3 fit_similarity(const double * X, const double * Y, int N, bool with_scale = true); + +// RANSAC similarity robust to gross outliers; fills `inliers` (size N, 0/1). +Sim3 fit_similarity_ransac(const double * X, const double * Y, int N, double thresh, + int iters, std::vector & inliers, + bool with_scale = true, uint64_t seed = 0); + +// Residual ladder (rigid -> similarity -> affine) + depth correlation, and the +// verdict that classifies the A<->B mismatch. Mirrors align.diagnose. +struct Ladder { + double scene, rigid, similarity, affine, scale, depth_corr, aff_gain; + bool structured; + std::string verdict; // rigid_ok | needs_scale | needs_affine | nonlinear | similarity_plus_noise +}; +Ladder diagnose(const double * X, const double * Y, int N, + double tol = 1e-3, double corr_tol = 0.3); + +// Loop-closure metrics for a closed chain of similarities (deviation from I). +struct LoopError { double scale_err, rot_deg, trans; }; +LoopError loop_closure_error(const std::vector & links); + +// Fractional power M^f of a Sim(3) 4x4 (even loop-closure relaxation). Real part +// of the eigendecomposition; valid while rotation < 180deg. Mirrors sim_frac_power. +Mat4 sim_frac_power(const Mat4 & M, double f); + +// ---- PnP ------------------------------------------------------------------- +// RANSAC DLT PnP with known intrinsics K (the prototype's numpy backend). +// Xw is N*3 world points, pixels is N*2 (col,row). Returns cam2world (4x4) and +// fills `inliers` (size N, 0/1). Kept as the asset-free golden-test reference; on +// real scenes the DLT inherits the planar/mirror degeneracy — use solve_pnp. +Mat4 solve_pnp_numpy(const double * Xw, const double * pixels, int N, const Mat3 & K, + std::vector & inliers, + double thresh_px = 5.0, int iters = 100, uint64_t seed = 0); + +// Robust PnP: EPnP (planar-robust, deterministic — uses all points, no random +// minimal samples) for the initial pose, then a Gauss-Newton / Huber reprojection +// refine. Reaches cv2/SQPNP-grade poses on real scenes with no OpenCV dependency. +// Fills `inliers` (reprojection < thresh_px). This is what estimate_poses uses. +Mat4 solve_pnp(const double * Xw, const double * pixels, int N, const Mat3 & K, + std::vector & inliers, double thresh_px = 5.0, int gn_iters = 10); + +// EPnP-only (no refine) — exposed for testing the planar-robust initialization. +Mat4 solve_pnp_epnp(const double * Xw, const double * pixels, int N, const Mat3 & K); + +// Integer pixel grid (col,row), row-major over H*W — matches upstream xy_grid +// (no half-pixel offset). Fills `out` with 2*H*W doubles. +void pixel_grid(int H, int W, std::vector & out); + +// ---- estimate_poses orchestration (scene recipe, mirrors pnp.estimate_poses) - +struct PoseResult { + std::vector cam2world; + double focal; + double scale; +}; +// points: N views, each H*W*3 (gaussian centers, view-0 frame). opacities: N +// views, each H*W (activated; pass nullptr entries for no mask). focal<=0 -> +// estimate from view 0 over all pixels (use_first_focal). normalize -> the +// runner's 1/baseline camera rescale. +PoseResult estimate_poses(const std::vector & points, + const std::vector & opacities, + int H, int W, + double opacity_threshold = 0.05, + double focal = -1.0, int pnp_iter = 100, + bool normalize = false, uint64_t seed = 0); + +// ---- after-inference parallax (depth conditioning of a recovered pair) ------- +// Quantifies how well a 2-view pair constrains depth, from the model's OWN +// recovered geometry. All angle fields are scale-invariant; B/Z is consistent +// because cameras and points share the (un-normalized) view-0 frame. +struct Parallax { + double tri_angle_deg; // median triangulation angle over confident points + double lateral_angle_deg; // baseline angle off view-0 optical axis (0=dolly, 90=strafe) + double baseline_over_depth; // ||C1-C0|| / median point depth from camera 0 + double baseline; + double median_depth; + double focal; // estimated (or supplied) focal, px + int n_points; // confident points used for the medians +}; + +// Pure geometry core (no PnP) — directly testable against hand-computed angles. +// pts: 3*N view-0-frame points; mask: per-point weight (>thr kept), or nullptr. +// axis0 need not be unit; it is normalized internally. +Parallax parallax_stats(const Vec3 & C0, const Vec3 & C1, const Vec3 & axis0, + const float * pts, const float * mask, int N, double mask_thr); + +// Full path: recover the pair's cameras (normalize=false), then parallax_stats. +// points/opacities: 2 views (view-0-frame xyz + activated opacity), as estimate_poses. +Parallax pair_parallax(const std::vector & points, + const std::vector & opacities, + int H, int W, double opacity_threshold = 0.05, double focal = -1.0); + +// ---- accumulation: chain successive runs into one world (mirrors accumulate.py) +// One accumulated 3D gaussian in the global frame, tagged with the source frame +// it came from (consensus fusion needs the frame id). Carries the full anisotropic +// shape (scale + rotation quaternion w,x,y,z), transformed through the cross-run +// Sim(3), so the cloud renders as oriented splats, not isotropic points. rgb [0,1]. +struct AccumPoint { + float x, y, z; + float r, g, b, opacity; // color [0,1] and activated opacity [0,1] (the splat alpha) + float sx, sy, sz; // anisotropic gaussian scale (global frame) + float qw, qx, qy, qz; // gaussian rotation quaternion (w,x,y,z), global frame + int32_t frame; +}; + +// Diagnostics for chaining one run into the global frame. +struct ChainLink { + Sim3 sim; // run k -> run k-1 (local similarity); identity for run 0 + Sim3 global; // run k -> global frame (T_k = T_{k-1} . sim) + double scale; // sim.s — per-link scale drift + double inlier_frac; // RANSAC inliers / shared-frame valid points + double valid_frac; // pixels valid (opacity) in BOTH shared views + double resid_frac; // inlier RANSAC residual / target scene extent +}; + +// Sliding-window accumulator: feed it each consecutive pair's engine output +// ([2,H,W,gc], where view 0 shares a frame with the previous pair's view 1) and +// it recovers each pair's cameras (estimate_poses), fits the cross-run Sim(3) on +// the shared frame, composes a global chain, and drops every new frame's gaussians +// into one world. This is the live accumulating-reconstruction loop, in C++. +class Accumulator { +public: + Accumulator(int H, int W, double opacity_threshold = 0.05, + double ransac_thresh_frac = 0.02, int ransac_iters = 300, + uint64_t seed = 0); + + // gaussians: 2*H*W*gc floats (two views, engine channel layout). Recovers the + // pair's cameras, chains the Sim(3), accumulates the new frame(s). focal<=0 -> + // estimate per pair (use_first_focal). Returns the chain link just added. + ChainLink add_pair(const float * gaussians, int gaussian_channels, double focal = -1.0); + + const std::vector & cloud() const { return cloud_; } + const std::vector & links() const { return links_; } + // count+1 global cam2world (similarity) matrices: frame f_k = T_k . c2w_k[view0], + // plus the final frame f_n = T_{n-1} . c2w_{n-1}[view1]. Mirrors accumulate.py's `rec`. + std::vector camera_path() const; + int frame_count() const { return next_frame_; } + int pair_count() const { return (int) links_.size(); } + const Sim3 & global_transform() const { return T_; } + // Geometric de-ghost the accumulated cloud in place (consensus_refine). + double refine(double voxel_frac = 0.03, int iters = 8, double alpha = 0.5); + +private: + void add_view(const float * pts, const float * op, const float * rgb, + const float * scl, const float * rot, // P*3 scale, P*4 quat (w,x,y,z) + const Sim3 & T, int frame); + + int H_, W_; + double thr_, rthr_; + int riters_; + uint64_t seed_; + Sim3 T_; + bool have_prev_ = false; + std::vector prev_pts1_; // 3*P previous view-1 points (the shared frame) + std::vector prev_mask1_; // P previous view-1 opacity mask + std::vector cloud_; + std::vector cams_; // per-pair view-0 global cam2world + Mat4 final_cam_; // latest view-1 global cam2world (appended last) + std::vector links_; + int next_frame_ = 0; +}; + +// ---- hierarchical tree accumulation (balanced binary merge) ---------------- +// Linear chaining composes N Sim(3)s in series, so registration drift compounds +// over N hops and piles onto the last frame (the far end smears). This instead +// builds short submaps and merges them pairwise up a balanced binary tree — so +// drift compounds over only ~log2(N) hops and is spread evenly rather than dumped +// on the end. +struct TreeMerge { // diagnostics per merge (defs as ChainLink) + int level, lo_frame, hi_frame, shared_frame; + double scale, inlier_frac, valid_frac, resid_frac; +}; + +// Multi-frame-overlap tree. pairs[k]: one engine output [2*H*W*gc] (pair k = frames +// k,k+1; pair k's view-1 and pair k+1's view-0 are the same physical frame). +// Adjacent submaps of `block` frames overlap by `overlap` frames and each merge fits +// its Sim(3) over ALL shared frames at once — the fit is over-determined, so the +// per-frame depth-inconsistency noise (the reason a single-frame alignment is only +// ~13% rigid-consistent) averages out and the compounded drift drops. Base submaps +// are short windows of `block` frames (block > overlap), chained internally by +// single-frame links, then merged in a balanced tree on their overlap bands. The +// degenerate block=2,overlap=1 is the plain overlap-by-one (single shared boundary) +// tree. Returns the merged root cloud in the first submap's frame; `merges` +// (optional) gets one entry per merge. +// +// For the staged side-by-side demo: max_levels caps the merge rounds (-1 = full): +// 0 returns the independent base submaps, 1 the first merge round, ... . The +// remaining (>1) nodes are shifted apart along X by their frame-range midpoint when +// layout_spacing != 0 (0 = none, <0 = auto 4x median submap extent, >0 = that +// spacing), each capped to its top per_node_cap points by opacity*radius (0 = +// uncapped) so every laid-out scene renders at its own detail. n_nodes_out +// (optional) reports how many nodes remain (1 once fully merged). +std::vector tree_accumulate_overlap(const std::vector & pairs, + int H, int W, int gaussian_channels, + double opacity_threshold = 0.05, + int block = 4, int overlap = 2, + double ransac_thresh_frac = 0.02, + int ransac_iters = 300, uint64_t seed = 0, + std::vector * merges = nullptr, + int max_levels = -1, double layout_spacing = 0.0, + int * n_nodes_out = nullptr, int per_node_cap = 0); + +// ---- consensus fusion (mirrors fuse.py) ----------------------------------- +struct FuseStats { + int64_t raw_points, voxels, kept_voxels, points_kept, points_dropped; +}; +// Voxelize the cloud at voxel_frac * cloud-extent and keep only voxels corroborated +// by >= k DISTINCT source frames (single-frame floaters dropped). `mode` selects how +// a consensus voxel is emitted: +// FUSE_AVERAGED (0): one averaged gaussian per voxel — denoised but decimated +// (sparse unless many frames overlap). +// FUSE_KEPT (1): every raw gaussian in the voxel — dense, nothing averaged. +// FUSE_BEST (2): only the single most-confident frame's gaussians per voxel — +// dense AND de-ghosted (no overlapping copies stacked). +enum FuseMode { FUSE_AVERAGED = 0, FUSE_KEPT = 1, FUSE_BEST = 2 }; +FuseStats consensus_fuse(const std::vector & cloud, double voxel_frac, int k, + std::vector & fused, int mode = FUSE_AVERAGED); + +// ---- gaussian-level geometric de-ghost (multi-view consensus refinement) ---- +// Pairwise Sim(3) chaining leaves per-OBJECT (non-rigid) misregistration that a +// per-frame pose fit can't remove, so overlapping frames show doubled objects. +// This refines the cloud IN PLACE at the gaussian level: each iteration, for every +// point, moves it a fraction `alpha` toward the opacity-weighted consensus of the +// OTHER frames' points in its (coarse-to-fine) voxel neighbourhood. Non-rigid and +// local, so spatially-varying ghosting collapses to one surface; single-frame +// regions (no other frame nearby) are left untouched. Returns the RMS point-to- +// consensus distance after refinement, as a fraction of cloud extent. +double consensus_refine(std::vector & cloud, double voxel_frac = 0.03, + int iters = 8, double alpha = 0.5); + +// ---- loop closure (mirrors loop_closure.py) ------------------------------- +// Inverse of a similarity 4x4 [[sR,t],[0,1]] (sR = s*rotation): [[(1/s)Rᵀ,...],[0,1]]. +Mat4 sim4_invert(const Mat4 & M); +// Distribute an accumulated Sim(3) drift D over a chain of n+1 nodes by applying +// D^(k/n) at node k (even relaxation, sim_frac_power). Returns the corrected camera +// centers. global_poses are the open-loop global cam2world (similarity) matrices. +std::vector distribute_drift(const Mat4 & D, const std::vector & global_poses); + +} // namespace pose +} // namespace free_splatter + +#endif // FREE_SPLATTER_POSE_H diff --git a/src/splat.h b/src/splat.h new file mode 100644 index 0000000..f5cc2e3 --- /dev/null +++ b/src/splat.h @@ -0,0 +1,44 @@ +// splat.h — the single, shared encoder for an antimatter15 .splat record. +// +// There are two producers of .splat output: the single-run path (write_splat, +// straight from the engine's gaussians) and the accumulated-cloud path +// (write_cloud_splat, from pose::AccumPoint). They MUST encode a gaussian +// identically — same OpenCV->OpenGL convention, same quaternion remap, same byte +// packing, same opacity->alpha. They used to be two copies and drifted: the cloud +// path silently dropped first the rotation/scale, then the opacity (rendering a +// fully-opaque, swirling, blurry soup instead of a proper alpha-blended surface). +// +// This is the ONE definition both call, so they cannot diverge again. It is pinned +// byte-for-byte by tests/test_pose.cpp::test_splat_record. +#ifndef FREE_SPLATTER_SPLAT_H +#define FREE_SPLATTER_SPLAT_H + +#include +#include + +namespace free_splatter { + +// One 32-byte .splat record: pos[3]f32, scale[3]f32, rgba[4]u8, rot[4]u8 (w,x,y,z). +// Inputs are in the engine's OpenCV frame (y down, z forward); `rgb` and `opacity` +// in [0,1]; `quat_wxyz` as (w,x,y,z). Writes exactly 32 bytes to `out`. +inline void encode_splat_record(unsigned char out[32], const float pos[3], const float scale[3], + const float quat_wxyz[4], const float rgb[3], float opacity) { + auto u8 = [](float v) -> unsigned char { float t = v < 0 ? 0 : (v > 255 ? 255 : v); return (unsigned char) t; }; + auto c01 = [](float v) -> float { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }; + // OpenCV (y down, z forward) -> viewer OpenGL (y up): 180deg about X = diag(1,-1,-1): + // position.yz negate, quaternion (w,x,y,z) -> (-x, w, -z, y). + const float p[3] = { pos[0], -pos[1], -pos[2] }; + float q[4] = { -quat_wxyz[1], quat_wxyz[0], -quat_wxyz[3], quat_wxyz[2] }; + const float nrm = std::sqrt(q[0]*q[0] + q[1]*q[1] + q[2]*q[2] + q[3]*q[3]) + 1e-12f; + std::memcpy(out + 0, p, 12); + std::memcpy(out + 12, scale, 12); + out[24] = u8(c01(rgb[0]) * 255.0f); + out[25] = u8(c01(rgb[1]) * 255.0f); + out[26] = u8(c01(rgb[2]) * 255.0f); + out[27] = u8(c01(opacity) * 255.0f); // opacity -> alpha (NOT forced opaque) + for (int c = 0; c < 4; c++) out[28 + c] = u8(q[c] / nrm * 128.0f + 128.0f); +} + +} // namespace free_splatter + +#endif // FREE_SPLATTER_SPLAT_H diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index 7f9f3dc..49ddd7a 100644 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -13,6 +13,7 @@ endfunction() free_splatter_add_test(test_loader) # GGUF KV/hparams round-trip free_splatter_add_test(test_graph_blocks) # golden-op pins (ggml ops we depend on) free_splatter_add_test(test_image) # untrusted image-input validation +free_splatter_add_test(test_pose) # pose math golden tests (focal/align/PnP) # Per-layer parity vs the float64 PyTorch reference. Labelled "model": runs only # under `ctest -L model` and self-skips (77) unless FREE_SPLATTER_GGUF and diff --git a/tests/test_pose.cpp b/tests/test_pose.cpp new file mode 100644 index 0000000..a4816f5 --- /dev/null +++ b/tests/test_pose.cpp @@ -0,0 +1,774 @@ +// Asset-free golden tests for the C++ pose component (src/pose.{h,cpp}) — the +// direct mirror of pose/test_pose.py. No model, no fixtures, no cv2: synthetic +// geometry with KNOWN ground truth, the way CLAUDE.md wants new ops verified +// ("pinned to hand-computed references... a wrong op should fail with zero +// fixtures"). Runs in the fast tier (ctest -LE model). +// +// The RNG differs from numpy's (the checks are tolerance-based on recovered +// quantities, not bitstream-matched), so this validates the C++ port to the same +// correctness bar the Python prototype meets, independently of numpy. +#include "pose.h" +#include "splat.h" + +#include +#include +#include +#include +#include +#include + +using namespace free_splatter::pose; +using fsla::Vec3; +using fsla::Mat3; +using fsla::Mat4; + +static int failures = 0; +static std::mt19937_64 RNG(12345); + +static void check(const char * name, bool cond, const char * detail = "") { + std::printf(" [%s] %s%s%s\n", cond ? "PASS" : "FAIL", name, + detail[0] ? " " : "", detail); + if (!cond) failures++; +} + +// --- helpers --------------------------------------------------------------- + +static Mat3 rand_rotation() { + std::normal_distribution g(0, 1); + Vec3 ax = { g(RNG), g(RNG), g(RNG) }; + double n = std::sqrt(ax[0]*ax[0] + ax[1]*ax[1] + ax[2]*ax[2]); + ax = { ax[0]/n, ax[1]/n, ax[2]/n }; + std::uniform_real_distribution a(0.2, M_PI - 0.2); + const double ang = a(RNG), c = std::cos(ang), s = std::sin(ang), v = 1 - c; + const double x = ax[0], y = ax[1], z = ax[2]; + return Mat3{ { c+x*x*v, x*y*v-z*s, x*z*v+y*s, + y*x*v+z*s, c+y*y*v, y*z*v-x*s, + z*x*v-y*s, z*y*v+x*s, c+z*z*v } }; +} + +static double rot_angle(const Mat3 & R) { + const double tr = R(0,0) + R(1,1) + R(2,2); + return std::acos(std::max(-1.0, std::min(1.0, (tr-1)/2))) * 180.0 / M_PI; +} + +static Mat3 mul(const Mat3 & A, const Mat3 & B) { return fsla::mat3_mul(A, B); } +static Mat3 T(const Mat3 & A) { return fsla::mat3_transpose(A); } + +// apply (s,R,t) to a row-major N*3 array -> out +static void apply_sim_arr(double s, const Mat3 & R, const Vec3 & t, + const double * X, int N, double * out) { + for (int i = 0; i < N; i++) { + Vec3 x = { X[3*i], X[3*i+1], X[3*i+2] }; + Vec3 Rx = fsla::mat3_apply(R, x); + out[3*i] = s*Rx[0] + t[0]; + out[3*i+1] = s*Rx[1] + t[1]; + out[3*i+2] = s*Rx[2] + t[2]; + } +} + +static double rms3(const double * a, const double * b, int N) { + double s = 0; + for (int i = 0; i < 3*N; i++) { const double d = a[i]-b[i]; s += d*d; } + return std::sqrt(s / N); +} + +static Vec3 apply_mat4(const Mat4 & M, const Vec3 & p) { + return { M(0,0)*p[0]+M(0,1)*p[1]+M(0,2)*p[2]+M(0,3), + M(1,0)*p[0]+M(1,1)*p[1]+M(1,2)*p[2]+M(1,3), + M(2,0)*p[0]+M(2,1)*p[1]+M(2,2)*p[2]+M(2,3) }; +} + +// =========================================================================== +// Coordinate-system normalization (the scale question) +// =========================================================================== + +static void test_similarity_roundtrip() { + std::printf("test_similarity_roundtrip\n"); + const int N = 200; + std::normal_distribution g(0, 1); + std::vector X(3*N), Y(3*N); + for (int i = 0; i < N; i++) { X[3*i]=g(RNG)*3; X[3*i+1]=g(RNG)*2; X[3*i+2]=g(RNG)*5; } + const double s_t = 1.7; Mat3 R_t = rand_rotation(); Vec3 t_t = { 4.0, -1.0, 2.0 }; + apply_sim_arr(s_t, R_t, t_t, X.data(), N, Y.data()); + + Sim3 T_ = fit_similarity(X.data(), Y.data(), N); + char buf[64]; + std::snprintf(buf, sizeof buf, "%.12f vs %g", T_.s, s_t); + check("scale", std::fabs(T_.s - s_t) < 1e-9, buf); + check("rotation", rot_angle(mul(T_.R, T(R_t))) < 1e-7); + check("translation", std::sqrt((T_.t[0]-t_t[0])*(T_.t[0]-t_t[0]) + + (T_.t[1]-t_t[1])*(T_.t[1]-t_t[1]) + (T_.t[2]-t_t[2])*(T_.t[2]-t_t[2])) < 1e-7); + std::vector Yp(3*N); + apply_sim_arr(T_.s, T_.R, T_.t, X.data(), N, Yp.data()); + check("residual~0", rms3(Yp.data(), Y.data(), N) < 1e-9); +} + +static void test_scale_detection() { + std::printf("test_scale_detection\n"); + const int N = 300; + std::normal_distribution g(0, 1); + std::vector X(3*N), Y(3*N); + for (int i = 0; i < 3*N; i++) X[i] = g(RNG)*2; + const double s_t = 2.5; + apply_sim_arr(s_t, rand_rotation(), {1.0, 2.0, -3.0}, X.data(), N, Y.data()); + Ladder L = diagnose(X.data(), Y.data(), N); + check("recovered scale", std::fabs(L.scale - s_t) < 1e-8); + check("rigid residual large", L.rigid / L.scene > 0.1); + check("similarity residual ~0", L.similarity / L.scene < 1e-9); + check("verdict needs_scale", L.verdict == "needs_scale", L.verdict.c_str()); +} + +static void test_nonlinear_detection() { + std::printf("test_nonlinear_detection\n"); + const int N = 400; + std::normal_distribution g(0, 1); + std::vector X(3*N); + for (int i = 0; i < N; i++) { X[3*i]=g(RNG)*2; X[3*i+1]=g(RNG)*2; X[3*i+2]=g(RNG)*2 + 6.0; } + + // (a) anisotropic (affine) warp + Mat3 A = mul(Mat3{ {1.0,0,0, 0,1.6,0, 0,0,0.7} }, rand_rotation()); + std::vector Yaff(3*N); + for (int i = 0; i < N; i++) { + Vec3 x = { X[3*i], X[3*i+1], X[3*i+2] }; + Vec3 Ax = fsla::mat3_apply(A, x); + Yaff[3*i]=Ax[0]+1.0; Yaff[3*i+1]=Ax[1]; Yaff[3*i+2]=Ax[2]+2.0; + } + Ladder La = diagnose(X.data(), Yaff.data(), N); + check("affine: similarity fails", La.similarity / La.scene > 1e-3); + check("affine: affine fits", La.affine / La.scene < 1e-9); + check("affine: verdict needs_affine", La.verdict == "needs_affine", La.verdict.c_str()); + + // (b) depth-quadratic (focal-error fingerprint) + double zmean = 0; for (int i = 0; i < N; i++) zmean += X[3*i+2]; zmean /= N; + double zvar = 0; for (int i = 0; i < N; i++) zvar += (X[3*i+2]-zmean)*(X[3*i+2]-zmean); zvar /= N; + std::vector Ynl(X); + for (int i = 0; i < N; i++) { + const double Z = X[3*i+2]; + Ynl[3*i] *= (1.0 + 0.15 * (Z-zmean)*(Z-zmean) / zvar); + } + Ladder Ln = diagnose(X.data(), Ynl.data(), N); + check("nonlinear: affine fails", Ln.affine / Ln.scene > 1e-3); + check("nonlinear: verdict nonlinear", Ln.verdict == "nonlinear", Ln.verdict.c_str()); + check("nonlinear: depth-correlated residual", Ln.structured); +} + +static void test_outlier_robustness() { + std::printf("test_outlier_robustness\n"); + const int N = 400; + std::normal_distribution g(0, 1); + std::vector X(3*N), Y(3*N); + for (int i = 0; i < 3*N; i++) X[i] = g(RNG)*2; + const double s_t = 1.3; Mat3 R_t = rand_rotation(); Vec3 t_t = { 0.5, -2.0, 1.0 }; + apply_sim_arr(s_t, R_t, t_t, X.data(), N, Y.data()); + const int n_out = (int)(0.3 * N); + std::vector oidx(N); for (int i = 0; i < N; i++) oidx[i] = i; + std::shuffle(oidx.begin(), oidx.end(), RNG); + for (int k = 0; k < n_out; k++) { int i = oidx[k]; Y[3*i]+=g(RNG)*10; Y[3*i+1]+=g(RNG)*10; Y[3*i+2]+=g(RNG)*10; } + std::vector inl; + Sim3 T_ = fit_similarity_ransac(X.data(), Y.data(), N, 0.05, 300, inl); + check("scale recovered", std::fabs(T_.s - s_t) < 1e-3); + check("rotation recovered", rot_angle(mul(T_.R, T(R_t))) < 0.1); + check("translation recovered", std::sqrt((T_.t[0]-t_t[0])*(T_.t[0]-t_t[0]) + + (T_.t[1]-t_t[1])*(T_.t[1]-t_t[1]) + (T_.t[2]-t_t[2])*(T_.t[2]-t_t[2])) < 1e-2); + int rejected = 0; for (int k = 0; k < n_out; k++) if (!inl[oidx[k]]) rejected++; + check("outliers excluded", rejected > 0.9 * n_out); +} + +static void test_loop_correction() { + std::printf("test_loop_correction\n"); + Mat3 R = rand_rotation(); + Mat4 M = sim_matrix({ 1.07, R, {0.3, -0.2, 0.1} }); + Mat4 m0 = sim_frac_power(M, 0.0), m1 = sim_frac_power(M, 1.0), mh = sim_frac_power(M, 0.5); + bool id_ok = true, m1_ok = true; + for (int i = 0; i < 16; i++) { id_ok &= std::fabs(m0.a[i] - fsla::mat4_identity().a[i]) < 1e-9; + m1_ok &= std::fabs(m1.a[i] - M.a[i]) < 1e-9; } + Mat4 mh2 = fsla::mat4_mul(mh, mh); + bool half_ok = true; for (int i = 0; i < 16; i++) half_ok &= std::fabs(mh2.a[i] - M.a[i]) < 1e-9; + check("M^0 == I", id_ok); + check("M^1 == M", m1_ok); + check("(M^.5)^2 == M", half_ok); + + const int n = 12; + Mat4 D = sim_matrix({ 1.15, R, {0.4, -0.2, 0.1} }); + std::vector clean(n+1), drift(n+1), corr(n+1); + for (int k = 0; k <= n; k++) { + const double tt = 2*M_PI * k / n; + clean[k] = { std::cos(tt), std::sin(tt), 0.1*tt }; + drift[k] = apply_mat4(sim_frac_power(D, -(double)k/n), clean[k]); // D^{-k/n} + corr[k] = apply_mat4(sim_frac_power(D, (double)k/n), drift[k]); // D^{ k/n} + } + double pre = 0, post = 0; + for (int k = 0; k <= n; k++) { + for (int c = 0; c < 3; c++) { pre += (drift[k][c]-clean[k][c])*(drift[k][c]-clean[k][c]); + post += (corr[k][c]-clean[k][c])*(corr[k][c]-clean[k][c]); } + } + check("drift present before", std::sqrt(pre/(n+1)) > 0.1); + char buf[64]; std::snprintf(buf, sizeof buf, "ATE %.1e", std::sqrt(post/(n+1))); + check("recovers clean loop", std::sqrt(post/(n+1)) < 1e-9, buf); +} + +static void test_loop_closure() { + std::printf("test_loop_closure\n"); + std::uniform_real_distribution us(0.8, 1.2); + std::normal_distribution g(0, 1); + std::vector links; + for (int i = 0; i < 4; i++) links.push_back({ us(RNG), rand_rotation(), {g(RNG), g(RNG), g(RNG)} }); + Sim3 acc = sim_identity(); + for (int i = 0; i < 3; i++) acc = sim_compose(links[i], acc); + std::vector closed = { links[0], links[1], links[2], sim_invert(acc) }; + LoopError e = loop_closure_error(closed); + // rot_deg has a ~sqrt(eps) (~1e-6 deg) floor: it is acos(~1) of RᵀR, whose + // deviation from I is at the fp roundoff level — scale_err and trans are + // exactly 0 here, so 1e-5 deg cleanly separates "closed" from any real drift. + check("closed loop ~ identity", e.scale_err < 1e-9 && e.rot_deg < 1e-5 && e.trans < 1e-9); + std::vector bad = closed; + bad[0].s *= 1.12; + LoopError e2 = loop_closure_error(bad); + check("scale drift detected", std::fabs(e2.scale_err - std::log(1.12)) < 1e-9); +} + +// =========================================================================== +// PnP (verified against analytic ground truth) +// =========================================================================== + +static Mat3 make_K(double f, double W, double H) { + Mat3 K = fsla::mat3_identity(); + K(0,0) = f; K(1,1) = f; K(0,2) = W/2; K(1,2) = H/2; + return K; +} + +// project world points by (R,t) world2cam through K; fill px (N*2) and z (N) +static void project(const double * Pw, int N, const Mat3 & R, const Vec3 & t, + const Mat3 & K, double * px, double * z) { + for (int i = 0; i < N; i++) { + Vec3 p = { Pw[3*i], Pw[3*i+1], Pw[3*i+2] }; + Vec3 Xc = fsla::mat3_apply(R, p); + Xc[0]+=t[0]; Xc[1]+=t[1]; Xc[2]+=t[2]; + px[2*i] = K(0,0)*Xc[0]/Xc[2] + K(0,2); + px[2*i+1] = K(1,1)*Xc[1]/Xc[2] + K(1,2); + z[i] = Xc[2]; + } +} + +static void test_focal_recovery() { + std::printf("test_focal_recovery\n"); + const int N = 500; const double f = 437.0, W = 512, H = 512; + std::uniform_real_distribution ux(-2,2), uy(-2,2), uz(4,9); + std::vector Xc(3*N), px(2*N); + for (int i = 0; i < N; i++) { + Xc[3*i]=ux(RNG); Xc[3*i+1]=uy(RNG); Xc[3*i+2]=uz(RNG); + px[2*i] = f*Xc[3*i] /Xc[3*i+2] + W/2; + px[2*i+1] = f*Xc[3*i+1]/Xc[3*i+2] + H/2; + } + double fe = estimate_focal(Xc.data(), px.data(), N, W/2, H/2); + char buf[64]; std::snprintf(buf, sizeof buf, "%.9f vs %g", fe, f); + check("exact focal", std::fabs(fe - f) < 1e-6, buf); + std::normal_distribution g(0, 0.5); + std::vector pn(px); + for (int i = 0; i < 2*N; i++) pn[i] += g(RNG); + double fn = estimate_focal(Xc.data(), pn.data(), N, W/2, H/2); + check("focal under noise", std::fabs(fn - f)/f < 0.02); +} + +static void test_pnp_recovery() { + std::printf("test_pnp_recovery\n"); + const int N = 500; Mat3 K = make_K(400, 512, 512); + std::uniform_real_distribution ux(-2,2), uy(-2,2), uz(4,8); + std::vector Pw(3*N); + for (int i = 0; i < N; i++) { Pw[3*i]=ux(RNG); Pw[3*i+1]=uy(RNG); Pw[3*i+2]=uz(RNG); } + std::normal_distribution g(0, 1); + std::uniform_real_distribution ut(-0.6, 0.6), ua(3, 20); + double worst_R = 0, worst_t = 0; + for (int trial = 0; trial < 8; trial++) { + Vec3 ax = { g(RNG), g(RNG), g(RNG) }; + double n = std::sqrt(ax[0]*ax[0]+ax[1]*ax[1]+ax[2]*ax[2]); + ax = { ax[0]/n, ax[1]/n, ax[2]/n }; + const double ang = ua(RNG) * M_PI/180.0, c = std::cos(ang), s = std::sin(ang), v = 1-c; + Mat3 R{ { c+ax[0]*ax[0]*v, ax[0]*ax[1]*v-ax[2]*s, ax[0]*ax[2]*v+ax[1]*s, + ax[1]*ax[0]*v+ax[2]*s, c+ax[1]*ax[1]*v, ax[1]*ax[2]*v-ax[0]*s, + ax[2]*ax[0]*v-ax[1]*s, ax[2]*ax[1]*v+ax[0]*s, c+ax[2]*ax[2]*v } }; + Vec3 t = { ut(RNG), ut(RNG), ut(RNG) }; + std::vector px(2*N), z(N); + project(Pw.data(), N, R, t, K, px.data(), z.data()); + bool behind = false; for (int i = 0; i < N; i++) if (z[i] <= 0) behind = true; + if (behind) continue; + std::vector inl; + Mat4 c2w = solve_pnp_numpy(Pw.data(), px.data(), N, K, inl); + Mat4 w2c = fsla::inv_rigid4(c2w); + Mat3 Rr{}; Vec3 tr{}; + for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) Rr(i,j)=w2c(i,j); tr[i]=w2c(i,3); } + worst_R = std::max(worst_R, (double) rot_angle(mul(Rr, T(R)))); + worst_t = std::max(worst_t, std::sqrt((tr[0]-t[0])*(tr[0]-t[0])+(tr[1]-t[1])*(tr[1]-t[1])+(tr[2]-t[2])*(tr[2]-t[2]))); + } + char buf[64]; std::snprintf(buf, sizeof buf, "worst %.2e deg", worst_R); + check("rotation recovered", worst_R < 1e-3, buf); + check("translation recovered", worst_t < 1e-4); +} + +static void test_pnp_outliers() { + std::printf("test_pnp_outliers\n"); + const int N = 600; Mat3 K = make_K(400, 512, 512); + std::uniform_real_distribution ux(-2,2), uy(-2,2), uz(4,8); + std::vector Pw(3*N); + for (int i = 0; i < N; i++) { Pw[3*i]=ux(RNG); Pw[3*i+1]=uy(RNG); Pw[3*i+2]=uz(RNG); } + Mat3 R = rand_rotation(); + // shrink towards identity so all points stay in front + for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) R(i,j) = (i==j?1.0:0.0)*0.85 + R(i,j)*0.15; + { fsla::Mat3 U,V; Vec3 s; fsla::svd3(R, U, s, V); R = mul(U, T(V)); if (fsla::det3(R)<0) for(int i=0;i<3;i++) R(i,2)=-R(i,2); } + Vec3 t = { 0.3, -0.2, 0.4 }; + std::vector px(2*N), z(N); + project(Pw.data(), N, R, t, K, px.data(), z.data()); + // keep only points in front + std::vector Pk, pk; + for (int i = 0; i < N; i++) if (z[i] > 0) { for(int c=0;c<3;c++) Pk.push_back(Pw[3*i+c]); pk.push_back(px[2*i]); pk.push_back(px[2*i+1]); } + const int M = (int) pk.size()/2; + const int n_out = (int)(0.25 * M); + std::vector oidx(M); for (int i = 0; i < M; i++) oidx[i]=i; + std::shuffle(oidx.begin(), oidx.end(), RNG); + std::uniform_real_distribution corrupt(-150, 150); + for (int k = 0; k < n_out; k++) { int i = oidx[k]; pk[2*i]+=corrupt(RNG); pk[2*i+1]+=corrupt(RNG); } + std::vector inl; + Mat4 c2w = solve_pnp_numpy(Pk.data(), pk.data(), M, K, inl, 2.0, 300); + Mat4 w2c = fsla::inv_rigid4(c2w); + Mat3 Rr{}; Vec3 tr{}; + for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) Rr(i,j)=w2c(i,j); tr[i]=w2c(i,3); } + char buf[64]; std::snprintf(buf, sizeof buf, "%.3f deg", (double) rot_angle(mul(Rr, T(R)))); + check("rotation under outliers", rot_angle(mul(Rr, T(R))) < 0.2, buf); + check("translation under outliers", std::sqrt((tr[0]-t[0])*(tr[0]-t[0])+(tr[1]-t[1])*(tr[1]-t[1])+(tr[2]-t[2])*(tr[2]-t[2])) < 0.02); + int rejected = 0; for (int k = 0; k < n_out; k++) if (!inl[oidx[k]]) rejected++; + check("outliers rejected", rejected > 0.85 * n_out); +} + +// =========================================================================== +// Robust PnP: EPnP + Gauss-Newton (the shipped real-data solver) +// =========================================================================== + +static void test_pnp_robust_recovery() { + std::printf("test_pnp_robust_recovery\n"); + const int N = 500; Mat3 K = make_K(400, 512, 512); + std::uniform_real_distribution ux(-2,2), uy(-2,2), uz(4,8); + std::vector Pw(3*N); + for (int i = 0; i < N; i++) { Pw[3*i]=ux(RNG); Pw[3*i+1]=uy(RNG); Pw[3*i+2]=uz(RNG); } + std::normal_distribution g(0, 1); + std::uniform_real_distribution ut(-0.6, 0.6); + double worst_R = 0, worst_t = 0; + for (int trial = 0; trial < 8; trial++) { + Mat3 R = rand_rotation(); + for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) R(i,j) = (i==j?1.0:0.0)*0.8 + R(i,j)*0.2; + { fsla::Mat3 U,V; Vec3 s; fsla::svd3(R,U,s,V); R = mul(U,T(V)); if (fsla::det3(R)<0) for(int i=0;i<3;i++) R(i,2)=-R(i,2); } + Vec3 t = { ut(RNG), ut(RNG), ut(RNG) }; + std::vector px(2*N), z(N); + project(Pw.data(), N, R, t, K, px.data(), z.data()); + bool behind = false; for (int i = 0; i < N; i++) if (z[i] <= 0) behind = true; + if (behind) continue; + std::vector inl; + Mat4 c2w = solve_pnp(Pw.data(), px.data(), N, K, inl); + Mat4 w2c = fsla::inv_rigid4(c2w); + Mat3 Rr{}; Vec3 tr{}; + for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) Rr(i,j)=w2c(i,j); tr[i]=w2c(i,3); } + worst_R = std::max(worst_R, (double) rot_angle(mul(Rr, T(R)))); + worst_t = std::max(worst_t, std::sqrt((tr[0]-t[0])*(tr[0]-t[0])+(tr[1]-t[1])*(tr[1]-t[1])+(tr[2]-t[2])*(tr[2]-t[2]))); + } + char buf[64]; std::snprintf(buf, sizeof buf, "worst %.2e deg", worst_R); + check("rotation recovered", worst_R < 1e-3, buf); + check("translation recovered", worst_t < 1e-5); +} + +static void test_pnp_robust_planar() { + // Near-planar scene (a thin slab) — the config where the DLT's coplanar + // minimal samples flip. EPnP's all-point control basis stays well-posed. + std::printf("test_pnp_robust_planar\n"); + const int N = 500; Mat3 K = make_K(400, 512, 512); + std::uniform_real_distribution ux(-2,2), uy(-2,2); + std::normal_distribution thin(0, 0.02); // ~planar, slight thickness + std::vector Pw(3*N); + for (int i = 0; i < N; i++) { Pw[3*i]=ux(RNG); Pw[3*i+1]=uy(RNG); Pw[3*i+2]=6.0 + thin(RNG); } + Mat3 R = rand_rotation(); + for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) R(i,j) = (i==j?1.0:0.0)*0.85 + R(i,j)*0.15; + { fsla::Mat3 U,V; Vec3 s; fsla::svd3(R,U,s,V); R = mul(U,T(V)); if (fsla::det3(R)<0) for(int i=0;i<3;i++) R(i,2)=-R(i,2); } + Vec3 t = { 0.2, -0.1, 0.3 }; + std::vector px(2*N), z(N); + project(Pw.data(), N, R, t, K, px.data(), z.data()); + std::vector inl; + Mat4 w2c = fsla::inv_rigid4(solve_pnp(Pw.data(), px.data(), N, K, inl)); + Mat3 Rr{}; Vec3 tr{}; + for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) Rr(i,j)=w2c(i,j); tr[i]=w2c(i,3); } + char buf[64]; std::snprintf(buf, sizeof buf, "%.4f deg", (double) rot_angle(mul(Rr, T(R)))); + check("planar rotation recovered", rot_angle(mul(Rr, T(R))) < 0.05, buf); + check("planar translation recovered", std::sqrt((tr[0]-t[0])*(tr[0]-t[0])+(tr[1]-t[1])*(tr[1]-t[1])+(tr[2]-t[2])*(tr[2]-t[2])) < 0.01); +} + +static void test_pnp_robust_outliers() { + std::printf("test_pnp_robust_outliers\n"); + const int N = 600; Mat3 K = make_K(400, 512, 512); + std::uniform_real_distribution ux(-2,2), uy(-2,2), uz(4,8); + std::vector Pw(3*N); + for (int i = 0; i < N; i++) { Pw[3*i]=ux(RNG); Pw[3*i+1]=uy(RNG); Pw[3*i+2]=uz(RNG); } + Mat3 R = rand_rotation(); + for (int i = 0; i < 3; i++) for (int j = 0; j < 3; j++) R(i,j) = (i==j?1.0:0.0)*0.85 + R(i,j)*0.15; + { fsla::Mat3 U,V; Vec3 s; fsla::svd3(R,U,s,V); R = mul(U,T(V)); if (fsla::det3(R)<0) for(int i=0;i<3;i++) R(i,2)=-R(i,2); } + Vec3 t = { 0.3, -0.2, 0.4 }; + std::vector px(2*N), z(N); + project(Pw.data(), N, R, t, K, px.data(), z.data()); + std::vector Pk, pk; + for (int i = 0; i < N; i++) if (z[i] > 0) { for(int c=0;c<3;c++) Pk.push_back(Pw[3*i+c]); pk.push_back(px[2*i]); pk.push_back(px[2*i+1]); } + const int M = (int) pk.size()/2; + const int n_out = (int)(0.15 * M); // 15% gross pixel corruption + std::vector oidx(M); for (int i = 0; i < M; i++) oidx[i]=i; + std::shuffle(oidx.begin(), oidx.end(), RNG); + std::uniform_real_distribution corrupt(-120, 120); + for (int k = 0; k < n_out; k++) { int i = oidx[k]; pk[2*i]+=corrupt(RNG); pk[2*i+1]+=corrupt(RNG); } + std::vector inl; + Mat4 w2c = fsla::inv_rigid4(solve_pnp(Pk.data(), pk.data(), M, K, inl, 3.0)); + Mat3 Rr{}; Vec3 tr{}; + for (int i = 0; i < 3; i++) { for (int j = 0; j < 3; j++) Rr(i,j)=w2c(i,j); tr[i]=w2c(i,3); } + char buf[64]; std::snprintf(buf, sizeof buf, "%.3f deg", (double) rot_angle(mul(Rr, T(R)))); + check("rotation under outliers", rot_angle(mul(Rr, T(R))) < 0.5, buf); + int rejected = 0; for (int k = 0; k < n_out; k++) if (!inl[oidx[k]]) rejected++; + check("outliers rejected", rejected > 0.85 * n_out); +} + +// =========================================================================== +// Accumulation / fusion / loop closure (the orchestration on top of the +// primitives, mirrors accumulate.py / fuse.py / loop_closure.py) +// =========================================================================== + +// Synthetic multi-frame clip with KNOWN geometry: a pinhole camera on a smooth +// trajectory; each pair's gaussians are exact back-projections (engine channel +// layout), and consecutive runs are rescaled by a distinct per-run factor (the +// 1/baseline normalization that makes cross-run alignment a *similarity*). The +// accumulator must recover the camera trajectory (up to a global Sim(3)) and the +// per-link scale ratios. +static Mat4 cam2world_frame(int i) { + const double ang = i * 4.0 * M_PI / 180.0; // small yaw per frame + const double c = std::cos(ang), s = std::sin(ang); + Mat4 M = fsla::mat4_identity(); + M(0,0)=c; M(0,2)=s; M(2,0)=-s; M(2,2)=c; + M(0,3) = i * 0.4; // translate along x + return M; +} +static double depth_at(int i, int r, int c) { + return 5.0 + 0.3*std::sin(0.7*i) + 0.05*r - 0.03*c + + 0.4 * (((r*7 + c*13 + i*5) % 11) / 11.0); // non-coplanar spread +} + +static void test_accumulate_chain() { + std::printf("test_accumulate_chain\n"); + const int H = 16, W = 16, gc = 23, P = H*W, F = 5; // 5 frames -> 4 pairs + const double f = 18.0; + Mat3 K = make_K(f, W, H), Kinv = fsla::inv3(K); + auto backproj = [&](int i, int r, int c) -> Vec3 { + const double d = depth_at(i, r, c); + Vec3 uv = { (double) c, (double) r, 1.0 }; + Vec3 n = fsla::mat3_apply(Kinv, uv); + return { n[0]*d, n[1]*d, n[2]*d }; + }; + + Accumulator acc(H, W, /*opacity_threshold=*/0.05); + std::vector sigma(F); + for (int k = 0; k < F-1; k++) { // pair (k, k+1) -> run k + const double sk = 1.0 + 0.1*k; // distinct per-run scale + sigma[k] = sk; + Mat4 w2c_k = fsla::inv_rigid4(cam2world_frame(k)); + Mat4 rel = fsla::mat4_mul(w2c_k, cam2world_frame(k+1)); // cam_k <- cam_{k+1} + std::vector buf((size_t) 2 * P * gc, 0.0f); + for (int r = 0; r < H; r++) for (int c = 0; c < W; c++) { + const int i = r*W + c; + Vec3 X0 = backproj(k, r, c); // camera-k frame + Vec3 Xn = backproj(k+1, r, c); // camera-(k+1) frame + Vec3 X1 = apply_mat4(rel, Xn); // -> camera-k frame + float * v0 = &buf[(size_t) i * gc]; + float * v1 = &buf[(size_t) (P + i) * gc]; + for (int t = 0; t < 3; t++) { v0[t] = (float)(sk*X0[t]); v1[t] = (float)(sk*X1[t]); } + v0[15] = v1[15] = 0.9f; // opacity (activated) + for (int t = 0; t < 3; t++) { v0[3+t] = 0.1f*t; v1[3+t] = 0.1f*t; } // color + } + acc.add_pair(buf.data(), gc); + } + + check("pair count", acc.pair_count() == F-1); + check("frame count", acc.frame_count() == F); + check("cloud non-empty", acc.cloud().size() == (size_t) F * P); // all opacities valid + + // per-link scale recovers sigma_{k-1}/sigma_k + bool scales_ok = true; double worst_scale = 0; + for (int k = 1; k < F-1; k++) { + const double expect = sigma[k-1]/sigma[k]; + worst_scale = std::max(worst_scale, std::fabs(acc.links()[k].scale - expect)); + scales_ok &= std::fabs(acc.links()[k].scale - expect) < 1e-6; + } + { char buf[64]; std::snprintf(buf, sizeof buf, "worst |Δs| %.2e", worst_scale); + check("per-link scale recovered", scales_ok, buf); } + double worst_resid = 0; + for (int k = 1; k < F-1; k++) worst_resid = std::max(worst_resid, acc.links()[k].resid_frac); + { char buf[64]; std::snprintf(buf, sizeof buf, "%.2e", worst_resid); + check("per-link residual ~0", worst_resid < 1e-6, buf); } + + // recovered trajectory vs GT, after a global Sim(3) alignment (ATE) + std::vector path = acc.camera_path(); + std::vector rec(3*F), gt(3*F); + for (int i = 0; i < F; i++) { + rec[3*i]=path[i](0,3); rec[3*i+1]=path[i](1,3); rec[3*i+2]=path[i](2,3); + Mat4 g = cam2world_frame(i); + gt[3*i]=g(0,3); gt[3*i+1]=g(1,3); gt[3*i+2]=g(2,3); + } + Sim3 A = fit_similarity(rec.data(), gt.data(), F); + std::vector rec_al(3*F); + apply_sim_arr(A.s, A.R, A.t, rec.data(), F, rec_al.data()); + double mg[3]={0,0,0}; for (int i=0;i cloud; + // 5 surface points, each corroborated by frames {0,1,2} (tight cluster -> one + // voxel, support 3); 20 isolated single-frame floaters far apart (support 1). + // AccumPoint: {xyz, rgb, opacity, scale[3], quat(w,x,y,z), frame}; opacity/ + // scale/quat are unused by fusion's counts here, set to a unit opaque gaussian. + for (int loc = 0; loc < 5; loc++) + for (int fr = 0; fr < 3; fr++) + cloud.push_back({ (float)(2.0*loc) + 0.001f*fr, 0.0f, 0.0f, 0.5f, 0.5f, 0.5f, + 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, fr }); + for (int j = 0; j < 20; j++) + cloud.push_back({ 40.0f + 2.0f*j, 0.0f, 0.0f, 0.2f, 0.2f, 0.2f, + 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, j % 4 }); + + std::vector fused; + FuseStats st = consensus_fuse(cloud, /*voxel_frac=*/0.02, /*k=*/2, fused, FUSE_AVERAGED); + check("raw points", st.raw_points == 35); + check("kept voxels (the surface)", st.kept_voxels == 5); + check("fused size (averaged: one per voxel)", fused.size() == 5); + check("points kept", st.points_kept == 15); + check("floaters dropped", st.points_dropped == 20); + // FUSE_KEPT: dense — every raw point in a consensus voxel, floaters dropped + std::vector kept; + consensus_fuse(cloud, /*voxel_frac=*/0.02, /*k=*/2, kept, FUSE_KEPT); + check("kept mode: dense (all 15 consensus raw points)", kept.size() == 15); + // FUSE_BEST: per voxel only the most-confident frame. Each surface voxel has 3 + // frames with equal opacity (0.5) so frame 0 (first seen) wins the tie -> 5 points. + std::vector best; + consensus_fuse(cloud, /*voxel_frac=*/0.02, /*k=*/2, best, FUSE_BEST); + check("best mode: one frame per voxel (5 points)", best.size() == 5); + // a fused point sits at its cluster centroid (~2*loc), color preserved + bool centroid_ok = true; + for (const auto & p : fused) { + const double loc = std::round(p.x / 2.0); + centroid_ok &= std::fabs(p.x - 2.0*loc) < 0.01 && std::fabs(p.r - 0.5) < 1e-6; + } + check("fused centroid + color", centroid_ok); +} + +// Gaussian-level de-ghost: a base scene seen by 3 frames through small per-object +// misalignments ghosts; consensus_refine (non-rigid, per-point) must pull the +// overlapping copies together. +static void test_consensus_refine() { + std::printf("test_consensus_refine\n"); + // well-separated base points (a grid, spacing 1) so each object's 3 copies share + // a voxel but distinct objects don't — the real (2D surface) regime. + const int side = 4, M = side*side*side, Fr = 3; + std::normal_distribution g(0, 1); + std::vector base(3*M); + { int idx=0; for (int a=0;a cloud(Fr*M); + for (int f = 0; f < Fr; f++) for (int i = 0; i < M; i++) { + Vec3 b = { base[3*i], base[3*i+1], base[3*i+2] }; + Vec3 w = sim_apply(P[f], b); + AccumPoint p{}; p.x=(float)w[0]; p.y=(float)w[1]; p.z=(float)w[2]; + p.opacity=1.0f; p.sx=p.sy=p.sz=0.01f; p.qw=1.0f; p.frame=f; + cloud[(size_t)f*M + i] = p; + } + double mean[3]={0,0,0}; for (auto&p:cloud){ mean[0]+=p.x;mean[1]+=p.y;mean[2]+=p.z; } + for (int c=0;c<3;c++) mean[c]/=cloud.size(); + double ss=0; for (auto&p:cloud){ double dx=p.x-mean[0],dy=p.y-mean[1],dz=p.z-mean[2]; ss+=dx*dx+dy*dy+dz*dz; } + const double ext = std::sqrt(ss/cloud.size()); + auto spread = [&](const std::vector& c) { + double s=0; + for (int i=0;i %.3f%% of extent", 100*pre, 100*post); + check("ghosting present before", pre > 0.01); + check("consensus_refine de-ghosts", post < 0.4 * pre, buf); +} + +static void test_loop_distribute() { + std::printf("test_loop_distribute\n"); + Mat3 R = rand_rotation(); + Mat4 D = sim_matrix({ 1.15, R, {0.4, -0.2, 0.1} }); + // sim4_invert is a true inverse of a similarity 4x4 + Mat4 S = sim_matrix({ 1.3, rand_rotation(), {1.0, -2.0, 0.5} }); + Mat4 SI = fsla::mat4_mul(sim4_invert(S), S); + bool inv_ok = true; for (int i = 0; i < 16; i++) inv_ok &= std::fabs(SI.a[i] - fsla::mat4_identity().a[i]) < 1e-9; + check("sim4_invert is an inverse", inv_ok); + + // a clean loop, drifted by D^{-k/n}; distribute_drift(D) must cancel it. + const int n = 8; + std::vector poses(n+1); + std::vector clean(n+1); + for (int k = 0; k <= n; k++) { + const double tt = 2*M_PI * k / n; + clean[k] = { std::cos(tt), std::sin(tt), 0.1*tt }; + Vec3 drifted = apply_mat4(sim_frac_power(D, -(double)k/n), clean[k]); + Mat4 Pk = fsla::mat4_identity(); + Pk(0,3)=drifted[0]; Pk(1,3)=drifted[1]; Pk(2,3)=drifted[2]; + poses[k] = Pk; + } + std::vector corr = distribute_drift(D, poses); + double post = 0; + for (int k = 0; k <= n; k++) for (int c = 0; c < 3; c++) post += (corr[k][c]-clean[k][c])*(corr[k][c]-clean[k][c]); + char buf[64]; std::snprintf(buf, sizeof buf, "ATE %.1e", std::sqrt(post/(n+1))); + check("distribute_drift recovers clean loop", std::sqrt(post/(n+1)) < 1e-9, buf); +} + +// =========================================================================== +// Regression guards for the gaussian -> .splat seam (the class of bug that +// dropped rotation, then opacity, from the accumulated cloud writer) +// =========================================================================== + +// #1 guard: pin the shared encoder's byte output. Both writers (single-run and +// accumulated cloud) call this; if it ever drops a channel or changes the +// convention the bytes change here. The opacity->alpha byte is the exact field +// the regression got wrong (it had been forced to 255). +static void test_splat_record() { + std::printf("test_splat_record\n"); + const float pos[3] = { 1.0f, 2.0f, 3.0f }; + const float scale[3] = { 0.01f, 0.02f, 0.03f }; + const float quat[4] = { 0.70710678f, 0.70710678f, 0.0f, 0.0f }; // (w,x,y,z), unit + const float rgb[3] = { 0.2f, 0.4f, 0.6f }; + unsigned char rec[32]; + free_splatter::encode_splat_record(rec, pos, scale, quat, rgb, 0.5f); + + float p[3], sc[3]; std::memcpy(p, rec, 12); std::memcpy(sc, rec+12, 12); + check("pos: OpenCV->GL flips y,z", p[0]==1.0f && p[1]==-2.0f && p[2]==-3.0f); + check("scale: passthrough", sc[0]==0.01f && sc[1]==0.02f && sc[2]==0.03f); + check("rgb bytes", rec[24]==(unsigned char)(0.2f*255.0f) && + rec[25]==(unsigned char)(0.4f*255.0f) && + rec[26]==(unsigned char)(0.6f*255.0f)); + // THE regression field: opacity must become the alpha, not a forced 255. + char buf[48]; std::snprintf(buf, sizeof buf, "alpha=%d (must be 127, not 255)", rec[27]); + check("opacity -> alpha (not forced opaque)", rec[27] == 127, buf); + // rotation must be encoded with the (w,x,y,z)->(-x,w,-z,y) remap, not dropped. + // q=(0.7071,0.7071,0,0) -> (-0.7071,0.7071,0,0) -> bytes (37,218,128,128). + check("rotation: remap encoded (not identity)", + rec[28]==37 && rec[29]==218 && rec[30]==128 && rec[31]==128); + // boundary opacities + unsigned char r1[32], r0[32]; + free_splatter::encode_splat_record(r1, pos, scale, quat, rgb, 1.0f); + free_splatter::encode_splat_record(r0, pos, scale, quat, rgb, 0.0f); + check("alpha boundaries (1->255, 0->0)", r1[27]==255 && r0[27]==0); +} + +// #2 guard: run-0 accumulation must preserve EVERY gaussian channel. With one pair +// the global transform is identity, so each AccumPoint should equal the engine's +// gaussian for that pixel (xyz, SH->rgb, opacity, scale, rotation, frame). A +// dropped channel — the exact bug — fails here, asset-free. +static void test_accumulate_channels() { + std::printf("test_accumulate_channels\n"); + const int H = 4, W = 4, gc = 23, P = H*W; + std::vector g((size_t) 2 * P * gc, 0.0f); + for (int v = 0; v < 2; v++) for (int i = 0; i < P; i++) { + float * a = &g[(size_t)(v*P + i) * gc]; + const float b = (float)(v*100 + i); + a[0]=b+0.11f; a[1]=b+0.22f; a[2]=b+0.33f; // xyz (distinct) + a[3]=0.5f; a[4]=-0.5f; a[5]=1.0f; // SH-DC -> in-range rgb + a[15]=0.2f + 0.5f*((i%3)/2.0f); // opacity in {0.2,0.45,0.7} > thr + a[16]=0.01f+0.001f*i; a[17]=0.02f; a[18]=0.03f; // scale (distinct) + a[19]=0.8f; a[20]=0.6f; a[21]=0.0f; a[22]=0.0f; // unit quat (w,x,y,z) + } + Accumulator acc(H, W, /*opacity_threshold=*/0.05); + acc.add_pair(g.data(), gc); + const auto & cloud = acc.cloud(); + check("run-0 cloud = all pixels (both views kept)", cloud.size() == (size_t) 2 * P); + + const double C0 = 0.28209479177387814; + bool ok = cloud.size() == (size_t) 2 * P; + auto cl = [](float x, float y) { return std::fabs(x - y) < 1e-5f; }; + for (int v = 0; v < 2 && ok; v++) for (int i = 0; i < P && ok; i++) { + const float * a = &g[(size_t)(v*P + i) * gc]; + const AccumPoint & p = cloud[(size_t) v*P + i]; + ok &= cl(p.x,a[0]) && cl(p.y,a[1]) && cl(p.z,a[2]); // xyz (identity sim) + float er=(float)(a[3]*C0+0.5), eg=(float)(a[4]*C0+0.5), eb=(float)(a[5]*C0+0.5); + ok &= cl(p.r,er) && cl(p.g,eg) && cl(p.b,eb); // SH -> rgb + ok &= cl(p.opacity,a[15]); // opacity (the dropped one) + ok &= cl(p.sx,a[16]) && cl(p.sy,a[17]) && cl(p.sz,a[18]); // scale (T.s=1) + ok &= cl(p.qw,a[19]) && cl(p.qx,a[20]) && cl(p.qy,a[21]) && cl(p.qz,a[22]); // rotation + ok &= (p.frame == v); // source frame + } + check("every gaussian channel survives run-0 accumulation", ok); +} + +// Golden: parallax_stats on synthetic cameras with hand-computed angles. A +// lateral (strafe) baseline yields a large triangulation/lateral angle; a pure +// forward (dolly) baseline of the SAME length yields ~0 — the whole point of the +// metric (it measures depth-resolving motion, not raw camera displacement). +static void test_parallax_geometry() { + std::printf("test_parallax_geometry\n"); + const double DEG = 180.0 / 3.14159265358979323846; + const Vec3 axis0 = { 0, 0, 1 }; // view-0 looks down +z + // points at depth 1 (a few, all same depth -> clean medians) + const int N = 5; + std::vector pts(3 * N); + for (int i = 0; i < N; i++) { pts[3*i]=0.0f; pts[3*i+1]=0.0f; pts[3*i+2]=1.0f; } + + // (1) pure strafe: baseline 0.1 along x, perpendicular to the optical axis + Parallax s = parallax_stats({0,0,0}, {0.1,0,0}, axis0, pts.data(), nullptr, N, 0.0); + const double tri_expect = std::atan(0.1) * DEG; // 5.7106 deg + char b1[96]; std::snprintf(b1, sizeof b1, "strafe lateral=%.3f tri=%.3f (expect 90, %.3f)", s.lateral_angle_deg, s.tri_angle_deg, tri_expect); + check("strafe -> lateral angle ~90 deg", std::fabs(s.lateral_angle_deg - 90.0) < 1e-3, b1); + check("strafe -> tri angle = atan(B/Z)", std::fabs(s.tri_angle_deg - tri_expect) < 1e-3, b1); + check("strafe -> B/Z = 0.1", std::fabs(s.baseline_over_depth - 0.1) < 1e-4); + check("strafe -> median depth = 1", std::fabs(s.median_depth - 1.0) < 1e-4); + + // (2) pure dolly: SAME baseline length 0.1 but along the optical axis + Parallax d = parallax_stats({0,0,0}, {0,0,0.1}, axis0, pts.data(), nullptr, N, 0.0); + char b2[96]; std::snprintf(b2, sizeof b2, "dolly lateral=%.3f tri=%.3f (expect ~0)", d.lateral_angle_deg, d.tri_angle_deg); + check("dolly -> lateral angle ~0 deg", std::fabs(d.lateral_angle_deg) < 1e-3, b2); + check("dolly -> tri angle ~0 deg (no parallax)", std::fabs(d.tri_angle_deg) < 1e-3, b2); + check("dolly baseline equals strafe baseline", std::fabs(d.baseline - s.baseline) < 1e-6); +} + +int main() { + test_similarity_roundtrip(); + test_parallax_geometry(); + test_scale_detection(); + test_nonlinear_detection(); + test_outlier_robustness(); + test_loop_correction(); + test_loop_closure(); + test_focal_recovery(); + test_pnp_recovery(); + test_pnp_outliers(); + test_pnp_robust_recovery(); + test_pnp_robust_planar(); + test_pnp_robust_outliers(); + test_accumulate_chain(); + test_consensus_fuse(); + test_consensus_refine(); + test_loop_distribute(); + test_splat_record(); + test_accumulate_channels(); + std::printf(failures ? "\ntest_pose: %d FAILURES\n" : "\ntest_pose: ok\n", failures); + return failures ? 1 : 0; +} diff --git a/tools/free_splatter-cli.cpp b/tools/free_splatter-cli.cpp index 495f15a..36aee40 100644 --- a/tools/free_splatter-cli.cpp +++ b/tools/free_splatter-cli.cpp @@ -8,6 +8,7 @@ // [--opacity-threshold T] [--max-splats N] [--dump-taps DIR] // MODEL.gguf (IMAGES... | INPUT.f32) #include "free_splatter.h" +#include "splat.h" // the single shared .splat record encoder #include "stb_image.h" // implementation in tools/stb_impl.cpp #include "stb_image_resize2.h" @@ -70,42 +71,93 @@ static bool write_splat(const float * g, size_t n, int gc, float opac_thr, std::ofstream f(path, std::ios::binary); if (!f) { std::fprintf(stderr, "cannot write %s\n", path); return false; } - auto u8 = [](float v) -> unsigned char { float t = v < 0 ? 0 : v > 255 ? 255 : v; return (unsigned char) t; }; for (size_t k = 0; k < m; k++) { const float * x = &g[keep[k].second * gc]; - // FreeSplatter's reference frame is OpenCV (y down, z forward); convert - // to the viewer's OpenGL convention (y up) via a 180deg rotation about X - // = diag(1,-1,-1): position.yz negate, quaternion (w,x,y,z)->(-x,w,-z,y). - float pos[3] = { x[0], -x[1], -x[2] }; - float scale[3] = { x[16], x[17], x[18] }; - unsigned char rgba[4], rot[4]; - for (int c = 0; c < 3; c++) { float v = 0.5f + (float) C0 * x[3+c]; rgba[c] = u8((v<0?0:v>1?1:v) * 255.0f); } - rgba[3] = u8(std::min(std::max(x[15], 0.0f), 1.0f) * 255.0f); - float q[4] = { -x[20], x[19], -x[22], x[21] }; - float nrm = std::sqrt(q[0]*q[0]+q[1]*q[1]+q[2]*q[2]+q[3]*q[3]) + 1e-12f; - for (int c = 0; c < 4; c++) rot[c] = u8(q[c]/nrm * 128.0f + 128.0f); - f.write((const char *) pos, 12); - f.write((const char *) scale, 12); - f.write((const char *) rgba, 4); - f.write((const char *) rot, 4); + const float pos[3] = { x[0], x[1], x[2] }; // OpenCV; the encoder flips y,z + const float scale[3] = { x[16], x[17], x[18] }; + const float quat[4] = { x[19], x[20], x[21], x[22] }; // (w,x,y,z) + const float rgb[3] = { 0.5f + (float) C0 * x[3], 0.5f + (float) C0 * x[4], 0.5f + (float) C0 * x[5] }; + unsigned char rec[32]; + free_splatter::encode_splat_record(rec, pos, scale, quat, rgb, x[15]); + f.write((const char *) rec, 32); } std::printf("wrote %s: %zu splats (pruned/cap of %zu kept)\n", path, m, keep.size()); return true; } +// Write an accumulated gaussian cloud as an antimatter15 .splat, emitting each +// point's true anisotropic scale + rotation (carried through the Sim(3)) AND its +// activated opacity as the splat alpha — so it renders exactly like the single-run +// write_splat (proper alpha blending), not a fully-opaque, swirling, blurry soup. +// scale_mult scales the radii (1.0 = as-predicted). When capping, keep the most +// important splats (opacity * volume), matching write_splat (not a uniform stride). +static bool write_cloud_splat(const free_splatter_point * pts, size_t n, size_t max_splats, + float scale_mult, const char * path) { + auto importance = [&](size_t i) -> double { + const free_splatter_point & p = pts[i]; + const double vol = (double) std::max(p.sx,1e-9f) * std::max(p.sy,1e-9f) * std::max(p.sz,1e-9f); + return (double) std::max(p.opacity, 0.0f) * vol; + }; + std::vector idx(n); + for (size_t i = 0; i < n; i++) idx[i] = i; + if (max_splats > 0 && n > max_splats) { + std::partial_sort(idx.begin(), idx.begin() + max_splats, idx.end(), + [&](size_t a, size_t b){ return importance(a) > importance(b); }); + idx.resize(max_splats); + } + + std::ofstream f(path, std::ios::binary); + if (!f) { std::fprintf(stderr, "cannot write %s\n", path); return false; } + for (size_t i : idx) { + const free_splatter_point & p = pts[i]; + const float pos[3] = { p.x, p.y, p.z }; // OpenCV; the encoder flips y,z + const float scale[3] = { scale_mult*p.sx, scale_mult*p.sy, scale_mult*p.sz }; + const float quat[4] = { p.qw, p.qx, p.qy, p.qz }; // (w,x,y,z) + const float rgb[3] = { p.r, p.g, p.b }; + unsigned char rec[32]; + free_splatter::encode_splat_record(rec, pos, scale, quat, rgb, p.opacity); + f.write((const char *) rec, 32); + } + std::printf("wrote %s: %zu splats (of %zu cloud points)\n", path, idx.size(), n); + return true; +} + static int usage(const char * a0) { std::fprintf(stderr, "usage: %s [--device DEV] [--splat OUT.splat] [--out OUT.f32]\n" " [--opacity-threshold T] [--max-splats N] [--dump-taps DIR]\n" - " MODEL.gguf (IMAGES... | INPUT.f32)\n", a0); + " MODEL.gguf (IMAGES... | INPUT.f32)\n" + " --device defaults to GPU/vulkan and FAILS if none is present; pass\n" + " --device cpu to run on CPU explicitly.\n" + "\n" + "accumulate mode (>=2 images -> one world from the photo stream):\n" + " %s --accumulate [--splat-prefix P] [--fuse] [--voxel V] [--fuse-k K]\n" + " [--min-parallax DEG] [--splat-scale S] [--max-splats N] MODEL.gguf IMG0 IMG1 ...\n" + " runs the engine on each consecutive pair, chains the runs (Sim(3)) into\n" + " one accumulating cloud; writes P_.splat after each pair (the\n" + " evolving reconstruction) and, with --fuse, a consensus-fused P_fused.splat.\n" + " --min-parallax gates by keyframe: a candidate frame is folded in only if its\n" + " triangulation angle vs the last kept frame is >= DEG (else its depth is\n" + " ill-conditioned); image mode only.\n" + "\n" + "parallax (depth conditioning of one pair, after-inference):\n" + " %s --parallax MODEL.gguf (IMG0 IMG1 | PAIR.f32)\n", + a0, a0, a0); return 2; } int main(int argc, char ** argv) { const char * device = nullptr, * taps_dir = nullptr, * out_path = nullptr, * splat_path = nullptr; - const char * model = nullptr; + const char * model = nullptr, * splat_prefix = nullptr; float opac_thr = 5e-3f; long max_splats = 0; + bool accumulate = false, fuse = false, parallax = false, tree = false, tree_stages = false; + int tov_block = 4, tov_overlap = 2; // tree: submap width / shared frames (2/1 = overlap-by-one) + int fuse_mode = 1; // 0 averaged, 1 kept, 2 best-frame + float voxel = 0.02f, splat_scale = 1.0f; // multiplier on the predicted gaussian radii + int fuse_k = 2; + bool refine = false; int refine_iters = 8; float refine_voxel = 0.03f, refine_alpha = 0.5f; // geometric de-ghost + float min_parallax = 0.0f; // deg; >0 gates accumulate by keyframe parallax std::vector inputs; for (int i = 1; i < argc; i++) { @@ -116,6 +168,24 @@ int main(int argc, char ** argv) { else if (a == "--splat" && i+1 < argc) splat_path = argv[++i]; else if (a == "--opacity-threshold" && i+1 gbuf; + if (inputs.size() == 1 && ends_with(inputs[0], ".f32")) { + std::ifstream f(inputs[0], std::ios::binary | std::ios::ate); + if (!f) { std::fprintf(stderr, "cannot open %s\n", inputs[0].c_str()); free_splatter_free(ctx); return 1; } + const std::streamsize bytes = f.tellg(); f.seekg(0); + if ((size_t)(bytes / sizeof(float)) != pair_floats) { + std::fprintf(stderr, "dump %s: %zu floats, expected 2*%d*%d*%d\n", inputs[0].c_str(), + (size_t)(bytes/sizeof(float)), geo.image_height, geo.image_width, gc); + free_splatter_free(ctx); return 1; } + gbuf.resize(pair_floats); f.read((char *) gbuf.data(), bytes); + } else if (inputs.size() == 2) { + std::vector img; + for (const std::string & p : inputs) + if (!load_image_chw(p.c_str(), geo.image_width, img)) { free_splatter_free(ctx); return 1; } + float * out = nullptr; size_t n_out = 0; + if (free_splatter_run(ctx, img.data(), 2, geo.image_height, geo.image_width, &out, &n_out) != 0) { + std::fprintf(stderr, "run failed: %s\n", free_splatter_last_error(ctx)); free_splatter_free(ctx); return 1; } + gbuf.assign(out, out + n_out); free_splatter_buf_free(out); + } else { + std::fprintf(stderr, "--parallax needs exactly 2 images or one .f32 pair dump\n"); + free_splatter_free(ctx); return 2; + } + free_splatter_parallax px; + if (free_splatter_pair_parallax(gbuf.data(), 2, geo.image_height, geo.image_width, gc, opac_thr, &px) != 0) { + std::fprintf(stderr, "parallax failed\n"); free_splatter_free(ctx); return 1; } + std::printf("parallax: tri_angle=%.3f deg lateral_angle=%.2f deg B/Z=%.4f baseline=%.4f median_depth=%.4f focal=%.1f npts=%d\n", + px.tri_angle_deg, px.lateral_angle_deg, px.baseline_over_depth, + px.baseline, px.median_depth, px.focal, px.n_points); + free_splatter_free(ctx); + return 0; + } + + // ---- accumulate mode: chain a photo stream into one world ----------------- + // Two input forms: a stream of IMAGES (the engine runs on each consecutive + // pair), or pre-computed `.f32` pair dumps (each a [2,H,W,gc] engine output — + // the engine is skipped, for fast re-bakes / fusion sweeps off cached runs). + if (accumulate) { + const int sz = geo.image_width, gc = geo.gaussian_channels; + const bool dump_mode = std::all_of(inputs.begin(), inputs.end(), + [](const std::string & s){ return ends_with(s, ".f32"); }); + if (!dump_mode && inputs.size() < 2) { + std::fprintf(stderr, "--accumulate needs >=2 images (or N .f32 pair dumps)\n"); free_splatter_free(ctx); return 2; } + const size_t npairs = dump_mode ? inputs.size() : inputs.size() - 1; + const int64_t per_view = (int64_t) geo.in_channels * sz * sz; + const size_t pair_floats = (size_t) 2 * sz * sz * gc; + + // image mode: decode every frame once (CHW per view) + std::vector> frames; + if (!dump_mode) + for (const std::string & p : inputs) { + std::vector img; + if (!load_image_chw(p.c_str(), sz, img)) { free_splatter_free(ctx); return 1; } + frames.push_back(std::move(img)); + } + + free_splatter_accumulator * acc = free_splatter_accumulator_new(sz, sz, opac_thr); + if (!acc) { std::fprintf(stderr, "accumulator alloc failed\n"); free_splatter_free(ctx); return 1; } + + // --accumulate-tree: collect the (gated) pair engine outputs and merge them + // up a balanced binary tree instead of chaining linearly (less compounded drift). + std::vector> tree_pairs; + + // emit the accumulated cloud after a pair is folded in (optional de-ghost) + auto emit_step = [&](int nframes) { + if (!splat_prefix) return; + free_splatter_point * cloud = nullptr; size_t nc = 0; + free_splatter_accumulator_cloud(acc, &cloud, &nc); + if (refine && nframes >= 2) // geometric de-ghost a copy (internal untouched) + free_splatter_refine_cloud(cloud, nc, refine_voxel, refine_iters, refine_alpha); + char path[1024]; + std::snprintf(path, sizeof path, "%s_%d.splat", splat_prefix, nframes); + write_cloud_splat(cloud, nc, (size_t) max_splats, splat_scale, path); + free_splatter_buf_free(cloud); + }; + + if (dump_mode) { + // fixed pre-computed pairs: cannot re-anchor, so the parallax gate + // (which works by re-pairing keyframes) does not apply here. + if (min_parallax > 0.0f) + std::fprintf(stderr, "note: --min-parallax re-pairs keyframes; ignored for " + ".f32 dumps (fixed pairs) -- accumulating all.\n"); + for (size_t k = 0; k < npairs; k++) { + std::ifstream f(inputs[k], std::ios::binary | std::ios::ate); + if (!f) { std::fprintf(stderr, "cannot open %s\n", inputs[k].c_str()); free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; } + const std::streamsize bytes = f.tellg(); f.seekg(0); + if ((size_t)(bytes / sizeof(float)) != pair_floats) { + std::fprintf(stderr, "dump %s: %zu floats, expected 2*%d*%d*%d\n", inputs[k].c_str(), + (size_t)(bytes/sizeof(float)), sz, sz, gc); + free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; } + std::vector dumpbuf(pair_floats); f.read((char *) dumpbuf.data(), bytes); + if (tree || tree_stages) { tree_pairs.push_back(std::move(dumpbuf)); std::printf("pair %zu collected\n", k); continue; } + free_splatter_accumulator_add_pair(acc, dumpbuf.data(), gc); + const int nframes = free_splatter_accumulator_frame_count(acc); + std::printf("pair %zu -> %d frames\n", k, nframes); + emit_step(nframes); + } + } else { + // image mode: keyframe gating. Pair the next candidate against the last + // KEPT frame; fold it in only if its parallax clears --min-parallax + // (else its depth is ill-conditioned). Threshold 0 -> every consecutive + // pair, identical to the ungated path. + size_t last_kept = 0; int skipped = 0; + for (size_t j = 1; j < frames.size(); j++) { + std::vector pair(2 * per_view); + std::memcpy(&pair[0], frames[last_kept].data(), per_view * sizeof(float)); + std::memcpy(&pair[per_view], frames[j].data(), per_view * sizeof(float)); + float * runout = nullptr; size_t ng = 0; + if (free_splatter_run(ctx, pair.data(), 2, sz, sz, &runout, &ng) != 0) { + std::fprintf(stderr, "run pair (%zu,%zu) failed: %s\n", last_kept, j, free_splatter_last_error(ctx)); + free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; } + if (min_parallax > 0.0f) { + free_splatter_parallax px; + free_splatter_pair_parallax(runout, 2, sz, sz, gc, opac_thr, &px); + if (px.tri_angle_deg < min_parallax) { + std::printf("skip frame %zu: parallax %.1f deg < %.1f (vs kept frame %zu)\n", + j, px.tri_angle_deg, min_parallax, last_kept); + free_splatter_buf_free(runout); skipped++; continue; + } + std::printf("keep frame %zu: parallax %.1f deg (vs kept frame %zu)\n", j, px.tri_angle_deg, last_kept); + } + if (tree || tree_stages) { + tree_pairs.emplace_back(runout, runout + ng); + free_splatter_buf_free(runout); + std::printf("pair (%zu,%zu) collected\n", last_kept, j); + last_kept = j; continue; + } + free_splatter_accumulator_add_pair(acc, runout, gc); + free_splatter_buf_free(runout); + const int nframes = free_splatter_accumulator_frame_count(acc); + std::printf("pair (%zu,%zu) -> %d frames\n", last_kept, j, nframes); + emit_step(nframes); + last_kept = j; + } + if (skipped) std::printf("gated: skipped %d low-parallax frame(s)\n", skipped); + } + if (tree || tree_stages) { + std::vector ps; ps.reserve(tree_pairs.size()); + for (auto & v : tree_pairs) ps.push_back(v.data()); + if (ps.empty()) { std::fprintf(stderr, "tree: need >=1 pair\n"); + free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; } + const std::string pfx = splat_prefix ? splat_prefix : "stage"; + + if (tree_stages) { + // write one laid-out .splat per merge level (stage 0 = the independent + // leaf scenes side by side, ... then the single merged world, then a + // consensus-fused clean scene) + a manifest the viewer steps through. + std::string dir = ".", base = pfx; + const size_t slash = pfx.find_last_of('/'); + if (slash != std::string::npos) { dir = pfx.substr(0, slash); base = pfx.substr(slash + 1); } + // per-scene budget: each laid-out scene keeps its own detail (a single + // shared cap would starve every scene to ~1/N and fill it with floaters). + const int per_node = (max_splats > 0 && max_splats < 200000) ? (int) max_splats : 150000; + std::vector labels; + auto stage_path = [&](char * p, size_t cap) { std::snprintf(p, cap, "%s_%zu.splat", pfx.c_str(), labels.size()); }; + // multi-scene stages of the multi-frame-overlap tree: each laid-out + // scene gets its own per_node budget. + for (int L = 0; L <= 64; L++) { + free_splatter_point * sc = nullptr; size_t nsc = 0; int nnodes = 0; + if (free_splatter_tree_overlap(ps.data(), (int) ps.size(), gc, sz, sz, + opac_thr, tov_block, tov_overlap, L, -1.0f /*auto layout*/, per_node, &sc, &nsc, &nnodes) != 0) { + std::fprintf(stderr, "tree-stages failed at level %d\n", L); + free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; } + if (nnodes <= 1) { free_splatter_buf_free(sc); break; } // single scene: full budget below + char path[1024]; stage_path(path, sizeof path); + write_cloud_splat(sc, nsc, 0 /*already per-node capped*/, splat_scale, path); + free_splatter_buf_free(sc); + std::printf("stage %zu: %d scenes\n", labels.size(), nnodes); + labels.push_back(L == 0 ? (std::to_string(nnodes) + " independent scenes") + : ("merge " + std::to_string(L) + " — " + std::to_string(nnodes) + " scenes")); + } + // the single merged world, at full budget: first the raw union (the + // overlapping per-frame copies, ghosted), then the consensus-fused + // (best-frame) clean scene — same contrast as the linear demo's acc/fused. + free_splatter_point * root = nullptr; size_t nroot = 0; + if (free_splatter_tree_overlap(ps.data(), (int) ps.size(), gc, sz, sz, opac_thr, + tov_block, tov_overlap, -1, 0.0f, 0, &root, &nroot, nullptr) == 0) { + char path[1024]; stage_path(path, sizeof path); + write_cloud_splat(root, nroot, (size_t) max_splats, splat_scale, path); + std::printf("stage %zu: merged raw union (%zu pts)\n", labels.size(), nroot); + labels.push_back("merged — one scene (raw union)"); + free_splatter_point * fz = nullptr; size_t nfz = 0; + if (free_splatter_fuse_cloud(root, nroot, voxel, fuse_k, 2 /*best-frame*/, &fz, &nfz) == 0 && nfz > 0) { + char path2[1024]; stage_path(path2, sizeof path2); + write_cloud_splat(fz, nfz, (size_t) max_splats, splat_scale, path2); + std::printf("stage %zu: consensus-fused (%zu -> %zu pts)\n", labels.size(), nroot, nfz); + labels.push_back("consensus-fused — one clean scene"); + free_splatter_buf_free(fz); + } + free_splatter_buf_free(root); + } + std::ofstream mf(dir + "/manifest.json"); + mf << "{ \"reframe\": true, \"steps\": [\n"; + for (size_t L = 0; L < labels.size(); L++) + mf << " {\"splat\":\"" << base << "_" << L << ".splat\",\"images\":[],\"n\":" << (L + 1) + << ",\"label\":\"" << labels[L] << "\"}" << (L + 1 < labels.size() ? "," : "") << "\n"; + mf << " ] }\n"; + std::printf("tree-stages: %zu stages -> %s/manifest.json\n", labels.size(), dir.c_str()); + } else { + // one merged world from the overlap tree (block=tov_block, overlap=tov_overlap; + // 2/1 = the plain overlap-by-one tree). With --fuse, also the de-ghosted scene. + std::printf("tree-accumulate: %zu pairs, block=%d overlap=%d\n", ps.size(), tov_block, tov_overlap); + free_splatter_point * tc = nullptr; size_t ntc = 0; + if (free_splatter_tree_overlap(ps.data(), (int) ps.size(), gc, sz, sz, opac_thr, + tov_block, tov_overlap, -1, 0.0f, 0, &tc, &ntc, nullptr) != 0) { + std::fprintf(stderr, "tree accumulate failed\n"); + free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 1; } + char path[1024]; + std::snprintf(path, sizeof path, "%s_tree.splat", pfx.c_str()); + write_cloud_splat(tc, ntc, (size_t) max_splats, splat_scale, path); + if (fuse) { // best-frame consensus, de-ghosted + free_splatter_point * fz = nullptr; size_t nfz = 0; + if (free_splatter_fuse_cloud(tc, ntc, voxel, fuse_k, 2, &fz, &nfz) == 0 && nfz > 0) { + std::snprintf(path, sizeof path, "%s_tree_fused.splat", pfx.c_str()); + write_cloud_splat(fz, nfz, (size_t) max_splats, splat_scale, path); + free_splatter_buf_free(fz); + } + } + free_splatter_buf_free(tc); + } + free_splatter_accumulator_free(acc); free_splatter_free(ctx); return 0; + } + if (fuse && splat_prefix) { + if (refine) free_splatter_accumulator_refine(acc, refine_voxel, refine_iters, refine_alpha); + free_splatter_point * fc = nullptr; size_t nf = 0; + free_splatter_accumulator_fuse(acc, voxel, fuse_k, fuse_mode, &fc, &nf); + char path[1024]; + std::snprintf(path, sizeof path, "%s_fused.splat", splat_prefix); + write_cloud_splat(fc, nf, (size_t) max_splats, splat_scale, path); + free_splatter_buf_free(fc); + } + free_splatter_accumulator_free(acc); + free_splatter_free(ctx); + return 0; + } + // assemble input: one raw .f32, or several decoded images std::vector buf; int32_t n_views = 0; diff --git a/web/README.md b/web/README.md index 7ca0ec3..8884613 100644 --- a/web/README.md +++ b/web/README.md @@ -24,6 +24,66 @@ python3 -m http.server -d web 8000 # then open http://localhost:8000 Load a different file with `?splat=other.splat`. +## Accumulating-reconstruction demo (`accumulate.html`) + +A second viewer that plays the **growing world** the pose pipeline builds: the +cloud reconstructed from 2 photos, then 3, then 4, … as each new view is folded +in (`free_splatter-cli --accumulate` → cross-run Sim(3) alignment). The input +photos appear one-per-step in the top-right filmstrip, newest highlighted; the +WebGL renderer is the same EWA splatting as `index.html`. + +```sh +# bake the demo (engine pass over a photo stream -> a self-contained demo dir) +nix develop -c scripts/make_accumulate_demo.sh .cache/demo/accumulate \ + frames/f0000.png frames/f0020.png frames/f0040.png frames/f0060.png ... + +# serve it (the bake copies accumulate.html in as index.html) +python3 -m http.server -d .cache/demo/accumulate 8080 # open http://localhost:8080 +``` + +URL params: `?ms=2600` step interval, `?start=N` begin at step N, `?auto=0` no +auto-advance, `?spin=0` no auto-orbit. The bake also writes a consensus-fused +`acc_fused.splat` (single-view floaters removed — only voxels seen by ≥ K frames +survive), shown as a final step. `--fuse-mode kept` keeps every raw gaussian in a +consensus voxel (dense); `best` keeps only the most-confident frame per voxel +(dense AND de-ghosted); `averaged` collapses each voxel to one denoised point +(cleaner but sparse — only worthwhile with many overlapping frames). + +**De-ghosting.** Pairwise chaining overlays each frame's reconstruction in world +space, so where two frames disagree about a surface you see doubled ("ghosted") +copies. The fused step removes them by **best-frame selection** (`--fuse-mode +best`): within each voxel it keeps only the single most-confident frame's +gaussians, so disagreeing copies aren't stacked — one frame represents each +region. This de-ghosts by *selection* (no averaging → no blur), at the cost of +visible seams where adjacent voxels pick different frames. The two artifact-free +alternatives are more expensive: fixing the registration (only helps when the +disagreement is a *rigid* pose drift — usually it isn't, it's per-pair stereo +depth disagreement baked into the network output, which no rigid alignment can +undo) or photometric 3DGS re-optimization. A gaussian-level averaging pass +(`--refine`, **off** by default) exists but is **not recommended**: it averages +points that already share a voxel — blurring the in-register surface — while +leaving ghosts that are more than a voxel apart untouched, so it trades sharpness +for no real de-ghosting. + +**Pick frames with lateral baseline.** Two-view depth needs the camera to +*translate sideways* — a pure forward dolly (the camera moving along the direction +it faces) gives near-zero parallax and reconstructs blurry. Frames a few apart +from an orbiting / strafing clip work far better than tightly-spaced frames of a +walk-forward clip. Rule of thumb: the per-pair lateral baseline should be a few % +of scene depth or more. + +**…or let the gate pick them (`MIN_PARALLAX`).** The bake passes +`--min-parallax DEG` (default 8°, `MIN_PARALLAX=0` to disable): a candidate frame +is folded in only if its triangulation angle vs the last *kept* frame clears the +threshold — otherwise its depth is ill-conditioned and the model invents it. +Skipped frames re-anchor against the last kept one (keyframe selection), so you can +feed a long, dense frame stream and let the gate curate the well-conditioned +subset. The threshold is the engine's *after-inference* angle +(`free_splatter-cli --parallax …`), which over-reports, so keep it well above +COLMAP's 1–2°. `scripts/parallax_ref.py` (cv2, dev-shell only) is the independent +geometric cross-check — agreement on a pair validates it; a large gap (geometric +angle tiny, model angle confident) flags the model hallucinating depth. + ## Sanity-checking the data separately The `.splat` uses the antimatter15 layout, so you can verify the *data* (apart