perf: vectorize Gaussian evaluation across 2x2 pixel quad in rasterize pass #8597

Merged
mvaligursky merged 1 commit into main from mv-rasterize-vectorize-quad
Apr 14, 2026

@mvaligursky
Contributor

Vectorize the per-splat Gaussian evaluation in the tile rasterizer's color path.

Changes:

  • Remove evalSplat function and inline the evaluation directly in the batch loop
  • Compute dx = p00 - center once, then build vec4f pixel offsets exploiting the regular +1 grid pattern of the 2x2 quad
  • Evaluate power, gauss, alpha, and transmittance update as vec4 operations instead of four independent scalar chains
  • Pack transmittance into a single vec4<half> T (previously four separate half variables), which serves both the branchless alpha-blend update via select() and the saturation early-out check
  • Fold the DEPTH_TEST mask into the vectorized valid condition
  • Leave pick mode (evalSplatPick) unchanged: its complex branching doesn't benefit from vectorization
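
The quad evaluation described in the list above can be sketched in plain Python, with 4-tuples standing in for vec4f lanes. This is an illustrative model only, not the shader code from this PR: the function names, the (a, b, c) conic form of the Gaussian, the 0.99 alpha clamp, and the lane order (p00, p10, p01, p11) are all assumptions borrowed from common Gaussian splatting rasterizers.

```python
import math

def eval_scalar(px, py, center, conic, opacity):
    """Reference path: one pixel at a time, center subtracted per pixel."""
    dx = px - center[0]
    dy = py - center[1]
    a, b, c = conic  # assumed conic (inverse 2D covariance) coefficients
    power = -0.5 * (a * dx * dx + c * dy * dy) - b * dx * dy
    return min(0.99, opacity * math.exp(power))  # 0.99 clamp is an assumption

def eval_quad(p00, center, conic, opacity):
    """Quad path: subtract the center once, then derive all four offsets."""
    dx0 = p00[0] - center[0]
    dy0 = p00[1] - center[1]
    # The 2x2 quad is a regular +1 grid, so the other three pixel offsets
    # follow from (dx0, dy0). Assumed lane order: p00, p10, p01, p11.
    dx = (dx0, dx0 + 1.0, dx0, dx0 + 1.0)
    dy = (dy0, dy0, dy0 + 1.0, dy0 + 1.0)
    a, b, c = conic
    # Each step below is one 4-wide operation instead of four scalar chains.
    power = tuple(-0.5 * (a * x * x + c * y * y) - b * x * y
                  for x, y in zip(dx, dy))
    return tuple(min(0.99, opacity * math.exp(p)) for p in power)
```

In this model the quad path performs one center subtraction per splat where the scalar path performs four, and the results agree with the per-pixel reference up to floating-point rounding.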

Performance:

  • Neutral on Apple M4 (2.95 ms), where GPU ALUs are scalar
  • Expected improvement on NVIDIA/AMD where wider SIMD lanes can exploit the vec4 operations
  • Eliminates redundant center subtractions (previously computed 4× per splat, now 1×)
  • Cleaner code: fewer variables, select() instead of manual cond * a + (1-cond) * b
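
To illustrate the select() point, here is a small Python model of a per-lane select next to the arithmetic blend, applied to a four-lane transmittance update. It is a sketch under assumptions rather than the shader code: the helper names are hypothetical, the update rule T * (1 - alpha) is the standard front-to-back compositing step, and the 1/255 saturation threshold and the "all lanes" early-out condition are assumed for the example.

```python
import math

def select(f, t, cond):
    """Per-lane pick modelled on WGSL select(f, t, cond): t where cond is true."""
    return tuple(tv if c else fv for fv, tv, c in zip(f, t, cond))

def blend_manual(a, b, cond):
    """Arithmetic blend cond * a + (1 - cond) * b, cond being 1.0 or 0.0 per lane."""
    return tuple(c * av + (1.0 - c) * bv for av, bv, c in zip(a, b, cond))

def update_transmittance(T, alpha, valid):
    """Attenuate T by (1 - alpha) in valid lanes; other lanes keep their value."""
    attenuated = tuple(t * (1.0 - al) for t, al in zip(T, alpha))
    return select(T, attenuated, valid)

def saturated(T, eps=1.0 / 255.0):
    """Assumed early-out: every pixel in the quad is effectively opaque."""
    return all(t < eps for t in T)
```

For finite inputs the two forms give the same result. They differ when the unselected operand is non-finite: the arithmetic blend computes 0 * inf, which is NaN, whereas select simply ignores the lane it does not pick.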

…e pass

Inline and vectorize the per-splat Gaussian evaluation in the color
rasterize path. Instead of four separate evalSplat calls each computing
dx, power, exp, alpha independently, compute dx once from p00, build
vec4f pixel offsets exploiting the regular +1 grid, and evaluate
power/gauss/alpha/transmittance as vec4 operations. Pack transmittance
into a single vec4<half> for the branchless update and saturation check.
mvaligursky self-assigned this on Apr 14, 2026
mvaligursky merged commit c96f201 into main on Apr 14, 2026
8 checks passed
mvaligursky deleted the mv-rasterize-vectorize-quad branch on April 14, 2026 09:54