[Proof of Concept] Precompute static parts of the deformation gradient by efaulhaber · Pull Request #1225 · trixi-framework/TrixiParticles.jl

efaulhaber · 2026-05-28T09:54:00Z

This is a proof of concept demonstrating what we could potentially gain from precomputing the static part of the deformation gradient. On the GPU, this precomputed part can be stored so that its loads are coalesced, which is the memory layout used in this PR. On the CPU, it is faster to switch the memory layout so that the particle index comes last.
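
To illustrate the idea, here is a minimal sketch in plain Python (not the TrixiParticles.jl code from this PR). It assumes the usual total Lagrangian SPH form of the deformation gradient, J_a = Σ_b V_b x_ba (L_a ∇W(X_ab))^T, in which the vector g_ab = V_b L_a ∇W(X_ab) depends only on the initial configuration and can therefore be computed once. The function names, the Gaussian kernel, the constant particle volume, and the brute-force neighbor search are all assumptions made for illustration.

```python
import math

DIM = 2  # the sketch is 2D only

def kernel_gradient(X_a, X_b, h):
    # Gradient with respect to X_a of the Gaussian W = exp(-|X_a - X_b|^2 / h^2).
    # Stand-in for a real SPH kernel (assumption).
    diff = [X_a[i] - X_b[i] for i in range(DIM)]
    r2 = sum(d * d for d in diff)
    factor = -2.0 / h**2 * math.exp(-r2 / h**2)
    return [factor * d for d in diff]

def invert_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def precompute_static_parts(X, volume, h, radius):
    # For every particle a, store (b, g_ab) with g_ab = V_b * L_a * grad W(X_ab).
    # This uses only the initial positions X, so it runs once before time stepping.
    static = []
    for a in range(len(X)):
        neighbors = []
        M = [[0.0] * DIM for _ in range(DIM)]  # inverse of the correction matrix L_a
        for b in range(len(X)):
            if b == a or math.dist(X[a], X[b]) > radius:
                continue
            grad_w = kernel_gradient(X[a], X[b], h)
            for i in range(DIM):
                for j in range(DIM):
                    M[i][j] += volume * grad_w[i] * (X[b][j] - X[a][j])
            neighbors.append((b, grad_w))
        L = invert_2x2(M)
        static.append([
            (b, [volume * sum(L[i][k] * grad_w[k] for k in range(DIM))
                 for i in range(DIM)])
            for b, grad_w in neighbors
        ])
    return static

def deformation_gradient(x, static, a):
    # Per-step work: one outer product x_ba * g_ab^T per neighbor,
    # with no kernel evaluation and no matrix-vector product.
    J = [[0.0] * DIM for _ in range(DIM)]
    for b, g in static[a]:
        for i in range(DIM):
            x_ba_i = x[b][i] - x[a][i]
            for j in range(DIM):
                J[i][j] += x_ba_i * g[j]
    return J

# Demo: a 5x5 grid deformed by an affine map x = A X + c.
X = [[float(i), float(j)] for i in range(5) for j in range(5)]
A = [[1.1, 0.2], [-0.1, 0.9]]
x = [[A[0][0] * p[0] + A[0][1] * p[1] + 0.5,
      A[1][0] * p[0] + A[1][1] * p[1] - 0.3] for p in X]
static = precompute_static_parts(X, volume=1.0, h=1.2, radius=3.0)
J_center = deformation_gradient(x, static, a=12)
```

With the correction matrix included, the sketch recovers the matrix `A` of an affine deformation up to rounding, which makes for a convenient sanity check. How the per-pair vectors `g_ab` are laid out in memory is a separate choice that this sketch does not model.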

As the tables below show, the GPU speedup is significant but not massive: about 1.4x in 3D and only about 1.1x to 1.2x in 2D. On the CPU, we get a massive 2x speedup in 3D, but SIMD-vectorizing this version gains little on top of that (20.309 ms to 18.901 ms), whereas the version in #1220 is even faster when vectorized (14.729 ms). In 2D, this PR is slower on the CPU.

3D

| Machine | #1220 | This PR |
| --- | --- | --- |
| H100 FP64 | 1.358 ms | 989.258 μs |
| H100 FP32 | 855.816 μs | 615.526 μs |
| Intel Xeon w9-3475X (x36) | 39.861 ms | 20.309 ms |
| Intel Xeon w9-3475X (x36) vectorized | 14.729 ms | 18.901 ms |

2D

| Machine | #1220 | This PR |
| --- | --- | --- |
| H100 FP64 | 256.930 μs | 207.394 μs |
| H100 FP32 | 155.842 μs | 136.834 μs |
| Intel Xeon w9-3475X (x36) | 4.855 ms | 6.308 ms |

Precompute static parts of the deformation gradient

300dcfd

efaulhaber self-assigned this May 28, 2026

efaulhaber added the performance label May 28, 2026

efaulhaber wants to merge 1 commit into trixi-framework:main from efaulhaber:deformation-grad-precompute
