Skip to content

vectorization follow-ups#447

Merged
adrn merged 8 commits intomainfrom
vectorize-more
Oct 6, 2025
Merged

vectorization follow-ups#447
adrn merged 8 commits intomainfrom
vectorize-more

Conversation

@lgarrison
Copy link
Collaborator

@lgarrison lgarrison commented Aug 24, 2025

Describe your changes

This PR is a follow-up to #446 with the following changes:

  1. applying vectorization to ruth4 and dop853
  2. batching dop853 integration
  3. specializing multipole and EXP gradient vectorization

The ruth4 and dop853 vectorization essentially means transposing the orbit arrays during integration. The ruth4 change is basically identical to leapfrog; dop853 is a bit different. Since it's used from a few places in Gala, some of which can't easily use the vectorization (e.g. N-body), I essentially just added a fast/transposed path (Fwrapper_T) while not touching the other paths.

Additionally, while benchmarking dop853, I noticed that it had an odd performance peak at intermediate numbers of orbits (both main and vectorized):

image

In looking at the dop853 source, it seems to be using a large number of intermediate arrays, so I think the performance decreases when these arrays start spilling cache. But as far as I can tell, the orbits are completely independent, so we can just do the integration in small, cache-sized batches:

image

Maybe this helps with #439.

And just to put all the improvements together on one plot:

image

I also folded in "specialized" gradient vectorization for the multipole and EXP potentials. This just means writing the vectorized version of the gradient directly, rather than writing the single-particle version and letting the template wrapper generate the vectorized version. The benefit is that you can manually lift repeated computations out of the particle loop. For EXP, for example, this lets us interpolate the coefficients once per timestep, instead of once per particle. I only did a cursory performance test, but it was something like 25% faster (and may matter more after EXP-code/EXP#157). It didn't seem to matter much for the CylSplinePotential, but I'm also not sure I was using it correctly.

I'll un-mark this as a draft once the first PR is merged, so we can target main.

Checklist

  • Did you add tests? (No, unless we want to add performance tests)
  • Did you add documentation for your changes? (Added in previous PR)
  • Did you reference any relevant issues?
  • Did you add a changelog entry? (see CHANGES.rst) (Added in previous PR)
  • Are the CI tests passing?
  • Is the milestone set?

@lgarrison lgarrison marked this pull request as draft August 24, 2025 01:16
@lgarrison lgarrison mentioned this pull request Aug 25, 2025
6 tasks
Base automatically changed from vectorize to main September 12, 2025 15:11
@lgarrison lgarrison marked this pull request as ready for review September 12, 2025 15:14
@lgarrison
Copy link
Collaborator Author

Oh neat, GitHub automatically retargeted main.

@lgarrison lgarrison requested a review from adrn September 12, 2025 15:14
@lgarrison lgarrison force-pushed the vectorize-more branch 2 times, most recently from a294345 to 4e64d07 Compare September 12, 2025 19:25
@lgarrison
Copy link
Collaborator Author

This is now actually ready for review @adrn!

@adrn
Copy link
Owner

adrn commented Sep 23, 2025

Overall, I like this - thanks!

Two things:

  1. We chatted about this last week, but I think we can simplify the transpose stuff, given that the shape at the Python layer is expected to be (3, ...)?
  2. I didn't realize / forgot you also implemented a fix for the CI failure, so I merged my own and now you have a conflict. I think you could just drop that commit?

Allows us to avoid some repeated computations.
Makes them amenable to vectorization.
dop853 seems to slow down at large numbers of orbits, probably
because of cache exhaustion. However, orbits are independent,
so we can simply batch the integration for performance.
attempting to fix compile error
@lgarrison lgarrison force-pushed the vectorize-more branch 2 times, most recently from 22c4368 to 5bb1f06 Compare October 2, 2025 18:12
@lgarrison
Copy link
Collaborator Author

This should be all set to merge; as we discussed, I'll follow up in a future PR with simplifications of the transposes (tracked in #468).

@adrn
Copy link
Owner

adrn commented Oct 6, 2025

Finally had time to give it another look -- thanks @lgarrison!

@adrn adrn merged commit 5b5e3db into main Oct 6, 2025
26 checks passed
@adrn adrn deleted the vectorize-more branch October 6, 2025 15:11
@adrn adrn mentioned this pull request Oct 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants