Merged
Force-pushed 5ea4ced to 80549b3
Collaborator (Author):
Oh neat, GitHub automatically retargeted main.
Force-pushed a294345 to 4e64d07
Collaborator (Author):
This is now actually ready for review @adrn!
Owner:
Overall, I like this - thanks! Two things:
- Allows us to avoid some repeated computations.
- Makes them amenable to vectorization.
dop853 seems to slow down at large numbers of orbits, probably because of cache exhaustion. However, orbits are independent, so we can simply batch the integration for performance.
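As a rough sketch of what "avoiding repeated computations" buys when the gradient is written in vectorized form: any per-timestep setup can be hoisted out of the particle loop and done once. The function names below are hypothetical, not Gala's actual API.

```python
import numpy as np

def gradient_per_particle(q, t, interp_coeffs, grad_one):
    # Naive wrapper: per-timestep setup runs once per particle.
    out = np.empty_like(q)
    for i in range(q.shape[0]):
        coeffs = interp_coeffs(t)      # repeated for every particle!
        out[i] = grad_one(q[i], coeffs)
    return out

def gradient_vectorized(q, t, interp_coeffs, grad_many):
    # Specialized version: hoist the interpolation out of the loop.
    coeffs = interp_coeffs(t)          # once per timestep
    return grad_many(q, coeffs)        # vectorized over all particles
```

Both paths compute the same gradients; the second just pays the coefficient interpolation cost once per timestep instead of once per particle.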
Commit: "attempting to fix compile error"
Force-pushed 22c4368 to 5bb1f06
Collaborator (Author):
This should be all set to merge; as we discussed, I'll follow up in a future PR with simplifications of the transposes (tracked in #468).
adrn approved these changes on Oct 6, 2025
Owner:
Finally had time to give it another look -- thanks @lgarrison!
Describe your changes
This PR is a follow-up to #446 with the following changes:
The ruth4 and dop853 vectorization essentially means transposing the orbit arrays during integration. The ruth4 change is basically identical to leapfrog; dop853 is a bit different. Since dop853 is used from a few places in Gala, some of which can't easily use the vectorization (e.g. N-body), I just added a fast/transposed path (Fwrapper_T) while not touching the other paths.

Additionally, while benchmarking dop853, I noticed that it had an odd performance peak at intermediate numbers of orbits (on both main and the vectorized branch):
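To illustrate the transposed-layout idea (this is a minimal sketch, not Gala's actual Fwrapper_T, and the (ndim, norbits) orientation is assumed for illustration): with orbits along the contiguous axis, each integrator operation becomes a single vectorized sweep over all orbits, with no per-orbit Python loop.

```python
import numpy as np

def leapfrog_transposed(x, v, acc, dt, nsteps):
    """Kick-drift-kick leapfrog on transposed arrays of shape (ndim, norbits).

    `acc` maps an (ndim, norbits) position array to accelerations of the
    same shape. Every line in the loop operates on all orbits at once.
    """
    x = x.copy()
    v = v.copy()
    for _ in range(nsteps):
        v += 0.5 * dt * acc(x)   # half kick
        x += dt * v              # drift
        v += 0.5 * dt * acc(x)   # half kick
    return x, v
```

For a C-ordered (ndim, norbits) array, each coordinate row is contiguous in memory, which is what makes the per-dimension arithmetic amenable to SIMD vectorization.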
Looking at the dop853 source, it seems to use a large number of intermediate arrays, so I think performance drops when these arrays start spilling out of cache. But as far as I can tell, the orbits are completely independent, so we can just do the integration in small, cache-sized batches:
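The batching idea can be sketched as follows (a simplified illustration, with a hypothetical `integrate_one_batch` standing in for the actual dop853 call): because the orbits never interact, integrating them chunk by chunk gives identical results, but each batch's scratch arrays stay small enough to remain cache-resident.

```python
import numpy as np

def integrate_batched(w0, integrate_one_batch, batch_size=256):
    """Integrate independent orbits in cache-sized batches.

    w0: (norbits, ndof) initial conditions. `integrate_one_batch` maps a
    (n, ndof) batch of initial conditions to its final (n, ndof) state.
    """
    out = np.empty_like(w0)
    for start in range(0, len(w0), batch_size):
        sl = slice(start, start + batch_size)
        out[sl] = integrate_one_batch(w0[sl])
    return out
```

The batch size would be tuned so that the integrator's intermediate arrays for one batch fit in (say) L2 cache.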
Maybe this helps with #439.
And just to put all the improvements together on one plot:
I also folded in "specialized" gradient vectorization for the multipole and EXP potentials. This just means writing the vectorized version of the gradient directly, rather than writing the single-particle version and letting the template wrapper generate the vectorized version. The benefit is that you can manually lift repeated computations out of the particle loop. For EXP, for example, this lets us interpolate the coefficients once per timestep, instead of once per particle. I only did a cursory performance test, but it was something like 25% faster (and may matter more after EXP-code/EXP#157). It didn't seem to matter much for the CylSplinePotential, but I'm also not sure I was using it correctly.

I'll un-mark this as a draft once the first PR is merged, so we can target main.

Checklist
- CHANGES.rst (added in previous PR)