Skip to content

Conversation

@msimberg
Copy link
Contributor

@msimberg msimberg commented Oct 21, 2025

Allows configuring the heap object through a heap_config object.

Additionally uses a heap_config object initialized through environment variables as the default value for the heap_config given to the heap constructor.

This allows a simple way to change the limits and sizes globally, but allows locally overriding if one needs more control when constructing a heap.

The defaults have been increased.

@msimberg
Copy link
Contributor Author

Open question: I was considering moving the never_free and num_reserve_segments to be configuration options similarly to the sizes and limits. However, that's an API break. How much do you care about an API break? Or, do you think it's even a good/bad idea to move those parameters into the config? In the current state (as of 8b27a3e) I've only added another default parameter to the heap constructor, which is backwards compatible. To properly expose that oomph and probably ghex anyway need updates to allow passing the heap config... Thoughts?

@biddisco
Copy link
Collaborator

out of curiosity - what is the motivation for this? Did you find that some default allocation sizes were inappropriate for project X? (icon I assume)

@msimberg
Copy link
Contributor Author

out of curiosity - what is the motivation for this? Did you find that some default allocation sizes were inappropriate for project X? (icon I assume)

Yes, that's exactly it. The default "large limit" (i.e. the threshold after which allocations go into the "huge" category where allocations are one-off) is 64 KiB. I'm going off memory here, but some halo exchanges are of the order ~100s of KiB, meaning they go into the huge category. This also means they get reallocated every halo exchange (except for one; one is kept around by default), and in turn at least one new IPC handle is created every halo exchange, and creating IPC handles is very expensive.

This PR makes it 1. customizable and 2. increases the defaults (not yet, I need to test what are good enough defaults). Most small-ish (~1 Mib at least?) allocations should almost always go into the regular buckets of allocations and the backing allocations should almost always stay around.

@msimberg msimberg marked this pull request as ready for review October 22, 2025 13:18
@msimberg
Copy link
Contributor Author

The major todos are cleaned up now, but the question of defaults (I'm testing at the moment) and whether to move the num reserved segments and never free parameters to the config (API break) are still open.

@msimberg
Copy link
Contributor Author

I've done some tests on a few icon experiments and to make all halo exchanges always come from the large segments the large segment/large limit would have to be at least:

  • 16 MiB/2 Mib (mch-ch2, four ranks)
  • 4 MiB/512 KiB (mch-ch1_medium_stencils, four ranks)

I see three (somewhat overlapping) options:

  • Leave the defaults as they are. Explicit is better than implicit, and every application knows best what sizes work for their use case so they should set the sizes explicitly every time.
  • Increase the default segment sizes to something like 16 or 32 MiB, and the large limit to something close to that. Memory is cheap-ish. Segments are allocated only for size classes that are used, so no memory is allocated up front.
  • Allow specifying the num reserve segments and never free options as environment variables. Part of the issues we see would also go away if hwmalloc doesn't free segments as aggressively. When doing halo exchanges more likely than not you're going to be repeating that halo exchange many times, so freeing backing buffers probably doesn't make sense too often.

I'm leaning toward:

  • Increase the segment sizes modestly so that they're at least the typical 64 KiB page size on grace cpus. More likely 2 MiB. They should still be set to higher values by applications when needed.
  • Make num reserve segments an option. Default to 16 or 32?
  • Make never free an option. Default to off. If above are sized appropriately this should never be needed, but having this be an option allows for quick testing if ghex is repeatedly allocating and freeing segments.

Any objections?

@tehrengruber any comments from the PMAP perspective?

@msimberg
Copy link
Contributor Author

I'm leaning toward:

* Increase the segment sizes modestly so that they're at least the typical 64 KiB page size on grace cpus. More likely 2 MiB. They should still be set to higher values by applications when needed.

* Make num reserve segments an option. Default to 16 or 32?

* Make never free an option. Default to off. If above are sized appropriately this should never be needed, but having this be an option allows for quick testing if ghex is repeatedly allocating and freeing segments.

I'll make these changes today. That should remove the need to almost ever change the defaults.

Comment on lines +60 to +62
// Note: sizes below are defaults and can be changed through heap_config and
// environment variables.
//
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the final TODO unless there are other comments: the example below is based on the old defaults and should be updated before merging.

@msimberg
Copy link
Contributor Author

msimberg commented Oct 24, 2025

I've updated the defaults and on mch-ch2 the defaults are enough to avoid reallocations.

If I'm not completely off with my calculations, using the new defaults, if all the size classes are fully used and allocated up to num_reserve_segments, and then the user allocations released again, approximately 20 300 MiB of memory will be allocated by hwmalloc, which I find quite reasonable (edit: I of course left out the num_reserve_segments factor of 16; not a trivial amount anymore, but not massive either). If that's too much for a particular use case the options can still be overridden.

Copy link
Collaborator

@boeschf boeschf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - thank you! A few minor things about redundant implementations that could be factored out.

@msimberg msimberg requested a review from boeschf November 3, 2025 12:22
Copy link
Collaborator

@boeschf boeschf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks

@msimberg
Copy link
Contributor Author

msimberg commented Nov 3, 2025

Apparently I have some tests to fix... They seem to pass locally for me so not quite sure what's going wrong there. I'll have a look. Thanks for all the comments so far @boeschf!

@msimberg
Copy link
Contributor Author

msimberg commented Nov 3, 2025

I think I've fixed the test as well now. I messed up the return type of get_env while refactoring it.

@boeschf boeschf merged commit c3ddc35 into ghex-org:master Nov 3, 2025
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants