I would like to start a discussion on the FORM/TFORM default buffer sizes and how they relate to each other. Changing the defaults would break some of the tests, which rely on specific buffer sizes to trigger (now fixed) buggy behaviour, which is a bit of a pain.
In particular, running FORM and TFORM with large MaxTermSize values causes enormous memory allocations, which can reach multiple TB. This makes it hard to debug some crashing scripts: valgrind fails to run because the mallocs fail, even though the same configuration runs fine outside of valgrind (because FORM touches hardly any of the memory). The large allocations are primarily due to:
large+smallextension must be at least filepatches * (sortiosize + 2*maxtermsize). The manual claims that filepatches will be reduced to satisfy this, but in fact largesize is increased. As a result, the worker large+smallextension can end up larger than 1/workers times the master buffer sizes.
in TFORM, the "deferbuffer" allocation scales with threadbucketsize * maxtermsize. There is already code and commentary to reduce this somewhat, but a maxtermsize of 5120K with the default threadbucketsize of 500 still causes an allocation of 2 * 4.7GB per thread. Depending on the number of threads, this can easily exceed the worker large+smallextension!
in TFORM, for sufficiently large maxtermsize, the master large+smallextension starts to scale with the number of threads, since RecalcSetups enforces that it is at least (threads-1)*(1+NUMBEROFBLOCKSINSORT*MINIMUMNUMBEROFTERMS)*maxtermsize. I believe this buffer is used by the sortbots when performing the final sort to the output, but I am not sure.
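To make the three scaling relations above easier to experiment with, here is a minimal Python sketch that only evaluates the formulas as quoted. It is not FORM code: the function names are made up for illustration, all sizes are assumed to be expressed in the same unit, and the example values (including those passed for NUMBEROFBLOCKSINSORT and MINIMUMNUMBEROFTERMS) are placeholders rather than the actual FORM defaults. It also ignores the existing code that already reduces the "deferbuffer" allocation.

```python
def sort_buffer_floor(filepatches, sortiosize, maxtermsize):
    """Lower bound on large+smallextension imposed by filepatches."""
    return filepatches * (sortiosize + 2 * maxtermsize)


def defer_buffer_scale(threadbucketsize, maxtermsize):
    """Quantity the TFORM "deferbuffer" allocation scales with (per thread)."""
    return threadbucketsize * maxtermsize


def master_sort_floor(threads, blocks_in_sort, min_terms, maxtermsize):
    """Lower bound on the master large+smallextension quoted for RecalcSetups."""
    return (threads - 1) * (1 + blocks_in_sort * min_terms) * maxtermsize


if __name__ == "__main__":
    # Placeholder inputs, chosen only to show how the bounds grow.
    maxtermsize = 5_000_000
    print(sort_buffer_floor(filepatches=256, sortiosize=100_000,
                            maxtermsize=maxtermsize))
    print(defer_buffer_scale(threadbucketsize=500, maxtermsize=maxtermsize))
    print(master_sort_floor(threads=16, blocks_in_sort=10, min_terms=10,
                            maxtermsize=maxtermsize))
```

All three quantities are linear in maxtermsize, and the last one is additionally linear in the number of threads, which is why raising MaxTermSize alone can push every buffer up at once.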
So my questions / things to experiment with are:
Can we reduce filepatches instead of increasing largesize, as described in the manual? This could lead to more stage4 sorts, but the allocations would be much smaller.
Should we more aggressively reduce the "deferbuffer" allocation for larger threadbucketsize and maxtermsize combinations? It seems unlikely that this buffer needs to be larger than the worker large+smallextension, for example.
What is the purpose of RecalcSetups? Why does it overlap (and conflict) with size relations in AllocSetups and AllocSort?
Is a 1:1 ratio between master and thread sorting buffers optimal? What if the master buffers had less space, and the worker buffers more? Such large master buffers often seem wasteful.
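As a rough illustration of the first question, the sketch below contrasts the two ways of satisfying large+smallextension >= filepatches * (sortiosize + 2*maxtermsize): growing largesize (what the code reportedly does) versus reducing filepatches (what the manual describes). This is a hypothetical model, not the logic in AllocSort or RecalcSetups; the function names and the `min_patches` floor are assumptions introduced for illustration only.

```python
def grow_large(large, smallext, filepatches, sortiosize, maxtermsize):
    """Keep filepatches fixed and enlarge `large` until the bound holds."""
    needed = filepatches * (sortiosize + 2 * maxtermsize)
    shortfall = max(0, needed - (large + smallext))
    return large + shortfall, filepatches


def shrink_filepatches(large, smallext, filepatches, sortiosize, maxtermsize,
                       min_patches=2):
    """Keep `large` fixed where possible and lower filepatches instead.

    `min_patches` is a hypothetical floor; if even that many patches do not
    fit, `large` still has to grow to cover the remainder.
    """
    per_patch = sortiosize + 2 * maxtermsize
    fit = (large + smallext) // per_patch
    new_patches = max(min_patches, min(filepatches, fit))
    needed = new_patches * per_patch
    shortfall = max(0, needed - (large + smallext))
    return large + shortfall, new_patches


if __name__ == "__main__":
    # Toy numbers, all in the same (arbitrary) unit.
    args = dict(large=1000, smallext=500, filepatches=10,
                sortiosize=100, maxtermsize=100)
    print(grow_large(**args))
    print(shrink_filepatches(**args))
```

With the second strategy the allocation stays at the configured size at the cost of fewer patches per merge, which is where the extra stage4 sorts would come from.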
I will try to implement some changes, and run some benchmarks, and update this in the future.