`El::Trsm()` is used, e.g., for computing `L^{-1} B` in the SDPB solver and `L_X^{-1} F_p` in the SDPA solver.
If a block is assigned to many CPUs, `Trsm` may consume a huge amount of memory, sometimes more than ~10x compared to the single-CPU case. This is probably the cause of some unexpected OOM crashes, since our memory estimates do not account for this factor.
For wide matrices, the memory overhead is roughly proportional to the MPI grid height.
All CPUs are arranged in a 2D grid, and the grid height is defined as `min x | x >= floor(sqrt(num_cpus)) && num_cpus % x == 0`.
This means that memory consumption (and also performance) is especially bad when `num_cpus` is a large prime number (e.g. 13 or 17), which leads to 1D vertical grids (e.g. 13x1 or 17x1).
The memory model for `Trsm` and `Trmm` (which suffers from similar issues) can be found in `sdpb/src/sdpb_util/memory_estimates.hxx`, line 54 at commit 464306b:

```cpp
inline size_t get_trsm_bytes(const int height, const int width,
```

Memory consumption and speedup plots for `L^{-1} X`, where `L ~ 125 x 125`, `X ~ 125 x 213500`, precision = 448 bit: