
El::Trsm() consumes too much memory when running on many CPUs. #275

@vasdommes


El::Trsm() is used, e.g., to compute L^{-1} B in the SDPB solver and L_X^{-1} F_p in the SDPA solver.
If a block is assigned to many CPUs, memory consumption can become huge, sometimes more than ~10x that of the single-CPU case. This is probably the cause of some unexpected OOM crashes, since our memory estimates do not account for this factor.

For wide matrices, the memory overhead is roughly proportional to the MPI grid height.
All CPUs are arranged in a 2D grid, and the grid height is defined as min x | x >= floor(sqrt(num_cpus)) && num_cpus % x == 0, i.e. the smallest divisor of num_cpus that is not less than floor(sqrt(num_cpus)).
This means that memory consumption (and also performance) is especially bad when num_cpus is a large prime number (e.g. 13 or 17), because the only such divisor is num_cpus itself, which leads to 1D vertical grids (e.g. 13x1 or 17x1).
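
To make the grid-height rule above concrete, here is a minimal sketch that evaluates it for a given number of CPUs. The helpers `grid_height` and `grid_shape` are hypothetical names introduced for illustration only; they follow the formula quoted in this issue and are not claimed to be the Elemental API.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not part of Elemental): grid height as defined in this
// issue, i.e. the smallest x >= floor(sqrt(num_cpus)) that divides num_cpus.
inline int grid_height(const int num_cpus)
{
  assert(num_cpus > 0);

  // Integer floor(sqrt(num_cpus)), computed without floating point.
  long long x = 1;
  while((x + 1) * (x + 1) <= num_cpus)
    ++x;

  // Walk upwards until a divisor is found; terminates at num_cpus at the latest.
  while(num_cpus % x != 0)
    ++x;
  return static_cast<int>(x);
}

// Hypothetical helper: (height, width) of the resulting 2D grid.
inline std::pair<int, int> grid_shape(const int num_cpus)
{
  const int height = grid_height(num_cpus);
  return {height, num_cpus / height};
}
```

Under this rule, 16 CPUs give a 4x4 grid and 12 CPUs give 3x4, whereas 13 and 17 CPUs degenerate to 13x1 and 17x1. If the overhead really does scale with grid height, choosing a CPU count per block with a divisor close to its square root should keep the height small.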

The memory model for Trsm and Trmm (which suffers from similar issues) can be found here:

inline size_t get_trsm_bytes(const int height, const int width,

Memory consumption and speedup plots for L^{-1} X, where L ~ 125 x 125, X ~ 125 x 213500, precision = 448 bit:
