To Reproduce
MGPUSim version or commit ID:
v4.1.4
4277061
Command that recreates the problem
./bitonicsort -timing -use-unified-memory --report-rdma-transaction-count -unified-gpus 1,2,3,4 -trace-mem
where bitonicsort can be replaced with other test cases under amd/samples.
Current behavior
The outgoing and incoming RDMA transaction counts are 0 on all GPUs.
Expected behavior
Because the command simulates a multi-GPU system with unified memory enabled, we expect the RDMA transaction counts to be non-zero.
Additional context
- RDMA-based remote access typically operates at cache-line granularity, while page migration uses page-sized granularity. The current simulation does not explicitly report page migration counts or provide options to switch between these modes. Is this functionality implemented but undocumented, or is it absent?
- Additionally, it seems that CPU-GPU unified memory page faults may not be fully modeled in MGPUSim. Did I overlook some configuration, or is this intentionally not implemented?
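
To illustrate why the two modes in the first point should look very different in the reported counters, here is a minimal sketch. It assumes 64-byte cache lines and 4 KiB pages; both values, and the helper name `transfers_needed`, are illustrative assumptions and are not taken from MGPUSim's configuration or code.

```python
# Illustrative only: the sizes below are assumptions, not MGPUSim settings.
CACHE_LINE_BYTES = 64   # assumed granularity of an RDMA remote access
PAGE_BYTES = 4096       # assumed granularity of a page migration


def transfers_needed(buffer_bytes: int, granularity_bytes: int) -> int:
    """Return how many fixed-size transfers are needed to move buffer_bytes."""
    # Integer ceiling division, so a partially filled last unit still counts.
    return -(-buffer_bytes // granularity_bytes)


buffer_bytes = 1 << 20  # 1 MiB of data resident on a remote GPU

rdma_accesses = transfers_needed(buffer_bytes, CACHE_LINE_BYTES)
page_migrations = transfers_needed(buffer_bytes, PAGE_BYTES)

print(f"cache-line transfers: {rdma_accesses}")
print(f"page transfers:       {page_migrations}")
```

Under these assumptions, the same remote buffer would produce many more RDMA transactions than page migrations, so a count of 0 for both would suggest that neither path is being exercised or reported.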