This repository was archived by the owner on Nov 17, 2025. It is now read-only.

@genshen genshen commented Oct 7, 2025

performance log
$ cat run.sh 
#!/bin/bash

./new /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx1.bin
./old /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx1.bin

./new /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx20.bin
./old /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx20.bin

./new /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx33.bin
./old /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx33.bin

./new /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx45.bin
./old /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx45.bin

./new /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx79.bin
./old /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx79.bin

./new /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx80.bin
./old /work/share/data/XDZS2025/GMRES/sparseMatrixSet/mtx80.bin

$ srun -n 1 -p wzidnormal --gres=dcu:1 ./run.sh
mtx1: M = 503625, N = 503625
Using device 0: K100_AI
N=503625, VN=503625
verify precision : pass !   resid_ver =  0.00000097031320  
N=503625, VN=503625
verify precision : pass !   resid_ver =  0.00000097031320  
iters = 108, time = 48.5007ms, resid = 9.6667e-07
mtx1: M = 503625, N = 503625
Using device 0: K100_AI
N=503625, VN=503625
verify precision : pass !   resid_ver =  0.00000097031320  
N=503625, VN=503625
verify precision : pass !   resid_ver =  0.00000097031320  
iters = 108, time = 48.3174ms, resid = 9.6667e-07
mtx20: M = 1489752, N = 1489752
Using device 0: K100_AI
N=1489752, VN=1489752
verify precision : pass !   resid_ver =  0.00000068063210  
N=1489752, VN=1489752
verify precision : pass !   resid_ver =  0.00000068063210  
iters = 12, time = 6.81551ms, resid = 6.80996e-07
mtx20: M = 1489752, N = 1489752
Using device 0: K100_AI
N=1489752, VN=1489752
verify precision : pass !   resid_ver =  0.00000068063210  
N=1489752, VN=1489752
verify precision : pass !   resid_ver =  0.00000068063210  
iters = 12, time = 6.73839ms, resid = 6.80996e-07
mtx33: M = 84617, N = 84617
Using device 0: K100_AI
N=84617, VN=84617
verify precision : pass !   resid_ver =  0.00000098338087  
N=84617, VN=84617
verify precision : pass !   resid_ver =  0.00000098338087  
iters = 1240, time = 156.491ms, resid = 9.84889e-07
mtx33: M = 84617, N = 84617
Using device 0: K100_AI
N=84617, VN=84617
verify precision : pass !   resid_ver =  0.00000098338087  
N=84617, VN=84617
verify precision : pass !   resid_ver =  0.00000098338087  
iters = 1240, time = 157.996ms, resid = 9.84889e-07
mtx45: M = 71505, N = 71505
Using device 0: K100_AI
N=71505, VN=71505
verify precision : pass !   resid_ver =  0.00000095046978  
N=71505, VN=71505
verify precision : pass !   resid_ver =  0.00000095046978  
iters = 560, time = 108.742ms, resid = 9.50374e-07
mtx45: M = 71505, N = 71505
Using device 0: K100_AI
N=71505, VN=71505
verify precision : pass !   resid_ver =  0.00000095046978  
N=71505, VN=71505
verify precision : pass !   resid_ver =  0.00000095046978  
iters = 560, time = 109.615ms, resid = 9.50374e-07
mtx79: M = 80595, N = 80595
Using device 0: K100_AI
N=80595, VN=80595
verify precision : pass !   resid_ver =  0.00000098474838  
N=80595, VN=80595
verify precision : pass !   resid_ver =  0.00000098474838  
iters = 6382, time = 803.76ms, resid = 9.81508e-07
mtx79: M = 80595, N = 80595
Using device 0: K100_AI
N=80595, VN=80595
verify precision : pass !   resid_ver =  0.00000098474838  
N=80595, VN=80595
verify precision : pass !   resid_ver =  0.00000098474838  
iters = 6382, time = 797.196ms, resid = 9.81508e-07
mtx80: M = 180895, N = 180895
Using device 0: K100_AI
N=180895, VN=180895
verify precision : pass !   resid_ver =  0.00000098486064  
N=180895, VN=180895
verify precision : pass !   resid_ver =  0.00000098486064  
iters = 2580, time = 362.387ms, resid = 9.85461e-07
mtx80: M = 180895, N = 180895
Using device 0: K100_AI
N=180895, VN=180895
verify precision : pass !   resid_ver =  0.00000098486064  
N=180895, VN=180895
verify precision : pass !   resid_ver =  0.00000098486064  
iters = 2580, time = 368.148ms, resid = 9.85461e-07
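
For easier comparison, the log above can be reduced to one new/old timing pair per matrix. The following is a minimal sketch, not part of this PR: it assumes the output format shown above (an `mtxN: M = ...` header followed later by an `iters = ..., time = ...ms` summary line) and that `./new` runs before `./old` for each matrix, as in `run.sh`. The names `pair_timings` and `relative_change` are hypothetical helpers introduced only for illustration.

```python
import re

# Summary line printed by the solver, e.g.
# "iters = 108, time = 48.5007ms, resid = 9.6667e-07"
SUMMARY = re.compile(r"iters = (\d+), time = ([\d.]+)ms")
# Matrix header line, e.g. "mtx1: M = 503625, N = 503625"
HEADER = re.compile(r"^(mtx\d+): M = ")


def pair_timings(log_text):
    """Return {matrix: (new_ms, old_ms)}.

    Assumption: for every matrix the first run is ./new and the
    second is ./old, matching the order used in run.sh.
    """
    runs = {}
    current = None
    for line in log_text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            current = header.group(1)
            continue
        summary = SUMMARY.search(line)
        if summary and current is not None:
            runs.setdefault(current, []).append(float(summary.group(2)))
    # Keep only matrices that have exactly one new and one old timing.
    return {name: (t[0], t[1]) for name, t in runs.items() if len(t) == 2}


def relative_change(new_ms, old_ms):
    """Percent change of new relative to old; negative means new is faster."""
    return (new_ms - old_ms) / old_ms * 100.0


# Two summary lines copied from the log above (mtx20, new then old).
sample_log = """\
mtx20: M = 1489752, N = 1489752
iters = 12, time = 6.81551ms, resid = 6.80996e-07
mtx20: M = 1489752, N = 1489752
iters = 12, time = 6.73839ms, resid = 6.80996e-07
"""

for name, (new_ms, old_ms) in pair_timings(sample_log).items():
    change = relative_change(new_ms, old_ms)
    print(f"{name}: new={new_ms}ms old={old_ms}ms change={change:+.2f}%")
```

To process a full run, the captured `srun` output can be passed to `pair_timings` as a single string in place of `sample_log`. Each binary is run only once per matrix here, so small differences between the two timings may be within run-to-run noise.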

@genshen genshen force-pushed the improve-kernel-sync-using-memset branch from fae9941 to cd7354e Compare November 8, 2025 14:43