TDA-ris capable of 30000 nbf (3000 atoms) by John-zzh · Pull Request #615 · pyscf/gpu4pyscf

John-zzh · 2025-12-28T08:30:08Z

Major updates compared to 1.4.1:

3c2e engine updated with 3c2e_bdiv
Coulomb iajb computed on the fly instead of storing the tensor
Davidson trial vectors stored in CPU memory
many other optimizations

As of this date, some versions of MKL-backed numpy are not compatible with cupy 13.4.1 and cutensor 2.2.0.0: the contract engine complains about a CUTENSOR STATUS error,
while openblas-backed numpy/scipy hits a segmentation fault when solving a large FC=SCE problem.

To use TDA-ris on large systems, install with:

mamba install -c conda-forge \
  cupy=13.4.1 \
  cutensor=2.2.0 \
  numpy scipy \
  blas=*=*mkl

(cupy_mkl_py311) root@volcengine:~/jojo/software/gpu4pyscf_ris/gpu4pyscf/tdscf# ls /root/jojo/miniconda3/envs/cupy_mkl_py311/lib | grep blas
libblas.so
libblas.so.3
libcblas.so
libcblas.so.3
libcublasLt.so.12
libcublasLt.so.12.9.1.4
libcublas.so.12
libcublas.so.12.9.1.4
libnvblas.so.12
libnvblas.so.12.9.1.4

 python -c "import numpy as np; np.show_config()"
... 
  "Build Dependencies": {
    "blas": {
      "name": "blas",
      "found": true,
      "version": "3.9.0",
      "detection method": "pkgconfig",
      "include directory": "/root/jojo/miniconda3/envs/cupy_mkl_py311/include",
      "lib directory": "/root/jojo/miniconda3/envs/cupy_mkl_py311/lib",
      "openblas configuration": "unknown",
      "pc file directory": "/root/jojo/miniconda3/envs/cupy_mkl_py311/lib/pkgconfig"
    },
    "lapack": {
      "name": "lapack",
      "found": true,
      "version": "3.9.0",
      "detection method": "pkgconfig",
      "include directory": "/root/jojo/miniconda3/envs/cupy_mkl_py311/include",
      "lib directory": "/root/jojo/miniconda3/envs/cupy_mkl_py311/lib",
      "openblas configuration": "unknown",
      "pc file directory": "/root/jojo/miniconda3/envs/cupy_mkl_py311/lib/pkgconfig"
    }
...

jeanwsr · 2025-12-28T09:36:24Z

Warning: this will not install an MKL-backed numpy/scipy; the backend is an unknown version of BLAS.

What if you replace ls with ls -l? If libblas.so is a link to libmkl_rt.so, then numpy is really using MKL.

John-zzh · 2025-12-28T10:04:34Z

Warning: this will not install an MKL-backed numpy/scipy; the backend is an unknown version of BLAS.

What if you replace ls with ls -l? If libblas.so is a link to libmkl_rt.so, then numpy is really using MKL.

You are right, it seems it is really using MKL.

(cupy_mkl_py311) root@volcengine:~/jojo/software/gpu4pyscf_ris/gpu4pyscf/tdscf# ls -l /root/jojo/miniconda3/envs/cupy_mkl_py311/lib | grep mkl
lrwxrwxrwx  1 root root        14 Dec 25 16:49 libblas.so -> libmkl_rt.so.2
lrwxrwxrwx  1 root root        14 Dec 25 16:49 libblas.so.3 -> libmkl_rt.so.2
lrwxrwxrwx  1 root root        14 Dec 25 16:49 libcblas.so -> libmkl_rt.so.2
lrwxrwxrwx  1 root root        14 Dec 25 16:49 libcblas.so.3 -> libmkl_rt.so.2
lrwxrwxrwx  1 root root        14 Dec 25 16:49 liblapacke.so -> libmkl_rt.so.2
lrwxrwxrwx  1 root root        14 Dec 25 16:49 liblapacke.so.3 -> libmkl_rt.so.2
lrwxrwxrwx  1 root root        14 Dec 25 16:49 liblapack.so -> libmkl_rt.so.2
lrwxrwxrwx  1 root root        14 Dec 25 16:49 liblapack.so.3 -> libmkl_rt.so.2
-rwxrwxr-x  3 root root  49788392 Oct 11 02:32 libmkl_avx2.so.2
-rwxrwxr-x  3 root root  73555016 Oct 11 02:32 libmkl_avx512.so.2
lrwxrwxrwx  1 root root        32 Dec 25 16:49 libmkl_blacs_intelmpi_ilp64.so -> libmkl_blacs_intelmpi_ilp64.so.2
-rwxrwxr-x  3 root root    581872 Oct 11 02:32 libmkl_blacs_intelmpi_ilp64.so.2
lrwxrwxrwx  1 root root        31 Dec 25 16:49 libmkl_blacs_intelmpi_lp64.so -> libmkl_blacs_intelmpi_lp64.so.2
-rwxrwxr-x  3 root root    305744 Oct 11 02:32 libmkl_blacs_intelmpi_lp64.so.2
lrwxrwxrwx  1 root root        31 Dec 25 16:49 libmkl_blacs_openmpi_ilp64.so -> libmkl_blacs_openmpi_ilp64.so.2
-rwxrwxr-x  3 root root    586312 Oct 11 02:32 libmkl_blacs_openmpi_ilp64.so.2
lrwxrwxrwx  1 root root        30 Dec 25 16:49 libmkl_blacs_openmpi_lp64.so -> libmkl_blacs_openmpi_lp64.so.2
-rwxrwxr-x  3 root root    301992 Oct 11 02:32 libmkl_blacs_openmpi_lp64.so.2
lrwxrwxrwx  1 root root        21 Dec 25 16:49 libmkl_cdft_core.so -> libmkl_cdft_core.so.2
-rwxrwxr-x  3 root root    195056 Oct 11 02:32 libmkl_cdft_core.so.2
lrwxrwxrwx  1 root root        16 Dec 25 16:49 libmkl_core.so -> libmkl_core.so.2
-rwxrwxr-x  3 root root  72157040 Oct 11 02:32 libmkl_core.so.2
-rwxrwxr-x  3 root root  41036168 Oct 11 02:32 libmkl_def.so.2
lrwxrwxrwx  1 root root        20 Dec 25 16:49 libmkl_gf_ilp64.so -> libmkl_gf_ilp64.so.2
-rwxrwxr-x  3 root root  15185440 Oct 11 02:32 libmkl_gf_ilp64.so.2
lrwxrwxrwx  1 root root        19 Dec 25 16:49 libmkl_gf_lp64.so -> libmkl_gf_lp64.so.2
-rwxrwxr-x  3 root root  17181208 Oct 11 02:32 libmkl_gf_lp64.so.2
lrwxrwxrwx  1 root root        22 Dec 25 16:49 libmkl_gnu_thread.so -> libmkl_gnu_thread.so.2
-rwxrwxr-x  3 root root  28829056 Oct 11 02:32 libmkl_gnu_thread.so.2
lrwxrwxrwx  1 root root        23 Dec 25 16:49 libmkl_intel_ilp64.so -> libmkl_intel_ilp64.so.2
-rwxrwxr-x  3 root root  18290680 Oct 11 02:32 libmkl_intel_ilp64.so.2
lrwxrwxrwx  1 root root        22 Dec 25 16:49 libmkl_intel_lp64.so -> libmkl_intel_lp64.so.2
-rwxrwxr-x  3 root root  20294504 Oct 11 02:32 libmkl_intel_lp64.so.2
lrwxrwxrwx  1 root root        24 Dec 25 16:49 libmkl_intel_thread.so -> libmkl_intel_thread.so.2
-rwxrwxr-x  3 root root  42925416 Oct 11 02:32 libmkl_intel_thread.so.2
-rwxrwxr-x  3 root root  49999992 Oct 11 02:32 libmkl_mc3.so.2
lrwxrwxrwx  1 root root        14 Dec 25 16:49 libmkl_rt.so -> libmkl_rt.so.2
-rwxrwxr-x  3 root root  18291704 Oct 11 02:32 libmkl_rt.so.2
lrwxrwxrwx  1 root root        27 Dec 25 16:49 libmkl_scalapack_ilp64.so -> libmkl_scalapack_ilp64.so.2
-rwxrwxr-x  3 root root   7675808 Oct 11 02:32 libmkl_scalapack_ilp64.so.2
lrwxrwxrwx  1 root root        26 Dec 25 16:49 libmkl_scalapack_lp64.so -> libmkl_scalapack_lp64.so.2
-rwxrwxr-x  3 root root   7695368 Oct 11 02:32 libmkl_scalapack_lp64.so.2
lrwxrwxrwx  1 root root        22 Dec 25 16:49 libmkl_sequential.so -> libmkl_sequential.so.2
-rwxrwxr-x  3 root root  25645064 Oct 11 02:32 libmkl_sequential.so.2
lrwxrwxrwx  1 root root        22 Dec 25 16:49 libmkl_tbb_thread.so -> libmkl_tbb_thread.so.2
-rwxrwxr-x  3 root root  36164656 Oct 11 02:32 libmkl_tbb_thread.so.2
-rwxrwxr-x  3 root root  15804448 Oct 11 02:32 libmkl_vml_avx2.so.2
-rwxrwxr-x  3 root root  16413888 Oct 11 02:32 libmkl_vml_avx512.so.2
-rwxrwxr-x  3 root root   8124792 Oct 11 02:32 libmkl_vml_cmpt.so.2
-rwxrwxr-x  3 root root   9461344 Oct 11 02:32 libmkl_vml_def.so.2
-rwxrwxr-x  3 root root  13247632 Oct 11 02:32 libmkl_vml_mc3.so.2

(cupy_mkl_py311) root@volcengine:~/jojo/software/gpu4pyscf_ris/gpu4pyscf/tdscf# python - << 'EOF'
import ctypes

mkl = ctypes.CDLL('libmkl_rt.so')
buf = ctypes.create_string_buffer(200)
mkl.MKL_Get_Version_String(buf, 200)
print(buf.value.decode())
EOF
Intel(R) oneAPI Math Kernel Library Version 2025.3-Product Build 20251007 for Intel(R) 64 architecture applications
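
The symlink check above can also be scripted. The following is a minimal sketch, not part of this PR: the helper names `blas_backend` and `uses_mkl` are hypothetical, and it assumes the conda layout shown in the listing, where libblas.so is a symlink into the MKL runtime.

```python
import os


def blas_backend(libdir, name="libblas.so"):
    """Return the file name that `name` in `libdir` ultimately resolves to."""
    path = os.path.join(libdir, name)
    if not os.path.exists(path):
        return None
    # realpath follows the whole symlink chain (libblas.so -> libmkl_rt.so.2)
    return os.path.basename(os.path.realpath(path))


def uses_mkl(libdir):
    """True if libblas.so in `libdir` resolves to the MKL runtime library."""
    target = blas_backend(libdir)
    return target is not None and target.startswith("libmkl_rt")
```

Inside an activated conda environment, calling `uses_mkl(os.path.join(sys.prefix, "lib"))` (after `import sys`) should point at the same lib directory as the `ls -l` command, assuming the usual environment layout.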

In short, use conda-forge:

mamba install -c conda-forge cupy=13.4.1  cutensor=2.2.0 numpy scipy blas=*=*mkl

and do not use the default channel:

mamba install -c default  cupy cutensor numpy scipy blas=*=*mkl   # contract engine will complain about CUTENSOR STATUS
mamba install -c default  cupy=13.4.1 cutensor=2.2.0 numpy scipy blas=*=*mkl  # no such version combination

Maybe the conda default channel has a version mismatch with the contract engine.
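
When comparing environments like the ones above, it can help to record which Python package versions are actually installed. This is a small sketch using only the standard library; the helper name `installed_versions` and the distribution names passed to it are assumptions (pip wheels of cupy, for instance, are often published under names such as cupy-cuda12x). cutensor is a native library, so it will generally not show up here and is better checked with `mamba list`.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust them to your install.
    for pkg, ver in installed_versions(["cupy", "numpy", "scipy"]).items():
        print(f"{pkg:10s} {ver or 'not found'}")
```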

John-zzh · 2025-12-29T03:08:42Z

Weird, test_ris.py and test_krylov.py work fine on my side.

jeanwsr · 2026-01-10T08:52:40Z

gpu4pyscf/tdscf/ris.py

            verbose (optional): Verbosity level of the logger. If None, it will use the verbosity of `mf`.
+            nto_state (None or int, optional): Which state to calculate natural transition orbitals, 
+                                        require install MOKIT  https://jeanwsr.gitlab.io/mokit-doc-mdbook/


Please use the GitLab URL when suggesting installing MOKIT.

jeanwsr · 2026-01-10T08:55:56Z

gpu4pyscf/tdscf/ris.py

+def get_nto(self,state_id):
+
+    ''' need to install MOKIT to dump .fch format orbital file
+    https://jeanwsr.gitlab.io/mokit-doc-mdbook/


jeanwsr · 2026-01-10T08:58:37Z

gpu4pyscf/tdscf/ris.py

+    fchk(nto_mf, fchfilename)
+    from mokit.lib.rwwfn import del_dm_in_fch
+    del_dm_in_fch(fchname=fchfilename,itype=1)
+    return nto_mf


I think the get_nto function should only return the NTO coefficients etc., and dumping the fchk can be done in another function, to keep the code more organized. This way it also still works when MOKIT is missing.

John-zzh · 2026-01-11

Meaning dump the fchk in an independent .py file, not in the GPU4PySCF repo?

jeanwsr · 2026-01-11

I mean one function, get_nto, for calculating and returning NTOs only, like the API provided by tddft, and another, save_nto, for dumping the fchk, which can also be placed in your ris class. There could also be an all-in-one function, get_nto_and_save, which wraps those two.

If you don't want to split them, making save_fch an option in your current function is also OK I guess, which at least allows people to get the NTOs when they don't have MOKIT.
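
The split suggested above could look roughly like the following. This is only a sketch under stated assumptions: the class `RisNTOSketch`, the 0-based `state_id`, the `writer` callable and the placeholder NTO data are all hypothetical and are not the PR's actual code; the real get_nto would compute the orbitals and the real writer would call into MOKIT.

```python
import importlib.util


class RisNTOSketch:
    """Hypothetical stand-in for the ris class; every name here is illustrative."""

    def __init__(self, nto_per_state):
        # Placeholder data: one precomputed NTO coefficient block per state.
        self.nto_per_state = nto_per_state

    def get_nto(self, state_id):
        """Return NTO data only: no file I/O, so MOKIT is not needed."""
        if not 0 <= state_id < len(self.nto_per_state):
            raise ValueError(f"state_id {state_id} out of range")
        return self.nto_per_state[state_id]  # the real code would compute here

    def save_nto(self, nto, filename, writer=None):
        """Dump NTOs to disk; the optional dependency is only checked here."""
        if writer is None:
            if importlib.util.find_spec("mokit") is None:
                raise ImportError("MOKIT is required to write .fch files")
            raise NotImplementedError("plug the MOKIT-based .fch writer in here")
        writer(nto, filename)
        return filename

    def get_nto_and_save(self, state_id, filename, writer=None):
        """All-in-one wrapper around get_nto and save_nto."""
        nto = self.get_nto(state_id)
        self.save_nto(nto, filename, writer=writer)
        return nto
```

With this layout, a user without MOKIT can still call get_nto, and only save_nto (or the wrapper) fails with a clear message.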

John-zzh · 2026-01-31T10:14:03Z

It seems installing from the defaults channel also works, with cupy==13.6.0 and cutensor==2.4.1.4:

mamba create -n mkl -c defaults python=3.11
mamba activate mkl

mamba install -c defaults \
 cuda-version=12.8 \
 cupy=13 \
 cutensor=2 \
 numpy=2.2 \
 scipy \
 blas=*=*mkl

sunqm · 2026-02-03T01:30:32Z

gpu4pyscf/tdscf/tests/test_ris.py

+        H         -3.22959        2.35981       -0.24953
+        '''
+        mol = gto.M(atom=atom, basis='def2-svp',
+                    # output = '/dev/null',  # Suppress excessive log output


Add the output='/dev/null' argument. stdout.close() causes an IO error.
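
The failure mode behind this comment can be illustrated without PySCF. The sketch below is a simplified model, not the library's code: `MolSketch` is a hypothetical stand-in for an object that logs to a `stdout` attribute, and an `io.StringIO` stands in for the process-wide standard output. It assumes that, with no output file given, the object shares the global stream, so closing it in test teardown breaks later writes.

```python
import io
import os


class MolSketch:
    """Simplified stand-in for an object that logs to `self.stdout`."""

    def __init__(self, shared_stream, output=None):
        # With no `output`, log to the shared stream; otherwise open a
        # private file handle that is safe to close in teardown.
        self.stdout = shared_stream if output is None else open(output, "w")


shared = io.StringIO()           # stands in for sys.stdout

mol = MolSketch(shared, output=os.devnull)
mol.stdout.close()               # closes only the private handle
shared.write("still usable\n")   # the shared stream is unaffected

mol = MolSketch(shared)          # no output given: shares the stream
mol.stdout.close()               # closes the shared stream itself
try:
    shared.write("boom\n")
    shared_closed_error = None
except ValueError as exc:        # writing to a closed stream raises ValueError
    shared_closed_error = str(exc)
```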

John-zzh force-pushed the clean2 branch from b5668fb to 41ec85a Compare January 10, 2026 07:38

jeanwsr reviewed Jan 10, 2026

John-zzh force-pushed the clean2 branch from e21f008 to 05e9cdd Compare January 17, 2026 18:07

John-zzh and others added 21 commits January 24, 2026 16:52

one big commit as a result of chaso history; ris with bdiv generator …

cc7a645

…capable of 30000 nao 50 states; Krylov in RAM

nto analysis

c72433a

save nto fch file with mokit

098949d

initial guess no need GS

dd81382

in place precondiiton is dangerous

66ae84f

debuging

bf17e23

3100 atoms; 15 states success; mkl-numpy is not compatible with contr…

f84af7f

…act engine

style fix

5f2db8a

fix GS bug during initial guess

5539b8a

fix loginfo ndarray bug

dbe085c

lowdin atomic charge; Tij Tab only save lower triangle to save mem

543b2a2

restart krylov

aec75d7

save nto in h5, can choose mokit

fce45e5

getTpq dot in GPU

627e442

new int3c

4deaee4

test_ris.py to LF; change to new eri3c_bdiv with built-in rsh and no …

4a07606

…cart2sph; fix ABBA krylov residual bug

style fix

2f3222d

rm trailing blanks; rm scaling factor of XY, print as is; get_nto wil…

63274f3

…l dump cube file

iajb bdiv have an unknown bug

cacb67c

try not use mol.cart

ede83e6

keep on the fly J be float64 and dot with float32 V; record residual …

759614d

…norm

John-zzh force-pushed the clean2 branch from 4c66273 to 759614d Compare January 24, 2026 08:58

jojo added 10 commits January 24, 2026 17:08

flake style fix

6be31d5

mem print typo

bf921e4

restart davidson subspace according to available memory or user speci…

b32e14b

…fy a value; move RisBase to top of ris.py clear for reading

flake8 style fix

3d1265a

typo fix

590d04f

cov_tol =1e-5 default

71fd390

nto file name typo

ddaa6b4

better krylov subspace restart

d4075b5

conv tol typo

da0b407

float64 for dm_sparse

cc697cd

John-zzh closed this Jan 29, 2026

jojo added 5 commits January 31, 2026 00:15

fix save cube info typo; allow router contract engine, sometimes it i…

4b261c1

…s slower than cp.einsum, could be env/compiling issue

cp.einsum for now

c265cdb

mkl env works fine; dont touch it

604f995

cleanup ris init and build

8e9052e

cleanup init and build

77e39a5

John-zzh reopened this Jan 31, 2026

fix GS memory leak; allow get Tpq in host mem dot; remove magic ECD p…

b942ac1

…refactor, TODO: verify ECD with ORCA

sunqm reviewed Feb 3, 2026

jojo added 7 commits February 3, 2026 23:10

stable TDDFT subspace solver

05eea91

TDA subspace solve with float64

b54b15c

mem unit typo

3f840c9

fix test IO error; stable full TDDFT and mem mange

dbd5c73

use X+Y and X-Y as the free variable, simply the casida equation solv…

03f9adc

…er, it allows reuse many funcs in TDA; the old ABBA solver will be removed in the future

style check; more stable TDDFT-ris

e9f1db5

style check; more stable TDDFT-ris

e817b4e

John-zzh wants to merge 44 commits into pyscf:master from John-zzh:clean2
