remove unnecessary parallelize usage in sepal#1120
selmanozleyen wants to merge 10 commits into scverse:main from
Conversation
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1120 +/- ##
==========================================
- Coverage 66.36% 66.16% -0.21%
==========================================
Files 44 44
Lines 7132 7137 +5
Branches 1212 1209 -3
==========================================
- Hits 4733 4722 -11
- Misses 1923 1941 +18
+ Partials 476 474 -2
sat_shape = sat.shape[0]
n_threads = get_num_threads()

# Pre-allocate per-thread workspace to avoid allocator contention
I wonder if there isn’t a way to tell numba to do the pre-allocation instead of doing it manually.
I am also wondering the same, but I couldn't find a way to do it without calling `get_num_threads()`.
Is that plan fully based on
I would need extra time to review this, but I am confident we can find a way that uses the cores without multi-process and serialization overhead.
Matters for potential OOMs. It's doable, but densifying up-front is better for speed, and I don't think we will have inputs that cause memory issues since sepal works only with spot-based datasets. Is that correct @timtreis?
IIRC sepal works in any situation where the obs are arranged in a regular grid or pattern. That'd include Visium with about 4k spots, yes, but also Visium HD, which, depending on which bin you choose, could have more than a million obs. Not sure that'd actually be useful, but it could be a potential input.
…selmanozleyen/squidpy into feat/remove-parallelize-minimal
for more information, see https://pre-commit.ci
OK, I added a `sparse_batch_size`, which defaults to 128. Given that 100M cells per gene is around 400 MB, I think this is a reasonable default. Here are the results:
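One way to picture the batching (a hypothetical standalone helper, not the PR's actual code): densify the sparse expression matrix a block of genes at a time, so peak memory is bounded by roughly `n_cells * batch_size` floats instead of the full dense matrix.

```python
import numpy as np
from scipy import sparse

def iter_dense_batches(X, batch_size=128):
    # Hypothetical helper: yield dense column blocks of a sparse
    # cells-by-genes matrix, bounding peak memory per block.
    for start in range(0, X.shape[1], batch_size):
        block = X[:, start:start + batch_size]
        yield start, np.asarray(block.todense())

# Toy input: 1000 cells x 300 genes, 10% nonzero
X = sparse.random(1000, 300, density=0.1, format="csr", random_state=0)
total = np.zeros(X.shape[1])
for start, dense in iter_dense_batches(X):
    # stand-in for running the dense kernel on each block
    total[start:start + dense.shape[1]] = dense.sum(axis=0)
```

Each block can then be handed to the dense numba kernel, trading a little slicing overhead for a hard cap on memory.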
Hi,
From the rsc sepal implementation I know we can get rid of `parallelize` easily. I also have a longer-term plan to remove `parallelize` entirely. I checked with this script: https://gist.github.com/selmanozleyen/c4b0e4780243fd4621f68bc2d78cae5c
Results prettified with AI:
Benchmark results (502 genes, 2688 cells, visium_hne_adata, 3 runs, 8 cores)
Motivation
This PR replaces the internal `parallelize` + `joblib.Parallel` machinery in `sepal()` with Numba's native `prange` threading, removing the need for multi-process orchestration (subprocess spawning, pickling, IPC).
Key changes
- **`prange` over genes** -- the outer gene loop is parallelized with `numba.prange` inside an `@njit(parallel=True)` function, replacing `joblib.Parallel(n_jobs=..., backend="loky")`.
- **Pre-allocated per-thread workspace buffers** (commit e011ddc) -- you can see why I did this manually in https://numba.readthedocs.io/en/stable/user/parallel.html#diagnostics under "allocation hoisting".