Conversation
- use pip install to make it available
- minor adaptations to the original script w.r.t. inputs
Fix Multiprocessing
src/move/data/perturbations.py
Outdated
slice_ = slice(*splits[target_idx : target_idx + 2])
num_features = baseline_dataset.con_shapes[target_idx]
#num_features = baseline_dataset.con_shapes[target_idx]
Is this correct? Or does it still need to be changed?
I think it's correct now. I cannot check right now because I've been having problems connecting to Esrum all morning, but I'll check as soon as it works again (hopefully soon)
I still can't connect; it's very annoying :(. I'll let you know as soon as I can connect again, but I think the code should be fine.
I was finally able to connect today at noon :). The file was correct, because those functions are not used at all for multiprocessing; I had just changed them to test some things with the previous functions. But it is true that it led to confusion, so I reverted the changes so that the unused functions keep their original code.
- see if results between single-process and multiprocessing runs match.
- decide whether to log2-transform; default: do not, in order to allow negative features, which are then standard-normalized.
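The default above can be sketched with a small numpy example. This is illustrative only: the array and names (`features`) are made up, not taken from the MOVE codebase. It shows why skipping the log2 transform matters: log2 requires strictly positive values, while standard normalization handles any sign.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are continuous features
features = np.array([[4.0, 16.0], [8.0, 2.0], [2.0, 32.0]])

# Option A: log2-transform first (only valid for strictly positive values)
log2_features = np.log2(features)

# Option B (the default discussed above): skip log2 so negative raw values
# are allowed, then standard-normalize each feature (zero mean, unit variance)
standardized = (features - features.mean(axis=0)) / features.std(axis=0)
```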
# vs all continuous features in one 1D array
# Then, we sort them and get the indexes in the flattened array. So, we get a
# list of sorted indexes in the flattened array
sort_ids = np.argsort(bayes_abs, axis=None)[::-1]  # 1D: N x C
So the flattening here means that the probabilities and FDR are calculated across perturbations (meaning that perturbations actually increase the total number of probabilities)?
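The behavior asked about here can be demonstrated on a toy array. With `axis=None`, numpy ravels the N x C matrix before sorting, so every (perturbation, feature) pair competes in a single joint ranking; the toy values below are made up for illustration.

```python
import numpy as np

# Toy stand-in for bayes_abs: N = 2 perturbations x C = 2 features
bayes_abs = np.array([[0.2, 0.9],
                      [0.5, 0.1]])

# axis=None flattens to a 1D array of N*C scores before sorting,
# then [::-1] puts the largest score first
sort_ids = np.argsort(bayes_abs, axis=None)[::-1]  # → [1, 2, 0, 3]

# Flat indexes can be mapped back to (perturbation, feature) pairs
rows, cols = np.unravel_index(sort_ids, bayes_abs.shape)
```

So the top-ranked entry here is flat index 1, i.e. perturbation 0, feature 1 (value 0.9), confirming that the ranking mixes all perturbations together.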
To solve: reloading trained models from a single-process run in a single process does not yield exactly the same results.
- use everywhere.
- need to switch the dataloader constructor for the type of perturbation (cat or cont)
I finally found the issue. The multiprocessing did not yet have categorical vs. continuous perturbations implemented. I moved the masking into the main function and updated the bayes_worker function accordingly. I think it's ready to be checked now @ri-heme
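A minimal sketch of the fix described in this comment, under stated assumptions: the function signatures, the name `compute_bayes_factors`, and the serial loop standing in for the worker pool are all illustrative, not the actual MOVE API. The point is the structure: the mask that depends on the perturbation type (categorical vs. continuous) is built once in the main function, and each worker only receives a ready-made mask.

```python
import numpy as np

def bayes_worker(diff_slab, mask):
    """Compute smoothed log-odds ("Bayes K") for one perturbed feature.

    diff_slab: 2D array (N samples x C features) of reconstruction differences.
    mask: 2D boolean mask built in the main process for this perturbation.
    """
    diff = np.ma.masked_array(diff_slab, mask=mask)        # 2D: N x C
    prob = np.ma.compressed(np.mean(diff > 1e-8, axis=0))  # 1D: C (unmasked)
    return np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)

def compute_bayes_factors(mean_diff, masks):
    # mean_diff: (num_perturbed, N, C); masks: one per perturbation, built
    # here according to perturbation type. A multiprocessing.Pool would
    # replace this serial loop in the real parallel version.
    return [bayes_worker(mean_diff[i], masks[i]) for i in range(len(masks))]
```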
ri-heme
left a comment
Hi Henry. Left some comments here and there (some type hints were changed, some comments are still there, and some questions/suggestions).
Thanks for your help!
reconstruction_path = (
    models_path / f"baseline_recon_{task_config.model.num_latent}_{j}.pt"
)
if reconstruction_path.exists():
This is never True, right? Because saving the reconstructions is commented out.
Yes, but like this it is easier to compare bayes_approach with bayes_parallel
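For context, the pattern the `reconstruction_path.exists()` check implies is a simple compute-or-load cache; as noted above, it only works if the save step is not commented out. The sketch below is hypothetical: `get_baseline_recon` and `compute_baseline` are made-up names, and it uses numpy `.npy` storage for a runnable example, whereas the actual code stores torch `.pt` files.

```python
import numpy as np
from pathlib import Path

def get_baseline_recon(path: Path, compute_baseline):
    """Reuse a saved baseline reconstruction if present, else compute + save."""
    if path.exists():          # never True while the save below is commented out
        return np.load(path)
    recon = compute_baseline() # stand-in for the real model forward pass
    np.save(path, recon)       # re-enable saving for the cache to take effect
    return recon
```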
diff = np.ma.masked_array(mean_diff[i, :, :], mask=mask)  # 2D: N x C
prob = np.ma.compressed(np.mean(diff > 1e-8, axis=0))  # 1D: C
bayes_k[i, :] = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)
computed_bayes_k = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)
Why create this variable?
To show where a worker function could be called.
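Both lines in the diff above compute the same quantity: a smoothed log-odds ("Bayes K") from the probability that a reconstruction difference is positive. A tiny numeric check (toy `prob` values, not from the codebase) shows the role of the `1e-8` term:

```python
import numpy as np

# Probability that the reconstruction difference exceeds the threshold
prob = np.array([0.0, 0.5, 1.0])

# Smoothed log-odds: the 1e-8 term keeps the log finite at prob = 0 or 1
bayes_k = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)

# prob = 0.5 gives Bayes K of 0 (no evidence either way); prob near 0 or 1
# gives large negative or positive values capped by the smoothing constant
```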
|
@ri-heme just merge if you think it's fine now :)