Conversation
- use pip install to make it available
- minor adaptations to the original script w.r.t. inputs
Fix Multiprocessing
src/move/data/perturbations.py
Outdated
slice_ = slice(*splits[target_idx : target_idx + 2])
num_features = baseline_dataset.con_shapes[target_idx]
#num_features = baseline_dataset.con_shapes[target_idx]
Is this correct? Or does it still need to be changed?
I think it's correct now. I cannot check right now because I've been having problems connecting to Esrum all morning, but I'll check as soon as it works again (hopefully soon)
I still can't connect; it's very annoying :(. I'll let you know as soon as I can connect again, but I think the code should be fine.
I was finally able to connect today at noon :). The file was correct, because those functions are not used at all for multiprocessing; I had just changed them to test some things with the previous functions. But it is true that it led to confusion, so I reverted the changes so that the unused functions keep their original code.
- see if results between single-process and multiprocessing runs match.
- decide whether to log2-transform; default: do not, in order to allow negative features, which are then standard-normalized.
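The default above can be sketched with a small numpy example. This is illustrative only: the array and names (`features`) are made up, not taken from the MOVE codebase. It shows why skipping the log2 transform matters: log2 requires strictly positive values, while standard normalization handles any sign.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are continuous features
features = np.array([[4.0, 16.0], [8.0, 2.0], [2.0, 32.0]])

# Option A: log2-transform first (only valid for strictly positive values)
log2_features = np.log2(features)

# Option B (the default discussed above): skip log2 so negative raw values
# are allowed, then standard-normalize each feature (zero mean, unit variance)
standardized = (features - features.mean(axis=0)) / features.std(axis=0)
```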
# vs all continuous features in one 1D array
# Then, we sort them and get the indexes in the flattened array. So, we get a
# list of sorted indexes in the flattened array
sort_ids = np.argsort(bayes_abs, axis=None)[::-1]  # 1D: N x C
So the flattening here means that the probabilities and FDR are calculated across perturbations (meaning that perturbations actually increase the total number of probabilities)?
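The behavior asked about here can be demonstrated on a toy array. With `axis=None`, numpy ravels the N x C matrix before sorting, so every (perturbation, feature) pair competes in a single joint ranking; the toy values below are made up for illustration.

```python
import numpy as np

# Toy stand-in for bayes_abs: N = 2 perturbations x C = 2 features
bayes_abs = np.array([[0.2, 0.9],
                      [0.5, 0.1]])

# axis=None flattens to a 1D array of N*C scores before sorting,
# then [::-1] puts the largest score first
sort_ids = np.argsort(bayes_abs, axis=None)[::-1]  # → [1, 2, 0, 3]

# Flat indexes can be mapped back to (perturbation, feature) pairs
rows, cols = np.unravel_index(sort_ids, bayes_abs.shape)
```

So the top-ranked entry here is flat index 1, i.e. perturbation 0, feature 1 (value 0.9), confirming that the ranking mixes all perturbations together.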
To solve: reloading trained models from a single-process run in a single process does not yield exactly the same results.
- use everywhere.
- need to switch the dataloader constructor for the type of perturbation (cat or cont)
I finally found the issue. The multiprocessing did not yet have categorical vs. continuous perturbations implemented. I moved the masking into the main function and updated the bayes_worker function accordingly. I think it's ready to be checked now @ri-heme
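A minimal sketch of the fix described in this comment, under stated assumptions: the function signatures, the name `compute_bayes_factors`, and the serial loop standing in for the worker pool are all illustrative, not the actual MOVE API. The point is the structure: the mask that depends on the perturbation type (categorical vs. continuous) is built once in the main function, and each worker only receives a ready-made mask.

```python
import numpy as np

def bayes_worker(diff_slab, mask):
    """Compute smoothed log-odds ("Bayes K") for one perturbed feature.

    diff_slab: 2D array (N samples x C features) of reconstruction differences.
    mask: 2D boolean mask built in the main process for this perturbation.
    """
    diff = np.ma.masked_array(diff_slab, mask=mask)        # 2D: N x C
    prob = np.ma.compressed(np.mean(diff > 1e-8, axis=0))  # 1D: C (unmasked)
    return np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)

def compute_bayes_factors(mean_diff, masks):
    # mean_diff: (num_perturbed, N, C); masks: one per perturbation, built
    # here according to perturbation type. A multiprocessing.Pool would
    # replace this serial loop in the real parallel version.
    return [bayes_worker(mean_diff[i], masks[i]) for i in range(len(masks))]
```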
ri-heme
left a comment
Hi Henry. Left some comments here and there (some type hints were changed, some comments are still there, and some questions/suggestions).
Thanks for your help!
reconstruction_path = (
    models_path / f"baseline_recon_{task_config.model.num_latent}_{j}.pt"
)
if reconstruction_path.exists():
This is never True, right? Because saving the reconstructions is commented out.
Yes, but like this it is easier to compare bayes_approach with bayes_parallel
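For context, the pattern the `reconstruction_path.exists()` check implies is a simple compute-or-load cache; as noted above, it only works if the save step is not commented out. The sketch below is hypothetical: `get_baseline_recon` and `compute_baseline` are made-up names, and it uses numpy `.npy` storage for a runnable example, whereas the actual code stores torch `.pt` files.

```python
import numpy as np
from pathlib import Path

def get_baseline_recon(path: Path, compute_baseline):
    """Reuse a saved baseline reconstruction if present, else compute + save."""
    if path.exists():          # never True while the save below is commented out
        return np.load(path)
    recon = compute_baseline() # stand-in for the real model forward pass
    np.save(path, recon)       # re-enable saving for the cache to take effect
    return recon
```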
diff = np.ma.masked_array(mean_diff[i, :, :], mask=mask)  # 2D: N x C
prob = np.ma.compressed(np.mean(diff > 1e-8, axis=0))  # 1D: C
bayes_k[i, :] = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)
computed_bayes_k = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)
Why create this variable?
To show where a worker function could be called.
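Both lines in the diff above compute the same quantity: a smoothed log-odds ("Bayes K") from the probability that a reconstruction difference is positive. A tiny numeric check (toy `prob` values, not from the codebase) shows the role of the `1e-8` term:

```python
import numpy as np

# Probability that the reconstruction difference exceeds the threshold
prob = np.array([0.0, 0.5, 1.0])

# Smoothed log-odds: the 1e-8 term keeps the log finite at prob = 0 or 1
bayes_k = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)

# prob = 0.5 gives Bayes K of 0 (no evidence either way); prob near 0 or 1
# gives large negative or positive values capped by the smoothing constant
```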
|
@ri-heme just merge if you think it's fine now :)