Determinism implementation in nnunet #2871
Conversation
Note: I was finally able to test the determinism fix on CUDA. Although reproducibility is greatly improved, the training is not fully deterministic. I've pinpointed the source of the non-determinism to something happening in the backward pass.
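As an aside on the backward-pass issue mentioned above: PyTorch ships a switch that makes ops fail loudly when no deterministic implementation exists, which is a common way to localize exactly this kind of residual non-determinism. This is a standard PyTorch facility, not part of this PR:

```python
import torch

# Ask PyTorch to use deterministic kernel implementations everywhere and
# raise a RuntimeError whenever an op has no deterministic variant. On
# CUDA, some cuBLAS ops additionally require the environment variable
# CUBLAS_WORKSPACE_CONFIG=:4096:8 (see the PyTorch reproducibility notes).
torch.use_deterministic_algorithms(True)

x = torch.randn(4, 4, requires_grad=True)
y = (x * 2).sum()
y.backward()  # these ops all have deterministic implementations, so no error
```

Running a short training loop with this flag enabled usually pinpoints the offending op directly in the raised error message.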
Dear @Luugaaa, thank you for this PR. I appreciate the effort you put into making nnU-Net deterministic. Fun fact: a long time ago there used to be a deterministic flag in nnU-Net, which we removed later on. Wait, what? Blasphemy! Indeed. The lack of determinism in nnU-Net is intentional. I do not believe determinism is needed. In fact, I think it is dangerous and detrimental to progress.

Strange words, no? But there is a reason: determinism gives you a false sense of security when producing and interpreting results. When you run an experiment once and get +0.5 Dice vs. the baseline, is it a real improvement or not? With determinism you can run an entire series of experiments and collect incremental improvements. And what happens when the seed changes, or determinism breaks? All your nice and tidy improvements collapse like a house of cards. Sure, you could have run multiple seeds to begin with, but who really does that when compute is always the constraint?

So I would very much prefer to keep things nondeterministic in nnU-Net. It's not a bug, it's a feature :-)
Hi MIC-DKFZ team,
This PR introduces updates to enable fully deterministic training in nnU-Net, which is crucial for reproducibility in research. The changes include adding a `deterministic` flag, implementing a `seed_everything` function, and ensuring the data augmentation pipeline is correctly seeded. I've been working on a similar fix for `batchgenerators` and believe these changes will work together to make the entire training process reproducible.

Problem
Achieving deterministic behavior in a multi-process environment can be tricky. Even with seeding, sources of randomness can persist, especially in the data augmentation pipeline. The original nnU-Net trainer used a non-deterministic data loader by default and didn't have a straightforward way to enforce reproducibility across all components, including PyTorch, NumPy, and the data loading workers. This could lead to slight variations in training results, even with the same initial seed.
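The components listed above (Python's `random`, NumPy, PyTorch, and the data-loading workers) each keep their own RNG state, so all of them must be seeded, and each worker needs its own reproducible seed. A minimal sketch of this idea (the function names and exact knobs here are my own illustration, not the PR's actual code):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed every RNG the training process touches (sketch only; the
    PR's actual helper may differ in details)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the RNGs on all devices, CUDA included
    # cuDNN: force deterministic kernels and disable autotuning, whose
    # algorithm selection is itself a source of run-to-run variation.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def worker_seeds(base_seed: int, num_workers: int) -> list:
    """Derive one reproducible seed per data-loading worker so that each
    worker's augmentation stream is fixed across runs."""
    rng = np.random.RandomState(base_seed)
    return [int(s) for s in rng.randint(0, 2**31 - 1, size=num_workers)]
```

With the same `base_seed`, every run hands the same seed to the same worker, which is what makes the augmentation pipeline emit identical batches in identical order.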
Solution
To address this, I've implemented the following changes:
- `deterministic` flag: a `deterministic` boolean flag has been added to the `nnUNetTrainer`'s `__init__` method. When set to `True`, it activates all the changes needed for reproducible training.
- `seed_everything` function: a new helper function, `seed_everything`, is called when the `deterministic` flag is active. It sets the seeds for `random`, `numpy`, and `torch`, and configures `cudnn` for deterministic behavior to eliminate sources of randomness on the GPU.
- Deterministic data loading: the `get_dataloaders` method now checks the `deterministic` flag. If `True`, it uses a `MultiThreadedAugmenter` and passes a unique, generated seed to each worker process, ensuring the data augmentation pipeline is fully deterministic and produces the same results in the same order on every run. If `False`, it keeps the default `NonDetMultiThreadedAugmenter`. `get_training_transforms` checks the flag in the same way to enable or disable the benchmark.

How It's Tested
To validate these changes, I've developed a determinism test pipeline that sets up a dummy dataset and runs the entire preprocessing and training pipeline for two epochs with both the 2d and 3d configurations. The test works as follows:
- `nnUNetTrainer` is instantiated with `deterministic=True`, and a short training session is run. The final checkpoint is saved.

With these changes, the trainer now passes this test, confirming that the training process is fully reproducible when the `deterministic` flag is enabled. This should be a significant help for researchers who need their results to be perfectly reproducible. The changes are self-contained and don't affect the default behavior of the trainer.

Notes:
I hope these changes are helpful. Thanks for maintaining this great project, and I look forward to your feedback!
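For reference, the core idea of such a determinism test, training twice with the same seed and comparing the final checkpoints, can be sketched on a toy model. Everything below is illustrative only, not the PR's actual test code:

```python
import torch
import torch.nn as nn


def train_once(seed: int, steps: int = 5):
    """Run a tiny, fully seeded training loop and return the final weights."""
    torch.manual_seed(seed)
    model = nn.Linear(4, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    data, target = torch.randn(8, 4), torch.randn(8, 2)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(data), target).backward()
        opt.step()
    return model.state_dict()


# Two runs with the same seed must produce bit-identical checkpoints.
run_a, run_b = train_once(0), train_once(0)
assert all(torch.equal(run_a[k], run_b[k]) for k in run_a)
```

The real test does the same comparison on saved nnU-Net checkpoints after two short training runs, which catches any remaining unseeded randomness in the full pipeline.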
Post Scriptum
Here is a partial output of the determinism test pipeline: