Description
Hello there,
I have been using nnU-Net for a while and I think it is great software, so thank you very much for your work.
I have been having trouble running trainings on clusters because background workers keep dying with nnU-Net v2.5.2 and batchgenerators 0.25.1. I found a fix, but I would like to discuss it and see whether it should be adopted.
I was running on an AMD EPYC 7413 (Zen 3) CPU and an NVIDIA A100 SXM4 GPU on a Linux CentOS 7 machine. I would constantly get the "One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message" error from the NonDetMultiThreadedAugmenter class. This was caused by the two successive "with threadpool_limits(limits=1, user_api=None):" statements. I removed the "with threadpool_limits" statement from the _start() method and that solved my problem. So my question is: is it necessary to have a "with threadpool_limits" statement in the _start() method, since we already have one in the producer() function?
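To make the question concrete, here is a minimal sketch of the pattern I mean. It is not the batchgenerators code: "producer" and "run_workers" below are simplified, hypothetical stand-ins, and the "limit_in_parent" flag is something I made up to toggle the outer "with threadpool_limits" on and off. The import fallback is only there so the sketch runs on a machine without threadpoolctl installed.

```python
import contextlib
import multiprocessing as mp

try:
    from threadpoolctl import threadpool_limits
except ImportError:
    # Fallback (assumption for this sketch only): a no-op replacement
    # so the example still runs when threadpoolctl is not installed.
    def threadpool_limits(limits=None, user_api=None):
        return contextlib.nullcontext()


def producer(queue, worker_id, n_items=3):
    """Hypothetical stand-in for a background worker loop.

    The thread limit is applied here, inside the worker itself.
    """
    with threadpool_limits(limits=1, user_api=None):
        for i in range(n_items):
            queue.put((worker_id, i))


def run_workers(num_workers=2, n_items=3, limit_in_parent=False):
    """Hypothetical stand-in for the code that starts the workers.

    limit_in_parent=True wraps process creation in a second
    threadpool_limits block (the nested variant); False relies
    on the limit inside producer() only (my proposed change).
    """
    queue = mp.Queue()
    if limit_in_parent:
        outer = threadpool_limits(limits=1, user_api=None)
    else:
        outer = contextlib.nullcontext()

    with outer:
        procs = [
            mp.Process(target=producer, args=(queue, w, n_items), daemon=True)
            for w in range(num_workers)
        ]
        for p in procs:
            p.start()

    # Drain the queue before joining so the workers can exit cleanly.
    items = [queue.get() for _ in range(num_workers * n_items)]
    for p in procs:
        p.join()
    return sorted(items)
```

Calling run_workers(limit_in_parent=True) corresponds to the nested variant, and run_workers(limit_in_parent=False) to my change; either call should sit under an "if __name__ == '__main__':" guard because it starts processes. I have not checked whether this toy version reproduces the crash on other machines, it is only meant to show which of the two statements I removed.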
