@MikeUU332
See also MIC-DKFZ/nnUNet#2910, which addresses the same issue with the same fix.
Hello,

We encountered the "One or more background workers are no longer alive. Exiting" error as described in:

#134
#133

when running nnUNet on an HPC cluster. A colleague and I looked into this issue and came across this link:
joblib/threadpoolctl#176

It says that using with threadpool_limits is not thread-safe and recommends a global mutex instead. Both nnUNet and batchgenerators use with threadpool_limits. After replacing it with a global mutex, as done in this pull request, we no longer encounter the error. We verified that the nnUNet output is similar; it cannot be identical because the multi-threaded launcher is non-deterministic.

We encountered this error with both nnUNet and the batchgenerators library, so we opened a similar pull request against nnUNet.

Added global mutex instead of threadpool_limits