Conversation

@joelnn (Contributor) commented Dec 28, 2025

On Ampere and later GPUs (SM 8.0+), cuDNN's default math mode permits TF32 Tensor Core operations which use reduced mantissa precision. This causes numerical differences when comparing CUDA vs CPU convolution results, particularly in cudnnConvolutionBackwardFilter().

Explicitly set CUDNN_FMA_MATH to force true FP32 computation for consistent numerical results across all GPU architectures.

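For illustration, here is a minimal standalone sketch (not the exact dlib patch) of how a cuDNN convolution descriptor can be forced onto the FP32 FMA path; the `CHECK_CUDNN` helper, the `main` wrapper, and the descriptor name are placeholders introduced for this example, and the `CUDNN_MAJOR` guard assumes `CUDNN_FMA_MATH` is only available from cuDNN 8 onward:

```cpp
// Minimal standalone sketch, not the exact dlib patch: force FP32 FMA math
// on a cuDNN convolution descriptor so TF32 Tensor Core paths are not used.
#include <cudnn.h>
#include <cstdio>
#include <cstdlib>

// Simple error-checking helper (illustrative, not from dlib).
#define CHECK_CUDNN(call)                                         \
    do {                                                          \
        cudnnStatus_t s_ = (call);                                \
        if (s_ != CUDNN_STATUS_SUCCESS) {                         \
            std::fprintf(stderr, "cuDNN error: %s\n",             \
                         cudnnGetErrorString(s_));                \
            std::exit(EXIT_FAILURE);                              \
        }                                                         \
    } while (0)

int main()
{
    cudnnConvolutionDescriptor_t conv_desc;
    CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&conv_desc));

    // ... pad/stride/dilation would be set here via
    // cudnnSetConvolution2dDescriptor() as usual ...

#if CUDNN_MAJOR >= 8
    // CUDNN_FMA_MATH (cuDNN 8+) restricts the convolution to true FP32 FMA
    // kernels, so Ampere's reduced-mantissa TF32 Tensor Core kernels are not
    // selected and results match the CPU path more closely.
    CHECK_CUDNN(cudnnSetConvolutionMathType(conv_desc, CUDNN_FMA_MATH));
#endif

    CHECK_CUDNN(cudnnDestroyConvolutionDescriptor(conv_desc));
    return 0;
}
```

In dlib's case the equivalent `cudnnSetConvolutionMathType()` call would go wherever the cuDNN backend configures its convolution descriptors.
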
@davisking (Owner)

Sweet, thanks for another PR :D

@davisking merged commit 07c1e73 into davisking:master on Dec 28, 2025
10 of 11 checks passed