Skip to content

Conversation

@LegNeato
Copy link
Contributor

First, we weren't handling all the types.
After fixing that, it exposed a libnvvm crash.
Also saw a type issue in one of the warp APIs used by vecadd so fixed that.

Fixes #320

First, we weren't handling all the types.
After fixing that, it exposed a `libnvvm` crash.
Also saw a type issue in one of the warp APIs used by
vecadd so fixed that.

Fixes Rust-GPU#320
@LegNeato LegNeato requested a review from nnethercote November 29, 2025 03:27
Copy link
Collaborator

@nnethercote nnethercote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem, one question. It would be nice to have some kind of test added to avoid regressing, too, though I'm not sure what that would look like.

extern "C" {
#[link_name = "llvm.nvvm.match.any.sync.i64"]
fn __nvvm_warp_match_any_64(mask: u32, value: u64) -> u32;
fn __nvvm_warp_match_any_64(mask: u32, value: u64) -> u64;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong. https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html has this:

declare i32 @llvm.nvvm.match.any.sync.i32(i32 %membermask, i32 %value)
declare i32 @llvm.nvvm.match.any.sync.i64(i32 %membermask, i64 %value)
declare {i32, i1} @llvm.nvvm.match.all.sync.i32(i32 %membermask, i32 %value)
declare {i32, i1} @llvm.nvvm.match.all.sync.i64(i32 %membermask, i64 %value)

Not sure about the signed/unsigned mismatches, but the return value is definitely 32-bits.

Aside: The match_all_{32,64} functions below don't have link_name attributes the way the match_any_{32,64} functions do. Not sure if this is valid. I suspect these functions aren't tested at all!

Anyway, I think this change should be reverted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, hm. Ok, will revert.

TypeKind::Vector | TypeKind::ScalableVector => {
// Recurse on element type for vector floats
self.float_width(self.element_type(ty))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all of Half/BFloat/Vector/ScalableVector needed to fix the issue? I see that rustc_codegen_llvm only has Half. Seems wise to only add code that's necessary for the fix (and thus has some level of testing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need at least BFloat as well but I'll double check.

@LegNeato
Copy link
Contributor Author

LegNeato commented Dec 1, 2025

What do you think about changing the default here to be keyed off of #[cfg(debug_assertions)]? then we can just test the examples in debug and release modes...though I guess it won't test the manual override / host and device split case so we wouldn't be any better off test matrix-wise.

@nnethercote
Copy link
Collaborator

What do you think about changing the default here to be keyed off of #[cfg(debug_assertions)]?

sounds ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Examples do not compile when release is disabled for gpu crate(s)

2 participants