llvm 19 support #227
i'm on the fence about renaming i think it might not actually be needed and i might put it back |
|
@LegNeato my measuring stick here is does could you tell me if I'm close or if I'm actually missing something huge like a mountain of work I'm not seeing? |
pain |
|
I don't know, llvm is an area of the project I have not touched. |
LegNeato
left a comment
This should stay the same, right? The newer support should be optional / only enabled when
| "https://github.com/rust-gpu/rustc_codegen_nvvm-llvm/releases/download/LLVM-7.1.0/"; | ||
|
|
||
| static REQUIRED_MAJOR_LLVM_VERSION: u8 = 7; | ||
| static REQUIRED_MAJOR_LLVM_VERSION: u8 = 19; |
The requirement doesn't bump unless targeting a higher arch? So the logic is:
- 7 if targeting arch supported by 7 and 19
- 19 if targeting arch not supported by 7
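The rule in this review comment can be sketched as a small function. This is only an illustration: the function name, the constants, and the cutoff (compute capability 10.0, i.e. Blackwell, as the first arch the LLVM 7 based NVVM cannot target) are assumptions, not code from this PR.

```rust
/// LLVM major version used by the existing backend.
const LEGACY_LLVM_MAJOR: u8 = 7;
/// LLVM major version this PR adds.
const MODERN_LLVM_MAJOR: u8 = 19;
/// Assumed cutoff: first compute capability (major * 10 + minor) that
/// the LLVM 7 based NVVM cannot target. Hypothetical value for this sketch.
const FIRST_ARCH_NEEDING_MODERN: u32 = 100;

/// Returns the LLVM major version required for a target arch,
/// e.g. 30 for sm_30 or 120 for sm_120.
fn required_major_llvm_version(compute_capability: u32) -> u8 {
    if compute_capability >= FIRST_ARCH_NEEDING_MODERN {
        // Not supported by 7, so the requirement bumps to 19.
        MODERN_LLVM_MAJOR
    } else {
        // Supported by both 7 and 19, so the requirement stays at 7.
        LEGACY_LLVM_MAJOR
    }
}

fn main() {
    for arch in [30, 75, 90, 100, 120] {
        println!("sm_{arch}: LLVM {}", required_major_llvm_version(arch));
    }
}
```

With a check like this, `REQUIRED_MAJOR_LLVM_VERSION` would stop being a single constant and become a value derived from the selected arch.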
|
We should probably break this down, seeing as neither of us understand this space. First, we should probably add various values for arch and stuff to enums on the rust side. Some of these might need to be gated if on 7 vs 19. Then, we should get switching between 7 and a stubbed out / non working 19 via target arch. Then, we should systematically fix each issue and refactor common code on the way. |
ok... kind of a strange remark... i'm just going to work on my local branch and make kernels compile with 19.1 then i was going to work backwards and "make it upstream worthy"... |
|
i rented some powerful big aws spot VM in the cloud and built it's helping find issues more than a typical llvm release build fyi, just a little tip i'd share i have the "shaved yaks" opinionated way to get that instance if you want, cost me about $3 total |
3cac1eb to
f50708b
Compare
|
@LegNeato i'm on the fence about having two rustc_codegen_nvvm_v{{version}} crates 85% copy and pasted but.... i got this to work. vecadd is working at least, going to see if ed25519_vanity_rs compiles later proof: |
|
@LegNeato i've read multiple conflicting things from multiple "official nvidia sources/documentation" that the new cuda 12.9 toolkit is based on/adds support for either llvm 18, llvm 19, or llvm 20. i can see in their i also question if we need this. this might sound dumb but... what if we used a simple official rust target like riscv64gc-unknown-none-elf, use official rust compiler (no custom nightly, no custom codegen llvm integration) to spit out llvm ir, and then patch it to work with cuda... i totally agree with you/the project's view on "use nvvm to compile llvm ir to ptx" https://github.com/brandonros/vanity-miner-rs/pull/8/files i haven't had a chance to test it yet (working on it) but in terms of thinking outside the box, i was for sure able to get nvvm to accept llvm ir and make what seems to be a "valid ptx" |
|
i got this working for blackwell a different way. this branch/pr/LLVM v19 integration might work just fine but it's kind of a lot to maintain if there's an "easier" (albeit hackier) way to solve this https://github.com/brandonros/vanity-miner-rs/actions/runs/15809309968/job/44558442212 Build Pipeline
let me know if you actually want this/to put time into it, otherwise blackwell+ might be able to avoid cuda_builder or i'd need to make a cuda_builder that makes this super opinionated rust -> cubin pipeline i made |
|
I'm re-opening because I think we want to go this route. Totally understand if you go a different route for your project! |
|
I plan to poke at it this week. Apologies for the previous response saying we don't know the space, I don't really know the LLVM side of the house and I misspoke. |
Nope! Here to help, let's land this! Let's land that and then I'll rebase? |
|
I don't have time to jam on this with you until later in the week, but I think this is a great start! Ideally both are compiled in statically or as dylibs and runtime chooses based on arch selected. But distribution with dylibs is annoying, and compiling 2 llvm versions in the same process will be annoying. So I think the first step is what this is doing, manually switching, but we should be aware where we would like it to go. |
|
I'm willing to put some time in here if I can get some pointers. |
|
@tyler274 I'm also spending time on this and have followed the same approach as @brandonros. Writing extensive conditional compilation is truly a pain. I think we should focus our efforts on this branch. Maybe we could explore how to improve this together? |
|
@tyler274 #229 (comment) Here are some previous hints from @LegNeato. |
|
@LegNeato Additionally, based on my previous open-source contributions and discussions, I've learned that both Graphite and Turso are exploring GPU acceleration. I believe this represents a significant opportunity for us. I'm highly motivated to drive this initiative forward and position our project as the leading GPU-accelerated solution within the Rust ecosystem. |
|
@Firestar99 is working on Graphite's support via rust-gpu! |
@LegNeato That's awesome! The good news is, after a month of learning and hands-on practice, I've basically figured out the workflow of LLVM backend generation. The bad news is debugging conditional compilation remains quite painful. What do you think – would it be better to write conditional compilation, or maintain two separate branches? |
|
I think conditional is better, as I think upgrading drops support for a bunch of devices? Or is that not the case? |
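For reference, one shape the conditional route could take is to isolate version-specific values behind a single module selected with `cfg`. This is a minimal sketch: `llvm19` is a hypothetical Cargo feature name, and the default CPU names are the ones listed in the V7 vs V19 comparison further down this thread.

```rust
/// Version-specific constants, chosen at compile time.
/// `llvm19` is a hypothetical Cargo feature name used only for this sketch.
#[cfg(feature = "llvm19")]
mod backend {
    pub const LLVM_MAJOR: u8 = 19;
    pub const DEFAULT_CPU: &str = "sm_120";
}

#[cfg(not(feature = "llvm19"))]
mod backend {
    pub const LLVM_MAJOR: u8 = 7;
    pub const DEFAULT_CPU: &str = "sm_30";
}

/// Shared code sees one `backend` module and never names the version itself.
fn describe_backend() -> String {
    format!("LLVM {} (default CPU {})", backend::LLVM_MAJOR, backend::DEFAULT_CPU)
}

fn main() {
    println!("{}", describe_backend());
}
```

Keeping the `cfg` attributes on whole modules rather than on individual statements is one way to limit how much conditional code has to be debugged, at the cost of some duplication inside each module.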
|
what would it take to land this? |
|
@LegNeato Based on my research, here are the facts: Taking PassManager as an example, this component orchestrates LLVM optimizations and analyses during backend code generation. Typically, multiple passes need to collaborate (e.g., constant propagation followed by dead code elimination). The PassManager schedules passes in a predefined order or based on dependencies to ensure logical execution sequence. Prior to LLVM 9/10, implementations required inheriting from specific PassManager virtual base classes. However, starting from LLVM 14, everything has been unified into a CRTP (Curiously Recurring Template Pattern) code structure. Additionally, header files may have been relocated or modified across different LLVM versions. Therefore, if we choose conditional compilation, I would need to create CI workflows for every single version from LLVM 7 to LLVM 19 to ensure compatibility. |
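To illustrate why pass ordering matters, here is a toy pipeline in plain Rust. It is not LLVM code: the `Inst` type, both passes, and `run_pipeline` are hypothetical, and the toy assumes every name is assigned exactly once.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Const { dst: &'static str, value: i64 },
    Add { dst: &'static str, lhs: &'static str, rhs: &'static str },
    Ret { src: &'static str },
}

type Pass = fn(Vec<Inst>) -> Vec<Inst>;

/// Replaces an `Add` whose operands are both known constants with a `Const`.
fn constant_propagation(insts: Vec<Inst>) -> Vec<Inst> {
    let mut known: HashMap<&'static str, i64> = HashMap::new();
    insts
        .into_iter()
        .map(|inst| match inst {
            Inst::Const { dst, value } => {
                known.insert(dst, value);
                Inst::Const { dst, value }
            }
            Inst::Add { dst, lhs, rhs } => {
                match (known.get(lhs).copied(), known.get(rhs).copied()) {
                    (Some(a), Some(b)) => {
                        known.insert(dst, a + b);
                        Inst::Const { dst, value: a + b }
                    }
                    _ => Inst::Add { dst, lhs, rhs },
                }
            }
            other => other,
        })
        .collect()
}

/// Walks backwards and drops definitions whose result is never used.
fn dead_code_elimination(insts: Vec<Inst>) -> Vec<Inst> {
    let mut live: HashSet<&'static str> = HashSet::new();
    let mut kept = Vec::new();
    for inst in insts.into_iter().rev() {
        match inst {
            Inst::Ret { src } => {
                live.insert(src);
                kept.push(Inst::Ret { src });
            }
            Inst::Const { dst, .. } if !live.contains(dst) => {}
            Inst::Add { dst, .. } if !live.contains(dst) => {}
            Inst::Add { dst, lhs, rhs } => {
                live.insert(lhs);
                live.insert(rhs);
                kept.push(Inst::Add { dst, lhs, rhs });
            }
            live_const => kept.push(live_const),
        }
    }
    kept.reverse();
    kept
}

/// Runs the passes strictly in the order given.
fn run_pipeline(passes: &[Pass], insts: Vec<Inst>) -> Vec<Inst> {
    passes.iter().fold(insts, |ir, pass| pass(ir))
}

fn main() {
    let program = vec![
        Inst::Const { dst: "a", value: 2 },
        Inst::Const { dst: "b", value: 3 },
        Inst::Add { dst: "c", lhs: "a", rhs: "b" },
        Inst::Ret { src: "c" },
    ];
    let fold_first: [Pass; 2] = [constant_propagation, dead_code_elimination];
    let dce_first: [Pass; 2] = [dead_code_elimination, constant_propagation];
    println!("{:?}", run_pipeline(&fold_first, program.clone()));
    println!("{:?}", run_pipeline(&dce_first, program));
}
```

Running constant propagation first lets dead code elimination remove `a` and `b`; in the reverse order both are still in use when elimination runs, so they survive.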
I think this is a misunderstanding. NVIDIA cards either run LLVM 7 or LLVM 19, nothing in between. Please correct me if I am wrong. |
@brandonros I'm referring specifically to modifications in the LLVM backend generation logic within wrapper files like rustc_llvm_wrapper. This is not targeting NVVM specifically. |
|
@brandonros @LegNeato https://gist.github.com/ax3l/9489132 This source perfectly highlights that while NVIDIA officially certifies specific LLVM versions per CUDA release (as shown in the version matrix), the reality is more nuanced:
|
|
@brandonros I think we could start by refining the branches for LLVM 7 and LLVM 19 based on your existing work, then progressively extend support to other components like NVVM (CUDA) and additional LLVM versions. |
|
@LegNeato I strongly agree that prioritizing support for the newer Rust toolchain, CUDA, and LLVM versions is critical. Our project's future hinges on optimizing for cutting-edge hardware like the H100, A100, and even the GH200 (which I currently have access to). This strategic focus will enable major enterprise customers to integrate our solution into their infrastructure—the key to maximizing our long-term growth and impact. |
I will have rebased this massive thing twice now and it continues to go stale at almost 2+ months old. Are we serious about upstreaming this? Otherwise I'm hesitant to keep doing this same song and dance of "get it ready for merge, put it on the shelf". |
|
@LegNeato Based on the current modifications, what are the primary remaining challenges? Let's explore what additional efforts and adjustments we can make. |
|
@LegNeato After reviewing his code changes, I can roughly understand that it seems you don't want to create code for different LLVM versions in multiple folders. Instead, you may prefer to achieve a balance between NVVM and LLVM through conditional compilation. I sincerely request that you take the time to delve into this part. |
I understood... the opposite? There was hesitancy in the beginning and then I thought we settled on we would do it this way? |
This analysis examines the effort required to add CUDA 13.0 support to rust-cuda with dynamic LLVM version detection and switching.

Key findings:
- CUDA 13 introduces NVVM IR 2.0 (breaking change from 1.x)
- Requires dual LLVM support: 7.0.1 (legacy) + 20.1.0 (Blackwell+)
- Estimated effort: 6-8 weeks for experienced developer
- Recommended approach: runtime backend selection by architecture
- Maintains backward compatibility with CUDA 11.2+

Analysis includes:
- Current state assessment of codebase
- CUDA 13.0 breaking changes documentation
- Detailed 5-phase implementation plan
- Risk assessment and mitigation strategies
- Comparison to PR Rust-GPU#227 (LLVM 19 effort)
- Test strategy and validation metrics

Related to: Rust-GPU#299, Rust-GPU#227
rustc_codegen_nvvm: V7 vs V19 File-by-File Comparison
This document describes each file in
Summary
Root Files
| Aspect | V7 | V19 |
|---|---|---|
| LLVM Version | 7.1.0 | 19.1.7 |
| Env var for config | `LLVM_CONFIG` | `LLVM_CONFIG_19` |
| Env var for prebuilt | `USE_PREBUILT_LLVM` | `USE_PREBUILT_LLVM_19` |
| llvm-as command | `llvm-as-7` | `llvm-as-19` |
| llvm-config fallback | `llvm-config` | `llvm-config-19` |
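As a rough sketch of how a build script could use the names in this table, the helper below resolves the `llvm-config` binary for either version. Only the environment variable and command names come from the table; `LlvmFlavor`, `resolve_llvm_config`, and the lookup closure are hypothetical and not taken from the PR.

```rust
/// Which LLVM build the crate is being configured for (hypothetical type).
#[derive(Debug, Clone, Copy)]
enum LlvmFlavor {
    V7,
    V19,
}

impl LlvmFlavor {
    /// Environment variable that may point at an llvm-config binary.
    fn config_env_var(self) -> &'static str {
        match self {
            LlvmFlavor::V7 => "LLVM_CONFIG",
            LlvmFlavor::V19 => "LLVM_CONFIG_19",
        }
    }

    /// Binary name to try when the variable is unset.
    fn llvm_config_fallback(self) -> &'static str {
        match self {
            LlvmFlavor::V7 => "llvm-config",
            LlvmFlavor::V19 => "llvm-config-19",
        }
    }

    /// Versioned assembler command from the table.
    fn llvm_as_command(self) -> &'static str {
        match self {
            LlvmFlavor::V7 => "llvm-as-7",
            LlvmFlavor::V19 => "llvm-as-19",
        }
    }
}

/// Prefers the environment variable and falls back to the versioned binary name.
/// The lookup is passed in so the logic can be exercised without touching the
/// real environment.
fn resolve_llvm_config(flavor: LlvmFlavor, lookup: impl Fn(&str) -> Option<String>) -> String {
    lookup(flavor.config_env_var())
        .unwrap_or_else(|| flavor.llvm_config_fallback().to_string())
}

fn main() {
    let flavor = LlvmFlavor::V19;
    let config = resolve_llvm_config(flavor, |name| std::env::var(name).ok());
    println!("llvm-config: {config}, assembler: {}", flavor.llvm_as_command());
}
```

Passing the lookup as a closure is a design choice for the sketch only; a real build script could call `std::env::var` directly.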
Cargo.toml — DIFFERENT
Only differs in crate name (rustc_codegen_nvvm_v7 vs rustc_codegen_nvvm_v19). Dependencies are identical.
CHANGELOG.md — (not compared)
Documentation file.
libintrinsics.bc / libintrinsics.ll — (not compared)
Precompiled LLVM bitcode for intrinsics. Likely version-specific.
rustc_llvm_wrapper/ (C++ FFI Layer)
V7 Files
| File | Description |
|---|---|
| `rustllvm.h` | Header with LLVM version compatibility macros (134 lines) |
| `RustWrapper.cpp` | Main FFI wrapper (62KB, 1927 lines) |
| `PassWrapper.cpp` | Pass manager wrapper using legacy PassManager (43KB, 1509 lines) |
V19 Files
| File | Description |
|---|---|
| `LLVMWrapper.h` | Minimal header, hard-coded for LLVM 19 (48 lines) |
| `SuppressLLVMWarnings.h` | Warning suppression header (NEW) |
| `RustWrapper.cpp` | Main FFI wrapper (95KB, 2616 lines), significantly larger |
| `PassWrapper.cpp` | Pass manager wrapper using new PassBuilder (64KB, 1805 lines) |
| `README.md` | Documentation (NEW) |
Key C++ Differences
- Pass Manager: V7 uses legacy `PassManagerBuilder`, V19 uses new `PassBuilder`
- Type System: V19 adds opaque pointer support
- Size: V19 RustWrapper.cpp is ~50% larger due to new APIs
src/ Files
IDENTICAL FILES (can be shared)
| File | Lines | Description |
|---|---|---|
| `common.rs` | ~30 | AsCCharPtr trait for C string conversion |
| `ptxgen.rs` | ~50 | PTX generation utilities |
debug_info/ IDENTICAL FILES
| File | Lines | Description |
|---|---|---|
| `create_scope_map.rs` | ~100 | Debug scope mapping |
| `dwarf_const.rs` | ~50 | DWARF constant definitions |
| `namespace.rs` | ~80 | Namespace handling for debug info |
| `util.rs` | ~60 | Debug info utilities |
| `metadata/type_map.rs` | ~100 | Type mapping for metadata |
src/ DIFFERENT FILES
Core Infrastructure
lib.rs — DIFFERENT (114 diff lines)
Main crate entry point, implements CodegenBackend trait.
| Difference | V7 | V19 |
|---|---|---|
| Modules | includes `ptx_filter` | no `ptx_filter` |
| Feature flags | different set | adds `hash_raw_entry` |
| `global_backend_features` | custom impl parsing CUDA arch | empty impl |
| Trait signatures | older rustc_codegen_ssa | newer signatures |
llvm.rs — DIFFERENT (337 diff lines) ⚠️ MAJOR
FFI bindings to LLVM C API. Completely different between versions.
| Difference | V7 | V19 |
|---|---|---|
| Attribute handling | simple `LLVMRustAddFunctionAttribute` | type-aware for StructRet |
| `TypeKind` enum | 18 variants | 21 variants (+X86_FP80, FP128, PPC_FP128, X86_AMX, TargetExt) |
| `AsmDialect` | `{Other, Att, Intel}` | `{Att, Intel}` (removed `Other`) |
| `CodeGenOptLevel` | `{Other, None, Less, Default, Aggressive}` | removed `Other` |
| Pointer types | `LLVMPointerType` | `LLVMPointerTypeInContext` (opaque ptrs) |
| Build calls | `LLVMRustBuildCall(B, Fn, Args, N, Bundle)` | `LLVMRustBuildCall(B, Ty, Fn, Args, N, Bundles, NBundles)` |
| New types | none | `PassBuilderOptions`, `FloatABIType`, `LLVMRustDISPFlags` |
context.rs — DIFFERENT (~80 diff lines)
Codegen context and command-line argument parsing.
| Difference | V7 | V19 |
|---|---|---|
| `CodegenArgs` | includes `DisassembleMode`, disassemble options | removed disassembly options |
| `parse()` signature | `parse(args, sess)` | `parse(args)` |
| Methods | `add_used_global()`, `add_compiler_used_global()` | removed |
| New method | none | `codegen_unit()` |
builder.rs — DIFFERENT (221 diff lines)
LLVM IR builder implementation.
| Difference | V7 | V19 |
|---|---|---|
| Load instruction | `LLVMBuildLoad(builder, ptr, name)` | `LLVMBuildLoad2(builder, ty, ptr, name)` |
| `AtomicOrdering` import | `rustc_middle::ty` | `rustc_codegen_ssa::common` |
| `atomic_load` impl | different stub | returns `LLVMGetUndef(ty)` |
| Let-chain syntax | uses `if let ... &&` | nested `if let` blocks |
Code Generation
abi.rs — DIFFERENT (79 diff lines)
ABI and calling convention handling.
| Difference | V7 | V19 |
|---|---|---|
| Conv type | `CanonAbi::Rust` | `Conv::Rust` |
| Float types | f32, f64 | adds f16, f128 |
| Pointer creation | `LLVMPointerType(ty, ...)` | `LLVMPointerTypeInContext(cx.llcx, ...)` |
| New method | none | `arg_memory_ty()` |
intrinsic.rs — DIFFERENT (124 diff lines)
Intrinsic function codegen.
| Difference | V7 | V19 |
|---|---|---|
| `codegen_intrinsic_call` signature | `(instance, args, result: PlaceRef, span)` | `(instance, fn_abi, args, llresult: &Value, span)` |
| `volatile_load` impl | simple type cast | extracts pointee type properly |
| Format strings | f-string style `format!("{name}")` | printf style `format!("{}", name)` |
back.rs — DIFFERENT (316 diff lines) ⚠️ MAJOR
Backend code generation and target machine creation.
| Difference | V7 | V19 |
|---|---|---|
| `LLVMRustCreateTargetMachine` | ~10 params | ~20+ params |
| ABI parameter | none | `abi_cstr` |
| Float ABI | `bool use_softfp` | `FloatABIType` enum |
| String passing | raw ptr + len | `CString` |
target.rs — DIFFERENT (~20 diff lines)
Target specification.
| Difference | V7 | V19 |
|---|---|---|
| `DATA_LAYOUT` | no i128 | adds `i128:128:128` |
| Default CPU | `sm_30` | `sm_120` |
ty.rs — DIFFERENT (~60 diff lines)
Type handling.
| Difference | V7 | V19 |
|---|---|---|
| Pointer creation | `LLVMPointerType` | `LLVMPointerTypeInContext` |
consts.rs — DIFFERENT (~40 diff lines)
Constant value handling. Minor API adjustments.
const_ty.rs — DIFFERENT (~15 diff lines)
Constant type utilities. Minor differences.
attributes.rs — DIFFERENT (~30 diff lines)
LLVM attribute handling. Minor API adjustments.
Linking & LTO
link.rs — DIFFERENT (~100 diff lines)
Linking and output generation.
| Difference | V7 | V19 |
|---|---|---|
| PTX filtering | uses `PtxFilter` | removed |
| Metadata | `metadata` param in `link_crate` | uses `codegen_results.metadata` |
| `file_for_writing` | 4 params | 3 params |
lto.rs — DIFFERENT (~50 diff lines)
Link-time optimization. Minor API adjustments.
Other Files
nvvm.rs — DIFFERENT (~150 diff lines)
NVVM (NVIDIA's LLVM fork) interface.
| Difference | V7 | V19 |
|---|---|---|
| IR version check | `major <= 1 && minor < 6` | `ir_major != 2 \|\| ir_minor != 0` |
| Pass manager | `LLVMCreatePassManager` + `LLVMAddGlobalDCEPass` | `LLVMRunPasses` with string |
| `LLVMRustParseBitcodeForLTO` | 5 params | 4 params (removed one) |
| Import style | `use crate::llvm::*` | explicit imports |
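The two IR version checks can be compared side by side in a small sketch. It assumes both expressions are the conditions under which the backend rejects the NVVM IR version it finds; the table does not say this explicitly, and the function names are hypothetical.

```rust
/// V7 expression from the table, read here as "this IR version is rejected".
fn v7_rejects(major: i32, minor: i32) -> bool {
    major <= 1 && minor < 6
}

/// V19 expression from the table: anything other than exactly 2.0 is rejected.
fn v19_rejects(ir_major: i32, ir_minor: i32) -> bool {
    ir_major != 2 || ir_minor != 0
}

fn main() {
    for (major, minor) in [(1, 5), (1, 6), (2, 0), (2, 1)] {
        println!(
            "NVVM IR {major}.{minor}: V7 rejects = {}, V19 rejects = {}",
            v7_rejects(major, minor),
            v19_rejects(major, minor)
        );
    }
}
```

Under that reading, the V7 check sets a minimum version while the V19 check pins one exact version, so the two backends accept disjoint sets of pre-2.0 and 2.0 IR.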
init.rs — DIFFERENT (~10 diff lines)
LLVM initialization.
| Difference | V7 | V19 |
|---|---|---|
| `LLVMInitializePasses()` | called | removed (TODO comment) |
allocator.rs — DIFFERENT (~50 diff lines)
Allocator shim generation.
| Difference | V7 | V19 |
|---|---|---|
| Pointer type | `LLVMPointerType(i8, 0)` | `LLVMPointerTypeInContext(llcx, 0)` |
| `LLVMRustBuildCall` | 5 params | 7 params (added ty, bundles) |
mono_item.rs — DIFFERENT (~20 diff lines)
Monomorphization item handling.
| Difference | V7 | V19 |
|---|---|---|
| `predefine_static` | `&mut self` | `&self` |
| `predefine_fn` | `&mut self` | `&self` |
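A move from `&mut self` to `&self` usually means any state the method updates has to sit behind interior mutability. The sketch below shows that pattern with `RefCell`; `ToyCx` and its field are hypothetical stand-ins, not the real `CodegenCx`, and whether the V19 crate uses `RefCell` here is an assumption.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for a codegen context; not the real `CodegenCx`.
struct ToyCx {
    /// Symbol name mapped to the index it was first declared at.
    declared: RefCell<HashMap<String, usize>>,
}

impl ToyCx {
    fn new() -> Self {
        ToyCx { declared: RefCell::new(HashMap::new()) }
    }

    /// Takes `&self`, like the newer signatures, yet still records the symbol.
    fn predefine_fn(&self, symbol: &str) -> usize {
        let mut declared = self.declared.borrow_mut();
        let next = declared.len();
        *declared.entry(symbol.to_string()).or_insert(next)
    }
}

fn main() {
    let cx = ToyCx::new();
    println!("kernel_a -> {}", cx.predefine_fn("kernel_a"));
    println!("kernel_b -> {}", cx.predefine_fn("kernel_b"));
    println!("kernel_a -> {}", cx.predefine_fn("kernel_a"));
}
```

The borrow is checked at run time instead of compile time, so a nested call that borrows the same map again while `declared` is held would panic; keeping each borrow short avoids that.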
override_fns.rs — DIFFERENT (~40 diff lines)
Function override logic for libm.
| Difference | V7 | V19 |
|---|---|---|
| `define_or_override_fn` | `cx: &mut CodegenCx` | `cx: &CodegenCx` |
| `MonoItem::define` | 4 params with `MonoItemData` | 2 params |
| Closure check | none | skips closures |
asm.rs — DIFFERENT (~15 diff lines)
Inline assembly handling. Minor differences.
ctx_intrinsics.rs — DIFFERENT (~10 diff lines)
Context intrinsics. Minor import differences.
int_replace.rs — DIFFERENT (~5 diff lines)
Integer type replacement. Very minor differences (likely just formatting).
V7-Only Files
ptx_filter.rs — V7 ONLY (~14,000 lines!)
PTX disassembly and filtering functionality. Completely removed in V19.
Features:
- PTX parsing and filtering
- Function/global disassembly
- Entry point filtering
- Pretty printing
debug_info/ DIFFERENT FILES
mod.rs — DIFFERENT (~30 diff lines)
Debug info module root. Minor API adjustments.
metadata.rs — DIFFERENT (~50 diff lines)
Debug metadata handling. LLVM metadata API differences.
metadata/enums.rs — DIFFERENT (~20 diff lines)
Enum debug info. Minor differences.
Conclusion
Why These Files Differ
- LLVM API Changes (7→19)
  - Opaque pointers (`LLVMPointerType` → `LLVMPointerTypeInContext`)
  - Typed instructions (`LLVMBuildLoad` → `LLVMBuildLoad2`)
  - New pass manager (`PassManagerBuilder` → `PassBuilder`)
  - Extended type system (f16, f128, new TypeKind variants)
- rustc_codegen_ssa Evolution
  - Trait signatures changed (`&mut self` → `&self`)
  - Method signatures changed (extra parameters)
  - Import path changes
- Feature Changes
  - V7 has PTX disassembly (`ptx_filter.rs`)
  - V19 removed disassembly features
  - V19 targets newer CUDA architectures (sm_120 vs sm_30)
Files That Could Theoretically Be Shared
Only these 7 files are truly identical:
- `common.rs`
- `ptxgen.rs`
- `debug_info/create_scope_map.rs`
- `debug_info/dwarf_const.rs`
- `debug_info/namespace.rs`
- `debug_info/util.rs`
- `debug_info/metadata/type_map.rs`
All other files have embedded LLVM version-specific calls or rustc API differences.
Potentially addresses all of:
- Update rustc_llvm_wrapper to optionally support LLVM v19 #226 (updates rustc_llvm_wrapper)
- GitHub Codespaces/VSCode Devcontainer support #224 (adds Codespaces; optional, we could remove)
- sha2 crate = runtime error #207 (if LLVM v19 will compile and not have the same issues as LLVM v7)
- CUDA 12.8.1 and LLVM 18.1.8 #197 (we can put this behind a feature flag to optionally support Blackwell+)