[HIPIFY][feature] Add header injection fallback for local headers hipification #2275

ranapratap55 · 2026-01-08T17:25:17Z

Motivation

https://github.com/NVIDIA/cuda-samples/blob/master/Samples/5_Domain_Specific/dxtc/dxtc.cu

The dxtc.cu cuda-samples includes multiple local headers (CudaMath.h, dds.h, permutations.h) that depend on helper_math.h for float3 operator definitions. Without header injection, these headers fail with standalone hipify-clang because they lack the required operator overloads.

Purpose

Enhances the existing --local-headers and --local-headers-recursive options with header injection fallback mechanism. When direct hipification of a local header fails due to missing type definitions or operator overloads, the header injection method automatically injects preceding system includes from the main source file.

Problem Statement

The initial local header hipification works well for self-contained headers that include all their dependencies. However, many cuda-samples have headers that depend on types and operators defined in preceding includes from the main source file. For example, CudaMath.h in cuda-samples/Samples/5_Domain_Specific/dxtc/ uses float3 operators (+=, *, -) that are defined in helper_math.h, which is included in the dxtc.cu. Few more below
bodysystemcpu_impl.h
MonteCarlo_reduction.cuh
fastWalshTransform_kernel.cuh
bicubicTexture_kernel.cuh

Solution

Implements a hybrid approach:
Existing: First attempts direct hipification, Works for self-contained headers.
Enhanced version: Header Injection, when direct method fails, it collects preceding system header files (#include <...>) from the main source and includes them.

Usage

No change to CLI flags
Non-recursive: hipify-clang --local-headers main.cu -I include/
Recursive: hipify-clang --local-headers-recursive main.cu -I include/

emankov · 2026-01-13T21:53:51Z

Cases like CudaMath.h are a different story, as those header files are not self-contained and are not intended to be compiled as separate compilation units. The compilation unit here designed to be compiled is dxtc.cu, which is being correctly compiled by clang.

FMPOV, the problem with missing header files is broader. The root cause is that nvcc automatically pre-includes some CUDA header files at the beginning of ANY compilation unit (not only header files, but also .cu files). They are at least (but probably are not all):

cuda_runtime.h (The top-level injection)
crt/host_config.h (System check)
builtin_types.h (Basic types)
device_types.h (enums and structs)
host_defines.h (Macro definitions)

And not all of them necessarily need to be included during our hipification process.

Such nvcc behaviour is a design choice by NVIDIA to make writing CUDA device and host code less verbose. It effectively treats CUDA as part of the language environment rather than just an external library, whereas clang reasonably decides that CUDA is a C++ language extension.

nvcc knows which header files are needed for implicit inclusion, and clang doesn't.

So the question is: how should we determine whether to include this or that CUDA header file from the most likely incomplete list above?

ranapratap55 · 2026-01-14T06:42:34Z

Cases like CudaMath.h are a different story, as those header files are not self-contained and are not intended to be compiled as separate compilation units. The compilation unit here designed to be compiled is dxtc.cu, which is being correctly compiled by clang.

FMPOV, the problem with missing header files is broader. The root cause is that nvcc automatically pre-includes some CUDA header files at the beginning of ANY compilation unit (not only header files, but also .cu files). They are at least (but probably are not all):

yes, you are correct. nvcc pre-includes the CUDA headers and headers like CudaMath.h is not designed to be a standalone compilation. But when users are trying to port using --local-header their project, many real world headers fail because they lack/depend on the main source file context.

I believe we should support users even when their headers have implicit header dependencies (not only properly designed headers).

how should we determine whether to include this or that CUDA header file from the most likely incomplete list above?

This approach avoids this problem because

It includes ONLY that are already in main source file
Injects only PRECEDING system include files up to the local header's inclusion point
Relies on what original developer determined necessary for their specific project

whatever includes the main .cu/cpp file had before including this header - those are what the header needs
for e.g.,

#include<header_1.h>
#include "sample.h"

here we include only header_1.h file for "sample.h". We don't need to know why it is needed just it comes before in main source file.

This feature is a FALLBACK mechanism, not the primary/direct path. Here

Direct hipification is tried first (for self-contained headers)
Header injection (this approach) only kicks in when direct fails.

I believe this approach works for any project (cuda-samples, etc.,) as we are only injecting original determined headers for that specific project.

emankov · 2026-01-14T18:49:33Z

This approach avoids this problem because

Nope. The proposed approach avoids the problems in CUDA samples, where additional header files such as helper_math.h are provided. And it's they who are giving the opportunity to compile the main CU files with other compilers besides nvcc.

The wider problem is that the necessary header file might not be provided at all. And this is the common practice in CUDA, as I tried to explain above.

I believe this approach works for any project (cuda-samples, etc.,) as we are only injecting original determined headers for that specific project.

Can't agree here; the approach won't work for projects that don't provide header files at all.

ranapratap55 · 2026-01-16T05:34:39Z

This approach avoids this problem because

Nope. The proposed approach avoids the problems in CUDA samples, where additional header files such as helper_math.h are provided. And it's they who are giving the opportunity to compile the main CU files with other compilers besides nvcc.

The wider problem is that the necessary header file might not be provided at all. And this is the common practice in CUDA, as I tried to explain above.

I believe this approach works for any project (cuda-samples, etc.,) as we are only injecting original determined headers for that specific project.

Can't agree here; the approach won't work for projects that don't provide header files at all.

You are right. I appreciate you clarifying the distinction.
My approach solves projects like cuda-samples where explicit system includes are provided in the main source file and well-structured projects that follow C++ practices.

This approach does not solve the projects that rely entirely on nvcc's implicit pre-inclusion with no explicit system includes.

I agree this is a limitation. I would like to take this feature as incremental improvement, with understanding that

it helps the projects with explicit includes (cuda-samples, etc.)
It doesn't help projects relying purely on nvcc's implicit behavior
Future work could address the "no includes" case separately

I'm open to take a different approach that you'd suggest.

emankov · 2026-01-21T19:38:24Z

On Windows:

`error C2589: '(': illegal token on right side of '::'`:

while compiling LocalHeader.cpp

A possible reason is a Conflict: min/max Macros vs. std::min/std::max

emankov · 2026-01-21T19:48:12Z

HeaderInjection.cpp:

Hybrid Approach fragility.
The code uses a hybrid approach: it creates temporary files on disk to inject headers, then runs the tool on them. This is inefficient (disk IO) and brittle compared to using Clang's VirtualFileSystem to overlay changes in memory.
The use of std::regex to find #include directives and include guards is highly error-prone. It ignores the complexity of the C++ preprocessor (comments, inactive #ifdef blocks, macros). This should be done using Clang's PPCallbacks.

emankov · 2026-01-21T19:53:25Z

StderrCapture.h:

Thread Safety: Capturing global stderr by manipulating file descriptors _dup2 is not thread-safe. If the tool is ever parallelized, this will break.

src/HeaderInjection.cpp

emankov · 2026-01-21T19:30:42Z

src/StderrCapture.h

@@ -0,0 +1,163 @@
+/*


Please explain the reason(s) for having this Std Err Capture.

The StderrCapture class provides a clean, professional user experience in the hybrid hipification approach. The local header hipification uses below strategy:

Phase 1: Direct hipification (works for self-contained headers) Phase 2: Injection fallback (if direct fails due to missing dependencies)

Without StderrCapture, when Phase 1 fails which is expected for non-self-contained headers, users might see confusing Clang compilation errors. These errors appear even though the injection fallback succeeds moments later. This creates confusion for users to think something is broken when the tool is working correctly.

When both approaches fail, we are showing users meaningful diagnostics/errors. This gives a well structured error(s) showing what went wrong in each phase, rather than an interleaved mess of error messages.

emankov · 2026-01-23T14:38:35Z

I'm open to take a different approach that you'd suggest.

I want us to think a bit broader here and come to a solution, partially based on your PR.

What if we implicitly include the "short list" of already-known "hidden" CUDA headers during the hipification of each source CUDA file, as the first step?

cuda_runtime.h (The top-level injection)
crt/host_config.h (System check)
builtin_types.h (Basic types)
device_types.h (enums and structs)
host_defines.h (Macro definitions)

Will we get some issues in that scenario, performance penalties, etc.?

ranapratap55 · 2026-01-26T15:54:14Z

On Windows:
`error C2589: '(': illegal token on right side of '::'`:
while compiling LocalHeader.cpp

A possible reason is a Conflict: min/max Macros vs. std::min/std::max

Fixed in the latest commit.

ranapratap55 · 2026-01-26T15:58:29Z

HeaderInjection.cpp:

Hybrid Approach fragility.
The code uses a hybrid approach: it creates temporary files on disk to inject headers, then runs the tool on them. This is inefficient (disk IO) and brittle compared to using Clang's VirtualFileSystem to overlay changes in memory.

The use of std::regex to find #include directives and include guards is highly error-prone. It ignores the complexity of the C++ preprocessor (comments, inactive #ifdef blocks, macros). This should be done using Clang's PPCallbacks.

Thank you for detailed review. The current temp-file approach chosen for simplicity in initial implementation. The injection fallback is only triggered when direct hipification fails. That said, VFS and Clang's PPCallbacks approach would be cleaner and robust.

Would you prefer I refractor the code in this PR or accept this in a follow-up PR improvement?

ranapratap55 · 2026-01-26T16:02:24Z

StderrCapture.h:

Thread Safety: Capturing global stderr by manipulating file descriptors _dup2 is not thread-safe. If the tool is ever parallelized, this will break.

Valid point. The current implementation is not thread safe. If parallelization is planned, I have added a static mutex to serialize the stderr captures. This mutex approach is thread safe by ensuring only one thread can capture stderr at a time.

ranapratap55 · 2026-01-26T16:34:42Z

I'm open to take a different approach that you'd suggest.

I want us to think a bit broader here and come to a solution, partially based on your PR.

What if we implicitly include the "short list" of already-known "hidden" CUDA headers during the hipification of each source CUDA file, as the first step?

cuda_runtime.h (The top-level injection)

crt/host_config.h (System check)

builtin_types.h (Basic types)

device_types.h (enums and structs)

host_defines.h (Macro definitions)

Will we get some issues in that scenario, performance penalties, etc.?

This inclusion of already-known CUDA headers is interesting approach. It would address the nvcc's implicit pre-inclusion problem. I see some potential concerns

Each hipification would parse these additional headers which can lead to performance overhead
Some projects might have their own definitions that conflict with these headers
Version compatability - some headers may vary across CUDA versions

One approach I think of is we can make it configurable with CLI flag option for users to enable/disable the short list injection.

Would you prefer having CLI flag approach for short list of already knows headers?

emankov · 2026-01-26T18:29:25Z

Each hipification would parse these additional headers which can lead to performance overhead

Nope, as those CUDA header files (or all the included header files, like, for instance, defined in the builtin_types.h) are correctly guarded with the corresponding

#if !defined(...) #define ... #endif

Some projects might have their own definitions that conflict with these headers

Nope, for the same reason.

Version compatability - some headers may vary across CUDA versions

This we can overcome, as hipify-clang now detects the CUDA version at runtime.

Would you prefer having CLI flag approach for short list of already knows headers?

I'd prefer to do it implicitly (w/o flag), as there is no actual point for the end user to see compilation errors due to the headers implicitly mentioned by CUDA's nvcc.

emankov · 2026-01-26T18:33:05Z

... The current temp-file approach chosen for simplicity in initial implementation. The injection fallback is only triggered when direct hipification fails. That said, VFS and Clang's PPCallbacks approach would be cleaner and robust.

Would you prefer I refractor the code in this PR or accept this in a follow-up PR improvement?

We need to use VFS by default everywhere in the hipify-clang. That is on the to-do list - will file an internal ticket for this. Thank you for pointing it out! I think it should be done as a separate enhancement PR.

emankov · 2026-01-26T18:36:43Z

Valid point. The current implementation is not thread safe. If parallelization is planned, I have added a static mutex to serialize the stderr captures. This mutex approach is thread safe by ensuring only one thread can capture stderr at a time.

If we reach a common decision, perhaps all of this StderrCapture stuff won't be necessary.

ranapratap55 · 2026-01-28T08:52:05Z

Each hipification would parse these additional headers which can lead to performance overhead

Nope, as those CUDA header files (or all the included header files, like, for instance, defined in the builtin_types.h) are correctly guarded with the corresponding
#if !defined(...) #define ... #endif
Some projects might have their own definitions that conflict with these headers

Nope, for the same reason.

Version compatability - some headers may vary across CUDA versions

This we can overcome, as hipify-clang now detects the CUDA version at runtime.

Would you prefer having CLI flag approach for short list of already knows headers?

I'd prefer to do it implicitly (w/o flag), as there is no actual point for the end user to see compilation errors due to the headers implicitly mentioned by CUDA's nvcc.

Agreed. If we implement implicit CUDA header inclusion at the Clang preprocessor level which reduces the need for fallback phases. This potentially removes the need for StderrCapture class.

[HIPIFY] Add header injection fallback for local headers hipification

4c9c030

ranapratap55 self-assigned this Jan 8, 2026

ranapratap55 requested review from emankov and searlmc1 as code owners January 8, 2026 17:25

ranapratap55 added fix It fixes bug feature Feature request or implementation labels Jan 8, 2026

ranapratap55 changed the title ~~[HIPIFY] Add header injection fallback for local headers hipification~~ [HIPIFY][feature] Add header injection fallback for local headers hipification Jan 8, 2026

emankov requested review from SyamaAmd and kzhuravl January 8, 2026 18:24

emankov requested changes Jan 21, 2026

View reviewed changes

Fixed Windows min/max and namespace prefix

0d8ae93

Add mutex to StderrCapture for thread-safe capture

db1541b

ranapratap55 mentioned this pull request Jan 30, 2026

[HIPIFY][#2275][feature] Implement nvcc behavior with implicit CUDA headers #2340

Open

emankov marked this pull request as draft February 7, 2026 20:43

emankov mentioned this pull request Feb 10, 2026

[HIPIFY][#2340][feature] Add implicit CUDA header inclusion to mimic nvcc behavior #2349

Open

[HIPIFY][feature] Add header injection fallback for local headers hipification #2275

Are you sure you want to change the base?

[HIPIFY][feature] Add header injection fallback for local headers hipification #2275

Uh oh!

Conversation

ranapratap55 commented Jan 8, 2026

Motivation

Purpose

Problem Statement

Solution

Usage

Uh oh!

emankov commented Jan 13, 2026

Uh oh!

ranapratap55 commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

emankov commented Jan 14, 2026

Uh oh!

ranapratap55 commented Jan 16, 2026

Uh oh!

emankov commented Jan 21, 2026

Uh oh!

emankov commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

emankov commented Jan 21, 2026

Uh oh!

Uh oh!

emankov Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

ranapratap55 Jan 26, 2026

Choose a reason for hiding this comment

Uh oh!

emankov commented Jan 23, 2026

Uh oh!

ranapratap55 commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ranapratap55 commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ranapratap55 commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ranapratap55 commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

emankov commented Jan 26, 2026

Uh oh!

emankov commented Jan 26, 2026

Uh oh!

emankov commented Jan 26, 2026

Uh oh!

ranapratap55 commented Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ranapratap55 commented Jan 14, 2026 •

edited

Loading

emankov commented Jan 21, 2026 •

edited

Loading

ranapratap55 commented Jan 26, 2026 •

edited

Loading

ranapratap55 commented Jan 26, 2026 •

edited

Loading

ranapratap55 commented Jan 26, 2026 •

edited

Loading

ranapratap55 commented Jan 26, 2026 •

edited

Loading

ranapratap55 commented Jan 28, 2026 •

edited

Loading