Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions builds/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,12 @@ RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates

ENV ROCM_PATH "/opt/rocm"

# Pass repo_env flags so that rocm_configure repository rule can see
# them when generating the crosstool wrapper. Without these, the
# wrapper defaults to gcc and fails on Clang-only flags.
ENV TF_ROCM_CLANG="1"
ENV CLANG_COMPILER_PATH="/usr/lib/llvm-18/bin/clang"

FROM base-${VARIANT}

# Set the missing UTF-8 locale, otherwise Elixir warns
Expand Down
1 change: 1 addition & 0 deletions builds/build.sh
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ case "$target" in
--build-arg BASE_IMAGE=$base_image \
--build-arg ROCM_VERSION=$rocm_ver \
--build-arg XLA_TARGET=rocm \
--network host \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

.
;;

Expand Down
11 changes: 11 additions & 0 deletions extension/patches/apply.sh
Original file line number Diff line number Diff line change
Expand Up @@ -21,3 +21,14 @@ arch="$(uname -m)"
# See https://github.com/tensorflow/tensorflow/pull/86413 and the
# referenced threads.
git apply $dir/cuda_ncrtc_builtins.patch

# When building XLA with ROCm, the compiler resolves symlinks in the
# local_config_rocm repository and reports include paths at the real
# absolute location (e.g. /opt/rocm/llvm/lib/clang/22/include)
# rather than the symlinked bazel-cache path. Bazel's header
# validation rejects these as "absolute path inclusions" unless they
# are listed in cxx_builtin_include_directories. This patch adds
# the absolute resource directory paths alongside the relative ones.
if [[ -n "${XLA_TARGET:-}" && "${XLA_TARGET}" == "rocm" ]]; then
git apply $dir/rocm_absolute_includes.patch
fi
Comment on lines 21 to +34
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to apply cuda patches conditionally too?

Copy link
Copy Markdown
Author

@jvantuyl jvantuyl May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a good idea. It was a bit out of scope for me, though.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're touching this file, how about we apply rocm patches if XLA_TARGET starts with rocm and cuda patches if XLA_TARGET starts with cuda?

This allows for the existing cuda targets and opens the way for version-specific rocm targets

29 changes: 29 additions & 0 deletions extension/patches/rocm_absolute_includes.patch
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
diff --git a/third_party/gpus/rocm_configure.bzl b/third_party/gpus/rocm_configure.bzl
index 1510140..6ab54b4 100644
--- a/third_party/gpus/rocm_configure.bzl
+++ b/third_party/gpus/rocm_configure.bzl
@@ -163,6 +163,24 @@ def _rocm_include_path(repository_ctx, rocm_config, bash_bin):
inc_dirs.append(resource_dir + "/include")
inc_dirs.append(resource_dir + "/share")

+ # Also add the absolute paths, because the compiler may resolve
+ # symlinks and report includes at the real location rather than
+ # the symlinked bazel-cache path. Bazel's header validation rejects
+ # absolute path inclusions unless they are in cxx_builtin_include_directories.
+ #
+ # resource_dir_abs is the canonical path (e.g. /opt/rocm/lib/llvm/lib/clang/22).
+ # The ROCm installation typically has /opt/rocm/llvm -> ./lib/llvm, so the
+ # compiler may report includes via the symlink path instead. We must include
+ # both the real path and the symlink-based path.
+ inc_dirs.append(resource_dir_abs + "/include")
+ inc_dirs.append(resource_dir_abs + "/share")
+ rocm_path_env = get_host_environ(repository_ctx, "ROCM_PATH", "/opt/rocm")
+ if rocm_path_env:
+ clang_version = resource_dir_abs.split("/")[-1]
+ rocm_prefix = rocm_path_env.rstrip("/")
+ inc_dirs.append(rocm_prefix + "/llvm/lib/clang/" + clang_version + "/include")
+ inc_dirs.append(rocm_prefix + "/llvm/lib/clang/" + clang_version + "/share")
+
return inc_dirs

def _enable_rocm(repository_ctx):
8 changes: 6 additions & 2 deletions lib/xla.ex
Original file line number Diff line number Diff line change
Expand Up @@ -356,8 +356,12 @@ defmodule XLA do
"rocm" <> _ ->
[
"--config=rocm",
~s/--action_env=TF_ROCM_CLANG="1"/,
~s/--action_env=TF_HIPCC_CLANG="1"/,
# These must be repo_env (not action_env) so that the
# rocm_configure repository rule sees them when generating
# the crosstool wrapper. Otherwise the wrapper defaults to
# gcc and chokes on Clang-only flags like -Qunused-arguments.
~s/--repo_env=TF_ROCM_CLANG="1"/,
~s|--repo_env=CLANG_COMPILER_PATH="/usr/lib/llvm-18/bin/clang"|,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we want to hardcode the clang path like this

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like it either. I went with this because the clang path is already hard-coded (though less obviously) further down.

Specifically:

# We need to install "add-apt-repository" first
RUN apt-get update && \
  # Install basic system dependencies
  apt-get update && apt-get install -y ca-certificates curl git unzip wget xxd && \
  # Install Clang
  clang_version="18" && \
  apt-get install -y wget gnupg software-properties-common lsb-release && \
  wget -qO- https://apt.llvm.org/llvm.sh | bash -s -- $clang_version && \
  update-alternatives --install /usr/bin/clang clang /usr/bin/clang-$clang_version 100 && \
  update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-$clang_version 100 &&
 \
  apt-get clean -y && rm -rf /var/lib/apt/lists/*

It uses a variable, but the value is just hardcoded.

Any suggestions as to how to modify this to make this palatable?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this then! Maybe adding this reference as a comment to the file is a good idea.
My concern was exactly the path getting out of sync, but since we're hardcoding it in the installation too it's fine.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that the path is specific to the Docker build, but technically someone could run the build outside, so ideally the code here would not be Docker specific.

Would this work?

~s|--repo_env=CLANG_COMPILER_PATH="#{System.find_executable("clang")}"|

and if not, then I would do:

~s|--repo_env=CLANG_COMPILER_PATH="#{System.fetch_env!("CLANG_COMPILER_PATH")"|

so that we require the caller to set the env (which the Docker already does).

wdyt?

# See https://github.com/jax-ml/jax/blob/098e953afb2b83daf85e6456c89e896f9cfd483d/.bazelrc#L239
# GPU targets: MI200 (gfx90a), MI300 (gfx942), RDNA2 (gfx1030), RDNA3 (gfx1100), RDNA4 (gfx120x)
# Note: gfx900/906/908 (Vega, MI50/60, MI100) removed - deprecated in ROCm 7.x
Expand Down