-
Notifications
You must be signed in to change notification settings - Fork 39
Fix ROCm build: add absolute include paths and use repo_env flags #129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,3 +21,14 @@ arch="$(uname -m)" | |
| # See https://github.com/tensorflow/tensorflow/pull/86413 and the | ||
| # referenced threads. | ||
| git apply $dir/cuda_ncrtc_builtins.patch | ||
|
|
||
| # When building XLA with ROCm, the compiler resolves symlinks in the | ||
| # local_config_rocm repository and reports include paths at the real | ||
| # absolute location (e.g. /opt/rocm/llvm/lib/clang/22/include) | ||
| # rather than the symlinked bazel-cache path. Bazel's header | ||
| # validation rejects these as "absolute path inclusions" unless they | ||
| # are listed in cxx_builtin_include_directories. This patch adds | ||
| # the absolute resource directory paths alongside the relative ones. | ||
| if [[ -n "${XLA_TARGET:-}" && "${XLA_TARGET}" == "rocm" ]]; then | ||
| git apply $dir/rocm_absolute_includes.patch | ||
| fi | ||
|
Comment on lines
21
to
+34
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we want to apply cuda patches conditionally too?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that is a good idea. It was a bit out of scope for me, though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since we're touching this file, how about we apply rocm patches if XLA_TARGET starts with rocm and cuda patches if XLA_TARGET starts with cuda? This allows for the existing cuda targets and opens the way for version-specific rocm targets |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| diff --git a/third_party/gpus/rocm_configure.bzl b/third_party/gpus/rocm_configure.bzl | ||
| index 1510140..6ab54b4 100644 | ||
| --- a/third_party/gpus/rocm_configure.bzl | ||
| +++ b/third_party/gpus/rocm_configure.bzl | ||
| @@ -163,6 +163,24 @@ def _rocm_include_path(repository_ctx, rocm_config, bash_bin): | ||
| inc_dirs.append(resource_dir + "/include") | ||
| inc_dirs.append(resource_dir + "/share") | ||
|
|
||
| + # Also add the absolute paths, because the compiler may resolve | ||
| + # symlinks and report includes at the real location rather than | ||
| + # the symlinked bazel-cache path. Bazel's header validation rejects | ||
| + # absolute path inclusions unless they are in cxx_builtin_include_directories. | ||
| + # | ||
| + # resource_dir_abs is the canonical path (e.g. /opt/rocm/lib/llvm/lib/clang/22). | ||
| + # The ROCm installation typically has /opt/rocm/llvm -> ./lib/llvm, so the | ||
| + # compiler may report includes via the symlink path instead. We must include | ||
| + # both the real path and the symlink-based path. | ||
| + inc_dirs.append(resource_dir_abs + "/include") | ||
| + inc_dirs.append(resource_dir_abs + "/share") | ||
| + rocm_path_env = get_host_environ(repository_ctx, "ROCM_PATH", "/opt/rocm") | ||
| + if rocm_path_env: | ||
| + clang_version = resource_dir_abs.split("/")[-1] | ||
| + rocm_prefix = rocm_path_env.rstrip("/") | ||
| + inc_dirs.append(rocm_prefix + "/llvm/lib/clang/" + clang_version + "/include") | ||
| + inc_dirs.append(rocm_prefix + "/llvm/lib/clang/" + clang_version + "/share") | ||
| + | ||
| return inc_dirs | ||
|
|
||
| def _enable_rocm(repository_ctx): |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -356,8 +356,12 @@ defmodule XLA do | |
| "rocm" <> _ -> | ||
| [ | ||
| "--config=rocm", | ||
| ~s/--action_env=TF_ROCM_CLANG="1"/, | ||
| ~s/--action_env=TF_HIPCC_CLANG="1"/, | ||
| # These must be repo_env (not action_env) so that the | ||
| # rocm_configure repository rule sees them when generating | ||
| # the crosstool wrapper. Otherwise the wrapper defaults to | ||
| # gcc and chokes on Clang-only flags like -Qunused-arguments. | ||
| ~s/--repo_env=TF_ROCM_CLANG="1"/, | ||
| ~s|--repo_env=CLANG_COMPILER_PATH="/usr/lib/llvm-18/bin/clang"|, | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if we want to hardcode the clang path like this
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't like it either. I went with this because the clang path is already hard-coded (though less obviously) further down. Specifically: # We need to install "add-apt-repository" first
RUN apt-get update && \
# Install basic system dependencies
apt-get update && apt-get install -y ca-certificates curl git unzip wget xxd && \
# Install Clang
clang_version="18" && \
apt-get install -y wget gnupg software-properties-common lsb-release && \
wget -qO- https://apt.llvm.org/llvm.sh | bash -s -- $clang_version && \
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-$clang_version 100 && \
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-$clang_version 100 &&
\
apt-get clean -y && rm -rf /var/lib/apt/lists/*It uses a variable, but the value is just hardcoded. Any suggestions as to how to modify this to make this palatable? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm ok with this then! Maybe adding this reference as a comment to the file is a good idea.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The issue is that the path is specific to the Docker build, but technically someone could run the build outside, so ideally the code here would not be Docker specific. Would this work? ~s|--repo_env=CLANG_COMPILER_PATH="#{System.find_executable("clang")}"|and if not, then I would do: ~s|--repo_env=CLANG_COMPILER_PATH="#{System.fetch_env!("CLANG_COMPILER_PATH")"|so that we require the caller to set the env (which the Docker already does). wdyt? |
||
| # See https://github.com/jax-ml/jax/blob/098e953afb2b83daf85e6456c89e896f9cfd483d/.bazelrc#L239 | ||
| # GPU targets: MI200 (gfx90a), MI300 (gfx942), RDNA2 (gfx1030), RDNA3 (gfx1100), RDNA4 (gfx120x) | ||
| # Note: gfx900/906/908 (Vega, MI50/60, MI100) removed - deprecated in ROCm 7.x | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this?