[CI] Add GitHub workflow for building and releasing fat wheels #91
Merged
14 commits

- `057b34a` Split CUDA extensions by SM architecture for fat-binary wheel builds … (tongke6)
- `7616737` fix ruff lint errors (tongke6)
- `b148c52` revert version requirements changes (tongke6)
- `2c56304` Make cudac proxy thread-safe and raise on missing extensions (tongke6)
- `c955d47` Surface per-extension import errors in cudac proxy (zheyang0825)
- `6ccb8fa` Fix build-release matrix with DRY expression mapping (Copilot)
- `f700810` Apply suggestions from code review (tongke6)
- `e0a1e21` Potential fix for pull request finding (tongke6)
- `ead4f7e` Add README example for building fat wheels (Copilot)
- `75f1710` Surface partial-extension failures in cudac proxy (tongke6)
- `ceab48b` Build release wheels against manylinux_2_28 (tongke6)
- `430531b` fix python 3.12 GLIBC compat problems on ubi8 (tongke6)
- `a27443c` install gcc13 (tongke6)
- `44ab401` Load CUDA extension matching current GPU architecture (tongke6)
`@@ -0,0 +1,141 @@` (new file)

```yaml
name: Build & Release Wheels

on:
  push:
    tags:
      - "v*"
  workflow_dispatch:

concurrency:
  group: build-release-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build-wheel:
    name: "wheel / ${{ matrix.cuda }} / cp312 / ${{ matrix.arch }}"
    runs-on: ${{ matrix.arch == 'aarch64' && 'ubuntu-24.04-arm' || 'ubuntu-latest' }}
    defaults:
      run:
        shell: bash
    strategy:
      fail-fast: false
      matrix:
        cuda:
          - cu129
          - cu130
        arch:
          - x86_64
          - aarch64
    container:
      # UBI 8 provides the glibc 2.28 baseline required by manylinux_2_28.
      image: "nvidia/cuda:${{ matrix.cuda == 'cu129' && '12.9.0' || '13.0.0' }}-devel-ubi8"

    steps:
      - name: Free disk space
        run: |
          rm -rf /opt/hostedtoolcache /usr/local/lib/android /usr/share/dotnet \
                 /usr/local/share/boost /opt/ghc 2>/dev/null || true
          dnf clean all 2>/dev/null || true
          df -h / || true

      - name: Install system dependencies
        run: |
          dnf install -y \
            git \
            gcc-toolset-13-gcc \
            gcc-toolset-13-gcc-c++ \
            python3.12 \
            python3.12-devel \
            python3.12-pip
          dnf clean all

      - name: Checkout
        uses: actions/checkout@v5
        with:
          fetch-depth: 0
          submodules: recursive

      - name: Configure git safe directory
        run: git config --global --add safe.directory "$GITHUB_WORKSPACE"

      - name: Install Python dependencies
        run: |
          python3.12 -m pip install --no-cache-dir --upgrade pip
          python3.12 -m pip install --no-cache-dir torch --index-url ${{ matrix.cuda == 'cu129' && 'https://download.pytorch.org/whl/cu129' || 'https://download.pytorch.org/whl/cu130' }}
          python3.12 -m pip install --no-cache-dir setuptools wheel "setuptools_scm>=6.0" build ninja auditwheel patchelf

      - name: Compute version
        id: version
        run: |
          if [[ "$GITHUB_REF" == refs/tags/v* ]]; then
            BASE="${GITHUB_REF#refs/tags/v}"
          else
            # Strip any local segment (+gXXX) so we get a clean base
            BASE=$(python3.12 -c "from setuptools_scm import get_version; print(get_version().split('+')[0])")
          fi
          echo "version=${BASE}+${{ matrix.cuda }}" >> "$GITHUB_OUTPUT"

      - name: Build fat-binary wheel
        env:
          CC: /opt/rh/gcc-toolset-13/root/usr/bin/gcc
          CXX: /opt/rh/gcc-toolset-13/root/usr/bin/g++
          CUDAHOSTCXX: /opt/rh/gcc-toolset-13/root/usr/bin/g++
          CULA_BUILD_ALL_ARCHS: "1"
          SETUPTOOLS_SCM_PRETEND_VERSION: "${{ steps.version.outputs.version }}"
          NVCC_THREADS: "4"
          MAX_JOBS: "4"
        run: |
          "$CC" --version
          "$CXX" --version
          python3.12 -m build --wheel --no-isolation --outdir dist-raw

      - name: Repair wheel for manylinux_2_28
        run: |
          # These libraries are supplied by the NVIDIA driver, PyTorch, or
          # PyTorch's CUDA runtime dependency and must remain external.
          python3.12 -m auditwheel repair \
            --plat manylinux_2_28_${{ matrix.arch }} \
            --exclude libcuda.so.1 \
            --exclude 'libcudart.so.*' \
            --exclude 'libc10*.so' \
            --exclude 'libtorch*.so' \
            --wheel-dir dist \
            dist-raw/*.whl

      - name: Verify wheel
        run: |
          echo "Built wheel:"
          ls -lh dist/*.whl
          ls dist/*.whl | grep -q "+${{ matrix.cuda }}" \
            || { echo "ERROR: wheel name missing +${{ matrix.cuda }} suffix"; exit 1; }
          ls dist/*.whl | grep -q "manylinux_2_28_${{ matrix.arch }}" \
            || { echo "ERROR: wheel is not tagged manylinux_2_28_${{ matrix.arch }}"; exit 1; }
          python3.12 -m auditwheel show dist/*.whl

      - name: Upload wheel artifact
        uses: actions/upload-artifact@v6
        with:
          name: wheel-${{ matrix.cuda }}-${{ matrix.arch }}
          path: dist/*.whl

  release:
    name: Create GitHub Release
    needs: [build-wheel]
    runs-on: ubuntu-latest
    if: startsWith(github.ref, 'refs/tags/v')
    permissions:
      contents: write
    steps:
      - name: Download all artifacts
        uses: actions/download-artifact@v6
        with:
          path: artifacts/

      - name: Create release
        uses: softprops/action-gh-release@v3
        with:
          files: |
            artifacts/wheel-*/*.whl
          generate_release_notes: true
          draft: true
          prerelease: ${{ contains(github.ref, 'rc') || contains(github.ref, 'beta') || contains(github.ref, 'alpha') }}
```
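The workflow derives three pieces of release metadata from `github.ref`: the wheel version (the **Compute version** step), the file-name checks (the **Verify wheel** step), and the `prerelease:` flag. Below is a minimal, stdlib-only Python sketch that mirrors that logic so it can be reasoned about or unit-tested outside CI. The helper names `compute_wheel_version`, `wheel_name_errors`, and `is_prerelease` and the sample wheel file name are hypothetical; the workflow itself does this in bash and GitHub expressions.

```python
from typing import List, Optional

TAG_PREFIX = "refs/tags/v"


def compute_wheel_version(github_ref: str, scm_version: Optional[str], cuda_tag: str) -> str:
    """Hypothetical mirror of the "Compute version" step."""
    if github_ref.startswith(TAG_PREFIX):
        # Tag build: "refs/tags/v0.2.0" -> "0.2.0"
        base = github_ref[len(TAG_PREFIX):]
    else:
        if scm_version is None:
            raise ValueError("non-tag builds need a setuptools_scm version string")
        # Drop the local segment ("+g<hash>...") so only one "+" remains
        base = scm_version.split("+")[0]
    # The CUDA variant becomes the PEP 440 local version label
    return f"{base}+{cuda_tag}"


def wheel_name_errors(filename: str, cuda_tag: str, arch: str) -> List[str]:
    """Hypothetical mirror of the two file-name checks in "Verify wheel"."""
    errors = []
    if f"+{cuda_tag}" not in filename:
        errors.append(f"wheel name missing +{cuda_tag} suffix")
    if f"manylinux_2_28_{arch}" not in filename:
        errors.append(f"wheel is not tagged manylinux_2_28_{arch}")
    return errors


def is_prerelease(github_ref: str) -> bool:
    """Hypothetical mirror of the `prerelease:` expression.

    GitHub's contains() is a case-insensitive substring test, so the
    ref is lower-cased before matching.
    """
    ref = github_ref.lower()
    return any(marker in ref for marker in ("rc", "beta", "alpha"))


if __name__ == "__main__":
    version = compute_wheel_version("refs/tags/v0.2.0", None, "cu129")
    print(version)  # 0.2.0+cu129
    wheel = f"cula-{version}-cp312-cp312-manylinux_2_28_x86_64.whl"  # hypothetical name
    print(wheel_name_errors(wheel, "cu129", "x86_64"))  # []
```

Because `contains` is a substring test rather than a match on a version component, a tag such as `v1.0-src` would also be flagged as a prerelease (`src` contains `rc`).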
This file was deleted.
`@@ -0,0 +1,102 @@` (new file)

```python
# Copyright 2025-2026 Ant Group Co., Ltd.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Unified interface to per-architecture CUDA extensions.

Downstream code can continue to use ``import cula.cudac as cula_cuda``
and call ``cula_cuda.kda_fwd_prefill(...)`` or
``cula_cuda.chunk_kda_fwd_intra_cuda(...)`` without knowing which
extension provides the function.

Loading is **once per process**: the first attribute access checks the
currently active CUDA device, imports the matching ``cula._cudac_sm*``
extension, and caches the discovered callables on the module instance.
Changing the active CUDA device to a different architecture after a
process has already loaded ``cula.cudac`` will therefore not be picked
up -- callers that need a different extension must restart Python.
"""

import importlib
import sys
import threading
from types import ModuleType


def _current_device_extension() -> tuple[str, str]:
    try:
        import torch
    except ImportError as exc:
        raise ImportError("cuLA CUDA extensions require PyTorch to detect the current GPU.") from exc

    if not torch.cuda.is_available():
        raise RuntimeError("cuLA CUDA extensions require a visible CUDA GPU, but torch.cuda.is_available() is False.")

    device = torch.cuda.current_device()
    prop = torch.cuda.get_device_properties(device)
    sm_label = f"sm_{prop.major}{prop.minor}"
    if prop.major == 10 and prop.minor in (0, 3):
        return "cula._cudac_sm100", sm_label
    if prop.major == 9 and prop.minor == 0:
        return "cula._cudac_sm90", sm_label
    raise RuntimeError(f"Unsupported CUDA compute capability {sm_label}. Supported architectures: sm_100, sm_103, sm_90.")


class _CudacProxy(ModuleType):
    """Lazy proxy that exposes functions from the current GPU arch extension."""

    def __init__(self):
        super().__init__(__name__)
        self.__path__ = []
        self._modules_loaded = False
        self._funcs: dict[str, object] = {}
        self._lock = threading.Lock()

    def _load(self):
        if self._modules_loaded:
            return
        with self._lock:
            if self._modules_loaded:
                return
            ext_name, sm_label = _current_device_extension()
            try:
                mod = importlib.import_module(ext_name)
                for attr in dir(mod):
                    if not attr.startswith("_"):
                        self._funcs[attr] = getattr(mod, attr)
            except (ImportError, AttributeError, OSError) as exc:
                raise ImportError(
                    f"The cuLA CUDA extension for the current GPU ({sm_label}) could not be imported. "
                    f"Extension {ext_name} failed with: {exc}. "
                    "Please make sure cuLA is compiled correctly."
                ) from exc
            self.__dict__.update(self._funcs)
            self._modules_loaded = True

    def __getattr__(self, name: str):
        if name.startswith("_"):
            raise AttributeError(name)
        self._load()
        try:
            return self._funcs[name]
        except KeyError:
            raise AttributeError(f"module 'cula.cudac' has no attribute '{name}'") from None

    def __dir__(self):
        self._load()
        return list(self._funcs.keys())


_proxy = _CudacProxy()
_proxy.__dict__.update({k: globals().get(k) for k in ("__spec__", "__file__", "__package__", "__loader__")})
sys.modules[__name__] = _proxy
```
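The proxy above combines two ideas: mapping the device's compute capability to an extension module name, and double-checked locking so the extension is resolved and imported only once even when several threads touch the module at the same time. The stdlib-only sketch below isolates both so they can be exercised without PyTorch or a GPU. It assumes a resolver callable in place of the `torch.cuda` query and registers a stand-in module `fake_sm90_ext` in place of a compiled `cula._cudac_sm*` extension; `extension_for_capability`, `LazyExtensionProxy`, and `scale` are hypothetical names, not part of cuLA.

```python
import importlib
import sys
import threading
from types import ModuleType
from typing import Callable, Dict


def extension_for_capability(major: int, minor: int) -> str:
    """Same capability-to-module mapping as _current_device_extension,
    with the torch device query factored out (hypothetical helper)."""
    if major == 10 and minor in (0, 3):
        return "cula._cudac_sm100"
    if major == 9 and minor == 0:
        return "cula._cudac_sm90"
    raise RuntimeError(f"Unsupported CUDA compute capability sm_{major}{minor}")


class LazyExtensionProxy(ModuleType):
    """Hypothetical, torch-free stand-in for _CudacProxy."""

    def __init__(self, name: str, resolve: Callable[[], str]):
        super().__init__(name)
        self._resolve = resolve  # returns the name of the module to import
        self._lock = threading.Lock()
        self._funcs: Dict[str, object] = {}
        self._loaded = False

    def _load(self) -> None:
        if self._loaded:  # fast path: no lock once loading has finished
            return
        with self._lock:
            if self._loaded:  # another thread finished loading while we waited
                return
            mod = importlib.import_module(self._resolve())
            self._funcs = {a: getattr(mod, a) for a in dir(mod) if not a.startswith("_")}
            # Copy into __dict__ so later lookups never reach __getattr__
            self.__dict__.update(self._funcs)
            self._loaded = True

    def __getattr__(self, name: str):
        # Only called when normal attribute lookup fails
        if name.startswith("_"):
            raise AttributeError(name)
        self._load()
        try:
            return self._funcs[name]
        except KeyError:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}") from None


# Demo: a stand-in "extension" registered in sys.modules instead of a real .so
fake_ext = ModuleType("fake_sm90_ext")
fake_ext.scale = lambda x: x * 2  # hypothetical function
sys.modules["fake_sm90_ext"] = fake_ext

calls = []  # records each time the resolver runs


def resolve() -> str:
    calls.append(extension_for_capability(9, 0))
    return "fake_sm90_ext"


proxy = LazyExtensionProxy("demo.cudac", resolve)
threads = [threading.Thread(target=lambda: proxy.scale) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls), proxy.scale(21))  # 1 42
```

Copying the callables into the instance `__dict__` is what keeps later calls cheap: `__getattr__` on a module object only runs when normal lookup fails, so after the first load it is no longer on the hot path.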