SageAttention fork for Windows wheels and easy installation

This repo makes it easy to build SageAttention for multiple Python, PyTorch, and CUDA versions, then distribute the wheels to other people.

The latest wheels support GTX 16xx, RTX 20xx/30xx/40xx/50xx, V100, A100, H100, and AGX Orin (sm70/75/80/86/87/89/90/120). There are also reports that it works with B200 (sm100) and DGX Spark (sm121), but these kernels are not bundled in the wheels, so you need to build from source.
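To find out which of these kernels your GPU needs, you can query its compute capability from PyTorch (a quick check, not part of this repo):

    import torch

    # Map the GPU to an smXX number, e.g. (8, 6) -> sm86 (RTX 30xx)
    # or (12, 0) -> sm120 (RTX 50xx).
    major, minor = torch.cuda.get_device_capability()
    print(f"sm{major}{minor}")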

Installation

  1. Know how to use pip to install packages into the correct Python environment; see https://github.com/woct0rdho/triton-windows
  2. Install triton-windows
  3. Install a wheel from the release page: https://github.com/woct0rdho/SageAttention/releases
    • Unlike triton-windows, SageAttention requires you to manually choose a wheel on the GitHub release page
    • Choose the wheel for your PyTorch version. For example, 'torch2.7.0' in the filename
      • The torch minor version (2.6/2.7 ...) must be correct, but the patch version (2.7.0/2.7.1 ...) can be different from yours
    • No need to worry about the CUDA minor version (12.8/12.9 ...). It can differ from yours, because SageAttention does not yet use any CUDA API that breaks across minor versions
      • But there is a difference between CUDA 12 and 13
    • No need to worry about the Python minor version (3.10/3.11 ...). The recent wheels use the Python Stable ABI (also known as ABI3) and have cp39-abi3 in their filenames, so they support Python >= 3.9
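To check which wheel matches your environment, you can print the versions that matter (a minimal check; the exact wheel filenames are listed on the release page):

    import sys
    import torch

    # Match the torch minor version (e.g. 2.7) and the CUDA major version
    # (12 vs 13) against the wheel filename. The Python version only needs
    # to be >= 3.9 for the cp39-abi3 wheels.
    print("torch:", torch.__version__)      # e.g. 2.7.1+cu128
    print("cuda:", torch.version.cuda)      # e.g. 12.8
    print("python:", sys.version_info[:2])  # e.g. (3, 11)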

If you see any error, please open an issue at https://github.com/woct0rdho/SageAttention/issues

We've recently simplified the installation a lot: there is no need to install Visual Studio or the CUDA toolkit to use Triton and SageAttention (unless you want to step into the world of building from source)

Usage notes

Before using SageAttention in larger projects like ComfyUI, please run test_sageattn.py to check that SageAttention itself works.
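If you want a quick inline check instead, the snippet below sketches what such a test looks like (the shapes are arbitrary; the head dimension must be one that SageAttention supports, e.g. 64 or 128):

    import torch
    from sageattention import sageattn

    # A tiny forward pass: q, k, v are (batch, heads, seq_len, head_dim)
    # half-precision CUDA tensors in the "HND" layout.
    q = torch.randn(1, 8, 1024, 128, dtype=torch.float16, device="cuda")
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
    print(out.shape)  # torch.Size([1, 8, 1024, 128])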

To use SageAttention in ComfyUI, you just need to add --use-sage-attention when starting ComfyUI.

Some recent models, such as Wan and Qwen-Image, may produce black or noisy output when SageAttention is used, because some intermediate values overflow SageAttention's quantization. In this case, you can use the PatchSageAttentionKJ node in KJNodes and choose sageattn_qk_int8_pv_fp16_cuda, which is the least likely to overflow.
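Outside ComfyUI, the same kernel can also be called directly. A sketch, assuming the argument defaults have not changed across versions (check the upstream docstring for yours):

    import torch
    from sageattention import sageattn_qk_int8_pv_fp16_cuda

    q = torch.randn(1, 8, 1024, 128, dtype=torch.float16, device="cuda")
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    # The fp32 PV accumulator is what makes this variant the least likely
    # to overflow on models like Wan and Qwen-Image.
    out = sageattn_qk_int8_pv_fp16_cuda(q, k, v, tensor_layout="HND",
                                        is_causal=False, pv_accum_dtype="fp32")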

Also, if you want to run Flux or Qwen-Image, try Nunchaku if you haven't. It's faster and more accurate than GGUF Q4_0 + SageAttention.

If you want to run Wan, try RadialAttention if you haven't. It's also faster than SageAttention.

Build from source

(This is for developers)

If you need to build and run SageAttention on your own machine:

  1. Install Visual Studio (MSVC and Windows SDK) and the CUDA toolkit
  2. Clone this repo
    • Check out the abi3_stable branch if you want ABI3 and the libtorch stable ABI, which supports PyTorch >= 2.9
    • Check out the abi3 branch if you want ABI3 only, which supports PyTorch >= 2.4
    • The purpose of ABI3 and the libtorch stable ABI is to avoid building many wheels. There is no functional difference from the main branch
  3. Install the dependencies in pyproject.toml, including the correct torch version, such as torch 2.7.1+cu128
  4. Run python setup.py install --verbose to install directly, or python setup.py bdist_wheel --verbose to build a wheel. This bypasses pip's environment checks
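After either command finishes, a one-line sanity check (a minimal sketch; test_sageattn.py above covers more cases) confirms that the freshly built extension imports and sees your GPU:

    import torch
    import sageattention

    # If the import succeeds, the compiled extension loaded. The device name
    # tells you which smXX kernel it will run on.
    print(sageattention.__file__, "on", torch.cuda.get_device_name(0))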

Dev notes
