A collection of high-performance custom kernels for Ascend NPUs, built on top of pto-isa — the Parallel Tile Operation virtual instruction set architecture designed by Ascend CANN.
PTO focuses on tile-level operations, enabling efficient, composable kernel development targeting Huawei's Ascend AI processors.
Requirements:
- A configured torch-npu environment
- Ascend toolkit installed at /usr/local/Ascend/ascend-toolkit
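The requirements above can be checked before building. The following is a minimal sketch, not part of this repository: the helper name `check_prerequisites` is hypothetical, and it assumes that torch-npu is importable as `torch_npu` and that `set_env.sh` sits directly inside the toolkit directory, as the `source` command in this README suggests.

```python
import importlib.util
from pathlib import Path

# Default toolkit location named in this README; adjust if installed elsewhere.
DEFAULT_TOOLKIT_DIR = Path("/usr/local/Ascend/ascend-toolkit")


def check_prerequisites(toolkit_dir=DEFAULT_TOOLKIT_DIR):
    """Hypothetical helper: map each prerequisite to True (found) or False."""
    toolkit_dir = Path(toolkit_dir)
    return {
        # find_spec locates the package without importing it,
        # so this check does not initialize any NPU device.
        "torch_npu": importlib.util.find_spec("torch_npu") is not None,
        "toolkit_dir": toolkit_dir.is_dir(),
        "set_env.sh": (toolkit_dir / "set_env.sh").is_file(),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

This only confirms that the files and the package are present; it does not verify that the torch-npu environment is correctly configured for a particular device.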
Run the one-time setup before building:
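make setup_once
The repository is pip-installable:
export CMAKE_GENERATOR="Unix Makefiles" && pip install -v git+https://github.com/huawei-csl/pto-kernels.git
To build a wheel from a local checkout instead, load the Ascend environment, install the Python dependencies, and run the build:
source /usr/local/Ascend/ascend-toolkit/set_env.sh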
pip3 install -r requirements.txt
make build_wheel
This produces an installable Python wheel:
pto_kernels-X.Y.Z-*.whl
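When several builds have accumulated, it can help to pick the newest wheel explicitly rather than rely on a shell glob. The sketch below is illustrative only: `find_latest_wheel` is a hypothetical helper, and it assumes the wheel is written to the directory you pass in (the current directory by default), which this README does not state.

```python
from pathlib import Path


def find_latest_wheel(search_dir="."):
    """Hypothetical helper: newest pto_kernels wheel in search_dir, or None."""
    wheels = sorted(
        Path(search_dir).glob("pto_kernels-*.whl"),
        key=lambda p: p.stat().st_mtime,  # order by modification time
    )
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    wheel = find_latest_wheel()
    print(wheel if wheel else "no pto_kernels wheel found; run `make build_wheel` first")
```

The returned path can then be passed to `pip install --force-reinstall`.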
Install the built wheel:
pip install --force-reinstall pto_kernels-*.whl
Run the test suite:
make test
Repository layout:
pto-kernels/
├── csrc/ # C++ kernel source files
├── python/pto_kernels/ # Python bindings and utilities
├── examples/jit_cpp/ # JIT compilation examples
├── tests/ # Test suite
├── scripts/ # Helper scripts
├── doxygen/ # API documentation config
└── CMakeLists.txt # CMake build configuration
Contributions are welcome! Please read CONTRIBUTING.md before opening a pull request.
BSD-3-Clause-Clear — see LICENSE for details.