Asixa/slang2drjit

slang2drjit

slang2drjit compiles trusted local Slang CUDA kernels into Dr.Jit custom operations and exposes them through a Python loadModule() API.

The project is an experimental v2 foundation, not a general-purpose Slang binding generator. The public argument model is intentionally focused on DiffTensorView and TensorView kernels, with native Dr.Jit integration for the dtype and AD paths covered by the test suite.

Features

  • Compiles Slang kernels to CUDA source and torch-binding metadata source.
  • Parses generated __funcinfo__* metadata into a structured Python model.
  • Generates a native nanobind host wrapper and CUDA launch shim.
  • Exposes Slang kernels as keyword-only Python methods.
  • Allocates output arguments automatically.
  • Supports explicit multi-output functions with outputArgs.
  • Supports explicit and source-declared output allocation with outputShapes, hostAlloc, and [DrJitEntryPoint].
  • Dispatches overloads by keyword set, dtype, and parsed rank metadata.
  • Caches builds by source, options, package versions, and hashed include-path contents.
  • Supports Dr.Jit forward and backward AD paths for the currently verified dtype and view combinations.

Requirements

  • Python 3.10+
  • Dr.Jit with CUDA support
  • slangc with CUDA and torch-binding target support. On Windows, the package bundles slangc.exe; set SLANGC_PATH to override it. On other platforms, put slangc on PATH.
  • CUDA toolkit
  • CMake 3.26+
  • Ninja
  • A working C++/CUDA compiler toolchain:
    • Windows: MSVC with CUDA integration
    • Linux: GCC or Clang with CUDA integration

Native build dependencies must currently be installed in the environment; they are not yet fully declared as optional package extras.

Installation

Install the Python package from a local checkout:

python -m pip install -e .

Install Dr.Jit, CUDA, CMake, Ninja, and the platform compiler toolchain separately according to their upstream instructions.

Quick Start

Create a Slang kernel:

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void square(DiffTensorView input, DiffTensorView output)
{
    uint3 i = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();
    if (i.x >= input.size(0)) return;
    output[i.x] = input[i.x] * input[i.x];
}

Load and call it from Python:

import drjit as dr
import drjit.cuda.ad as ad
from slang2drjit import loadModule

module = loadModule("square.slang")

x = ad.Float([1.0, 2.0, 3.0])
dr.enable_grad(x)

y = module.square(input=x).launchRaw(
    blockSize=(128, 1, 1),
    gridSize=(1, 1, 1),
)
dr.backward(dr.sum(y))

print(y)
print(dr.grad(x))

Python calls are keyword-only and follow slangtorch launch semantics: module.kernel(**kwargs) returns a launchable object, and the kernel runs when you call .launchRaw(blockSize, gridSize). Output arguments are allocated by slang2drjit; pass only input arguments from Python.
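The deferred, keyword-only call pattern can be modeled in plain Python. This is a toy under stated assumptions: `ToyKernel` and `ToyLaunchable` are hypothetical names, the "kernel" is an ordinary Python function, and nothing here describes how slang2drjit is implemented internally. It only illustrates that binding arguments and launching are separate steps, and that output names are rejected when arguments are bound.

```python
# Toy model of a deferred, keyword-only launch. ToyKernel and ToyLaunchable
# are hypothetical stand-ins, not slang2drjit classes.
from typing import Callable, Dict, Sequence, Tuple

Dim3 = Tuple[int, int, int]


class ToyLaunchable:
    def __init__(self, body: Callable[..., object], kwargs: Dict[str, object]):
        self._body = body
        self._kwargs = kwargs

    def launchRaw(self, *, blockSize: Dim3, gridSize: Dim3):
        # The bound body runs only here, once launch dimensions are known.
        return self._body(blockSize=blockSize, gridSize=gridSize, **self._kwargs)


class ToyKernel:
    def __init__(self, body, inputs: Sequence[str], outputs: Sequence[str]):
        self._body = body
        self._inputs = set(inputs)
        self._outputs = set(outputs)

    def __call__(self, **kwargs) -> ToyLaunchable:
        # No *args parameter: a positional call raises TypeError.
        given = set(kwargs)
        if given & self._outputs:
            raise TypeError(f"outputs are allocated for you: {sorted(given & self._outputs)}")
        if given != self._inputs:
            raise TypeError(f"expected inputs {sorted(self._inputs)}, got {sorted(given)}")
        return ToyLaunchable(self._body, kwargs)


def square_body(*, blockSize, gridSize, input):
    # CPU stand-in for the kernel: "allocate" the output and fill it.
    return [v * v for v in input]


square = ToyKernel(square_body, inputs=["input"], outputs=["output"])
launchable = square(input=[1.0, 2.0, 3.0])  # arguments bound, nothing computed yet
print(launchable.launchRaw(blockSize=(128, 1, 1), gridSize=(1, 1, 1)))
```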

Public API

from slang2drjit import loadModule

module = loadModule(
    "kernel.slang",
    defines={"USE_FAST_PATH": 1},
    includePaths=["include"],
    extraSlangFlags=[],
    extraCudaFlags=[],
)

loadModule() compiles the Slang source, builds a native extension in the cache, imports it, and returns a proxy object whose methods correspond to public Slang kernels.

Launching Kernels

slang2drjit mirrors slangtorch's explicit launch API:

launchable = module.square(input=x)
y = launchable.launchRaw(
    blockSize=(128, 1, 1),
    gridSize=((dr.width(x) + 127) // 128, 1, 1),
)

blockSize and gridSize must both be tuples of three integers. As in slangtorch, launchTotal() and autoLaunch() are present but not implemented; use launchRaw() for now.
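For a 1-D kernel, gridSize is usually the ceiling of the element count divided by the block width. A minimal sketch with hypothetical helper names, assuming a single launch dimension. The kernel must still bounds-check, as the Quick Start kernel does with `if (i.x >= input.size(0)) return;`, because the last block can overshoot the data.

```python
# Hypothetical helpers (not slang2drjit APIs) that build launchRaw()
# dimensions for a 1-D kernel over n elements.
def check_dim3(name, value):
    # launchRaw() requires a tuple of exactly three integers.
    if not (isinstance(value, tuple) and len(value) == 3
            and all(isinstance(v, int) for v in value)):
        raise TypeError(f"{name} must be a tuple of three ints, got {value!r}")
    return value


def launch_dims_1d(n, block_x=128):
    # Ceiling division: the smallest grid_x with grid_x * block_x >= n.
    grid_x = (n + block_x - 1) // block_x
    # Launch at least one block; CUDA rejects a zero-sized grid.
    block = check_dim3("blockSize", (block_x, 1, 1))
    grid = check_dim3("gridSize", (max(grid_x, 1), 1, 1))
    return block, grid


print(launch_dims_1d(3))    # ((128, 1, 1), (1, 1, 1))
print(launch_dims_1d(300))  # ((128, 1, 1), (3, 1, 1))
```

With Dr.Jit, `blockSize, gridSize = launch_dims_1d(dr.width(x))` would feed directly into `launchRaw(blockSize=blockSize, gridSize=gridSize)`.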

Outputs

By default, the last public Slang argument is treated as the single output:

module = loadModule("square.slang")
y = module.square(input=x).launchRaw(blockSize=(128, 1, 1), gridSize=(1, 1, 1))

For multiple outputs, mark them explicitly:

module = loadModule(
    "split.slang",
    outputArgs={"split": ("left", "right")},
)

left, right = module.split(input=x).launchRaw(blockSize=(128, 1, 1), gridSize=(1, 1, 1))

Multiple outputs return a plain Python tuple.
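The default and explicit output rules above amount to a small mapping from a kernel's public parameter list to input and output names. This is an illustrative sketch only: `split_params` and `pack_results` are hypothetical helpers, and slang2drjit may validate `outputArgs` differently.

```python
# Hypothetical helpers (not slang2drjit APIs) sketching the output rules.
def split_params(kernel, params, outputArgs=None):
    """Split a kernel's public parameter names into (inputs, outputs)."""
    explicit = (outputArgs or {}).get(kernel)
    if explicit is not None:
        outputs = tuple(explicit)
        unknown = set(outputs) - set(params)
        if unknown:
            raise ValueError(f"{kernel}: unknown output args {sorted(unknown)}")
    else:
        # Default: the last public argument is the single output.
        outputs = (params[-1],)
    inputs = tuple(p for p in params if p not in outputs)
    return inputs, outputs


def pack_results(values):
    # One output comes back as-is; several come back as a plain tuple.
    return values[0] if len(values) == 1 else tuple(values)


print(split_params("square", ["input", "output"]))
print(split_params("split", ["input", "left", "right"], {"split": ("left", "right")}))
```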

When an output shape differs from the first input shape, provide outputShapes:

module = loadModule(
    "reshape_output.slang",
    outputShapes={"reshapeOut": {"output": (2, 3)}},
)

y = module.reshapeOut(input=x).launchRaw(blockSize=(128, 1, 1), gridSize=(1, 1, 1))
assert tuple(y.shape) == (2, 3)
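The shape rule can be read as "first input's shape unless overridden". A minimal sketch under that assumption; `resolve_shape` is hypothetical. Because an `outputShapes` entry is a fixed tuple, it suits outputs whose shape does not depend on the input size; the source-declared and `hostAlloc` options cover input-dependent shapes.

```python
# Hypothetical helper, assuming the default described above: an output takes
# the first input's shape unless outputShapes lists it explicitly.
def resolve_shape(kernel, output, input_shapes, outputShapes=None):
    """input_shapes maps input names to shapes, in parameter order."""
    overrides = (outputShapes or {}).get(kernel, {})
    if output in overrides:
        return tuple(overrides[output])
    first_shape = next(iter(input_shapes.values()))
    return tuple(first_shape)


print(resolve_shape("square", "output", {"input": (3,)}))
print(resolve_shape("reshapeOut", "output", {"input": (6,)},
                    {"reshapeOut": {"output": (2, 3)}}))
```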

For allocation logic that belongs with the Slang source, use [DrJitEntryPoint]:

[DrJitEntryPoint]
DiffTensorView reshapeOut(DiffTensorView input)
{
    var output = DrJitTensor<float>.empty(input.size(0) / 3, 3);
    __dispatch_kernel(reshapeOut_kernel)(input, output);
    return output;
}

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void reshapeOut_kernel(DiffTensorView input, DiffTensorView output)
{
    /* write output */
}

Then Python can load the file directly:

module = loadModule("reshape_output.slang")
y = module.reshapeOut(input=x).launchRaw(blockSize=(128, 1, 1), gridSize=(1, 1, 1))

hostAlloc remains available as a lower-level Python-side allocation rule:

module = loadModule(
    "reshape_output.slang",
    hostAlloc={"reshapeOut": {"output": lambda input: (input.shape[0] // 3, 3)}},
)
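A `hostAlloc` rule is a Python callable from the call's inputs to an output shape. A minimal sketch, assuming the rule receives the inputs as keyword arguments named after the Slang parameters (which the `lambda input: ...` form suggests); `eval_host_alloc` is hypothetical, and `SimpleNamespace` stands in for a Dr.Jit tensor with a `.shape`.

```python
# Hypothetical evaluator for a hostAlloc rule; not a slang2drjit API.
from types import SimpleNamespace


def eval_host_alloc(kernel, output, inputs, hostAlloc):
    """inputs maps parameter names to objects exposing a .shape attribute."""
    rule = hostAlloc[kernel][output]
    # Assumption: inputs are passed by keyword, matching `lambda input: ...`.
    shape = rule(**inputs)
    return tuple(int(d) for d in shape)


hostAlloc = {"reshapeOut": {"output": lambda input: (input.shape[0] // 3, 3)}}
x = SimpleNamespace(shape=(6,))  # stand-in for a Dr.Jit tensor
print(eval_host_alloc("reshapeOut", "output", {"input": x}, hostAlloc))
```

Under this keyword-passing assumption, a kernel with several inputs would need a rule that accepts all of them, for example `lambda input, **_: ...`.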

For data-dependent output lengths, allocate a capacity-shaped output and return a separate count output:

[DrJitEntryPoint]
(DiffTensorView, TensorView<uint>) compact(DiffTensorView input)
{
    var values = DrJitTensor<float>.emptyLike(input);
    var count = DrJitTensor<uint>.empty(1);
    __dispatch_kernel(compact_kernel)(input, values, count);
    return values, count;
}
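On the consumer side, the kernel writes kept elements to the front of a buffer sized for the worst case and records how many it wrote; the caller then treats only that prefix as valid. The CPU sketch below uses a hypothetical keep-if-positive predicate and Python lists in place of `DrJitTensor`. A real CUDA kernel would typically reserve slots with an atomic increment on the count, so element order is not guaranteed.

```python
# CPU sketch of the capacity-plus-count pattern. compact_reference stands in
# for compact_kernel; the keep-if-positive predicate is hypothetical.
def compact_reference(values_in):
    capacity = len(values_in)      # like emptyLike(input)
    out = [0.0] * capacity
    count = 0                      # like the 1-element uint count tensor
    for v in values_in:
        if v > 0.0:                # hypothetical predicate
            out[count] = v
            count += 1
    return out, [count]


values, count = compact_reference([3.0, -1.0, 2.0, -5.0])
valid = values[: count[0]]         # only the first count entries are meaningful
print(valid)
```

In Dr.Jit, reading the count on the host typically forces pending work to be evaluated, so it is worth deferring until the trimmed result is needed. Entries past the count are presumably uninitialized with `emptyLike`, unlike the zeros in this sketch.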

Overloads

Overloads are exposed through one Python method and selected at call time:

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void tag(DiffTensorView x, DiffTensorView out) { /* float path */ }

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void tag(TensorView<uint> x, TensorView<uint> out) { /* uint path */ }

Python then selects the overload from the argument dtype:

module = loadModule("dtype_overloads.slang")

float_result = module.tag(x=ad.Float([1.0, 2.0, 3.0])).launchRaw(
    blockSize=(128, 1, 1),
    gridSize=(1, 1, 1),
)
uint_result = module.tag(x=ad.UInt([1, 2, 3])).launchRaw(
    blockSize=(128, 1, 1),
    gridSize=(1, 1, 1),
)
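Selection by keyword set, dtype, and rank can be pictured as a filter over the parsed overload metadata. This is a hypothetical sketch, not the library's dispatcher: `Overload`, `select_overload`, the dtype labels such as `"float32"`, and the convention that a rank of `None` accepts any rank are all assumptions for illustration. Only input parameters are listed, since outputs are not passed from Python.

```python
# Hypothetical dispatcher showing overload selection by keyword set, dtype
# and rank. Overload and select_overload are illustrative, not library APIs.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Overload:
    label: str
    # Input parameter name -> (dtype label, rank); rank None accepts any rank.
    inputs: Dict[str, Tuple[str, Optional[int]]]


def select_overload(overloads: List[Overload],
                    passed: Dict[str, Tuple[str, int]]) -> Overload:
    """passed maps each keyword to the (dtype label, rank) of its value."""
    matches = []
    for ov in overloads:
        if set(ov.inputs) != set(passed):
            continue  # 1. keyword set must match exactly
        if all(
            dtype == passed[name][0]                       # 2. dtype
            and (rank is None or rank == passed[name][1])  # 3. rank
            for name, (dtype, rank) in ov.inputs.items()
        ):
            matches.append(ov)
    if len(matches) != 1:
        raise TypeError(f"expected one matching overload, found {len(matches)}")
    return matches[0]


tag_float = Overload("tag(DiffTensorView, ...)", {"x": ("float32", 1)})
tag_uint = Overload("tag(TensorView<uint>, ...)", {"x": ("uint32", 1)})
print(select_overload([tag_float, tag_uint], {"x": ("uint32", 1)}).label)
```

Raising on zero or several matches, rather than picking one silently, keeps ambiguous overload sets visible at call time.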

Build Cache

Native extensions are cached by a build key derived from:

  • Slang source contents
  • selected compile options and flags
  • relevant package versions
  • output metadata options
  • hashed includePaths contents

This keeps repeated loads fast while invalidating builds when source, configuration, or included files change.
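A key of this kind can be built by feeding every input into one hash. The sketch below is an assumption about the general technique, not slang2drjit's actual key format: `build_key` and its argument layout are hypothetical. Sorting dictionary keys and include files keeps the key deterministic, and hashing each file's relative path alongside a digest of its contents means edits, additions, and renames under an include path all change the key.

```python
# Illustrative build-key function; build_key and its arguments are hypothetical.
import hashlib
import json
from pathlib import Path


def build_key(source, options, versions, output_meta, include_paths):
    h = hashlib.sha256()
    h.update(hashlib.sha256(source.encode("utf-8")).digest())
    # sort_keys makes the digest independent of dict insertion order.
    config = {"options": options, "versions": versions, "outputs": output_meta}
    h.update(json.dumps(config, sort_keys=True, default=str).encode("utf-8"))
    for root in include_paths:
        root = Path(root)
        # Sorted traversal keeps the key stable across filesystems.
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            rel = path.relative_to(root).as_posix()
            # Hash name and content digest separately so boundaries are unambiguous.
            h.update(rel.encode("utf-8") + b"\0")
            h.update(hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()
```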

Supported Scope

Public Slang arguments are limited to the DiffTensorView and TensorView families.

Currently verified native paths include:

  • Scalar array dtypes: float32, float16, float64, int32, uint32, int8, uint8, int64, uint64
  • Tensor dtypes: TensorXf, TensorXf64, TensorXi, TensorXu
  • AD paths: float32, float16, float64, TensorXf, TensorXf64
  • Multi-output AD for float arrays and TensorXf
  • Mixed float/int and mixed float/uint backward paths

Limitations

  • The host wrapper uses a local S2DCustomOp compatibility path. It avoids #define private public and direct writes to Dr.Jit private CustomOp::m_output, but it is not a direct call to the upstream drjit::custom() helper.
  • Metadata parsing is mostly regex/string based over generated Slang/C++ output, not a full AST parser.
  • Slang scalar parameters, structs, buffers, textures, and samplers are out of scope for now.
  • Packaging metadata is not yet ready for a stable PyPI release.

Development

Run the standard checks with a repository-local pytest temp directory:

python -m pytest -q --basetemp=.tmp/pytest-basetemp
python -m compileall -q src tests

Run CUDA integration tests in an environment that has Dr.Jit CUDA, slangc, CUDA, CMake, Ninja, and a working C++/CUDA compiler toolchain:

python -m pytest -q -m cuda_integration --basetemp=.tmp/pytest-basetemp

If CUDA integration tests are skipped, inspect the skip reason before treating native build support as verified.

License

slang2drjit is distributed under the MIT License. See LICENSE.
