slang2drjit

slang2drjit compiles trusted local Slang CUDA kernels into Dr.Jit custom operations and exposes them through a Python loadModule() API.

The project is an experimental v2 foundation, not a general-purpose Slang binding generator. The public argument model is intentionally focused on DiffTensorView and TensorView kernels, with native Dr.Jit integration for the dtype and AD paths covered by the test suite.

Features

- Compiles Slang kernels to CUDA source and torch-binding metadata source.
- Parses generated __funcinfo__* metadata into a structured Python model.
- Generates a native nanobind host wrapper and CUDA launch shim.
- Exposes Slang kernels as keyword-only Python methods.
- Allocates output arguments automatically.
- Supports explicit multi-output functions with outputArgs.
- Supports explicit and source-declared output allocation with outputShapes, hostAlloc, and [DrJitEntryPoint].
- Dispatches overloads by keyword set, dtype, and parsed rank metadata.
- Caches builds by source, options, package versions, and hashed include-path contents.
- Supports Dr.Jit forward and backward AD paths for the currently verified dtype and view combinations.

Requirements

- Python 3.10+
- Dr.Jit with CUDA support
- slangc with CUDA and torch-binding target support. The package includes a bundled Windows slangc.exe; set SLANGC_PATH to override it, or put slangc on PATH on other platforms.
- CUDA toolkit
- CMake 3.26+
- Ninja
- A working C++/CUDA compiler toolchain:
  - Windows: MSVC with CUDA integration
  - Linux: GCC or Clang with CUDA integration

Native build dependencies must currently be provided by the environment; they are not yet fully declared as optional package extras.
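
Because pip does not install these tools, it can help to check for them before the first loadModule() call. The following is a minimal sketch and not part of slang2drjit: the function name missing_build_tools is hypothetical, and it assumes that nvcc, cmake, and ninja are the executable names to look for on PATH. It does not know about the bundled Windows slangc.exe, so a reported missing slangc may be a false alarm on Windows.

```python
import os
import shutil


def missing_build_tools(which=shutil.which, environ=os.environ):
    """Return the names of required command-line tools that cannot be located.

    `which` and `environ` are injectable so the check can be tested without
    touching the real system (hypothetical helper, not a slang2drjit API).
    """
    missing = []
    # SLANGC_PATH, when set, takes the place of a PATH lookup for slangc.
    slangc = environ.get("SLANGC_PATH") or which("slangc")
    if not slangc:
        missing.append("slangc")
    # nvcc stands in for the CUDA toolkit; cmake and ninja drive the native build.
    for tool in ("nvcc", "cmake", "ninja"):
        if which(tool) is None:
            missing.append(tool)
    return missing
```

Calling missing_build_tools() with no arguments inspects the current environment; an empty list means every tool in the sketch was found. The check does not verify versions or the host C++ compiler.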

Installation

Install the Python package from a local checkout:

python -m pip install -e .

Install Dr.Jit, CUDA, CMake, Ninja, and the platform compiler toolchain separately according to their upstream instructions.

Quick Start

Create a Slang kernel:

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void square(DiffTensorView input, DiffTensorView output)
{
    uint3 i = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();
    if (i.x >= input.size(0)) return;
    output[i.x] = input[i.x] * input[i.x];
}

Load and call it from Python:

import drjit as dr
import drjit.cuda.ad as ad
from slang2drjit import loadModule

module = loadModule("square.slang")

x = ad.Float([1.0, 2.0, 3.0])
dr.enable_grad(x)

y = module.square(input=x).launchRaw(
    blockSize=(128, 1, 1),
    gridSize=(1, 1, 1),
)
dr.backward(dr.sum(y))

print(y)
print(dr.grad(x))

Python calls are keyword-only and follow slangtorch launch semantics: module.kernel(**kwargs) returns a launchable object, and the kernel runs only when you call .launchRaw(blockSize=..., gridSize=...) on it. Output arguments are allocated by slang2drjit; pass only input arguments from Python.

Public API

from slang2drjit import loadModule

module = loadModule(
    "kernel.slang",
    defines={"USE_FAST_PATH": 1},
    includePaths=["include"],
    extraSlangFlags=[],
    extraCudaFlags=[],
)

loadModule() compiles the Slang source, builds a native extension in the cache, imports it, and returns a proxy object whose methods correspond to public Slang kernels.

Launching Kernels

slang2drjit mirrors slangtorch's explicit launch API:

launchable = module.square(input=x)
y = launchable.launchRaw(
    blockSize=(128, 1, 1),
    gridSize=((dr.width(x) + 127) // 128, 1, 1),
)

blockSize and gridSize must both be tuples of three integers (x, y, z). As in slangtorch, launchTotal() and autoLaunch() are present but not implemented; use launchRaw() for now.
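
The grid size in the example above is a ceiling division of the element count by the block size. A small helper can make that explicit. This is a sketch under stated assumptions: launch_config_1d is a hypothetical name, not part of slang2drjit, and it covers only one-dimensional launches.

```python
def launch_config_1d(element_count, threads_per_block=128):
    """Return (blockSize, gridSize) tuples that cover element_count threads along x."""
    if element_count < 1 or threads_per_block < 1:
        raise ValueError("element_count and threads_per_block must be positive")
    # Ceiling division: the smallest block count with blocks * threads >= element_count.
    blocks = (element_count + threads_per_block - 1) // threads_per_block
    return (threads_per_block, 1, 1), (blocks, 1, 1)


# Intended use with the objects from the example above (requires a CUDA setup):
#   blockSize, gridSize = launch_config_1d(dr.width(x))
#   y = module.square(input=x).launchRaw(blockSize=blockSize, gridSize=gridSize)
```

Because the last block is usually only partly filled, the kernel still needs a bounds check such as the `if (i.x >= input.size(0)) return;` line in the Quick Start kernel.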

Outputs

By default, the last public Slang argument is treated as the single output:

module = loadModule("square.slang")
y = module.square(input=x).launchRaw(blockSize=(128, 1, 1), gridSize=(1, 1, 1))

For multiple outputs, mark them explicitly:

module = loadModule(
    "split.slang",
    outputArgs={"split": ("left", "right")},
)

left, right = module.split(input=x).launchRaw(blockSize=(128, 1, 1), gridSize=(1, 1, 1))

Multiple outputs return a plain Python tuple.

When an output shape differs from the first input shape, provide outputShapes:

module = loadModule(
    "reshape_output.slang",
    outputShapes={"reshapeOut": {"output": (2, 3)}},
)

y = module.reshapeOut(input=x).launchRaw(blockSize=(128, 1, 1), gridSize=(1, 1, 1))
assert tuple(y.shape) == (2, 3)

For allocation logic that belongs with the Slang source, use [DrJitEntryPoint]:

[DrJitEntryPoint]
DiffTensorView reshapeOut(DiffTensorView input)
{
    var output = DrJitTensor<float>.empty(input.size(0) / 3, 3);
    __dispatch_kernel(reshapeOut_kernel)(input, output);
    return output;
}

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void reshapeOut_kernel(DiffTensorView input, DiffTensorView output)
{
    /* write output */
}

Then Python can load the file directly:

module = loadModule("reshape_output.slang")
y = module.reshapeOut(input=x).launchRaw(blockSize=(128, 1, 1), gridSize=(1, 1, 1))

hostAlloc remains available as a lower-level Python-side allocation rule:

module = loadModule(
    "reshape_output.slang",
    hostAlloc={"reshapeOut": {"output": lambda input: (input.shape[0] // 3, 3)}},
)
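
A shape rule is easier to test when it is written as a named function. The sketch below assumes, as the lambda above suggests, that the rule receives the kernel input under its parameter name and returns a shape tuple, and that hostAlloc accepts any callable. The name rows_of_three and the divisibility check are illustrative additions, not part of slang2drjit.

```python
from types import SimpleNamespace


def rows_of_three(input):
    """Shape rule: view a flat input of 3*n elements as an (n, 3) output.

    The parameter is named `input` to mirror the kernel argument name,
    which shadows the Python builtin inside this function only.
    """
    length = input.shape[0]
    if length % 3 != 0:
        raise ValueError(f"input length {length} is not a multiple of 3")
    return (length // 3, 3)


# Stand-in for an array argument; the rule only consults the .shape attribute.
fake_input = SimpleNamespace(shape=(6,))
print(rows_of_three(fake_input))  # (2, 3)
```

Under these assumptions the function could be passed in place of the lambda, for example hostAlloc={"reshapeOut": {"output": rows_of_three}}.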

For data-dependent output lengths, allocate a capacity-shaped output and return a separate count output:

[DrJitEntryPoint]
(DiffTensorView, TensorView<uint>) compact(DiffTensorView input)
{
    var values = DrJitTensor<float>.emptyLike(input);
    var count = DrJitTensor<uint>.empty(1);
    __dispatch_kernel(compact_kernel)(input, values, count);
    return values, count;
}
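
On the Python side, only the leading entries of the capacity-sized result are meaningful. The following is a minimal sketch of the trimming step. It assumes that compact_kernel packs its results at the front of values, and that both outputs have already been copied to host sequences such as lists or NumPy arrays. trim_to_count is a hypothetical helper, not part of slang2drjit.

```python
def trim_to_count(values, count):
    """Return the first `count` entries of a capacity-sized result.

    `values` is any sliceable host sequence; `count` is a single integer,
    for example the first element of the one-element count output.
    """
    capacity = len(values)
    valid = int(count)
    # A count outside the capacity indicates a kernel bug or a mismatched pair of outputs.
    if not 0 <= valid <= capacity:
        raise ValueError(f"count {valid} is outside the range 0..{capacity}")
    return values[:valid]
```

Checking the count against the capacity catches a kernel that reports more elements than it was given room to write.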

Overloads

Overloads are exposed through one Python method and selected at call time:

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void tag(DiffTensorView x, DiffTensorView out) { /* float path */ }

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void tag(TensorView<uint> x, TensorView<uint> out) { /* uint path */ }

module = loadModule("dtype_overloads.slang")

float_result = module.tag(x=ad.Float([1.0, 2.0, 3.0])).launchRaw(
    blockSize=(128, 1, 1),
    gridSize=(1, 1, 1),
)
uint_result = module.tag(x=ad.UInt([1, 2, 3])).launchRaw(
    blockSize=(128, 1, 1),
    gridSize=(1, 1, 1),
)
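
Conceptually, selection amounts to filtering the candidate overloads by keyword names, dtype, and rank, and requiring a unique match. The sketch below illustrates that idea in plain Python. The Overload record, the select_overload function, and the dtype strings are hypothetical and do not reflect the actual data structures inside slang2drjit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Overload:
    name: str     # label for the compiled variant (hypothetical)
    dtypes: dict  # keyword -> dtype string, e.g. {"x": "float32"}
    ranks: dict   # keyword -> expected rank, e.g. {"x": 1}


def select_overload(overloads, arg_info):
    """Pick the single overload matching keyword set, dtypes, and ranks.

    arg_info maps each keyword of the call to a (dtype, rank) pair.
    """
    matches = [
        o for o in overloads
        # The keyword-set test comes first so the lookups below cannot fail.
        if set(o.dtypes) == set(arg_info)
        and all(o.dtypes[k] == arg_info[k][0] for k in arg_info)
        and all(o.ranks[k] == arg_info[k][1] for k in arg_info)
    ]
    if len(matches) != 1:
        raise TypeError(f"expected exactly one matching overload, found {len(matches)}")
    return matches[0]
```

Raising on zero or multiple matches, rather than picking the first candidate, keeps an ambiguous call from silently running the wrong kernel.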

Build Cache

Native extensions are cached by a build key derived from:

- Slang source contents
- selected compile options and flags
- relevant package versions
- output metadata options
- hashed includePaths contents

This keeps repeated loads fast while invalidating builds when source, configuration, or included files change.
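
One common way to build such a key is to feed every input into a single hash. The sketch below shows the idea and is not the actual slang2drjit implementation: build_key and its parameters are hypothetical, SHA-256 is an assumed choice, and the options and versions are assumed to be JSON-serializable dictionaries.

```python
import hashlib
import json
from pathlib import Path


def _feed(digest, data):
    # Length-prefix each field so adjacent fields cannot blur together.
    digest.update(len(data).to_bytes(8, "little"))
    digest.update(data)


def build_key(source_text, options, versions, include_paths=()):
    """Hash everything that should invalidate a cached build into one hex digest."""
    digest = hashlib.sha256()
    _feed(digest, source_text.encode("utf-8"))
    # sort_keys makes the digest independent of dictionary insertion order.
    _feed(digest, json.dumps(options, sort_keys=True).encode("utf-8"))
    _feed(digest, json.dumps(versions, sort_keys=True).encode("utf-8"))
    for root in include_paths:
        # Visit files in sorted order so the digest is deterministic.
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                _feed(digest, path.relative_to(root).as_posix().encode("utf-8"))
                _feed(digest, path.read_bytes())
    return digest.hexdigest()
```

Hashing file contents rather than modification times means that touching an include file without changing it does not trigger a rebuild, while any real edit does.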

Supported Scope

Public Slang arguments are limited to the DiffTensorView and TensorView families.

Currently verified native paths include:

- Scalar array dtypes: float32, float16, float64, int32, uint32, int8, uint8, int64, uint64
- Tensor dtypes: TensorXf, TensorXf64, TensorXi, TensorXu
- AD paths: float32, float16, float64, TensorXf, TensorXf64
- Multi-output AD for float arrays and TensorXf
- Mixed float/int and mixed float/uint backward paths

Limitations

- The host wrapper uses a local S2DCustomOp compatibility path. It avoids #define private public and direct writes to Dr.Jit's private CustomOp::m_output member, but it is not a direct call to the upstream drjit::custom() helper.
- Metadata parsing is mostly regex- and string-based over generated Slang/C++ output; it is not a full AST parser.
- Slang scalar parameters, structs, buffers, textures, and samplers are out of scope for now.
- Packaging metadata is not yet ready for a stable PyPI release.

Development

Run the standard checks with a repository-local pytest temp directory:

python -m pytest -q --basetemp=.tmp/pytest-basetemp
python -m compileall -q src tests

Run CUDA integration tests in an environment that has Dr.Jit CUDA, slangc, CUDA, CMake, Ninja, and a working C++/CUDA compiler toolchain:

python -m pytest -q -m cuda_integration --basetemp=.tmp/pytest-basetemp

If CUDA integration tests are skipped, inspect the skip reason (for example, by adding -rs to the pytest command) before treating native build support as verified.

License

slang2drjit is distributed under the MIT License. See LICENSE.
