Add INT8 GEMM support to the GEMM operator by albiol2004 · Pull Request #94 · amd/IRON

albiol2004 · 2026-04-09T10:04:57Z

Wire up existing INT8 matmul kernels (i8→i8, i8→i16, i8→i32) through the
Python GEMM operator layer. The C++ kernels already had the templates and
compile flags; this change connects them to the Python API.

Also fixes a pre-existing bug in get_arg_spec() where AIERuntimeArgSpec
defaulted all buffers to bfloat16, causing silent data corruption for any
non-bf16 output type.
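
To illustrate why a wrong buffer dtype corrupts data without raising an error, here is a minimal NumPy sketch. It does not use AIERuntimeArgSpec or any IRON API, and the exact failure mechanism inside get_arg_spec() is not shown in this PR. It assumes only that an int32 result buffer ends up being read as 2-byte elements, with np.uint16 standing in for bfloat16 (which NumPy does not provide natively).

```python
import numpy as np

# An int32 result as the kernel would write it (values are arbitrary).
result = np.array([1, -2, 300000, 7], dtype=np.int32)
raw = result.tobytes()

# Wrong: read the same bytes as 2-byte elements. np.uint16 is a stand-in
# for bfloat16 here; the point is the element width, not the encoding.
wrong = np.frombuffer(raw, dtype=np.uint16)

# Right: read the bytes with the dtype that actually produced them.
right = np.frombuffer(raw, dtype=np.int32)

print(wrong.size, right.size)
print(np.array_equal(right, result))
```

No exception is raised in the wrong case: the bytes are valid under either interpretation, so the mismatch only shows up as incorrect values or element counts.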

Closes #93

Added

INT8 input support (dtype_in="i8") with i8, i16, i32 output types
INT8 MAC dimensions (8,8,8) for npu1/npu2 in microkernel_mac_dim_map
INT8 kernel compilation flags (-Di8_i32_ONLY, etc.)
INT8 golden reference with int32 accumulation in reference.py
5 INT8 test configurations (4/8 columns, all output types, row/col-major B)
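
A golden reference with int32 accumulation could look roughly like the sketch below. This is not the code in reference.py; the function name int8_gemm_reference is hypothetical, and narrowing to i8 or i16 outputs is deliberately left out because the PR text does not say how the kernels narrow (saturate or wrap).

```python
import numpy as np

def int8_gemm_reference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical reference: widen int8 inputs to int32 before the matmul
    so that products and partial sums cannot overflow the 8-bit range."""
    assert a.dtype == np.int8 and b.dtype == np.int8
    return a.astype(np.int32) @ b.astype(np.int32)

# Worst-case positive inputs: every product is 127 * 127.
a = np.full((2, 64), 127, dtype=np.int8)
b = np.full((64, 2), 127, dtype=np.int8)

c = int8_gemm_reference(a, b)   # accumulates in int32
naive = a @ b                   # stays int8, so the result wraps around

print(c.dtype, c[0, 0])
print(naive.dtype, naive[0, 0])
```

Comparing c with naive shows why the reference has to widen first: the int8 matmul silently wraps, so it cannot serve as ground truth for an i8→i32 kernel.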

Changed

get_arg_spec() now passes correct dtype to AIERuntimeArgSpec
Test params include dtype_in/dtype_out (existing bf16 tests unchanged)
bf16-specific flags (prio_accuracy, bfp16 emulation) skipped for INT8

Removed


dtype_in and dtype_out have repr=False, so they are excluded from the auto-generated operator name. When a bf16 and an int8 GEMM share the same dimensions (M, K, N, tiles, columns), they produce identical xclbin filenames. The first to compile wins; the second silently reuses the wrong binary, producing garbage output.

Override the name property to append the dtype suffix (e.g. _i8_i32) when dtype_in is not the default bf16. bf16 names are unchanged for backward compatibility.
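
The collision and the fix can be reproduced with plain dataclasses. The sketch below is an assumption-laden stand-in, not IRON's operator classes: BaseOp, Gemm, and the "gemm_" name format are hypothetical, and only the repr=False fields and the suffix logic mirror what the commit message and the diff describe.

```python
from dataclasses import dataclass, field, fields

@dataclass
class BaseOp:
    # Hypothetical base operator: builds its name from fields with repr=True.
    M: int
    K: int
    N: int

    @property
    def name(self) -> str:
        parts = [f"{f.name}{getattr(self, f.name)}" for f in fields(self) if f.repr]
        return "gemm_" + "_".join(parts)

@dataclass
class Gemm(BaseOp):
    # repr=False keeps the dtypes out of the auto-generated base name.
    dtype_in: str = field(default="bf16", repr=False)
    dtype_out: str = field(default="bf16", repr=False)

    @property
    def name(self) -> str:
        base = super().name
        # Append a suffix only for non-default dtypes so bf16 names stay stable.
        if self.dtype_in != "bf16":
            base += f"_{self.dtype_in}_{self.dtype_out}"
        return base

bf16_op = Gemm(64, 64, 64)
int8_op = Gemm(64, 64, 64, dtype_in="i8", dtype_out="i32")

# Without the override, both operators would map to the same artifact name.
colliding = BaseOp.name.fget(int8_op)

print(bf16_op.name)
print(int8_op.name)
print(colliding == bf16_op.name)
```

Always including the dtypes in the name would remove the special case, at the cost of renaming existing bf16 artifacts.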

hunhoffe · 2026-04-09T14:47:38Z

iron/operators/gemm/op.py

+        identical dimensions."""
+        base = super().name
+        if self.dtype_in != "bf16":
+            base += f"_{self.dtype_in}_{self.dtype_out}"


I think it might make sense to always change it to include dtype in/out... thoughts @andrej ?


Add INT8 GEMM support to the GEMM operator

24e6204

Wire up existing INT8 matmul kernels (i8→i8, i8→i16, i8→i32) through the Python GEMM operator layer. Fix get_arg_spec() to pass correct dtype to AIERuntimeArgSpec (was defaulting to bfloat16 for all types). Closes issue amd#93

albiol2004 requested review from andrej, hunhoffe and jgmelber as code owners April 9, 2026 10:04

hunhoffe reviewed Apr 9, 2026


Move dtype map to shared np_dtype_map in test_utils

a2cb969

from review feedback: replace GEMM's private _np_dtype_map with a shared np_dtype_map in test_utils.py, derived from the existing torch_dtype_map to stay in sync

albiol2004 wants to merge 3 commits into amd:devel from albiol2004:int8-gemm


2 participants
