Add INT8 GEMM support to the GEMM operator#94

Open
albiol2004 wants to merge 3 commits into amd:devel from albiol2004:int8-gemm

albiol2004 commented Apr 9, 2026

Wire up the existing INT8 matmul kernels (i8→i8, i8→i16, i8→i32) through the
Python GEMM operator layer. The C++ kernels already had the templates and
compile flags; this PR connects them to the Python API.

Also fixes a pre-existing bug in get_arg_spec() where AIERuntimeArgSpec
defaulted all buffers to bfloat16, causing silent data corruption for any
non-bf16 output type.
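
To illustrate why a wrong buffer dtype corrupts data silently instead of raising an error, the sketch below reinterprets int32 results as 2-byte words. It is not the operator's code: `uint16` stands in for bfloat16 only because both are 2 bytes wide, and the sample values are made up for the example.

```python
import struct

# Four int32 results as a kernel might write them (little-endian bytes).
results = [1, -2, 300, 70000]
raw = struct.pack("<4i", *results)

# Correct view: the host reads the buffer as 4-byte signed integers.
as_i32 = list(struct.unpack("<4i", raw))

# Wrong view: the same bytes read as 2-byte words, the width of bfloat16.
# uint16 is only a 2-byte stand-in here (assumption for illustration).
as_u16 = list(struct.unpack("<8H", raw))

print(as_i32)  # the original values
print(as_u16)  # twice as many elements, none of them meaningful
```

No exception is raised in the wrong view; the element count simply doubles and the values no longer match. A buffer sized for 2-byte elements would also hold only half of the int32 outputs.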

Closes #93

Added

  • INT8 input support (dtype_in="i8") with i8, i16, i32 output types
  • INT8 MAC dimensions (8,8,8) for npu1/npu2 in microkernel_mac_dim_map
  • INT8 kernel compilation flags (-Di8_i32_ONLY, etc.)
  • INT8 golden reference with int32 accumulation in reference.py
  • 5 INT8 test configurations (4/8 columns, all output types, row/col-major B)
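
The int32 accumulation mentioned for the golden reference can be sketched as follows. This is an illustrative plain-Python version, not the code in reference.py; the function name `gemm_i8_reference` and the explicit range checks are assumptions made for the example.

```python
I8_MIN, I8_MAX = -128, 127
I32_MIN, I32_MAX = -(2**31), 2**31 - 1


def gemm_i8_reference(a, b):
    """Multiply a (M x K) by b (K x N), both nested lists of int8 values.

    Hypothetical helper: returns the M x N sums accumulated at int32 width.
    """
    k = len(b)
    n = len(b[0])
    out = []
    for row in a:
        assert len(row) == k, "inner dimensions must match"
        out_row = []
        for j in range(n):
            acc = 0
            for p in range(k):
                x, y = row[p], b[p][j]
                assert I8_MIN <= x <= I8_MAX and I8_MIN <= y <= I8_MAX
                acc += x * y
            # Python ints are unbounded, so check the int32 range explicitly.
            assert I32_MIN <= acc <= I32_MAX, "accumulator would overflow int32"
            out_row.append(acc)
        out.append(out_row)
    return out


a = [[127, 127, 127, 127]]          # 1 x 4
b = [[127], [127], [127], [127]]    # 4 x 1
c = gemm_i8_reference(a, b)
print(c)
```

With K = 4 the single output is 4 × 127 × 127 = 64516, already beyond the int16 maximum of 32767, which is why the accumulator has to be wider than the inputs.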

Changed

  • get_arg_spec() now passes correct dtype to AIERuntimeArgSpec
  • Test params include dtype_in/dtype_out (existing bf16 tests unchanged)
  • bf16-specific flags (prio_accuracy, bfp16 emulation) skipped for INT8

Removed

  Wire up existing INT8 matmul kernels (i8→i8, i8→i16, i8→i32) through
  the Python GEMM operator layer. Fix get_arg_spec() to pass correct
  dtype to AIERuntimeArgSpec (was defaulting to bfloat16 for all types).

  Closes issue amd#93
  dtype_in and dtype_out have repr=False, so they are excluded from the
  auto-generated operator name. When a bf16 and an int8 GEMM share the
  same dimensions (M, K, N, tiles, columns), they produce identical
  xclbin filenames. The first to compile wins; the second silently
  reuses the wrong binary, producing garbage output.

  Override the name property to append the dtype suffix (e.g. _i8_i32)
  when dtype_in is not the default bf16. bf16 names are unchanged for
  backward compatibility.
    identical dimensions."""
    base = super().name
    if self.dtype_in != "bf16":
        base += f"_{self.dtype_in}_{self.dtype_out}"
Collaborator

I think it might make sense to always change it to include dtype in/out... thoughts @andrej ?
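
The name collision and the conditional suffix discussed in this thread can be sketched with a small dataclass. This is a minimal sketch under stated assumptions: `BaseOp`, `GemmOp`, and the `gemm_MxKxN` name format are hypothetical stand-ins, not the project's classes; only the `repr=False` fields and the suffix logic follow the description in the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class BaseOp:
    """Hypothetical base operator whose name ignores the dtype fields."""
    M: int
    K: int
    N: int
    dtype_in: str = field(default="bf16", repr=False)
    dtype_out: str = field(default="bf16", repr=False)

    @property
    def name(self) -> str:
        # Assumed format: built from the dimension fields only.
        return f"gemm_{self.M}x{self.K}x{self.N}"


@dataclass
class GemmOp(BaseOp):
    @property
    def name(self) -> str:
        base = super().name
        # Append the dtype suffix only for non-default input types,
        # so existing bf16 names stay unchanged.
        if self.dtype_in != "bf16":
            base += f"_{self.dtype_in}_{self.dtype_out}"
        return base


# Without the override, both configurations map to the same name.
collide_a = BaseOp(256, 256, 256).name
collide_b = BaseOp(256, 256, 256, "i8", "i32").name

# With the override, the INT8 variant gets a distinct name.
bf16_name = GemmOp(256, 256, 256).name
int8_name = GemmOp(256, 256, 256, "i8", "i32").name
print(collide_a == collide_b, bf16_name, int8_name)
```

Always including the dtypes, as suggested in the comment above, would amount to dropping the `if` and appending the suffix unconditionally, at the cost of renaming existing bf16 artifacts.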

  from review feedback: replace GEMM's private _np_dtype_map with a
  shared np_dtype_map in test_utils.py, derived from the existing
  torch_dtype_map to stay in sync

Development

Successfully merging this pull request may close these issues.

INT8 GEMM support

2 participants