Add INT8 GEMM support to the GEMM operator #94
Open
albiol2004 wants to merge 3 commits into amd:devel
Wire up existing INT8 matmul kernels (i8→i8, i8→i16, i8→i32) through the Python GEMM operator layer. Fix get_arg_spec() to pass correct dtype to AIERuntimeArgSpec (was defaulting to bfloat16 for all types). Closes issue amd#93
dtype_in and dtype_out have repr=False, so they are excluded from the auto-generated operator name. When a bf16 and an int8 GEMM share the same dimensions (M, K, N, tiles, columns), they produce identical xclbin filenames. The first to compile wins; the second silently reuses the wrong binary, producing garbage output. Override the name property to append the dtype suffix (e.g. _i8_i32) when dtype_in is not the default bf16. bf16 names are unchanged for backward compatibility.
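The collision described above can be reproduced in isolation. The sketch below is not the project's operator code: `NaiveGEMM`, `DtypeAwareGEMM`, and the `gemm_` name format are hypothetical stand-ins, assuming only that the auto-generated name is built from fields declared with `repr=True`.

```python
from dataclasses import dataclass, field, fields


@dataclass
class NaiveGEMM:
    """Hypothetical operator whose name is built only from repr=True fields."""

    M: int
    K: int
    N: int
    dtype_in: str = field(default="bf16", repr=False)
    dtype_out: str = field(default="bf16", repr=False)

    @property
    def name(self) -> str:
        # Mimic an auto-generated name: fields with repr=False are skipped,
        # so dtype_in and dtype_out never reach the name.
        parts = [f"{f.name}{getattr(self, f.name)}" for f in fields(self) if f.repr]
        return "gemm_" + "_".join(parts)


@dataclass
class DtypeAwareGEMM(NaiveGEMM):
    """Same fields, but the name gains a dtype suffix for non-bf16 inputs."""

    @property
    def name(self) -> str:
        base = super().name
        if self.dtype_in != "bf16":
            base += f"_{self.dtype_in}_{self.dtype_out}"
        return base


bf16_op = NaiveGEMM(64, 64, 64)
int8_op = NaiveGEMM(64, 64, 64, dtype_in="i8", dtype_out="i32")
# Both names are identical, so a cache keyed on the name would mix them up.
print(bf16_op.name, int8_op.name)

fixed_int8 = DtypeAwareGEMM(64, 64, 64, dtype_in="i8", dtype_out="i32")
fixed_bf16 = DtypeAwareGEMM(64, 64, 64)
# The int8 name is now distinct; the bf16 name is unchanged.
print(fixed_int8.name, fixed_bf16.name)
```

With a name-keyed artifact cache, the first two operators would resolve to the same file, which is the failure mode this commit targets.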
hunhoffe
reviewed
Apr 9, 2026
    identical dimensions."""
    base = super().name
    if self.dtype_in != "bf16":
        base += f"_{self.dtype_in}_{self.dtype_out}"
Collaborator
I think it might make sense to always change it to include dtype in/out... thoughts @andrej ?
hunhoffe
reviewed
Apr 9, 2026
From review feedback: replace GEMM's private _np_dtype_map with a shared np_dtype_map in test_utils.py, derived from the existing torch_dtype_map so the two stay in sync.
Wire up existing INT8 matmul kernels (i8→i8, i8→i16, i8→i32) through the
Python GEMM operator layer. The C++ kernels already had the templates and
compile flags; this connects them to the Python API.
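To make the three output variants concrete, here is a minimal pure-Python reference for an int8 GEMM. It is a sketch, not the PR's `reference.py`: the function names are hypothetical, and narrowing the accumulator by two's-complement wraparound is an assumption for illustration (the real kernels may saturate or shift instead).

```python
def wrap_signed(value: int, bits: int) -> int:
    """Reduce an integer to a signed two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def int8_gemm_reference(a, b, out_bits=32):
    """Multiply int8 matrices exactly, then narrow each result to out_bits.

    ASSUMPTION: wraparound narrowing; this is illustrative only.
    """
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    if any(not -128 <= x <= 127 for row in (*a, *b) for x in row):
        raise ValueError("inputs must fit in int8")
    n = len(b[0])
    return [
        [wrap_signed(sum(row[p] * b[p][j] for p in range(k)), out_bits) for j in range(n)]
        for row in a
    ]


a = [[-128, -128]]
b = [[-128], [-128]]
# The exact result is 2 * 16384 = 32768, one past the int16 maximum.
print(int8_gemm_reference(a, b, out_bits=32))  # [[32768]]
print(int8_gemm_reference(a, b, out_bits=16))  # [[-32768]]
print(int8_gemm_reference(a, b, out_bits=8))   # [[0]]
```

Even K = 2 can exceed the int16 range in the worst case, which is why the choice of output type matters when comparing against a reference.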
Also fixes a pre-existing bug in get_arg_spec() where AIERuntimeArgSpec defaulted all buffers to bfloat16, causing silent data corruption for any non-bf16 output type.
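The following sketch illustrates why a wrong dtype on a buffer spec tends to corrupt data silently rather than raise an error. It does not use the real AIERuntimeArgSpec; `ITEMSIZE` and `buffer_nbytes` are hypothetical helpers, and the exact failure mode in the operator may differ.

```python
import struct

# Hypothetical element sizes in bytes for the dtype strings used in this PR.
ITEMSIZE = {"bf16": 2, "i8": 1, "i16": 2, "i32": 4}


def buffer_nbytes(shape, dtype="bf16"):
    """Bytes needed for a dense buffer; the bf16 default mirrors the bug described above."""
    count = 1
    for dim in shape:
        count *= dim
    return count * ITEMSIZE[dtype]


# A 64x64 int32 output needs twice the bytes that a bf16-sized spec reserves.
print(buffer_nbytes((64, 64)), buffer_nbytes((64, 64), "i32"))  # 8192 16384

# Reading int32 data through a 16-bit view yields plausible-looking wrong numbers.
raw = struct.pack("<2i", 1, -2)
print(struct.unpack("<4h", raw))  # (1, 0, -2, -1)
```

Nothing in either step fails outright, which matches the "silent" part of the bug report: sizes and values are simply wrong.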
Closes #93
Added
- … (dtype_in="i8") with i8, i16, i32 output types
- microkernel_mac_dim_map
- … (-Di8_i32_ONLY, etc.)
- reference.py
Changed
- get_arg_spec() now passes correct dtype to AIERuntimeArgSpec
- dtype_in/dtype_out (existing bf16 tests unchanged)
Removed