
Conversation

@alextmagro
Contributor

Removes padding for the scale vectors that are used mainly for MXFP8.

if (params.m % 16 || params.n % 16) {
  GTEST_SKIP() << "MXFP8 requires M & N to be multiples of 16";
}
if (params.k % 128) {

Collaborator

Is it a hipBLASLt limitation?

Contributor Author

Yes, these are the values the hipBLASLt team provided to us. I tested just in case, but nothing smaller than 128 works for K.
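
For reference, a minimal sketch of the constraint being tested, assuming the limits quoted above from the hipBLASLt team (M and N multiples of 16, K a multiple of 128); the helper name is hypothetical:

```python
def mxfp8_gemm_shape_supported(m: int, n: int, k: int) -> bool:
    """Hypothetical check mirroring the limits above: M, N % 16 == 0 and K % 128 == 0."""
    return m % 16 == 0 and n % 16 == 0 and k % 128 == 0


assert mxfp8_gemm_shape_supported(m=32, n=32, k=128)      # the 32x128x32 test config
assert mxfp8_gemm_shape_supported(m=16, n=16, k=128)      # the 16x128x16 test config
assert not mxfp8_gemm_shape_supported(m=32, n=32, k=64)   # K smaller than 128 is rejected
```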

Collaborator

Is the 32x128x32 config needed with 16x128x16 then?

Contributor Author

I would say it makes sense to keep. It allows us to test a TE-acceptable size with 32 while also ensuring that unpadding and hipBLASLt work with 16.

Collaborator

In this case I'd change 32x128x32 to 32x128x16 to test that they work together.

NVTE_DIM_CHECK(chunk_height > 0 && chunk_width > 0, "Attempted to get empty tensor chunk");
NVTE_DIM_CHECK(chunk_height <= height && chunk_width <= width,
               "Attempted to get out-of-bounds tensor chunk");
#ifndef __HIP_PLATFORM_AMD__

Collaborator

I think this file is not currently compiled for ROCm; it is for UB.

Contributor Author

Yes, I can move it to the UB PR if you prefer?

Collaborator

Yes, better to move it to the UB PR, because this file would require more changes than those ifdefs, right?

Contributor Author

Yes, makes sense.

@alextmagro alextmagro requested a review from ipanfilo February 6, 2026 23:41

@pytest.mark.skipif(not mxfp8_available, reason=reason_for_no_mxfp8)
@pytest.mark.parametrize("N", [32])
@pytest.mark.parametrize("K", [128])
@pytest.mark.parametrize("M", [32])

Collaborator

Better to use a non-multiple of 32 to test that this path is unpadding.

Contributor Author

We require block sizes of 32 at the Python level, so it is not possible to use a non-multiple. We are padding scales, so we will see a rowwise scale of (1, 4) padded to (128, 4) and a colwise scale of (4, 1) padded to (4, 128).
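
For illustration, a minimal sketch of the shapes described above; the padded layouts and the slicing are illustrative assumptions about the checkpoint format, not the exact TE code:

```python
import torch


def unpad_scale_sketch(padded: torch.Tensor, logical_shape: tuple) -> torch.Tensor:
    """Hypothetical helper: slice a padded scale-inv tensor back to its logical shape."""
    rows, cols = logical_shape
    return padded[:rows, :cols]


# Rowwise scale of logical shape (1, 4) stored padded to (128, 4);
# columnwise scale of logical shape (4, 1) stored padded to (4, 128).
rowwise_padded = torch.zeros(128, 4, dtype=torch.uint8)
colwise_padded = torch.zeros(4, 128, dtype=torch.uint8)

assert unpad_scale_sketch(rowwise_padded, (1, 4)).shape == (1, 4)
assert unpad_scale_sketch(colwise_padded, (4, 1)).shape == (4, 1)
```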

return 0.0


def unpad_scales(tensor: torch.Tensor, transpose: bool) -> torch.Tensor:

Collaborator

I wonder if it can be called once when tensors are created?

Contributor Author

When a user creates a tensor for the first time, we don't do padding to begin with; this logic is for loading NV checkpoints only. I was thinking that when we load a PyTorch checkpoint, the tensors are filled without calling the initializers, potentially missing this logic. Is there a way to guarantee the unpadding function is called when a tensor is loaded?

Collaborator

I think init should be called when the checkpoint is loading. Or loading can be intercepted by overriding load_from_state_dict.
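
For illustration, a minimal sketch of that interception idea using PyTorch's `_load_from_state_dict` hook; the module class, the `_scale_inv` key suffix, and the `unpad_scales` stub are assumptions, not the actual TE implementation:

```python
import torch


def unpad_scales(tensor: torch.Tensor, transpose: bool) -> torch.Tensor:
    """Stand-in for the test helper above; the real version would slice off the padding."""
    return tensor


class MXFP8Module(torch.nn.Module):  # hypothetical module, for illustration only
    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        # Unpad any padded scale-inv entries before the default copy logic runs,
        # so loading an NV checkpoint cannot bypass the unpadding step.
        for key in list(state_dict.keys()):
            if key.startswith(prefix) and key.endswith("_scale_inv"):
                state_dict[key] = unpad_scales(state_dict[key], transpose=False)
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)
```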
