[Pipeline] Fix Python packaging pipeline failures#29200
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes Python packaging CI failures by preventing AVX-512–only MLAS unit tests from running on hosts without AVX-512, and tidies MLAS CMake source grouping so AVX-512 2-bit kernel sources/headers are associated with the AVX-512 compilation group (instead of AVX2).
Changes:
- Added
GTEST_SKIP()guards to AVX-512–dependent “Scalar_*” W2 SQNBitGEMM tests whenGetMlasPlatform().Avx512Supported_is false. - Moved AVX-512 W2 2-bit kernel sources/headers from the AVX2 CMake source list to the AVX-512 core list.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| onnxruntime/test/mlas/unittest/test_sqnbitgemm_2bit_gemm.cpp | Skips AVX-512–dependent unit tests on non-AVX-512 hosts to avoid CI/runtime failures. |
| cmake/onnxruntime_mlas.cmake | Re-groups AVX-512 W2 2-bit kernel files under the AVX-512 core source set for correct organization/flag association. |
tianleiwu
left a comment
There was a problem hiding this comment.
Sensible, low-risk fix for the packaging-pipeline regression from #29064. The CMake regrouping correctly moves the avx512_2bit sources into the AVX512-core group, and the Avx512Supported_ skip guards are the right mechanism (consistent with test_sq8bitgemm.cpp).
One robustness suggestion (non-blocking): the scalar oracle's int8 dot-product loop is now compiled with -mavx512vnni, so the compiler may auto-vectorize it to vpdpbusd, while the test guard only checks Avx512Supported_ (AVX512F+BW/DQ/VL, no VNNI). On an AVX-512-core host without VNNI (Skylake-X/SP) the guard passes but an auto-vectorized vpdpbusd would SIGILL — the same failure class this PR fixes. See the inline note for options. The CI agents passing today evidently have VNNI, so this is hardening rather than a regression.
tianleiwu
left a comment
There was a problem hiding this comment.
The fix looks correct and complete. The earlier concern — that the scalar-oracle TU is now compiled with -mavx512vnni and the compiler may auto-vectorize the uint8 × int8 → int32 reduction to vpdpbusd, which would SIGILL on an AVX-512-core-without-VNNI host — is fully closed: all six Scalar_* tests now gate on QNBitGemmDispatch == &MlasSQNBitGemmDispatchAvx512vnni, which is the highest QNBit AVX-512 dispatch tier in platform.cpp. So VNNI hosts run the tests and Skylake-X/SP-class hosts skip them safely.
The CMake regrouping (moving the AVX-512 2-bit kernel sources/headers from the AVX2 list into the AVX-512-core list) correctly associates them with the -mavx512vnni -mavx512bw -mavx512dq -mavx512vl flags, matching where the dispatch actually lives.
Deferring the more invasive option (a dedicated non-VNNI CMake bucket for the scalar oracle so the tests can run universally without dispatch-aware gating) to a follow-up is reasonable.
LGTM.
|
Addressed PR feedback by stripping -mavx512vnni from the scalar oracle TU rather than tightening the test guard. Cleaner fix because the TU also hosts |
Description
PR that introduced issue: #29064
Fixes in this PR:
Add relevant platform guard in some tests that was previously missing
Added the new AVX512 headers that host the kernels to their right location within the cmake file grouping - previously they were placed in AVX2 grouping (ultimately the TU that included those headers were compiled with AVX512 flags - so no harm was done). This fix is more pedantic than fixing a real issue. The lone .cpp file in that list didn't include any intrinsics manually but the compiler might use AVX512 now for auto-vectorization with the shuffling. Since that file contains only the pre-packing functions that are used in production, it is safe. The "scalar" kernel implementation in that file is mostly a test oracle - nothing else
Sample failed run (before PR): https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=1268723&view=results
Sample successful run (with PR): https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=1273196&view=results
Motivation and Context
Fix Python packaging pipeline