[Pipeline] Fix Python packaging pipeline failures by hariharans29 · Pull Request #29200 · microsoft/onnxruntime

hariharans29 · 2026-06-20T19:55:33Z

Description

PR that introduced issue: #29064

Fixes in this PR:

Add relevant platform guard in some tests that was previously missing
Added the new AVX512 headers that host the kernels to their right location within the cmake file grouping - previously they were placed in AVX2 grouping (ultimately the TU that included those headers were compiled with AVX512 flags - so no harm was done). This fix is more pedantic than fixing a real issue. The lone .cpp file in that list didn't include any intrinsics manually but the compiler might use AVX512 now for auto-vectorization with the shuffling. Since that file contains only the pre-packing functions that are used in production, it is safe. The "scalar" kernel implementation in that file is mostly a test oracle - nothing else

Sample failed run (before PR): https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=1268723&view=results
Sample successful run (with PR): https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=1273196&view=results

Motivation and Context

Fix Python packaging pipeline

Copilot

Pull request overview

Fixes Python packaging CI failures by preventing AVX-512–only MLAS unit tests from running on hosts without AVX-512, and tidies MLAS CMake source grouping so AVX-512 2-bit kernel sources/headers are associated with the AVX-512 compilation group (instead of AVX2).

Changes:

Added GTEST_SKIP() guards to AVX-512–dependent “Scalar_*” W2 SQNBitGEMM tests when GetMlasPlatform().Avx512Supported_ is false.
Moved AVX-512 W2 2-bit kernel sources/headers from the AVX2 CMake source list to the AVX-512 core list.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
onnxruntime/test/mlas/unittest/test_sqnbitgemm_2bit_gemm.cpp	Skips AVX-512–dependent unit tests on non-AVX-512 hosts to avoid CI/runtime failures.
cmake/onnxruntime_mlas.cmake	Re-groups AVX-512 W2 2-bit kernel files under the AVX-512 core source set for correct organization/flag association.

tianleiwu

Sensible, low-risk fix for the packaging-pipeline regression from #29064. The CMake regrouping correctly moves the avx512_2bit sources into the AVX512-core group, and the Avx512Supported_ skip guards are the right mechanism (consistent with test_sq8bitgemm.cpp).

One robustness suggestion (non-blocking): the scalar oracle's int8 dot-product loop is now compiled with -mavx512vnni, so the compiler may auto-vectorize it to vpdpbusd, while the test guard only checks Avx512Supported_ (AVX512F+BW/DQ/VL, no VNNI). On an AVX-512-core host without VNNI (Skylake-X/SP) the guard passes but an auto-vectorized vpdpbusd would SIGILL — the same failure class this PR fixes. See the inline note for options. The CI agents passing today evidently have VNNI, so this is hardening rather than a regression.

tianleiwu

The fix looks correct and complete. The earlier concern — that the scalar-oracle TU is now compiled with -mavx512vnni and the compiler may auto-vectorize the uint8 × int8 → int32 reduction to vpdpbusd, which would SIGILL on an AVX-512-core-without-VNNI host — is fully closed: all six Scalar_* tests now gate on QNBitGemmDispatch == &MlasSQNBitGemmDispatchAvx512vnni, which is the highest QNBit AVX-512 dispatch tier in platform.cpp. So VNNI hosts run the tests and Skylake-X/SP-class hosts skip them safely.

The CMake regrouping (moving the AVX-512 2-bit kernel sources/headers from the AVX2 list into the AVX-512-core list) correctly associates them with the -mavx512vnni -mavx512bw -mavx512dq -mavx512vl flags, matching where the dispatch actually lives.

Deferring the more invasive option (a dedicated non-VNNI CMake bucket for the scalar oracle so the tests can run universally without dispatch-aware gating) to a follow-up is reasonable.

LGTM.

hariharans29 · 2026-06-21T08:09:05Z

Addressed PR feedback by stripping -mavx512vnni from the scalar oracle TU rather than tightening the test guard. Cleaner fix because the TU also hosts SQ2BitGemmPackQuantBDataAndBlkSum_Scalar, which is installed in the AVX-512 W2 dispatch and runs at model load on AVX-512-only (non-VNNI) hosts. Without the flag strip, the compiler could autovectorize that pack helper to vpdpbusd and we’d hit an illegal-instruction SIGILL in production on those hosts. Test guard remains at Avx512Supported_ - it is safe now with the cmake change

hariharans29 added 12 commits June 19, 2026 18:28

Debug

231d70b

Add debug file

6bba3dd

More debug

d5bab99

Add right test exclusion when the test runs on non-AVX512 hosts

8d1fccd

Fixes

86302dd

More debug

f26fe93

More debug 2

8b8dc5f

Debug 3

118ba2b

Debug 5

dd744c9

Debug 6

f2ea65b

Debug 7

412a5b4

Debug 8

ce168d2

hariharans29 requested a review from Copilot June 20, 2026 19:57

Copilot started reviewing on behalf of hariharans29 June 20, 2026 19:58 View session

Copilot AI reviewed Jun 20, 2026

View reviewed changes

hariharans29 requested a review from tianleiwu June 20, 2026 20:06

tianleiwu reviewed Jun 20, 2026

View reviewed changes

Comment thread cmake/onnxruntime_mlas.cmake

PR feedback

492a5f7

tianleiwu previously approved these changes Jun 21, 2026

View reviewed changes

Re-work PR feedback

5bf8ca7

hariharans29 dismissed tianleiwu’s stale review via 5bf8ca7 June 21, 2026 07:57

hariharans29 requested a review from tianleiwu June 21, 2026 08:02

tianleiwu previously approved these changes Jun 22, 2026

View reviewed changes

edgchen1 reviewed Jun 22, 2026

View reviewed changes

Comment thread cmake/onnxruntime_mlas.cmake

PR feedback 2

17d323c

hariharans29 dismissed tianleiwu’s stale review via 17d323c June 22, 2026 17:24

edgchen1 approved these changes Jun 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Pipeline] Fix Python packaging pipeline failures#29200

[Pipeline] Fix Python packaging pipeline failures#29200
hariharans29 wants to merge 15 commits into
mainfrom
hari/py_pipeline_fix

hariharans29 commented Jun 20, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

tianleiwu left a comment

Uh oh!

Uh oh!

tianleiwu left a comment

Uh oh!

hariharans29 commented Jun 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

hariharans29 commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Uh oh!

hariharans29 commented Jun 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hariharans29 commented Jun 20, 2026 •

edited

Loading