Skip to content

[Pipeline] Fix Python packaging pipeline failures#29200

Open
hariharans29 wants to merge 15 commits into
mainfrom
hari/py_pipeline_fix
Open

[Pipeline] Fix Python packaging pipeline failures#29200
hariharans29 wants to merge 15 commits into
mainfrom
hari/py_pipeline_fix

Conversation

@hariharans29

@hariharans29 hariharans29 commented Jun 20, 2026

Copy link
Copy Markdown
Member

Description

PR that introduced issue: #29064

Fixes in this PR:

  1. Add relevant platform guard in some tests that was previously missing

  2. Added the new AVX512 headers that host the kernels to their right location within the cmake file grouping - previously they were placed in AVX2 grouping (ultimately the TU that included those headers were compiled with AVX512 flags - so no harm was done). This fix is more pedantic than fixing a real issue. The lone .cpp file in that list didn't include any intrinsics manually but the compiler might use AVX512 now for auto-vectorization with the shuffling. Since that file contains only the pre-packing functions that are used in production, it is safe. The "scalar" kernel implementation in that file is mostly a test oracle - nothing else

Sample failed run (before PR): https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=1268723&view=results
Sample successful run (with PR): https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=1273196&view=results

Motivation and Context

Fix Python packaging pipeline

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Python packaging CI failures by preventing AVX-512–only MLAS unit tests from running on hosts without AVX-512, and tidies MLAS CMake source grouping so AVX-512 2-bit kernel sources/headers are associated with the AVX-512 compilation group (instead of AVX2).

Changes:

  • Added GTEST_SKIP() guards to AVX-512–dependent “Scalar_*” W2 SQNBitGEMM tests when GetMlasPlatform().Avx512Supported_ is false.
  • Moved AVX-512 W2 2-bit kernel sources/headers from the AVX2 CMake source list to the AVX-512 core list.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
onnxruntime/test/mlas/unittest/test_sqnbitgemm_2bit_gemm.cpp Skips AVX-512–dependent unit tests on non-AVX-512 hosts to avoid CI/runtime failures.
cmake/onnxruntime_mlas.cmake Re-groups AVX-512 W2 2-bit kernel files under the AVX-512 core source set for correct organization/flag association.

@hariharans29 hariharans29 requested a review from tianleiwu June 20, 2026 20:06

@tianleiwu tianleiwu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sensible, low-risk fix for the packaging-pipeline regression from #29064. The CMake regrouping correctly moves the avx512_2bit sources into the AVX512-core group, and the Avx512Supported_ skip guards are the right mechanism (consistent with test_sq8bitgemm.cpp).

One robustness suggestion (non-blocking): the scalar oracle's int8 dot-product loop is now compiled with -mavx512vnni, so the compiler may auto-vectorize it to vpdpbusd, while the test guard only checks Avx512Supported_ (AVX512F+BW/DQ/VL, no VNNI). On an AVX-512-core host without VNNI (Skylake-X/SP) the guard passes but an auto-vectorized vpdpbusd would SIGILL — the same failure class this PR fixes. See the inline note for options. The CI agents passing today evidently have VNNI, so this is hardening rather than a regression.

Comment thread cmake/onnxruntime_mlas.cmake
tianleiwu
tianleiwu previously approved these changes Jun 21, 2026

@tianleiwu tianleiwu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks correct and complete. The earlier concern — that the scalar-oracle TU is now compiled with -mavx512vnni and the compiler may auto-vectorize the uint8 × int8 → int32 reduction to vpdpbusd, which would SIGILL on an AVX-512-core-without-VNNI host — is fully closed: all six Scalar_* tests now gate on QNBitGemmDispatch == &MlasSQNBitGemmDispatchAvx512vnni, which is the highest QNBit AVX-512 dispatch tier in platform.cpp. So VNNI hosts run the tests and Skylake-X/SP-class hosts skip them safely.

The CMake regrouping (moving the AVX-512 2-bit kernel sources/headers from the AVX2 list into the AVX-512-core list) correctly associates them with the -mavx512vnni -mavx512bw -mavx512dq -mavx512vl flags, matching where the dispatch actually lives.

Deferring the more invasive option (a dedicated non-VNNI CMake bucket for the scalar oracle so the tests can run universally without dispatch-aware gating) to a follow-up is reasonable.

LGTM.

@hariharans29

Copy link
Copy Markdown
Member Author

Addressed PR feedback by stripping -mavx512vnni from the scalar oracle TU rather than tightening the test guard. Cleaner fix because the TU also hosts SQ2BitGemmPackQuantBDataAndBlkSum_Scalar, which is installed in the AVX-512 W2 dispatch and runs at model load on AVX-512-only (non-VNNI) hosts. Without the flag strip, the compiler could autovectorize that pack helper to vpdpbusd and we’d hit an illegal-instruction SIGILL in production on those hosts. Test guard remains at Avx512Supported_ - it is safe now with the cmake change

tianleiwu
tianleiwu previously approved these changes Jun 22, 2026
Comment thread cmake/onnxruntime_mlas.cmake
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants