Conversation
This workflow enables running PyTorch upstream tests with NPU patches:
- Clones the official PyTorch v2.7.1 repository for test source
- Applies test_upsteam patches from the current repository
- Runs pytest with NPU device support
- Supports shard-based parallel execution (default 3 shards)
- Triggers on push/PR/schedule/workflow_dispatch
- Change runner from [self-hosted, npu-910b] to linux-aarch64-a3-2
- Set NUM_SHARDS to 40 (each shard ~2.5% of tests)
- Enable concurrent execution of all 40 shards (max-parallel: 40)
CLA Signature Pass
kerer-ai, thanks for your pull request. All authors of the commits have signed the CLA. 👍
Each shard now contains ~1% of tests instead of ~2.5%, reducing the chance of a single shard containing multiple problematic test files.
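The shard arithmetic can be illustrated with a minimal sketch. The workflow's actual partitioning logic is not shown in this conversation, so the round-robin assignment, the helper name `split_into_shards`, and the 1000 synthetic file names below are assumptions; 100 shards is what the ~1% figure implies.

```python
from typing import List


def split_into_shards(test_files: List[str], num_shards: int) -> List[List[str]]:
    """Distribute test files round-robin over num_shards shards (hypothetical helper)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    ordered = sorted(test_files)  # stable order keeps shard contents reproducible
    return [ordered[i::num_shards] for i in range(num_shards)]


# Synthetic file list, for illustration only.
files = [f"test/test_{i:03d}.py" for i in range(1000)]
shards_40 = split_into_shards(files, 40)
shards_100 = split_into_shards(files, 100)
print(len(shards_40[0]), len(shards_100[0]))  # 25 10
```

With fewer files per shard, a crash in one test file takes down a smaller share of the run.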
…5667405
- Identified 21 shards that crashed during test execution
- Total 126 unique test files causing process crashes (SIGSEGV/SIGABRT)
- Categories: distributed, dynamo, functorch, nn, profiler, quantization, etc.
…D.yml
- Revert commit 148e92b changes to disabled_testcases.json
- Add test/test_proxy_tensor.py to CRASHED.yml blacklist
- test_make_fx_exhaustive__native_batch_norm_legit_npu causes segfault
- Add new job 'upload_torch_npu_wheel' that downloads the wheel from the build job and re-uploads it with a clearer artifact name, 'torch-npu-wheel-2.7.1'
- Increase retention days from 7 to 30 for easier access
- Update test and report job dependencies accordingly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change "Non-Passing Shards" to "分片任务详情" (Shard Task Details)
- Show all shards in the detail table instead of only failed ones
- Add "总用例数" (Total Cases) and "通过用例数" (Passed Cases) columns for better visibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove --index-url for PyTorch CPU wheels
- Use default PyPI, which has aarch64 wheels for torch 2.7.1
- PyTorch CPU index only provides x86_64 wheels, not ARM64

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change default shard_end from 100 to 10
- This is a temporary change to validate the report format changes
- Will restore to 100 after validation passes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change from direct pytest execution to run_test.py invocation
- Add --parallel parameter (default 2) to control NUM_PARALLEL_PROCS
- Execute from the test directory (run_test.py expects this working dir)
- Strip 'test/' prefix from paths for the run_test.py -i argument
- run_test.py automatically runs distributed tests serially

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Parse JUnit XML files to extract testsuite-level statistics
- Display test file name, passed/failed/error counts, and duration
- Format: "test_file.py: 5 passed, 2 failed, 1 error, 3.2s"
- Replace "Scope" column with "测试文件详情" (Test File Details) column

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
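The testsuite-level summary described above can be sketched roughly as follows. This is not the report generator's actual code: the function names and the sample XML are hypothetical, and it assumes the standard JUnit attributes (`tests`, `failures`, `errors`, `skipped`, `time`) are present on each `<testsuite>` element.

```python
import xml.etree.ElementTree as ET


def summarize_testsuite(suite: ET.Element) -> str:
    """Format one <testsuite> element as a single summary line."""
    tests = int(suite.get("tests", 0))
    failures = int(suite.get("failures", 0))
    errors = int(suite.get("errors", 0))
    skipped = int(suite.get("skipped", 0))
    duration = float(suite.get("time", 0.0))
    # JUnit XML has no "passed" attribute, so derive it from the other counts.
    passed = tests - failures - errors - skipped
    name = suite.get("name", "unknown")
    return f"{name}: {passed} passed, {failures} failed, {errors} error, {duration:.1f}s"


def summarize_junit_xml(xml_text: str) -> list:
    """Return one summary line per testsuite in a JUnit XML document."""
    root = ET.fromstring(xml_text)
    # The root may be <testsuites> or a bare <testsuite>.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    return [summarize_testsuite(s) for s in suites]


# Hypothetical sample input, for illustration only.
SAMPLE = """<testsuites>
  <testsuite name="test_file.py" tests="8" failures="2" errors="1" skipped="0" time="3.2"/>
</testsuites>"""
print(summarize_junit_xml(SAMPLE)[0])
# test_file.py: 5 passed, 2 failed, 1 error, 3.2s
```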
run_test.py check_pip_packages() requires these packages:
- pytest-rerunfailures
- pytest-flakefinder
- pytest-xdist (already installed)

Without these, run_test.py exits with error code 1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
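A pre-flight check for these packages could look like the sketch below. It is not the implementation of `check_pip_packages()` in run_test.py; the helper name `missing_packages` is hypothetical, and it only asks `importlib.metadata` whether each distribution is installed.

```python
from importlib import metadata

# Distribution names taken from the commit message above.
REQUIRED = ["pytest-rerunfailures", "pytest-flakefinder", "pytest-xdist"]


def missing_packages(required):
    """Return the distribution names from `required` that are not installed."""
    missing = []
    for name in required:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


missing = missing_packages(REQUIRED)
if missing:
    print("pip install " + " ".join(missing))
else:
    print("all required pytest plugins are installed")
```

Running a check like this before invoking run_test.py surfaces a missing plugin as a clear message rather than a bare exit code 1.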
run_test.py expects test names without the .py extension. For example, it expects 'custom_backend/test_custom_backend', not 'custom_backend/test_custom_backend.py'. The strip_test_prefix function now removes both the 'test/' prefix and the '.py' suffix from test paths before passing them to run_test.py -i.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
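The described behaviour of strip_test_prefix can be sketched as below. Only the function name and its stated effect come from the commit message; the body is an assumption, not the repository's actual code.

```python
def strip_test_prefix(path: str) -> str:
    """Turn 'test/foo/test_bar.py' into 'foo/test_bar' (assumed behaviour)."""
    if path.startswith("test/"):
        path = path[len("test/"):]
    if path.endswith(".py"):
        path = path[: -len(".py")]
    return path


print(strip_test_prefix("test/custom_backend/test_custom_backend.py"))
# custom_backend/test_custom_backend
```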
run_test.py has a predefined TESTS list and rejects tests from directories like custom_backend/ and custom_operator/.

Solution:
- Validate tests against known unrecognized prefixes
- Run valid tests via run_test.py (file-level parallel)
- Run unrecognized tests via direct pytest fallback

This allows shard 1 tests (custom_backend, custom_operator) to run while still benefiting from run_test.py parallelism for other tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
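The two-phase split above can be sketched as a simple partition on path prefixes. The constant `UNRECOGNIZED_PREFIXES` and the helper `partition_tests` are hypothetical names; the two prefixes are the ones mentioned in the commit message, and the inputs are assumed to be names already relative to the test directory.

```python
# Prefixes that run_test.py is said to reject; the real list may be longer.
UNRECOGNIZED_PREFIXES = ("custom_backend/", "custom_operator/")


def partition_tests(test_names):
    """Split test names into (run_test.py candidates, pytest fallback)."""
    valid, fallback = [], []
    for name in test_names:
        # str.startswith accepts a tuple of prefixes.
        if name.startswith(UNRECOGNIZED_PREFIXES):
            fallback.append(name)
        else:
            valid.append(name)
    return valid, fallback


valid, fallback = partition_tests(
    ["test_nn", "custom_backend/test_custom_backend", "distributed/test_store"]
)
print(valid)     # ['test_nn', 'distributed/test_store']
print(fallback)  # ['custom_backend/test_custom_backend']
```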
The Phase 2 pytest fallback was passing full paths like 'test/custom_backend/test_custom_backend.py', but pytest runs from test_dir, so paths should be relative (e.g. 'custom_backend/test_custom_backend.py'). The strip_test_prefix function removes both 'test/' and '.py', so we add '.py' back for pytest, which expects file paths with extensions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
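A minimal sketch of the corrected Phase 2 invocation, assuming the names have already been through strip_test_prefix. The helper `build_pytest_fallback` and the example directory are hypothetical; the command is only constructed here, not executed.

```python
import sys
from pathlib import Path


def build_pytest_fallback(stripped_names, test_dir):
    """Build the pytest command and working directory for fallback tests."""
    # Names arrive without 'test/' and '.py'; pytest needs the extension back.
    files = [name + ".py" for name in stripped_names]
    argv = [sys.executable, "-m", "pytest", *files]
    # Paths are relative to test_dir, so pytest must run with cwd=test_dir.
    return argv, Path(test_dir)


argv, cwd = build_pytest_fallback(
    ["custom_backend/test_custom_backend"], "pytorch-test-src/test"
)
print(argv[-1])  # custom_backend/test_custom_backend.py
# To actually run it: subprocess.run(argv, cwd=cwd)
```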
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Changes:
1. Workflow: Upload both test-reports/ and pytorch-test-src/test/test-reports/ to capture Phase 1 run_test.py output
2. Report generator: Improved testsuite aggregation that:
   - Parses ALL XML files (not just shard-specific ones)
   - Filters by planned test files using test identifier matching
   - Handles both Phase 1 (run_test.py) and Phase 2 (pytest) results

This fixes:
- INCOMPLETE status when run_test.py produced XMLs in a different directory
- Empty "测试文件详情" (Test File Details) column (now shows per-test-file statistics)
- Note column now properly shows the test results summary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Phase 1 run_test.py output XMLs are stored in nested directories:
pytorch-test-src/test/test-reports/python-pytest/{test_identifier}/
Each directory contains multiple XML files (one per worker due to parallel
execution), and the testsuite name is "pytest" (generic), not the specific
test file identifier.
This fix:
- Uses the parent directory name as the test identifier for Phase 1 XMLs
- Aggregates stats from all XML files in the same directory
- Overrides INCOMPLETE status when Phase 1 XMLs exist with test results
- Parses testcase file attribute for Phase 2 XMLs to identify test files
This resolves the issue where shards showed INCOMPLETE status and an empty
"测试文件详情" (Test File Details) column even when Phase 1 XMLs with test results existed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
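The directory-based aggregation for Phase 1 XMLs can be sketched as follows. This is an illustration under stated assumptions, not the report generator's code: `aggregate_phase1` is a hypothetical name, the XML layout in the demo is synthetic, and each `<testsuite>` is assumed to carry the usual JUnit count attributes.

```python
import tempfile
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def aggregate_phase1(report_root):
    """Sum testsuite counts per test identifier (the XML's parent directory name)."""
    totals = defaultdict(lambda: {"tests": 0, "failures": 0, "errors": 0, "skipped": 0})
    for xml_path in sorted(Path(report_root).rglob("*.xml")):
        # The testsuite name is the generic "pytest", so use the directory instead.
        identifier = xml_path.parent.name
        root = ET.parse(xml_path).getroot()
        suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
        for suite in suites:
            for key in totals[identifier]:
                totals[identifier][key] += int(suite.get(key, 0))
    return dict(totals)


# Synthetic demo: two worker XMLs in one identifier directory.
with tempfile.TemporaryDirectory() as tmp:
    case_dir = Path(tmp) / "test_example"
    case_dir.mkdir()
    for worker, (tests, failures) in enumerate([(3, 1), (4, 0)]):
        (case_dir / f"worker{worker}.xml").write_text(
            f'<testsuites><testsuite name="pytest" tests="{tests}" '
            f'failures="{failures}" errors="0" skipped="0"/></testsuites>'
        )
    result = aggregate_phase1(tmp)

print(result["test_example"])
# {'tests': 7, 'failures': 1, 'errors': 0, 'skipped': 0}
```

A shard whose planned test files all appear in such a result with nonzero `tests` would then no longer need to be reported as INCOMPLETE.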
Test case run