Merged
ML-Frameworks/pytorch-aarch64/CHANGELOG.md (2 changes: 1 addition & 1 deletion)
@@ -45,11 +45,11 @@ where `YY` is the year, and `MM` the month of the increment.
- `IDEEP_HASH=e087b6e4b32a7ba684db82231d1558123968ac1d`, from ideep_pytorch, May 11th, 2026.
- `ONEDNN_HASH=3004f0a1d9cf92c06eaaca57840aaa2149ebba85`, from main, May 27th, 2026.
- `KLEIDIAI_HASH=5866364d3bc079d2d6cae5f0acf6d076594bc7a7`, v1.25.0 from main, May 28th, 2026.
- Replaces `ACL_VERSION=v52.8.0` with `ACL_VERSION=v53.1.0`, from main, May 18th.
- Updates `OPENBLAS_VERSION` from `d26960a21ec5da7f77377f28bd6e230060841ae0` to v0.3.33, from main, Apr 23rd.
- Updates `torchvision` from 0.26.0.dev20260329 to 0.28.0.dev20260527.

### Removed
- Removes `ACL_VERSION` which references library that is no longer present.
- Disables PyTorch [PR #182655](https://github.com/pytorch/pytorch/pull/182655), to update the PyTorch CI build scripts.
- Disables PyTorch [PR #170600](https://github.com/pytorch/pytorch/pull/170600), to gate deletion of clean-up steps in build_common.sh.
- Disables PyTorch [PR #167328](https://github.com/pytorch/pytorch/pull/167328), to build cpuinfo into c10 shared library.
ML-Frameworks/pytorch-aarch64/README.md (5 changes: 1 addition & 4 deletions)
@@ -109,10 +109,7 @@ Note: use the `CommitDate` for the trailing comments unless otherwise specified

#### Tags

For these dependencies, you should assign the latest tag from the releases to the appropriate variable in `versions.sh` (e.g. assign the latest tag for `ComputeLibrary` to `ACL_VERSION`).

- ComputeLibrary: https://github.com/ARM-software/ComputeLibrary/tags
- Pick the newest release tag.
For these dependencies, you should assign the latest tag from the releases to the appropriate variable in `versions.sh` (e.g. assign the latest tag for `OpenBLAS` to `OPENBLAS_VERSION`).

- OpenBLAS: https://github.com/OpenMathLib/OpenBLAS/tags
- Pick the newest release tag.
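A minimal shell sketch of the "pick the newest release tag" step for OpenBLAS, assuming GNU-style `sort -V` (version sort) is available. `newest_release_tag` is a hypothetical helper, not part of `build-wheel.sh` or `versions.sh`; the tag it prints would still be copied into `OPENBLAS_VERSION` in `versions.sh` by hand.

```shell
# Hypothetical helper: read tag names on stdin and print the newest
# vMAJOR.MINOR.PATCH release tag. Pre-release tags such as v0.3.33-rc1
# are filtered out by the regex before sorting.
newest_release_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}

# Usage (needs network access): list OpenBLAS tags without cloning,
# strip the "refs/tags/" prefix, then pick the newest release.
# git ls-remote --tags --refs https://github.com/OpenMathLib/OpenBLAS.git \
#   | sed 's#.*refs/tags/##' | newest_release_tag
```

Version sort matters here: a plain lexical sort would place `v0.3.9` after `v0.3.33`.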
ML-Frameworks/pytorch-aarch64/build-wheel.sh (3 changes: 1 addition & 2 deletions)
@@ -94,10 +94,9 @@ mkdir -p "${OUTPUT_LOCAL_DIR}"

trap cleanup EXIT

echo "Building local manywheel builder image with ACL_VERSION=${ACL_VERSION} and OPENBLAS_VERSION=${OPENBLAS_VERSION}"
echo "Building local manywheel builder image with OPENBLAS_VERSION=${OPENBLAS_VERSION}"
(
cd "${PYTORCH_LOCAL_DIR}"
ACL_VERSION="${ACL_VERSION}" \
OPENBLAS_VERSION="${OPENBLAS_VERSION}" \
MAX_JOBS="${MAX_JOBS}" \
USE_CCACHE="${USE_CCACHE}" \
ML-Frameworks/pytorch-aarch64/examples/README.md (46 changes: 20 additions & 26 deletions)
@@ -1,5 +1,5 @@
<!--
SPDX-FileCopyrightText: Copyright 2021-2025 Arm Limited and affiliates.
SPDX-FileCopyrightText: Copyright 2021-2026 Arm Limited and affiliates.

SPDX-License-Identifier: Apache-2.0
-->
@@ -9,25 +9,23 @@ SPDX-License-Identifier: Apache-2.0
<!-- Generated with VS Code's 'Markdown All in One' extension. -->
<!-- Regenerate with: 'Markdown All in One: Update Table of Contents'. -->

- [Examples](#examples)
- [Description](#description)
- [Vision](#vision)
- [Image classification](#image-classification)
- [Object detection](#object-detection)
- [Natural Language Processing (NLP)](#natural-language-processing-nlp)
- [Question answering](#question-answering)
- [Dynamic quantization](#dynamic-quantization)
- [General optimization guidelines](#general-optimization-guidelines)
- [Weight prepacking](#weight-prepacking)
- [General flags](#general-flags)
- [Compiled mode flags](#compiled-mode-flags)
- [Eager mode flags](#eager-mode-flags)
- [Generative AI](#generative-ai)
- [4 bit Dynamic Quantization](#4-bit-dynamic-quantization)
- [Vision](#vision-1)
- [Command-Line Options](#command-line-options)
- [Text Generation](#text-generation)
- [Command-Line Options](#command-line-options-1)
- [Description](#description)
- [Vision](#vision)
- [Image classification](#image-classification)
- [Object detection](#object-detection)
- [Natural Language Processing (NLP)](#natural-language-processing-nlp)
- [Question answering](#question-answering)
- [Dynamic quantization](#dynamic-quantization)
- [General optimization guidelines](#general-optimization-guidelines)
- [General flags](#general-flags)
- [Compiled mode flags](#compiled-mode-flags)
- [Eager mode flags](#eager-mode-flags)
- [Generative AI](#generative-ai)
- [4 bit Dynamic Quantization](#4-bit-dynamic-quantization)
- [Vision](#vision-1)
- [Command-Line Options](#command-line-options)
- [Text Generation](#text-generation)
- [Command-Line Options](#command-line-options-1)

## Description

@@ -56,7 +54,7 @@ The file [`resnet_v1-50.yml`](resnet_v1-50.yml) provides, in [YAML format](https

### Object detection

The script [`detect_objects.py`](detect_object.py) demonstrates how to run object detection using SSD-ResNet-34.
The script [`detect_objects.py`](detect_objects.py) demonstrates how to run object detection using SSD-ResNet-34.

The SSD-ResNet-34 model is trained from the Common Object in Context (COCO) image dataset.
This is a multiscale SSD (Single Shot Detection) model based on the ResNet-34 backbone network that performs object detection.
@@ -198,15 +196,11 @@ Note that in the above data we used the `--warmup` flag to run the model once be

## General optimization guidelines

### Weight prepacking

`Linear` layers calling [Arm ComputeLibrary](https://github.com/ARM-software/ComputeLibrary) (ACL) matmuls reorder weights during runtime by default. These reorders can be eliminated by calling `pack_linear_weights` as shown in `pack_linear_weights.py`. This improves the performance of any models calling a `Linear` layer multiple times.

### General flags

There are several flags which typically improve the performance of PyTorch.

`DNNL_DEFAULT_FPMATH_MODE`: setting the environment variable `DNNL_DEFAULT_FPMATH_MODE` to `BF16` or `ANY` will instruct ACL to dispatch fp32 workloads to bfloat16 kernels where hardware support permits. _Note: this may introduce a drop in accuracy._
`DNNL_DEFAULT_FPMATH_MODE`: setting the environment variable `DNNL_DEFAULT_FPMATH_MODE` to `BF16` or `ANY` will result in fp32 workloads being dispatched to bfloat16 kernels where hardware support permits. _Note: this may introduce a drop in accuracy._

You can control the number of threads with `OMP_NUM_THREADS`, smaller models may perform better with fewer threads.

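For reference, the two flags from the General flags section can be set together on a single command line. This is a minimal sketch: `run_with_perf_flags` is a hypothetical helper, and the thread count passed to it is an assumption to tune per model and machine.

```shell
# Hypothetical wrapper: run a command with the General flags applied.
# First argument: thread count (an assumption; smaller models may do
# better with fewer threads). Remaining arguments: the command to run.
run_with_perf_flags() {
  local threads="$1"
  shift
  # BF16 lets fp32 workloads be dispatched to bfloat16 kernels where
  # hardware support permits; this may introduce a drop in accuracy.
  DNNL_DEFAULT_FPMATH_MODE=BF16 OMP_NUM_THREADS="$threads" "$@"
}

# Usage (your_script.py is a placeholder for one of the example scripts):
# run_with_perf_flags 16 python your_script.py
```

The variables are set only for the wrapped command, so they do not leak into the calling shell.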
ML-Frameworks/pytorch-aarch64/versions.sh (1 change: 0 additions & 1 deletion)
@@ -15,7 +15,6 @@ ONEDNN_HASH=3004f0a1d9cf92c06eaaca57840aaa2149ebba85 # From main, May 27th, 2
KLEIDIAI_HASH=5866364d3bc079d2d6cae5f0acf6d076594bc7a7 # v1.25.0 from main, May 28th, 2026

# build-wheel.sh deps
ACL_VERSION="v53.1.0" # May 18th
OPENBLAS_VERSION="v0.3.33" # Apr 23rd

# Dockerfile deps