@nnethercote
Contributor

I have an ASUS GX10 Ascent which has an ARM CPU with 10 performance cores and 10 efficiency cores. This PR is enough to get rustc-perf working on it.

@nnethercote nnethercote requested a review from Kobzol December 18, 2025 04:21
If we have an inclusive range like `0-3`, that's four cores, not three.
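The off-by-one described in this commit can be illustrated with a minimal sketch. The function name `count_cpus` and the comma-separated list format are assumptions for illustration; this is not the code from the PR.

```rust
/// Count the CPUs named by a Linux-style CPU list such as "0-3,8,10-11".
/// Hypothetical helper for illustration only.
fn count_cpus(list: &str) -> Option<usize> {
    let mut total = 0;
    for part in list.trim().split(',') {
        let part = part.trim();
        match part.split_once('-') {
            Some((lo, hi)) => {
                let lo: usize = lo.trim().parse().ok()?;
                let hi: usize = hi.trim().parse().ok()?;
                if hi < lo {
                    return None;
                }
                // Inclusive range: both endpoints count, hence the `+ 1`.
                total += hi - lo + 1;
            }
            None => {
                // A single CPU id with no range.
                part.parse::<usize>().ok()?;
                total += 1;
            }
        }
    }
    Some(total)
}
```

With this sketch, `count_cpus("0-3")` yields `Some(4)`, whereas computing `hi - lo` alone would give three.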
@nnethercote
Contributor Author

Note: I haven't touched the `Target` enum, which currently has a single variant, `X86_64UnknownLinuxGnu`. Presumably it should gain an ARM variant, but I wasn't sure, because that type doesn't seem to be meaningfully used at the moment.

Member

@Kobzol Kobzol left a comment

The package-lock.json change looks unrelated, could you revert it please?

CC @Jamesbarford if you have any comments as an ARM expert :)

Otherwise I'm fine with merging ofc, as it solves your use-case.

@Jamesbarford
Contributor

> The package-lock.json change looks unrelated, could you revert it please?
>
> CC @Jamesbarford if you have any comments as an ARM expert :)
>
> Otherwise I'm fine with merging ofc, as it solves your use-case.

As far as I understand there are 2 physical clusters with different cache sizes which could impact performance runs?

> Each cluster has 5 high-performance cores (ARM v9.2, Cortex-X925) with 2MB L2 cache each, and 5 high-efficiency cores (ARM v9.2, Cortex-A725) with 512KB L2 cache each. The high-performance cluster features 16MB L3 cache, while the high-efficiency cluster is equipped with an 8MB L3 cache.

reference: https://docs.nvidia.com/dgx/dgx-spark-porting-guide/overview.html

@nnethercote
Contributor Author

> 2 physical clusters with different cache sizes

True, good catch. Here's what lstopo shows:

[image: lstopo output]

I'm restricting execution to cores 5-9 (8MiB L3 cache) and 15-19 (16MiB L3 cache). The L3 cache sizes must explain some of the variance in the cpu_capacity numbers (e.g. some P-cores are 997, some are 1017). I guess I could check the cache configuration (in `/sys/devices/system/cpu/cpu*/cache/index*/size`), but I don't think I care enough: L3 cache size will only affect a few things like wall-time, and shouldn't affect icounts or the other profilers like Cachegrind and DHAT. I'll update the comments to note this, though.
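For anyone who does want to check the cache configuration, a rough sketch of reading it follows. The helper names are hypothetical, and it assumes that `index3` is the L3 cache on the machine in question; a more careful version would read the sibling `level` file to confirm.

```rust
use std::fs;

/// Parse a sysfs cache size string such as "8192K" into bytes.
/// Hypothetical helper; the "M" suffix is handled only defensively.
fn parse_cache_size(s: &str) -> Option<u64> {
    let s = s.trim();
    let (digits, mult) = if let Some(d) = s.strip_suffix('K') {
        (d, 1024)
    } else if let Some(d) = s.strip_suffix('M') {
        (d, 1024 * 1024)
    } else {
        (s, 1)
    };
    digits.parse::<u64>().ok()?.checked_mul(mult)
}

/// Read the L3 size for one CPU.
/// Assumption: `index3` corresponds to the L3 cache on this machine.
fn l3_size_bytes(cpu: usize) -> Option<u64> {
    let path = format!("/sys/devices/system/cpu/cpu{cpu}/cache/index3/size");
    parse_cache_size(&fs::read_to_string(path).ok()?)
}
```

Comparing `l3_size_bytes(5)` with `l3_size_bytes(15)` would then show whether the two core groups sit behind differently sized L3 caches.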

Specifically:
- Add support for detecting ARM hybrid CPUs, via a heuristic on CPU
  "capacity".
- Adjust ARM-specific event names as necessary, e.g.
  `armv8_pmuv3_0/instructions:u/` -> `instructions:u`.
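
As a rough illustration of a capacity-based heuristic, the sketch below treats a core as a performance core if its capacity is within 10% of the highest capacity seen. The function name, the input shape, and the 90% threshold are all assumptions made for this example; the heuristic actually used in the PR may differ. The capacities would typically come from `/sys/devices/system/cpu/cpu*/cpu_capacity`.

```rust
/// Pick likely performance cores from `(cpu id, capacity)` pairs.
/// Hypothetical heuristic: keep cores whose capacity is at least 90% of the
/// maximum, which tolerates small differences such as 997 vs 1017.
fn guess_performance_cores(capacities: &[(usize, u32)]) -> Vec<usize> {
    let Some(max) = capacities.iter().map(|&(_, c)| c).max() else {
        return Vec::new();
    };
    capacities
        .iter()
        // Integer form of `c >= 0.9 * max`, widened to avoid overflow.
        .filter(|&&(_, c)| u64::from(c) * 10 >= u64::from(max) * 9)
        .map(|&(cpu, _)| cpu)
        .collect()
}
```

A relative threshold is used here so that P-cores with slightly different capacities are still grouped together, while cores with much lower capacity are excluded.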

There is also some refactoring of the existing code for handling Intel
hybrid architectures, e.g. merging `run_on_p_cores` into
`performance_cores`, to avoid code duplication.
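
The event-name adjustment listed above can be sketched with plain string handling. This is an illustrative alternative with a hypothetical function name, not the regex-based code from the PR; it assumes the PMU-qualified form is always `pmu/event/`.

```rust
/// Strip a PMU prefix from a perf event name, e.g.
/// "armv8_pmuv3_0/instructions:u/" becomes "instructions:u".
/// Names without the `pmu/event/` shape are returned unchanged.
fn strip_pmu_prefix(event: &str) -> &str {
    match event.strip_suffix('/').and_then(|e| e.split_once('/')) {
        Some((_pmu, name)) if !name.is_empty() => name,
        _ => event,
    }
}
```

Under this sketch, names that are already in the short form, such as `instructions:u`, pass through untouched, so the same lookup code can serve both x86 and ARM output.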
@nnethercote
Contributor Author

I moved the regex and updated the comment about the ASUS GX10 to mention the L3 cache.

Contributor

@Jamesbarford Jamesbarford left a comment

Looks good to me 👍. (I’m on a phone and GitHub won’t let me select the Approve radio button, thus this is a comment 🤷🏻‍♂️)
