From 876ece44be6d36838b7e753184ee0c090791249c Mon Sep 17 00:00:00 2001 From: Jessica Clarke Date: Sun, 25 Jan 2026 17:27:32 +0000 Subject: [PATCH] [aaelf64-morello][aapcs64-morello] Document pure-capability benchmark ABI This has existed for several years but was never specified, so attempt to do so in key places. Omissions are likely but this should give enough detail that anyone caring about this niche within a niche can figure out any missed implications themselves. --- aaelf64-morello/aaelf64-morello.rst | 41 ++++++++++++++++++++++++++--- aapcs64-morello/aapcs64-morello.rst | 14 +++++++++- 2 files changed, 51 insertions(+), 4 deletions(-) diff --git a/aaelf64-morello/aaelf64-morello.rst b/aaelf64-morello/aaelf64-morello.rst index c07d069b..9a0ea4b0 100644 --- a/aaelf64-morello/aaelf64-morello.rst +++ b/aaelf64-morello/aaelf64-morello.rst @@ -306,6 +306,24 @@ A Morello toolchain can emit ELF Note sections in accordance to [CHERI_ELF_]. The following Morello-specific ELF Note types are used, allocated from the space reserved by [CHERI_ELF_] for processor-specific use: +.. _Morello-specific note types: + +.. class:: aaelf64-morello-note-types + +.. table:: Morello-specific note types + + +------------+----------------------------------------+-----------------------------------------------------------+ + | Value | Name | Description | + +------------+----------------------------------------+-----------------------------------------------------------+ + | 0x80000000 | NT_CHERI_MORELLO_PURECAP_BENCHMARK_ABI | Whether the object uses the pure-capability benchmark ABI | + +------------+----------------------------------------+-----------------------------------------------------------+ + +.. note:: + + NT_CHERI_MORELLO_PURECAP_BENCHMARK_ABI has a Desc Size of 4, and Desc should + be interpreted as a 4-byte boolean value, with values other than 0 and 1 + reserved. + .. _Morello-specific NT_CHERI_TLS_ABI types: .. 
class:: aaelf64-morello-NT_CHERI_TLS_ABI-types @@ -346,8 +364,10 @@ expected definition. The type of any other symbol defined in an executable section can be ``STT_NOTYPE``. A linker is only required to provide long-branch and PLT support for symbols of type ``STT_FUNC``. A linker is also only required to provide -interworking support for A64 and C64 symbols of type ``STT_FUNC`` (interworking -for untyped symbols must be encoded directly in the object file) +interworking support for A64 and C64 symbols of type ``STT_FUNC``, and only if +not using the pure-capability benchmark ABI (interworking for untyped symbols +or the pure-capability benchmark ABI must be encoded directly in the object +file). Symbol names ^^^^^^^^^^^^ @@ -451,7 +471,8 @@ apply to symbols of type ``STT_FUNC`` and ``STT_GNU_IFUNC``: - If the symbol addresses a C64 instruction, its value is the address of the instruction with bit 0 set (in a relocatable object, the section offset with - bit 0 set). + bit 0 set) if not using the pure-capability benchmark ABI, otherwise it is + the same as for a symbol addressing an A64 instruction. .. note:: This allows a linker to distinguish A64 and C64 code symbols without having @@ -459,6 +480,10 @@ apply to symbols of type ``STT_FUNC`` and ``STT_GNU_IFUNC``: C64 symbol will always have an odd value. However, a linker should strip the discriminating bit from the value before using it for relocation. + Due to the pure-capability benchmark ABI using integer BR/BLR for indirect + calls, bit 0 is part of the branch target rather than the new value to use + for PSTATE.C64, and so this distinction cannot be used. + Relocation ---------- @@ -1258,6 +1283,11 @@ allow correct linker relaxation: .tlsdesccall sym blr c1 +.. note:: + + In the pure-capability benchmark ABI, the final ``blr c1`` is replaced with + ``blr x1``. 
+ General Dynamic to Initial Exec relaxation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1398,6 +1428,11 @@ allow correct linker relaxation: .tgot_tlsdesccall sym blr c2 +.. note:: + + In the pure-capability benchmark ABI, the final ``blr c2`` is replaced with + ``blr x2``. + General Dynamic to Initial Exec relaxation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/aapcs64-morello/aapcs64-morello.rst b/aapcs64-morello/aapcs64-morello.rst index 69aa0675..e7dfaad4 100644 --- a/aapcs64-morello/aapcs64-morello.rst +++ b/aapcs64-morello/aapcs64-morello.rst @@ -10,6 +10,8 @@ affiliates. All rights reserved. .. _AAPCS64: https://github.com/ARM-software/abi-aa/releases +.. |UCAM-CL-TR-986-url| replace:: https://ctsrd-cheri.github.io/morello-early-performance-results/ +.. _UCAM-CL-TR-986: https://ctsrd-cheri.github.io/morello-early-performance-results/ Morello extensions to Procedure Call Standard for the Arm® 64-bit Architecture (AArch64) **************************************************************************************** @@ -216,6 +218,8 @@ This document refers to, or is referred to by, the following documents. +------------------+--------------------------+--------------------------------------------------------------------------------------------+ | AAPCS64_ | IHI 005D | Procedure Call Standard for the Arm 64-bit Architecture. | +------------------+--------------------------+--------------------------------------------------------------------------------------------+ + | UCAM-CL-TR-986_ | |UCAM-CL-TR-986-url| | Early performance results from the prototype Morello microarchitecture | + +------------------+--------------------------+--------------------------------------------------------------------------------------------+ Terms and Abbreviations ----------------------- @@ -242,6 +246,12 @@ Deriving a capability when CV2 is a copy of CV1 with optionally removed permissions and/or optionally narrowed bounds (base increased or limit reduced).
+Pure-capability benchmark ABI + A variant of the normal pure-capability ABI to work around known + limitations in the microarchitecture of the Morello implementation and more + closely model the essential overheads of CHERI. See UCAM-CL-TR-986_ for + more details of the motivation. + More specific terminology is defined when it is first used. .. raw:: pdf @@ -486,6 +496,8 @@ Subroutine Calls The A64 and C64 states contain primitive subroutine call instructions, BL and BLR, which perform a branch-with-link operation. The effect of executing BL is to transfer the sequentially next value of the program counter - the return address - into the link register (LR or CLR) and the destination address into the program counter. The effect of executing BLR is similar except that the new PC value is constructed from the specified register. +.. note:: + For C64 code in the pure-capability benchmark ABI, the value of CLR after a BL or BLR will have bit 0 set and must be cleared by the callee before returning with ``ret x30``. Use of CIP0 and CIP1 by the linker ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -718,7 +730,7 @@ Interworking between data model variants of AAPCS64 (although technically possib Interworking between AAPCS64 and AAPCS64-cap is not supported. -Interworking between A64 and C64 states is supported. The linker will insert a veneer at direct branches between different states. The veneer will perform both the state switch and range extensions. It is the responsibility of the callee to switch state on return. +Interworking between A64 and C64 states is supported if not using the pure-capability benchmark ABI. The linker will insert a veneer at direct branches between different states. The veneer will perform both the state switch and range extensions. It is the responsibility of the callee to switch state on return. .. raw:: pdf