threads: Implement asymmetric atomic fences #60311

Keno · 2025-12-04T04:09:15Z

Asymmetric atomic fences are a performance optimization of regular atomic fences (the seq_cst version of which we expose as `Base.Threads.atomic_fence`). The problem with these regular fences is that they require a CPU fence instruction, which can be very expensive and is thus unsuitable for code in the hot path. Asymmetric fences, on the other hand, split an ordinary fence into two sides: a `light` side where the fence is extremely cheap (only a compiler reordering barrier) and a `heavy` side where the fence is very expensive.

Basically the way it works is that the heavy side does a system call that issues an inter-processor interrupt (IPI), which then issues the appropriate barrier instruction on the other CPU (i.e. both CPUs will have issued a barrier instruction; one of them just does it asynchronously due to the interrupt).
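
To make the pairing concrete, here is a minimal Linux-only C sketch of the classic store-then-load handshake that asymmetric fences are meant for. It is not the code from this PR: the names `fast_flag`, `slow_flag`, `heavy_setup`, `fast_try_enter`, and `slow_try_enter` are hypothetical, the light side is modeled with C11 `atomic_signal_fence` (a compiler-only barrier), and the heavy side calls `membarrier` directly with no fallback.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/membarrier.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static atomic_int fast_flag; /* set by the hot-path thread (hypothetical name) */
static atomic_int slow_flag; /* set by the rare-path thread (hypothetical name) */

/* One-time setup: Linux requires a process to register before it may use
 * the private expedited command. Returns 0 on success, -1 if unavailable. */
int heavy_setup(void) {
    return (int)syscall(SYS_membarrier,
                        MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0);
}

/* Hot path: publish our flag, then check the other side. Only a compiler
 * barrier separates the store from the load; no CPU fence is emitted. */
int fast_try_enter(void) {
    atomic_store_explicit(&fast_flag, 1, memory_order_relaxed);
    atomic_signal_fence(memory_order_seq_cst); /* light fence */
    return atomic_load_explicit(&slow_flag, memory_order_relaxed) == 0;
}

/* Rare path: publish our flag, then ask the kernel to make every other
 * thread of this process execute a barrier before we read its flag. */
int slow_try_enter(void) {
    atomic_store_explicit(&slow_flag, 1, memory_order_relaxed);
    if (syscall(SYS_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0) != 0)
        return 0; /* no heavy fence available: conservatively refuse */
    return atomic_load_explicit(&fast_flag, memory_order_relaxed) == 0;
}
```

With both fences in place, two threads racing through `fast_try_enter` and `slow_try_enter` should never both get 1 back, and the hot path pays only for a compiler barrier.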

The `light` and `heavy` naming here is taken from C++ P1202R5 [1], which is the proposal for the same feature in the C++ standard library (to appear in the next iteration of the C++ concurrency spec).

On the Julia side, these functions are exposed as `Threads.atomic_fence_light` and `Threads.atomic_fence_heavy`. The light side lowers to `fence singlethread` in LLVM IR (the `Core.Intrinsic` `atomic_fence` is adjusted appropriately to facilitate this). The heavy side has OS-specific implementations, where:

1. Linux/FreeBSD try to use the `membarrier` syscall, falling back to `mprotect` on systems that don't have it.
2. Windows uses the `FlushProcessWriteBuffers` syscall.
3. macOS uses an implementation from the dotnet runtime (dotnet/runtime#44670), which the dotnet folks have checked with Apple does the right thing by happenstance (i.e. an IPI/memory barrier is needed to execute the syscall), but looks a little nonsensical by itself. However, since it's what Apple recommended to dotnet, I don't see much risk here, though I wouldn't be surprised if Apple added a proper syscall for this in the future (since FreeBSD has it now).
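
For the Linux case, the "try `membarrier`, else fall back to `mprotect`" strategy could look roughly like the sketch below. This is an illustration under assumptions, not the PR's actual `src/signals-unix.c` code: `heavy_fence_init` and `heavy_fence` are hypothetical names, and the fallback follows the widely used trick of toggling protections on a dirty page so that the kernel sends TLB-shootdown IPIs, which is a side effect rather than a documented guarantee.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/membarrier.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static int use_membarrier;   /* 1 once expedited membarrier is registered */
static void *fallback_page;  /* page toggled by the mprotect fallback */
static size_t page_size;

/* Pick a strategy once at startup. Returns 0 on success, -1 on failure. */
int heavy_fence_init(void) {
    long cmds = syscall(SYS_membarrier, MEMBARRIER_CMD_QUERY, 0, 0);
    if (cmds > 0 && (cmds & MEMBARRIER_CMD_PRIVATE_EXPEDITED) &&
        syscall(SYS_membarrier,
                MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0) == 0) {
        use_membarrier = 1;
        return 0;
    }
    /* membarrier missing or blocked: set up the mprotect fallback page. */
    long sz = sysconf(_SC_PAGESIZE);
    if (sz <= 0)
        return -1;
    page_size = (size_t)sz;
    fallback_page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return fallback_page == MAP_FAILED ? -1 : 0;
}

/* Heavy fence. The fallback path is not thread-safe as written; a real
 * implementation would serialize callers with a lock. */
int heavy_fence(void) {
    if (use_membarrier)
        return (int)syscall(SYS_membarrier,
                            MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0);
    if (mprotect(fallback_page, page_size, PROT_READ | PROT_WRITE) != 0)
        return -1;
    *(volatile char *)fallback_page = 1; /* make sure the page is dirty */
    /* Dropping access forces the kernel to invalidate other CPUs' TLBs. */
    return mprotect(fallback_page, page_size, PROT_NONE);
}
```

A caller would run `heavy_fence_init()` once and then call `heavy_fence()` on the rare path; the probing is kept out of the fence itself so that the fence stays a single syscall in the common case.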

Note that unlike the C++ spec, I have specified that `atomic_fence_heavy` does synchronize with `atomic_fence`. This matches the underlying system call. I suspect C++ chose to omit this for a hypothetical future architecture that has instruction support for doing this from userspace that would then not synchronize with ordinary barriers, but I think I would rather cross that bridge when we get there.

I intend to use this in #60281, but it's an independently useful feature.

[1] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1202r5.pdf

Seelengrab · 2025-12-04T08:58:10Z

Will this be documented as part of #46739?

Keno · 2025-12-05T03:50:18Z

Addressed review, added NEWS, added docstring ref to the appropriate section.

topolarity · 2025-12-05T18:49:06Z

NEWS.md

 Multi-threading changes
 -----------------------

+  - New functions `Threads.atomic_fence_heavy` and `Threads.atoimc_fence_light` provide support for


Suggested change

- New functions `Threads.atomic_fence_heavy` and `Threads.atoimc_fence_light` provide support for

- New functions `Threads.atomic_fence_heavy` and `Threads.atomic_fence_light` provide support for

Keno requested review from vtjnash and xal-0 December 4, 2025 04:09

Keno force-pushed the kf/membarrier branch 5 times, most recently from b683c64 to 44b50e2 December 4, 2025 08:05

vtjnash added the docs, needs news, and needs docs labels and removed the docs label Dec 4, 2025