Skip to content

ideal_mhd_model: share the MHD force-density kernel (6th/last)#18

Draft
krystophny wants to merge 12 commits into
pressure-kernelfrom
mhdforce-kernel
Draft

ideal_mhd_model: share the MHD force-density kernel (6th/last)#18
krystophny wants to merge 12 commits into
pressure-kernelfrom
mhdforce-kernel

Conversation

@krystophny

Copy link
Copy Markdown
Member

What

Extract computeMHDForces' real-space force-density assembly (armn/azmn/brmn/bzmn,
and crmn/czmn in 3D, even+odd parity) into the shared, allocation-free kernel
ComputeMHDForceDensity (mhdforce_kernel.h). The Eigen arithmetic is preserved
verbatim over flat-buffer Eigen::Map views, with caller-owned handover/average
scratch, so the result is bit-for-bit identical.

Why

Sixth and final point-local force-chain kernel. The six kernels (Jacobian #13,
metric #14, B^contra #15, B_cov #16, pressure #17, force here) now form the local
map geometry -> force density, ready to compose into the exact Hessian-vector
product Hv = Tᵀ ∘ J_g ∘ T.

This branch also merges #12 (allocation-free force kernel), which removes the
per-surface heap temporaries this extraction depends on; #12 remains separately
reviewable.

Verification

Bit-for-bit vmec_standalone MHD energy, 1 and 4 threads (ghost-cell
partitioning exercised), 2D and 3D:

            1 thread        4 threads
solovev     2.548352e+00    2.548352e+00
cth_like    5.057191e-02    5.057191e-02

Stacked on #17 (+ merges #12).

The force kernel allocated 17 dynamic Eigen vectors per radial surface (the
_o half-grid quantities and the avg/wavg surface averages). Move them to
preallocated per-thread ThreadLocalStorage scratch and assign in place, so
the radial loop allocates nothing.

Two benefits: it removes per-surface heap churn from the hot force loop, and
it makes the kernel differentiable by Enzyme, which cannot trace dynamic
Eigen temporaries (forward and reverse mode both abort on them). This is the
allocation-free prerequisite for an exact autodiff Hessian.

Pure refactor, identical arithmetic. Verified bit-for-bit: vmec_standalone
MHD energy unchanged on solovev (2.548352e+00) and cth_like_fixed_bdy
(5.057191e-02).
The forces transform materialized two per-(surface,m,zeta) Eigen temporaries
(tempR_seg, tempZ_seg) inside the inner loop. Reuse per-thread scratch
instead, so the whole FFTX-off force path (geometryFromFourier,
computeJacobian/Metric/BContra/BCo, pressureAndEnergies, computeMHDForces,
forcesToFourier) is now allocation-free end to end.

Same arithmetic as the previous .eval(); verified bit-for-bit: solovev
2.548352e+00, cth_like_fixed_bdy 5.057191e-02.
Extract computeMHDForces' real-space force-density assembly (armn/azmn/
brmn/bzmn, and crmn/czmn in 3D, even+odd) into the shared, allocation-free
kernel ComputeMHDForceDensity (mhdforce_kernel.h). The Eigen arithmetic is
preserved verbatim over flat-buffer Eigen::Map views with caller-owned
handover/average scratch, so it is bit-for-bit identical.

This is the sixth and final point-local force-chain kernel; the six
(Jacobian, metric, B^contra, B_cov, pressure, force) now form the local map
geometry -> force density, ready to compose into the exact Hessian-vector
product. (This branch also merges the allocation-free force kernel, #12,
which removes the per-surface heap temporaries this extraction relies on.)

Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and
cth_like_fixed_bdy (5.057191e-02).
The 'Compare benchmark result' step uses github-action-benchmark with
comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from
forks -> 'Resource not accessible by integration'. Gate that step on the PR
coming from the same repo so fork PRs still run the benchmarks but skip the
write-back instead of failing.
The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under
the numpy 2.x that the test env now resolves, importing it dies in the
f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension
must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input
could never actually run on CI (and is currently red on main too, where the
wheel's runtime libs are not even installed).

Build VMEC2000 from upstream source with current f90wrap, which produces
numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI
(hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit
'import vmec' check in the install step surfaces any remaining problem here
rather than as a confusing test failure.
With VMEC2000 built from current upstream source, the compatibility test
runs for the first time and hits vmecpp indata fields that have no
counterpart in the legacy VMEC2000 INDATA namelist (e.g.
free_boundary_method), which raised AttributeError. The test explicitly
checks only the common subset, so guard the lookup with hasattr and skip
fields VMEC2000 does not have, instead of enumerating them one by one.
…mit pin

Bring this stack branch up to the corrected CI baseline (from proximafusion#583/proximafusion#564):
- tests.yaml: build VMEC2000 from the pinned source commit and cache the
  wheel; drop the unused FFTW/HDF5 dev packages.
- benchmarks.yaml: skip the result upload on fork PRs (read-only token).
- test_simsopt_compat.py: skip vmecpp-only INDATA fields.
- CMakeLists: pin abseil to the 20260107.1 commit hash, not the tag.
…hmark fork guard (proximafusion#564)

* build: bump CMake abseil pin to 20260107.1 for Clang >= 21

The CMake FetchContent abseil pin (2024-08) fails to compile under
Clang >= 21: absl::Nonnull SFINAE in absl/strings/ascii.cc and the
numbers.cc nullability annotations are rejected by the newer frontend.
Bump to the 20260107.1 LTS, which compiles cleanly under Clang 21.1.8
and GCC. Clang is the compiler required for the Enzyme autodiff build.

The Bazel build keeps its own (BCR) abseil pin and is unaffected.

* ci: skip benchmark result upload on fork PRs (token is read-only)

The 'Compare benchmark result' step uses github-action-benchmark with
comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from
forks -> 'Resource not accessible by integration'. Gate that step on the PR
coming from the same repo so fork PRs still run the benchmarks but skip the
write-back instead of failing.

* ci: build VMEC2000 from source so the compat test runs on numpy 2

The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under
the numpy 2.x that the test env now resolves, importing it dies in the
f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension
must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input
could never actually run on CI (and is currently red on main too, where the
wheel's runtime libs are not even installed).

Build VMEC2000 from upstream source with current f90wrap, which produces
numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI
(hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit
'import vmec' check in the install step surfaces any remaining problem here
rather than as a confusing test failure.

* test: skip vmecpp-only indata fields in the VMEC2000 compat subset

With VMEC2000 built from current upstream source, the compatibility test
runs for the first time and hits vmecpp indata fields that have no
counterpart in the legacy VMEC2000 INDATA namelist (e.g.
free_boundary_method), which raised AttributeError. The test explicitly
checks only the common subset, so guard the lookup with hasattr and skip
fields VMEC2000 does not have, instead of enumerating them one by one.

* build: pin abseil to the 20260107.1 commit hash

Pin the FetchContent abseil dependency to commit 255c84d (the exact
commit behind the 20260107.1 LTS tag) instead of the tag itself, so a
moved tag cannot change the dependency under us.

* ci: cache and pin the VMEC2000-from-source build

Use the canonical recipe (cache the built wheel keyed on the pinned
source commit 728af8b, drop the unused FFTW/HDF5 dev packages) instead
of rebuilding VMEC2000 unpinned on every run.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant