ideal_mhd_model: make computeMHDForces allocation-free by krystophny · Pull Request #12 · itpplasma/vmecpp

krystophny · 2026-06-14T08:34:16Z

What

Make IdealMhdModel::computeMHDForces allocation-free. The kernel allocated 17
dynamic Eigen::VectorXd per radial surface: the _o half-grid quantities
(P_o, rup_o, ..., gbvbv_o) and the surface averages (P_avg, P_wavg,
gbubu_avg, ...). These move to preallocated per-thread ThreadLocalStorage
scratch and are assigned in place; the radial loop now allocates nothing.

Why

Two reasons:

Performance: the force kernel runs every iteration; allocating 17 vectors
per surface per iteration is avoidable heap churn.
Differentiability: Enzyme cannot trace dynamic Eigen temporaries -- both
forward and reverse mode abort on a dynamic-size Eigen expression that
allocates (the "freeing without malloc" failure). Writing into preallocated
storage via Eigen::Map/in-place assignment is the form Enzyme can
differentiate. This is the allocation-free prerequisite for an exact autodiff
Hessian-vector product of the force.

This is a pure refactor: the arithmetic is identical, only the storage of the
intermediates changes.

Verification

computeMHDForces now contains zero Eigen::VectorXd locals or
Eigen::VectorXd::Zero constructions (allocation-free).

Bit-for-bit identical results, vmec_standalone final MHD energy:

                       before (main)     after
solovev.json           2.548352e+00      2.548352e+00
cth_like_fixed_bdy     5.057191e-02      5.057191e-02

Builds clean under GCC 16 and Clang 21; clang-format (file style) clean.

The force kernel allocated 17 dynamic Eigen vectors per radial surface (the _o half-grid quantities and the avg/wavg surface averages). Move them to preallocated per-thread ThreadLocalStorage scratch and assign in place, so the radial loop allocates nothing. Two benefits: it removes per-surface heap churn from the hot force loop, and it makes the kernel differentiable by Enzyme, which cannot trace dynamic Eigen temporaries (forward and reverse mode both abort on them). This is the allocation-free prerequisite for an exact autodiff Hessian. Pure refactor, identical arithmetic. Verified bit-for-bit: vmec_standalone MHD energy unchanged on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02).

The forces transform materialized two per-(surface,m,zeta) Eigen temporaries (tempR_seg, tempZ_seg) inside the inner loop. Reuse per-thread scratch instead, so the whole FFTX-off force path (geometryFromFourier, computeJacobian/Metric/BContra/BCo, pressureAndEnergies, computeMHDForces, forcesToFourier) is now allocation-free end to end. Same arithmetic as the previous .eval(); verified bit-for-bit: solovev 2.548352e+00, cth_like_fixed_bdy 5.057191e-02.

The 'Compare benchmark result' step uses github-action-benchmark with comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from forks -> 'Resource not accessible by integration'. Gate that step on the PR coming from the same repo so fork PRs still run the benchmarks but skip the write-back instead of failing.

The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under the numpy 2.x that the test env now resolves, importing it dies in the f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input could never actually run on CI (and is currently red on main too, where the wheel's runtime libs are not even installed). Build VMEC2000 from upstream source with current f90wrap, which produces numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI (hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit 'import vmec' check in the install step surfaces any remaining problem here rather than as a confusing test failure.

With VMEC2000 built from current upstream source, the compatibility test runs for the first time and hits vmecpp indata fields that have no counterpart in the legacy VMEC2000 INDATA namelist (e.g. free_boundary_method), which raised AttributeError. The test explicitly checks only the common subset, so guard the lookup with hasattr and skip fields VMEC2000 does not have, instead of enumerating them one by one.

…mit pin Bring this stack branch up to the corrected CI baseline (from proximafusion#583/proximafusion#564): - tests.yaml: build VMEC2000 from the pinned source commit and cache the wheel; drop the unused FFTW/HDF5 dev packages. - benchmarks.yaml: skip the result upload on fork PRs (read-only token). - test_simsopt_compat.py: skip vmecpp-only INDATA fields. - CMakeLists: pin abseil to the 20260107.1 commit hash for Clang >= 21.

…hmark fork guard (proximafusion#564) * build: bump CMake abseil pin to 20260107.1 for Clang >= 21 The CMake FetchContent abseil pin (2024-08) fails to compile under Clang >= 21: absl::Nonnull SFINAE in absl/strings/ascii.cc and the numbers.cc nullability annotations are rejected by the newer frontend. Bump to the 20260107.1 LTS, which compiles cleanly under Clang 21.1.8 and GCC. Clang is the compiler required for the Enzyme autodiff build. The Bazel build keeps its own (BCR) abseil pin and is unaffected. * ci: skip benchmark result upload on fork PRs (token is read-only) The 'Compare benchmark result' step uses github-action-benchmark with comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from forks -> 'Resource not accessible by integration'. Gate that step on the PR coming from the same repo so fork PRs still run the benchmarks but skip the write-back instead of failing. * ci: build VMEC2000 from source so the compat test runs on numpy 2 The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under the numpy 2.x that the test env now resolves, importing it dies in the f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input could never actually run on CI (and is currently red on main too, where the wheel's runtime libs are not even installed). Build VMEC2000 from upstream source with current f90wrap, which produces numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI (hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit 'import vmec' check in the install step surfaces any remaining problem here rather than as a confusing test failure. * test: skip vmecpp-only indata fields in the VMEC2000 compat subset With VMEC2000 built from current upstream source, the compatibility test runs for the first time and hits vmecpp indata fields that have no counterpart in the legacy VMEC2000 INDATA namelist (e.g. free_boundary_method), which raised AttributeError. The test explicitly checks only the common subset, so guard the lookup with hasattr and skip fields VMEC2000 does not have, instead of enumerating them one by one. * build: pin abseil to the 20260107.1 commit hash Pin the FetchContent abseil dependency to commit 255c84d (the exact commit behind the 20260107.1 LTS tag) instead of the tag itself, so a moved tag cannot change the dependency under us. * ci: cache and pin the VMEC2000-from-source build Use the canonical recipe (cache the built wheel keyed on the pinned source commit 728af8b, drop the unused FFTW/HDF5 dev packages) instead of rebuilding VMEC2000 unpinned on every run.

…roximafusion#562) `VmecWOut` NetCDF serialization assumed `jaxtyping._array_types._AnonymousDim` exists. With newer jaxtyping releases, that symbol may be absent (replaced by `_anonymous_dim`), causing runtime `AttributeError` in `test_init` wout IO/reference paths. - **Compatibility fix: dim marker resolution** - Resolve jaxtyping dimension marker types via guarded lookup on `jt._array_types`: - named: `_NamedDim` - anonymous: `_AnonymousDim` **or** fallback to `type(_anonymous_dim)` when only the instance-style API is present. - **Runtime behavior: robust dimension-name inference** - Keep existing inference behavior, but branch on resolved types instead of hard-coding a single private symbol. - If a marker type is unavailable, logic degrades safely to existing default dimension naming. - **Code organization** - Compute marker-type resolution once before field iteration, then reuse in per-field shape inference. ```python array_types = getattr(jt, "_array_types", None) named_dim_type = getattr(array_types, "_NamedDim", None) if array_types is not None else None anonymous_dim_type = getattr(array_types, "_AnonymousDim", None) if array_types is not None else None if anonymous_dim_type is None and array_types is not None: anonymous_dim_instance = getattr(array_types, "_anonymous_dim", None) if anonymous_dim_instance is not None: anonymous_dim_type = type(anonymous_dim_instance) ```

krystophny added 2 commits June 14, 2026 10:33

This was referenced Jun 14, 2026

ideal_mhd_model: share the Jacobian kernel with exact autodiff (fwd vs rev) #13

Draft

ideal_mhd_model: share the MHD force-density kernel (6th/last) #18

Draft

krystophny and others added 9 commits June 14, 2026 15:54

apply pre-commit formatting (ruff, docformatter, clang-format)

a9d477f

Merge remote-tracking branch 'upstream/main' into HEAD

453fd6a

Merge branch 'main' into allocation-free-forces

b1d86f7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ideal_mhd_model: make computeMHDForces allocation-free#12

ideal_mhd_model: make computeMHDForces allocation-free#12
krystophny wants to merge 11 commits into
mainfrom
allocation-free-forces

krystophny commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

krystophny commented Jun 14, 2026

What

Why

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants