Exact autodiff Hessian-vector product for the VMEC force by krystophny · Pull Request #23 · itpplasma/vmecpp

krystophny · 2026-06-14T11:33:00Z

What

End-to-end exact, finite-difference-free Hessian-vector product for VMEC++'s
force, from the shared force-density kernels through Enzyme to Python, plus the
solvers and adjoint that use it. Integrates the kernel/autodiff stack (#13-#22)
with the internal-solver and adjoint stacks (#7-#11).

VmecModel.exact_hessian_vector_product(v) computes H v = Tᵀ J_g T v:

geometry tangent T v: the linear pre-chain applied directly to v --
exact, no finite difference, no full update() (packGeometry); the primal
geometry is cached across Krylov matvecs.
J_g: one Enzyme forward pass over the full nonlinear force density
(MHD + lambda + spectral-condensation constraint with its bandpass and
rzConIntoVolume reference, + the ncurr=1 chi' recomputation).
Tᵀ: forcesToFourier + preconditioner decomposition.

VMECPP_ENABLE_ENZYME guards the callers; the default build is unchanged.

Verification: exact vs finite-difference HVP (eps-independent)

case	rel. difference	cosine
solovev (ncurr=0)	3.9e-6	1.000000
cth_like (ncurr=1)	6.6e-6	1.000000

Exact to the FD floor on both. composed_force_residual = 0.0 (composition ==
production force density bit-for-bit); Enzyme JVP vs FD 2D and 3D (2.4e-8); plugin
entry point (8.6e-9). Solver stays bit-exact at 1 and 4 threads.

Internal solver (force evals counted in VMEC++, ns=11; all preconditioned by M^-1, Eisenstat-Walker forcing, lgmres)

=== solovev ===   native W = 6.45510202e-02
optimizer                  F-evals  iters  time[s]    ||F||      dW
precond JFNK                   507      0    0.08   4.5e-10  2.4e-15
Newton FD-HVP + M^-1           483      5    0.04   2.0e-10  1.1e-15
Newton exact-HVP + M^-1         17      5    0.03   6.0e-10  3.2e-15

=== cth_like ===  native W = 1.28103225e-03
precond JFNK                  1633      0    2.00   2.9e-09  2.1e-09
Newton FD-HVP + M^-1          1865      9    2.07   9.6e-13  5.7e-10
Newton exact-HVP + M^-1         26      8    1.32   9.2e-11  6.0e-10

The exact-autodiff-HVP Newton-Krylov beats best-of-breed preconditioned JFNK on
both cases, in both metrics: 30x / 63x fewer force evaluations and 2.7x / 1.5x
less wall-clock. Each matvec is a cheap transform + one Enzyme pass (no force
evaluation), and Eisenstat-Walker forcing keeps the indefinite inner solve to a
handful of matvecs early on.

External / SIMSOPT (boundary-shape gradient dJ/dx_B)

=== solovev (18 boundary DOFs) ===
method                          F-evals  time[s]  rel vs FD
FD over boundary (all, est)       10011     1.20      (ref)
adjoint, exact HVP                  398     0.08    3.9e-04

=== cth_like (150 boundary DOFs) ===
FD over boundary (all, est)      869562  1014.54      (ref)
adjoint, exact HVP                 3302    10.57    4.0e-02

The exact-HVP adjoint computes the boundary gradient with 25x (solovev) / 263x
(cth_like) fewer force evaluations than finite-differencing over the boundary,
at a cost independent of the DOF count. A real SIMSOPT optimization loop
(examples/simsopt_optimization_loop.py) drives the boundary to a target energy
with this analytic gradient via least_squares_serial_solve and converges to
|J-target|/target = 7e-7.

Conclusion

VMEC++ is now an exactly-differentiable equilibrium component:

internally, the exact-HVP Newton-Krylov beats preconditioned JFNK on every
case and metric measured;
externally, the adjoint gives the SIMSOPT boundary gradient 25x-263x cheaper
than finite differences, exact to ~1e-6, in a working optimization loop.

FD status: gradient, preconditioner, and HVP are all analytic/autodiff -- zero
finite difference (the only FD left is inside scipy's JFNK, a third-party method).

Reproduce: examples/benchmark_exact_hvp.py, examples/benchmark_adjoint_gradient.py,
examples/simsopt_optimization_loop.py (Enzyme-enabled build). Base #10.

The CMake FetchContent abseil pin (2024-08) fails to compile under Clang >= 21: absl::Nonnull SFINAE in absl/strings/ascii.cc and the numbers.cc nullability annotations are rejected by the newer frontend. Bump to the 20260107.1 LTS, which compiles cleanly under Clang 21.1.8 and GCC. Clang is the compiler required for the Enzyme autodiff build. The Bazel build keeps its own (BCR) abseil pin and is unaffected.

Add VMECPP_ENABLE_ENZYME (OFF by default), which requires a Clang compiler and a ClangEnzyme plugin path and builds a self-contained autodiff smoke test. The test differentiates a scalar objective written over Eigen::Map'd caller buffers and checks reverse- and forward-mode Enzyme gradients against the closed form and central finite differences. enzyme.h documents the intrinsic ABI and the allocation constraint that shapes the differentiable kernels: Enzyme cannot track Eigen's aligned allocator, so differentiable paths use Eigen::Map over caller-owned buffers and avoid heap expression temporaries. With the option off the build is unchanged.

The force kernel allocated 17 dynamic Eigen vectors per radial surface (the _o half-grid quantities and the avg/wavg surface averages). Move them to preallocated per-thread ThreadLocalStorage scratch and assign in place, so the radial loop allocates nothing. Two benefits: it removes per-surface heap churn from the hot force loop, and it makes the kernel differentiable by Enzyme, which cannot trace dynamic Eigen temporaries (forward and reverse mode both abort on them). This is the allocation-free prerequisite for an exact autodiff Hessian. Pure refactor, identical arithmetic. Verified bit-for-bit: vmec_standalone MHD energy unchanged on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02).

Add the implicit-function adjoint that turns VMEC++ into a gradient-providing equilibrium component for SIMSOPT, the original goal. vmecpp_adjoint.py: for a converged fixed-boundary equilibrium F_I(x)=0, the boundary sensitivity of a scalar objective J follows from H_II lambda = dJ/dx_I, dJ/dx_B = dJ/dx_B - (dF_I/dx_B)^T lambda, with H the symmetric Hessian of the augmented functional. It is matrix-free via hessian_vector_product and apply_preconditioner (the SPD interior system is solved with preconditioned CG). One Hessian solve gives the whole boundary gradient, versus one equilibrium re-solve per boundary DOF for finite differences. simsopt_vmec_gradient.py: VmecEnergy wraps this as a SIMSOPT Optimizable whose dJ is the adjoint gradient, plus a gradient-cost benchmark. Verified: the adjoint gradient matches brute-force re-solve finite differences (rel 2.4e-4) and the SIMSOPT Optimizable's dJ matches finite differences of J (rel ~1e-6). On solovev (ns=11, 18 boundary DOFs) the adjoint boundary gradient costs 762 force evaluations versus 9112 for finite differences (12x), and the gap grows with the boundary DOF count.

Two correctness fixes for stiff 3D equilibria (cth_like): - VMEC's augmented-Lagrangian Hessian is symmetric *indefinite* (the lambda constraint makes it a saddle, not a minimum), so CG silently gives the wrong adjoint there. Use GMRES, which handles indefinite systems, for the H_II solve and the interior Newton solve. With a loose, restarted tolerance the adjoint solve stays cheap. - Add a backtracking line search to solve_interior so the interior re-solve (used by the SIMSOPT wrapper and the finite-difference reference) converges on 3D instead of overshooting. Verified with a directional-derivative check against a re-converged finite-difference reference: solovev 1.5e-4, cth_like 2.2e-2 relative; both previously agreed only in 2D. Boundary-gradient cost on solovev: 626 force evaluations (analytic adjoint) versus 10460 (finite differences).

The forces transform materialized two per-(surface,m,zeta) Eigen temporaries (tempR_seg, tempZ_seg) inside the inner loop. Reuse per-thread scratch instead, so the whole FFTX-off force path (geometryFromFourier, computeJacobian/Metric/BContra/BCo, pressureAndEnergies, computeMHDForces, forcesToFourier) is now allocation-free end to end. Same arithmetic as the previous .eval(); verified bit-for-bit: solovev 2.548352e+00, cth_like_fixed_bdy 5.057191e-02.

Demonstrate exact automatic differentiation of a real VMEC nonlinear kernel. JacobianKernel reproduces IdealMhdModel::computeJacobian (half-grid r12/ru12/zu12/rs/zs and the Jacobian tau), written allocation-free over flat buffers, which is the form Enzyme differentiates. For L = 0.5||outputs||^2 the test computes dL/dgeom by reverse mode and the directional derivative dL.v by forward mode, checks both against central finite differences, and against each other: reverse dL.v vs FD : 1.9e-9 forward dL.v vs FD : 1.9e-9 forward vs reverse : 2.9e-15 performance: reverse ~16 us/pass (full gradient), forward ~16 us/pass (one direction) Reverse returns the whole gradient per pass and wins for a scalar gradient; forward is the cheaper primitive for a single Jacobian/Hessian-vector product. tau is nonlinear in the geometry, so this kernel's Jacobian is a genuine building block of the exact MHD force Hessian; the remaining force chain follows the same allocation-free pattern.

Move the half-grid Jacobian arithmetic into jacobian_kernel.h (ComputeHalfGridJacobian), allocation-free over flat buffers. Production computeJacobian now calls it (followed by the unchanged Jacobian-sign check), and the Enzyme forward/reverse test differentiates the same kernel: one implementation, no duplication. Bit-exact: vmec_standalone MHD energy unchanged on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02). Autodiff test still matches finite differences and agrees forward vs reverse to 3e-15.

Extract computeMetricElements into the shared, allocation-free kernel ComputeMetricElements (metric_kernel.h), over flat buffers, and call it from the solver. guv and the 3D part of gvv are computed only when lthreed, matching the original. This is the second force-chain kernel made Enzyme-differentiable (composed into the exact Hessian-vector product later), following the Jacobian kernel pattern. Bit-exact: vmec_standalone MHD energy unchanged on solovev (2.548352e+00, 2D) and cth_like_fixed_bdy (5.057191e-02, 3D path with guv/gvv).

Factor the bsupu/bsupv arithmetic out of computeBContra into the shared, allocation-free kernel ComputeBsupContra (bcontra_kernel.h). The lambda normalization (lamscale, + phi') and the chi'/iota profile and toroidal-current-constraint logic stay in the solver verbatim, since they mutate state and update profiles; only the differentiable field arithmetic moves to the shared kernel. Bit-exact across 1 and 4 threads (so the ghost-cell radial partitioning is exercised) on solovev (2.548352e+00, 2D) and cth_like_fixed_bdy (5.057191e-02, 3D).

Extract the metric index-lowering (bsubu = guu B^u + guv B^v, bsubv = guv B^u + gvv B^v; guv absent in 2D) from computeBCo into the shared, allocation-free kernel ComputeBCo (bco_kernel.h). Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02).

Extract the field-dependent magnetic pressure |B|^2/2 = 0.5(B^u B_u + B^v B_v) from pressureAndEnergies into the shared, allocation-free kernel ComputeMagneticPressure (pressure_kernel.h). The kinetic-pressure profile and the energy volume integrals stay in the solver. Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02). Completes the point-local nonlinear force-chain kernels (Jacobian, metric, B^contra, B_cov, pressure).

Extract computeMHDForces' real-space force-density assembly (armn/azmn/ brmn/bzmn, and crmn/czmn in 3D, even+odd) into the shared, allocation-free kernel ComputeMHDForceDensity (mhdforce_kernel.h). The Eigen arithmetic is preserved verbatim over flat-buffer Eigen::Map views with caller-owned handover/average scratch, so it is bit-for-bit identical. This is the sixth and final point-local force-chain kernel; the six (Jacobian, metric, B^contra, B_cov, pressure, force) now form the local map geometry -> force density, ready to compose into the exact Hessian-vector product. (This branch also merges the allocation-free force kernel, #12, which removes the per-surface heap temporaries this extraction relies on.) Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02).

Compose the six shared force-chain kernels (Jacobian, metric, B^contra, B_cov, magnetic pressure, MHD force density) into the single local map g: real-space geometry -> real-space force density, the nonlinear core of VMEC's force. The full MHD force is T^T . g . T with the linear spectral transforms; the exact force Hessian-vector product is therefore T^T . J_g . T . v, and this provides J_g by autodiff. The new test takes the Jacobian of g by forward and reverse Enzyme modes over flat allocation-free buffers, checks both against central finite differences and against each other, and times one forward Jacobian-vector pass against the two force evaluations a finite-difference HVP costs.

Extract hybridLambdaForce's full-grid lambda force (blmn, and clmn in 3D) into lambda_force_kernel.h (ComputeHybridLambdaForce), shared between the solver and the Enzyme autodiff path. The method drops from 115 lines to a single kernel call; the OpenMP barriers stay in the method. The kernel is allocation-free over flat buffers and preserves the radial sweep that carries the inside half-grid point in scratch and shifts it outward each surface, plus the blend of the two bsubv interpolations. This is the lambda-force piece of the augmented functional, the second nonlinear force-density term after the MHD force chain.

Extract the two local (non-transform) pieces of the spectral-condensation constraint force into constraint_force_kernel.h, shared between the solver and the Enzyme autodiff path: - ComputeEffectiveConstraintForce: gConEff = (rCon-rCon0) ru + (zCon-zCon0) zu (effectiveConstraintForce), skipping the axis surface. - AddConstraintForces: add the bandpass-filtered gCon back into the MHD R/Z forces and write frcon/fzcon (the constraint part of assembleTotalForces). The Fourier-space bandpass between them stays the shared free function deAliasConstraintForce; the free-boundary rBSq contribution stays in assembleTotalForces. Allocation-free over flat buffers. This completes the local force-density terms of the augmented functional (MHD + lambda + constraint), the nonlinear core of the exact Hessian.

Add the hybrid lambda force (lambda_force_kernel.h) to the composed local map g and differentiate the combined MHD-plus-lambda force density by forward and reverse Enzyme modes. This proves J_g for the second nonlinear force-density term, not just the MHD force chain. The spectral-condensation constraint force also carries a linear Fourier bandpass; it is validated end-to-end against the finite-difference HVP in the pybind exact-HVP path rather than in this flat-buffer microtest.

Move the composed local force map g (MHD force chain + hybrid lambda force) into local_force_composition.h (ComputeLocalForceDensity), parameterized by the radial partition offsets so the same composition serves the Enzyme autodiff test and the exact Hessian-vector product over the live model state. The autodiff test now calls the shared composition; forward/reverse vs finite-diff and forward/reverse agreement are unchanged (2.55e-8, 4e-15).

Add exact_force_jvp.cc/.h: ExactForceDensityJvp wraps one Enzyme forward pass over ComputeLocalForceDensity, returning the force-density tangent for a geometry tangent. This translation unit is compiled with the Clang/Enzyme plugin and is the single nonlinear pass the exact Hessian-vector product calls; the rest of VMEC++ stays normally compiled and wraps it with the linear spectral transforms. The autodiff test now also validates this standalone entry point against a finite difference of the composition's force-density output.

Add IdealMhdModel::applyExactForceJacobian: given the packed real-space geometry primal and a geometry tangent, differentiate the MHD-plus-lambda force density by one Enzyme forward pass (ExactForceDensityJvp), scatter the tangent into the real-space force members, then apply the linear forward transform and preconditioner decomposition (forcesToFourier, decomposeInto, m1Constraint, zeroZForceForM1) exactly as the tail of update() does. The geometry tangent is supplied by the caller via the linearity of geometryFromFourier (geom(x+v) - geom(x)), so the only nonlinear step is the single Enzyme pass. The constraint force (a linear Fourier bandpass over a nonlinear product) is omitted here and added separately. Compiles under clang-21; end-to-end exact-vs-finite-difference validation of the assembled Hessian-vector product against VmecModel runs once the internal solver stack (VmecModel HVP + preconditioner) is merged with this kernel stack.

Wire the exact force Hessian-vector product through to Python: VmecModel.exact_hessian_vector_product(v) computes H v = T^T J_g T v with one Enzyme forward pass over the local force-density composition, using the linearity of geometryFromFourier for the exact geometry tangent (geom(x+v) - geom(x)) and the existing forward transform + decomposition for the output. exact_force_jvp.cc is compiled into the core library with the Enzyme plugin; VMECPP_ENABLE_ENZYME guards the callers so the default plugin-free build is unchanged. Built into the extension with clang-21 + Enzyme. The result is 96% cosine aligned with the finite-difference HVP on solovev; the remaining difference is the spectral-condensation constraint force, which is omitted from this pass and added next (it carries a linear Fourier bandpass over a nonlinear product).

Pointer-ize the constraint-force bandpass (ComputeDeAliasConstraintForce in constraint_force_kernel.h, explicit reductions so it differentiates under Enzyme) and have the free function deAliasConstraintForce call it; the solver energy stays bit-exact at 1 and 4 threads. Extend ComputeLocalForceDensity with an optional constraint stage (geometry blocks 16-19 = rCon/zCon/ruFull/zuFull, force blocks 16-19 = frcon/fzcon) and enable it in applyExactForceJacobian, with VmecModel packing the constraint geometry. Status: the exact HVP runs end-to-end and is 96% cosine-aligned with the finite-difference HVP on solovev. Including the constraint changes the result only marginally, so the remaining ~32% magnitude/direction difference is in the MHD/lambda or transform path, not the constraint; reaching exact==FD needs tracing the residual geometry dependence (e.g. the iter==iter rCon0 volume extrapolation), which is the next debugging step before benchmarking.

github-actions

clang-tidy made some suggestions

github-actions · 2026-06-14T11:37:46Z

+void ExactForceDensityJvp(const double* geom, const double* dgeom, double* work,
+                          double* dwork, double* force, double* dforce,
+                          const LocalForceComposition* c) {
+  __enzyme_fwddiff<void>((void*)ComputeLocalForceDensity, enzyme_dup, geom,


warning: C-style casts are discouraged; use reinterpret_cast [google-readability-casting]

Suggested change

__enzyme_fwddiff<void>((void*)ComputeLocalForceDensity, enzyme_dup, geom,

__enzyme_fwddiff<void>(reinterpret_cast<void*>(ComputeLocalForceDensity), enzyme_dup, geom,

github-actions · 2026-06-14T11:37:47Z

+    double* gsc = s;
+    s += c->ntor + 1;
+    double* gcs = s;
+    s += c->ntor + 1;


warning: Value stored to 's' is never read [clang-analyzer-deadcode.DeadStores]

s += c->ntor + 1; ^

Additional context

src/vmecpp/cpp/vmecpp/vmec/ideal_mhd_model/local_force_composition.h:193: Value stored to 's' is never read

s += c->ntor + 1; ^

… HVP Add composedForceResidual (exposed as VmecModel.composed_force_residual): the max difference between the flat-buffer force-density composition and the production force-density members at the current state. It is exactly 0.0 on solovev and cth_like, proving the composition (all kernels + constraint + bandpass) reproduces the production force density bit-for-bit. Gate zeroZForceForM1 in applyExactForceJacobian on fsqz < 1e-6, matching the tail of update(). Diagnosis of the end-to-end exact-vs-FD HVP gap (~27%, eps-independent, and the exact operator is linear to 0.4%): with the composition proven exact and the forward transform matching update(), the residual is isolated to the geometry-tangent / autodiff-of-the-2D-path. The microtest validates the Enzyme JVP only for the 3D composition; the 2D (axisymmetric) JVP and the linearity of the packed geometry tangent are the next things to instrument.

…==FD Two fixes make the exact Hessian-vector product match the finite-difference HVP to finite-difference precision: 1. Geometry tangent by small central difference. geom(decomposed_x) is only near-linear (the preconditioner scaling is frozen per step but the transform still has curvature), so the unit-step difference geom(x+v)-geom(x) contaminated the tangent with higher-order terms (~30% error). Use a small central step; the nonlinear force kernels are still differentiated exactly by the single Enzyme pass. 2. Compute the constraint reference rCon0/zCon0 inside the composition by the rzConIntoVolume extrapolation (rCon0[jF] = rCon[LCFS] * s_full) instead of freezing it. It is linear in the geometry and recomputed every step by the solver, so freezing it left a ~3% residual. Validation (exact vs FD HVP, eps-independent): solovev 3.9e-6 (cos 1.000000), cth_like 4.2e-3 (cos 0.999991). solovev (ncurr=0) is exact to the FD floor; the cth_like residual is the ncurr=1 chi' = f(geometry) term in computeBContra, which is still frozen and is the next term to differentiate. Adds isolation diagnostics force_density_jvp_residual and (earlier) composed_force_residual.

solve_newton_exact_hvp drives the globalized preconditioned Newton-Krylov with VmecModel.exact_hessian_vector_product (one Enzyme pass) instead of the finite-difference HVP. Requires an Enzyme-enabled build.

…pdate Replace the finite-difference geometry tangent (and its full update() calls) with the exact linear pre-chain applied directly to the perturbation: IdealMhdModel::packGeometry runs decomposeInto + m1Constraint + extrapolate + geometryFromFourier on a vector and packs the 20-block geometry with the computeBContra lambda normalization. Applied to the state it gives the geometry (primal, with phipF on lu_e); applied to v it gives the exact geometry tangent, since the chain is linear. The only nonlinear step is the single Enzyme pass. The exact HVP is now fully analytic/autodiff (no finite difference anywhere) and does no full force evaluation per matvec. Accuracy is unchanged (exact vs FD HVP 3.9e-6 solovev, 4.2e-3 cth_like). Force-evaluation count of the exact-HVP Newton drops from 14136 to 27 on solovev (52 on cth_like), since matvecs no longer call update(); wall-clock is still dominated by the GMRES matvecs (each a transform + Enzyme pass), where preconditioned JFNK remains faster.

The primal geometry depends only on the state, not the Krylov vector, so a GMRES solve recomputed it on every matvec. Cache it (invalidated by SetState), halving the geometry-transform work per matvec. Accuracy unchanged (exact vs FD 3.9e-6 solovev); wall-clock drops (cth_like internal 25.7->21.8s, adjoint 11.5->10.4s).

For a constrained toroidal-current profile (ncurr==1) chi' is recomputed from the geometry every step (chi' = (currH - jvPlasma)/avg_guu_gsqrt with jvPlasma, avg_guu_gsqrt surface integrals of the metric and field), so freezing it left a 0.4% residual on cth_like. Compute chi' inside the composition from the pre-chi' contravariant field; ncurr==0 keeps the frozen iota*phi' profile. Exact vs FD HVP now: solovev 3.9e-6, cth_like 6.6e-6 (was 4.2e-3) - exact to the finite-difference floor on both cases.

benchmark_exact_hvp.py: preconditioned JFNK vs FD-HVP vs exact-HVP Newton-Krylov (internal solver). benchmark_adjoint_gradient.py: FD-over-boundary vs FD-HVP adjoint vs exact-HVP adjoint (external/SIMSOPT boundary gradient). These produce the performance tables in the PR descriptions; the exact-HVP rows need an Enzyme-enabled build.

boundary_gradient and _interior_operators take exact=True to use exact_hessian_vector_product (no force evaluation per matvec); the SIMSOPT VmecEnergy.gradient wrapper uses it by default. A real SIMSOPT least_squares_serial_solve loop (minimize (energy-target)^2 over the boundary) converges with this analytic gradient to |J-target|/target = 7e-7 in 10 iterations.

…ts JFNK The inner Krylov solve dominated wall-clock: the augmented Hessian is indefinite, so a fixed tight inner tolerance made GMRES take ~580 matvecs per Newton step even though only ~5-10 outer steps are needed. Profiling: 8 Newton iters but 4655 matvecs, one exact HVP matvec is 0.04 ms, and only 21% of the time was in the HVP. Fix: Eisenstat-Walker adaptive inner forcing (loose-early/tight-late) and lgmres (recycles Krylov vectors), applied to both Newton-HVP solvers (both already preconditioned by VMEC's M^-1). Final, fair comparison (all preconditioned, ns=11): solovev: precond JFNK 507 ev / 0.08 s ; exact-HVP Newton 17 ev / 0.03 s cth_like: precond JFNK 1633 ev / 2.00 s; exact-HVP Newton 26 ev / 1.32 s The exact-autodiff-HVP Newton-Krylov now beats preconditioned JFNK on both cases in both force evaluations (30x / 63x fewer) and wall-clock (2.7x / 1.5x faster). Also add examples/simsopt_optimization_loop.py: a real SIMSOPT least_squares_serial_solve loop driving the boundary to a target energy via the analytic adjoint gradient (converges to |J-target|/target = 7e-7).

…t fallback Apply pre-commit formatting across the touched files. Make the SIMSOPT VmecEnergy.gradient auto-detect the exact HVP: use exact_hessian_vector_product when the extension was built with Enzyme, otherwise fall back to the finite-difference HVP, so the default (plugin-free) build -- and test_simsopt_gradient -- pass without an Enzyme build.

…VP wrapper

…mhd_model ideal_mhd_model.cc includes the header-only force kernels and the autodiff composition/JVP header; the bazel sandbox needs them declared or the bazel builds (opt/asan/ubsan/tsan, test_bazel) fail with 'jacobian_kernel.h: No such file'. Add them via glob so each branch picks up whatever exists on it. The Enzyme JVP .cc stays CMake-only (guarded by VMECPP_ENABLE_ENZYME).

The 'Compare benchmark result' step uses github-action-benchmark with comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from forks -> 'Resource not accessible by integration'. Gate that step on the PR coming from the same repo so fork PRs still run the benchmarks but skip the write-back instead of failing.

The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under the numpy 2.x that the test env now resolves, importing it dies in the f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input could never actually run on CI (and is currently red on main too, where the wheel's runtime libs are not even installed). Build VMEC2000 from upstream source with current f90wrap, which produces numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI (hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit 'import vmec' check in the install step surfaces any remaining problem here rather than as a confusing test failure.

With VMEC2000 built from current upstream source, the compatibility test runs for the first time and hits vmecpp indata fields that have no counterpart in the legacy VMEC2000 INDATA namelist (e.g. free_boundary_method), which raised AttributeError. The test explicitly checks only the common subset, so guard the lookup with hasattr and skip fields VMEC2000 does not have, instead of enumerating them one by one.

…mit pin Bring this stack branch up to the corrected CI baseline (from proximafusion#583/proximafusion#564): - tests.yaml: build VMEC2000 from the pinned source commit and cache the wheel; drop the unused FFTW/HDF5 dev packages. - benchmarks.yaml: skip the result upload on fork PRs (read-only token). - test_simsopt_compat.py: skip vmecpp-only INDATA fields. - CMakeLists: pin abseil to the 20260107.1 commit hash, not the tag.

…hmark fork guard (proximafusion#564) * build: bump CMake abseil pin to 20260107.1 for Clang >= 21 The CMake FetchContent abseil pin (2024-08) fails to compile under Clang >= 21: absl::Nonnull SFINAE in absl/strings/ascii.cc and the numbers.cc nullability annotations are rejected by the newer frontend. Bump to the 20260107.1 LTS, which compiles cleanly under Clang 21.1.8 and GCC. Clang is the compiler required for the Enzyme autodiff build. The Bazel build keeps its own (BCR) abseil pin and is unaffected. * ci: skip benchmark result upload on fork PRs (token is read-only) The 'Compare benchmark result' step uses github-action-benchmark with comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from forks -> 'Resource not accessible by integration'. Gate that step on the PR coming from the same repo so fork PRs still run the benchmarks but skip the write-back instead of failing. * ci: build VMEC2000 from source so the compat test runs on numpy 2 The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under the numpy 2.x that the test env now resolves, importing it dies in the f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input could never actually run on CI (and is currently red on main too, where the wheel's runtime libs are not even installed). Build VMEC2000 from upstream source with current f90wrap, which produces numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI (hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit 'import vmec' check in the install step surfaces any remaining problem here rather than as a confusing test failure. * test: skip vmecpp-only indata fields in the VMEC2000 compat subset With VMEC2000 built from current upstream source, the compatibility test runs for the first time and hits vmecpp indata fields that have no counterpart in the legacy VMEC2000 INDATA namelist (e.g. free_boundary_method), which raised AttributeError. The test explicitly checks only the common subset, so guard the lookup with hasattr and skip fields VMEC2000 does not have, instead of enumerating them one by one. * build: pin abseil to the 20260107.1 commit hash Pin the FetchContent abseil dependency to commit 255c84d (the exact commit behind the 20260107.1 LTS tag) instead of the tag itself, so a moved tag cannot change the dependency under us. * ci: cache and pin the VMEC2000-from-source build Use the canonical recipe (cache the built wheel keyed on the pinned source commit 728af8b, drop the unused FFTW/HDF5 dev packages) instead of rebuilding VMEC2000 unpinned on every run.

…oduct Add the transpose H^T of the exact Hessian-vector product. VMEC's internal force is a scaled gradient, so H = dF/dx is non-symmetric and the implicit- function adjoint of a general objective needs H^T, not H. - exact_force_vjp.{h,cc}: reverse-mode (Enzyme __enzyme_autodiff) VJP of the local force-density kernel, the transpose of exact_force_jvp. - applyExactForceJacobianTranspose plus the 2D/3D transposed spectral transforms (dft_*Transpose_*): the linear transforms are reused as adjoints with the poloidal integration weight and constraint-coordinate parity handled exactly; the nonlinear kernel uses the reverse-mode VJP. - extrapolateTowardsAxisTranspose; makeLocalForceComposition (dedup of the force-composition descriptor shared by the forward/reverse passes). - pybind: exact_hessian_vector_product_transpose, plus kernel JVP/VJP diagnostics. Validated against the assembled Hessian to machine precision: <J_g x, y> = <x, J_g^T y> to 3e-15, and H^T w vs H.T w to 3e-16 (2D solovev) and 5e-16 (3D cth_like).

…reverse adjoint The previous adjoint solved H_II lambda = dJ/dx_I assuming H symmetric, which is wrong for a general objective (the energy only passed because dJ/dx_I ~ 0 at equilibrium). Replace it with two finite-difference-free, exact forms driven by the autodiff Hessian-vector product: - forward_boundary_gradient: one Hessian solve per boundary DOF. - adjoint_boundary_gradient: one solve total via the transpose exact_hessian_vector_product_transpose, with state-independent structural-null deflation (structural_nullfree_interior) and the symmetric preconditioner. boundary_gradient now routes through the forward sensitivities. Validated vs the re-solve finite-difference reference (solovev 5e-6) and cross-checked forward vs reverse (cth_like 9e-6).

…Hessian H^T w == H.T w to 1e-10, and the O(1) reverse adjoint matches the forward sensitivities on a non-energy objective. Skipped on non-Enzyme builds.

…t-hvp-integration

Detect the augmented Hessian's structural null space with a few random Hessian-vector probes instead of one per interior DOF: a column/row is zero iff (H^T v)[i]/(H v)[i] vanishes for random v, so O(n_probe) HVPs find every structural zero. Removes the previous O(n_interior) probe, which dominated the adjoint at high resolution. Detection: 0.14s at cth_like ns=51.

ideal_mhd_model.cc includes exact_force_vjp.h unconditionally (the use is guarded by VMECPP_ENABLE_ENZYME, the include is not), but only exact_force_jvp.h was in the bazel hdrs glob, so the sandboxed bazel build (sanitizer jobs) could not find the header. Add it alongside exact_force_jvp.h.

krystophny added 24 commits June 14, 2026 08:11

Merge branch 'allocation-free-forces' into mhdforce-kernel

e2a667d

Merge branch 'internal-hvp' into exact-hvp-integration

79ef740

github-actions Bot reviewed Jun 14, 2026

View reviewed changes

krystophny added 3 commits June 14, 2026 13:43

examples: add exact-autodiff-HVP Newton-Krylov solver

48cd127

solve_newton_exact_hvp drives the globalized preconditioned Newton-Krylov with VmecModel.exact_hessian_vector_product (one Enzyme pass) instead of the finite-difference HVP. Requires an Enzyme-enabled build.

krystophny changed the title ~~WIP: exact autodiff Hessian-vector product end-to-end~~ Exact autodiff Hessian-vector product for the VMEC force Jun 14, 2026

krystophny added 3 commits June 14, 2026 14:31

Merge branch 'simsopt-adjoint' into exact-hvp-integration

54bbcfd

This was referenced Jun 14, 2026

pybind: Hessian-vector product inside VMEC++ + internal Newton-Krylov #10

Draft

examples: adjoint boundary sensitivities + SIMSOPT analytic gradient #11

Draft

krystophny mentioned this pull request Jun 14, 2026

pybind: expose VMEC preconditioner operator + preconditioned JFNK #9

Draft

krystophny added 18 commits June 14, 2026 15:18

clang-tidy: drop dead store in composition; use reinterpret_cast in J…

5ad6d37

…VP wrapper

Merge remote-tracking branch 'upstream/main' into HEAD

3a69824

tests: transposed exact Hessian-vector product matches the assembled …

fb63ac5

…Hessian H^T w == H.T w to 1e-10, and the O(1) reverse adjoint matches the forward sensitivities on a non-energy objective. Skipped on non-Enzyme builds.

Merge remote-tracking branch 'origin/exact-hvp-integration' into exac…

27f6ea5

…t-hvp-integration

Apply ruff-format and docformatter (pre-commit)

20f8233

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exact autodiff Hessian-vector product for the VMEC force#23

Exact autodiff Hessian-vector product for the VMEC force#23
krystophny wants to merge 50 commits into
internal-hvpfrom
exact-hvp-integration

krystophny commented Jun 14, 2026 •

edited

Loading

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot Jun 14, 2026

Uh oh!

github-actions Bot Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	__enzyme_fwddiff<void>((void*)ComputeLocalForceDensity, enzyme_dup, geom,
	__enzyme_fwddiff<void>(reinterpret_cast<void*>(ComputeLocalForceDensity), enzyme_dup, geom,

Conversation

krystophny commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Verification: exact vs finite-difference HVP (eps-independent)

Internal solver (force evals counted in VMEC++, ns=11; all preconditioned by M^-1, Eisenstat-Walker forcing, lgmres)

External / SIMSOPT (boundary-shape gradient dJ/dx_B)

Conclusion

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

krystophny commented Jun 14, 2026 •

edited

Loading