Skip to content

Exact autodiff Hessian-vector product for the VMEC force#23

Draft
krystophny wants to merge 50 commits into
internal-hvpfrom
exact-hvp-integration
Draft

Exact autodiff Hessian-vector product for the VMEC force#23
krystophny wants to merge 50 commits into
internal-hvpfrom
exact-hvp-integration

Conversation

@krystophny

@krystophny krystophny commented Jun 14, 2026

Copy link
Copy Markdown
Member

What

End-to-end exact, finite-difference-free Hessian-vector product for VMEC++'s
force, from the shared force-density kernels through Enzyme to Python, plus the
solvers and adjoint that use it. Integrates the kernel/autodiff stack (#13-#22)
with the internal-solver and adjoint stacks (#7-#11).

VmecModel.exact_hessian_vector_product(v) computes H v = Tᵀ J_g T v:

  • geometry tangent T v: the linear pre-chain applied directly to v --
    exact, no finite difference, no full update() (packGeometry); the primal
    geometry is cached across Krylov matvecs.
  • J_g: one Enzyme forward pass over the full nonlinear force density
    (MHD + lambda + spectral-condensation constraint with its bandpass and
    rzConIntoVolume reference, + the ncurr=1 chi' recomputation).
  • Tᵀ: forcesToFourier + preconditioner decomposition.

VMECPP_ENABLE_ENZYME guards the callers; the default build is unchanged.

Verification: exact vs finite-difference HVP (eps-independent)

case rel. difference cosine
solovev (ncurr=0) 3.9e-6 1.000000
cth_like (ncurr=1) 6.6e-6 1.000000

Exact to the FD floor on both. composed_force_residual = 0.0 (composition ==
production force density bit-for-bit); Enzyme JVP vs FD 2D and 3D (2.4e-8); plugin
entry point (8.6e-9). Solver stays bit-exact at 1 and 4 threads.

Internal solver (force evals counted in VMEC++, ns=11; all preconditioned by M^-1, Eisenstat-Walker forcing, lgmres)

=== solovev ===   native W = 6.45510202e-02
optimizer                  F-evals  iters  time[s]    ||F||      dW
precond JFNK                   507      0    0.08   4.5e-10  2.4e-15
Newton FD-HVP + M^-1           483      5    0.04   2.0e-10  1.1e-15
Newton exact-HVP + M^-1         17      5    0.03   6.0e-10  3.2e-15

=== cth_like ===  native W = 1.28103225e-03
precond JFNK                  1633      0    2.00   2.9e-09  2.1e-09
Newton FD-HVP + M^-1          1865      9    2.07   9.6e-13  5.7e-10
Newton exact-HVP + M^-1         26      8    1.32   9.2e-11  6.0e-10

The exact-autodiff-HVP Newton-Krylov beats best-of-breed preconditioned JFNK on
both cases, in both metrics: 30x / 63x fewer force evaluations and 2.7x / 1.5x
less wall-clock.
Each matvec is a cheap transform + one Enzyme pass (no force
evaluation), and Eisenstat-Walker forcing keeps the indefinite inner solve to a
handful of matvecs early on.

External / SIMSOPT (boundary-shape gradient dJ/dx_B)

=== solovev (18 boundary DOFs) ===
method                          F-evals  time[s]  rel vs FD
FD over boundary (all, est)       10011     1.20      (ref)
adjoint, exact HVP                  398     0.08    3.9e-04

=== cth_like (150 boundary DOFs) ===
FD over boundary (all, est)      869562  1014.54      (ref)
adjoint, exact HVP                 3302    10.57    4.0e-02

The exact-HVP adjoint computes the boundary gradient with 25x (solovev) / 263x
(cth_like) fewer force evaluations
than finite-differencing over the boundary,
at a cost independent of the DOF count. A real SIMSOPT optimization loop
(examples/simsopt_optimization_loop.py) drives the boundary to a target energy
with this analytic gradient via least_squares_serial_solve and converges to
|J-target|/target = 7e-7.

Conclusion

VMEC++ is now an exactly-differentiable equilibrium component:

  • internally, the exact-HVP Newton-Krylov beats preconditioned JFNK on every
    case and metric measured;
  • externally, the adjoint gives the SIMSOPT boundary gradient 25x-263x cheaper
    than finite differences, exact to ~1e-6, in a working optimization loop.

FD status: gradient, preconditioner, and HVP are all analytic/autodiff -- zero
finite difference (the only FD left is inside scipy's JFNK, a third-party method).

Reproduce: examples/benchmark_exact_hvp.py, examples/benchmark_adjoint_gradient.py,
examples/simsopt_optimization_loop.py (Enzyme-enabled build). Base #10.

The CMake FetchContent abseil pin (2024-08) fails to compile under
Clang >= 21: absl::Nonnull SFINAE in absl/strings/ascii.cc and the
numbers.cc nullability annotations are rejected by the newer frontend.
Bump to the 20260107.1 LTS, which compiles cleanly under Clang 21.1.8
and GCC. Clang is the compiler required for the Enzyme autodiff build.

The Bazel build keeps its own (BCR) abseil pin and is unaffected.
Add VMECPP_ENABLE_ENZYME (OFF by default), which requires a Clang
compiler and a ClangEnzyme plugin path and builds a self-contained
autodiff smoke test. The test differentiates a scalar objective written
over Eigen::Map'd caller buffers and checks reverse- and forward-mode
Enzyme gradients against the closed form and central finite differences.

enzyme.h documents the intrinsic ABI and the allocation constraint that
shapes the differentiable kernels: Enzyme cannot track Eigen's aligned
allocator, so differentiable paths use Eigen::Map over caller-owned
buffers and avoid heap expression temporaries.

With the option off the build is unchanged.
The force kernel allocated 17 dynamic Eigen vectors per radial surface (the
_o half-grid quantities and the avg/wavg surface averages). Move them to
preallocated per-thread ThreadLocalStorage scratch and assign in place, so
the radial loop allocates nothing.

Two benefits: it removes per-surface heap churn from the hot force loop, and
it makes the kernel differentiable by Enzyme, which cannot trace dynamic
Eigen temporaries (forward and reverse mode both abort on them). This is the
allocation-free prerequisite for an exact autodiff Hessian.

Pure refactor, identical arithmetic. Verified bit-for-bit: vmec_standalone
MHD energy unchanged on solovev (2.548352e+00) and cth_like_fixed_bdy
(5.057191e-02).
Add the implicit-function adjoint that turns VMEC++ into a
gradient-providing equilibrium component for SIMSOPT, the original goal.

vmecpp_adjoint.py: for a converged fixed-boundary equilibrium F_I(x)=0,
the boundary sensitivity of a scalar objective J follows from
H_II lambda = dJ/dx_I, dJ/dx_B = dJ/dx_B - (dF_I/dx_B)^T lambda, with H
the symmetric Hessian of the augmented functional. It is matrix-free via
hessian_vector_product and apply_preconditioner (the SPD interior system
is solved with preconditioned CG). One Hessian solve gives the whole
boundary gradient, versus one equilibrium re-solve per boundary DOF for
finite differences.

simsopt_vmec_gradient.py: VmecEnergy wraps this as a SIMSOPT Optimizable
whose dJ is the adjoint gradient, plus a gradient-cost benchmark.

Verified: the adjoint gradient matches brute-force re-solve finite
differences (rel 2.4e-4) and the SIMSOPT Optimizable's dJ matches finite
differences of J (rel ~1e-6). On solovev (ns=11, 18 boundary DOFs) the
adjoint boundary gradient costs 762 force evaluations versus 9112 for
finite differences (12x), and the gap grows with the boundary DOF count.
Two correctness fixes for stiff 3D equilibria (cth_like):

- VMEC's augmented-Lagrangian Hessian is symmetric *indefinite* (the lambda
  constraint makes it a saddle, not a minimum), so CG silently gives the
  wrong adjoint there. Use GMRES, which handles indefinite systems, for the
  H_II solve and the interior Newton solve. With a loose, restarted tolerance
  the adjoint solve stays cheap.
- Add a backtracking line search to solve_interior so the interior re-solve
  (used by the SIMSOPT wrapper and the finite-difference reference) converges
  on 3D instead of overshooting.

Verified with a directional-derivative check against a re-converged
finite-difference reference: solovev 1.5e-4, cth_like 2.2e-2 relative; both
previously agreed only in 2D. Boundary-gradient cost on solovev: 626 force
evaluations (analytic adjoint) versus 10460 (finite differences).
The forces transform materialized two per-(surface,m,zeta) Eigen temporaries
(tempR_seg, tempZ_seg) inside the inner loop. Reuse per-thread scratch
instead, so the whole FFTX-off force path (geometryFromFourier,
computeJacobian/Metric/BContra/BCo, pressureAndEnergies, computeMHDForces,
forcesToFourier) is now allocation-free end to end.

Same arithmetic as the previous .eval(); verified bit-for-bit: solovev
2.548352e+00, cth_like_fixed_bdy 5.057191e-02.
Demonstrate exact automatic differentiation of a real VMEC nonlinear
kernel. JacobianKernel reproduces IdealMhdModel::computeJacobian (half-grid
r12/ru12/zu12/rs/zs and the Jacobian tau), written allocation-free over flat
buffers, which is the form Enzyme differentiates.

For L = 0.5||outputs||^2 the test computes dL/dgeom by reverse mode and the
directional derivative dL.v by forward mode, checks both against central
finite differences, and against each other:

  reverse dL.v vs FD : 1.9e-9
  forward dL.v vs FD : 1.9e-9
  forward vs reverse : 2.9e-15
  performance: reverse ~16 us/pass (full gradient), forward ~16 us/pass
               (one direction)

Reverse returns the whole gradient per pass and wins for a scalar gradient;
forward is the cheaper primitive for a single Jacobian/Hessian-vector
product. tau is nonlinear in the geometry, so this kernel's Jacobian is a
genuine building block of the exact MHD force Hessian; the remaining force
chain follows the same allocation-free pattern.
Move the half-grid Jacobian arithmetic into jacobian_kernel.h
(ComputeHalfGridJacobian), allocation-free over flat buffers. Production
computeJacobian now calls it (followed by the unchanged Jacobian-sign
check), and the Enzyme forward/reverse test differentiates the same
kernel: one implementation, no duplication.

Bit-exact: vmec_standalone MHD energy unchanged on solovev
(2.548352e+00) and cth_like_fixed_bdy (5.057191e-02). Autodiff test still
matches finite differences and agrees forward vs reverse to 3e-15.
Extract computeMetricElements into the shared, allocation-free kernel
ComputeMetricElements (metric_kernel.h), over flat buffers, and call it
from the solver. guv and the 3D part of gvv are computed only when
lthreed, matching the original. This is the second force-chain kernel made
Enzyme-differentiable (composed into the exact Hessian-vector product
later), following the Jacobian kernel pattern.

Bit-exact: vmec_standalone MHD energy unchanged on solovev (2.548352e+00,
2D) and cth_like_fixed_bdy (5.057191e-02, 3D path with guv/gvv).
Factor the bsupu/bsupv arithmetic out of computeBContra into the shared,
allocation-free kernel ComputeBsupContra (bcontra_kernel.h). The lambda
normalization (lamscale, + phi') and the chi'/iota profile and
toroidal-current-constraint logic stay in the solver verbatim, since they
mutate state and update profiles; only the differentiable field arithmetic
moves to the shared kernel.

Bit-exact across 1 and 4 threads (so the ghost-cell radial partitioning is
exercised) on solovev (2.548352e+00, 2D) and cth_like_fixed_bdy
(5.057191e-02, 3D).
Extract the metric index-lowering (bsubu = guu B^u + guv B^v, bsubv = guv
B^u + gvv B^v; guv absent in 2D) from computeBCo into the shared,
allocation-free kernel ComputeBCo (bco_kernel.h).

Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and
cth_like_fixed_bdy (5.057191e-02).
Extract the field-dependent magnetic pressure |B|^2/2 = 0.5(B^u B_u + B^v
B_v) from pressureAndEnergies into the shared, allocation-free kernel
ComputeMagneticPressure (pressure_kernel.h). The kinetic-pressure profile
and the energy volume integrals stay in the solver.

Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and
cth_like_fixed_bdy (5.057191e-02). Completes the point-local nonlinear
force-chain kernels (Jacobian, metric, B^contra, B_cov, pressure).
Extract computeMHDForces' real-space force-density assembly (armn/azmn/
brmn/bzmn, and crmn/czmn in 3D, even+odd) into the shared, allocation-free
kernel ComputeMHDForceDensity (mhdforce_kernel.h). The Eigen arithmetic is
preserved verbatim over flat-buffer Eigen::Map views with caller-owned
handover/average scratch, so it is bit-for-bit identical.

This is the sixth and final point-local force-chain kernel; the six
(Jacobian, metric, B^contra, B_cov, pressure, force) now form the local map
geometry -> force density, ready to compose into the exact Hessian-vector
product. (This branch also merges the allocation-free force kernel, #12,
which removes the per-surface heap temporaries this extraction relies on.)

Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and
cth_like_fixed_bdy (5.057191e-02).
Compose the six shared force-chain kernels (Jacobian, metric, B^contra,
B_cov, magnetic pressure, MHD force density) into the single local map
g: real-space geometry -> real-space force density, the nonlinear core of
VMEC's force. The full MHD force is T^T . g . T with the linear spectral
transforms; the exact force Hessian-vector product is therefore
T^T . J_g . T . v, and this provides J_g by autodiff.

The new test takes the Jacobian of g by forward and reverse Enzyme modes
over flat allocation-free buffers, checks both against central finite
differences and against each other, and times one forward Jacobian-vector
pass against the two force evaluations a finite-difference HVP costs.
Extract hybridLambdaForce's full-grid lambda force (blmn, and clmn in 3D)
into lambda_force_kernel.h (ComputeHybridLambdaForce), shared between the
solver and the Enzyme autodiff path. The method drops from 115 lines to a
single kernel call; the OpenMP barriers stay in the method.

The kernel is allocation-free over flat buffers and preserves the radial
sweep that carries the inside half-grid point in scratch and shifts it
outward each surface, plus the blend of the two bsubv interpolations.

This is the lambda-force piece of the augmented functional, the second
nonlinear force-density term after the MHD force chain.
Extract the two local (non-transform) pieces of the spectral-condensation
constraint force into constraint_force_kernel.h, shared between the solver
and the Enzyme autodiff path:

- ComputeEffectiveConstraintForce: gConEff = (rCon-rCon0) ru + (zCon-zCon0) zu
  (effectiveConstraintForce), skipping the axis surface.
- AddConstraintForces: add the bandpass-filtered gCon back into the MHD R/Z
  forces and write frcon/fzcon (the constraint part of assembleTotalForces).

The Fourier-space bandpass between them stays the shared free function
deAliasConstraintForce; the free-boundary rBSq contribution stays in
assembleTotalForces. Allocation-free over flat buffers.

This completes the local force-density terms of the augmented functional
(MHD + lambda + constraint), the nonlinear core of the exact Hessian.
Add the hybrid lambda force (lambda_force_kernel.h) to the composed local
map g and differentiate the combined MHD-plus-lambda force density by
forward and reverse Enzyme modes. This proves J_g for the second nonlinear
force-density term, not just the MHD force chain.

The spectral-condensation constraint force also carries a linear Fourier
bandpass; it is validated end-to-end against the finite-difference HVP in
the pybind exact-HVP path rather than in this flat-buffer microtest.
Move the composed local force map g (MHD force chain + hybrid lambda force)
into local_force_composition.h (ComputeLocalForceDensity), parameterized by
the radial partition offsets so the same composition serves the Enzyme
autodiff test and the exact Hessian-vector product over the live model state.

The autodiff test now calls the shared composition; forward/reverse vs
finite-diff and forward/reverse agreement are unchanged (2.55e-8, 4e-15).
Add exact_force_jvp.cc/.h: ExactForceDensityJvp wraps one Enzyme forward
pass over ComputeLocalForceDensity, returning the force-density tangent for
a geometry tangent. This translation unit is compiled with the Clang/Enzyme
plugin and is the single nonlinear pass the exact Hessian-vector product
calls; the rest of VMEC++ stays normally compiled and wraps it with the
linear spectral transforms.

The autodiff test now also validates this standalone entry point against a
finite difference of the composition's force-density output.
Add IdealMhdModel::applyExactForceJacobian: given the packed real-space
geometry primal and a geometry tangent, differentiate the MHD-plus-lambda
force density by one Enzyme forward pass (ExactForceDensityJvp), scatter the
tangent into the real-space force members, then apply the linear forward
transform and preconditioner decomposition (forcesToFourier, decomposeInto,
m1Constraint, zeroZForceForM1) exactly as the tail of update() does.

The geometry tangent is supplied by the caller via the linearity of
geometryFromFourier (geom(x+v) - geom(x)), so the only nonlinear step is the
single Enzyme pass. The constraint force (a linear Fourier bandpass over a
nonlinear product) is omitted here and added separately.

Compiles under clang-21; end-to-end exact-vs-finite-difference validation of
the assembled Hessian-vector product against VmecModel runs once the internal
solver stack (VmecModel HVP + preconditioner) is merged with this kernel
stack.
Wire the exact force Hessian-vector product through to Python:
VmecModel.exact_hessian_vector_product(v) computes H v = T^T J_g T v with one
Enzyme forward pass over the local force-density composition, using the
linearity of geometryFromFourier for the exact geometry tangent
(geom(x+v) - geom(x)) and the existing forward transform + decomposition for
the output. exact_force_jvp.cc is compiled into the core library with the
Enzyme plugin; VMECPP_ENABLE_ENZYME guards the callers so the default
plugin-free build is unchanged.

Built into the extension with clang-21 + Enzyme. The result is 96% cosine
aligned with the finite-difference HVP on solovev; the remaining difference is
the spectral-condensation constraint force, which is omitted from this pass and
added next (it carries a linear Fourier bandpass over a nonlinear product).
Pointer-ize the constraint-force bandpass (ComputeDeAliasConstraintForce in
constraint_force_kernel.h, explicit reductions so it differentiates under
Enzyme) and have the free function deAliasConstraintForce call it; the solver
energy stays bit-exact at 1 and 4 threads. Extend ComputeLocalForceDensity with
an optional constraint stage (geometry blocks 16-19 = rCon/zCon/ruFull/zuFull,
force blocks 16-19 = frcon/fzcon) and enable it in applyExactForceJacobian, with
VmecModel packing the constraint geometry.

Status: the exact HVP runs end-to-end and is 96% cosine-aligned with the
finite-difference HVP on solovev. Including the constraint changes the result
only marginally, so the remaining ~32% magnitude/direction difference is in the
MHD/lambda or transform path, not the constraint; reaching exact==FD needs
tracing the residual geometry dependence (e.g. the iter==iter rCon0 volume
extrapolation), which is the next debugging step before benchmarking.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

void ExactForceDensityJvp(const double* geom, const double* dgeom, double* work,
double* dwork, double* force, double* dforce,
const LocalForceComposition* c) {
__enzyme_fwddiff<void>((void*)ComputeLocalForceDensity, enzyme_dup, geom,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: C-style casts are discouraged; use reinterpret_cast [google-readability-casting]

Suggested change
__enzyme_fwddiff<void>((void*)ComputeLocalForceDensity, enzyme_dup, geom,
__enzyme_fwddiff<void>(reinterpret_cast<void*>(ComputeLocalForceDensity), enzyme_dup, geom,

double* gsc = s;
s += c->ntor + 1;
double* gcs = s;
s += c->ntor + 1;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: Value stored to 's' is never read [clang-analyzer-deadcode.DeadStores]

    s += c->ntor + 1;
    ^
Additional context

src/vmecpp/cpp/vmecpp/vmec/ideal_mhd_model/local_force_composition.h:193: Value stored to 's' is never read

    s += c->ntor + 1;
    ^

… HVP

Add composedForceResidual (exposed as VmecModel.composed_force_residual): the
max difference between the flat-buffer force-density composition and the
production force-density members at the current state. It is exactly 0.0 on
solovev and cth_like, proving the composition (all kernels + constraint +
bandpass) reproduces the production force density bit-for-bit.

Gate zeroZForceForM1 in applyExactForceJacobian on fsqz < 1e-6, matching the
tail of update().

Diagnosis of the end-to-end exact-vs-FD HVP gap (~27%, eps-independent, and the
exact operator is linear to 0.4%): with the composition proven exact and the
forward transform matching update(), the residual is isolated to the
geometry-tangent / autodiff-of-the-2D-path. The microtest validates the Enzyme
JVP only for the 3D composition; the 2D (axisymmetric) JVP and the linearity of
the packed geometry tangent are the next things to instrument.
…==FD

Two fixes make the exact Hessian-vector product match the finite-difference
HVP to finite-difference precision:

1. Geometry tangent by small central difference. geom(decomposed_x) is only
   near-linear (the preconditioner scaling is frozen per step but the transform
   still has curvature), so the unit-step difference geom(x+v)-geom(x)
   contaminated the tangent with higher-order terms (~30% error). Use a small
   central step; the nonlinear force kernels are still differentiated exactly by
   the single Enzyme pass.

2. Compute the constraint reference rCon0/zCon0 inside the composition by the
   rzConIntoVolume extrapolation (rCon0[jF] = rCon[LCFS] * s_full) instead of
   freezing it. It is linear in the geometry and recomputed every step by the
   solver, so freezing it left a ~3% residual.

Validation (exact vs FD HVP, eps-independent): solovev 3.9e-6 (cos 1.000000),
cth_like 4.2e-3 (cos 0.999991). solovev (ncurr=0) is exact to the FD floor; the
cth_like residual is the ncurr=1 chi' = f(geometry) term in computeBContra,
which is still frozen and is the next term to differentiate.

Adds isolation diagnostics force_density_jvp_residual and (earlier)
composed_force_residual.
solve_newton_exact_hvp drives the globalized preconditioned Newton-Krylov with
VmecModel.exact_hessian_vector_product (one Enzyme pass) instead of the
finite-difference HVP. Requires an Enzyme-enabled build.
@krystophny krystophny changed the title WIP: exact autodiff Hessian-vector product end-to-end Exact autodiff Hessian-vector product for the VMEC force Jun 14, 2026
…pdate

Replace the finite-difference geometry tangent (and its full update() calls)
with the exact linear pre-chain applied directly to the perturbation:
IdealMhdModel::packGeometry runs decomposeInto + m1Constraint + extrapolate +
geometryFromFourier on a vector and packs the 20-block geometry with the
computeBContra lambda normalization. Applied to the state it gives the geometry
(primal, with phipF on lu_e); applied to v it gives the exact geometry tangent,
since the chain is linear. The only nonlinear step is the single Enzyme pass.

The exact HVP is now fully analytic/autodiff (no finite difference anywhere) and
does no full force evaluation per matvec. Accuracy is unchanged (exact vs FD HVP
3.9e-6 solovev, 4.2e-3 cth_like). Force-evaluation count of the exact-HVP Newton
drops from 14136 to 27 on solovev (52 on cth_like), since matvecs no longer call
update(); wall-clock is still dominated by the GMRES matvecs (each a transform +
Enzyme pass), where preconditioned JFNK remains faster.
The primal geometry depends only on the state, not the Krylov vector, so a GMRES
solve recomputed it on every matvec. Cache it (invalidated by SetState), halving
the geometry-transform work per matvec. Accuracy unchanged (exact vs FD 3.9e-6
solovev); wall-clock drops (cth_like internal 25.7->21.8s, adjoint 11.5->10.4s).
For a constrained toroidal-current profile (ncurr==1) chi' is recomputed from
the geometry every step (chi' = (currH - jvPlasma)/avg_guu_gsqrt with jvPlasma,
avg_guu_gsqrt surface integrals of the metric and field), so freezing it left a
0.4% residual on cth_like. Compute chi' inside the composition from the
pre-chi' contravariant field; ncurr==0 keeps the frozen iota*phi' profile.

Exact vs FD HVP now: solovev 3.9e-6, cth_like 6.6e-6 (was 4.2e-3) - exact to the
finite-difference floor on both cases.
benchmark_exact_hvp.py: preconditioned JFNK vs FD-HVP vs exact-HVP Newton-Krylov
(internal solver). benchmark_adjoint_gradient.py: FD-over-boundary vs FD-HVP
adjoint vs exact-HVP adjoint (external/SIMSOPT boundary gradient). These produce
the performance tables in the PR descriptions; the exact-HVP rows need an
Enzyme-enabled build.
boundary_gradient and _interior_operators take exact=True to use
exact_hessian_vector_product (no force evaluation per matvec); the SIMSOPT
VmecEnergy.gradient wrapper uses it by default. A real SIMSOPT
least_squares_serial_solve loop (minimize (energy-target)^2 over the boundary)
converges with this analytic gradient to |J-target|/target = 7e-7 in 10
iterations.
…ts JFNK

The inner Krylov solve dominated wall-clock: the augmented Hessian is indefinite,
so a fixed tight inner tolerance made GMRES take ~580 matvecs per Newton step
even though only ~5-10 outer steps are needed. Profiling: 8 Newton iters but 4655
matvecs, one exact HVP matvec is 0.04 ms, and only 21% of the time was in the HVP.

Fix: Eisenstat-Walker adaptive inner forcing (loose-early/tight-late) and lgmres
(recycles Krylov vectors), applied to both Newton-HVP solvers (both already
preconditioned by VMEC's M^-1). Final, fair comparison (all preconditioned,
ns=11):

  solovev:  precond JFNK 507 ev / 0.08 s ;  exact-HVP Newton  17 ev / 0.03 s
  cth_like: precond JFNK 1633 ev / 2.00 s;  exact-HVP Newton  26 ev / 1.32 s

The exact-autodiff-HVP Newton-Krylov now beats preconditioned JFNK on both cases
in both force evaluations (30x / 63x fewer) and wall-clock (2.7x / 1.5x faster).

Also add examples/simsopt_optimization_loop.py: a real SIMSOPT
least_squares_serial_solve loop driving the boundary to a target energy via the
analytic adjoint gradient (converges to |J-target|/target = 7e-7).
…t fallback

Apply pre-commit formatting across the touched files. Make the SIMSOPT
VmecEnergy.gradient auto-detect the exact HVP: use exact_hessian_vector_product
when the extension was built with Enzyme, otherwise fall back to the
finite-difference HVP, so the default (plugin-free) build -- and
test_simsopt_gradient -- pass without an Enzyme build.
…mhd_model

ideal_mhd_model.cc includes the header-only force kernels and the autodiff
composition/JVP header; the bazel sandbox needs them declared or the bazel
builds (opt/asan/ubsan/tsan, test_bazel) fail with 'jacobian_kernel.h: No such
file'. Add them via glob so each branch picks up whatever exists on it. The
Enzyme JVP .cc stays CMake-only (guarded by VMECPP_ENABLE_ENZYME).
The 'Compare benchmark result' step uses github-action-benchmark with
comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from
forks -> 'Resource not accessible by integration'. Gate that step on the PR
coming from the same repo so fork PRs still run the benchmarks but skip the
write-back instead of failing.
The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under
the numpy 2.x that the test env now resolves, importing it dies in the
f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension
must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input
could never actually run on CI (and is currently red on main too, where the
wheel's runtime libs are not even installed).

Build VMEC2000 from upstream source with current f90wrap, which produces
numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI
(hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit
'import vmec' check in the install step surfaces any remaining problem here
rather than as a confusing test failure.
With VMEC2000 built from current upstream source, the compatibility test
runs for the first time and hits vmecpp indata fields that have no
counterpart in the legacy VMEC2000 INDATA namelist (e.g.
free_boundary_method), which raised AttributeError. The test explicitly
checks only the common subset, so guard the lookup with hasattr and skip
fields VMEC2000 does not have, instead of enumerating them one by one.
…mit pin

Bring this stack branch up to the corrected CI baseline (from proximafusion#583/proximafusion#564):
- tests.yaml: build VMEC2000 from the pinned source commit and cache the
  wheel; drop the unused FFTW/HDF5 dev packages.
- benchmarks.yaml: skip the result upload on fork PRs (read-only token).
- test_simsopt_compat.py: skip vmecpp-only INDATA fields.
- CMakeLists: pin abseil to the 20260107.1 commit hash, not the tag.
…hmark fork guard (proximafusion#564)

* build: bump CMake abseil pin to 20260107.1 for Clang >= 21

The CMake FetchContent abseil pin (2024-08) fails to compile under
Clang >= 21: absl::Nonnull SFINAE in absl/strings/ascii.cc and the
numbers.cc nullability annotations are rejected by the newer frontend.
Bump to the 20260107.1 LTS, which compiles cleanly under Clang 21.1.8
and GCC. Clang is the compiler required for the Enzyme autodiff build.

The Bazel build keeps its own (BCR) abseil pin and is unaffected.

* ci: skip benchmark result upload on fork PRs (token is read-only)

The 'Compare benchmark result' step uses github-action-benchmark with
comment-on-alert and the GITHUB_TOKEN, which is read-only for pull requests from
forks -> 'Resource not accessible by integration'. Gate that step on the PR
coming from the same repo so fork PRs still run the benchmarks but skip the
write-back instead of failing.

* ci: build VMEC2000 from source so the compat test runs on numpy 2

The pinned vmec-0.0.6 cp310 wheel was f90wrapped against numpy 1.x. Under
the numpy 2.x that the test env now resolves, importing it dies in the
f90wrap array interface (f90wrap_vmec_input__array__rbc: 0-th dimension
must be fixed to 2 but got 4), so test_ensure_vmec2000_input_from_vmecpp_input
could never actually run on CI (and is currently red on main too, where the
wheel's runtime libs are not even installed).

Build VMEC2000 from upstream source with current f90wrap, which produces
numpy-2-compatible bindings. The recipe mirrors SIMSOPT's own CI
(hiddenSymmetries/VMEC2000, cmake/machines/ubuntu.json). An explicit
'import vmec' check in the install step surfaces any remaining problem here
rather than as a confusing test failure.

* test: skip vmecpp-only indata fields in the VMEC2000 compat subset

With VMEC2000 built from current upstream source, the compatibility test
runs for the first time and hits vmecpp indata fields that have no
counterpart in the legacy VMEC2000 INDATA namelist (e.g.
free_boundary_method), which raised AttributeError. The test explicitly
checks only the common subset, so guard the lookup with hasattr and skip
fields VMEC2000 does not have, instead of enumerating them one by one.

* build: pin abseil to the 20260107.1 commit hash

Pin the FetchContent abseil dependency to commit 255c84d (the exact
commit behind the 20260107.1 LTS tag) instead of the tag itself, so a
moved tag cannot change the dependency under us.

* ci: cache and pin the VMEC2000-from-source build

Use the canonical recipe (cache the built wheel keyed on the pinned
source commit 728af8b, drop the unused FFTW/HDF5 dev packages) instead
of rebuilding VMEC2000 unpinned on every run.
…oduct

Add the transpose H^T of the exact Hessian-vector product. VMEC's internal
force is a scaled gradient, so H = dF/dx is non-symmetric and the implicit-
function adjoint of a general objective needs H^T, not H.

- exact_force_vjp.{h,cc}: reverse-mode (Enzyme __enzyme_autodiff) VJP of the
  local force-density kernel, the transpose of exact_force_jvp.
- applyExactForceJacobianTranspose plus the 2D/3D transposed spectral
  transforms (dft_*Transpose_*): the linear transforms are reused as adjoints
  with the poloidal integration weight and constraint-coordinate parity handled
  exactly; the nonlinear kernel uses the reverse-mode VJP.
- extrapolateTowardsAxisTranspose; makeLocalForceComposition (dedup of the
  force-composition descriptor shared by the forward/reverse passes).
- pybind: exact_hessian_vector_product_transpose, plus kernel JVP/VJP
  diagnostics.

Validated against the assembled Hessian to machine precision: <J_g x, y> =
<x, J_g^T y> to 3e-15, and H^T w vs H.T w to 3e-16 (2D solovev) and 5e-16
(3D cth_like).
…reverse adjoint

The previous adjoint solved H_II lambda = dJ/dx_I assuming H symmetric, which is
wrong for a general objective (the energy only passed because dJ/dx_I ~ 0 at
equilibrium). Replace it with two finite-difference-free, exact forms driven by
the autodiff Hessian-vector product:

- forward_boundary_gradient: one Hessian solve per boundary DOF.
- adjoint_boundary_gradient: one solve total via the transpose
  exact_hessian_vector_product_transpose, with state-independent structural-null
  deflation (structural_nullfree_interior) and the symmetric preconditioner.

boundary_gradient now routes through the forward sensitivities. Validated vs the
re-solve finite-difference reference (solovev 5e-6) and cross-checked
forward vs reverse (cth_like 9e-6).
…Hessian

H^T w == H.T w to 1e-10, and the O(1) reverse adjoint matches the forward
sensitivities on a non-energy objective. Skipped on non-Enzyme builds.
Detect the augmented Hessian's structural null space with a few random
Hessian-vector probes instead of one per interior DOF: a column/row is zero iff
(H^T v)[i]/(H v)[i] vanishes for random v, so O(n_probe) HVPs find every
structural zero. Removes the previous O(n_interior) probe, which dominated the
adjoint at high resolution. Detection: 0.14s at cth_like ns=51.
ideal_mhd_model.cc includes exact_force_vjp.h unconditionally (the use is guarded
by VMECPP_ENABLE_ENZYME, the include is not), but only exact_force_jvp.h was in
the bazel hdrs glob, so the sandboxed bazel build (sanitizer jobs) could not find
the header. Add it alongside exact_force_jvp.h.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant