
Conversation

@f0uriest f0uriest commented Dec 11, 2025

This allows you to use both ESS and automatic Jacobian scaling in the same problem, by using different scales for different objects/variables. Note that this still does not allow using automatic scaling "on top of" ESS, since in that case I think the automatic scaling would undo the effect of ESS.

Basically, any element of x_scale set to 0 means "use the automatic Jacobian scaling for that element", and x_scale="auto" now basically just sets x_scale to an all-zeros array via np.zeros().
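For example, a minimal sketch of mixed scaling: `get_ess_scale` is the helper named in a review comment further down, `eq`, `objective`, and `constraints` are assumed to already exist, and passing `x_scale` to `eq.optimize` as a one-element list is an assumption based on the docstring change below, not confirmed API.

```python
import numpy as np

# hypothetical: ESS scales everywhere except lambda, whose entries get the
# 0 sentinel meaning "use automatic Jacobian scaling here"
x_scale = get_ess_scale(eq)                          # dict of per-parameter scales
x_scale["L_lmn"] = np.zeros_like(x_scale["L_lmn"])   # 0 -> auto scaling

# per the docstring change below, x_scale is a list with one entry per thing
eq.optimize(objective=objective, constraints=constraints, x_scale=[x_scale])
```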

A few questions/concerns:

  • I'm not totally sure how this works with LinearConstraintProjection if you have a constraint that couples variables with different scaling types, i.e. if you used "ess" for R_lmn but "auto" for Rb_lmn and have a BoundaryRSelfConsistency constraint, the results may be somewhat undefined
  • Right now for ESS we use a default scale of 1 for any extra variables, but maybe we could use a scale of 0 instead, so that non-ESS variables get auto scaling?

Chris J and others added 30 commits May 23, 2025 09:40

Generalize spectral scale creation beyond OmnigenousField to handle any Optimizable object. Equilibrium instances create exponential scales, while non-Equilibrium instances default to ones. Results are concatenated to preserve proper ordering.

Ensure x_scale has the full state vector size (eq.dim_x) for compatibility with all problem types, including those with linear constraints or ProximalProjection objectives. Assign scaling selectively using x_idx keys (e.g., Rb_lmn), preserving generality across optimization scenarios beyond fixed-boundary cases.
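A sketch of the selective assignment described above (`eq.x_idx` mapping parameter names to state-vector indices follows the commit's own wording; the 0.1 value is purely illustrative):

```python
import numpy as np

# build a full-size scale vector, then override only the boundary R modes;
# `eq` is assumed to be a DESC Equilibrium
x_scale = np.ones(eq.dim_x)            # default scale of 1 everywhere
x_scale[eq.x_idx["Rb_lmn"]] = 0.1      # custom scale for just these entries
```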
…n objectives

Address dimensional mismatch when x_scale is used with objectives like
ProximalProjection by ensuring x_scale is initialized with eq.dim_x and
then projected appropriately. This resolves issues where the reduced
optimization space omits excluded parameters (e.g., R_lmn, Z_lmn, L_lmn)
and avoids errors during eq.solve(x_scale='ess').

Also improved docstring to clarify size and ordering requirements for
custom x_scale inputs.
…alSurface

- Add validation to prevent ESS in eq.solve() (optimization only)
- Improve documentation on ESS parameters and defaults
- Fix ProximalProjection x_scale handling for multiple things
- Add basic tests for ESS scaling scenarios
@f0uriest f0uriest changed the base branch from cj/ESS to master December 12, 2025 18:23
@f0uriest f0uriest changed the title from "Rc/auto scale" to "Allow combination of ESS and automatic scaling" Dec 12, 2025
maxiter : int
    Maximum number of solver steps.
- x_scale : array, list[dict | ``'ess'``], ``'ess'`` or ``'auto'``, optional
+ x_scale : array, list[dict | ``'ess'``, ``'auto'``], ``'ess'`` or ``'auto'``

Example usage, to use auto scaling for just lambda:

x_scale = get_ess_scale(eq)
x_scale["L_lmn"] = np.zeros_like(x_scale["L_lmn"])

dpanici commented Dec 15, 2025

  • Have LinearConstraintProjection apply the ESS scale (or whatever fixed, pre-computable scale is passed in)
    • in which case, only have one xscale, not two
  • How do we mix ess/auto? If ess for just some variables makes sense, what do we choose for the others?
    • e.g. for an equilibrium, what should I, G be? I am in favor of "auto" for these, using Jacobian scaling
  • Rename "auto" to "jac" for the optimizer xscale option? To make it not degenerate with LinearConstraintProjection's xscale

Can we write down the possible branches of how the xscale can be applied right now in the master code, before this PR is applied, and with/without linear constraints on the optimization? (If there are no linear constraints we get no D matrix, because we never project/recover; an option here would be to always create the LinearConstraintProjection, but have it use D = identity for project if no linear constraints are passed in.)

  • optimizer xscale = "auto" for ALL variables
    • linear constraint projection xscale = "auto" - fine, this is how we already do it, and it works fine
    • linear constraint projection xscale = some passed-in array - this also works fine, I think; we have to trust that the user is passing in something sensible
  • optimizer xscale = "ess" for ALL variables
    • linear constraint projection xscale = "auto" - this does nothing in almost all cases, unless a coefficient is >1e2
      • this may matter in coil optimization with shape change and current change. What happens in this case right now is that the current is not really "scaled", other than making x_reduced order unity, assuming the initial currents were correctly scaled to expected magnitudes
    • linear constraint projection xscale = some passed-in array

There are different problems if auto/ess are mixed for the optimizer xscale (auto for some variables, ess for others). We need to figure out a good solution for this case, and figure out whether linear constraint projections still work correctly in this case.

How do we want this to change with this PR?

ddudt commented Dec 15, 2025

Proposed solution:

  • User only has a single xscale input option.
  • If xscale="auto" for all variables: the linear constraint projection D matrix scales based on initial values, and the optimizer xscale uses the norm of the Jacobian columns. (Same as the existing defaults.)
  • If xscale="ess" for all variables: ESS scaling is applied in the linear constraint projection D matrix to make the optimizer variables order unity, and the optimizer does no additional scaling.
  • If xscale is not the same for all variables: TBD, we need to think of a good solution. (A rough sketch of this dispatch is below.)
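A rough sketch of the dispatch proposed above, not the PR's implementation; `resolve_x_scale`, `ess_scales`, and `initial_values` are all hypothetical placeholders for quantities DESC would compute:

```python
import numpy as np

def resolve_x_scale(x_scale, ess_scales, initial_values):
    """Return (diagonal of D for LinearConstraintProjection, optimizer x_scale)."""
    if isinstance(x_scale, str) and x_scale == "auto":
        # D scales by initial values; optimizer rescales by Jacobian column norms
        return np.abs(initial_values), "auto"
    if isinstance(x_scale, str) and x_scale == "ess":
        # ESS absorbed into D so variables are order unity; no extra scaling
        return ess_scales, np.ones_like(ess_scales)
    # mixed per-variable ess/auto: still TBD in the discussion above
    raise NotImplementedError("mixed ess/auto scaling")
```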

@dpanici dpanici requested review from a team, YigitElma, ddudt, dpanici, rahulgaur104 and unalmis and removed request for a team December 17, 2025 20:11
options.setdefault("initial_trust_radius", 1e-3)
options.setdefault("max_trust_radius", 1.0)
elif options.get("initial_trust_radius", "scipy") == "scipy":
if options.get("initial_trust_radius", "scipy") == "scipy":

Why do we have these here and not in the optimization function itself? I remember getting confused because this default is different from the one set inside the function. If there is no specific reason, I would vote for having these defaults set in the same place.
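One possible consolidation, as a sketch with a made-up wrapper name (`_minimize` is not a function from this codebase):

```python
import numpy as np

def _minimize(fun, x0, options=None):
    """Made-up solver wrapper: keep all trust-region defaults in one place."""
    options = {} if options is None else dict(options)
    options.setdefault("initial_trust_radius", "scipy")  # single source of truth
    options.setdefault("max_trust_radius", np.inf)
    return options  # stand-in for the actual solve
```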

    scl = tng._get_ess_scale(ess_alpha, ess_order, ess_min_value, ess_default)
    all_scales.append(scl)
elif isinstance(xsc, str) and xsc == "auto":
    scl = tree_map(jnp.zeros_like, tng.params_dict)

Whether #2041 is merged before or after this, we need to add special logic to SGD-type optimizers to deal with these 0 values. SGD only looks at the gradient, and norm scaling doesn't work there.
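One possible guard (an assumption, not code from this PR): since gradient-only methods have no Jacobian columns to take norms of, fall back to unit scale wherever the 0 sentinel appears:

```python
import jax.numpy as jnp

# hypothetical fallback for gradient-only (SGD-type) optimizers: entries of 0
# mean "use auto scaling", but with no Jacobian available, replace with 1.0
x_scale = jnp.asarray([2.0, 0.0, 0.5])
x_scale = jnp.where(x_scale == 0, 1.0, x_scale)  # -> [2.0, 1.0, 0.5]
```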

- return 1 / scale_inv, scale_inv
+ scale = 1 / scale_inv
+ if user_scale is not None:
+     scale = jnp.where(user_scale == 0, scale, user_scale)

I am not sure if this will work as intended. The reason Jacobian column scaling works (to some extent) is that when you look at all the norms, their relative relation gives a sense of scaling. If we mix the column norms with the arbitrary scaling given by the user, we lose this relationship. For example, let's say the norms are [1000, 2000, 1000] and the user-given x_scale is [1, 0, 1]; this will result in [1, 2000, 1]. I don't think this will give good results in most cases, due to this inconsistent scaling.

Maybe we can first normalize the norms based on the maximum of the user-given x_scale, then use that? Something like:

if user_scale is not None:
    # index of the largest user-given scale
    user_scale_max_id = jnp.argmax(user_scale)
    user_scale_max = user_scale[user_scale_max_id]
    # normalize the column norms so the column corresponding to the maximum
    # user scale equals that user scale
    scale = scale * user_scale_max / scale[user_scale_max_id]
    scale = jnp.where(user_scale == 0, scale, user_scale)

This first normalizes the scales such that the norm of the column corresponding to the maximum user scale equals that maximum user scale, which removes the order-of-magnitude difference between the user scale and the Jacobian column norms. We can probably find a better solution.
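A quick check of this proposal with the numbers from the comment above (all values illustrative):

```python
import jax.numpy as jnp

scale = jnp.array([1000.0, 2000.0, 1000.0])   # Jacobian column-norm scales
user_scale = jnp.array([1.0, 0.0, 1.0])       # 0 -> defer to auto scaling

# naive mixing gives inconsistent magnitudes: [1, 2000, 1]
naive = jnp.where(user_scale == 0, scale, user_scale)

# proposed normalization pins the column with the largest user scale
i = jnp.argmax(user_scale)
normalized = scale * user_scale[i] / scale[i]                # [1, 2, 1]
mixed = jnp.where(user_scale == 0, normalized, user_scale)   # [1, 2, 1]
```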

- return 1 / scale_inv, scale_inv
+ scale = 1 / scale_inv
+ if user_scale is not None:
+     scale = jnp.where(user_scale == 0, scale, user_scale)

Same as above

@dpanici dpanici requested review from ddudt and dpanici and removed request for ddudt and dpanici December 22, 2025 19:42