Allow combination of ESS and automatic scaling #2032
base: master
Conversation
Generalize spectral scale creation beyond OmnigenousField to handle arbitrary Optimizable objects. Equilibrium instances create exponential scales, while non-Equilibrium instances default to ones. Results are concatenated to preserve proper ordering.
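A minimal sketch of the dispatch this describes (hypothetical helper name and scale formula; the real ESS scale would depend on spectral mode numbers, not a plain index):

import numpy as np

from desc.equilibrium import Equilibrium


def build_scales(things, alpha=1.0):
    """Illustrative only: exponential scales for Equilibrium, ones otherwise."""
    scales = []
    for thing in things:
        if isinstance(thing, Equilibrium):
            # hypothetical exponential decay over the object's parameters
            k = np.arange(thing.dim_x)
            scales.append(np.exp(-alpha * k))
        else:
            # non-Equilibrium Optimizable objects default to ones
            scales.append(np.ones(thing.dim_x))
    # concatenate in state-vector order to preserve proper ordering
    return np.concatenate(scales)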
Ensure x_scale has full state vector size (eq.dim_x) for compatibility with all problem types, including those with linear constraints or ProximalProjection objectives. Assign scaling selectively using x_idx keys (e.g., Rb_lmn), preserving generality across optimization scenarios beyond fixed-boundary cases.
…n objectives

Address dimensional mismatch when x_scale is used with objectives like ProximalProjection by ensuring x_scale is initialized with eq.dim_x and then projected appropriately. This resolves issues where the reduced optimization space omits excluded parameters (e.g., R_lmn, Z_lmn, L_lmn) and avoids errors during eq.solve(x_scale='ess'). Also improved the docstring to clarify size and ordering requirements for custom x_scale inputs.
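As a sketch of the intended shape of a custom x_scale (assuming DESC's eq.dim_x and eq.x_idx attributes, which the commit messages mention; the surface basis access and the exponential values here are illustrative assumptions, not the actual ESS formula):

import numpy as np

# start from a full-size scale vector, one entry per state-vector element
x_scale = np.ones(eq.dim_x)

# assign scaling only to the boundary R modes via their state-vector indices
idx = eq.x_idx["Rb_lmn"]
# illustrative: decay exponentially with the toroidal mode number |n|
n = np.abs(eq.surface.R_basis.modes[:, 2])
x_scale[idx] = np.exp(-0.5 * n)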
…alSurface

- Add validation to prevent ESS in eq.solve() (optimization only)
- Improve documentation on ESS parameters and defaults
- Fix ProximalProjection x_scale handling for multiple things
- Add basic tests for ESS scaling scenarios
  maxiter : int
      Maximum number of solver steps.
- x_scale : array, list[dict | ``'ess'``], ``'ess'`` or ``'auto'``, optional
+ x_scale : array, list[dict | ``'ess'``, ``'auto'``], ``'ess'`` or ``'auto'``
Example use, to use "auto" for just lambda:

import numpy as np

x_scale = get_ess_scale(eq)  # ESS scales for every parameter group
x_scale["L_lmn"] = np.zeros_like(x_scale["L_lmn"])  # 0 entries mean "auto" for L_lmn
Can we write down the possible branches of how the x_scale can be applied right now in master code, before this PR is applied (and with/without constraints on the optimization)? If there are no linear constraints we get no D matrix, because we never project/recover; an option here would be to always make the LinearConstraintProjection but have it be the identity for project/recover if there are no linear constraints passed in (see the sketch below).

Different problems arise if "auto" and "ess" are mixed in the optimizer x_scale (some entries "auto", some "ess"). We need to figure out a good solution for this case, and figure out whether linear constraint projections still work correctly in this case. How do we want this to change with this PR?
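A minimal sketch of that always-wrap option (hypothetical wiring; DESC's actual LinearConstraintProjection constructor arguments may differ):

class IdentityProjection:
    """Hypothetical stand-in: project/recover are no-ops, i.e. D = I."""

    def project(self, x):
        return x

    def recover(self, x_reduced):
        return x_reduced


def wrap(objective, linear_constraints):
    # always return something with project()/recover(), so downstream
    # code never needs to branch on whether constraints were passed in
    if linear_constraints:
        return LinearConstraintProjection(objective, linear_constraints)
    return IdentityProjection()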
Proposed solution:
  options.setdefault("initial_trust_radius", 1e-3)
  options.setdefault("max_trust_radius", 1.0)
- elif options.get("initial_trust_radius", "scipy") == "scipy":
+ if options.get("initial_trust_radius", "scipy") == "scipy":
Why do we have these here and not in the optimization function itself? I remember getting confused because this default is different from the one set inside the function. If there is no specific reason, I would vote for setting these defaults in one place.
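A toy illustration of the hazard (hypothetical values): when a caller-side setdefault and the function's own fallback disagree, the effective default depends on which code path set the option.

def solve(options):
    # the function's internal fallback
    tr = options.get("initial_trust_radius", "scipy")
    return tr

opts = {}
opts.setdefault("initial_trust_radius", 1e-3)  # caller-side default
solve(opts)  # -> 1e-3, the caller-side default wins
solve({})    # -> "scipy", the function's own default wins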
  scl = tng._get_ess_scale(ess_alpha, ess_order, ess_min_value, ess_default)
  all_scales.append(scl)
  elif isinstance(xsc, str) and xsc == "auto":
+     scl = tree_map(jnp.zeros_like, tng.params_dict)
Whether #2041 is merged before or after this, we need to add special logic to SGD-type optimizers to deal with these 0 values. SGD only looks at the gradient, and norm scaling doesn't work there.
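One possible guard, as a sketch (assuming the 0-means-auto convention from this PR and a flat x_scale array): gradient-only optimizers have no Jacobian column norms to substitute, so the 0 entries must be replaced with something usable.

import jax.numpy as jnp

# fall back to 1.0 wherever the user asked for "auto" (0), since there
# is no column norm to use instead in an SGD-type optimizer
safe_scale = jnp.where(x_scale == 0, 1.0, x_scale)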
- return 1 / scale_inv, scale_inv
+ scale = 1 / scale_inv
+ if user_scale is not None:
+     scale = jnp.where(user_scale == 0, scale, user_scale)
I am not sure if this will work as intended. The reason Jacobian column scaling works (to some extent) is that when you look at all the norms, their relative relation gives a sense of scaling. If we mix the column norms with the arbitrary scaling given by the user, we lose this relationship. For example, say the norms are [1000, 2000, 1000] and the user-given x_scale is [1, 0, 1]; this will result in [1, 2000, 1]. I don't think this will give good results in most cases, due to the inconsistent scaling.
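The mixing described above, written out with the comment's toy numbers:

import jax.numpy as jnp

norms = jnp.array([1000.0, 2000.0, 1000.0])  # Jacobian column norms ("auto")
user = jnp.array([1.0, 0.0, 1.0])            # user x_scale; 0 means "auto"
mixed = jnp.where(user == 0, norms, user)
print(mixed)  # [1. 2000. 1.] -- the columns' relative scaling is lost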
Maybe we can first normalize the norms based on the maximum of the user-given x_scale, then use that? Something like:

if user_scale is not None:
    # find the entry where the user scale is largest
    user_scale_max = jnp.max(user_scale)
    user_scale_max_id = jnp.where(user_scale == user_scale_max)[0][0]
    # rescale the automatic norms so that column matches the user's value
    scale_max_at_id = scale[user_scale_max_id]
    scale = scale * user_scale_max / scale_max_at_id
    # nonzero user entries still override; 0 still means "auto"
    scale = jnp.where(user_scale == 0, scale, user_scale)

This first normalizes the scales such that the norm of the column corresponding to the maximum user scale is equal to the maximum user scale; this removes the order-of-magnitude difference between the user scale and the Jacobian column norms. We can probably find a better solution.
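With the toy numbers above this gives: user_scale_max = 1 (first hit at index 0), scale_max_at_id = 1000, so the norms become [1, 2, 1] before the final where, and the result is [1, 2, 1] rather than [1, 2000, 1].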
- return 1 / scale_inv, scale_inv
+ scale = 1 / scale_inv
+ if user_scale is not None:
+     scale = jnp.where(user_scale == 0, scale, user_scale)
Same as above
This allows you to use both ESS and automatic jacobian scaling in the same problem, by using different scales for different objects/variables. Note this still does not allow using automatic scaling "on top of" ESS, since in that case I think the automatic scaling would undo the effect of ESS.
Basically, any element of x_scale set to 0 means "use the automatic jacobian scaling", and now x_scale="auto" basically just sets x_scale=np.zeros() (see the sketch after the list). A few questions/concerns:

- e.g. if you use ESS for R_lmn but "auto" for Rb_lmn and have a BoundaryRSelfConsistency constraint, the results may be somewhat undefined
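As a sketch of the equivalence described above (assuming eq.dim_x gives the full state-vector size):

import numpy as np

# per the description above, these two requests are equivalent:
x_scale = "auto"              # automatic jacobian scaling everywhere
x_scale = np.zeros(eq.dim_x)  # every 0 entry means "use automatic scaling"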