Skip to content

Change the stopping criterion#281

Draft
MaxenceGollier wants to merge 3 commits intoJuliaSmoothOptimizers:masterfrom
MaxenceGollier:switch_stopping_criterion
Draft

Change the stopping criterion#281
MaxenceGollier wants to merge 3 commits intoJuliaSmoothOptimizers:masterfrom
MaxenceGollier:switch_stopping_criterion

Conversation

@MaxenceGollier
Copy link
Collaborator

@MaxenceGollier MaxenceGollier commented Jan 28, 2026

Switch from $$√\xi/\nu$$ to $$‖s_{cp}‖/\nu$$, following multiple discussions, I think that this will really improve the robustness of the solvers.

I am not sure what I should do for R2DH and TRDH;
We compute

spectral_test ? prox!(s, ψ, mν∇fk, ν₁) : iprox!(s, ψ, ∇fk, dkσk)

So if the spectral_test is false, what should be the measure ? $$‖s‖/‖dkσk‖_{\infty}$$ ?

I will also make the changements for LM and LMTR. We can compare in terms of number of iteration we the other CI runs.

@codecov
Copy link

codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.46%. Comparing base (e0f214d) to head (252da43).
⚠️ Report is 265 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #281       +/-   ##
===========================================
+ Coverage   61.53%   84.46%   +22.93%     
===========================================
  Files          11       13        +2     
  Lines        1292     1590      +298     
===========================================
+ Hits          795     1343      +548     
+ Misses        497      247      -250     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MaxenceGollier
Copy link
Collaborator Author

At least on the CIs, it looks like this does not really change the number of iterations, but we will definitely win in terms of robustness.

@dpo
Copy link
Member

dpo commented Jan 29, 2026

In TRDH, if we assume that $D_k \succ 0$ and that $h = 0$, the unconstrained step is $s_k = -D_k^{-1} \nabla f_k$. If we also assume that that step lies inside the trust region, we have $\nabla f_k = -D_k s_k$, and thus, $\Vert \nabla f_k \Vert = \Vert D_k s_k \Vert \leq \Vert d_k \Vert_{\infty} \Vert s_k \Vert \leq \Vert s_k \Vert / \nu_k$. The inequalities are equalities if $D_k$ is a multiple of the identity. So I would use $\Vert s_k \Vert / \nu_k$.

- `sub_kwargs::NamedTuple = NamedTuple()`: a named tuple containing the keyword arguments to be sent to the subsolver. The solver will fail if invalid keyword arguments are provided to the subsolver. For example, if the subsolver is `R2Solver`, you can pass `sub_kwargs = (max_iter = 100, σmin = 1e-6,)`.

The algorithm stops either when `√(ξₖ/νₖ) < atol + rtol*√(ξ₀/ν₀) ` or `ξₖ < 0` and `√(-ξₖ/νₖ) < neg_tol` where ξₖ := f(xₖ) + h(xₖ) - φ(sₖ; xₖ) - ψ(sₖ; xₖ), and √(ξₖ/νₖ) is a stationarity measure.
The algorithm stops when `‖sᶜᵖ‖/ν < atol + rtol*‖s₀ᶜᵖ‖/ν ` where sᶜᵖ ∈ argminₛ f(xₖ) + ∇f(xₖ)ᵀs + ψ(s; xₖ) ½ ν⁻¹ ‖s‖².
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep both stopping conditions. It's not always true that one measure is smaller than the other.

Copy link
Collaborator Author

@MaxenceGollier MaxenceGollier Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having two stopping conditions in one solver is odd in my opinion.
I could add a keyword argument to switch between these if we really want to keep it.

Numerically, I strongly believe that taking the square root of something that can suffer from catastrophic cancellation and/or be very small is a bad idea that makes the solvers not robust. Please take a look at my comment below where I see a 25% increase in problems I can solve with this PR.
I think neg_tol is also a weird argument to have and that it should be removed from all solvers.

@MohamedLaghdafHABIBOULLAH
Copy link
Collaborator

In TRDH, if we assume that D k ≻ 0 and that h = 0 , the unconstrained step is s k = − D k − 1 ∇ f k . If we also assume that that step lies inside the trust region, we have ∇ f k = − D k s k , and thus, ‖ ∇ f k ‖ = ‖ D k s k ‖ ≤ ‖ d k ‖ ∞ ‖ s k ‖ ≤ ‖ s k ‖ / ν k . The inequalities are equalities if D k is a multiple of the identity. So I would use ‖ s k ‖ / ν k .

Same argument applies on the implementation of R2DH, we have $\nabla f_k = -(D_k + \sigma_k I) s_k$, and thus, $\Vert \nabla f_k \Vert = \Vert(D_k + \sigma_k I)s_k \Vert \leq (\Vert d_k \Vert_{\infty} + \sigma_k) \Vert s_k \Vert \leq \Vert d_k \sigma_k \Vert_{\infty}\Vert s_k \Vert$.

So I suggest to use rather $\Vert d_k \sigma_k \Vert_{\infty}\Vert s_k \Vert$

@MaxenceGollier
Copy link
Collaborator Author

Same argument applies on the implementation of R2DH, we have ∇ f k = − ( D k + σ k I ) s k , and thus, ‖ ∇ f k ‖ = ‖ ( D k + σ k I ) s k ‖ ≤ ( ‖ d k ‖ ∞ + σ k ) ‖ s k ‖ ≤ ‖ d k σ k ‖ ∞ ‖ s k ‖ .

So I suggest to use rather ‖ d k σ k ‖ ∞ ‖ s k ‖

I don't see how $$\lVert d_k \rVert_{\infty} + \sigma_k \leq \lVert d_k \sigma_k \rVert_{\infty}$$, am I missing something ?

@MohamedLaghdafHABIBOULLAH
Copy link
Collaborator

MohamedLaghdafHABIBOULLAH commented Jan 30, 2026

Ok let me correct myself, we have $\Vert \nabla f_k \Vert = \Vert(D_k + \sigma_k I)s_k \Vert \leq \Vert d_k \sigma_k \Vert_{\infty}\Vert s_k \Vert$.
By definition of $d_k \sigma_k$ as $D_k + \sigma_k I$.

@MaxenceGollier
Copy link
Collaborator Author

MaxenceGollier commented Feb 3, 2026

At least on the CIs, it looks like this does not really change the number of iterations, but we will definitely win in terms of robustness.

To prove my point, see the benchmark results in my penalty solver:
MaxenceGollier/ExactPenalty.jl#48

In particular the line Hessian model = σI; Tolerance = 1e-9. This is the line where I use R2 as a subsolver; I am seeing a 25% increase in numbers of problems I can solve.

The other lines do not see such improvements because for Tolerance = 1e-3, the problem of negative ξ does not occur too often and the lines with R2N as a subsolver still fail due to the negative ξ (I only removed the negative ξ1 issue in R2N).

Can you please take a look @dpo ?

@MaxenceGollier
Copy link
Collaborator Author

MaxenceGollier commented Feb 3, 2026

Ok let me correct myself, we have ‖ ∇ f k ‖ = ‖ ( D k + σ k I ) s k ‖ ≤ ‖ d k σ k ‖ ∞ ‖ s k ‖ . By definition of d k σ k as D k + σ k I .

To keep things similar, can't we use the last inequality from this comment:

In TRDH, if we assume that D k ≻ 0 and that h = 0 , the unconstrained step is s k = − D k − 1 ∇ f k . If we also assume that that step lies inside the trust region, we have ∇ f k = − D k s k , and thus, ‖ ∇ f k ‖ = ‖ D k s k ‖ ≤ ‖ d k ‖ ∞ ‖ s k ‖ ≤ ‖ s k ‖ / ν k . The inequalities are equalities if D k is a multiple of the identity. So I would use ‖ s k ‖ / ν k .

to have a measure based on ν instead of d_k σ_k in R2DH?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants