
Detailed model training control and batch modelling#365

Merged
roussel-ryan merged 44 commits into xopt-org:main from nikitakuklev:model_training
Feb 19, 2026

Conversation

@nikitakuklev (Collaborator) commented Sep 29, 2025

This PR introduces batch GP models and finer control over model training. The former can be useful for scalarized objectives, while the latter is necessary to speed up BO in operational contexts. As recently demonstrated in a NAPAC25 talk, one can significantly relax fitting tolerances to meet real-time requirements without impacting convergence, especially with scalarized objectives. There is also a physical motivation: we cannot set the physical devices precisely enough for exact fitting to matter.
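The tolerance/speed trade-off can be illustrated with a toy gradient descent loop (a self-contained sketch, not Xopt code; the quadratic objective and the `minimize_quadratic` helper are invented for illustration): loosening the stopping tolerance cuts the iteration count sharply while landing at nearly the same optimum.

```python
# Toy illustration of the tolerance/speed trade-off: loosening the
# convergence tolerance of an optimizer cuts iterations while still
# landing near the optimum. Invented example, not Xopt code.

def minimize_quadratic(x0, lr=0.1, gtol=1e-8, max_iter=10_000):
    """Gradient descent on f(x) = (x - 3)^2, stopping when |grad| < gtol."""
    x = x0
    for i in range(max_iter):
        grad = 2.0 * (x - 3.0)
        if abs(grad) < gtol:
            break
        x -= lr * grad
    return x, i

x_tight, it_tight = minimize_quadratic(0.0, gtol=1e-10)
x_loose, it_loose = minimize_quadratic(0.0, gtol=1e-3)

print(it_loose < it_tight)             # True: fewer steps at loose tolerance
print(abs(x_tight - x_loose) < 1e-3)   # True: nearly the same optimum
```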

Changes:

  • New model constructor parameters to control training (passed through to the optimizer), plus Pydantic classes wrapping key Adam/LBFGS knobs
  • New option to use Adam/torch as the optimizer, with appropriate defaults
  • New batched GP model constructor. It is not used by default for now, but once tooling like the visualizer supports it, we should probably make it the default for large datasets.
  • Complete rework of the benchmarking scripts, moving them into resources for easier import. It is now possible to profile and benchmark snippets with either a fixed run count or a fixed time budget.
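The option-wrapping idea in the first bullet can be sketched as follows. A plain dataclass stands in for Pydantic to keep the sketch dependency-free, and the names `LBFGSConfig` and `to_scipy_options` are illustrative, not the PR's actual API; the fields mirror scipy.optimize L-BFGS-B options 1:1.

```python
from dataclasses import dataclass, asdict

@dataclass
class LBFGSConfig:
    """Illustrative wrapper around key scipy L-BFGS-B knobs.

    The PR uses Pydantic models; a dataclass is used here to keep the
    sketch dependency-free. Field names mirror scipy.optimize options.
    """
    maxiter: int = 100
    gtol: float = 1e-5
    ftol: float = 2.2e-9

    def to_scipy_options(self) -> dict:
        # Direct 1:1 translation into the `options` dict accepted by
        # scipy.optimize.minimize(..., method="L-BFGS-B").
        return asdict(self)

# Relaxed tolerances for operational (real-time) use:
fast = LBFGSConfig(maxiter=20, gtol=1e-2)
print(fast.to_scipy_options())  # {'maxiter': 20, 'gtol': 0.01, 'ftol': 2.2e-09}
```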

Caveats:

  • Batched model hyperparameters do not match the list model exactly, unless the data is identical on all outputs. This is an issue somewhere in botorch, since bare gpytorch behaves as expected.
  • The batched model is slower on small problems on GPU, and sometimes on CPU. Several variables determine the crossover point, such as the switch from Cholesky to CG-based solvers; as a rough guideline, it lies around n <= 100. For large problems there are significant gains on GPU, which makes batch modelling worthwhile.
  • Default behavior with cached hyperparameters has changed: the model is now trained in all cases, which fine-tunes the previously cached state. To disable this, set train_model=False.
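The changed caching behavior in the last caveat can be mimicked with a toy stand-in (`build_model`, `cached_state`, and the scalar "hyperparameter" below are invented for illustration; real Xopt models fine-tune GP hyperparameters, not a scalar):

```python
def build_model(cached_state=None, train_model=True, steps=5):
    """Toy stand-in for the new caching behavior: with cached
    hyperparameters the model is still trained by default, which
    fine-tunes the previous state; train_model=False freezes it."""
    theta = 0.0 if cached_state is None else cached_state
    if train_model:
        for _ in range(steps):          # pretend fine-tuning toward 1.0
            theta += 0.5 * (1.0 - theta)
    return theta

frozen = build_model(cached_state=0.8, train_model=False)
tuned = build_model(cached_state=0.8)   # new default: fine-tune from cache
print(frozen)         # 0.8 (cached state used as-is)
print(tuned > frozen) # True (cache was refined further)
```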

Benchmarks for n_vars=12, n_obj=5, n_constr=2, n=500
CPU:

| f | n | t_avg | t_med | t_max | t_min | t_tot | std |
|---|---:|---:|---:|---:|---:|---:|---:|
| bench_build_standard | 10 | 1.266 | 1.220 | 1.684 | 1.214 | 12.660 | 0.139 |
| bench_build_batched | 10 | 1.563 | 1.561 | 1.580 | 1.553 | 15.630 | 0.009 |
| bench_build_standard_adam | 10 | 2.689 | 2.607 | 3.407 | 2.572 | 26.892 | 0.241 |
| bench_build_batched_adam | 10 | 2.977 | 2.953 | 3.051 | 2.943 | 29.772 | 0.040 |
| bench_build_standard_gpytorch | 10 | 15.057 | 15.042 | 15.415 | 14.909 | 150.569 | 0.144 |
| bench_build_batched_gpytorch | 10 | 14.926 | 14.867 | 15.100 | 14.820 | 149.261 | 0.100 |

GPU (RTX 3070, H100 todo):

| f | n | t_avg | t_med | t_max | t_min | t_tot | std |
|---|---:|---:|---:|---:|---:|---:|---:|
| bench_build_standard | 10 | 0.924 | 0.858 | 1.493 | 0.844 | 9.238 | 0.190 |
| bench_build_batched | 10 | 0.726 | 0.716 | 0.826 | 0.666 | 7.258 | 0.041 |
| bench_build_standard_adam | 10 | 1.918 | 1.884 | 2.505 | 1.690 | 19.177 | 0.218 |
| bench_build_batched_adam | 10 | 1.148 | 1.136 | 1.314 | 1.063 | 11.475 | 0.087 |
| bench_build_standard_gpytorch | 10 | 7.773 | 7.762 | 7.890 | 7.706 | 77.725 | 0.061 |
| bench_build_batched_gpytorch | 10 | 5.884 | 5.861 | 6.171 | 5.746 | 58.838 | 0.116 |

To reproduce:
python bench_runner.py bench_build_standard bench_build_batched bench_build_standard_adam bench_build_batched_adam bench_build_standard_gpytorch bench_build_batched_gpytorch -n 10 -device cpu
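The fixed-run-count / fixed-time-budget modes can be sketched with a minimal timing loop (illustrative only, not the PR's bench_runner; the `bench` helper is invented, but it reports the same statistics columns as the tables above):

```python
import time
import statistics

def bench(fn, n_runs=None, time_budget=None):
    """Benchmark fn with either a fixed run count or a fixed
    wall-clock time budget, reporting summary statistics."""
    if n_runs is None and time_budget is None:
        raise ValueError("specify n_runs and/or time_budget")
    times = []
    start = time.perf_counter()
    while True:
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
        if n_runs is not None and len(times) >= n_runs:
            break
        if time_budget is not None and time.perf_counter() - start >= time_budget:
            break
    return {
        "n": len(times),
        "t_avg": statistics.mean(times),
        "t_med": statistics.median(times),
        "t_max": max(times),
        "t_min": min(times),
        "t_tot": sum(times),
        "std": statistics.stdev(times) if len(times) > 1 else 0.0,
    }

stats = bench(lambda: sum(range(10_000)), n_runs=10)
print(stats["n"])  # 10
```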

@roussel-ryan (Collaborator)

Looking good! LMK when this is ready for review and we can have a short discussion to go over it

@nikitakuklev nikitakuklev added the enhancement New feature or request label Oct 13, 2025
@codecov bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 89.80769% with 53 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---:|---|
| ...ests/generators/bayesian/test_model_constructor.py | 91.92% | 17 Missing and 9 partials ⚠️ |
| xopt/generators/bayesian/models/standard.py | 81.61% | 18 Missing and 7 partials ⚠️ |
| xopt/generators/bayesian/utils.py | 92.00% | 1 Missing and 1 partial ⚠️ |


@nikitakuklev (Collaborator, Author)

Some images for the record. Aiming to finish tomorrow.


@roussel-ryan do we have agreement on just using pydantic objects for LBFGS and other optimizers with direct 1:1 option translation?

@roussel-ryan (Collaborator) commented Feb 17, 2026

Hey Nikita, one thought on this: could we unify this work with the NumericalOptimizer classes provided by Xopt down the road? I think either the direction you propose here with pydantic objects or synchronizing with the NumericalOptimizer classes would work. Merging the two could happen in a future PR, since I think you want to get this in before the Xopt 3.0 release?

@nikitakuklev (Collaborator, Author)

Yes, I agree on moving towards a shared API/object. There are some issues with how botorch uses different LBFGS algorithms in the model vs. acqf parts, so the current NumericalOptimizer object will not work. This is why I focused on just wrapping scipy/pytorch options into Pydantic config models for now. Definitely a 3.0+ thing.

@roussel-ryan (Collaborator)

Sounds good to me. I'll review this in a few hours. Would you be able to handle updating the v3.0 branch with these changes once merged? I don't want to make a mistake fixing merge conflicts.

@nikitakuklev (Collaborator, Author) commented Feb 18, 2026

Yes, that is ok, I'll rebase.

@nikitakuklev nikitakuklev marked this pull request as ready for review February 18, 2026 19:55
@nikitakuklev (Collaborator, Author)

In fact, for 3.0 I'll integrate with #337 and merge the configs. In 2.x, let's avoid major save-breaking changes.

@roussel-ryan (Collaborator) left a review comment:

Looks pretty good to me. I've added a couple of asks for more documentation. I'm also wondering if it would be possible to add an example notebook comparing hyperparameter training / acquisition optimization under stronger vs. weaker convergence criteria, to show the trade-off between speed and precision?

Comment thread xopt/generators/bayesian/models/standard.py
Comment thread xopt/generators/bayesian/utils.py Outdated
@nikitakuklev (Collaborator, Author)

ready

@roussel-ryan roussel-ryan merged commit 0f4c138 into xopt-org:main Feb 19, 2026
11 checks passed
