
Detailed model training control and batch modelling#365

Merged
roussel-ryan merged 44 commits into xopt-org:main from nikitakuklev:model_training
Feb 19, 2026

Conversation

@nikitakuklev (Collaborator) commented Sep 29, 2025

This PR introduces batch GP models and finer control over model training. The former can be useful for scalarized objectives, while the latter is necessary to speed up BO in operational contexts. As recently demonstrated in a NAPAC25 talk, one can significantly relax fitting tolerances to meet real-time requirements without impacting convergence, especially with scalarized objectives. There is also a physical motivation: we cannot set the physical devices precisely enough for exact fitting to matter.
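The tolerance/speed trade-off can be illustrated with a toy gradient descent loop (a self-contained sketch, not Xopt code; the quadratic objective and the `minimize_quadratic` helper are invented for illustration): loosening the stopping tolerance cuts the iteration count sharply while landing at nearly the same optimum.

```python
# Toy illustration of the tolerance/speed trade-off: loosening the
# convergence tolerance of an optimizer cuts iterations while still
# landing near the optimum. Invented example, not Xopt code.

def minimize_quadratic(x0, lr=0.1, gtol=1e-8, max_iter=10_000):
    """Gradient descent on f(x) = (x - 3)^2, stopping when |grad| < gtol."""
    x = x0
    for i in range(max_iter):
        grad = 2.0 * (x - 3.0)
        if abs(grad) < gtol:
            break
        x -= lr * grad
    return x, i

x_tight, it_tight = minimize_quadratic(0.0, gtol=1e-10)
x_loose, it_loose = minimize_quadratic(0.0, gtol=1e-3)

print(it_loose < it_tight)             # True: fewer steps at loose tolerance
print(abs(x_tight - x_loose) < 1e-3)   # True: nearly the same optimum
```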

Changes:

  • New model constructor parameters to control training (passed through to the optimizer), plus Pydantic classes wrapping key Adam/LBFGS knobs
  • New option to use Adam/torch as the optimizer, with appropriate defaults
  • New batched GP model constructor. It is not used by default for now, but once tooling like the visualizer supports it, we should probably make it the default for large datasets.
  • Complete rework of the benchmarking scripts, moving them into resources for easier import. It is now possible to profile and benchmark snippets with either a fixed run count or a fixed time budget.
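The option-wrapping idea in the first bullet can be sketched as follows. A plain dataclass stands in for Pydantic to keep the sketch dependency-free, and the names `LBFGSConfig` and `to_scipy_options` are illustrative, not the PR's actual API; the fields mirror scipy.optimize L-BFGS-B options 1:1.

```python
from dataclasses import dataclass, asdict

@dataclass
class LBFGSConfig:
    """Illustrative wrapper around key scipy L-BFGS-B knobs.

    The PR uses Pydantic models; a dataclass is used here to keep the
    sketch dependency-free. Field names mirror scipy.optimize options.
    """
    maxiter: int = 100
    gtol: float = 1e-5
    ftol: float = 2.2e-9

    def to_scipy_options(self) -> dict:
        # Direct 1:1 translation into the `options` dict accepted by
        # scipy.optimize.minimize(..., method="L-BFGS-B").
        return asdict(self)

# Relaxed tolerances for operational (real-time) use:
fast = LBFGSConfig(maxiter=20, gtol=1e-2)
print(fast.to_scipy_options())  # {'maxiter': 20, 'gtol': 0.01, 'ftol': 2.2e-09}
```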

Caveats:

  • Batched model hyperparameters do not match the list model exactly, unless the data is identical on all outputs. This is an issue somewhere in botorch, since bare gpytorch behaves as expected.
  • The batched model is slower on small problems on GPU, and sometimes on CPU. Several variables determine the crossover point, such as the switch from Cholesky to CG-based solvers; as a rough guideline, it lies around n <= 100. For large problems there are significant gains on GPU, which makes batch modelling worthwhile.
  • Default behavior with cached hyperparameters has changed: the model is now trained in all cases, which fine-tunes the previously cached state. To disable this, set train_model=False.
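The changed caching behavior in the last caveat can be mimicked with a toy stand-in (`build_model`, `cached_state`, and the scalar "hyperparameter" below are invented for illustration; real Xopt models fine-tune GP hyperparameters, not a scalar):

```python
def build_model(cached_state=None, train_model=True, steps=5):
    """Toy stand-in for the new caching behavior: with cached
    hyperparameters the model is still trained by default, which
    fine-tunes the previous state; train_model=False freezes it."""
    theta = 0.0 if cached_state is None else cached_state
    if train_model:
        for _ in range(steps):          # pretend fine-tuning toward 1.0
            theta += 0.5 * (1.0 - theta)
    return theta

frozen = build_model(cached_state=0.8, train_model=False)
tuned = build_model(cached_state=0.8)   # new default: fine-tune from cache
print(frozen)         # 0.8 (cached state used as-is)
print(tuned > frozen) # True (cache was refined further)
```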

Benchmarks for n_vars=12, n_obj=5, n_constr=2, n=500
CPU:

| f | n | t_avg | t_med | t_max | t_min | t_tot | std |
|---|---:|---:|---:|---:|---:|---:|---:|
| bench_build_standard | 10 | 1.266 | 1.220 | 1.684 | 1.214 | 12.660 | 0.139 |
| bench_build_batched | 10 | 1.563 | 1.561 | 1.580 | 1.553 | 15.630 | 0.009 |
| bench_build_standard_adam | 10 | 2.689 | 2.607 | 3.407 | 2.572 | 26.892 | 0.241 |
| bench_build_batched_adam | 10 | 2.977 | 2.953 | 3.051 | 2.943 | 29.772 | 0.040 |
| bench_build_standard_gpytorch | 10 | 15.057 | 15.042 | 15.415 | 14.909 | 150.569 | 0.144 |
| bench_build_batched_gpytorch | 10 | 14.926 | 14.867 | 15.100 | 14.820 | 149.261 | 0.100 |

GPU (RTX 3070, H100 todo):

| f | n | t_avg | t_med | t_max | t_min | t_tot | std |
|---|---:|---:|---:|---:|---:|---:|---:|
| bench_build_standard | 10 | 0.924 | 0.858 | 1.493 | 0.844 | 9.238 | 0.190 |
| bench_build_batched | 10 | 0.726 | 0.716 | 0.826 | 0.666 | 7.258 | 0.041 |
| bench_build_standard_adam | 10 | 1.918 | 1.884 | 2.505 | 1.690 | 19.177 | 0.218 |
| bench_build_batched_adam | 10 | 1.148 | 1.136 | 1.314 | 1.063 | 11.475 | 0.087 |
| bench_build_standard_gpytorch | 10 | 7.773 | 7.762 | 7.890 | 7.706 | 77.725 | 0.061 |
| bench_build_batched_gpytorch | 10 | 5.884 | 5.861 | 6.171 | 5.746 | 58.838 | 0.116 |

To reproduce:
python bench_runner.py bench_build_standard bench_build_batched bench_build_standard_adam bench_build_batched_adam bench_build_standard_gpytorch bench_build_batched_gpytorch -n 10 -device cpu
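The fixed-run-count / fixed-time-budget modes can be sketched with a minimal timing loop (illustrative only, not the PR's bench_runner; the `bench` helper is invented, but it reports the same statistics columns as the tables above):

```python
import time
import statistics

def bench(fn, n_runs=None, time_budget=None):
    """Benchmark fn with either a fixed run count or a fixed
    wall-clock time budget, reporting summary statistics."""
    if n_runs is None and time_budget is None:
        raise ValueError("specify n_runs and/or time_budget")
    times = []
    start = time.perf_counter()
    while True:
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
        if n_runs is not None and len(times) >= n_runs:
            break
        if time_budget is not None and time.perf_counter() - start >= time_budget:
            break
    return {
        "n": len(times),
        "t_avg": statistics.mean(times),
        "t_med": statistics.median(times),
        "t_max": max(times),
        "t_min": min(times),
        "t_tot": sum(times),
        "std": statistics.stdev(times) if len(times) > 1 else 0.0,
    }

stats = bench(lambda: sum(range(10_000)), n_runs=10)
print(stats["n"])  # 10
```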

@roussel-ryan (Collaborator)

Looking good! LMK when this is ready for review and we can have a short discussion to go over it

@nikitakuklev nikitakuklev added the enhancement New feature or request label Oct 13, 2025
@codecov bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 89.80769% with 53 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---:|---|
| ...ests/generators/bayesian/test_model_constructor.py | 91.92% | 17 Missing and 9 partials ⚠️ |
| xopt/generators/bayesian/models/standard.py | 81.61% | 18 Missing and 7 partials ⚠️ |
| xopt/generators/bayesian/utils.py | 92.00% | 1 Missing and 1 partial ⚠️ |


@nikitakuklev (Collaborator, Author)

Some images for the record. Aiming to finish tomorrow.


@roussel-ryan do we have agreement on just using pydantic objects for LBFGS and other optimizers with direct 1:1 option translation?

@roussel-ryan (Collaborator) commented Feb 17, 2026

Hey Nikita, one thought on this: could we unify this work with the NumericalOptimizer classes provided by Xopt down the road? I think either the direction you propose here with pydantic objects or synchronizing with the NumericalOptimizer classes would work. Merging the two could happen in a future PR, since I think you want to get this in before the Xopt 3.0 release?

@nikitakuklev (Collaborator, Author)

Yes, I agree on moving towards a shared API/object. There are some issues with how botorch uses different LBFGS algorithms in the model vs. acqf parts, so the current NumericalOptimizer object will not work. This is why I focused on just wrapping scipy/pytorch options into Pydantic config models for now. Definitely a 3.0+ thing.

@roussel-ryan (Collaborator)

Sounds good to me. I'll review this in a few hours. Would you be able to handle updating the v3.0 branch with these changes once merged? I don't want to make a mistake fixing merge conflicts.

@nikitakuklev (Collaborator, Author) commented Feb 18, 2026

Yes, that is ok, I'll rebase.

@nikitakuklev nikitakuklev marked this pull request as ready for review February 18, 2026 19:55
@nikitakuklev (Collaborator, Author)

In fact, for 3.0 I'll integrate with #337 and merge the configs. In 2.x, let's avoid major save-breaking changes.

@roussel-ryan (Collaborator) left a review comment:

Looks pretty good to me. I've added a couple of asks for more documentation. I'm also wondering if it would be possible to add an example notebook comparing hyperparameter training / acquisition optimization under stronger vs. weaker convergence criteria, to show the trade-off between speed and precision?

Comment thread xopt/generators/bayesian/models/standard.py
Comment thread xopt/generators/bayesian/utils.py Outdated
@nikitakuklev (Collaborator, Author)

ready

@roussel-ryan roussel-ryan merged commit 0f4c138 into xopt-org:main Feb 19, 2026
11 checks passed
