SPS - Stochastic Polyak Step-size [paper]
Fast convergence with the SPS optimizer, the first efficient stochastic variant of the classical Polyak step-size for SGD.
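To make the idea concrete, here is a minimal pure-Python sketch of a capped stochastic Polyak step on a one-parameter least-squares problem. It is not the code of the `sps` package. The helper names (`sps_step_size`, `train`), the constants `c=0.5` and `gamma_max=10.0`, and the assumption that each per-sample optimal loss is zero are illustrative choices made for this sketch only.

```python
import random


def sps_step_size(loss, grad_sq_norm, c=0.5, gamma_max=10.0, loss_star=0.0, eps=1e-12):
    """Capped stochastic Polyak step: min((f_i(x) - f_i*) / (c * ||grad f_i(x)||^2), gamma_max).

    loss_star = 0.0 is an assumption: it holds when every sample can be fit exactly.
    """
    if grad_sq_norm <= eps:
        return 0.0  # gradient is (numerically) zero for this sample, so take no step
    return min((loss - loss_star) / (c * grad_sq_norm), gamma_max)


def train(data, w=0.0, epochs=20, seed=0):
    """Fit y = w * x with SGD, choosing each step size with sps_step_size."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            residual = w * x - y
            loss = 0.5 * residual ** 2   # per-sample loss f_i(w)
            grad = residual * x          # its derivative with respect to w
            gamma = sps_step_size(loss, grad * grad)
            w -= gamma * grad            # plain SGD update with the adaptive step
    return w


# Toy data generated by y = 3 * x, so w = 3 fits every sample exactly.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w_fit = train(data)
print(w_fit)
```

No learning rate is tuned here: the step size is computed from the current loss and gradient norm of each sample, and the cap `gamma_max` only bounds it from above.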
pip install git+https://github.com/IssamLaradji/sps.git
Use Sps in your code by adding the following script.
import torch
import sps

# model and loader are assumed to be defined elsewhere
opt = sps.Sps(model.parameters())

for epoch in range(100):
    for X, y in loader:
        # create loss closure
        def closure():
            loss = torch.nn.MSELoss()(model(X), y)
            loss.backward()
            return loss

        # update parameters
        opt.zero_grad()
        opt.step(closure=closure)

Run the experiments with:
python trainval.py -e kernel,
-sb [Directory where the experiments are saved]
-d [Directory where the datasets are saved]
-r [Flag for whether to save the experiments]
-v [Flag for visualizing the results in results/plots]
Example:
python trainval.py -e mnist -sb results -d data -r 1
@inproceedings{loizou2021stochastic,
title={Stochastic polyak step-size for sgd: An adaptive learning rate for fast convergence},
author={Loizou, Nicolas and Vaswani, Sharan and Laradji, Issam Hadj and Lacoste-Julien, Simon},
booktitle={International conference on artificial intelligence and statistics},
pages={1306--1314},
year={2021},
organization={PMLR}
}
This is a collaborative work between labs at MILA, Element AI, and UBC.
Check out these other line search optimizers: [sls], [AdaSls]
- Thank you to Less Wright for incorporating the gradient centralization method; it seems to improve the results in some experiments.
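For readers unfamiliar with gradient centralization, the sketch below shows the basic operation in plain Python under the usual formulation: the gradient of a 2-D weight matrix is shifted so that each row has zero mean. How the method is wired into this repository is not shown here; `centralize_gradient` is a hypothetical helper, and treating rows as output units is an assumption of the sketch.

```python
def centralize_gradient(grad):
    """Subtract from each row of a 2-D gradient its own mean, so every row sums to zero."""
    centered = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered


grad = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
centered = centralize_gradient(grad)
print(centered)  # [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

The centralized gradient would then be passed to the optimizer update in place of the raw gradient.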