Learning rate

Training a Deep Neural Network (DNN) is a difficult global optimization problem. Learning Rate (LR) is a crucial hyper-parameter to tune when training DNNs. A very small learning rate can lead to very slow training, while a very large learning rate can hinder convergence as the loss function fluctuates around the minimum, or even diverges. Details and great visualizations are presented here.

Smith discovered a new method for setting learning rate, named Cyclical Learning Rates (CLRs). Instead of using a fixed, or a decreasing learning rate, the CLR method allows learning rate to continuously oscillate between reasonable minimum and maximum bounds.

One CLR cycle consists of two steps; one in which the learning rate increases and one in which it decreases. Each step has a size (called stepsize), which is the number of iterations (e.g. 1k, 5k, etc) where learning rate increases or decreases. Two steps form a cycle.

https://twitter.com/radekosmulski/status/1574413519235084289

The weight decay can be set when using One Cycle Policy, as described and shown here.