Gradient descent

tags
Optimization, Algorithm

Fixed learning rate

The simplest way to apply the gradient descent algorithm to a function \(g\) that is convex and \(L\)-smooth on \(\mathbb{R}^d\) is to use a fixed learning rate \(\gamma > 0\) (typically \(\gamma \leq 1/L\)) in the parameter update:

\[ \theta_t = \theta_{t-1} - \gamma \nabla g(\theta_{t-1}) \]
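
As a quick illustration, below is a minimal sketch of this fixed-step update in Python; the gradient function `grad_g` and the values of `gamma` and `n_steps` are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def gradient_descent(grad_g, theta0, gamma=0.1, n_steps=100):
    """Fixed-step gradient descent: theta_t = theta_{t-1} - gamma * grad g(theta_{t-1})."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - gamma * grad_g(theta)
    return theta

# Example: g(theta) = ||theta||^2 / 2, whose gradient is theta itself.
theta_star = gradient_descent(lambda th: th, theta0=np.ones(3), gamma=0.1, n_steps=200)
```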

Another common way to apply the gradient descent algorithm, with a non-fixed learning rate, is to use a line search at each step. This corresponds to searching for the best \(\gamma\) to use in the update \(\theta + \gamma \Delta\theta\) (in gradient descent, \(\Delta\theta = -\nabla g(\theta)\)).

For exact line search, we take \(\gamma = \arg\min_{s\geq 0} g(\theta + s \Delta \theta)\). This method is used when the cost of this one-dimensional minimization is low compared to the cost of computing the search direction \(\Delta\theta\).
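
One way to sketch exact line search is to solve this one-dimensional problem numerically; here `scipy.optimize.minimize_scalar` plays that role, and the search interval `(0, 10)` is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def exact_line_search_step(g, grad_g, theta):
    """One gradient-descent step where gamma is chosen by (approximate) exact line search."""
    delta = -grad_g(theta)  # search direction
    # Minimize s -> g(theta + s * delta) over s >= 0 (upper bound chosen arbitrarily).
    res = minimize_scalar(lambda s: g(theta + s * delta), bounds=(0.0, 10.0), method="bounded")
    gamma = res.x
    return theta + gamma * delta

# Example on g(theta) = ||theta||^2 / 2 (gradient: theta); one step lands near the minimum.
theta_next = exact_line_search_step(lambda th: 0.5 * th @ th, lambda th: th, np.ones(3))
```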

A standard algorithm for inexact line search is backtracking line search. It depends on two parameters \(0 < \alpha < 0.5\) and \(0 < \beta < 1\). Starting from \(t = 1\), the update \(t \leftarrow \beta t\) is repeated as long as the following condition holds:

\[ g(\theta + t \Delta \theta) > g(\theta) + \alpha t \nabla g(\theta)^T \Delta\theta \]

The first \(t\) for which this inequality fails (i.e. the sufficient-decrease condition is satisfied) is used as the learning rate for the step.
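
A minimal sketch of this backtracking procedure, assuming `g` and `grad_g` are provided; the defaults `alpha=0.3` and `beta=0.8` are illustrative choices within the stated ranges.

```python
import numpy as np

def backtracking_line_search(g, grad_g, theta, alpha=0.3, beta=0.8):
    """Shrink t until the sufficient-decrease condition is satisfied, then return it."""
    delta = -grad_g(theta)         # gradient-descent search direction
    slope = grad_g(theta) @ delta  # equals -||grad g(theta)||^2
    t = 1.0
    # Keep shrinking t while g(theta + t*delta) > g(theta) + alpha * t * slope.
    while g(theta + t * delta) > g(theta) + alpha * t * slope:
        t = beta * t
    return t

# Example: g(theta) = ||theta||^2 / 2 with gradient theta.
theta = 2.0 * np.ones(3)
t = backtracking_line_search(lambda th: 0.5 * th @ th, lambda th: th, theta)
theta = theta - t * theta          # one gradient step with the chosen learning rate
```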
