Tags: Optimization, Algorithm

## Fixed learning rate

The simplest way to apply the gradient descent algorithm to a function $$g$$ that is convex and $$L$$-smooth on $$\mathbb{R}^d$$ is to use the parameter update with a fixed learning rate $$\gamma$$:

$\theta_t = \theta_{t-1} - \gamma \nabla g(\theta_{t-1})$

A standard choice is $$\gamma = 1/L$$, which guarantees convergence for convex $$L$$-smooth functions.
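As a minimal sketch of this update rule in Python (the name `grad_g` and the quadratic example are illustrative, not from these notes):

```python
import numpy as np

def gradient_descent(grad_g, theta0, gamma, n_steps=1000):
    """Fixed-learning-rate gradient descent: theta <- theta - gamma * grad_g(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - gamma * grad_g(theta)
    return theta

# g(theta) = ||theta||^2 / 2 is convex and 1-smooth, so gamma = 1/L = 1 works.
theta_star = gradient_descent(lambda th: th, theta0=np.ones(3), gamma=1.0)
```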

## Line search

Another common way to apply the gradient descent algorithm, this time with a non-fixed learning rate, is to perform a line search at each step. This corresponds to searching for the best $$\gamma$$ to use in the update $$\theta + \gamma \Delta\theta$$ (in gradient descent, $$\Delta\theta = -\nabla g(\theta)$$).

For exact line search, we take $$\gamma = \arg\min_{s\geq 0} g(\theta + s \Delta \theta)$$. This method is used when the cost of solving this one-dimensional minimization is low compared to the cost of computing the search direction $$\Delta\theta$$ itself.
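A minimal sketch of one exact-line-search step, assuming SciPy is available; the cap `s_max` on the search interval and the function names are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def exact_line_search_step(g, grad_g, theta, s_max=10.0):
    """One gradient-descent step with gamma = argmin of g(theta + s * delta) over [0, s_max]."""
    delta = -grad_g(theta)  # search direction
    result = minimize_scalar(lambda s: g(theta + s * delta),
                             bounds=(0.0, s_max), method="bounded")
    return theta + result.x * delta
```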

Backtracking line search is a common algorithm for inexact line search. It depends on two parameters $$0 < \alpha < 0.5$$ and $$0 < \beta < 1$$. Starting from $$\gamma = 1$$, it repeats the update $$\gamma \leftarrow \beta \gamma$$ as long as the following condition holds:

$g(\theta + \gamma \Delta \theta) > g(\theta) + \alpha \gamma \nabla g(\theta)^T \Delta\theta$

The first $$\gamma$$ that violates this condition (i.e., that achieves a sufficient decrease of $$g$$) is used as the learning rate for the step.
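A minimal sketch of this backtracking scheme (the defaults $$\alpha = 0.3$$ and $$\beta = 0.8$$ are illustrative values within the allowed ranges):

```python
import numpy as np

def backtracking_step(g, grad_g, theta, alpha=0.3, beta=0.8):
    """Shrink gamma by beta until the sufficient-decrease condition holds, then step."""
    grad = grad_g(theta)
    delta = -grad                # gradient-descent direction
    slope = grad @ delta         # nabla g(theta)^T delta = -||grad||^2
    gamma = 1.0
    while g(theta + gamma * delta) > g(theta) + alpha * gamma * slope:
        gamma *= beta
    return theta + gamma * delta
```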
