- tags
- Optimization, Algorithm

## Fixed learning rate

The simplest way to apply the gradient descent algorithm to a function \(g\) that is convex and \(L\)-smooth on \(\mathbb{R}^d\) is to use a fixed learning rate \(\gamma\) (a standard choice is \(\gamma = 1/L\)) in the parameter update:

\[ \theta_t = \theta_{t-1} - \gamma g'(\theta_{t-1}) \]
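A minimal sketch of this update in Python (the quadratic objective \(g(\theta) = \|\theta\|^2\) is an illustrative choice, not from the note):

```python
import numpy as np

def gradient_descent(grad, theta0, gamma, n_steps):
    """Fixed-step gradient descent: theta_t = theta_{t-1} - gamma * grad(theta_{t-1})."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - gamma * grad(theta)
    return theta

# g(theta) = ||theta||^2 is convex and L-smooth with L = 2,
# so the fixed step gamma = 1/L = 0.5 converges.
theta_star = gradient_descent(lambda th: 2 * th, theta0=[3.0, -4.0],
                              gamma=0.5, n_steps=50)
```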

## Line search

Another common way to apply the gradient descent algorithm, with a non-fixed learning rate, is to use a line search at each step. This corresponds to searching for the best \(\gamma\) to use in the update \(\theta + \gamma \Delta\theta\) (in gradient descent, \(\Delta\theta = -g'(\theta)\)).

### Exact line search

For exact line search, we take \(\gamma = \arg\min_{s \geq 0} g(\theta + s \Delta\theta)\). This method is used when the cost of solving this one-dimensional minimization is low compared to that of computing the search direction \(\Delta\theta\).
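One case where the one-dimensional problem is cheap is a quadratic objective, where the exact step has a closed form. A sketch under that assumption (the quadratic \(g(\theta) = \tfrac{1}{2}\theta^T A \theta - b^T \theta\) and the names below are illustrative):

```python
import numpy as np

def exact_line_search_step(A, b, theta):
    """One gradient-descent step with exact line search on the quadratic
    g(theta) = 0.5 * theta^T A theta - b^T theta, A symmetric positive definite.
    With r = grad g(theta) = A @ theta - b and direction d = -r, the 1-D problem
    min_{s >= 0} g(theta + s*d) has the closed-form solution
    s* = (r^T r) / (r^T A r)."""
    r = A @ theta - b
    s = (r @ r) / (r @ A @ r)
    return theta - s * r

A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
g = lambda th: 0.5 * th @ A @ th - b @ th
theta1 = exact_line_search_step(A, b, np.zeros(2))
```

For a general \(g\), the exact step instead requires a numerical 1-D minimization, which is why the inexact variant below is more common.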

### Backtracking line search

This is an algorithm for inexact line search. It depends on two parameters \(0 < \alpha < 0.5\) and \(0 < \beta < 1\). Starting from \(t = 1\), it repeats the update \(t \leftarrow \beta t\) as long as the following condition holds

\[ g(\theta + t \Delta \theta) > g(\theta) + \alpha t \nabla g(\theta)^T \Delta\theta \]

and accepts the current \(t\) as soon as it is violated.
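The loop above can be sketched directly in Python (the test objective \(g(\theta) = \|\theta\|^2\) and the default \(\alpha, \beta\) values are illustrative assumptions):

```python
import numpy as np

def backtracking(g, grad_g, theta, delta, alpha=0.3, beta=0.8):
    """Backtracking line search: start at t = 1 and shrink t <- beta * t
    while the sufficient-decrease condition
        g(theta + t*delta) <= g(theta) + alpha * t * grad_g(theta)^T delta
    fails; return the accepted step size t."""
    t = 1.0
    slope = grad_g(theta) @ delta  # negative when delta is a descent direction
    while g(theta + t * delta) > g(theta) + alpha * t * slope:
        t *= beta
    return t

g = lambda th: th @ th            # illustrative objective g(theta) = ||theta||^2
grad_g = lambda th: 2 * th
theta = np.array([2.0, 0.0])
delta = -grad_g(theta)            # gradient descent direction
t = backtracking(g, grad_g, theta, delta)
```

The loop always terminates for a descent direction, since the condition is violated for small enough \(t\).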

## Backlinks

- Genetic algorithms
- Supervised learning
- Notes on: Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data by Bender, E. M., & Koller, A. (2020)
- Notes on: Network Deconvolution by Ye, C., Evanusa, M., He, H., Mitrokhin, A., Goldstein, T., Yorke, J. A., Fermuller, Cornelia, … (2020)