# Neural tangent kernel

tags
Neural networks

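For a neural network with output $y(w)$ trained to minimize a quadratic loss, e.g. $L(w) = \frac{1}{2} \| y(w) - \bar{y} \|^2$ with targets $\bar{y}$, the chain rule gives $\nabla L(w) = \nabla y(w) (y(w) - \bar{y})$. The gradient flow $\dot{w} = - \nabla L (w(t))$ can therefore be rewritten as $\dot{w} = - \nabla y(w) (y(w) - \bar{y})$.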

Therefore, the time derivative of $y$ is $\dot{y}(w) = \nabla y(w)^T \dot{w} = - \nabla y(w)^T \nabla y(w) (y(w) - \bar{y})$. The NTK is the matrix that multiplies the residual $(y(w) - \bar{y})$ in this expression: $\nabla y(w)^T \nabla y(w)$.
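As a rough numerical illustration of this identity, the sketch below builds the kernel for a tiny hypothetical two-parameter model $y_i(w) = w_1 \tanh(w_0 x_i)$. The model, the inputs `xs`, the targets, and all function names are assumptions made for the example, not part of the note. The code stores the Jacobian `J` with one row per output, so `J` corresponds to $\nabla y(w)^T$ and the kernel is `J J^T`. It then compares $-K (y - \bar{y})$ with a finite-difference estimate of $\dot{y}$ obtained from one small Euler step of the gradient flow.

```python
import math

def model(w, xs):
    # Hypothetical toy model: y_i(w) = w[1] * tanh(w[0] * x_i)
    return [w[1] * math.tanh(w[0] * x) for x in xs]

def jacobian(w, xs):
    # J[i][k] = d y_i / d w_k (rows: outputs, columns: parameters)
    rows = []
    for x in xs:
        t = math.tanh(w[0] * x)
        rows.append([w[1] * x * (1.0 - t * t), t])
    return rows

def ntk(w, xs):
    # Empirical kernel K[i][j] = sum_k J[i][k] * J[j][k]
    J = jacobian(w, xs)
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in J] for ri in J]

# Illustrative data and weights (assumed values)
xs = [-1.0, 0.5, 2.0]
targets = [0.3, -0.2, 1.0]
w = [0.7, -1.2]

y = model(w, xs)
residual = [yi - ti for yi, ti in zip(y, targets)]
K = ntk(w, xs)
n = len(xs)

# Kernel form of the dynamics: y_dot = -K (y - y_bar)
y_dot_kernel = [-sum(K[i][j] * residual[j] for j in range(n)) for i in range(n)]

# Same quantity from one small Euler step of w_dot = -grad L(w),
# with L(w) = 0.5 * sum_i (y_i - y_bar_i)^2
J = jacobian(w, xs)
grad_L = [sum(J[i][k] * residual[i] for i in range(n)) for k in range(len(w))]
dt = 1e-6
w_next = [wk - dt * gk for wk, gk in zip(w, grad_L)]
y_dot_numeric = [(a - b) / dt for a, b in zip(model(w_next, xs), y)]

max_gap = max(abs(a - b) for a, b in zip(y_dot_kernel, y_dot_numeric))
```

Under these assumptions `max_gap` should be small, limited only by the finite step `dt`. The kernel here is $3 \times 3$ (one row and column per data point) regardless of how many parameters the model has.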

