- tags
  - Neural networks
For a neural network trained to minimize a quadratic loss \(L(w) = \frac{1}{2} \lVert y(w) - \bar{y} \rVert^2\), where \(y(w)\) is the vector of network outputs and \(\bar{y}\) the targets, the chain rule gives \(\nabla L(w) = \nabla y(w) \, (y(w) - \bar{y})\), so the gradient flow can be rewritten from \[ \dot{w} = - \nabla L(w(t)) \] to \[ \dot{w} = - \nabla y(w) \, (y(w) - \bar{y}) \]
Therefore, the time derivative of \(y\) is \[ \dot{y} = \nabla y(w)^T \dot{w} = - \nabla y(w)^T \nabla y(w) \, (y(w) - \bar{y}) \] The neural tangent kernel (NTK) is the matrix multiplying the residual \((y(w) - \bar{y})\): \(\nabla y(w)^T \nabla y(w)\).
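This is easy to check numerically. Below is a minimal JAX sketch, assuming a small two-layer tanh network and random data (both illustrative, not from the note): it stacks the per-example parameter gradients into a Jacobian \(J\) and forms the empirical NTK as \(J J^T\), which equals \(\nabla y(w)^T \nabla y(w)\) under the column convention used above.

```python
import jax
import jax.numpy as jnp

def init_params(key, d_in=3, d_hidden=16):
    # Hypothetical two-layer MLP; sizes are arbitrary choices for the demo.
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (d_hidden, d_in)) / jnp.sqrt(d_in),
        "W2": jax.random.normal(k2, (1, d_hidden)) / jnp.sqrt(d_hidden),
    }

def f(params, x):
    # Scalar output for one input; y(w) stacks these over the dataset.
    h = jnp.tanh(params["W1"] @ x)
    return (params["W2"] @ h)[0]

def empirical_ntk(params, xs):
    # Per-example gradients of the output w.r.t. all parameters (one pytree per example).
    grads = jax.vmap(lambda x: jax.grad(f)(params, x))(xs)
    # Flatten each example's gradients into one row of the Jacobian J.
    J = jnp.concatenate(
        [g.reshape(xs.shape[0], -1) for g in jax.tree_util.tree_leaves(grads)],
        axis=1,
    )
    # NTK entry (i, j) is the inner product of the gradients for examples i and j.
    return J @ J.T

key = jax.random.PRNGKey(0)
xs = jax.random.normal(key, (5, 3))               # 5 random inputs of dimension 3
print(empirical_ntk(init_params(key), xs).shape)  # (5, 5) kernel matrix
```

Using `vmap` over `grad` keeps memory proportional to the batch size; `jax.jacrev` applied to a batched forward pass would produce the same Jacobian in one call.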