Distillation is used to describe the process of transferring performances from a large trained teacher neural network to a untrained student network.
Instead of training the target network to score best according the task’s loss function, distillation optimizes for the target network to match the output distribution or neuron activation patterns of the teacher network.
A review: (Beyer et al. 2021).
- Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov. . "Knowledge Distillation: A Good Teacher Is Patient and Consistent". Arxiv:2106.05237 [cs]. http://arxiv.org/abs/2106.05237.