Machine learning, Neural networks, Transfer learning

Distillation is the process of transferring the performance of a large, trained teacher neural network to an untrained student network.

Instead of training the student network to score best according to the task’s loss function, distillation trains it to match the output distribution or neuron activation patterns of the teacher network.
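
As a rough illustration of the output-matching variant, the sketch below computes a KL divergence between the teacher's and the student's softened output distributions. The choice of KL divergence, the `temperature` parameter, and the function names `log_softmax` and `distillation_loss` are assumptions made for this example; the note itself does not prescribe a specific loss.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of a softmax over logits divided by temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    Assumption for this sketch: the teacher's outputs are the only training
    signal, so no ground-truth labels appear in the loss.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (fixed target)
    log_q = log_softmax(student_logits, temperature)  # student (being trained)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
matching_student = [4.0, 1.0, 0.5]
untrained_student = [0.0, 0.0, 0.0]

print(distillation_loss(matching_student, teacher))   # zero: distributions agree
print(distillation_loss(untrained_student, teacher))  # positive: student is off
```

In an actual training loop this quantity would be minimized with respect to the student's parameters while the teacher stays frozen. Matching intermediate activations instead would swap the KL term for a distance between hidden-layer features.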

A review: (Beyer et al. 2021).


  1. Beyer et al. 2021. "Knowledge Distillation: A Good Teacher Is Patient and Consistent". arXiv:2106.05237 [cs]. http://arxiv.org/abs/2106.05237.