The Lottery Ticket Hypothesis

tags: Neural network training
resources: The AI podcast
papers: (Frankle, Carbin 2018)

When training very large neural networks, the resulting network often contains many connections that contribute little to its output. Neural network pruning can remove a large share of these connections, making the overall architecture lighter and faster to run on some hardware.
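As an illustration of what pruning can look like, here is a minimal sketch of magnitude pruning, one common criterion: the connections with the smallest absolute weight are masked out. The function names `magnitude_mask` and `apply_mask` and the example weights are made up for this note, and real pruning methods may use other criteria.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of
    smallest-magnitude entries of a 2D weight matrix (list of rows)."""
    n_rows, n_cols = len(weights), len(weights[0])
    # Flat indices sorted by absolute weight, smallest first.
    order = sorted(
        range(n_rows * n_cols),
        key=lambda i: abs(weights[i // n_cols][i % n_cols]),
    )
    n_prune = int(sparsity * n_rows * n_cols)
    mask = [[1] * n_cols for _ in range(n_rows)]
    for i in order[:n_prune]:
        mask[i // n_cols][i % n_cols] = 0
    return mask


def apply_mask(weights, mask):
    """Zero out the pruned connections."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


# Hypothetical 2x3 layer: prune the half of the weights closest to zero.
weights = [[0.9, -0.05, 0.3],
           [0.01, -0.7, 0.2]]
mask = magnitude_mask(weights, sparsity=0.5)
pruned = apply_mask(weights, mask)
```

The mask is kept separate from the weights so that the same sparsity pattern can later be applied to a different set of weight values.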

However, once you have the pruned architecture, it will often fail to learn anything interesting when it is trained from scratch with a fresh random initialization. The lottery ticket hypothesis is about why some of the randomly initialized connections became more important than others. The random initialization itself may be what makes a connection useful after training, and the lottery ticket hypothesis is about finding this “magic” initialization so that it can be used on small networks.

From (Frankle, Carbin 2018):

The Lottery Ticket Hypothesis. A randomly-initialized, dense neural network contains a subnetwork that is initialized such that — when trained in isolation — it can match the test accuracy of the original network after training for at most the same number of iterations.

It turns out that restarting the training of the small pruned network from the exact initialization the original network started from achieves excellent results. However, the hypothesis in this form holds mostly for small neural networks. In larger networks, the trainable pruned network is usually found early in training rather than at initialization.
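The train, prune, reset, retrain loop can be sketched in one shot on a toy problem. This is only an illustration of the procedure, not the paper's experimental setup: the model is a linear regression (a convex problem, so the initialization does not actually matter here), and the data, learning rate, and helper names `train` and `mse` are assumptions made for this note.

```python
import random


def train(w_start, mask, data, lr=0.3, epochs=300):
    """Full-batch gradient descent on mean squared error for a linear
    model. Masked-out weights are held at zero throughout training."""
    w = [wi * mi for wi, mi in zip(w_start, mask)]
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / n
        w = [(wi - lr * gi) * mi for wi, gi, mi in zip(w, grad, mask)]
    return w


def mse(w, data):
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in data) / len(data)


random.seed(0)
n_features = 8
true_w = [2.0, -3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # only two inputs matter
data = []
for _ in range(50):
    x = [random.uniform(-1, 1) for _ in range(n_features)]
    data.append((x, sum(t * xi for t, xi in zip(true_w, x))))

# 1. Randomly initialize the dense model and remember the initial weights.
w_init = [random.gauss(0, 0.1) for _ in range(n_features)]

# 2. Train the dense model.
w_dense = train(w_init, [1] * n_features, data)

# 3. Prune the 75% of weights with the smallest trained magnitude.
order = sorted(range(n_features), key=lambda j: abs(w_dense[j]))
ticket_mask = [1] * n_features
for j in order[: int(0.75 * n_features)]:
    ticket_mask[j] = 0

# 4. Reset the surviving weights to their ORIGINAL initial values
#    and retrain only the sparse subnetwork.
w_ticket = train(w_init, ticket_mask, data)

print("kept weights:", ticket_mask)
print("dense MSE:", mse(w_dense, data), "ticket MSE:", mse(w_ticket, data))
```

In a real experiment the interesting comparison is step 4 against retraining the same mask from a new random initialization. The paper also describes an iterative variant that repeats steps 2 to 4 while pruning a smaller fraction each round.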

Bibliography

Jonathan Frankle, Michael Carbin. March 9, 2018. "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". arXiv:1803.03635 [cs]. http://arxiv.org/abs/1803.03635.
