- Machine learning
Continual learning is a type of supervised learning where there is no separate “testing phase” associated with the decision process. Instead, training samples keep arriving, and the algorithm must simultaneously make predictions and keep learning.
This is challenging for a fixed neural network architecture: with fixed capacity, it is bound either to forget old knowledge or to be unable to learn anything new.
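The protocol above can be sketched as a predict-then-train loop: the model must answer on each incoming sample before it is allowed to update on it. This is a minimal illustration with an assumed setup (a logistic linear model on a synthetic stream whose labeling rule drifts halfway through), not any specific benchmark.

```python
# Minimal sketch of the continual setting: no test phase; the model predicts
# each sample *before* training on it. All names and distributions here are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(2)  # linear model weights

def stream(n_steps):
    """Synthetic stream whose labeling rule flips halfway through (drift)."""
    for t in range(n_steps):
        x = rng.normal(size=2)
        true_w = np.array([1.0, -1.0]) if t < n_steps // 2 else np.array([-1.0, 1.0])
        y = 1.0 if x @ true_w > 0 else 0.0
        yield x, y

correct = 0
for x, y in stream(1000):
    pred = 1.0 if x @ w > 0 else 0.0      # predict first...
    correct += pred == y
    p = 1 / (1 + np.exp(-x @ w))          # ...then take a logistic-loss SGD step
    w -= 0.1 * (p - y) * x
print(f"online accuracy: {correct / 1000:.2f}")
```

Because the labeling rule flips mid-stream, the model pays an error cost right after the drift and must relearn online, which is exactly the tension the notes above describe.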
A definition from the survey (De Lange et al. 2020):
The General Continual Learning setting considers an infinite stream of training data where at each time step, the system receives a (number of) new sample(s) drawn non i.i.d. from a current distribution that could itself experience sudden or gradual changes.
Examples of continual learning systems
- Never Ending Language Learner (NELL) (Carlson, Betteridge, and Kisiel 2010)
Computer vision-based benchmarks
Split MNIST: the MNIST dataset is split into 5 two-class tasks.
Split CIFAR10: the CIFAR10 dataset is split into 5 two-class tasks.
Split mini-ImageNet: a mini-ImageNet dataset (100 classes) split into 20 five-class tasks.
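All three split benchmarks follow the same recipe: partition a K-class dataset's labels into consecutive groups and treat each group as one classification task. A minimal sketch, with stand-in random arrays in place of the real MNIST/CIFAR images:

```python
# Sketch of "split"-style task construction: group consecutive class labels,
# one task per group. The data here is fake; real benchmarks load MNIST etc.
import numpy as np

def make_split_tasks(X, y, classes_per_task):
    """Yield (X_t, y_t) subsets, one per group of consecutive classes."""
    classes = np.unique(y)
    for i in range(0, len(classes), classes_per_task):
        group = classes[i:i + classes_per_task]
        mask = np.isin(y, group)
        yield X[mask], y[mask]

# Toy example: 1000 fake 28x28 "images" over 10 classes -> 5 two-class tasks,
# mirroring Split MNIST.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 28, 28))
y = rng.integers(0, 10, size=1000)
tasks = list(make_split_tasks(X, y, classes_per_task=2))
```

With `classes_per_task=5` on a 100-class dataset, the same function produces the 20 tasks of Split mini-ImageNet.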
Continual Transfer Learning Benchmark: a benchmark from Facebook AI, built from 7 computer vision datasets: MNIST, CIFAR10, CIFAR100, DTD, SVHN, Rainbow-MNIST, Fashion MNIST. The tasks are all 5-class or 10-class classification tasks. Some example task sequence constructions from (Veniat, Denoyer, and Ranzato 2021):
The last task of \(S_{out}\) consists of a shuffling of the output labels of the first task. The last task of \(S_{in}\) is the same as its first task, except that the MNIST images have a different background color. \(S_{long}\) has 100 tasks; it is constructed by first sampling a dataset, then 5 classes at random, and finally the amount of training data, from a distribution that favors small tasks toward the end of the learning experience.
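The \(S_{long}\) recipe above can be sketched as nested sampling. This is a hedged illustration of the textual description only: the class counts per dataset are standard, but the size distributions and the small-task schedule are assumptions, not the ones used by Veniat et al.

```python
# Illustrative S_long-style sequence: for each of 100 tasks, sample a dataset,
# then 5 classes, then a training-set size that is more likely to be small
# for later tasks. Distributions below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
DATASETS = ["MNIST", "CIFAR10", "CIFAR100", "DTD", "SVHN",
            "Rainbow-MNIST", "FashionMNIST"]
N_CLASSES = {"MNIST": 10, "CIFAR10": 10, "CIFAR100": 100, "DTD": 47,
             "SVHN": 10, "Rainbow-MNIST": 10, "FashionMNIST": 10}

def sample_task(t, n_tasks=100):
    ds = rng.choice(DATASETS)
    classes = rng.choice(N_CLASSES[ds], size=5, replace=False)
    # Assumed size schedule: probability of a small task grows with t.
    p_small = t / n_tasks
    if rng.random() < p_small:
        n_train = int(rng.integers(20, 100))      # small task
    else:
        n_train = int(rng.integers(500, 5000))    # regular task
    return ds, sorted(classes.tolist()), n_train

sequence = [sample_task(t) for t in range(100)]
```

The point of the skewed size schedule is that late, data-poor tasks reward transfer from earlier related tasks, which is what the benchmark measures.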
Permuted MNIST: for each task, the pixels of the MNIST digits are shuffled by a fixed random permutation, generating a new task of the same difficulty as the original one but with a different solution. This benchmark is not suitable for models with a spatial prior (such as a CNN), since the permutation destroys spatial structure. First used in (Kirkpatrick et al. 2017).
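Generating such tasks is just sampling one pixel permutation per task and applying it to every image. A minimal sketch with a stand-in array instead of real MNIST:

```python
# Sketch of Permuted MNIST task generation: one fixed random pixel
# permutation per task, shared by every image in that task. Fake data
# stands in for MNIST.
import numpy as np

def make_permuted_tasks(X, n_tasks, seed=0):
    """Return a list of (permutation, permuted-images) pairs."""
    rng = np.random.default_rng(seed)
    n, h, w = X.shape
    flat = X.reshape(n, h * w)
    tasks = []
    for _ in range(n_tasks):
        perm = rng.permutation(h * w)  # fixed for the whole task
        tasks.append((perm, flat[:, perm].reshape(n, h, w)))
    return tasks

X = np.arange(2 * 28 * 28, dtype=float).reshape(2, 28, 28)
tasks = make_permuted_tasks(X, n_tasks=3)
```

Since each task only shuffles pixel positions, per-image pixel statistics are preserved, which is why the tasks have equal difficulty for models without a spatial prior.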
Rotated MNIST: each task contains digits rotated by a fixed angle between 0 and 180 degrees.
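Rotated MNIST works the same way with a rotation instead of a permutation. For simplicity the sketch below uses `np.rot90`, so it only covers angles that are multiples of 90 degrees; real implementations use an image library to rotate by arbitrary angles, and the "digit" here is a stand-in array.

```python
# Sketch of Rotated MNIST-style task generation: one fixed rotation per task.
# np.rot90 restricts this illustration to multiples of 90 degrees.
import numpy as np

def make_rotated_tasks(X, quarter_turns):
    """Return one copy of the dataset per rotation (k quarter turns)."""
    return [np.rot90(X, k=k, axes=(1, 2)) for k in quarter_turns]

X = np.zeros((4, 28, 28))
X[:, 10:18, 13:15] = 1.0  # a crude vertical bar as a stand-in "digit"
tasks = make_rotated_tasks(X, quarter_turns=[0, 1, 2])  # 0, 90, 180 degrees
```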
- Carlson, Andrew, Justin Betteridge, and Bryan Kisiel. 2010. “Toward an Architecture for Never-Ending Language Learning.” In Proceedings of the Conference on Artificial Intelligence (AAAI), 1306–13.
- De Lange, Matthias, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory Slabaugh, and Tinne Tuytelaars. May 26, 2020. “A Continual Learning Survey: Defying Forgetting in Classification Tasks”. arXiv:1909.08383 [Cs, Stat]. http://arxiv.org/abs/1909.08383.
- Kirkpatrick, James, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, et al. January 25, 2017. “Overcoming Catastrophic Forgetting in Neural Networks”. arXiv:1612.00796 [Cs, Stat]. http://arxiv.org/abs/1612.00796.
- Veniat, Tom, Ludovic Denoyer, and Marc’Aurelio Ranzato. February 12, 2021. “Efficient Continual Learning with Modular Networks and Task-Driven Priors”. arXiv:2012.12631 [Cs]. http://arxiv.org/abs/2012.12631.