 tags
 Machine learning
Continual learning is a type of supervised learning in which there is no "testing phase" associated with the decision process. Instead, training samples keep arriving, and the algorithm must simultaneously make predictions and keep learning.
This is challenging for a fixed neural network architecture: with a fixed capacity, it is bound either to forget previously learned information or to be unable to learn anything new.
A definition from the survey (De Lange et al. 2020):
The General Continual Learning setting considers an infinite stream of training data where at each time step, the system receives a (number of) new sample(s) drawn non i.i.d from a current distribution that could itself experience sudden or gradual changes.
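As a toy illustration of this protocol, the sketch below (all names illustrative, not from the survey) runs a trivial online learner on a non-i.i.d. stream whose label distribution shifts abruptly halfway through. The learner must predict on each sample before seeing its label, then update; there is no separate testing phase.

```python
import random

class OnlineMajorityLearner:
    """Trivial baseline: predict the most frequent label seen so far."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        if not self.counts:
            return 0
        return max(self.counts, key=self.counts.get)

    def update(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

def stream(n, drift_at):
    """Non-i.i.d. stream: the label distribution changes suddenly at `drift_at`."""
    for t in range(n):
        p = 0.9 if t < drift_at else 0.1  # sudden distribution shift
        yield t, (1 if random.random() < p else 0)

random.seed(0)
learner = OnlineMajorityLearner()
mistakes = 0
for x, y in stream(1000, drift_at=500):
    mistakes += (learner.predict(x) != y)  # predict BEFORE seeing the label
    learner.update(x, y)                   # then keep learning
print(f"online mistakes out of 1000: {mistakes}")
```

The majority baseline never adapts to the shift, so roughly half its predictions after the drift are wrong; a continual learner is expected to do better on exactly this kind of stream.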
Theoretical foundations
Concept shift
(Bartlett et al. 1996) explores how to learn under the assumption of concept shift:
The learner sees a sequence of random examples, labelled according to a sequence of functions, and must provide an accurate estimate of the target function sequence.
Formally, at time \(t\) the learner sees a random example \(x_t\) from some domain \(X\), together with the value \(f_t(x_t) \in \{0, 1\}\), where \(f_t\) is an unknown function from a known class \(F\).
The paper addresses two problems of learning with changing concepts:
 Estimation: When can we estimate a sequence \((f_1, \cdots, f_n)\) from observations \(((x_1, f_1(x_1)), \cdots, (x_n, f_n(x_n)))\)?
 Prediction: When can one predict the next concept \(f_{n+1}\) from a sequence of concepts \((f_1, \cdots, f_n)\)?
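A minimal numerical illustration of the estimation problem (not from the paper; the threshold class, window size, and drift rate are arbitrary choices): labels come from a slowly drifting threshold concept \(f_t(x) = \mathbb{1}[x \geq \theta_t]\), and a learner that estimates the threshold from a recent window of examples can track the change.

```python
import random

random.seed(1)
theta = 0.3        # current concept: f_t(x) = 1 iff x >= theta
window = []        # recent (x, label) pairs
errors = 0
T = 2000

for t in range(T):
    x = random.random()
    label = int(x >= theta)
    # estimate the threshold as the midpoint between the smallest
    # positive and the largest negative example in the window
    if window:
        pos = [xi for xi, yi in window if yi == 1]
        neg = [xi for xi, yi in window if yi == 0]
        est = ((min(pos) if pos else 1.0) + (max(neg) if neg else 0.0)) / 2
    else:
        est = 0.5
    errors += (int(x >= est) != label)
    window.append((x, label))
    window = window[-50:]   # keep only recent examples, old ones are stale
    theta += 0.0002         # gradual concept drift

print(f"tracking error rate: {errors / T:.3f}")
```

Because the concept changes slowly, recent examples remain informative and the windowed estimate stays close to the true threshold; under abrupt change the same learner would suffer a burst of errors until the window refills.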
Formal definitions of different aspects of continual learning
Learning to learn
The paper (Baxter 1998) defines the problem of learning to learn as follows (notations are chosen to contrast with regular supervised learning):
 an input space \(X\) and an output space \(Y\),
 a loss function \(l: Y \times Y \rightarrow \mathbb{R}\),
 an environment \((P, Q)\) where \(P\) is the set of all probability distributions on \(X \times Y\) and \(Q\) is a distribution on \(P\),
 a hypothesis space family \(H = \{\mathcal{H}\}\) where each \(\mathcal{H} \in H\) is a set of functions \(h: X \rightarrow Y\).
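Under these definitions, the learning-to-learn objective can be sketched as selecting the hypothesis space whose best in-space hypothesis has the lowest expected loss on a task drawn from the environment (a hedged reconstruction in the spirit of Baxter's formulation, not a quotation from the paper):

```latex
\mathcal{H}^{*} = \operatorname*{arg\,min}_{\mathcal{H} \in H}
  \; \mathbb{E}_{P \sim Q}
  \left[ \inf_{h \in \mathcal{H}} \mathbb{E}_{(x, y) \sim P} \, l(h(x), y) \right]
```

The outer expectation over \(P \sim Q\) is what distinguishes learning to learn from regular supervised learning: performance is measured across the whole environment of tasks rather than on a single fixed distribution.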
Examples of continual learning systems
 Never Ending Language Learner (NELL) (Carlson et al. 2010)
Benchmarks
Computer vision based benchmarks

Split MNIST: the MNIST dataset is split into five 2-class tasks (Nguyen et al. 2017; Zenke et al. 2017; Shin et al. 2017).
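The split construction can be sketched as follows (run on a dummy label array so nothing needs downloading; with real data, `labels` would be the MNIST label vector, and the same indices would select the images):

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=60_000)   # stand-in for the MNIST labels

# partition the 10 digit classes into five 2-class tasks
task_classes = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
tasks = []
for a, b in task_classes:
    idx = np.where((labels == a) | (labels == b))[0]
    # within a task, relabel to {0, 1} (the usual task-incremental setup)
    binary = (labels[idx] == b).astype(int)
    tasks.append((idx, binary))

for (a, b), (idx, binary) in zip(task_classes, tasks):
    print(f"task {a} vs {b}: {len(idx)} samples")
```

Split CIFAR10 below follows the same recipe with CIFAR10 labels in place of MNIST ones.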

Split CIFAR10: the CIFAR10 dataset is split into five 2-class tasks (Krizhevsky, Hinton 2009).

Split miniImageNet: a mini ImageNet dataset (100 classes) split into twenty 5-class tasks.

Continual Transfer Learning Benchmark: A benchmark from Facebook AI, built from 7 computer vision datasets: MNIST, CIFAR10, CIFAR100, DTD, SVHN, RainbowMNIST, Fashion MNIST. The tasks are all 5-class or 10-class classification tasks. Some example task sequence constructions from (Veniat et al. 2021):
The last task of \(S_{out}\) consists of a shuffling of the output labels of the first task. The last task of \(S_{in}\) is the same as its first task except that MNIST images have a different background color. \(S_{long}\) has 100 tasks, and it is constructed by first sampling a dataset, then 5 classes at random, and finally the amount of training data from a distribution that favors small tasks by the end of the learning experience.
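A rough sketch of the \(S_{long}\) construction: each of the 100 tasks samples a dataset, then 5 classes, then a training-set size whose distribution shifts toward small tasks at the end of the stream. The dataset list and the exact size distribution below are placeholders, not the actual choices of Veniat et al.

```python
import random

random.seed(0)
datasets = ["MNIST", "CIFAR10", "CIFAR100", "DTD", "SVHN",
            "RainbowMNIST", "FashionMNIST"]
num_classes = {"MNIST": 10, "CIFAR10": 10, "CIFAR100": 100, "DTD": 47,
               "SVHN": 10, "RainbowMNIST": 10, "FashionMNIST": 10}

def sample_task(t, horizon=100):
    ds = random.choice(datasets)                        # 1. pick a dataset
    classes = random.sample(range(num_classes[ds]), 5)  # 2. pick 5 classes
    # 3. pick a training-set size biased toward small tasks as t grows
    max_size = int(5000 * (1 - 0.9 * t / horizon))
    n_train = random.randint(50, max(51, max_size))
    return ds, sorted(classes), n_train

task_stream = [sample_task(t) for t in range(100)]
print(task_stream[0], task_stream[-1])
```

The shrinking `max_size` is what forces transfer late in the stream: the last tasks have too little data to be learned from scratch.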

Permuted MNIST: for each task, the pixels of the MNIST digits are shuffled by a fixed random permutation, generating a new task of difficulty equal to the original but with a different solution. This benchmark is not suitable if the model has a spatial prior (like a CNN). First used in (Goodfellow et al. 2014; Srivastava et al. 2013), also in (Kirkpatrick et al. 2017).
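A sketch of the construction (a dummy batch stands in for the flattened MNIST images): one random permutation is drawn per task and applied identically to every image, so per-pixel statistics are preserved while the input-to-label mapping changes.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((4, 784))          # stand-in for flattened 28x28 digits

def make_permuted_task(images, seed):
    # one FIXED permutation per task, shared by all images in the task
    perm = np.random.default_rng(seed).permutation(784)
    return images[:, perm]

task1 = make_permuted_task(images, seed=1)
task2 = make_permuted_task(images, seed=2)
# each task is a shuffled view of the same inputs, so the set of pixel
# values per image (and hence task difficulty for an MLP) is unchanged
assert np.allclose(np.sort(task1, axis=1), np.sort(images, axis=1))
```

Because a CNN's convolutions assume neighboring pixels are related, the permutation destroys exactly the structure such a spatial prior relies on.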

Rotated MNIST: each task contains digits rotated by a fixed angle between 0 and 180 degrees.
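The rotated variant can be sketched the same way (a dummy image stands in for an MNIST digit; `scipy.ndimage.rotate` is one way to apply the fixed per-task rotation):

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)
image = rng.random((28, 28))            # stand-in for one MNIST digit

angles = np.linspace(0, 180, num=10)    # one fixed angle per task
# reshape=False keeps the 28x28 frame; order=1 is bilinear interpolation
tasks = [rotate(image, angle, reshape=False, order=1) for angle in angles]

print(len(tasks), tasks[0].shape)
```

Unlike Permuted MNIST, rotations preserve local spatial structure, so rotated tasks remain meaningful for models with spatial priors such as CNNs.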
Bibliography
 Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory Slabaugh, Tinne Tuytelaars. 2020. "A Continual Learning Survey: Defying Forgetting in Classification Tasks". arXiv:1909.08383 [cs, stat]. http://arxiv.org/abs/1909.08383.
 Peter L. Bartlett, Shai Ben-David, Sanjeev R. Kulkarni. 1996. "Learning Changing Concepts by Exploiting the Structure of Change". In Proceedings of the Ninth Annual Conference on Computational Learning Theory (COLT '96), 131–39. Desenzano del Garda, Italy: ACM Press. DOI.
 Jonathan Baxter. 1998. "Theoretical Models of Learning to Learn". In Learning to Learn, edited by Sebastian Thrun and Lorien Pratt, 71–94. Boston, MA: Springer US. DOI.
 Anastasia Pentina, Shai Ben-David. 2015. "Multitask and Lifelong Learning of Kernels". In Algorithmic Learning Theory, edited by Kamalika Chaudhuri, Claudio Gentile, and Sandra Zilles, 194–208. Lecture Notes in Computer Science. Cham: Springer International Publishing. DOI.
 Anastasia Pentina, Ruth Urner. 2016. "Lifelong Learning with Weighted Majority Votes". In Advances in Neural Information Processing Systems. Vol. 29. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2016/hash/f39ae9ff3a81f499230c4126e01f421b-Abstract.html.
 Anastasia Pentina, Christoph H. Lampert. 2015. "Lifelong Learning with Non-i.i.d. Tasks". In Advances in Neural Information Processing Systems. Vol. 28. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2015/hash/9232fe81225bcaef853ae32870a2b0fe-Abstract.html.
 Maria-Florina Balcan, Avrim Blum, Vaishnavh Nagarajan. 2020. "Lifelong Learning in Costly Feature Spaces". Theoretical Computer Science, Special Issue on Algorithmic Learning Theory, 808 (February):14–37. DOI.
 Daniel Kifer, Shai Ben-David, Johannes Gehrke. 2004. "Detecting Change in Data Streams". In VLDB, 4:180–91. Toronto, Canada.
 Shai Ben-David, Reba Schuller Borbely. 2008. "A Notion of Task Relatedness Yielding Provable Multiple-task Learning Guarantees". Machine Learning 73 (3):273–87. DOI.
 Shai Ben-David, Reba Schuller. 2003. "Exploiting Task Relatedness for Multiple Task Learning". In Learning Theory and Kernel Machines, edited by Bernhard Schölkopf and Manfred K. Warmuth, 567–80. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. DOI.
 Ali Geisa, Ronak Mehta, Hayden S. Helm, Jayanta Dey, Eric Eaton, Jeffery Dick, Carey E. Priebe, Joshua T. Vogelstein. 2021. "Towards a Theory of Out-of-distribution Learning". arXiv. DOI.
 Andrew Carlson, Justin Betteridge, Bryan Kisiel. 2010. "Toward an Architecture for Never-Ending Language Learning". In Proceedings of the Conference on Artificial Intelligence (AAAI) (2010), 1306–13. DOI.
 Cuong V. Nguyen, Yingzhen Li, Thang D. Bui, Richard E. Turner. 2017. "Variational Continual Learning". CoRR abs/1710.10628. http://arxiv.org/abs/1710.10628.
 Friedemann Zenke, Ben Poole, Surya Ganguli. 2017. "Continual Learning Through Synaptic Intelligence". In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017, edited by Doina Precup and Yee Whye Teh, 70:3987–95. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v70/zenke17a.html.
 Hanul Shin, Jung Kwon Lee, Jaehong Kim, Jiwon Kim. 2017. "Continual Learning with Deep Generative Replay". In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4–9, 2017, Long Beach, CA, USA, edited by Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, 2990–99. https://proceedings.neurips.cc/paper/2017/hash/0efbe98067c6c73dba1250d2beaa81f9-Abstract.html.
 Alex Krizhevsky, Geoffrey Hinton. 2009. "Learning Multiple Layers of Features from Tiny Images". University of Toronto.
 Tom Veniat, Ludovic Denoyer, Marc'Aurelio Ranzato. 2021. "Efficient Continual Learning with Modular Networks and Task-driven Priors". arXiv:2012.12631 [cs]. http://arxiv.org/abs/2012.12631.
 Ian J. Goodfellow, Mehdi Mirza, Xia Da, Aaron C. Courville, Yoshua Bengio. 2014. "An Empirical Investigation of Catastrophic Forgetting in Gradient-based Neural Networks". In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14–16, 2014, Conference Track Proceedings, edited by Yoshua Bengio and Yann LeCun. http://arxiv.org/abs/1312.6211.
 Rupesh Kumar Srivastava, Jonathan Masci, Sohrob Kazerounian, Faustino J. Gomez, Jürgen Schmidhuber. 2013. "Compete to Compute". In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a Meeting Held December 5–8, 2013, Lake Tahoe, Nevada, United States, edited by Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger, 2310–18. https://proceedings.neurips.cc/paper/2013/hash/8f1d43620bc6bb580df6e80b0dc05c48-Abstract.html.
 James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, et al. 2017. "Overcoming Catastrophic Forgetting in Neural Networks". arXiv:1612.00796 [cs, stat]. http://arxiv.org/abs/1612.00796.