Today, AI systems are mostly designed in one way: by implementing elementary building blocks such as convolutions, skip connections, activation functions, attention, etc. We currently have no clear idea how to combine these relatively successful blocks into a global system that takes advantage of every one of them.
In essence, Darwinian evolution is a form of algorithm that worked its way up from the simplest replicators to the human mind. It is probably not a very smart algorithm, but it was given an unfathomable amount of computation. Whether a comparable process can be run with a tractable amount of computation is therefore still an open question.
AI-GAs (AI-generating algorithms) would rest on three pillars:
Meta-learning architectures
This pillar is related to neural architecture search: algorithms learn to create the learning architectures.
Meta-learning the learning algorithms themselves
This pillar aims to avoid hand-designed learning algorithms such as SGD and to learn the learning algorithm itself instead. For example, Model-Agnostic Meta-Learning (MAML) searches for an initial set of weights from which a range of tasks can be learned as quickly as possible.
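To make the MAML idea concrete, here is a minimal sketch on a toy problem. It is not the setup of the MAML paper: the function names (`loss`, `grad`, `adapt`, `maml_train`), the quadratic tasks, and the hyperparameters are all assumptions chosen for illustration. Each task is a scalar loss `(w - a)^2` with its own optimum `a`, the inner loop is a single gradient step, and the outer loop differentiates through that step analytically.

```python
import random


def loss(w, a):
    """Toy task loss (assumption): squared distance to the task optimum a."""
    return (w - a) ** 2


def grad(w, a):
    """Gradient of loss(w, a) with respect to w."""
    return 2.0 * (w - a)


def adapt(w, a, inner_lr):
    """Inner loop: one gradient step on a single task."""
    return w - inner_lr * grad(w, a)


def maml_train(task_optima, inner_lr=0.1, outer_lr=0.05, steps=500, seed=0):
    """Outer loop: move the initial weight so that the loss *after*
    adaptation is small on average over the tasks."""
    rng = random.Random(seed)
    w = rng.uniform(-5.0, 5.0)  # arbitrary starting initialization
    for _ in range(steps):
        meta_grad = 0.0
        for a in task_optima:
            w_adapted = adapt(w, a, inner_lr)
            # Chain rule through the inner step:
            # d(w_adapted)/dw = 1 - 2 * inner_lr for this quadratic loss.
            meta_grad += grad(w_adapted, a) * (1.0 - 2.0 * inner_lr)
        w -= outer_lr * meta_grad / len(task_optima)
    return w


w0 = maml_train([-1.0, 0.0, 4.0])
print("meta-learned initialization:", w0)
print("loss on task a=4 before/after one inner step:",
      loss(w0, 4.0), loss(adapt(w0, 4.0, 0.1), 4.0))
```

On such a simple task family the meta-learned initialization just settles at the mean of the task optima, so the example only shows the two nested loops, not the benefit MAML brings on realistic tasks.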
Another idea is to take an RNN (which is Turing complete), train it via an outer-loop optimization algorithm to be the most sample-efficient learning algorithm possible, and then use it to train other models. This has been done in (Wang et al. 2017) and (Duan et al. 2016) in the context of reinforcement learning, with policy gradients as the outer loop (in nature, evolution would play this role).
The main question with meta-learning is: which tasks should the meta-learner learn on? The author argues that this can be learned too.
Automatically generating effective learning environments
We should have algorithms that can learn to generate learning environments. This includes defining a reward in each environment. This is likely the hardest of the three pillars. It is what people attempt to reproduce with Alife simulations, coevolution, and self-play (Dota, Go, etc.). All of them more or less fail to create an open-ended complexity explosion like the one that happened on Earth, usually because the environmental component of the algorithm is not evolving.
The author suggests explicitly optimizing for environments that favor learning, instead of hoping to design environments whose dynamics happen to lead to evolutionary explosions. The author sees open-ended evolution as a way to endlessly generate environments of growing complexity. In natural evolution, this corresponds to the active process of creating new species and niches, where new species create more complex environments that can in turn give rise to even more complex creatures.
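One possible reading of "environments that favor learning" is a filter that only accepts a new environment when the current agent finds it neither too easy nor too hard. The sketch below is a toy under that assumption, not the method proposed in the paper: environments and agents are reduced to single numbers, and `score`, `train`, `co_evolve` and the thresholds `low` and `high` are hypothetical names and values.

```python
import random


def score(skill, difficulty):
    """Hypothetical score in (0, 1]: 1 when the agent's skill covers the
    environment's difficulty, decaying as the gap grows."""
    gap = max(0.0, difficulty - skill)
    return 1.0 / (1.0 + gap)


def train(skill, difficulty, lr=0.5):
    """Toy learning step: the agent only improves when the environment
    is harder than its current skill."""
    return skill + lr * max(0.0, difficulty - skill)


def co_evolve(generations=50, low=0.4, high=0.9, seed=0):
    """Alternate between training the agent and mutating the environment.
    A mutated environment replaces the current one only if the agent's
    score on it lies in [low, high], i.e. not too hard and not too easy."""
    rng = random.Random(seed)
    skill, difficulty = 0.0, 0.5
    history = []
    for _ in range(generations):
        skill = train(skill, difficulty)
        candidate = max(0.0, difficulty + rng.gauss(0.0, 0.5))
        if low <= score(skill, candidate) <= high:
            difficulty = candidate
        history.append((skill, difficulty))
    return history


history = co_evolve()
print("final skill and difficulty:", history[-1])
```

Because an accepted environment is always somewhat harder than the agent's current skill, the agent keeps having something to learn. A real system would of course need a much richer environment encoding than one scalar.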
The author explicitly avoids the term "evolutionary approach" for AI-GAs because he argues that the outer-loop optimization method does not have to be evolution and could be some other optimization algorithm.
According to the author, interesting paths to this third pillar include:
- Encouraging behavioral diversity (like novelty search or curiosity).
- Quality diversity, meaning that the algorithm should be able to generate many solutions to a problem, where each of those solutions is as high-performing as possible for its type (or species). This is related to MAP-Elites.
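As an illustration of quality diversity, here is a minimal MAP-Elites-style sketch. The problem itself is made up: the `fitness` function, the behavior `descriptor` (the first coordinate, split into bins), and all parameter values are assumptions. The archive keeps one elite per behavior cell, and a new candidate only replaces an elite in its own cell.

```python
import random


def fitness(x):
    """Hypothetical objective: higher is better, best at the center."""
    return -sum((xi - 0.5) ** 2 for xi in x)


def descriptor(x, bins):
    """Hypothetical behavior descriptor: which bin the first coordinate
    falls into. Each bin plays the role of a 'type' or 'species'."""
    return min(int(x[0] * bins), bins - 1)


def mutate(x, rng, sigma=0.1):
    """Gaussian mutation, clipped to the unit interval."""
    return [min(1.0, max(0.0, xi + rng.gauss(0.0, sigma))) for xi in x]


def map_elites(iterations=2000, n_random=100, bins=10, dim=2, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, solution)
    for i in range(iterations):
        if i < n_random or not archive:
            # Bootstrap the archive with random solutions.
            child = [rng.random() for _ in range(dim)]
        else:
            # Pick a random elite and mutate it.
            _, parent = rng.choice(list(archive.values()))
            child = mutate(parent, rng)
        cell = descriptor(child, bins)
        f = fitness(child)
        # Competition is local: a child only competes within its own cell.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, child)
    return archive


archive = map_elites()
for cell in sorted(archive):
    f, solution = archive[cell]
    print(cell, round(f, 4), [round(v, 3) for v in solution])
```

The result is not a single best solution but a collection of elites, one per cell, each as good as the search found for its own region of behavior space.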
The author believes that POET is a step in that direction.
The rest of the paper is a long discussion about the pros and cons of the method and potential ways it could be used/useful.
This work resonates well with many of my opinions about the current state of machine learning research and its relation to AI research in general.
I believe the third pillar, the generation of effective learning environments, is indeed the hardest, and I even think that it is AI-complete, meaning that solving it would in itself be sufficient to have AI. This is also because pillar 3 would require open-ended evolution to work, which is in my opinion AI-complete as well, without even mentioning effectiveness.
Overall, the goal and the means to reach it seem very close to what many researchers are envisioning too, and I agree with them. However, these solutions don’t seem very promising to me, because the very system the author thinks could generate this endless variety of environments is still undefined and remains the hardest part of the problem.
Duan, Yan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, and Pieter Abbeel. November 2016. “RL\(^2\): Fast Reinforcement Learning via Slow Reinforcement Learning”. arXiv:1611.02779 [cs, stat].
Wang, Jane X., Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z. Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, and Matt Botvinick. January 2017. “Learning to Reinforcement Learn”. arXiv:1611.05763 [cs, stat].