This paper introduces the POET (Paired Open-Ended Trailblazer) framework. The core idea is to have agents learn complex behaviors through joint evolution of agents and environments: the better the agent, the more complex the environment it can be given.
The algorithm has three main components: an evolutionary process for the environments themselves, resembling a genetic algorithm; an evolution strategies (ES) optimizer for the agents (though the agents could also be trained with RL); and a transfer mechanism whereby agents trained in one environment can be moved to and continue training in another.
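To make the interplay of these three components concrete, here is a minimal toy sketch of a POET-style loop. Everything in it is a stand-in of my own invention, not the authors' code: the environment is a single float (a proxy for terrain difficulty), the agent is a single float parameter, `score` is a made-up fitness, and the ES step is a crude best-of-random-perturbations search.

```python
import random

random.seed(0)

def score(agent, env):
    # Toy fitness: how closely the agent's single parameter
    # matches the environment's difficulty (both plain floats).
    return -abs(agent - env)

def mutate_env(env):
    # Perturb the environment parameter upward
    # (a stand-in for adding harder obstacles).
    return env + random.uniform(0.0, 0.5)

def es_step(agent, env, sigma=0.1, n=10):
    # One crude ES step: keep the best random perturbation of the agent,
    # or the agent itself if no perturbation improves on it.
    candidates = [agent + random.gauss(0, sigma) for _ in range(n)]
    return max(candidates + [agent], key=lambda a: score(a, env))

def poet_loop(iterations=50, max_pairs=5):
    pairs = [(0.0, 0.0)]  # list of (environment, agent) pairs
    for _ in range(iterations):
        # 1. Environment evolution: spawn harder environments
        #    from pairs whose agents already perform well.
        for env, agent in list(pairs):
            if score(agent, env) > -0.1 and len(pairs) < max_pairs:
                pairs.append((mutate_env(env), agent))
        # 2. Agent optimization: one ES step per pair.
        pairs = [(env, es_step(agent, env)) for env, agent in pairs]
        # 3. Transfer: for each environment, adopt whichever agent
        #    in the population scores best on it.
        pairs = [(env, max((a for _, a in pairs), key=lambda a: score(a, env)))
                 for env, agent in pairs]
    return pairs
```

The transfer step here is a greedy all-to-all comparison; the real algorithm is more careful (e.g. fine-tuning candidates before transfer and gating new environments by a minimal-criterion check), but the loop structure is the same.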
The authors evaluate the framework on a 2D bipedal walker terrain task and show that their agents learn to walk in very challenging landscapes. They also show that, for the most complex environments, direct optimization alone is not enough to learn the required behavior.
I think this is an interesting curriculum learning paper that addresses the problem of learning continuously in an ever-changing environment. However, the step toward open-endedness advertised in the paper seems quite small.
The authors haven't addressed the main issue: producing genuinely open-ended, ever more complex environments. The genetic-algorithm-based method may keep generating new combinations of obstacles, but it will never create a fundamentally new kind of obstacle, nor combine previous environments into a new one.
Nor does this account for the fact that the complexity of the agents' behavior is bounded if we only use fixed neural networks as controllers. This framework would arguably be the holy grail if it paired open-ended environment generation with open-ended growth in learning capacity.
This confirms my initial impression of one of the authors: this group probably has the right motivations, but these neural-network, RL-flavored methods do not seem to be the right direction. Getting to open-endedness will likely require significant paradigm changes.