Reinforcement learning

tags: Machine learning

In reinforcement learning (RL), an agent takes actions within an environment. Usually, both the agent's state and the environment's state change in response to each action. The environment then gives the agent a scalar reward that signals how good or bad the action was.

The goal of a learning agent is to choose actions that maximize the cumulative reward it collects over time, not just the reward for a single action.

An agent can be anything from a fixed set of if-else statements to a deep neural network.
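The interaction loop described above can be sketched in a few lines. This sketch assumes a hypothetical toy corridor environment (`CorridorEnv`) and a hand-written if-else agent (`if_else_agent`). Neither comes from a real library, and the `reset`/`step` shape only loosely mirrors common RL interfaces.

```python
class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells 0..length-1.
    The agent starts in cell 0 and gets reward 1.0 when it reaches the last cell."""

    def __init__(self, length: int = 5):
        self.length = length
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, action: int) -> tuple[int, float, bool]:
        # action 1 = move right, anything else = move left; walls clip the position
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def if_else_agent(state: int) -> int:
    # The simplest kind of agent: a fixed rule that always moves right.
    return 1


def run_episode(env, policy, max_steps: int = 100) -> float:
    """Roll out one episode and return the total (undiscounted) reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # the agent acts
        state, reward, done = env.step(action)  # the environment changes and emits a reward
        total_reward += reward
        if done:
            break
    return total_reward
```

Calling `run_episode(CorridorEnv(), if_else_agent)` moves right four times and returns a total reward of 1.0. A learning agent would replace the fixed rule with a policy that improves from the rewards it receives.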

Algorithms

Q-learning

A3C

TRPO

PPO

SAC

Evolutionary strategies in RL

A study of the challenges of applying evolution strategies to high-dimensional RL problems (Müller, Glasmachers 2018).

Other/Misc algorithms, hacks and tricks

Current RL relies on many tricks to make algorithms behave the way we want. It is not clear whether this collection of tricks makes the algorithms better overall, or whether it over-specializes them to particular types of applications.

Exploration bonuses

Exploration bonuses are a class of methods that encourage an agent to explore even when the environment's reward is sparse. They do this by adding an extra (intrinsic) reward term to the environment's (extrinsic) reward. Depending on how the bonus is defined, it can push the agent toward states that look different from the ones it has seen before, toward states reached through different histories, and so on.
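One concrete way to add an extra reward term is a count-based bonus, β/√N(s), where N(s) is how often state s has been visited. The sketch below uses it as a deliberately simple, swapped-in technique that only works for small, discrete (hashable) state spaces. `CountBonus`, `shaped_reward` and the default `beta` are hypothetical choices for illustration, not taken from a specific paper.

```python
import math
from collections import Counter
from typing import Hashable


class CountBonus:
    """Count-based exploration bonus (one simple choice among many):
    intrinsic reward = beta / sqrt(N(s)), where N(s) counts visits to state s."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta
        self.counts: Counter = Counter()

    def __call__(self, state: Hashable) -> float:
        # Record the visit, then return a bonus that shrinks with each revisit.
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])


def shaped_reward(extrinsic: float, state: Hashable, bonus: CountBonus) -> float:
    # The agent is trained on the environment reward plus the exploration bonus.
    return extrinsic + bonus(state)
```

With `beta=0.1`, successive bonuses for the same state are 0.1, about 0.071, about 0.058, and then 0.05. Rarely visited states therefore keep paying more than familiar ones, even when the environment reward is zero everywhere.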

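One example of an exploration bonus is random network distillation (RND) (Burda et al. 2018). The bonus is the prediction error of a network trained to imitate a fixed, randomly initialized target network. This error stays high on observations unlike those the predictor has been trained on.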
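Below is a heavily simplified sketch of the RND idea in NumPy. It assumes a one-layer tanh target network, a linear predictor initialized at zero, and a normalized gradient step. The original work uses deeper networks, observation and reward normalization, and other details omitted here. The class name `RND` and all hyperparameters are illustrative assumptions.

```python
import numpy as np


class RND:
    """Simplified random network distillation sketch:
    - target f(x) = tanh(x @ W_t), randomly initialised and never trained
    - predictor g(x) = x @ W_p, trained to match f(x) on visited observations
    Exploration bonus = ||g(x) - f(x)||^2."""

    def __init__(self, obs_dim: int, feat_dim: int = 16, lr: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_t = rng.normal(size=(obs_dim, feat_dim))  # fixed random target weights
        self.W_p = np.zeros((obs_dim, feat_dim))         # trainable predictor weights
        self.lr = lr

    def _target(self, obs: np.ndarray) -> np.ndarray:
        return np.tanh(obs @ self.W_t)

    def bonus(self, obs: np.ndarray) -> float:
        # Squared prediction error: large for unfamiliar observations.
        err = obs @ self.W_p - self._target(obs)
        return float(np.sum(err ** 2))

    def update(self, obs: np.ndarray) -> None:
        # One gradient step on 0.5 * ||g(x) - f(x)||^2 w.r.t. W_p, scaled by
        # 1 / ||x||^2 (a normalised step, chosen here for stability).
        err = obs @ self.W_p - self._target(obs)
        self.W_p -= self.lr * np.outer(obs, err) / (1e-8 + obs @ obs)
```

The predictor is only trained on observations the agent actually visits. Its error, and thus the bonus, therefore shrinks for familiar observations while staying large for novel ones.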

Bibliography

Nils Müller, Tobias Glasmachers. July 1, 2018. "Challenges in High-dimensional Reinforcement Learning with Evolution Strategies". arXiv:1806.01224 [cs]. http://arxiv.org/abs/1806.01224.
Yuri Burda, Harrison Edwards, Amos Storkey, Oleg Klimov. 2018. "Exploration by Random Network Distillation". arXiv:1810.12894.