Efficient Neural Architecture Search via Parameter Sharing by Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018)

source
(Pham et al. 2018)

Summary

Like other NAS papers, the controller is an RNN that generates each part of the architecture in sequence. The main contribution of this paper is parameter sharing between child models: all possible architectures are represented as subgraphs of a single DAG of operations, and weights are shared whenever two architectures use the same operation at the same position. The authors show how to design an RNN cell, a convolutional network, and a convolutional cell (used to build a CNN) within this framework, and how to train the system. Training alternates between two steps:

- Train the shared parameters for a fixed policy: gradients are estimated by sampling architectures and averaging their gradients (apparently sampling a single architecture works well).
- Train the policy (controller parameters) with REINFORCE, using validation performance as the reward.

The authors present results on text and vision tasks.
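The alternating scheme can be sketched roughly as follows. This is a toy illustration with NumPy, not the paper's implementation: the shared DAG is reduced to a chain of nodes, each node picks one op, the controller is simplified to per-node softmax logits instead of an RNN, and the child-model weight update and reward are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared weights: one matrix per (node, op) pair, reused by every
# sampled architecture that picks that op at that node (toy sizes).
N_NODES, N_OPS, DIM = 3, 2, 4
shared_w = {(n, o): rng.normal(size=(DIM, DIM)) * 0.1
            for n in range(N_NODES) for o in range(N_OPS)}

# Simplified controller: per-node logits over ops (stand-in for the RNN).
logits = np.zeros((N_NODES, N_OPS))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sample_arch():
    """Sample one op per node from the controller's distribution."""
    probs = softmax(logits)
    return [int(rng.choice(N_OPS, p=probs[n])) for n in range(N_NODES)]

def forward(arch, x):
    """Run x through the chosen op at each node, using shared weights."""
    for n, op in enumerate(arch):
        x = np.tanh(shared_w[(n, op)] @ x)
    return x

# Step 1 (sketched): update shared weights for a fixed policy.
# In the paper this is an SGD step on the child's training loss,
# averaged over sampled architectures (often just one sample).
arch = sample_arch()
x = rng.normal(size=DIM)
out = forward(arch, x)

# Step 2: REINFORCE-style update of the controller, using a toy
# reward in place of the validation accuracy used in the paper.
reward = -float(np.sum(out ** 2))
probs = softmax(logits)
lr = 0.1
for n, op in enumerate(arch):
    grad = -probs[n]          # d log p(op_n) / d logits_n
    grad[op] += 1.0
    logits[n] += lr * reward * grad
```

The key point the sketch preserves is that `shared_w` is allocated once for the whole search space, so evaluating a new architecture needs no retraining from scratch.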

Comments

This approach seems more interesting than previous NAS methods, because the weight sharing gives a significant speedup. The tasks are, however, rather small compared to the 2018 state of the art.

Bibliography

Pham, Hieu, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. 2018. "Efficient Neural Architecture Search via Parameter Sharing." arXiv:1802.03268 [cs, stat], February.
