Efficient Neural Architecture Search via Parameter Sharing by Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018)

(Pham et al. 2018)


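Like other NAS papers, ENAS uses an RNN controller (an LSTM) that generates each part of the architecture in sequence. The main contribution of this paper is parameter sharing between child models. To achieve this, it represents all possible architectures as subgraphs of a single large DAG of operations, and every child model that uses a given operation on a given edge reuses that operation's weights. The authors explain how to use this to design a recurrent cell, an entire convolutional network, and a convolutional cell that is stacked to build a CNN, and how to train the whole system. Training alternates between two phases:
- Train the shared parameters of the child models with the policy fixed: the gradient of the expected loss is estimated by sampling architectures from the controller and averaging their gradients (the authors report that sampling a single architecture per step works fine).
- Train the policy (the controller parameters) with the shared parameters fixed, using REINFORCE with the validation performance of the sampled child model as the reward.

The authors present results on language modeling (Penn Treebank) and image classification (CIFAR-10).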
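A minimal, hypothetical sketch of the two ideas in the paragraph above (one shared weight pool for all child models, and the alternating two-phase training), in plain Python. Everything here is an assumption for illustration, not the paper's code: the search space is a toy 4-node DAG of scalar nodes where each node picks one earlier node and one op from identity/tanh/relu; the model output is simply the last node; the task is fitting tanh(2x); the LSTM controller is replaced by one independent softmax per decision (`ToyController`); and the reward is the negative validation MSE with a moving-average baseline. Names such as `make_shared_weights` and `train_enas` are invented.

```python
import math
import random

OPS = ("identity", "tanh", "relu")  # hypothetical op set for the toy search space
NUM_NODES = 4  # node 0 is the input; nodes 1..3 are chosen by the controller


def apply_op(op, z):
    # Returns (op(z), derivative of op at z).
    if op == "tanh":
        t = math.tanh(z)
        return t, 1.0 - t * t
    if op == "relu":
        return max(0.0, z), (1.0 if z > 0 else 0.0)
    return z, 1.0


def make_shared_weights(rng):
    # One weight per (node, input node, op): the single DAG that contains every child.
    return {(i, j, op): rng.uniform(0.5, 1.5)
            for i in range(1, NUM_NODES) for j in range(i) for op in OPS}


def forward(arch, w, x):
    # arch[i - 1] = (j, op) means node i = op(w[(i, j, op)] * h[j]).
    h, deriv = [x], []
    for i, (j, op) in enumerate(arch, start=1):
        value, d = apply_op(op, w[(i, j, op)] * h[j])
        h.append(value)
        deriv.append(d)
    return h, deriv


def loss_and_grads(arch, w, x, y):
    # Squared error of the last node and its gradient for each weight this child uses.
    h, deriv = forward(arch, w, x)
    grad_h = [0.0] * NUM_NODES
    grad_h[-1] = 2.0 * (h[-1] - y)
    grads = {}
    for i in range(NUM_NODES - 1, 0, -1):  # reverse-mode pass, last node first
        j, op = arch[i - 1]
        key = (i, j, op)
        g_z = grad_h[i] * deriv[i - 1]
        grads[key] = g_z * h[j]
        grad_h[j] += g_z * w[key]
    return (h[-1] - y) ** 2, grads


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyController:
    """Hypothetical stand-in for the ENAS LSTM controller: one softmax per decision."""

    def __init__(self):
        self.prev_logits = [[0.0] * i for i in range(1, NUM_NODES)]
        self.op_logits = [[0.0] * len(OPS) for _ in range(1, NUM_NODES)]

    def sample(self, rng):
        arch, choices = [], []
        for k in range(NUM_NODES - 1):
            j = rng.choices(range(k + 1), weights=softmax(self.prev_logits[k]))[0]
            o = rng.choices(range(len(OPS)), weights=softmax(self.op_logits[k]))[0]
            arch.append((j, OPS[o]))
            choices.append((j, o))
        return arch, choices

    def reinforce(self, choices, advantage, lr):
        # Gradient ascent on log-probability: d log softmax(logits)[c] / d logits = onehot(c) - probs
        for k, (j, o) in enumerate(choices):
            for logits, c in ((self.prev_logits[k], j), (self.op_logits[k], o)):
                probs = softmax(logits)
                for a in range(len(logits)):
                    logits[a] += lr * advantage * (float(a == c) - probs[a])


def train_enas(steps=200, seed=0, w_lr=0.02, ctrl_lr=0.5):
    rng = random.Random(seed)
    w, ctrl = make_shared_weights(rng), ToyController()
    train = [(x / 10, math.tanh(2 * x / 10)) for x in range(-10, 11)]
    valid = [(x / 10 + 0.05, math.tanh(2 * (x / 10 + 0.05))) for x in range(-10, 10)]
    baseline = None
    for _ in range(steps):
        # Phase 1: policy fixed; SGD on the shared weights of one sampled child (M = 1).
        arch, _ = ctrl.sample(rng)
        for x, y in train:
            for key, g in loss_and_grads(arch, w, x, y)[1].items():
                w[key] -= w_lr * max(-1.0, min(1.0, g))  # clipped gradient step
        # Phase 2: shared weights fixed; REINFORCE on the controller.
        arch, choices = ctrl.sample(rng)
        reward = -sum(loss_and_grads(arch, w, x, y)[0] for x, y in valid) / len(valid)
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
        ctrl.reinforce(choices, reward - baseline, ctrl_lr)
    return w, ctrl
```

Calling `w, ctrl = train_enas()` returns the trained shared weights and the toy controller; the index of the largest entry in `ctrl.op_logits[k]` is the op it currently prefers for node `k + 1`. Keying the weights by `(node, input node, op)` is what makes sharing automatic in this sketch: two sampled children that both choose, say, `(1, 0, "tanh")` read and update the same entry. The gradient clipping in phase 1 is a safeguard added for this toy.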


This approach seems more interesting than earlier NAS methods, because weight sharing gives a large speedup: the paper reports using roughly 1000x fewer GPU-hours than standard NAS. However, the tasks are rather small compared to the state of the art in 2018.


Pham, Hieu, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. 2018. “Efficient Neural Architecture Search via Parameter Sharing.” arXiv:1802.03268 [Cs, Stat], February.
