ALBERT

Architecture

It is an encoder-only architecture. It extends BERT by using parameter-sharing and is more efficient than BERT with the same number of parameters.

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. February 8, 2020. "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". arXiv. DOI.