- tags: Transformers, BERT, NLP
- paper: (Lan et al. 2020)
Architecture
ALBERT is an encoder-only architecture. It extends BERT with cross-layer parameter sharing and a factorized embedding parameterization, which makes it far more parameter-efficient: it reaches comparable or better performance than BERT while using only a fraction of the parameters.
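A minimal sketch of the cross-layer parameter-sharing idea (not the paper's implementation; the layer internals are simplified and the name SharedEncoder is illustrative):

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Toy encoder that stores ONE Transformer layer and reuses it at every
    depth, mimicking ALBERT's cross-layer parameter sharing (illustrative only)."""

    def __init__(self, hidden_size=768, num_heads=12, num_layers=12):
        super().__init__()
        # A single layer's parameters are allocated once...
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        self.num_layers = num_layers

    def forward(self, x):
        # ...and applied repeatedly, so adding depth does not add parameters.
        for _ in range(self.num_layers):
            x = self.shared_layer(x)
        return x

encoder = SharedEncoder()
hidden = encoder(torch.randn(2, 16, 768))  # (batch, seq_len, hidden_size)
print(hidden.shape)  # torch.Size([2, 16, 768])
```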
Parameter count
- Base = 12M
- Large = 18M
- XLarge = 60M
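A quick way to sanity-check the Base figure, assuming the Hugging Face transformers library and the albert-base-v2 checkpoint are available:

```python
from transformers import AlbertModel

# Load the published ALBERT-Base checkpoint.
model = AlbertModel.from_pretrained("albert-base-v2")

# Count parameters; the total should be on the order of 12M.
n_params = sum(p.numel() for p in model.parameters())
print(f"ALBERT-Base parameters: {n_params / 1e6:.1f}M")
```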
Bibliography
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. 2020. "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". arXiv.