ALBERT

Tags: Transformers, BERT, NLP
Paper: (Lan et al. 2020)

Architecture

ALBERT is an encoder-only Transformer. It extends BERT with cross-layer parameter sharing (all encoder layers reuse one set of weights), which makes it far more parameter-efficient than a BERT model of the same depth and width.
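A minimal sketch of what cross-layer sharing means for the parameter count. This is a toy stand-in, not the real model: `TinyLayer`, `build_encoder`, and `count_unique_params` are hypothetical names, and each "layer" is just one weight matrix with a ReLU rather than attention plus a feed-forward block.

```python
import random


class TinyLayer:
    """Toy stand-in for an encoder layer: one square weight matrix plus ReLU."""

    def __init__(self, dim, rng):
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        # Matrix-vector product followed by ReLU.
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.weight]

    def num_params(self):
        return len(self.weight) * len(self.weight[0])


def build_encoder(num_layers, dim, share, seed=0):
    """Return a list of layers; with share=True every entry is the same object."""
    rng = random.Random(seed)
    if share:
        layer = TinyLayer(dim, rng)
        return [layer] * num_layers  # ALBERT-style: one set of weights, applied repeatedly
    return [TinyLayer(dim, rng) for _ in range(num_layers)]  # BERT-style: one set per layer


def count_unique_params(layers):
    """Count parameters once per distinct layer object."""
    unique = {id(layer): layer for layer in layers}
    return sum(layer.num_params() for layer in unique.values())


def forward(layers, x):
    for layer in layers:
        x = layer(x)
    return x


bert_like = build_encoder(num_layers=12, dim=8, share=False)
albert_like = build_encoder(num_layers=12, dim=8, share=True)
print(count_unique_params(bert_like), count_unique_params(albert_like))
```

Both encoders still run 12 layer applications per forward pass, so sharing reduces the number of stored weights, not the amount of compute.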

Parameter count

  • Base = 12M
  • Large = 18M
  • XLarge = 60M
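The figures above can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is an estimate under assumptions that are not stated in this note: a 30,000-token vocabulary, embedding size 128, 512 positions, 2 segment types, a feed-forward width of 4x the hidden size, a pooler layer, and hidden sizes of 768, 1024, and 2048 for Base, Large, and XLarge. `albert_param_estimate` is a hypothetical helper, not a library function.

```python
def albert_param_estimate(hidden, vocab=30000, embed=128, max_pos=512, segments=2):
    """Rough parameter count for an ALBERT-style encoder with one shared layer.

    All default sizes are assumptions for illustration.
    """
    # Factorized embeddings: small lookup tables, then a projection up to `hidden`.
    embeddings = (vocab + max_pos + segments) * embed  # token, position, segment tables
    embeddings += 2 * embed                            # embedding LayerNorm (scale, bias)
    embeddings += embed * hidden + hidden              # projection from embed to hidden

    # One Transformer layer, counted once because every layer shares it.
    attention = 4 * (hidden * hidden + hidden)         # Q, K, V, and output projections
    ffn = 2 * hidden * (4 * hidden) + 4 * hidden + hidden  # two dense layers with biases
    layer_norms = 2 * 2 * hidden                       # two LayerNorms (scale, bias)
    shared_layer = attention + ffn + layer_norms

    pooler = hidden * hidden + hidden                  # dense layer on the first token
    return embeddings + shared_layer + pooler


for name, hidden in [("Base", 768), ("Large", 1024), ("XLarge", 2048)]:
    print(f"{name}: {albert_param_estimate(hidden) / 1e6:.1f}M")
```

Under these assumptions each estimate lands within about 2M of the figure listed above. Note that the number of layers does not appear in the formula: with full sharing, depth adds no parameters.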

Bibliography

  1. Lan et al. (2020). "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". arXiv.
