- tags: Transformers, BERT, NLP
- paper: (Sanh et al. 2020)
Architecture
It is a distilled version of BERT: about 40% smaller and 60% faster at inference, while retaining roughly 97% of BERT's language-understanding performance.
Parameter count
66M
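As a quick sanity check, the sketch below loads the pretrained distilbert-base-uncased checkpoint with the Hugging Face Transformers library (assumed installed, along with PyTorch) and counts its parameters, which should come out to roughly 66M.

```python
# Minimal sketch, assuming the Hugging Face `transformers` library and the
# public `distilbert-base-uncased` checkpoint are available.
from transformers import AutoModel

model = AutoModel.from_pretrained("distilbert-base-uncased")

# Sum the number of elements across all parameter tensors.
n_params = sum(p.numel() for p in model.parameters())
print(f"DistilBERT parameters: {n_params / 1e6:.1f}M")  # expected: ~66M
```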
Bibliography
- Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. 2020. "DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter". arXiv.