- tags
- Transformers, NLP
- paper
- (Devlin et al. 2019)
Parameter count
- Base = 110M
- Large = 340M
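The parameter counts above can be reproduced from the architecture hyper-parameters in Devlin et al. (2019). A minimal sketch in Python, assuming the standard BERT configuration (WordPiece vocab 30522, 512 positions, 2 segment types) and counting the encoder plus the [CLS] pooler but not the pre-training heads; `bert_param_count` is a hypothetical helper written for this note:

```python
def bert_param_count(hidden, layers, intermediate,
                     vocab=30522, max_pos=512, type_vocab=2):
    """Count trainable parameters of a BERT encoder (sketch, standard config)."""
    ln = 2 * hidden  # LayerNorm gamma + beta
    # Token, position, and segment embeddings + embedding LayerNorm
    embeddings = (vocab + max_pos + type_vocab) * hidden + ln
    # Self-attention: Q, K, V, and output projections (weight + bias each)
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward: expand to `intermediate`, then project back to `hidden`
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    per_layer = attention + ffn + 2 * ln  # two LayerNorms per encoder layer
    pooler = hidden * hidden + hidden    # dense layer over the [CLS] token
    return embeddings + layers * per_layer + pooler

base = bert_param_count(hidden=768, layers=12, intermediate=3072)    # ≈ 109.5M
large = bert_param_count(hidden=1024, layers=24, intermediate=4096)  # ≈ 335.1M
```

The headline figures are rounded: this breakdown gives roughly 109.5M for Base and 335M for Large.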
Bibliography
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 2019. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv. DOI.