- tags: Transformers, BERT, NLP
- paper: (Liu et al. 2019)
Architecture
RoBERTa keeps the BERT architecture unchanged but pretrains it on more data for longer, with larger batches, dynamic masking, and without the next-sentence-prediction objective.
Parameter count
355M (RoBERTa-large; the base model has 125M)
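A minimal sketch for sanity-checking the figures above, assuming the Hugging Face `transformers` and `torch` packages are installed: it loads the pretrained `roberta-large` checkpoint, sums its parameters, and prints the config fields that mirror BERT-large's layout.

```python
from transformers import RobertaModel

# Load the pretrained RoBERTa-large encoder (no task head).
model = RobertaModel.from_pretrained("roberta-large")

# Count all parameters (embeddings + transformer layers).
n_params = sum(p.numel() for p in model.parameters())
print(f"roberta-large parameters: {n_params / 1e6:.0f}M")

# The architecture matches BERT-large: 24 layers, hidden size 1024, 16 heads.
print(model.config.num_hidden_layers,
      model.config.hidden_size,
      model.config.num_attention_heads)
```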
Bibliography
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. "RoBERTa: A Robustly Optimized BERT Pretraining Approach". arXiv. http://arxiv.org/abs/1907.11692.