RoBERTa

tags
Transformers, BERT, NLP
paper
(Liu et al. 2019)

Architecture

RoBERTa keeps BERT's architecture but retrains it with a better-tuned procedure: more data, larger batches, longer training, dynamic masking instead of BERT's static masking, a byte-level BPE vocabulary, and no next-sentence-prediction objective.
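One of the training changes the RoBERTa paper describes is dynamic masking: instead of fixing the masked positions once during preprocessing (as BERT did), the mask is resampled every time a sequence is fed to the model, so each epoch sees different masked tokens. A minimal sketch of the idea, using BERT's 80/10/10 replacement split and placeholder token IDs (the `MASK` id and vocabulary size here are made up for illustration):

```python
import random

MASK = 0            # placeholder [MASK] token id (illustrative)
VOCAB_SIZE = 1000   # placeholder vocabulary size (illustrative)
MASK_PROB = 0.15    # fraction of tokens selected for prediction

def dynamic_mask(tokens, rng):
    """Return a masked copy of `tokens`.

    The mask is resampled on every call, so repeated epochs over the
    same sequence see different masked positions (dynamic masking).
    """
    out = list(tokens)
    for i in range(len(out)):
        if rng.random() < MASK_PROB:
            r = rng.random()
            if r < 0.8:                       # 80%: replace with [MASK]
                out[i] = MASK
            elif r < 0.9:                     # 10%: replace with a random token
                out[i] = rng.randrange(1, VOCAB_SIZE)
            # remaining 10%: keep the original token

    return out

rng = random.Random(0)
seq = list(range(1, 21))
epoch1 = dynamic_mask(seq, rng)   # different masks on each call
epoch2 = dynamic_mask(seq, rng)
```

Because masking happens at batch time rather than preprocessing time, no fixed set of masked copies has to be stored, and the model never memorizes a particular mask pattern.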

Parameter count

356M (RoBERTa-large; RoBERTa-base has roughly 125M)
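The large-model count can be sanity-checked from the published configuration (24 layers, hidden size 1024, FFN size 4096, 50,265-token vocabulary, 514 position embeddings). A back-of-the-envelope sketch, counting only embeddings and transformer layers, lands within about 1M of the reported figure; the released checkpoint's extra heads (pooler, LM head) account for the remainder:

```python
# Approximate parameter count for RoBERTa-large from its config.
h, L, ffn, vocab, pos = 1024, 24, 4096, 50265, 514

embeddings = vocab * h + pos * h + h + 2 * h   # word + position + type embeddings + LayerNorm
attention  = 4 * (h * h + h)                   # Q, K, V, and output projections (weights + biases)
ffn_block  = (h * ffn + ffn) + (ffn * h + h)   # up-projection and down-projection
layer      = attention + ffn_block + 2 * 2 * h # plus two LayerNorms per layer

total = embeddings + L * layer
print(f"{total / 1e6:.0f}M")                   # ~354M before the pooler / LM head
```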

Bibliography

  1. Liu, Yinhan, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. "RoBERTa: A Robustly Optimized BERT Pretraining Approach". arXiv. http://arxiv.org/abs/1907.11692.
