- tags
- Transformers, NLP
- paper
- (Devlin et al. 2019)
Parameter count
- Base = 110M
- Large = 340M
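The parameter counts above can be reproduced from the architecture hyper-parameters in Devlin et al. (2019). A minimal sketch in Python, assuming the standard BERT configuration (WordPiece vocab 30522, 512 positions, 2 segment types) and counting the encoder plus the [CLS] pooler but not the pre-training heads; `bert_param_count` is a hypothetical helper written for this note:

```python
def bert_param_count(hidden, layers, intermediate,
                     vocab=30522, max_pos=512, type_vocab=2):
    """Count trainable parameters of a BERT encoder (sketch, standard config)."""
    ln = 2 * hidden  # LayerNorm gamma + beta
    # Token, position, and segment embeddings + embedding LayerNorm
    embeddings = (vocab + max_pos + type_vocab) * hidden + ln
    # Self-attention: Q, K, V, and output projections (weight + bias each)
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward: expand to `intermediate`, then project back to `hidden`
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    per_layer = attention + ffn + 2 * ln  # two LayerNorms per encoder layer
    pooler = hidden * hidden + hidden    # dense layer over the [CLS] token
    return embeddings + layers * per_layer + pooler

base = bert_param_count(hidden=768, layers=12, intermediate=3072)    # ≈ 109.5M
large = bert_param_count(hidden=1024, layers=24, intermediate=4096)  # ≈ 335.1M
```

The headline figures are rounded: this breakdown gives roughly 109.5M for Base and 335M for Large.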
Bibliography
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 2019. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv. DOI.