BERT

tags
Transformers, NLP
paper
(Devlin et al. 2019)

Parameter count

  • Base = 110M
  • Large = 340M
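
A quick way to sanity-check these counts, as a minimal sketch. It assumes the Hugging Face transformers package and its bert-base-uncased / bert-large-uncased checkpoints, neither of which is mentioned in the note:

    from transformers import BertModel

    # Load each checkpoint and sum the sizes of all weight tensors.
    for name in ("bert-base-uncased", "bert-large-uncased"):
        model = BertModel.from_pretrained(name)
        n_params = sum(p.numel() for p in model.parameters())
        print(f"{name}: {n_params / 1e6:.0f}M parameters")

The encoder with its embeddings should come out at roughly 109M and 335M; the 110M / 340M figures above are the rounded totals quoted in the paper.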

Bibliography

  1. Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805.