XLNet

tags
Transformers, Transformer-XL, NLP
paper
(Yang et al. 2019)

Architecture

XLNet adapts Transformer-XL into a permutation-based autoregressive language model: pretraining maximizes the expected log-likelihood of a sequence over permutations of the factorization order, so each token is predicted from bidirectional context while the objective stays autoregressive and avoids the [MASK] corruption used in BERT.
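
To make the objective concrete, here is a toy sketch (my own illustration, not the paper's code) of how a sampled factorization order induces an attention mask: a position may attend only to positions that precede it in the sampled order, regardless of where they sit left-to-right, and it never sees its own content, mirroring the query stream of XLNet's two-stream attention.

```python
import torch

def permutation_lm_mask(seq_len: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Sample a factorization order and the attention mask it induces."""
    perm = torch.randperm(seq_len)                # factorization order z
    rank = torch.empty(seq_len, dtype=torch.long)
    rank[perm] = torch.arange(seq_len)            # rank[i]: i's place in z
    # mask[i, j] is True when i may attend to j, i.e. j comes strictly
    # earlier in z; the diagonal stays False (no peeking at own content).
    mask = rank.unsqueeze(1) > rank.unsqueeze(0)
    return perm, mask

perm, mask = permutation_lm_mask(5)
print(perm)   # e.g. tensor([3, 0, 4, 1, 2])
print(mask)   # row i: which positions token i may attend to
```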

Parameter count

  • Base = 117M
  • Large = 360M

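These counts are easy to check empirically. A minimal sketch, assuming the Hugging Face transformers package and its xlnet-base-cased / xlnet-large-cased checkpoints (not something this note originally referenced):

```python
from transformers import XLNetModel

# Download each pretrained checkpoint and count its parameters.
for name in ("xlnet-base-cased", "xlnet-large-cased"):
    model = XLNetModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```
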
Bibliography

  1. Yang, Zhilin, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv. https://doi.org/10.48550/arXiv.1906.08237.