- tags
- Transformers, Transformer-XL, NLP
- paper
- (Yang et al. 2019)
Architecture
The model adapts Transformer-XL into a permutation-based autoregressive language model: training maximizes the expected log-likelihood of the sequence over permutations of the factorization order, so each token is predicted from bidirectional context while the model stays autoregressive (see the sketch below).
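As a rough illustration (not the paper's implementation), the sketch below builds the content-stream attention mask for one sampled factorization order: position `z[t]` may attend to every position predicted at or before step `t` in the permutation. The names (`seq_len`, `step_of`, `attend`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len = 6

# Sample one factorization order z: a random permutation of token positions.
z = rng.permutation(seq_len)

# step_of[i] = the step at which position i is predicted under order z.
step_of = np.empty(seq_len, dtype=int)
step_of[z] = np.arange(seq_len)

# Content-stream mask: query position i may attend to key position j
# iff j is predicted no later than i in the factorization order
# (the content stream also sees its own token, hence <=; the query
# stream would use a strict < instead).
attend = step_of[None, :] <= step_of[:, None]  # bool, shape (seq_len, seq_len)

print("order z:", z)
print(attend.astype(int))
```

Averaged over many sampled orders, every position gets to condition on context from both sides, which is how the objective recovers bidirectional context without masking tokens out of the input.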
Parameter count
- Base = 117M
- Large = 360M
Bibliography
- Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. 2019. "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv.