- tags
- Transformers, NLP
- paper
- (Dai et al. 2019)
Architecture
This model uses relative positional embeddings to enable attention over longer contexts than the vanilla Transformer.
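Below is a minimal sketch (not the released implementation) of how the relative-attention score in Dai et al. 2019 decomposes into content-content, content-position, and two global bias terms. The function name `rel_attention_scores` and the assumption that `rel_emb` is already key-projected are illustrative choices, not names from the paper or its code.

```python
import numpy as np

def rel_attention_scores(q, k, rel_emb, u, v):
    """Pre-softmax attention scores with relative positional terms (sketch).

    q, k:    (seq_len, d) query/key projections of the content embeddings.
    rel_emb: (2*seq_len - 1, d) key-projected relative position embeddings,
             ordered from offset (seq_len - 1) down to -(seq_len - 1).
    u, v:    (d,) learned global biases replacing the absolute-position query terms.
    Returns a (seq_len, seq_len) score matrix (masking/softmax omitted).
    """
    seq_len, d = q.shape
    scores = np.empty((seq_len, seq_len))
    for i in range(seq_len):
        for j in range(seq_len):
            r = rel_emb[(seq_len - 1) - (i - j)]  # embedding for offset i - j
            scores[i, j] = (q[i] @ k[j]           # (a) content-content
                            + q[i] @ r            # (b) content-position
                            + u @ k[j]            # (c) global content bias
                            + v @ r)              # (d) global position bias
    return scores / np.sqrt(d)
```

The real model computes the position term efficiently with a "relative shift" trick rather than this explicit double loop, and restricts attention to the current segment plus cached memory; the loop above only illustrates the score decomposition.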
Parameter count
151M
Bibliography
- Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. 2019. "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context". arXiv.