- tags
- Transformers, NLP
- paper
- (Dai et al. 2019)
Architecture
This model uses relative positional embeddings to enable attention over longer contexts than the vanilla Transformer.
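Below is a minimal sketch (not the released implementation) of how the relative-attention score in Dai et al. 2019 decomposes into content-content, content-position, and two global bias terms. The function name `rel_attention_scores` and the assumption that `rel_emb` is already key-projected are illustrative choices, not names from the paper or its code.

```python
import numpy as np

def rel_attention_scores(q, k, rel_emb, u, v):
    """Pre-softmax attention scores with relative positional terms (sketch).

    q, k:    (seq_len, d) query/key projections of the content embeddings.
    rel_emb: (2*seq_len - 1, d) key-projected relative position embeddings,
             ordered from offset (seq_len - 1) down to -(seq_len - 1).
    u, v:    (d,) learned global biases replacing the absolute-position query terms.
    Returns a (seq_len, seq_len) score matrix (masking/softmax omitted).
    """
    seq_len, d = q.shape
    scores = np.empty((seq_len, seq_len))
    for i in range(seq_len):
        for j in range(seq_len):
            r = rel_emb[(seq_len - 1) - (i - j)]  # embedding for offset i - j
            scores[i, j] = (q[i] @ k[j]           # (a) content-content
                            + q[i] @ r            # (b) content-position
                            + u @ k[j]            # (c) global content bias
                            + v @ r)              # (d) global position bias
    return scores / np.sqrt(d)
```

The real model computes the position term efficiently with a "relative shift" trick rather than this explicit double loop, and restricts attention to the current segment plus cached memory; the loop above only illustrates the score decomposition.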
Parameter count
151M
Bibliography
- Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. 2019. "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context". arXiv.