Transformer-XL

tags: Transformers, NLP
paper: (Dai et al. 2019)

Architecture

This model combines a segment-level recurrence mechanism (hidden states from previous segments are cached and reused as memory) with relative positional embeddings, which lets attention reach over contexts longer than the fixed-length window of the vanilla Transformer.
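The sketch below is a minimal, single-head PyTorch illustration of how those two pieces fit together; names such as rel_shift and rel_attention are illustrative only, the learned projection matrices are omitted, and random vectors stand in for the sinusoidal relative encodings.

```python
# Minimal single-head sketch of Transformer-XL-style attention:
# keys/values span [cached memory ; current segment], and scores use the
# paper's relative-position decomposition with global biases u and v.
# Names are illustrative; projections W_q, W_k, W_v, W_{k,R} are omitted.
import math
import torch
import torch.nn.functional as F

def rel_shift(x):
    """Shift trick from the paper: realign the (q_len, k_len) position-term
    matrix so that entry (i, j) corresponds to relative distance i - j."""
    q_len, k_len = x.shape
    zero_pad = x.new_zeros(q_len, 1)
    x_padded = torch.cat([zero_pad, x], dim=1)   # (q_len, k_len + 1)
    x_padded = x_padded.view(k_len + 1, q_len)
    return x_padded[1:].reshape(q_len, k_len)

def rel_attention(q, k, v, r, u, v_bias):
    """q: (q_len, d) current-segment queries; k, v: (k_len, d) over
    [memory ; current segment]; r: (k_len, d) relative encodings for
    distances k_len-1 ... 0; u, v_bias: (d,) global content/position biases."""
    d = q.size(-1)
    ac = (q + u) @ k.t()                    # content terms (A + C in the paper)
    bd = rel_shift((q + v_bias) @ r.t())    # position terms (B + D), realigned
    scores = (ac + bd) / math.sqrt(d)
    # Causal mask: query i sees all of memory and current positions <= i.
    q_len, k_len = scores.shape
    mask = torch.triu(torch.ones(q_len, k_len, dtype=torch.bool),
                      diagonal=k_len - q_len + 1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Segment-level recurrence: hidden states of the previous segment are cached
# (gradient detached) and prepended as memory for the next segment.
d, q_len, mem_len = 16, 4, 4
h_prev = torch.randn(mem_len, d)                  # cached previous segment
h_curr = torch.randn(q_len, d)                    # current segment
kv = torch.cat([h_prev.detach(), h_curr], dim=0)  # (k_len, d)
r = torch.randn(kv.size(0), d)                    # stand-in for sinusoidal encodings
u = torch.zeros(d)
v_bias = torch.zeros(d)
out = rel_attention(h_curr, kv, kv, r, u, v_bias)
print(out.shape)                                  # torch.Size([4, 16])
```

In the full model each layer keeps its own memory and the relative encodings are sinusoidal functions of distance; the rel_shift trick avoids materialising a separate encoding for every (i, j) pair.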

Parameter count

151M

Bibliography

  1. . . "Transformer-xl: Attentive Language Models Beyond a Fixed-length Context". arXiv. DOI.
