# T5

tags: Transformers, NLP
paper: (Raffel et al. 2020)

## Architecture

T5 keeps the original encoder-decoder Transformer architecture but replaces the fixed sinusoidal position signal with a simplified relative positional embedding (similar in spirit to Transformer-XL): a learned scalar bias per relative-distance bucket and per attention head, added to the attention logits and shared across layers.
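A minimal NumPy sketch of that bias, following the bucketing scheme from the paper's reference implementation (the default sizes `num_buckets=32` and `max_distance=128` are assumptions taken from that implementation, not from this note): nearby offsets get their own exact buckets, larger offsets share logarithmically spaced ones, and the learned table is indexed per (query, key) pair.

```python
import numpy as np

def relative_position_bucket(rel_pos, num_buckets=32, max_distance=128):
    """Map signed relative positions (key_pos - query_pos) to bucket indices,
    T5-style: exact buckets for small offsets, log-spaced buckets farther out."""
    num_buckets //= 2                       # split buckets between the two directions
    ret = (rel_pos > 0).astype(np.int64) * num_buckets
    n = np.abs(rel_pos)
    max_exact = num_buckets // 2
    is_small = n < max_exact
    # Logarithmically spaced buckets up to max_distance for larger offsets
    # (np.maximum guards against log(0); the small branch is chosen there anyway).
    scaled = np.log(np.maximum(n, 1) / max_exact) / np.log(max_distance / max_exact)
    large = max_exact + (scaled * (num_buckets - max_exact)).astype(np.int64)
    large = np.minimum(large, num_buckets - 1)
    return ret + np.where(is_small, n, large)

def relative_attention_bias(q_len, k_len, bias_table):
    """bias_table: learned (num_buckets, num_heads) array. Returns per-head
    scalar biases of shape (num_heads, q_len, k_len) to add to attention logits."""
    ctx = np.arange(q_len)[:, None]         # query positions
    mem = np.arange(k_len)[None, :]         # key positions
    buckets = relative_position_bucket(mem - ctx)
    return bias_table[buckets].transpose(2, 0, 1)
```

Because the bias depends only on the relative-distance bucket, the same small table covers any sequence length, which is what makes the scheme cheap to share across all layers.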

The largest variant, T5-11B, has 11 billion parameters.

## Bibliography

1. Raffel, Colin, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". arXiv:1910.10683. DOI.