T5

tags: Transformers, NLP
paper: (Raffel et al. 2020)

Architecture

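T5 uses essentially the same encoder-decoder architecture as the original Transformer. The main change is that the absolute position encodings are replaced by relative position embeddings (similar to Transformer-XL): a learned scalar, indexed by the offset between query and key positions, is added to each attention logit.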
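To make the relative position idea concrete, here is a minimal NumPy sketch of attention with a per-head scalar bias indexed by relative offset. It is an illustration under assumptions, not the T5 implementation: the function names, the `bias_table` layout and `max_distance` are hypothetical, and offsets are simply clipped to a fixed window, whereas the paper groups offsets into buckets.

```python
import numpy as np

def relative_position_bias(q_len, k_len, bias_table, max_distance):
    """Look up one scalar bias per (head, query position, key position).

    bias_table has shape (num_heads, 2 * max_distance + 1); in a real
    model it would be a learned parameter (hypothetical layout).
    """
    q_pos = np.arange(q_len)[:, None]
    k_pos = np.arange(k_len)[None, :]
    rel = k_pos - q_pos  # signed offset, shape (q_len, k_len)
    # Assumption: clip far-away offsets instead of T5's bucketing scheme.
    idx = np.clip(rel, -max_distance, max_distance) + max_distance
    return bias_table[:, idx]  # shape (num_heads, q_len, k_len)

def attention_with_relative_bias(q, k, v, bias_table, max_distance):
    """q, k, v have shape (num_heads, seq_len, d_head)."""
    logits = q @ k.transpose(0, 2, 1)  # (num_heads, q_len, k_len)
    # The position information enters only here, as an additive bias.
    logits = logits + relative_position_bias(
        q.shape[1], k.shape[1], bias_table, max_distance
    )
    # Numerically stable softmax over the key axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
heads, seq_len, d_head, max_distance = 2, 5, 4, 2
q = rng.normal(size=(heads, seq_len, d_head))
k = rng.normal(size=(heads, seq_len, d_head))
v = rng.normal(size=(heads, seq_len, d_head))
bias_table = rng.normal(size=(heads, 2 * max_distance + 1))
out = attention_with_relative_bias(q, k, v, bias_table, max_distance)
```

Because the bias depends only on the offset between positions, two query/key pairs that are the same distance apart receive the same bias wherever they sit in the sequence.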

Parameter count

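11B for the largest variant. The paper also trains smaller versions: Small (60M), Base (220M), Large (770M) and 3B.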
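The 11B figure can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is an estimate under stated assumptions, not an exact count: the hyperparameters (d_model = 1024, d_ff = 65536, 128 heads of size 128, 24 layers each for encoder and decoder, a vocabulary of 32128 entries) are taken from the commonly published configuration of the largest model rather than from this note. It also assumes bias-free linear layers and a single shared embedding matrix, and ignores the small layer-norm and position-bias parameters. The function name is hypothetical.

```python
def t5_param_estimate(d_model, d_ff, num_heads, d_kv, num_layers, vocab_size):
    """Rough parameter count for an encoder-decoder Transformer."""
    inner = num_heads * d_kv
    # Q, K, V and output projections, each d_model x inner, no biases.
    attention = 4 * d_model * inner
    # Two feed-forward projections: d_model -> d_ff -> d_model.
    feed_forward = 2 * d_model * d_ff
    # Encoder layer: self-attention + feed-forward.
    encoder = num_layers * (attention + feed_forward)
    # Decoder layer: self-attention + cross-attention + feed-forward.
    decoder = num_layers * (2 * attention + feed_forward)
    # Assumption: one embedding matrix shared by input and output.
    embedding = vocab_size * d_model
    return encoder + decoder + embedding

total = t5_param_estimate(
    d_model=1024, d_ff=65536, num_heads=128, d_kv=128,
    num_layers=24, vocab_size=32128,
)
print(f"{total / 1e9:.2f}B")  # prints 11.31B
```

Most of the parameters sit in the feed-forward and attention projections; the embedding matrix contributes only about 33M.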

Bibliography

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. July 28, 2020. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". arXiv.

Last changed 27/07/2022 | authored by Hugo Cisneros