- tags
- Transformers, NLP
- paper
- (Raffel et al. 2020)
Architecture
The architecture is essentially the original encoder-decoder Transformer, with the absolute (sinusoidal) position encodings replaced by a simplified form of relative positional embeddings (similar in spirit to Transformer-XL); a sketch of this mechanism is given below.
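The relative embedding here is a learned scalar per relative-distance bucket and per attention head, added to the attention logits before the softmax. Below is a minimal PyTorch sketch of that bucketed bias, loosely following the shape of the Hugging Face T5 implementation; the function names, the bias helper, and the default bucket/head counts are illustrative assumptions, not the paper's exact code.

```python
import math
import torch

def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    # Map signed relative positions (key_pos - query_pos) to a small set of buckets:
    # exact buckets for short distances, log-spaced buckets for longer ones.
    # Bidirectional case (encoder self-attention): half the buckets per sign.
    num_buckets //= 2
    bucket = (relative_position > 0).long() * num_buckets
    rel = relative_position.abs()

    max_exact = num_buckets // 2
    is_small = rel < max_exact
    # Log-spaced bucket index for larger distances (clamp avoids log(0)).
    large = max_exact + (
        torch.log(rel.float().clamp(min=1) / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).long()
    large = large.clamp(max=num_buckets - 1)
    return bucket + torch.where(is_small, rel, large)

def relative_attention_bias(query_len, key_len, num_heads=8, num_buckets=32):
    # One learned scalar per (bucket, head), added to attention logits before softmax;
    # this replaces absolute position encodings.
    embedding = torch.nn.Embedding(num_buckets, num_heads)
    context_pos = torch.arange(query_len)[:, None]
    memory_pos = torch.arange(key_len)[None, :]
    buckets = relative_position_bucket(memory_pos - context_pos, num_buckets)
    # (query_len, key_len, num_heads) -> (1, num_heads, query_len, key_len)
    return embedding(buckets).permute(2, 0, 1).unsqueeze(0)

# Usage sketch: scores has shape (batch, num_heads, query_len, key_len)
# attn_weights = torch.softmax(scores + relative_attention_bias(q_len, k_len), dim=-1)
```

The bias depends only on relative distance, so the same table is shared across all layers' positions and can be reused for any sequence length at inference time.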
Parameter count
11B (largest released variant)
Bibliography
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. 2020. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". arXiv.