It is essentially the original Transformer architecture, with relative positional embeddings added (similar to Transformer-XL).
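To illustrate what a relative positional term can look like, here is a minimal NumPy sketch in which a learned scalar bias, indexed by the clipped offset between query and key positions, is added to the attention logits. This is a simplified illustration under stated assumptions, not the model's exact scheme: the names `relative_position_bias`, `attention_with_relative_bias`, `bias_table`, and `max_distance` are hypothetical, the attention is single-head with standard scaling, and offsets are simply clipped rather than grouped into buckets.

```python
import numpy as np


def relative_position_bias(q_len, k_len, bias_table, max_distance):
    """Return a (q_len, k_len) matrix of biases looked up by clipped relative offset."""
    # offsets[i, j] = j - i: signed distance from query position i to key position j.
    offsets = np.arange(k_len)[None, :] - np.arange(q_len)[:, None]
    # Clip so that all far-away positions share one bias, then shift into [0, 2 * max_distance].
    indices = np.clip(offsets, -max_distance, max_distance) + max_distance
    return bias_table[indices]


def attention_with_relative_bias(q, k, v, bias_table, max_distance):
    """Single-head scaled dot-product attention with an additive relative position bias."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    # The positional information enters only here, as a function of (j - i).
    logits = logits + relative_position_bias(q.shape[0], k.shape[0], bias_table, max_distance)
    # Numerically stable softmax over the key axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Toy usage with random inputs (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
seq_len, d_model, max_distance = 6, 8, 3
q = rng.normal(size=(seq_len, d_model))
k = rng.normal(size=(seq_len, d_model))
v = rng.normal(size=(seq_len, d_model))
bias_table = rng.normal(size=(2 * max_distance + 1,))  # one scalar per clipped offset
out = attention_with_relative_bias(q, k, v, bias_table, max_distance)
```

Because the bias depends only on the offset between positions, the same table applies at every absolute position, which is the main difference from adding absolute position vectors to the input embeddings.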
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". arXiv.