- tags: Transformers, T5, NLP
- paper: (Fedus et al. 2022)
Architecture
This model scales the parameter count of a T5-style architecture by replacing dense feed-forward layers with sparse mixture-of-experts layers: a lightweight router sends each token to exactly one expert (top-1 "switch" routing), so total parameters grow with the number of experts while per-token compute stays roughly constant (see the sketch below).
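As a concrete illustration, here is a minimal sketch of a top-1 switch routing layer in PyTorch. The class name `SwitchFFN` and all hyperparameters are hypothetical, not taken from the paper's codebase; the paper's actual implementation uses Mesh-TensorFlow and additionally applies an expert capacity limit and a load-balancing auxiliary loss, both omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Sketch of a switch (top-1 mixture-of-experts) feed-forward layer.

    Parameter count grows with num_experts, but each token is processed
    by a single expert, so per-token compute stays near that of one
    dense FFN. Hypothetical class, not the paper's implementation.
    """

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # routing logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); flatten batch/sequence dims before calling.
        probs = F.softmax(self.router(x), dim=-1)   # (tokens, num_experts)
        gate, expert_idx = probs.max(dim=-1)        # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale by the gate probability so the router gets gradient.
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: 8 experts give ~8x the FFN parameters of a dense layer,
# yet each token still runs through only one expert.
layer = SwitchFFN(d_model=64, d_ff=256, num_experts=8)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```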
Parameter count
1.6T (Switch-C, the largest reported variant)
Bibliography
- William Fedus, Barret Zoph, Noam Shazeer. 2022. "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity". arXiv:2101.03961. DOI: 10.48550/arXiv.2101.03961.