Switch Transformer

tags
Transformers, T5, NLP
paper
(Fedus et al. 2022)

Architecture

This model increases the parameter count of a T5-like architecture by replacing the dense feed-forward (FFN) layer in each Transformer block with a sparse Mixture-of-Experts layer: a learned router sends each token to exactly one expert FFN (top-1, or "switch", routing), so total parameters grow with the number of experts while the compute per token stays roughly constant.
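
A minimal sketch of the top-1 ("switch") routing idea in plain Python/NumPy; the toy dimensions and the `switch_ffn` name are illustrative assumptions, not the paper's implementation (which also adds an auxiliary load-balancing loss and a per-expert capacity limit):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 64, 256, 4

# One FFN (an "expert") per routing slot; a dense T5 layer would have
# a single FFN shared by every token instead.
W_in = rng.normal(0, 0.02, (n_experts, d_model, d_ff))
W_out = rng.normal(0, 0.02, (n_experts, d_ff, d_model))
W_router = rng.normal(0, 0.02, (d_model, n_experts))

def switch_ffn(x):
    """Top-1 (switch) routing: each token is sent to exactly one expert."""
    logits = x @ W_router                       # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)       # softmax over experts
    expert = probs.argmax(-1)                   # top-1 expert per token
    gate = probs[np.arange(len(x)), expert]     # router prob scales the output

    y = np.zeros_like(x)
    for e in range(n_experts):
        idx = np.where(expert == e)[0]          # tokens routed to expert e
        if idx.size:
            h = np.maximum(x[idx] @ W_in[e], 0.0)   # ReLU FFN
            y[idx] = gate[idx, None] * (h @ W_out[e])
    return y

tokens = rng.normal(size=(8, d_model))
print(switch_ffn(tokens).shape)  # (8, 64)
```

Because each token activates only one expert, the FLOPs per token match a dense FFN of the same width even as the number of experts, and hence the parameter count, grows.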

Parameter count

Up to 1.6T in the largest configuration (Switch-C).

Bibliography

  1. . . "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity". arXiv. DOI.