Mixture of Experts

tags
Transformers, LLM, Machine learning, Scaling laws

A sparse neural network architecture in which a learned router sends each input to a small subset of expert subnetworks, so total parameter count can be scaled up without a proportional increase in per-token compute.
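
Below is a minimal sketch of a top-k routed MoE feed-forward layer in PyTorch. The class and parameter names (`TopKMoE`, `num_experts`, `top_k`) are illustrative assumptions, not from any specific library; real systems add load-balancing auxiliary losses, capacity limits, and expert parallelism on top of this basic routing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sketch of a top-k mixture-of-experts feed-forward layer."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert for each token.
        self.router = nn.Linear(d_model, num_experts)
        # Experts: independent feed-forward subnetworks.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to a list of tokens.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                        # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep only top-k experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize over the chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Which tokens routed to expert e, and in which of their k slots.
            token_idx, slot_idx = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            # Each selected token runs through only this expert: sparse compute.
            out[token_idx] += weights[token_idx, slot_idx].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)

# Usage: only top_k of the num_experts expert FFNs run per token,
# so per-token compute scales with top_k while total parameters scale with num_experts.
layer = TopKMoE(d_model=512, d_hidden=2048, num_experts=8, top_k=2)
y = layer(torch.randn(2, 16, 512))
```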
