- tags
- Transformers, LLM, Machine learning, Scaling laws
Sparse neural network architecture that routes inputs to a subset of expert subnetworks, enabling parameter scaling without proportional compute increase.
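The routing idea can be sketched minimally: a learned router scores every expert per token, only the top-k experts run, and their outputs are combined by the renormalized gate weights. This is an illustrative numpy-only sketch, not any specific library's implementation; the names (`W_router`, `moe_layer`) and the use of plain linear maps as "experts" (real experts are usually MLPs) are assumptions for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration.
d_model, n_experts, top_k = 8, 4, 2

# Router: a linear layer scoring each expert for each token.
W_router = rng.normal(size=(d_model, n_experts))
# Each "expert" here is just a linear map; real experts are small MLPs.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x):
    """Route each token to its top-k experts; mix outputs by gate weight."""
    logits = x @ W_router                           # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = softmax(logits[t, topk[t]])         # renormalize over selected experts
        for g, e in zip(gates, topk[t]):
            out[t] += g * (x[t] @ experts[e])       # only k of n_experts run per token
    return out

tokens = rng.normal(size=(3, d_model))
y = moe_layer(tokens)
print(y.shape)  # (3, 8)
```

Because only `top_k` of `n_experts` experts execute per token, total parameters grow with `n_experts` while per-token compute stays roughly constant, which is the scaling property the definition above describes.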