Scaling laws
Scaling laws describe how model performance changes with model size, training data, and compute. They guide how the largest models are trained and scaled.
Links to this note
- Knowledge Base Index
- Mixture of Experts
- Notes on: Attention Residuals by Kimi Team, Guangyu Chen, Yu Zhang, Jianlin Su et al. (2026)
- Notes on: MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention by MiniMax (2025)
- Notes on: Residual Matrix Transformers: Scaling the Size of the Residual Stream by Brian Mak, Jeffrey Flanigan (2025)
- Notes on: V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning by Lorenzo Mur-Labadia, Matthew Muckley, Amir Bar, Mido Assran, Koustuv Sinha, Mike Rabbat, Yann LeCun, Nicolas Ballas, Adrien Bardes (2026)
Last changed | authored by Hugo Cisneros