Scaling laws

tags: Machine learning, LLM, The Scaling Hypothesis

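Scaling laws are empirical relationships that describe how a model's loss changes with parameter count, dataset size, and training compute. They inform how the largest models are sized and trained.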
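
As an illustration, a scaling law is often written as a power law such as L(N) = a * N^(-b), where N is the parameter count and L the loss. The sketch below is not taken from this note: it fits that form by linear regression in log-log space and extrapolates to a larger model. The function name `fit_power_law`, the constants, and the data points are all made up for the example, and the data is synthetic and noise-free.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-b), done in log-log space.

    Taking logs gives log(loss) = log(a) - b * log(size), a straight line,
    so ordinary linear regression recovers a and b.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic data generated from an assumed law (illustrative values only).
TRUE_A, TRUE_B = 10.0, 0.1
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [TRUE_A * n ** (-TRUE_B) for n in sizes]

a, b = fit_power_law(sizes, losses)

# Extrapolate the fitted law to a model ten times larger than any in the data.
predicted = a * 1e10 ** (-b)
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e10 params={predicted:.3f}")
```

With real measurements the points are noisy and the fitted exponent depends on which range of model sizes is included, so an extrapolation like this is a rough guide at best.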

Links to this note

Knowledge Base Index
Mixture of Experts
Notes on: Attention Residuals by Kimi Team, Guangyu Chen, Yu Zhang, Jianlin Su et al. (2026)
Notes on: MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention by MiniMax (2025)
Notes on: Residual Matrix Transformers: Scaling the Size of the Residual Stream by Brian Mak, Jeffrey Flanigan (2025)
Notes on: V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning by Lorenzo Mur-Labadia, Matthew Muckley, Amir Bar, Mido Assran, Koustuv Sinha, Mike Rabbat, Yann LeCun, Nicolas Ballas, Adrien Bardes (2026)

Last changed 2026.04.07 | authored by Hugo Cisneros
