Mixture of Experts

Tags: Transformers, LLM, Machine learning, Scaling laws

Links to this note:
Notes on: MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention by MiniMax (2025)
Switch transformer

Last changed 2026.04.08 | authored by Hugo Cisneros