Chain-of-Thought reasoning

tags: LLM, Test-time compute, Reinforcement learning, Token-level credit assignment in reasoning traces

A prompting and training paradigm in which models emit intermediate reasoning steps before a final answer. This improves multi-step problem solving and enables reinforcement learning (RL) on verifiable outcomes.
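As an illustration of "RL on verifiable outcomes", here is a minimal sketch of a binary outcome reward computed from a reasoning trace. It assumes a hypothetical output convention where the trace ends with a line `Final answer: <value>`. The helpers `extract_final_answer`, `normalize`, and `outcome_reward` are illustrative names, not any particular system's API. Only the final answer is checked. The intermediate steps receive no direct score, which is why assigning credit to individual reasoning tokens is treated as a separate problem.

```python
import re
from typing import Optional

# Hypothetical convention (assumption): the model ends its reasoning trace with
# a line "Final answer: <value>". Real systems use their own answer formats.
_ANSWER_RE = re.compile(r"Final answer:\s*(.+)", re.IGNORECASE)


def extract_final_answer(trace: str) -> Optional[str]:
    """Return the text after the last 'Final answer:' marker, or None."""
    matches = _ANSWER_RE.findall(trace)
    if not matches:
        return None
    return matches[-1].strip()


def normalize(answer: str) -> str:
    """Light normalization so that '42.', ' 42 ' and '42' compare equal."""
    return answer.strip().rstrip(".").replace(",", "").lower()


def outcome_reward(trace: str, gold: str) -> float:
    """Binary verifiable reward: 1.0 if the final answer matches gold, else 0.0.

    The intermediate reasoning steps are not scored; the single scalar applies
    to the whole trace.
    """
    predicted = extract_final_answer(trace)
    if predicted is None:
        return 0.0  # traces without a parseable answer get no reward
    return 1.0 if normalize(predicted) == normalize(gold) else 0.0


trace = "Step 1: 6 * 7 = 42.\nFinal answer: 42."
print(outcome_reward(trace, "42"))  # 1.0
print(outcome_reward("Step 1: guess.\nFinal answer: 41", "42"))  # 0.0
```

Because the reward depends only on a checkable final answer, it can be computed automatically for math or code tasks without a learned reward model. The trade-off is that the signal is sparse: every token in the trace shares the same scalar.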

Links to this note

Knowledge Base Index
Notes on: MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention by MiniMax (2025)

Last changed 2026.04.19 | authored by Hugo Cisneros
