Chain-of-Thought reasoning

tags
LLM, Test-time compute, Reinforcement learning, Token-level credit assignment in reasoning traces

Prompting and training paradigm where models emit intermediate reasoning steps before a final answer, improving multi-step problem solving and enabling RL on verifiable outcomes.

Last changed | authored by

Comments

Loading comments...

Leave a comment

Back to Notes