- tags
- LLM, Test-time compute, Reinforcement learning, Token-level credit assignment in reasoning traces
Prompting and training paradigm where models emit intermediate reasoning steps before a final answer, improving multi-step problem solving and enabling RL on verifiable outcomes.
Loading comments...