Reward shaping

tags: Reinforcement learning, Reinforcement learning with verifiable rewards, GRPO

Patterns for designing RL rewards that produce intended behaviors (e.g., conditional bonuses that trigger only on correct outcomes).
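
As a minimal sketch of the conditional-bonus pattern mentioned above: the bonus (here, for brevity) is paid only when the outcome is verified correct, so a wrong answer can never outscore a correct one by gaming the secondary signal. The function name `shaped_reward`, the choice of brevity as the bonus, and all constants are illustrative assumptions, not taken from the note.

```python
def shaped_reward(
    is_correct: bool,
    num_tokens: int,
    max_tokens: int = 512,       # hypothetical length budget
    correct_reward: float = 1.0,  # hypothetical base reward for a correct outcome
    max_bonus: float = 0.2,       # hypothetical cap on the brevity bonus
) -> float:
    """Reward with a brevity bonus that triggers only on correct outcomes."""
    if not is_correct:
        # No bonus on incorrect outcomes: being short but wrong earns nothing.
        return 0.0
    # Fraction of the length budget left unused, clipped to [0, 1].
    frac_unused = min(1.0, max(0.0, 1.0 - num_tokens / max_tokens))
    return correct_reward + max_bonus * frac_unused
```

Under these assumptions every correct response scores at least `correct_reward` and every incorrect one scores zero, so the bonus only ranks correct responses against each other.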

Last changed 2026.04.19 | authored by Hugo Cisneros
