Reward shaping

tags
Reinforcement learning, Reinforcement learning with verifiable rewards, GRPO

Patterns for designing RL rewards that produce intended behaviors (e.g., conditional bonuses that trigger only on correct outcomes).
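
As an illustration of the conditional-bonus pattern, here is a minimal sketch in Python. It is not taken from any particular training setup: the function name `shaped_reward`, the brevity bonus, and the parameters `max_tokens` and `bonus_weight` are all hypothetical choices made for the example. The idea is that the bonus term is added only when the outcome is verified correct, so a policy cannot collect it by producing short but wrong answers.

```python
def shaped_reward(
    is_correct: bool,
    response_tokens: int,
    max_tokens: int = 512,      # hypothetical length budget
    bonus_weight: float = 0.2,  # hypothetical cap on the bonus
) -> float:
    """Correctness reward plus a brevity bonus gated on correctness."""
    if not is_correct:
        # Wrong answers get no bonus at all, regardless of length.
        return 0.0

    # Bonus falls linearly from bonus_weight (zero tokens)
    # to 0 (at or beyond max_tokens).
    brevity = max(0.0, 1.0 - response_tokens / max_tokens)
    return 1.0 + bonus_weight * brevity
```

Because the bonus is capped at `bonus_weight` and gated on correctness, every correct response scores at least 1.0 and every incorrect one scores 0.0, so the shaping term only reorders responses that are already correct.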
