Reward shaping

tags
Reinforcement learning, Reinforcement learning with verifiable rewards, GRPO

Patterns for designing RL rewards that produce intended behaviors (e.g., conditional bonuses that trigger only on correct outcomes).
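A minimal sketch of one such conditional bonus, assuming a verifiable correctness check and a brevity bonus as the shaped term. The function name, the brevity bonus, and all parameter values are hypothetical illustrations, not taken from the note itself.

```python
def shaped_reward(
    is_correct: bool,
    num_tokens: int,
    token_budget: int = 512,     # assumed budget, for illustration only
    base_reward: float = 1.0,    # assumed reward for a correct outcome
    brevity_bonus: float = 0.2,  # assumed maximum size of the bonus
) -> float:
    """Return a reward whose brevity bonus applies only to correct outcomes."""
    if not is_correct:
        # The bonus is gated on correctness, so a short wrong answer
        # can never score higher than any correct answer.
        return 0.0
    # Fraction of the budget left unused, clamped at 0 for over-budget responses.
    saved_fraction = max(0.0, 1.0 - num_tokens / token_budget)
    return base_reward + brevity_bonus * saved_fraction


if __name__ == "__main__":
    print(shaped_reward(is_correct=True, num_tokens=256))
    print(shaped_reward(is_correct=False, num_tokens=10))
```

Gating the bonus on correctness keeps the shaped term from being optimized on its own: an unconditional brevity bonus would also reward short incorrect responses.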

