Reward hacking

Tags: Reinforcement learning, Reinforcement learning with verifiable rewards, GRPO

Pathologies where agents exploit literal reward structure (e.g., spamming tool use without accuracy gains).

Links to this note: Knowledge Base Index

Last changed 2026.04.19 | authored by Hugo Cisneros