Agentic reinforcement learning

tags: Reinforcement learning, Reinforcement learning with verifiable rewards, GRPO, Tool calling

Reinforcement learning (RL) post-training for LLM/VLM agents that decide when and how to invoke tools during reasoning, with rewards shaped around tool-use policy (necessity, efficiency, trajectory geometry) rather than final-answer correctness alone.
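A minimal sketch of how tool-use reward shaping might feed a GRPO-style update, assuming a per-trajectory reward that adds a necessity term and a per-call efficiency cost to a verifiable correctness reward, followed by group-relative advantage normalization. The weights, the `tool_needed` label, and the names `Trajectory`, `shaped_reward`, and `group_relative_advantages` are hypothetical; trajectory-geometry terms are omitted, and this is not the reward used by any paper linked below.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    answer_correct: bool  # verifiable outcome check on the final answer
    num_tool_calls: int   # how many times the agent invoked a tool
    tool_needed: bool     # hypothetical label: does the task actually require a tool?


def shaped_reward(t: Trajectory, w_acc: float = 1.0, w_need: float = 0.2,
                  w_cost: float = 0.05, max_calls: int = 5) -> float:
    """Hypothetical shaped reward: correctness + necessity - efficiency cost."""
    r = w_acc * float(t.answer_correct)
    # Necessity: bonus when tool use matches whether a tool was needed, penalty otherwise.
    used_tool = t.num_tool_calls > 0
    r += w_need if used_tool == t.tool_needed else -w_need
    # Efficiency: small per-call cost, capped so long rollouts are not over-penalized.
    r -= w_cost * min(t.num_tool_calls, max_calls)
    return r


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One group of rollouts sampled for the same prompt.
    group = [
        Trajectory(True, 1, True),   # correct, one necessary call
        Trajectory(True, 0, True),   # correct but skipped a needed tool
        Trajectory(False, 3, True),  # wrong answer despite tool use
        Trajectory(True, 6, True),   # correct but over-calls the tool
    ]
    rewards = [shaped_reward(t) for t in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because advantages are computed relative to the group, the shaping terms only matter through how they reorder rollouts of the same prompt: among correct answers, the one that used a needed tool once ends up ahead of both the one that skipped it and the one that over-called it.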

Links to this note

Notes on: DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning by Ziwei Zheng, Michael Yang, Jack Hong, Chenxiao Zhao, Guohai Xu, Le Yang, Chao Shen, Xing Yu (2025)
Coding agent
Geospatial AI
Knowledge Base Index
Notes on: GeoEyes: On-Demand Visual Focusing for Evidence-Grounded Understanding of Ultra-High-Resolution Remote Sensing Imagery by Fengxiang Wang, Mingshuo Chen, Yueying Li, Yajie Yang, Yifan Zhang, Long Lan, Xue Yang, Hongda Sun, Yulin Wang, Di Wang, Jun Song, Jing Zhang, Bo Du (2026)

Last changed 2026.04.19 | authored by Hugo Cisneros
