Task of answering natural-language questions grounded in image content, spanning global scene understanding to fine-grained perception and compositional reasoning
Visual question answering
Links to this note
- Notes on: DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning by Ziwei Zheng, Michael Yang, Jack Hong, Chenxiao Zhao, Guohai Xu, Le Yang, Chao Shen, Xing Yu (2025)
- Geospatial AI
- Knowledge Base Index
- Notes on: GeoEyes: On-Demand Visual Focusing for Evidence-Grounded Understanding of Ultra-High-Resolution Remote Sensing Imagery by Fengxiang Wang, Mingshuo Chen, Yueying Li, Yajie Yang, Yifan Zhang, Long Lan, Xue Yang, Hongda Sun, Yulin Wang, Di Wang, Jun Song, Jing Zhang, Bo Du (2026)
- Vision Language Models
Last changed | authored by Hugo Cisneros
Loading comments...