Supervised Fine Tuning

tags: Large language models, Foundation models, Catastrophic forgetting, Continual learning

Supervised fine-tuning (SFT) is the off-policy adaptation of a pretrained model: the model is trained on (input, target) pairs from expert demonstrations with a cross-entropy loss.
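
As a rough illustration of the cross-entropy objective described above, here is a minimal pure-Python sketch. The function names (`log_softmax`, `sft_loss`) and the `loss_mask` argument are hypothetical and not taken from any particular library; the mask assumes, as is common but not stated in this note, that only target tokens (not input tokens) contribute to the loss.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sft_loss(logits_per_step, target_ids, loss_mask):
    """Mean negative log-likelihood of the expert tokens.

    logits_per_step: one list of logits per sequence position (model output).
    target_ids:      the expert token id expected at each position.
    loss_mask:       1 where the position belongs to the target, 0 where it
                     belongs to the input (assumption: input tokens are
                     excluded from the loss).
    """
    total, count = 0.0, 0
    for logits, target, keep in zip(logits_per_step, target_ids, loss_mask):
        if not keep:
            continue
        total -= log_softmax(logits)[target]
        count += 1
    return total / max(count, 1)


# Toy usage: vocabulary of 4 tokens, 3 positions, first position is input.
uniform = [[0.0, 0.0, 0.0, 0.0]] * 3
confident = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 9.0, 0.0], [0.0, 0.0, 0.0, 9.0]]
targets = [1, 2, 3]
mask = [0, 1, 1]

loss_uniform = sft_loss(uniform, targets, mask)      # equals log(4)
loss_confident = sft_loss(confident, targets, mask)  # much smaller
```

The targets are fixed by the demonstrations and never sampled from the model, which is what makes the procedure off-policy.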

It is the dominant post-training recipe for skill and knowledge injection, but it is prone to catastrophic forgetting because of its off-policy nature: the model is trained on expert-written targets rather than on its own outputs.
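
One way to make the off-policy point concrete is to compare where the training sequences come from. The notation below is an assumption introduced for illustration and does not appear in the note: \pi_\theta is the model, \mathcal{D} the dataset of expert demonstrations, p(x) a distribution over inputs, and \ell a generic per-sequence training signal.

```latex
% Supervised fine-tuning: targets y come from the expert dataset D.
\mathcal{L}_{\mathrm{SFT}}(\theta)
  = -\,\mathbb{E}_{(x, y) \sim \mathcal{D}}
    \left[ \sum_{t=1}^{|y|} \log \pi_\theta\!\left(y_t \mid x, y_{<t}\right) \right]

% An on-policy objective, by contrast, evaluates sequences sampled
% from the current model itself.
\mathcal{L}_{\mathrm{on\text{-}policy}}(\theta)
  = \mathbb{E}_{x \sim p(x),\; y \sim \pi_\theta(\cdot \mid x)}
    \left[ \ell(x, y) \right]
```

In the first expression the expectation never depends on what the model would generate on its own, so the updates can pull it toward sequences far from its current distribution.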

Links to this note

Notes on: DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning by Ziwei Zheng, Michael Yang, Jack Hong, Chenxiao Zhao, Guohai Xu, Le Yang, Chao Shen, Xing Yu (2025)
Knowledge Base Index
Notes on: Embarrassingly Simple Self-Distillation Improves Code Generation by Zhang, R., Bai, R. H., Zheng, H., Jaitly, N., Collobert, R., & Zhang, Y. (2026)
Notes on: GeoEyes: On-Demand Visual Focusing for Evidence-Grounded Understanding of Ultra-High-Resolution Remote Sensing Imagery by Fengxiang Wang, Mingshuo Chen, Yueying Li, Yajie Yang, Yifan Zhang, Long Lan, Xue Yang, Hongda Sun, Yulin Wang, Di Wang, Jun Song, Jing Zhang, Bo Du (2026)
Notes on: LoRA Learns Less and Forgets Less by Dan Biderman, Jacob Portes, Jose Javier Gonzalez Ortiz, Mansheej Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle, Cody Blakeney, John P. Cunningham (2024)
Notes on: Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding by Christopher Clark, Jieyu Zhang, Zixian Ma, Jae Sung Park, Mohammadreza Salehi, Rohun Tripathi, Sangho Lee, Zhongzheng Ren, Chris Dongjoo Kim, Yinuo Yang, Vincent Shao, Yue Yang, Weikai Huang, Ziqi Gao, Taira Anderson, Jianrui Zhang, Jitesh Jain, George Stoica, Winson Han, Ali Farhadi, Ranjay Krishna (2026)
Notes on: Reinforcement Learning via Self-Distillation by Hübotter, J., Lübeck, F., Behric, L., Baumann, A., Bagatella, M., Marta, D., Hakimi, I., Shenfeld, I., Kleine Buening, T., Guestrin, C. & Krause, A. (2026)
Notes on: Self-Distillation Enables Continual Learning by Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal (2026)
Off-policy distillation

Last changed 2026.04.26 | authored by Hugo Cisneros
