Supervised Fine-Tuning

tags
Large language models, Foundation models, Catastrophic forgetting, Continual learning

Off-policy adaptation of a pretrained model by training on (input, target) pairs from expert demonstrations using cross-entropy loss.

It is the dominant post-training recipe for skill and knowledge injection, but it is prone to catastrophic forgetting due to its off-policy nature: the model is trained on a fixed demonstration distribution rather than on its own outputs.
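The objective above can be sketched concretely. The snippet below is a minimal, library-free illustration (not any particular framework's API): per-token cross-entropy over the target tokens of a demonstration, with a loss mask that excludes the input/prompt positions so only the expert target is imitated. The function name `sft_loss` and the toy logits are illustrative assumptions.

```python
import math

def sft_loss(logits, targets, loss_mask):
    """Mean token-level cross-entropy over masked positions.

    logits:    per-timestep lists of vocabulary scores
    targets:   expert demonstration token ids
    loss_mask: 1 for target (response) tokens, 0 for input (prompt) tokens,
               so the loss is computed only where the model should imitate.
    """
    total, count = 0.0, 0
    for step_logits, target, m in zip(logits, targets, loss_mask):
        if not m:
            continue
        # log-sum-exp with max subtraction for numerical stability
        mx = max(step_logits)
        lse = mx + math.log(sum(math.exp(x - mx) for x in step_logits))
        total += lse - step_logits[target]  # -log p(target | context)
        count += 1
    return total / count

# Toy example: vocabulary of 3, four timesteps; the first two
# positions are prompt tokens and are masked out of the loss.
logits = [[2.0, 0.1, 0.1], [0.1, 2.0, 0.1], [0.1, 0.1, 2.0], [2.0, 0.1, 0.1]]
targets = [0, 1, 2, 0]
mask = [0, 0, 1, 1]
print(sft_loss(logits, targets, mask))
```

In practice the same masking idea appears in training frameworks as setting prompt-token labels to an ignore index so they contribute no gradient; the off-policy character comes from the fact that the contexts are drawn from the demonstrations, not sampled from the model.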
