- Transformers, GPT, NLP
- (Ouyang et al. 2022)
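This model (InstructGPT) starts from a pretrained GPT-3, which is first fine-tuned on human-written demonstrations. A reward model is then trained on human rankings of model outputs, and the language model is further optimized against that reward model with reinforcement learning (PPO).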
- Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, et al. 2022. "Training Language Models to Follow Instructions with Human Feedback." arXiv.
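The reward-modeling and RL steps summarized above can be sketched in a few lines. This is a minimal illustration that assumes scalar reward-model scores and sequence log-probabilities have already been computed. `pairwise_reward_loss` is the pairwise ranking loss -log σ(r(x, y_w) - r(x, y_l)) used to train a reward model from human preference comparisons. `ranking_loss` averages it over all pairs drawn from one ranked list of K outputs. `shaped_reward` subtracts a KL-style penalty β·(log π_RL - log π_SFT) from the reward-model score. All three function names and the default `beta` are hypothetical choices for illustration, not code from the paper, and the PPO update itself is omitted.

```python
import math
from itertools import combinations


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), computed in a numerically stable way."""
    d = r_chosen - r_rejected
    # -log(sigmoid(d)) == softplus(-d) == max(-d, 0) + log1p(exp(-|d|))
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def ranking_loss(scores_best_to_worst: list[float]) -> float:
    """Average pairwise loss over all C(K, 2) pairs from one ranked list of K outputs."""
    pairs = list(combinations(scores_best_to_worst, 2))  # (better, worse) pairs
    return sum(pairwise_reward_loss(w, l) for w, l in pairs) / len(pairs)


def shaped_reward(rm_score: float, logp_policy: float, logp_sft: float,
                  beta: float = 0.02) -> float:
    """Reward-model score minus a KL-style penalty keeping the policy near the SFT model.

    beta is a hypothetical illustrative value, not a tuned setting.
    """
    return rm_score - beta * (logp_policy - logp_sft)


if __name__ == "__main__":
    print(pairwise_reward_loss(0.0, 0.0))  # equal scores give log(2)
    print(ranking_loss([2.0, 1.0, -0.5]))  # scores already ordered best to worst
    print(shaped_reward(1.0, logp_policy=-3.2, logp_sft=-3.5))
```

The stable softplus form avoids overflow in `math.exp` when the score gap is large, and taking all pairs from one ranking mirrors how a single labeler ranking of K outputs yields C(K, 2) comparisons.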