- tags: Reinforcement learning, NLP
Reinforcement learning with human feedback
Links to this note
- ChatGPT
- Inverse reinforcement learning
- Knowledge Base Index
- Notes on: Reinforcement Learning via Self-Distillation by Hübotter, J., Lübeck, F., Behric, L., Baumann, A., Bagatella, M., Marta, D., Hakimi, I., Shenfeld, I., Kleine Buening, T., Guestrin, C. & Krause, A. (2026)
- Notes on: Self-Distillation Enables Continual Learning by Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal (2026)
- Reinforcement learning with verifiable rewards
- Sparrow
Last changed | authored by Hugo Cisneros