Reinforcement learning with human feedback
Links to this note
- ChatGPT
- Knowledge Base Index
- Notes on: Reinforcement Learning via Self-Distillation by Hübotter, J., Lübeck, F., Behric, L., Baumann, A., Bagatella, M., Marta, D., Hakimi, I., Shenfeld, I., Kleine Buening, T., Guestrin, C. & Krause, A. (2026)
- Reinforcement learning with verifiable rewards
- Sparrow
Last changed | authored by Hugo Cisneros