Hugo Cisneros

Reinforcement learning with human feedback

tags: Reinforcement learning, NLP

Links to this note

ChatGPT
Inverse reinforcement learning
Knowledge Base Index
Notes on: Reinforcement Learning via Self-Distillation by Hübotter, J., Lübeck, F., Behric, L., Baumann, A., Bagatella, M., Marta, D., Hakimi, I., Shenfeld, I., Kleine Buening, T., Guestrin, C. & Krause, A. (2026)
Notes on: Self-Distillation Enables Continual Learning by Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal (2026)
Reinforcement learning with verifiable rewards
Sparrow

Last changed 2023.02.13 | authored by Hugo Cisneros

© Hugo Cisneros 2026