rlhf
Projects with this topic
https://github.com/InternLM/InternLM Official release of InternLM2.5 base and chat models. 1M context support
https://internlm.intern-ai.org.cn/
https://github.com/princeton-nlp/SimPO SimPO: Simple Preference Optimization with a Reference-Free Reward
https://github.com/forhaoliu/chain-of-hindsight Chain-of-Hindsight, A Scalable RLHF Method
https://github.com/THUDM/ImageReward [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation