rlhf
Projects with this topic
- SimPO: Simple Preference Optimization with a Reference-Free Reward (https://github.com/princeton-nlp/SimPO)