Direct Preference Optimization: Your Language Model is Secretly a Reward Model
References
Rafailov et al. (2023), Direct Preference Optimization: Your Language Model is Secretly a Reward Model.