Tag: rlhf
-
Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory
arXiv:2506.12350v1 (announce type: new). Abstract: Despite its empirical success, Reinforcement Learning from Human Feedback (RLHF) has been shown to violate almost all the fundamental axioms in social choice theory, such as majority consistency, pairwise majority consistency, and Condorcet consistency. This raises…