Tag: rlhf

Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory

arXiv:2506.12350v1. Announce Type: new. Abstract: Despite its empirical success, Reinforcement Learning from Human Feedback (RLHF) has been shown to violate almost all the fundamental axioms in social choice theory, such as majority consistency, pairwise majority consistency, and Condorcet consistency. This raises…

June 17, 2025