Tag: reward

Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback

arXiv:2512.03208v1. Abstract: We study estimation and statistical inference for reward models used in aligning large language models (LLMs). A key component of LLM alignment is reinforcement learning from human feedback (RLHF), where humans compare pairs of model-generated answers and their…

December 4, 2025
Iterative Tilting for Diffusion Fine-Tuning

arXiv:2512.03234v1. Abstract: We introduce iterative tilting, a gradient-free method for fine-tuning diffusion models toward reward-tilted distributions. The method decomposes a large reward tilt $\exp(\lambda r)$ into $N$ sequential smaller tilts, each admitting a tractable score update via first-order Taylor expansion. This requires only forward evaluations of the…

December 4, 2025
Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis

arXiv:2507.05913v1. Abstract: A simple yet effective method for inference-time alignment of generative models is Best-of-$N$ (BoN), where $N$ outcomes are sampled from a reference policy, evaluated using a proxy reward model, and the highest-scoring one is selected. While prior work argues that…

July 9, 2025
Learning Guarantee of Reward Modeling Using Deep Neural Networks

arXiv:2505.06601v1. Abstract: In this work, we study the learning theory of reward modeling with pairwise comparison data using deep neural networks. We establish a novel non-asymptotic regret bound for deep reward estimators in a non-parametric setting, which depends explicitly on the network architecture.…

May 13, 2025
Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality

arXiv:2503.17865v1. Abstract: The goal of the inverse reinforcement learning (IRL) task is to identify the underlying reward function and the corresponding optimal policy from a set of expert demonstrations. While most IRL algorithms’ theoretical guarantees rely on a linear reward structure,…

March 25, 2025