Tag: reward
-
Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback
arXiv:2512.03208v1 Announce Type: new Abstract: We study estimation and statistical inference for reward models used in aligning large language models (LLMs). A key component of LLM alignment is reinforcement learning from human feedback (RLHF), where humans compare pairs of model-generated answers and their…
-
Iterative Tilting for Diffusion Fine-Tuning
arXiv:2512.03234v1 Announce Type: new Abstract: We introduce iterative tilting, a gradient-free method for fine-tuning diffusion models toward reward-tilted distributions. The method decomposes a large reward tilt $\exp(\lambda r)$ into $N$ sequential smaller tilts, each admitting a tractable score update via first-order Taylor expansion. This requires only forward evaluations of the…
-
Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis
arXiv:2507.05913v1 Announce Type: new Abstract: A simple yet effective method for inference-time alignment of generative models is Best-of-$N$ (BoN), where $N$ outcomes are sampled from a reference policy, evaluated using a proxy reward model, and the highest-scoring one is selected. While prior work argues that…
-
Learning Guarantee of Reward Modeling Using Deep Neural Networks
arXiv:2505.06601v1 Announce Type: new Abstract: In this work, we study the learning theory of reward modeling with pairwise comparison data using deep neural networks. We establish a novel non-asymptotic regret bound for deep reward estimators in a non-parametric setting, which depends explicitly on the network architecture.…
-
Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality
arXiv:2503.17865v1 Announce Type: new Abstract: The goal of the inverse reinforcement learning (IRL) task is to identify the underlying reward function and the corresponding optimal policy from a set of expert demonstrations. While most IRL algorithms’ theoretical guarantees rely on a linear reward structure,…