Tag: regret

SCaLE: Switching Cost aware Learning and Exploration

arXiv:2601.09042v1. Announce Type: cross. Abstract: This work addresses the fundamental problem of unbounded metric movement costs in bandit online convex optimization, by considering high-dimensional dynamic quadratic hitting costs and $\ell_2$-norm switching costs in a noisy bandit feedback model. For a general class of stochastic environments, we provide the…

January 15, 2026
Online Learning with Limited Information in the Sliding Window Model

arXiv:2601.03533v1. Announce Type: new. Abstract: Motivated by recent work on the experts problem in the streaming model, we consider the experts problem in the sliding window model. The sliding window model is a well-studied model that captures applications such as traffic monitoring, epidemic tracking, and…

January 8, 2026
No-Regret Gaussian Process Optimization of Time-Varying Functions

arXiv:2512.00517v1. Announce Type: new. Abstract: Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian Process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary settings. However, for time-varying objectives, it is known that no-regret is unattainable under pure bandit feedback unless strong and…

December 2, 2025
Infinite-Dimensional Operator/Block Kaczmarz Algorithms: Regret Bounds and $\lambda$-Effectiveness

arXiv:2511.07604v1. Announce Type: new. Abstract: We present a variety of projection-based linear regression algorithms with a focus on modern machine-learning models and their algorithmic performance. We study the role of the relaxation parameter in generalized Kaczmarz algorithms and establish a priori regret bounds with explicit $\lambda$-dependence to…

November 12, 2025
Q-Learning with Fine-Grained Gap-Dependent Regret

arXiv:2510.06647v1. Announce Type: new. Abstract: We study fine-grained gap-dependent regret bounds for model-free reinforcement learning in episodic tabular Markov Decision Processes. Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain coarse and fail to fully capture the structure of suboptimality gaps. We address this limitation by establishing…

October 9, 2025
Bayesian Optimization with Expected Improvement: No Regret and the Choice of Incumbent

arXiv:2508.15674v1. Announce Type: new. Abstract: Expected improvement (EI) is one of the most widely used acquisition functions in Bayesian optimization (BO). Despite its proven empirical success in applications, the cumulative regret upper bound of EI remains an open question. In this paper, we…

August 22, 2025
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning

arXiv:2506.04626v1. Announce Type: new. Abstract: Motivated by real-world settings where data collection and policy deployment, whether for a single agent or across multiple agents, are costly, we study the problem of on-policy single-agent reinforcement learning (RL) and federated RL (FRL) with a…

June 6, 2025
Optimal Regret of Bernoulli Bandits under Global Differential Privacy

arXiv:2505.05613v1. Announce Type: new. Abstract: As sequential learning algorithms are increasingly applied to real life, ensuring data privacy while maintaining their utilities emerges as a timely question. In this context, regret minimisation in stochastic bandits under $\epsilon$-global Differential Privacy (DP) has been widely studied. Unlike bandits…

May 12, 2025
Beyond Worst-Case Online Classification: VC-Based Regret Bounds for Relaxed Benchmarks

arXiv:2504.10598v1. Announce Type: new. Abstract: We revisit online binary classification by shifting the focus from competing with the best-in-class binary loss to competing against relaxed benchmarks that capture smoothed notions of optimality. Instead of measuring regret relative to the exact minimal binary error, a…

April 16, 2025
No-Regret Generative Modeling via Parabolic Monge-Ampère PDE

arXiv:2504.09279v1. Announce Type: new. Abstract: We introduce a novel generative modeling framework based on a discretized parabolic Monge-Ampère PDE, which emerges as a continuous limit of the Sinkhorn algorithm commonly used in optimal transport. Our method performs iterative refinement in the space of Brenier maps using a mirror…

April 15, 2025
Sparsity-Based Interpolation of External, Internal and Swap Regret

arXiv:2502.04543v1. Announce Type: new. Abstract: Focusing on the expert problem in online learning, this paper studies the interpolation of several performance metrics via $\phi$-regret minimization, which measures the performance of an algorithm by its regret with respect to an arbitrary action modification rule $\phi$. With $d$ experts…

February 10, 2025
On the Precise Asymptotics and Refined Regret of the Variance-Aware UCB Algorithm

arXiv:2412.08843v1. Announce Type: new. Abstract: In this paper, we study the behavior of the Upper Confidence Bound-Variance (UCB-V) algorithm for Multi-Armed Bandit (MAB) problems, a variant of the canonical Upper Confidence Bound (UCB) algorithm that incorporates variance estimates into its decision-making process. More…

December 13, 2024