Tag: bandit

  • ADAM Optimization with Adaptive Batch Selection

    arXiv:2512.06795v1. Abstract: Adam is a widely used optimizer in neural network training due to its adaptive learning rate. However, because different data samples influence model updates to varying degrees, treating them equally can lead to inefficient convergence. To address this, a prior work proposed adapting the…

  • A Two-armed Bandit Framework for A/B Testing

    arXiv:2507.18118v1. Abstract: A/B testing is widely used in modern technology companies for policy evaluation and product deployment, with the goal of comparing the outcomes under a newly-developed policy against a standard control. Various causal inference and reinforcement learning methods developed in the literature are applicable…

  • Fixed-Confidence Multiple Change Point Identification under Bandit Feedback

    arXiv:2507.08994v1. Abstract: Piecewise constant functions describe a variety of real-world phenomena in domains ranging from chemistry to manufacturing. In practice, it is often required to confidently identify the locations of the abrupt changes in these functions as quickly as possible. For this, we introduce…

  • Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback

    arXiv:2505.13562v1. Abstract: Learning in games is a fundamental problem in machine learning and artificial intelligence, with numerous applications (Silver et al., 2016; Schrittwieser et al., 2020). This work investigates two-player zero-sum matrix games with an unknown payoff matrix and bandit feedback, where each player observes their actions and the…

  • Exploiting Concavity Information in Gaussian Process Contextual Bandit Optimization

    arXiv:2503.10836v1. Abstract: The contextual bandit framework is widely used to solve sequential optimization problems where the reward of each decision depends on auxiliary context variables. In settings such as medicine, business, and engineering, the decision maker often possesses additional structural information on the…

  • Selective Reviews of Bandit Problems in AI via a Statistical View

    arXiv:2412.02251v1. Abstract: Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes stochastic multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential decision-making…