Category: math.OC

Fisher-Geometric Diffusion in Stochastic Gradient Descent: Optimal Rates, Oracle Complexity, and Information-Theoretic Limits

Fisher-Geometric Diffusion in Stochastic Gradient Descent: Optimal Rates, Oracle Complexity, and Information-Theoretic Limits arXiv:2603.02417v1 Announce Type: new Abstract: We develop a Fisher-geometric theory of stochastic gradient descent (SGD) in which mini-batch noise is an intrinsic, loss-induced matrix — not an exogenous scalar variance. Under exchangeable sampling, the mini-batch gradient covariance is pinned down (to leading…

March 4, 2026
Combinatorial Sparse PCA Beyond the Spiked Identity Model

Combinatorial Sparse PCA Beyond the Spiked Identity Model arXiv:2603.02607v1 Announce Type: new Abstract: Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $Sigma$, whose top eigenvector $v in R^d$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into…

March 4, 2026
Stochastic Gradient Variational Inference with Price’s Gradient Estimator from Bures-Wasserstein to Parameter Space

Stochastic Gradient Variational Inference with Price’s Gradient Estimator from Bures-Wasserstein to Parameter Space arXiv:2602.18718v1 Announce Type: new Abstract: For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) perform gradient descent in measure space (Bures-Wasserstein space)…

February 24, 2026
Implicit Bias and Convergence of Matrix Stochastic Mirror Descent

Implicit Bias and Convergence of Matrix Stochastic Mirror Descent arXiv:2602.18997v1 Announce Type: new Abstract: We investigate Stochastic Mirror Descent (SMD) with matrix parameters and vector-valued predictions, a framework relevant to multi-class classification and matrix completion problems. Focusing on the overparameterized regime, where the total number of parameters exceeds the number of training samples, we prove…

February 24, 2026
Functional Central Limit Theorem for Stochastic Gradient Descent

Functional Central Limit Theorem for Stochastic Gradient Descent arXiv:2602.15538v1 Announce Type: new Abstract: We study the asymptotic shape of the trajectory of the stochastic gradient descent algorithm applied to a convex objective function. Under mild regularity assumptions, we prove a functional central limit theorem for the properly rescaled trajectory. Our result characterizes the long-term fluctuations…

February 18, 2026
Provable Offline Reinforcement Learning for Structured Cyclic MDPs

Provable Offline Reinforcement Learning for Structured Cyclic MDPs arXiv:2602.11679v1 Announce Type: new Abstract: We introduce a novel cyclic Markov decision process (MDP) framework for multi-step decision problems with heterogeneous stage-specific dynamics, transitions, and discount factors across the cycle. In this setting, offline learning is challenging: optimizing a policy at any stage shifts the state distributions…

February 13, 2026
Logarithmic-time Schedules for Scaling Language Models with Momentum

Logarithmic-time Schedules for Scaling Language Models with Momentum arXiv:2602.05298v1 Announce Type: new Abstract: In practice, the hyperparameters $(beta_1, beta_2)$ and weight-decay $lambda$ in AdamW are typically kept at fixed values. Is there any reason to do otherwise? We show that for large-scale language model training, the answer is yes: by exploiting the power-law structure of…

February 6, 2026
Byzantine Machine Learning: MultiKrum and an optimal notion of robustness

Byzantine Machine Learning: MultiKrum and an optimal notion of robustness arXiv:2602.03899v1 Announce Type: new Abstract: Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule…

February 5, 2026
Convergence of Muon with Newton-Schulz

Convergence of Muon with Newton-Schulz arXiv:2601.19156v1 Announce Type: new Abstract: We analyze Muon as originally proposed and used in practice — using the momentum orthogonalization with a few Newton-Schulz steps. The prior theoretical results replace this key step in Muon with an exact SVD-based polar factor. We prove that Muon with Newton-Schulz converges to a…

January 28, 2026
On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization

On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization arXiv:2601.12238v1 Announce Type: new Abstract: While momentum-based acceleration has been studied extensively in deterministic optimization problems, its behavior in nonstationary environments — where the data distribution and optimal parameters drift over time — remains underexplored. We analyze the tracking performance of Stochastic Gradient Descent…

January 21, 2026
Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach

Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach arXiv:2601.11016v1 Announce Type: new Abstract: In this paper, we introduce a framework for contextual distributionally robust optimization (DRO) that considers the causal and continuous structure of the underlying distribution by developing interpretable and tractable decision rules that prescribe decisions using covariates.…

January 19, 2026
Accelerated Regularized Wasserstein Proximal Sampling Algorithms

Accelerated Regularized Wasserstein Proximal Sampling Algorithms arXiv:2601.09848v1 Announce Type: new Abstract: We consider sampling from a Gibbs distribution by evolving a finite number of particles using a particular score estimator rather than Brownian motion. To accelerate the particles, we consider a second-order score-based ODE, similar to Nesterov acceleration. In contrast to traditional kernel density score…

January 16, 2026
SCaLE: Switching Cost aware Learning and Exploration

SCaLE: Switching Cost aware Learning and Exploration arXiv:2601.09042v1 Announce Type: cross Abstract: This work addresses the fundamental problem of unbounded metric movement costs in bandit online convex optimization, by considering high-dimensional dynamic quadratic hitting costs and $ell_2$-norm switching costs in a noisy bandit feedback model. For a general class of stochastic environments, we provide the…

January 15, 2026
Inference-Time Alignment for Diffusion Models via Doob’s Matching

Inference-Time Alignment for Diffusion Models via Doob’s Matching arXiv:2601.06514v1 Announce Type: new Abstract: Inference-time alignment for diffusion models aims to adapt a pre-trained diffusion model toward a target distribution without retraining the base score network, thereby preserving the generative capacity of the base model while enforcing desired properties at the inference time. A central mechanism…

January 13, 2026
Constrained Density Estimation via Optimal Transport

Constrained Density Estimation via Optimal Transport arXiv:2601.06830v1 Announce Type: new Abstract: A novel framework for density estimation under expectation constraints is proposed. The framework minimizes the Wasserstein distance between the estimated density and a prior, subject to the constraints that the expected value of a set of functions adopts or exceeds given values. The framework…

January 13, 2026
A brief note on learning problem with global perspectives

A brief note on learning problem with global perspectives arXiv:2601.05441v1 Announce Type: new Abstract: This brief note considers the problem of learning with dynamic-optimizing principal-agent setting, in which the agents are allowed to have global perspectives about the learning process, i.e., the ability to view things according to their relative importances or in their true…

January 12, 2026
First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data

First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data arXiv:2601.02523v1 Announce Type: cross Abstract: Artificial intelligence has advanced rapidly through large neural networks trained on massive datasets using thousands of GPUs or TPUs. Such training can occupy entire data centers for weeks and requires enormous computational and energy resources. Yet the optimization algorithms behind…

January 7, 2026
A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue

A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue arXiv:2512.22282v1 Announce Type: new Abstract: Across fields such as machine learning, social science, geography, considerable attention has been given to models that factorize a nonnegative matrix into the product of two or three matrices, subject to nonnegative or row-sum-to-1…

December 30, 2025
Fast and Exact Least Absolute Deviations Line Fitting via Piecewise Affine Lower-Bounding

Fast and Exact Least Absolute Deviations Line Fitting via Piecewise Affine Lower-Bounding arXiv:2512.20682v1 Announce Type: new Abstract: Least-absolute-deviations (LAD) line fitting is robust to outliers but computationally more involved than least squares regression. Although the literature includes linear and near-linear time algorithms for the LAD line fitting problem, these methods are difficult to implement and,…

December 25, 2025
Generative modeling of conditional probability distributions on the level-sets of collective variables

Generative modeling of conditional probability distributions on the level-sets of collective variables arXiv:2512.17374v1 Announce Type: new Abstract: Given a probability distribution $mu$ in $mathbb{R}^d$ represented by data, we study in this paper the generative modeling of its conditional probability distributions on the level-sets of a collective variable $xi: mathbb{R}^d rightarrow mathbb{R}^k$, where $1 le k…

December 22, 2025
STARK denoises spatial transcriptomics images via adaptive regularization

STARK denoises spatial transcriptomics images via adaptive regularization arXiv:2512.10994v1 Announce Type: new Abstract: We present an approach to denoising spatial transcriptomics images that is particularly effective for uncovering cell identities in the regime of ultra-low sequencing depths, and also allows for interpolation of gene expression. The method — Spatial Transcriptomics via Adaptive Regularization and Kernels…

December 15, 2025
Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming

Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming arXiv:2512.08948v1 Announce Type: new Abstract: We study online statistical inference for the solutions of stochastic optimization problems with equality and inequality constraints. Such problems are prevalent in statistics and machine learning, encompassing constrained $M$-estimation, physics-informed models, safe reinforcement learning, and algorithmic fairness. We develop…

December 11, 2025
Worst-case generation via minimax optimization in Wasserstein space

Worst-case generation via minimax optimization in Wasserstein space arXiv:2512.08176v1 Announce Type: new Abstract: Worst-case generation plays a critical role in evaluating robustness and stress-testing systems under distribution shifts, in applications ranging from machine learning models to power grids and medical prediction systems. We develop a generative modeling framework for worst-case generation for a pre-specified risk,…

December 10, 2025
Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis

Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis arXiv:2512.08601v1 Announce Type: new Abstract: Since the 1990s, considerable empirical work has been carried out to train statistical models, such as neural networks (NNs), as learned heuristics for combinatorial optimization (CO) problems. When successful, such an approach eliminates the need for experts…

December 10, 2025
Symmetric Linear Dynamical Systems are Learnable from Few Observations

Symmetric Linear Dynamical Systems are Learnable from Few Observations arXiv:2512.05337v1 Announce Type: new Abstract: We consider the problem of learning the parameters of a $N$-dimensional stochastic linear dynamics under both full and partial observations from a single trajectory of time $T$. We introduce and analyze a new estimator that achieves a small maximum element-wise error…

December 8, 2025
No-Regret Gaussian Process Optimization of Time-Varying Functions

No-Regret Gaussian Process Optimization of Time-Varying Functions arXiv:2512.00517v1 Announce Type: new Abstract: Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian Process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary settings. However, for time-varying objectives, it is known that no-regret is unattainable under pure bandit feedback unless strong and…

December 2, 2025
Efficient Penalty-Based Bilevel Methods: Improved Analysis, Novel Updates, and Flatness Condition

Efficient Penalty-Based Bilevel Methods: Improved Analysis, Novel Updates, and Flatness Condition arXiv:2511.16796v1 Announce Type: cross Abstract: Penalty-based methods have become popular for solving bilevel optimization (BLO) problems, thanks to their effective first-order nature. However, they often require inner-loop iterations to solve the lower-level (LL) problem and small outer-loop step sizes to handle the increased smoothness…

November 24, 2025
Splat Regression Models

Splat Regression Models arXiv:2511.14042v1 Announce Type: new Abstract: We introduce a highly expressive class of function approximators called Splat Regression Models. Model outputs are mixtures of heterogeneous and anisotropic bump functions, termed splats, each weighted by an output vector. The power of splat modeling lies in its ability to locally adjust the scale and direction…

November 19, 2025
SCOPE: Spectral Concentration by Distributionally Robust Joint Covariance-Precision Estimation

SCOPE: Spectral Concentration by Distributionally Robust Joint Covariance-Precision Estimation arXiv:2511.14146v1 Announce Type: new Abstract: We propose a distributionally robust formulation for simultaneously estimating the covariance matrix and the precision matrix of a random vector.The proposed model minimizes the worst-case weighted sum of the Frobenius loss of the covariance estimator and Stein’s loss of the precision…

November 19, 2025
Generalized infinite dimensional Alpha-Procrustes based geometries

Generalized infinite dimensional Alpha-Procrustes based geometries arXiv:2511.09801v1 Announce Type: new Abstract: This work extends the recently introduced Alpha-Procrustes family of Riemannian metrics for symmetric positive definite (SPD) matrices by incorporating generalized versions of the Bures-Wasserstein (GBW), Log-Euclidean, and Wasserstein distances. While the Alpha-Procrustes framework has unified many classical metrics in both finite- and infinite- dimensional…

November 14, 2025
Operator Models for Continuous-Time Offline Reinforcement Learning

Operator Models for Continuous-Time Offline Reinforcement Learning arXiv:2511.10383v1 Announce Type: new Abstract: Continuous-time stochastic processes underlie many natural and engineered systems. In healthcare, autonomous driving, and industrial control, direct interaction with the environment is often unsafe or impractical, motivating offline reinforcement learning from historical data. However, there is limited statistical understanding of the approximation errors…

November 14, 2025
Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems

Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems arXiv:2510.26061v1 Announce Type: new Abstract: We propose a data-driven framework for efficiently solving quadratic programming (QP) problems by reducing the number of variables in high-dimensional QPs using instance-specific projection. A graph neural network-based model is designed to generate projections tailored to each QP instance, enabling…

October 31, 2025
Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization

Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization arXiv:2510.20883v1 Announce Type: new Abstract: Adversarial training has emerged as a key technique to enhance model robustness against adversarial input perturbations. Many of the existing methods rely on computationally expensive min-max problems that limit their application in practice. We propose a novel formulation of adversarial…

October 27, 2025
The Tree-SNE Tree Exists

The Tree-SNE Tree Exists arXiv:2510.15014v1 Announce Type: new Abstract: The clustering and visualisation of high-dimensional data is a ubiquitous task in modern data science. Popular techniques include nonlinear dimensionality reduction methods like t-SNE or UMAP. These methods face the `scale-problem’ of clustering: when dealing with the MNIST dataset, do we want to distinguish different digits…

October 20, 2025
Exact Dynamics of Multi-class Stochastic Gradient Descent

Exact Dynamics of Multi-class Stochastic Gradient Descent arXiv:2510.14074v1 Announce Type: new Abstract: We develop a framework for analyzing the training and learning rate dynamics on a variety of high- dimensional optimization problems trained using one-pass stochastic gradient descent (SGD) with data generated from multiple anisotropic classes. We give exact expressions for a large class of…

October 17, 2025
On the Rate of Gaussian Approximation for Linear Regression Problems

On the Rate of Gaussian Approximation for Linear Regression Problems arXiv:2509.14039v1 Announce Type: new Abstract: In this paper, we consider the problem of Gaussian approximation for the online linear regression task. We derive the corresponding rates for the setting of a constant learning rate and study the explicit dependence of the convergence rate upon the…

September 18, 2025
Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation

Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation arXiv:2509.09802v1 Announce Type: cross Abstract: We propose and study Sparse Polyak, a variant of Polyak’s adaptive step size, designed to solve high-dimensional statistical estimation problems where the problem dimension is allowed to grow much faster than the sample size. In such settings, the standard Polyak…

September 15, 2025
Cryo-EM as a Stochastic Inverse Problem

Cryo-EM as a Stochastic Inverse Problem arXiv:2509.05541v1 Announce Type: new Abstract: Cryo-electron microscopy (Cryo-EM) enables high-resolution imaging of biomolecules, but structural heterogeneity remains a major challenge in 3D reconstruction. Traditional methods assume a discrete set of conformations, limiting their ability to recover continuous structural variability. In this work, we formulate cryo-EM reconstruction as a stochastic…

September 9, 2025
Risk-averse Fair Multi-class Classification

Risk-averse Fair Multi-class Classification arXiv:2509.05771v1 Announce Type: new Abstract: We develop a new classification framework based on the theory of coherent risk measures and systemic risk. The proposed approach is suitable for multi-class problems when the data is noisy, scarce (relative to the dimension of the problem), and the labeling might be unreliable. In the…

September 9, 2025
The Nondecreasing Rank

The Nondecreasing Rank arXiv:2509.00265v1 Announce Type: new Abstract: In this article the notion of the nondecreasing (ND) rank of a matrix or tensor is introduced. A tensor has an ND rank of r if it can be represented as a sum of r outer products of vectors, with each vector satisfying a monotonicity constraint. It…

September 3, 2025
Stochastic Gradients under Nuisances

Stochastic Gradients under Nuisances arXiv:2508.20326v1 Announce Type: new Abstract: Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose objectives rely on unknown nuisance parameters, and establish non-asymptotic convergence guarantees. Our results show that, while…

August 29, 2025
Flow Matching-Based Generative Modeling for Efficient and Scalable Data Assimilation

Flow Matching-Based Generative Modeling for Efficient and Scalable Data Assimilation arXiv:2508.13313v1 Announce Type: new Abstract: Data assimilation (DA) is the problem of sequentially estimating the state of a dynamical system from noisy observations. Recent advances in generative modeling have inspired new approaches to DA in high-dimensional nonlinear settings, especially the ensemble score filter (EnSF). However,…

August 20, 2025
A pseudo-inverse of a line graph

A pseudo-inverse of a line graph arXiv:2508.09412v1 Announce Type: new Abstract: Line graphs are an alternative representation of graphs where each vertex of the original (root) graph becomes an edge. However not all graphs have a corresponding root graph, hence the transformation from graphs to line graphs is not invertible. We investigate the case when…

August 14, 2025
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing

Objective Soups: Multilingual Multi-Task Modeling for Speech Processing arXiv:2508.09228v1 Announce Type: cross Abstract: Training a single model for multilingual, multi-task speech processing (MSP) is severely hampered by conflicting objectives between tasks like speech recognition and translation. While multi-objective optimization (MOO) aims to align gradient updates, its effectiveness diminishes as the number of tasks grows, making…

August 14, 2025
High-Order Error Bounds for Markovian LSA with Richardson-Romberg Extrapolation

High-Order Error Bounds for Markovian LSA with Richardson-Romberg Extrapolation arXiv:2508.05570v1 Announce Type: new Abstract: In this paper, we study the bias and high-order error bounds of the Linear Stochastic Approximation (LSA) algorithm with Polyak-Ruppert (PR) averaging under Markovian noise. We focus on the version of the algorithm with constant step size $alpha$ and propose a…

August 8, 2025
Reinforcement Learning in MDPs with Information-Ordered Policies

Reinforcement Learning in MDPs with Information-Ordered Policies arXiv:2508.03904v1 Announce Type: new Abstract: We propose an epoch-based reinforcement learning algorithm for infinite-horizon average-cost Markov decision processes (MDPs) that leverages a partial order over a policy class. In this structure, $pi’ leq pi$ if data collected under $pi$ can be used to estimate the performance of $pi’$,…

August 7, 2025
A Smoothing Newton Method for Rank-one Matrix Recovery

A Smoothing Newton Method for Rank-one Matrix Recovery arXiv:2507.23017v1 Announce Type: new Abstract: We consider the phase retrieval problem, which involves recovering a rank-one positive semidefinite matrix from rank-one measurements. A recently proposed algorithm based on Bures-Wasserstein gradient descent (BWGD) exhibits superlinear convergence, but it is unstable, and existing theory can only prove local linear…

August 1, 2025
Statistical and Algorithmic Foundations of Reinforcement Learning

Statistical and Algorithmic Foundations of Reinforcement Learning arXiv:2507.14444v1 Announce Type: new Abstract: As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL…

July 22, 2025
Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization

Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization arXiv:2507.09093v1 Announce Type: new Abstract: We study convergence in high-probability of SGD-type methods in non-convex optimization and the presence of heavy-tailed noise. To combat the heavy-tailed noise, a general black-box nonlinear framework is considered, subsuming nonlinearities like sign, clipping, normalization and their smooth counterparts.…

July 15, 2025
Adjoint Schr”odinger Bridge Sampler

Adjoint Schr”odinger Bridge Sampler arXiv:2506.22565v1 Announce Type: new Abstract: Computational methods for learning to sample from the Boltzmann distribution — where the target distribution is known only up to an unnormalized energy function — have advanced significantly recently. Due to the lack of explicit target samples, however, prior diffusion-based methods, known as diffusion samplers, often…

July 1, 2025
Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19

Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19 arXiv:2506.21739v1 Announce Type: new Abstract: Authors Yi-Cheng Chen, Ping-En Lu, Cheng-Shang Chang, and Tzu-Hsuan Liu use the Finite Impulse Response (FIR) linear system filtering method to track and predict the number of people infected and recovered from COVID-19, in a…

June 30, 2025
The final solution of the Hitchhiker’s problem #5

The final solution of the Hitchhiker’s problem #5 arXiv:2506.20672v1 Announce Type: new Abstract: A recent survey, nicknamed “Hitchhiker’s Guide”, J.J. Arias-Garc{i}a, R. Mesiar, and B. De Baets, A hitchhiker’s guide to quasi-copulas, Fuzzy Sets and Systems 393 (2020) 1-28, has raised the rating of quasi-copula problems in the dependence modeling community in spite of the…

June 27, 2025
Zeroth-Order Optimization Finds Flat Minima

Zeroth-Order Optimization Finds Flat Minima arXiv:2506.05454v1 Announce Type: cross Abstract: Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known on the implicit…

June 9, 2025
Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference

Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference arXiv:2505.21721v1 Announce Type: new Abstract: We prove that, given a mean-field location-scale variational family, black-box variational inference (BBVI) with the reparametrization gradient converges at an almost dimension-independent rate. Specifically, for strongly log-concave and log-smooth targets, the number of iterations for BBVI with a sub-Gaussian family to achieve…

May 29, 2025
Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling

Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling arXiv:2505.18327v1 Announce Type: new Abstract: Constrained stochastic nonlinear optimization problems have attracted significant attention for their ability to model complex real-world scenarios in physics, economics, and biology. As datasets continue to grow, online inference methods have become crucial for enabling real-time decision-making without the need…

May 27, 2025
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine

Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine arXiv:2505.17283v1 Announce Type: new Abstract: Randomized clinical trials often require large patient cohorts before drawing definitive conclusions, yet abundant observational data from parallel studies remains underutilized due to confounding and hidden biases. To bridge this gap, we propose Deconfounded Warm-Start Thompson Sampling (DWTS), a practical approach…

May 26, 2025
Dimension-adapted Momentum Outscales SGD

Dimension-adapted Momentum Outscales SGD arXiv:2505.16098v1 Announce Type: new Abstract: We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target…

May 23, 2025
Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization

Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization arXiv:2505.11089v1 Announce Type: new Abstract: In this paper, we consider a score-based Integer Programming (IP) approach for solving the Bayesian Network Structure Learning (BNSL) problem. State-of-the-art BNSL IP formulations suffer from the exponentially large number of variables and constraints. A standard approach in IP…

May 19, 2025
A Scalable Gradient-Based Optimization Framework for Sparse Minimum-Variance Portfolio Selection

A Scalable Gradient-Based Optimization Framework for Sparse Minimum-Variance Portfolio Selection arXiv:2505.10099v1 Announce Type: new Abstract: Portfolio optimization involves selecting asset weights to minimize a risk-reward objective, such as the portfolio variance in the classical minimum-variance framework. Sparse portfolio selection extends this by imposing a cardinality constraint: only $k$ assets from a universe of $p$ may…

May 16, 2025
Optimal Transport for Machine Learners

Optimal Transport for Machine Learners arXiv:2505.06589v1 Announce Type: new Abstract: Optimal Transport is a foundational mathematical theory that connects optimization, partial differential equations, and probability. It offers a powerful framework for comparing probability distributions and has recently become an important tool in machine learning, especially for designing and evaluating generative models. These course notes cover…

May 13, 2025
Safe-EF: Error Feedback for Nonsmooth Constrained Optimization

Safe-EF: Error Feedback for Nonsmooth Constrained Optimization arXiv:2505.06053v1 Announce Type: cross Abstract: Federated learning faces severe communication bottlenecks due to the high dimensionality of model updates. Communication compression with contractive compressors (e.g., Top-K) is often preferable in practice but can degrade performance without proper handling. Error feedback (EF) mitigates such issues but has been largely…

May 12, 2025
Mixed-Integer Optimization for Responsible Machine Learning

Mixed-Integer Optimization for Responsible Machine Learning arXiv:2505.05857v1 Announce Type: cross Abstract: In the last few decades, Machine Learning (ML) has achieved significant success across domains ranging from healthcare, sustainability, and the social sciences, to criminal justice and finance. But its deployment in increasingly sophisticated, critical, and sensitive areas affecting individuals, the groups they belong to,…

May 12, 2025
Inference for max-linear Bayesian networks with noise

Inference for max-linear Bayesian networks with noise arXiv:2505.00229v1 Announce Type: new Abstract: Max-Linear Bayesian Networks (MLBNs) provide a powerful framework for causal inference in extreme-value settings; we consider MLBNs with noise parameters with a given topology in terms of the max-plus algebra by taking its logarithm. Then, we show that an estimator of a parameter…

May 2, 2025
No-Regret Generative Modeling via Parabolic Monge-Amp`ere PDE

No-Regret Generative Modeling via Parabolic Monge-Amp`ere PDE arXiv:2504.09279v1 Announce Type: new Abstract: We introduce a novel generative modeling framework based on a discretized parabolic Monge-Amp`ere PDE, which emerges as a continuous limit of the Sinkhorn algorithm commonly used in optimal transport. Our method performs iterative refinement in the space of Brenier maps using a mirror…

April 15, 2025
On Model Protection in Federated Learning against Eavesdropping Attacks

On Model Protection in Federated Learning against Eavesdropping Attacks arXiv:2504.02114v1 Announce Type: cross Abstract: In this study, we investigate the protection offered by federated learning algorithms against eavesdropping adversaries. In our model, the adversary is capable of intercepting model updates transmitted from clients to the server, enabling it to create its own estimate of the…

April 4, 2025
DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization

DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization arXiv:2503.23430v1 Announce Type: new Abstract: Domain generalization (DG) aims to learn models that can generalize well to unseen domains by training only on a set of source domains. Sharpness-Aware Minimization (SAM) has been a popular approach for this, aiming to find flat minima in the total loss landscape.…

April 1, 2025
Accelerated Stein Variational Gradient Flow

Accelerated Stein Variational Gradient Flow arXiv:2503.23462v1 Announce Type: new Abstract: Stein variational gradient descent (SVGD) is a kernel-based particle method for sampling from a target distribution, e.g., in generative modeling and Bayesian inference. SVGD does not require estimating the gradient of the log-density, which is called score estimation. In practice, SVGD can be slow compared…

April 1, 2025
A stochastic gradient descent algorithm with random search directions

A stochastic gradient descent algorithm with random search directions arXiv:2503.19942v1 Announce Type: new Abstract: Stochastic coordinate descent algorithms are efficient methods in which each iterate is obtained by fixing most coordinates at their values from the current iteration, and approximately minimizing the objective with respect to the remaining coordinates. However, this approach is usually restricted…

March 27, 2025
Universal Architectures for the Learning of Polyhedral Norms and Convex Regularization Functionals

Universal Architectures for the Learning of Polyhedral Norms and Convex Regularization Functionals arXiv:2503.19190v1 Announce Type: new Abstract: This paper addresses the task of learning convex regularizers to guide the reconstruction of images from limited data. By imposing that the reconstruction be amplitude-equivariant, we narrow down the class of admissible functionals to those that can be…

March 26, 2025
Procrustes Wasserstein Metric: A Modified Benamou-Brenier Approach with Applications to Latent Gaussian Distributions

Procrustes Wasserstein Metric: A Modified Benamou-Brenier Approach with Applications to Latent Gaussian Distributions arXiv:2503.16580v1 Announce Type: new Abstract: We introduce a modified Benamou-Brenier type approach leading to a Wasserstein type distance that allows global invariance, specifically, isometries, and we show that the problem can be summarized to orthogonal transformations. This distance is defined by penalizing…

March 24, 2025
Ranking and Selection with Simultaneous Input Data Collection

Ranking and Selection with Simultaneous Input Data Collection arXiv:2503.11773v1 Announce Type: new Abstract: In this paper, we propose a general and novel formulation of ranking and selection with the existence of streaming input data. The collection of multiple streams of such data may consume different types of resources, and hence can be conducted simultaneously. To…

March 18, 2025
Reheated Gradient-based Discrete Sampling for Combinatorial Optimization

Reheated Gradient-based Discrete Sampling for Combinatorial Optimization arXiv:2503.04047v1 Announce Type: new Abstract: Recently, gradient-based discrete sampling has emerged as a highly efficient, general-purpose solver for various combinatorial optimization (CO) problems, achieving performance comparable to or surpassing the popular data-driven approaches. However, we identify a critical issue in these methods, which we term ”wandering in contours”.…

March 7, 2025
Efficient Risk-sensitive Planning via Entropic Risk Measures

Efficient Risk-sensitive Planning via Entropic Risk Measures arXiv:2502.20423v1 Announce Type: new Abstract: Risk-sensitive planning aims to identify policies maximizing some tail-focused metrics in Markov Decision Processes (MDPs). Such an optimization task can be very costly for the most widely used and interpretable metrics such as threshold probabilities or (Conditional) Values at Risk. Indeed, previous work…

March 3, 2025
New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition

New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition arXiv:2502.14060v1 Announce Type: new Abstract: We study fundamental limits of first-order stochastic optimization in a range of nonconvex settings, including L-smooth functions satisfying Quasar-Convexity (QC), Quadratic Growth (QG), and Restricted Secant Inequalities (RSI). While the convergence properties of standard algorithms are well-understood in deterministic regimes,…

February 21, 2025
Online Covariance Matrix Estimation in Sketched Newton Methods

Online Covariance Matrix Estimation in Sketched Newton Methods arXiv:2502.07114v1 Announce Type: new Abstract: Given the ubiquity of streaming data, online algorithms have been widely used for parameter estimation, with second-order methods particularly standing out for their efficiency and robustness. In this paper, we study an online sketched Newton method that leverages a randomized sketching technique…

February 12, 2025
Online Covariance Estimation in Nonsmooth Stochastic Approximation

Online Covariance Estimation in Nonsmooth Stochastic Approximation arXiv:2502.05305v1 Announce Type: new Abstract: We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting covariance matrix in the local minimax sense of H’ajek and Le Cam.…

February 11, 2025
Learning Difference-of-Convex Regularizers for Inverse Problems: A Flexible Framework with Theoretical Guarantees

Learning Difference-of-Convex Regularizers for Inverse Problems: A Flexible Framework with Theoretical Guarantees arXiv:2502.00240v1 Announce Type: new Abstract: Learning effective regularization is crucial for solving ill-posed inverse problems, which arise in a wide range of scientific and engineering applications. While data-driven methods that parameterize regularizers using deep neural networks have demonstrated strong empirical performance, they often…

February 4, 2025
Learning to Fuse Temporal Proximity Networks: A Case Study in Chimpanzee Social Interactions

Learning to Fuse Temporal Proximity Networks: A Case Study in Chimpanzee Social Interactions arXiv:2502.00302v1 Announce Type: new Abstract: How can we identify groups of primate individuals which could be conjectured to drive social structure? To address this question, one of us has collected a time series of data for social interactions between chimpanzees. Here we…

February 4, 2025
A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization

A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization arXiv:2501.18756v1 Announce Type: new Abstract: Bayesian optimization is a widely used method for optimizing expensive black-box functions, with Expected Improvement being one of the most commonly used acquisition functions. In contrast, information-theoretic acquisition functions aim to reduce uncertainty about the function’s optimum and…

February 3, 2025
On the convergence of noisy Bayesian Optimization with Expected Improvement

On the convergence of noisy Bayesian Optimization with Expected Improvement arXiv:2501.09262v1 Announce Type: new Abstract: Expected improvement (EI) is one of the most widely-used acquisition functions in Bayesian optimization (BO). Despite its proven success in applications for decades, important open questions remain on the theoretical convergence behaviors and rates for EI. In this paper, we…

January 17, 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks

Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks arXiv:2501.09137v1 Announce Type: cross Abstract: We study the gradient descent (GD) dynamics of a depth-2 linear neural network with a single input and output. We show that GD converges at an explicit linear rate to a global minimum of the training…

January 17, 2025
Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks

Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks arXiv:2501.05930v1 Announce Type: cross Abstract: We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows. Distinct from the fixed-space global optimality of non-convex…

January 13, 2025
Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity

Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity arXiv:2501.04134v1 Announce Type: new Abstract: We study the mixing time of the projected Langevin algorithm (LA) and the privacy curve of noisy Stochastic Gradient Descent (SGD), beyond nonexpansive iterations. Specifically, we derive new mixing time bounds for the projected LA…

January 9, 2025
Robust Multi-Dimensional Scaling via Accelerated Alternating Projections

Robust Multi-Dimensional Scaling via Accelerated Alternating Projections arXiv:2501.02208v1 Announce Type: new Abstract: We consider the robust multi-dimensional scaling (RMDS) problem in this paper. The goal is to localize point locations from pairwise distances that may be corrupted by outliers. Inspired by classic MDS theories, and nonconvex works for the robust principal component analysis (RPCA) problem,…

January 7, 2025
Deep Generalized Schr”odinger Bridges: From Image Generation to Solving Mean-Field Games

Deep Generalized Schr”odinger Bridges: From Image Generation to Solving Mean-Field Games arXiv:2412.20279v1 Announce Type: new Abstract: Generalized Schr”odinger Bridges (GSBs) are a fundamental mathematical framework used to analyze the most likely particle evolution based on the principle of least action including kinetic and potential energy. In parallel to their well-established presence in the theoretical realms…

December 31, 2024
Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces

Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces arXiv:2412.20556v1 Announce Type: new Abstract: We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous, leading to significant computational challenges due to the infinite-dimensional nature of the optimization problem. Recent research has explored learning the worst-case distribution using…

December 31, 2024
Localized exploration in contextual dynamic pricing achieves dimension-free regret

Localized exploration in contextual dynamic pricing achieves dimension-free regret arXiv:2412.19252v1 Announce Type: new Abstract: We study the problem of contextual dynamic pricing with a linear demand model. We propose a novel localized exploration-then-commit (LetC) algorithm which starts with a pure exploration stage, followed by a refinement stage that explores near the learned optimal pricing policy,…

December 30, 2024
Data-Driven Priors in the Maximum Entropy on the Mean Method for Linear Inverse Problems

Data-Driven Priors in the Maximum Entropy on the Mean Method for Linear Inverse Problems arXiv:2412.17916v1 Announce Type: new Abstract: We establish the theoretical framework for implementing the maximumn entropy on the mean (MEM) method for linear inverse problems in the setting of approximate (data-driven) priors. We prove a.s. convergence for empirical means and further develop…

December 25, 2024
Projected gradient methods for nonconvex and stochastic optimization: new complexities and auto-conditioned stepsizes

Projected gradient methods for nonconvex and stochastic optimization: new complexities and auto-conditioned stepsizes arXiv:2412.14291v1 Announce Type: cross Abstract: We present a novel class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex function over a convex compact set. We first provide a novel analysis of the “vanilla” PG method, achieving the…

December 20, 2024
An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints

An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints arXiv:2412.08060v1 Announce Type: new Abstract: We study Online Convex Optimization (OCO) with adversarial constraints, where an online algorithm must make repeated decisions to minimize both convex loss functions and cumulative constraint violations. We focus on a setting where the algorithm has access to predictions of…

December 12, 2024
Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application

Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application arXiv:2412.05906v1 Announce Type: new Abstract: We study the discrete-time linear-quadratic (LQ) control model using reinforcement learning (RL). Using entropy to measure the cost of exploration, we prove that the optimal feedback policy for the problem must be Gaussian type. Then, we apply the results…

December 10, 2024
Nonparametric Filtering, Estimation and Classification using Neural Jump ODEs

Nonparametric Filtering, Estimation and Classification using Neural Jump ODEs arXiv:2412.03271v1 Announce Type: new Abstract: Neural Jump ODEs model the conditional expectation between observations by neural ODEs and jump at arrival of new observations. They have demonstrated effectiveness for fully data-driven online forecasting in settings with irregular and partial observations, operating under weak regularity assumptions. This…

December 5, 2024
Explicit and data-Efficient Encoding via Gradient Flow

Explicit and data-Efficient Encoding via Gradient Flow arXiv:2412.00864v1 Announce Type: new Abstract: The autoencoder model typically uses an encoder to map data to a lower dimensional latent space and a decoder to reconstruct it. However, relying on an encoder for inversion can lead to suboptimal representations, particularly limiting in physical sciences where precision is key.…

December 3, 2024