Category: math.OC
-
Fisher-Geometric Diffusion in Stochastic Gradient Descent: Optimal Rates, Oracle Complexity, and Information-Theoretic Limits
Fisher-Geometric Diffusion in Stochastic Gradient Descent: Optimal Rates, Oracle Complexity, and Information-Theoretic Limits arXiv:2603.02417v1 Announce Type: new Abstract: We develop a Fisher-geometric theory of stochastic gradient descent (SGD) in which mini-batch noise is an intrinsic, loss-induced matrix — not an exogenous scalar variance. Under exchangeable sampling, the mini-batch gradient covariance is pinned down (to leading…
-
Combinatorial Sparse PCA Beyond the Spiked Identity Model
Combinatorial Sparse PCA Beyond the Spiked Identity Model arXiv:2603.02607v1 Announce Type: new Abstract: Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $Sigma$, whose top eigenvector $v in R^d$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into…
-
Stochastic Gradient Variational Inference with Price’s Gradient Estimator from Bures-Wasserstein to Parameter Space
Stochastic Gradient Variational Inference with Price’s Gradient Estimator from Bures-Wasserstein to Parameter Space arXiv:2602.18718v1 Announce Type: new Abstract: For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) perform gradient descent in measure space (Bures-Wasserstein space)…
-
Implicit Bias and Convergence of Matrix Stochastic Mirror Descent
Implicit Bias and Convergence of Matrix Stochastic Mirror Descent arXiv:2602.18997v1 Announce Type: new Abstract: We investigate Stochastic Mirror Descent (SMD) with matrix parameters and vector-valued predictions, a framework relevant to multi-class classification and matrix completion problems. Focusing on the overparameterized regime, where the total number of parameters exceeds the number of training samples, we prove…
-
Functional Central Limit Theorem for Stochastic Gradient Descent
Functional Central Limit Theorem for Stochastic Gradient Descent arXiv:2602.15538v1 Announce Type: new Abstract: We study the asymptotic shape of the trajectory of the stochastic gradient descent algorithm applied to a convex objective function. Under mild regularity assumptions, we prove a functional central limit theorem for the properly rescaled trajectory. Our result characterizes the long-term fluctuations…
-
Provable Offline Reinforcement Learning for Structured Cyclic MDPs
Provable Offline Reinforcement Learning for Structured Cyclic MDPs arXiv:2602.11679v1 Announce Type: new Abstract: We introduce a novel cyclic Markov decision process (MDP) framework for multi-step decision problems with heterogeneous stage-specific dynamics, transitions, and discount factors across the cycle. In this setting, offline learning is challenging: optimizing a policy at any stage shifts the state distributions…
-
Logarithmic-time Schedules for Scaling Language Models with Momentum
Logarithmic-time Schedules for Scaling Language Models with Momentum arXiv:2602.05298v1 Announce Type: new Abstract: In practice, the hyperparameters $(beta_1, beta_2)$ and weight-decay $lambda$ in AdamW are typically kept at fixed values. Is there any reason to do otherwise? We show that for large-scale language model training, the answer is yes: by exploiting the power-law structure of…
-
Byzantine Machine Learning: MultiKrum and an optimal notion of robustness
Byzantine Machine Learning: MultiKrum and an optimal notion of robustness arXiv:2602.03899v1 Announce Type: new Abstract: Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule…
-
Convergence of Muon with Newton-Schulz
Convergence of Muon with Newton-Schulz arXiv:2601.19156v1 Announce Type: new Abstract: We analyze Muon as originally proposed and used in practice — using the momentum orthogonalization with a few Newton-Schulz steps. The prior theoretical results replace this key step in Muon with an exact SVD-based polar factor. We prove that Muon with Newton-Schulz converges to a…
-
On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization
On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization arXiv:2601.12238v1 Announce Type: new Abstract: While momentum-based acceleration has been studied extensively in deterministic optimization problems, its behavior in nonstationary environments — where the data distribution and optimal parameters drift over time — remains underexplored. We analyze the tracking performance of Stochastic Gradient Descent…
-
Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach
Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach arXiv:2601.11016v1 Announce Type: new Abstract: In this paper, we introduce a framework for contextual distributionally robust optimization (DRO) that considers the causal and continuous structure of the underlying distribution by developing interpretable and tractable decision rules that prescribe decisions using covariates.…
-
Accelerated Regularized Wasserstein Proximal Sampling Algorithms
Accelerated Regularized Wasserstein Proximal Sampling Algorithms arXiv:2601.09848v1 Announce Type: new Abstract: We consider sampling from a Gibbs distribution by evolving a finite number of particles using a particular score estimator rather than Brownian motion. To accelerate the particles, we consider a second-order score-based ODE, similar to Nesterov acceleration. In contrast to traditional kernel density score…
-
SCaLE: Switching Cost aware Learning and Exploration
SCaLE: Switching Cost aware Learning and Exploration arXiv:2601.09042v1 Announce Type: cross Abstract: This work addresses the fundamental problem of unbounded metric movement costs in bandit online convex optimization, by considering high-dimensional dynamic quadratic hitting costs and $ell_2$-norm switching costs in a noisy bandit feedback model. For a general class of stochastic environments, we provide the…
-
Inference-Time Alignment for Diffusion Models via Doob’s Matching
Inference-Time Alignment for Diffusion Models via Doob’s Matching arXiv:2601.06514v1 Announce Type: new Abstract: Inference-time alignment for diffusion models aims to adapt a pre-trained diffusion model toward a target distribution without retraining the base score network, thereby preserving the generative capacity of the base model while enforcing desired properties at the inference time. A central mechanism…
-
Constrained Density Estimation via Optimal Transport
Constrained Density Estimation via Optimal Transport arXiv:2601.06830v1 Announce Type: new Abstract: A novel framework for density estimation under expectation constraints is proposed. The framework minimizes the Wasserstein distance between the estimated density and a prior, subject to the constraints that the expected value of a set of functions adopts or exceeds given values. The framework…
-
A brief note on learning problem with global perspectives
A brief note on learning problem with global perspectives arXiv:2601.05441v1 Announce Type: new Abstract: This brief note considers the problem of learning with dynamic-optimizing principal-agent setting, in which the agents are allowed to have global perspectives about the learning process, i.e., the ability to view things according to their relative importances or in their true…
-
First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data
First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data arXiv:2601.02523v1 Announce Type: cross Abstract: Artificial intelligence has advanced rapidly through large neural networks trained on massive datasets using thousands of GPUs or TPUs. Such training can occupy entire data centers for weeks and requires enormous computational and energy resources. Yet the optimization algorithms behind…
-
A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue
A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue arXiv:2512.22282v1 Announce Type: new Abstract: Across fields such as machine learning, social science, geography, considerable attention has been given to models that factorize a nonnegative matrix into the product of two or three matrices, subject to nonnegative or row-sum-to-1…
-
Fast and Exact Least Absolute Deviations Line Fitting via Piecewise Affine Lower-Bounding
Fast and Exact Least Absolute Deviations Line Fitting via Piecewise Affine Lower-Bounding arXiv:2512.20682v1 Announce Type: new Abstract: Least-absolute-deviations (LAD) line fitting is robust to outliers but computationally more involved than least squares regression. Although the literature includes linear and near-linear time algorithms for the LAD line fitting problem, these methods are difficult to implement and,…
-
Generative modeling of conditional probability distributions on the level-sets of collective variables
Generative modeling of conditional probability distributions on the level-sets of collective variables arXiv:2512.17374v1 Announce Type: new Abstract: Given a probability distribution $mu$ in $mathbb{R}^d$ represented by data, we study in this paper the generative modeling of its conditional probability distributions on the level-sets of a collective variable $xi: mathbb{R}^d rightarrow mathbb{R}^k$, where $1 le k…
-
STARK denoises spatial transcriptomics images via adaptive regularization
STARK denoises spatial transcriptomics images via adaptive regularization arXiv:2512.10994v1 Announce Type: new Abstract: We present an approach to denoising spatial transcriptomics images that is particularly effective for uncovering cell identities in the regime of ultra-low sequencing depths, and also allows for interpolation of gene expression. The method — Spatial Transcriptomics via Adaptive Regularization and Kernels…
-
Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming
Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming arXiv:2512.08948v1 Announce Type: new Abstract: We study online statistical inference for the solutions of stochastic optimization problems with equality and inequality constraints. Such problems are prevalent in statistics and machine learning, encompassing constrained $M$-estimation, physics-informed models, safe reinforcement learning, and algorithmic fairness. We develop…
-
Worst-case generation via minimax optimization in Wasserstein space
Worst-case generation via minimax optimization in Wasserstein space arXiv:2512.08176v1 Announce Type: new Abstract: Worst-case generation plays a critical role in evaluating robustness and stress-testing systems under distribution shifts, in applications ranging from machine learning models to power grids and medical prediction systems. We develop a generative modeling framework for worst-case generation for a pre-specified risk,…
-
Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis
Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis arXiv:2512.08601v1 Announce Type: new Abstract: Since the 1990s, considerable empirical work has been carried out to train statistical models, such as neural networks (NNs), as learned heuristics for combinatorial optimization (CO) problems. When successful, such an approach eliminates the need for experts…
-
Symmetric Linear Dynamical Systems are Learnable from Few Observations
Symmetric Linear Dynamical Systems are Learnable from Few Observations arXiv:2512.05337v1 Announce Type: new Abstract: We consider the problem of learning the parameters of a $N$-dimensional stochastic linear dynamics under both full and partial observations from a single trajectory of time $T$. We introduce and analyze a new estimator that achieves a small maximum element-wise error…
-
No-Regret Gaussian Process Optimization of Time-Varying Functions
No-Regret Gaussian Process Optimization of Time-Varying Functions arXiv:2512.00517v1 Announce Type: new Abstract: Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian Process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary settings. However, for time-varying objectives, it is known that no-regret is unattainable under pure bandit feedback unless strong and…
-
Efficient Penalty-Based Bilevel Methods: Improved Analysis, Novel Updates, and Flatness Condition
Efficient Penalty-Based Bilevel Methods: Improved Analysis, Novel Updates, and Flatness Condition arXiv:2511.16796v1 Announce Type: cross Abstract: Penalty-based methods have become popular for solving bilevel optimization (BLO) problems, thanks to their effective first-order nature. However, they often require inner-loop iterations to solve the lower-level (LL) problem and small outer-loop step sizes to handle the increased smoothness…
-
Splat Regression Models
Splat Regression Models arXiv:2511.14042v1 Announce Type: new Abstract: We introduce a highly expressive class of function approximators called Splat Regression Models. Model outputs are mixtures of heterogeneous and anisotropic bump functions, termed splats, each weighted by an output vector. The power of splat modeling lies in its ability to locally adjust the scale and direction…
-
SCOPE: Spectral Concentration by Distributionally Robust Joint Covariance-Precision Estimation
SCOPE: Spectral Concentration by Distributionally Robust Joint Covariance-Precision Estimation arXiv:2511.14146v1 Announce Type: new Abstract: We propose a distributionally robust formulation for simultaneously estimating the covariance matrix and the precision matrix of a random vector.The proposed model minimizes the worst-case weighted sum of the Frobenius loss of the covariance estimator and Stein’s loss of the precision…
-
Generalized infinite dimensional Alpha-Procrustes based geometries
Generalized infinite dimensional Alpha-Procrustes based geometries arXiv:2511.09801v1 Announce Type: new Abstract: This work extends the recently introduced Alpha-Procrustes family of Riemannian metrics for symmetric positive definite (SPD) matrices by incorporating generalized versions of the Bures-Wasserstein (GBW), Log-Euclidean, and Wasserstein distances. While the Alpha-Procrustes framework has unified many classical metrics in both finite- and infinite- dimensional…
-
Operator Models for Continuous-Time Offline Reinforcement Learning
Operator Models for Continuous-Time Offline Reinforcement Learning arXiv:2511.10383v1 Announce Type: new Abstract: Continuous-time stochastic processes underlie many natural and engineered systems. In healthcare, autonomous driving, and industrial control, direct interaction with the environment is often unsafe or impractical, motivating offline reinforcement learning from historical data. However, there is limited statistical understanding of the approximation errors…
-
Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems
Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems arXiv:2510.26061v1 Announce Type: new Abstract: We propose a data-driven framework for efficiently solving quadratic programming (QP) problems by reducing the number of variables in high-dimensional QPs using instance-specific projection. A graph neural network-based model is designed to generate projections tailored to each QP instance, enabling…
-
Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization
Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization arXiv:2510.20883v1 Announce Type: new Abstract: Adversarial training has emerged as a key technique to enhance model robustness against adversarial input perturbations. Many of the existing methods rely on computationally expensive min-max problems that limit their application in practice. We propose a novel formulation of adversarial…
-
The Tree-SNE Tree Exists
The Tree-SNE Tree Exists arXiv:2510.15014v1 Announce Type: new Abstract: The clustering and visualisation of high-dimensional data is a ubiquitous task in modern data science. Popular techniques include nonlinear dimensionality reduction methods like t-SNE or UMAP. These methods face the `scale-problem’ of clustering: when dealing with the MNIST dataset, do we want to distinguish different digits…
-
Exact Dynamics of Multi-class Stochastic Gradient Descent
Exact Dynamics of Multi-class Stochastic Gradient Descent arXiv:2510.14074v1 Announce Type: new Abstract: We develop a framework for analyzing the training and learning rate dynamics on a variety of high- dimensional optimization problems trained using one-pass stochastic gradient descent (SGD) with data generated from multiple anisotropic classes. We give exact expressions for a large class of…
-
On the Rate of Gaussian Approximation for Linear Regression Problems
On the Rate of Gaussian Approximation for Linear Regression Problems arXiv:2509.14039v1 Announce Type: new Abstract: In this paper, we consider the problem of Gaussian approximation for the online linear regression task. We derive the corresponding rates for the setting of a constant learning rate and study the explicit dependence of the convergence rate upon the…
-
Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation
Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation arXiv:2509.09802v1 Announce Type: cross Abstract: We propose and study Sparse Polyak, a variant of Polyak’s adaptive step size, designed to solve high-dimensional statistical estimation problems where the problem dimension is allowed to grow much faster than the sample size. In such settings, the standard Polyak…
-
Cryo-EM as a Stochastic Inverse Problem
Cryo-EM as a Stochastic Inverse Problem arXiv:2509.05541v1 Announce Type: new Abstract: Cryo-electron microscopy (Cryo-EM) enables high-resolution imaging of biomolecules, but structural heterogeneity remains a major challenge in 3D reconstruction. Traditional methods assume a discrete set of conformations, limiting their ability to recover continuous structural variability. In this work, we formulate cryo-EM reconstruction as a stochastic…
-
Risk-averse Fair Multi-class Classification
Risk-averse Fair Multi-class Classification arXiv:2509.05771v1 Announce Type: new Abstract: We develop a new classification framework based on the theory of coherent risk measures and systemic risk. The proposed approach is suitable for multi-class problems when the data is noisy, scarce (relative to the dimension of the problem), and the labeling might be unreliable. In the…
-
The Nondecreasing Rank
The Nondecreasing Rank arXiv:2509.00265v1 Announce Type: new Abstract: In this article the notion of the nondecreasing (ND) rank of a matrix or tensor is introduced. A tensor has an ND rank of r if it can be represented as a sum of r outer products of vectors, with each vector satisfying a monotonicity constraint. It…
-
Stochastic Gradients under Nuisances
Stochastic Gradients under Nuisances arXiv:2508.20326v1 Announce Type: new Abstract: Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose objectives rely on unknown nuisance parameters, and establish non-asymptotic convergence guarantees. Our results show that, while…
-
Flow Matching-Based Generative Modeling for Efficient and Scalable Data Assimilation
Flow Matching-Based Generative Modeling for Efficient and Scalable Data Assimilation arXiv:2508.13313v1 Announce Type: new Abstract: Data assimilation (DA) is the problem of sequentially estimating the state of a dynamical system from noisy observations. Recent advances in generative modeling have inspired new approaches to DA in high-dimensional nonlinear settings, especially the ensemble score filter (EnSF). However,…
-
A pseudo-inverse of a line graph
A pseudo-inverse of a line graph arXiv:2508.09412v1 Announce Type: new Abstract: Line graphs are an alternative representation of graphs where each vertex of the original (root) graph becomes an edge. However not all graphs have a corresponding root graph, hence the transformation from graphs to line graphs is not invertible. We investigate the case when…
-
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing arXiv:2508.09228v1 Announce Type: cross Abstract: Training a single model for multilingual, multi-task speech processing (MSP) is severely hampered by conflicting objectives between tasks like speech recognition and translation. While multi-objective optimization (MOO) aims to align gradient updates, its effectiveness diminishes as the number of tasks grows, making…
-
High-Order Error Bounds for Markovian LSA with Richardson-Romberg Extrapolation
High-Order Error Bounds for Markovian LSA with Richardson-Romberg Extrapolation arXiv:2508.05570v1 Announce Type: new Abstract: In this paper, we study the bias and high-order error bounds of the Linear Stochastic Approximation (LSA) algorithm with Polyak-Ruppert (PR) averaging under Markovian noise. We focus on the version of the algorithm with constant step size $alpha$ and propose a…
-
Reinforcement Learning in MDPs with Information-Ordered Policies
Reinforcement Learning in MDPs with Information-Ordered Policies arXiv:2508.03904v1 Announce Type: new Abstract: We propose an epoch-based reinforcement learning algorithm for infinite-horizon average-cost Markov decision processes (MDPs) that leverages a partial order over a policy class. In this structure, $pi’ leq pi$ if data collected under $pi$ can be used to estimate the performance of $pi’$,…
-
A Smoothing Newton Method for Rank-one Matrix Recovery
A Smoothing Newton Method for Rank-one Matrix Recovery arXiv:2507.23017v1 Announce Type: new Abstract: We consider the phase retrieval problem, which involves recovering a rank-one positive semidefinite matrix from rank-one measurements. A recently proposed algorithm based on Bures-Wasserstein gradient descent (BWGD) exhibits superlinear convergence, but it is unstable, and existing theory can only prove local linear…
-
Statistical and Algorithmic Foundations of Reinforcement Learning
Statistical and Algorithmic Foundations of Reinforcement Learning arXiv:2507.14444v1 Announce Type: new Abstract: As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL…
-
Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization
Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization arXiv:2507.09093v1 Announce Type: new Abstract: We study convergence in high-probability of SGD-type methods in non-convex optimization and the presence of heavy-tailed noise. To combat the heavy-tailed noise, a general black-box nonlinear framework is considered, subsuming nonlinearities like sign, clipping, normalization and their smooth counterparts.…
-
Adjoint Schr”odinger Bridge Sampler
Adjoint Schr”odinger Bridge Sampler arXiv:2506.22565v1 Announce Type: new Abstract: Computational methods for learning to sample from the Boltzmann distribution — where the target distribution is known only up to an unnormalized energy function — have advanced significantly recently. Due to the lack of explicit target samples, however, prior diffusion-based methods, known as diffusion samplers, often…
-
Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19
Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19 arXiv:2506.21739v1 Announce Type: new Abstract: Authors Yi-Cheng Chen, Ping-En Lu, Cheng-Shang Chang, and Tzu-Hsuan Liu use the Finite Impulse Response (FIR) linear system filtering method to track and predict the number of people infected and recovered from COVID-19, in a…
-
The final solution of the Hitchhiker’s problem #5
The final solution of the Hitchhiker’s problem #5 arXiv:2506.20672v1 Announce Type: new Abstract: A recent survey, nicknamed “Hitchhiker’s Guide”, J.J. Arias-Garc{i}a, R. Mesiar, and B. De Baets, A hitchhiker’s guide to quasi-copulas, Fuzzy Sets and Systems 393 (2020) 1-28, has raised the rating of quasi-copula problems in the dependence modeling community in spite of the…
-
Zeroth-Order Optimization Finds Flat Minima
Zeroth-Order Optimization Finds Flat Minima arXiv:2506.05454v1 Announce Type: cross Abstract: Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known on the implicit…
-
Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference
Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference arXiv:2505.21721v1 Announce Type: new Abstract: We prove that, given a mean-field location-scale variational family, black-box variational inference (BBVI) with the reparametrization gradient converges at an almost dimension-independent rate. Specifically, for strongly log-concave and log-smooth targets, the number of iterations for BBVI with a sub-Gaussian family to achieve…
-
Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling
Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling arXiv:2505.18327v1 Announce Type: new Abstract: Constrained stochastic nonlinear optimization problems have attracted significant attention for their ability to model complex real-world scenarios in physics, economics, and biology. As datasets continue to grow, online inference methods have become crucial for enabling real-time decision-making without the need…
-
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine arXiv:2505.17283v1 Announce Type: new Abstract: Randomized clinical trials often require large patient cohorts before drawing definitive conclusions, yet abundant observational data from parallel studies remains underutilized due to confounding and hidden biases. To bridge this gap, we propose Deconfounded Warm-Start Thompson Sampling (DWTS), a practical approach…
-
Dimension-adapted Momentum Outscales SGD
Dimension-adapted Momentum Outscales SGD arXiv:2505.16098v1 Announce Type: new Abstract: We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target…
-
Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization
Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization arXiv:2505.11089v1 Announce Type: new Abstract: In this paper, we consider a score-based Integer Programming (IP) approach for solving the Bayesian Network Structure Learning (BNSL) problem. State-of-the-art BNSL IP formulations suffer from the exponentially large number of variables and constraints. A standard approach in IP…
-
A Scalable Gradient-Based Optimization Framework for Sparse Minimum-Variance Portfolio Selection
A Scalable Gradient-Based Optimization Framework for Sparse Minimum-Variance Portfolio Selection arXiv:2505.10099v1 Announce Type: new Abstract: Portfolio optimization involves selecting asset weights to minimize a risk-reward objective, such as the portfolio variance in the classical minimum-variance framework. Sparse portfolio selection extends this by imposing a cardinality constraint: only $k$ assets from a universe of $p$ may…
-
Optimal Transport for Machine Learners
Optimal Transport for Machine Learners arXiv:2505.06589v1 Announce Type: new Abstract: Optimal Transport is a foundational mathematical theory that connects optimization, partial differential equations, and probability. It offers a powerful framework for comparing probability distributions and has recently become an important tool in machine learning, especially for designing and evaluating generative models. These course notes cover…
-
Safe-EF: Error Feedback for Nonsmooth Constrained Optimization
Safe-EF: Error Feedback for Nonsmooth Constrained Optimization arXiv:2505.06053v1 Announce Type: cross Abstract: Federated learning faces severe communication bottlenecks due to the high dimensionality of model updates. Communication compression with contractive compressors (e.g., Top-K) is often preferable in practice but can degrade performance without proper handling. Error feedback (EF) mitigates such issues but has been largely…
-
Mixed-Integer Optimization for Responsible Machine Learning
Mixed-Integer Optimization for Responsible Machine Learning arXiv:2505.05857v1 Announce Type: cross Abstract: In the last few decades, Machine Learning (ML) has achieved significant success across domains ranging from healthcare, sustainability, and the social sciences, to criminal justice and finance. But its deployment in increasingly sophisticated, critical, and sensitive areas affecting individuals, the groups they belong to,…
-
Inference for max-linear Bayesian networks with noise
Inference for max-linear Bayesian networks with noise arXiv:2505.00229v1 Announce Type: new Abstract: Max-Linear Bayesian Networks (MLBNs) provide a powerful framework for causal inference in extreme-value settings; we consider MLBNs with noise parameters with a given topology in terms of the max-plus algebra by taking its logarithm. Then, we show that an estimator of a parameter…
-
No-Regret Generative Modeling via Parabolic Monge-Amp`ere PDE
No-Regret Generative Modeling via Parabolic Monge-Amp`ere PDE arXiv:2504.09279v1 Announce Type: new Abstract: We introduce a novel generative modeling framework based on a discretized parabolic Monge-Amp`ere PDE, which emerges as a continuous limit of the Sinkhorn algorithm commonly used in optimal transport. Our method performs iterative refinement in the space of Brenier maps using a mirror…
-
On Model Protection in Federated Learning against Eavesdropping Attacks
On Model Protection in Federated Learning against Eavesdropping Attacks arXiv:2504.02114v1 Announce Type: cross Abstract: In this study, we investigate the protection offered by federated learning algorithms against eavesdropping adversaries. In our model, the adversary is capable of intercepting model updates transmitted from clients to the server, enabling it to create its own estimate of the…
-
DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization
DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization arXiv:2503.23430v1 Announce Type: new Abstract: Domain generalization (DG) aims to learn models that can generalize well to unseen domains by training only on a set of source domains. Sharpness-Aware Minimization (SAM) has been a popular approach for this, aiming to find flat minima in the total loss landscape.…
-
Accelerated Stein Variational Gradient Flow
Accelerated Stein Variational Gradient Flow arXiv:2503.23462v1 Announce Type: new Abstract: Stein variational gradient descent (SVGD) is a kernel-based particle method for sampling from a target distribution, e.g., in generative modeling and Bayesian inference. SVGD does not require estimating the gradient of the log-density, which is called score estimation. In practice, SVGD can be slow compared…
-
A stochastic gradient descent algorithm with random search directions
A stochastic gradient descent algorithm with random search directions arXiv:2503.19942v1 Announce Type: new Abstract: Stochastic coordinate descent algorithms are efficient methods in which each iterate is obtained by fixing most coordinates at their values from the current iteration, and approximately minimizing the objective with respect to the remaining coordinates. However, this approach is usually restricted…
-
Universal Architectures for the Learning of Polyhedral Norms and Convex Regularization Functionals
Universal Architectures for the Learning of Polyhedral Norms and Convex Regularization Functionals arXiv:2503.19190v1 Announce Type: new Abstract: This paper addresses the task of learning convex regularizers to guide the reconstruction of images from limited data. By imposing that the reconstruction be amplitude-equivariant, we narrow down the class of admissible functionals to those that can be…
-
Procrustes Wasserstein Metric: A Modified Benamou-Brenier Approach with Applications to Latent Gaussian Distributions
Procrustes Wasserstein Metric: A Modified Benamou-Brenier Approach with Applications to Latent Gaussian Distributions arXiv:2503.16580v1 Announce Type: new Abstract: We introduce a modified Benamou-Brenier type approach leading to a Wasserstein type distance that allows global invariance, specifically, isometries, and we show that the problem can be summarized to orthogonal transformations. This distance is defined by penalizing…
-
Ranking and Selection with Simultaneous Input Data Collection
Ranking and Selection with Simultaneous Input Data Collection arXiv:2503.11773v1 Announce Type: new Abstract: In this paper, we propose a general and novel formulation of ranking and selection with the existence of streaming input data. The collection of multiple streams of such data may consume different types of resources, and hence can be conducted simultaneously. To…
-
Reheated Gradient-based Discrete Sampling for Combinatorial Optimization
Reheated Gradient-based Discrete Sampling for Combinatorial Optimization arXiv:2503.04047v1 Announce Type: new Abstract: Recently, gradient-based discrete sampling has emerged as a highly efficient, general-purpose solver for various combinatorial optimization (CO) problems, achieving performance comparable to or surpassing the popular data-driven approaches. However, we identify a critical issue in these methods, which we term ”wandering in contours”.…
-
Efficient Risk-sensitive Planning via Entropic Risk Measures
Efficient Risk-sensitive Planning via Entropic Risk Measures arXiv:2502.20423v1 Announce Type: new Abstract: Risk-sensitive planning aims to identify policies maximizing some tail-focused metrics in Markov Decision Processes (MDPs). Such an optimization task can be very costly for the most widely used and interpretable metrics such as threshold probabilities or (Conditional) Values at Risk. Indeed, previous work…
-
New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition
New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition arXiv:2502.14060v1 Announce Type: new Abstract: We study fundamental limits of first-order stochastic optimization in a range of nonconvex settings, including L-smooth functions satisfying Quasar-Convexity (QC), Quadratic Growth (QG), and Restricted Secant Inequalities (RSI). While the convergence properties of standard algorithms are well-understood in deterministic regimes,…
-
Online Covariance Matrix Estimation in Sketched Newton Methods
Online Covariance Matrix Estimation in Sketched Newton Methods arXiv:2502.07114v1 Announce Type: new Abstract: Given the ubiquity of streaming data, online algorithms have been widely used for parameter estimation, with second-order methods particularly standing out for their efficiency and robustness. In this paper, we study an online sketched Newton method that leverages a randomized sketching technique…
-
Online Covariance Estimation in Nonsmooth Stochastic Approximation
Online Covariance Estimation in Nonsmooth Stochastic Approximation arXiv:2502.05305v1 Announce Type: new Abstract: We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting covariance matrix in the local minimax sense of H’ajek and Le Cam.…
-
Learning Difference-of-Convex Regularizers for Inverse Problems: A Flexible Framework with Theoretical Guarantees
Learning Difference-of-Convex Regularizers for Inverse Problems: A Flexible Framework with Theoretical Guarantees arXiv:2502.00240v1 Announce Type: new Abstract: Learning effective regularization is crucial for solving ill-posed inverse problems, which arise in a wide range of scientific and engineering applications. While data-driven methods that parameterize regularizers using deep neural networks have demonstrated strong empirical performance, they often…
-
A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization
A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization arXiv:2501.18756v1 Announce Type: new Abstract: Bayesian optimization is a widely used method for optimizing expensive black-box functions, with Expected Improvement being one of the most commonly used acquisition functions. In contrast, information-theoretic acquisition functions aim to reduce uncertainty about the function’s optimum and…
-
On the convergence of noisy Bayesian Optimization with Expected Improvement
On the convergence of noisy Bayesian Optimization with Expected Improvement arXiv:2501.09262v1 Announce Type: new Abstract: Expected improvement (EI) is one of the most widely-used acquisition functions in Bayesian optimization (BO). Despite its proven success in applications for decades, important open questions remain on the theoretical convergence behaviors and rates for EI. In this paper, we…
-
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks arXiv:2501.09137v1 Announce Type: cross Abstract: We study the gradient descent (GD) dynamics of a depth-2 linear neural network with a single input and output. We show that GD converges at an explicit linear rate to a global minimum of the training…
-
Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks
Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks arXiv:2501.05930v1 Announce Type: cross Abstract: We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows. Distinct from the fixed-space global optimality of non-convex…
-
Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity
Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity arXiv:2501.04134v1 Announce Type: new Abstract: We study the mixing time of the projected Langevin algorithm (LA) and the privacy curve of noisy Stochastic Gradient Descent (SGD), beyond nonexpansive iterations. Specifically, we derive new mixing time bounds for the projected LA…
-
Robust Multi-Dimensional Scaling via Accelerated Alternating Projections
Robust Multi-Dimensional Scaling via Accelerated Alternating Projections arXiv:2501.02208v1 Announce Type: new Abstract: We consider the robust multi-dimensional scaling (RMDS) problem in this paper. The goal is to localize point locations from pairwise distances that may be corrupted by outliers. Inspired by classic MDS theories, and nonconvex works for the robust principal component analysis (RPCA) problem,…
-
Deep Generalized Schr”odinger Bridges: From Image Generation to Solving Mean-Field Games
Deep Generalized Schr”odinger Bridges: From Image Generation to Solving Mean-Field Games arXiv:2412.20279v1 Announce Type: new Abstract: Generalized Schr”odinger Bridges (GSBs) are a fundamental mathematical framework used to analyze the most likely particle evolution based on the principle of least action including kinetic and potential energy. In parallel to their well-established presence in the theoretical realms…
-
Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces
Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces arXiv:2412.20556v1 Announce Type: new Abstract: We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous, leading to significant computational challenges due to the infinite-dimensional nature of the optimization problem. Recent research has explored learning the worst-case distribution using…
-
Localized exploration in contextual dynamic pricing achieves dimension-free regret
Localized exploration in contextual dynamic pricing achieves dimension-free regret arXiv:2412.19252v1 Announce Type: new Abstract: We study the problem of contextual dynamic pricing with a linear demand model. We propose a novel localized exploration-then-commit (LetC) algorithm which starts with a pure exploration stage, followed by a refinement stage that explores near the learned optimal pricing policy,…
-
Data-Driven Priors in the Maximum Entropy on the Mean Method for Linear Inverse Problems
Data-Driven Priors in the Maximum Entropy on the Mean Method for Linear Inverse Problems arXiv:2412.17916v1 Announce Type: new Abstract: We establish the theoretical framework for implementing the maximumn entropy on the mean (MEM) method for linear inverse problems in the setting of approximate (data-driven) priors. We prove a.s. convergence for empirical means and further develop…
-
Projected gradient methods for nonconvex and stochastic optimization: new complexities and auto-conditioned stepsizes
Projected gradient methods for nonconvex and stochastic optimization: new complexities and auto-conditioned stepsizes arXiv:2412.14291v1 Announce Type: cross Abstract: We present a novel class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex function over a convex compact set. We first provide a novel analysis of the “vanilla” PG method, achieving the…
-
An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints
An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints arXiv:2412.08060v1 Announce Type: new Abstract: We study Online Convex Optimization (OCO) with adversarial constraints, where an online algorithm must make repeated decisions to minimize both convex loss functions and cumulative constraint violations. We focus on a setting where the algorithm has access to predictions of…
-
Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application
Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application arXiv:2412.05906v1 Announce Type: new Abstract: We study the discrete-time linear-quadratic (LQ) control model using reinforcement learning (RL). Using entropy to measure the cost of exploration, we prove that the optimal feedback policy for the problem must be Gaussian type. Then, we apply the results…
-
Nonparametric Filtering, Estimation and Classification using Neural Jump ODEs
Nonparametric Filtering, Estimation and Classification using Neural Jump ODEs arXiv:2412.03271v1 Announce Type: new Abstract: Neural Jump ODEs model the conditional expectation between observations by neural ODEs and jump at arrival of new observations. They have demonstrated effectiveness for fully data-driven online forecasting in settings with irregular and partial observations, operating under weak regularity assumptions. This…
-
Explicit and data-Efficient Encoding via Gradient Flow
Explicit and data-Efficient Encoding via Gradient Flow arXiv:2412.00864v1 Announce Type: new Abstract: The autoencoder model typically uses an encoder to map data to a lower dimensional latent space and a decoder to reconstruct it. However, relying on an encoder for inversion can lead to suboptimal representations, particularly limiting in physical sciences where precision is key.…