Category: math.PR

  • Bayesian Modeling of Collatz Stopping Times: A Probabilistic Machine Learning Perspective

    Bayesian Modeling of Collatz Stopping Times: A Probabilistic Machine Learning Perspective arXiv:2603.04479v1 Announce Type: new Abstract: We study the Collatz total stopping time $\tau(n)$ over $n \le 10^7$ from a probabilistic machine learning viewpoint. Empirically, $\tau(n)$ is a skewed and heavily overdispersed count with pronounced arithmetic heterogeneity. We develop two complementary models. First, a Bayesian hierarchical…

  • The Partition Principle Revisited: Non-Equal Volume Designs Achieve Minimal Expected Star Discrepancy

    The Partition Principle Revisited: Non-Equal Volume Designs Achieve Minimal Expected Star Discrepancy arXiv:2603.00202v1 Announce Type: new Abstract: We study the expected star discrepancy under a newly designed class of non-equal volume partitions. The main contributions are twofold. First, we establish a strong partition principle for the star discrepancy, showing that our newly designed non-equal volume…

  • Deep Neural Networks as Iterated Function Systems and a Generalization Bound

    Deep Neural Networks as Iterated Function Systems and a Generalization Bound arXiv:2601.19958v1 Announce Type: new Abstract: Deep neural networks (DNNs) achieve remarkable performance on a wide range of tasks, yet their mathematical analysis remains fragmented: stability and generalization are typically studied in disparate frameworks and on a case-by-case basis. Architecturally, DNNs rely on the recursive…

  • Distributional Computational Graphs: Error Bounds

    Distributional Computational Graphs: Error Bounds arXiv:2601.16250v1 Announce Type: new Abstract: We study a general framework of distributional computational graphs: computational graphs whose inputs are probability distributions rather than point values. We analyze the discretization error that arises when these graphs are evaluated using finite approximations of continuous probability distributions. Such an approximation might be the…

  • Parametric RDT approach to computational gap of symmetric binary perceptron

    Parametric RDT approach to computational gap of symmetric binary perceptron arXiv:2601.10628v1 Announce Type: new Abstract: We study potential presence of statistical-computational gaps (SCG) in symmetric binary perceptrons (SBP) via a parametric utilization of \emph{fully lifted random duality theory} (fl-RDT) [96]. A structural change from decreasingly to arbitrarily ordered $c$-sequence (a key fl-RDT parametric component) is…

  • Tail-Sensitive KL and Rényi Convergence of Unadjusted Hamiltonian Monte Carlo via One-Shot Couplings

    Tail-Sensitive KL and Rényi Convergence of Unadjusted Hamiltonian Monte Carlo via One-Shot Couplings arXiv:2601.09019v1 Announce Type: new Abstract: Hamiltonian Monte Carlo (HMC) algorithms are among the most widely used sampling methods in high dimensional settings, yet their convergence properties are poorly understood in divergences that quantify relative density mismatch, such as Kullback-Leibler (KL) and Rényi…

  • SCaLE: Switching Cost aware Learning and Exploration

    SCaLE: Switching Cost aware Learning and Exploration arXiv:2601.09042v1 Announce Type: cross Abstract: This work addresses the fundamental problem of unbounded metric movement costs in bandit online convex optimization, by considering high-dimensional dynamic quadratic hitting costs and $\ell_2$-norm switching costs in a noisy bandit feedback model. For a general class of stochastic environments, we provide the…

  • Constrained Density Estimation via Optimal Transport

    Constrained Density Estimation via Optimal Transport arXiv:2601.06830v1 Announce Type: new Abstract: A novel framework for density estimation under expectation constraints is proposed. The framework minimizes the Wasserstein distance between the estimated density and a prior, subject to the constraints that the expected value of a set of functions adopts or exceeds given values. The framework…

  • Detecting Stochasticity in Discrete Signals via Nonparametric Excursion Theorem

    Detecting Stochasticity in Discrete Signals via Nonparametric Excursion Theorem arXiv:2601.06009v1 Announce Type: new Abstract: We develop a practical framework for distinguishing diffusive stochastic processes from deterministic signals using only a single discrete time series. Our approach is based on classical excursion and crossing theorems for continuous semimartingales, which correlates number $N_\varepsilon$ of excursions of magnitude…

  • Constructive Approximation of Random Process via Stochastic Interpolation Neural Network Operators

    Constructive Approximation of Random Process via Stochastic Interpolation Neural Network Operators arXiv:2512.24106v1 Announce Type: new Abstract: In this paper, we construct a class of stochastic interpolation neural network operators (SINNOs) with random coefficients activated by sigmoidal functions. We establish their boundedness, interpolation accuracy, and approximation capabilities in the mean square sense, in probability, as well…

  • Learning from Neighbors with PHIBP: Predicting Infectious Disease Dynamics in Data-Sparse Environments

    Learning from Neighbors with PHIBP: Predicting Infectious Disease Dynamics in Data-Sparse Environments arXiv:2512.21005v1 Announce Type: new Abstract: Modeling sparse count data, which arise across numerous scientific fields, presents significant statistical challenges. This chapter addresses these challenges in the context of infectious disease prediction, with a focus on predicting outbreaks in geographic regions that have historically…

  • Sampling from multimodal distributions with warm starts: Non-asymptotic bounds for the Reweighted Annealed Leap-Point Sampler

    Sampling from multimodal distributions with warm starts: Non-asymptotic bounds for the Reweighted Annealed Leap-Point Sampler arXiv:2512.17977v1 Announce Type: new Abstract: Sampling from multimodal distributions is a central challenge in Bayesian inference and machine learning. In light of hardness results for sampling — classical MCMC methods, even with tempering, can suffer from exponential mixing times —…

  • On The Hidden Biases of Flow Matching Samplers

    On The Hidden Biases of Flow Matching Samplers arXiv:2512.16768v1 Announce Type: new Abstract: We study the implicit bias of flow matching (FM) samplers via the lens of empirical flow matching. Although population FM may produce gradient-field velocities resembling optimal transport (OT), we show that the empirical FM minimizer is almost never a gradient field, even…

  • Error Analysis of Generalized Langevin Equations with Approximated Memory Kernels

    Error Analysis of Generalized Langevin Equations with Approximated Memory Kernels arXiv:2512.10256v1 Announce Type: new Abstract: We analyze prediction error in stochastic dynamical systems with memory, focusing on generalized Langevin equations (GLEs) formulated as stochastic Volterra equations. We establish that, under a strongly convex potential, trajectory discrepancies decay at a rate determined by the decay of…

  • Provable Diffusion Posterior Sampling for Bayesian Inversion

    Provable Diffusion Posterior Sampling for Bayesian Inversion arXiv:2512.08022v1 Announce Type: new Abstract: This paper proposes a novel diffusion-based posterior sampling method within a plug-and-play (PnP) framework. Our approach constructs a probability transport from an easy-to-sample terminal distribution to the target posterior, using a warm-start strategy to initialize the particles. To approximate the posterior score, we…

  • How to Tame Your LLM: Semantic Collapse in Continuous Systems

    How to Tame Your LLM: Semantic Collapse in Continuous Systems arXiv:2512.05162v1 Announce Type: new Abstract: We develop a general theory of semantic dynamics for large language models by formalizing them as Continuous State Machines (CSMs): smooth dynamical systems whose latent manifolds evolve under probabilistic transition operators. The associated transfer operator $P: L^2(M,\mu) \to L^2(M,\mu)$ encodes…

  • Novelty detection on path space

    Novelty detection on path space arXiv:2512.03243v1 Announce Type: new Abstract: We frame novelty detection on path space as a hypothesis testing problem with signature-based test statistics. Using transportation-cost inequalities of Gasteratos and Jacquier (2023), we obtain tail bounds for false positive rates that extend beyond Gaussian measures to laws of RDE solutions with smooth bounded…

  • Algorithms and Scientific Software for Quasi-Monte Carlo, Fast Gaussian Process Regression, and Scientific Machine Learning

    Algorithms and Scientific Software for Quasi-Monte Carlo, Fast Gaussian Process Regression, and Scientific Machine Learning arXiv:2511.21915v1 Announce Type: new Abstract: Most scientific domains elicit the development of efficient algorithms and accessible scientific software. This thesis unifies our developments in three broad domains: Quasi-Monte Carlo (QMC) methods for efficient high-dimensional integration, Gaussian process (GP) regression for…

  • Precise asymptotic analysis of Sobolev training for random feature models

    Precise asymptotic analysis of Sobolev training for random feature models arXiv:2511.03050v1 Announce Type: new Abstract: Gradient information is widely useful and available in applications, and is therefore natural to include in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training — regression with both function and gradient data…

  • Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks

    Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks arXiv:2511.02258v1 Announce Type: new Abstract: This paper studies the high-dimensional scaling limits of online stochastic gradient descent (SGD) for single-layer networks. Building on the seminal work of Saad and Solla, which analyzed the deterministic (ballistic) scaling limits of SGD corresponding to the gradient flow of…

  • Accuracy estimation of neural networks by extreme value theory

    Accuracy estimation of neural networks by extreme value theory arXiv:2511.00490v1 Announce Type: new Abstract: Neural networks are able to approximate any continuous function on a compact set. However, it is not obvious how to quantify the error of the neural network, i.e., the remaining bias between the function and the neural network. Here, we propose…

  • Exponential Convergence Guarantees for Iterative Markovian Fitting

    Exponential Convergence Guarantees for Iterative Markovian Fitting arXiv:2510.20871v1 Announce Type: new Abstract: The Schrödinger Bridge (SB) problem has become a fundamental tool in computational optimal transport and generative modeling. To address this problem, ideal methods such as Iterative Proportional Fitting and Iterative Markovian Fitting (IMF) have been proposed, alongside practical approximations like Diffusion Schrödinger Bridge and…

  • Exact Dynamics of Multi-class Stochastic Gradient Descent

    Exact Dynamics of Multi-class Stochastic Gradient Descent arXiv:2510.14074v1 Announce Type: new Abstract: We develop a framework for analyzing the training and learning rate dynamics on a variety of high-dimensional optimization problems trained using one-pass stochastic gradient descent (SGD) with data generated from multiple anisotropic classes. We give exact expressions for a large class of…

  • Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models

    Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models arXiv:2510.11789v1 Announce Type: new Abstract: We study the convergence rate of learning pairwise interactions in single-layer attention-style models, where tokens interact through a weight matrix and a non-linear activation function. We prove that the minimax rate is $M^{-\frac{2\beta}{2\beta+1}}$ with $M$ being the sample size, depending…

  • Distributionally robust approximation property of neural networks

    Distributionally robust approximation property of neural networks arXiv:2510.09177v1 Announce Type: new Abstract: The universal approximation property uniformly with respect to weakly compact families of measures is established for several classes of neural networks. To that end, we prove that these neural networks are dense in Orlicz spaces, thereby extending classical universal approximation theorems even beyond…

  • Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix

    Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix arXiv:2510.06685v1 Announce Type: new Abstract: Self-attention layers have become fundamental building blocks of modern deep neural networks, yet their theoretical understanding remains limited, particularly from the perspective of random matrix theory. In this work, we provide a rigorous analysis of the singular value spectrum of…

  • Minima and Critical Points of the Bethe Free Energy Are Invariant Under Deformation Retractions of Factor Graphs

    Minima and Critical Points of the Bethe Free Energy Are Invariant Under Deformation Retractions of Factor Graphs arXiv:2510.05380v1 Announce Type: new Abstract: In graphical models, factor graphs, and more generally energy-based models, the interactions between variables are encoded by a graph, a hypergraph, or, in the most general case, a partially ordered set (poset). Inference…

  • Concept activation vectors: a unifying view and adversarial attacks

    Concept activation vectors: a unifying view and adversarial attacks arXiv:2509.22755v1 Announce Type: new Abstract: Concept Activation Vectors (CAVs) are a tool from explainable AI, offering a promising approach for understanding how human-understandable concepts are encoded in a model’s latent spaces. They are computed from hidden-layer activations of inputs belonging either to a concept class or…

  • Effective continuous equations for adaptive SGD: a stochastic analysis view

    Effective continuous equations for adaptive SGD: a stochastic analysis view arXiv:2509.21614v1 Announce Type: new Abstract: We present a theoretical analysis of some popular adaptive Stochastic Gradient Descent (SGD) methods in the small learning rate regime. Using the stochastic modified equations framework introduced by Li et al., we derive effective continuous stochastic dynamics for these methods.…

  • Anchored Langevin Algorithms

    Anchored Langevin Algorithms arXiv:2509.19455v1 Announce Type: new Abstract: Standard first-order Langevin algorithms such as the unadjusted Langevin algorithm (ULA) are obtained by discretizing the Langevin diffusion and are widely used for sampling in machine learning because they scale to high dimensions and large datasets. However, they face two key limitations: (i) they require differentiable log-densities,…

  • Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities

    Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities arXiv:2509.15822v1 Announce Type: new Abstract: Predictions from statistical physics postulate that recovery of the communities in Stochastic Block Model (SBM) is possible in polynomial time above, and only above, the Kesten-Stigum (KS) threshold. This conjecture has given rise to a rich literature, proving that…

  • A hierarchical entropy method for the delocalization of bias in high-dimensional Langevin Monte Carlo

    A hierarchical entropy method for the delocalization of bias in high-dimensional Langevin Monte Carlo arXiv:2509.08619v1 Announce Type: new Abstract: The unadjusted Langevin algorithm is widely used for sampling from complex high-dimensional distributions. It is well known to be biased, with the bias typically scaling linearly with the dimension when measured in squared Wasserstein distance. However,…

  • An invertible generative model for forward and inverse problems

    An invertible generative model for forward and inverse problems arXiv:2509.03910v1 Announce Type: new Abstract: We formulate the inverse problem in a Bayesian framework and aim to train a generative model that allows us to simulate (i.e., sample from the likelihood) and do inference (i.e., sample from the posterior). We review the use of triangular normalizing…

  • Scale-Adaptive Generative Flows for Multiscale Scientific Data

    Scale-Adaptive Generative Flows for Multiscale Scientific Data arXiv:2509.02971v1 Announce Type: new Abstract: Flow-based generative models can face significant challenges when modeling scientific data with multiscale Fourier spectra, often producing large errors in fine-scale features. We address this problem within the framework of stochastic interpolants, via principled design of noise distributions and interpolation schedules. The key…

  • Assessing One-Dimensional Cluster Stability by Extreme-Point Trimming

    Assessing One-Dimensional Cluster Stability by Extreme-Point Trimming arXiv:2509.00258v1 Announce Type: new Abstract: We develop a probabilistic method for assessing the tail behavior and geometric stability of one-dimensional n i.i.d. samples by tracking how their span contracts when the most extreme points are trimmed. Central to our approach is the diameter-shrinkage ratio, which quantifies the relative…

  • Underdamped Langevin MCMC with third order convergence

    Underdamped Langevin MCMC with third order convergence arXiv:2508.16485v1 Announce Type: new Abstract: In this paper, we propose a new numerical method for the underdamped Langevin diffusion (ULD) and present a non-asymptotic analysis of its sampling error in the 2-Wasserstein distance when the $d$-dimensional target distribution $p(x) \propto e^{-f(x)}$ is strongly log-concave and has varying degrees of…

  • Optimal Subspace Embeddings: Resolving Nelson-Nguyen Conjecture Up to Sub-Polylogarithmic Factors

    Optimal Subspace Embeddings: Resolving Nelson-Nguyen Conjecture Up to Sub-Polylogarithmic Factors arXiv:2508.14234v1 Announce Type: cross Abstract: We give a proof of the conjecture of Nelson and Nguyen [FOCS 2013] on the optimal dimension and sparsity of oblivious subspace embeddings, up to sub-polylogarithmic factors: For any $n \geq d$ and $\epsilon \geq d^{-O(1)}$, there is a random $\tilde O(d/\epsilon^2) \times…

  • Nonparametric learning of stochastic differential equations from sparse and noisy data

    Nonparametric learning of stochastic differential equations from sparse and noisy data arXiv:2508.11597v1 Announce Type: new Abstract: The paper proposes a systematic framework for building data-driven stochastic differential equation (SDE) models from sparse, noisy observations. Unlike traditional parametric approaches, which assume a known functional form for the drift, our goal here is to learn the entire…

  • Dimension-Free Bounds for Generalized First-Order Methods via Gaussian Coupling

    Dimension-Free Bounds for Generalized First-Order Methods via Gaussian Coupling arXiv:2508.10782v1 Announce Type: new Abstract: We establish non-asymptotic bounds on the finite-sample behavior of generalized first-order iterative algorithms — including gradient-based optimization methods and approximate message passing (AMP) — with Gaussian data matrices and full-memory, non-separable nonlinearities. The central result constructs an explicit coupling between the…

  • On Experiments

    On Experiments arXiv:2508.08288v1 Announce Type: new Abstract: The scientific process is a means for turning the results of experiments into knowledge about the world in which we live. Much research effort has been directed toward automating this process. To do this, one needs to formulate the scientific process in a precise mathematical language. This paper…

  • Inequalities for Optimization of Classification Algorithms: A Perspective Motivated by Diagnostic Testing

    Inequalities for Optimization of Classification Algorithms: A Perspective Motivated by Diagnostic Testing arXiv:2508.01065v1 Announce Type: new Abstract: Motivated by canonical problems in medical diagnostics, we propose and study properties of an objective function that uniformly bounds uncertainties in quantities of interest extracted from classifiers and related data analysis tools. We begin by adopting a set-theoretic…

  • Regime-Aware Conditional Neural Processes with Multi-Criteria Decision Support for Operational Electricity Price Forecasting

    Regime-Aware Conditional Neural Processes with Multi-Criteria Decision Support for Operational Electricity Price Forecasting arXiv:2508.00040v1 Announce Type: cross Abstract: This work integrates Bayesian regime detection with conditional neural processes for 24-hour electricity price prediction in the German market. Our methodology integrates regime detection using a disentangled sticky hierarchical Dirichlet process hidden Markov model (DS-HDP-HMM) applied to…

  • Simulating Posterior Bayesian Neural Networks with Dependent Weights

    Simulating Posterior Bayesian Neural Networks with Dependent Weights arXiv:2507.22095v1 Announce Type: new Abstract: In this paper we consider posterior Bayesian fully connected and feedforward deep neural networks with dependent weights. Particularly, if the likelihood is Gaussian, we identify the distribution of the wide width limit and provide an algorithm to sample from the network. In…

  • Central limit theorems for the eigenvalues of graph Laplacians on data clouds

    Central limit theorems for the eigenvalues of graph Laplacians on data clouds arXiv:2507.18803v1 Announce Type: new Abstract: Given i.i.d. samples $X_n = \{ x_1, \dots, x_n \}$ from a distribution supported on a low dimensional manifold ${M}$ embedded in Euclidean space, we consider the graph Laplacian operator $\Delta_n$ associated to an $\varepsilon$-proximity graph over $X_n$ and…

  • Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights

    Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights arXiv:2507.12686v1 Announce Type: new Abstract: We study the Finite-Dimensional Distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments. Specifically, we establish Gaussian approximation bounds in the Wasserstein-$1$ norm between the FDDs and their Gaussian limit assuming a Lipschitz activation…

  • Mallows Model with Learned Distance Metrics: Sampling and Maximum Likelihood Estimation

    Mallows Model with Learned Distance Metrics: Sampling and Maximum Likelihood Estimation arXiv:2507.08108v1 Announce Type: new Abstract: \textit{Mallows model} is a widely-used probabilistic framework for learning from ranking data, with applications ranging from recommendation systems and voting to aligning language models with human preferences~\cite{chen2024mallows, kleinberg2021algorithmic, rafailov2024direct}. Under this model, observed rankings are noisy perturbations of a…

  • A Malliavin calculus approach to score functions in diffusion generative models

    A Malliavin calculus approach to score functions in diffusion generative models arXiv:2507.05550v1 Announce Type: new Abstract: Score-based diffusion generative models have recently emerged as a powerful tool for modelling complex data distributions. These models aim at learning the score function, which defines a map from a known probability distribution to the target data distribution via…

  • Asymptotic convexity of wide and shallow neural networks

    Asymptotic convexity of wide and shallow neural networks arXiv:2507.01044v1 Announce Type: new Abstract: For a simple model of shallow and wide neural networks, we show that the epigraph of its input-output map as a function of the network parameters approximates the epigraph of a convex function in a precise sense. This leads to a plausible explanation…

  • Strategic A/B testing via Maximum Probability-driven Two-armed Bandit

    Strategic A/B testing via Maximum Probability-driven Two-armed Bandit arXiv:2506.22536v1 Announce Type: new Abstract: Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because of their…

  • Data-Driven Dynamic Factor Modeling via Manifold Learning

    Data-Driven Dynamic Factor Modeling via Manifold Learning arXiv:2506.19945v1 Announce Type: new Abstract: We propose a data-driven dynamic factor framework where a response variable depends on a high-dimensional set of covariates, without imposing any parametric model on the joint dynamics. Leveraging Anisotropic Diffusion Maps, a nonlinear manifold learning technique introduced by Singer and Coifman, our framework…

  • Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks

    Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks arXiv:2506.19695v1 Announce Type: new Abstract: This paper studies the $\ell^p$-Lipschitz constants of ReLU neural networks $\Phi: \mathbb{R}^d \to \mathbb{R}$ with random parameters for $p \in [1,\infty]$. The distribution of the weights follows a variant of the He initialization and the biases are drawn…

  • Gaussian Processes and Reproducing Kernels: Connections and Equivalences

    Gaussian Processes and Reproducing Kernels: Connections and Equivalences arXiv:2506.17366v1 Announce Type: new Abstract: This monograph studies the relations between two approaches using positive definite kernels: probabilistic methods using Gaussian processes, and non-probabilistic methods using reproducing kernel Hilbert spaces (RKHS). They are widely studied and used in machine learning, statistics, and numerical analysis. Connections and equivalences…

  • Scalable Machine Learning Algorithms using Path Signatures

    Scalable Machine Learning Algorithms using Path Signatures arXiv:2506.17634v1 Announce Type: new Abstract: The interface between stochastic analysis and machine learning is a rapidly evolving field, with path signatures – iterated integrals that provide faithful, hierarchical representations of paths – offering a principled and universal feature map for sequential and structured data. Rooted in rough path…

  • Sampling conditioned diffusions via Pathspace Projected Monte Carlo

    Sampling conditioned diffusions via Pathspace Projected Monte Carlo arXiv:2506.15743v1 Announce Type: new Abstract: We present an algorithm to sample stochastic differential equations conditioned on rather general constraints, including integral constraints, endpoint constraints, and stochastic integral constraints. The algorithm is a pathspace Metropolis-adjusted manifold sampling scheme, which samples stochastic paths on the submanifold of realizations that…

  • Rademacher learning rates for iterated random functions

    Rademacher learning rates for iterated random functions arXiv:2506.13946v1 Announce Type: new Abstract: Most existing literature on supervised machine learning assumes that the training dataset is drawn from an i.i.d. sample. However, many real-world problems exhibit temporal dependence and strong correlations between the marginal distributions of the data-generating process, suggesting that the i.i.d. assumption is often…

  • Enabling Probabilistic Learning on Manifolds through Double Diffusion Maps

    Enabling Probabilistic Learning on Manifolds through Double Diffusion Maps arXiv:2506.02254v1 Announce Type: new Abstract: We present a generative learning framework for probabilistic sampling based on an extension of the Probabilistic Learning on Manifolds (PLoM) approach, which is designed to generate statistically consistent realizations of a random vector in a finite-dimensional Euclidean space, informed by a…

  • A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging

    A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging arXiv:2505.21796v1 Announce Type: new Abstract: Polyak-Ruppert averaging is a widely used technique to achieve the optimal asymptotic variance of stochastic approximation (SA) algorithms, yet its high-probability performance guarantees remain underexplored in general settings. In this paper, we present a general framework for establishing…

  • Liouville PDE-based sliced-Wasserstein flow for fair regression

    Liouville PDE-based sliced-Wasserstein flow for fair regression arXiv:2505.17204v1 Announce Type: new Abstract: The sliced Wasserstein flow (SWF), a nonparametric and implicit generative gradient flow, is applied to fair regression. We have improved the SWF in a few aspects. First, the stochastic diffusive term from the Fokker-Planck equation-based Monte Carlo is transformed to Liouville partial differential…

  • An Exponential Averaging Process with Strong Convergence Properties

    An Exponential Averaging Process with Strong Convergence Properties arXiv:2505.10605v1 Announce Type: new Abstract: Averaging, or smoothing, is a fundamental approach to obtain stable, de-noised estimates from noisy observations. In certain scenarios, observations made along trajectories of random dynamical systems are of particular interest. One popular smoothing technique for such a scenario is exponential moving averaging…

  • Minimax learning rates for estimating binary classifiers under margin conditions

    Minimax learning rates for estimating binary classifiers under margin conditions arXiv:2505.10628v1 Announce Type: new Abstract: We study classification problems using binary estimators where the decision boundary is described by horizon functions and where the data distribution satisfies a geometric margin condition. We establish upper and lower bounds for the minimax learning rate over broad function…

  • Optimal Transport-Based Domain Adaptation for Rotated Linear Regression

    Optimal Transport-Based Domain Adaptation for Rotated Linear Regression arXiv:2505.09229v1 Announce Type: new Abstract: Optimal Transport (OT) has proven effective for domain adaptation (DA) by aligning distributions across domains with differing statistical properties. Building on the approach of Courty et al. (2016), who mapped source data to the target domain for improved model transfer, we focus…

  • Diffusion-based supervised learning of generative models for efficient sampling of multimodal distributions

    Diffusion-based supervised learning of generative models for efficient sampling of multimodal distributions arXiv:2505.07825v1 Announce Type: new Abstract: We propose a hybrid generative model for efficient sampling of high-dimensional, multimodal probability distributions for Bayesian inference. Traditional Monte Carlo methods, such as the Metropolis-Hastings and Langevin Monte Carlo sampling methods, are effective for sampling from single-mode distributions…

  • Feature Representation Transferring to Lightweight Models via Perception Coherence

    Feature Representation Transferring to Lightweight Models via Perception Coherence arXiv:2505.06595v1 Announce Type: new Abstract: In this paper, we propose a method for transferring feature representation to lightweight student models from larger teacher models. We mathematically define a new notion called \textit{perception coherence}. Based on this notion, we propose a loss function, which takes into account…

  • Physics-Informed Inference Time Scaling via Simulation-Calibrated Scientific Machine Learning

    Physics-Informed Inference Time Scaling via Simulation-Calibrated Scientific Machine Learning arXiv:2504.16172v1 Announce Type: cross Abstract: High-dimensional partial differential equations (PDEs) pose significant computational challenges across fields ranging from quantum chemistry to economics and finance. Although scientific machine learning (SciML) techniques offer approximate solutions, they often suffer from bias and neglect crucial physical insights. Inspired by inference-time…

  • Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents

    Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents arXiv:2504.07347v1 Announce Type: new Abstract: As demand for Large Language Models (LLMs) and AI agents rapidly grows, optimizing systems for efficient LLM inference becomes critical. While significant efforts have targeted system-level engineering, little is explored through a mathematical modeling and queuing perspective. In this paper, we…

  • Performance of Rank-One Tensor Approximation on Incomplete Data

    Performance of Rank-One Tensor Approximation on Incomplete Data arXiv:2504.07818v1 Announce Type: new Abstract: We are interested in the estimation of a rank-one tensor signal when only a portion $\varepsilon$ of its noisy observation is available. We show that the study of this problem can be reduced to that of a random matrix model whose spectral…

  • Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows

    Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows arXiv:2504.07820v1 Announce Type: new Abstract: Negative distance kernels $K(x,y) := -|x-y|$ were used in the definition of maximum mean discrepancies (MMDs) in statistics and lead to favorable numerical results in various applications. In particular, so-called slicing techniques for handling high-dimensional kernel summations profit…

  • High-dimensional ridge regression with random features for non-identically distributed data with a variance profile

    High-dimensional ridge regression with random features for non-identically distributed data with a variance profile arXiv:2504.03035v1 Announce Type: new Abstract: The behavior of the random feature model in the high-dimensional regression framework has become a popular issue of interest in the machine learning literature. This model is generally considered for feature vectors $x_i = \Sigma^{1/2} x_i'$,…

  • A computational transition for detecting multivariate shuffled linear regression by low-degree polynomials

    A computational transition for detecting multivariate shuffled linear regression by low-degree polynomials arXiv:2504.03097v1 Announce Type: new Abstract: In this paper, we study the problem of multivariate shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we investigate the model $Y=\tfrac{1}{\sqrt{1+\sigma^2}}(\Pi_* X Q_* +…

  • Denoising guarantees for optimized sampling schemes in compressed sensing

    Denoising guarantees for optimized sampling schemes in compressed sensing arXiv:2504.01046v1 Announce Type: new Abstract: Compressed sensing with subsampled unitary matrices benefits from \emph{optimized} sampling schemes, which feature improved theoretical guarantees and empirical performance relative to uniform subsampling. We provide, in a first of its kind in compressed sensing, theoretical guarantees showing that the error caused…

  • Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions

    Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions arXiv:2503.23896v1 Announce Type: new Abstract: Deep neural networks learn structured features from complex, non-Gaussian inputs, but the mechanisms behind this process remain poorly understood. Our work is motivated by the observation that the first-layer filters learnt by deep convolutional neural networks…

  • A stochastic gradient descent algorithm with random search directions

    A stochastic gradient descent algorithm with random search directions arXiv:2503.19942v1 Announce Type: new Abstract: Stochastic coordinate descent algorithms are efficient methods in which each iterate is obtained by fixing most coordinates at their values from the current iteration, and approximately minimizing the objective with respect to the remaining coordinates. However, this approach is usually restricted…

  • Procrustes Wasserstein Metric: A Modified Benamou-Brenier Approach with Applications to Latent Gaussian Distributions

    Procrustes Wasserstein Metric: A Modified Benamou-Brenier Approach with Applications to Latent Gaussian Distributions arXiv:2503.16580v1 Announce Type: new Abstract: We introduce a modified Benamou-Brenier type approach leading to a Wasserstein type distance that allows global invariance, specifically, isometries, and we show that the problem can be summarized to orthogonal transformations. This distance is defined by penalizing…

  • Optimal Nonlinear Online Learning under Sequential Price Competition via s-Concavity

    Optimal Nonlinear Online Learning under Sequential Price Competition via s-Concavity arXiv:2503.16737v1 Announce Type: new Abstract: We consider price competition among multiple sellers over a selling horizon of $T$ periods. In each period, sellers simultaneously offer their prices and subsequently observe their respective demand that is unobservable to competitors. The demand function for each seller depends…

  • Nonlinear Bayesian Update via Ensemble Kernel Regression with Clustering and Subsampling

    Nonlinear Bayesian Update via Ensemble Kernel Regression with Clustering and Subsampling arXiv:2503.15160v1 Announce Type: new Abstract: Nonlinear Bayesian update for a prior ensemble is proposed to extend traditional ensemble Kalman filtering to settings characterized by non-Gaussian priors and nonlinear measurement operators. In this framework, the observed component is first denoised via a standard Kalman update,…

  • On Statistical Estimation of Edge-Reinforced Random Walks

    On Statistical Estimation of Edge-Reinforced Random Walks arXiv:2503.06115v1 Announce Type: new Abstract: Reinforced random walks (RRWs), including vertex-reinforced random walks (VRRWs) and edge-reinforced random walks (ERRWs), model random walks where the transition probabilities evolve based on prior visitation history~\cite{mgr, fmk, tarres, volkov}. These models have found applications in various areas, such as network representation learning~\cite{xzzs},…

  • A characterization of sample adaptivity in UCB data

    A characterization of sample adaptivity in UCB data arXiv:2503.04855v1 Announce Type: new Abstract: We characterize a joint CLT of the number of pulls and the sample mean reward of the arms in a stochastic two-armed bandit environment under UCB algorithms. Several implications of this result are in place: (1) a nonstandard CLT of the number…

  • Applications of Entropy in Data Analysis and Machine Learning: A Review

    Applications of Entropy in Data Analysis and Machine Learning: A Review arXiv:2503.02921v1 Announce Type: new Abstract: Since its origin in the thermodynamics of the 19th century, the concept of entropy has also permeated other fields of physics and mathematics, such as Classical and Quantum Statistical Mechanics, Information Theory, Probability Theory, Ergodic Theory and the Theory…

  • Efficient Risk-sensitive Planning via Entropic Risk Measures

    Efficient Risk-sensitive Planning via Entropic Risk Measures arXiv:2502.20423v1 Announce Type: new Abstract: Risk-sensitive planning aims to identify policies maximizing some tail-focused metrics in Markov Decision Processes (MDPs). Such an optimization task can be very costly for the most widely used and interpretable metrics such as threshold probabilities or (Conditional) Values at Risk. Indeed, previous work…

  • Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs

    Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs arXiv:2502.09832v1 Announce Type: new Abstract: In this paper, assuming a natural strengthening of the low-degree conjecture, we provide evidence of computational hardness for two problems: (1) the (partial) matching recovery problem in the sparse correlated Erdős-Rényi graphs $\mathcal{G}(n,q;\rho)$ when the edge-density $q=n^{-1+o(1)}$ and…

  • Non-asymptotic Analysis of Diffusion Annealed Langevin Monte Carlo for Generative Modelling

    Non-asymptotic Analysis of Diffusion Annealed Langevin Monte Carlo for Generative Modelling arXiv:2502.09306v1 Announce Type: new Abstract: We investigate the theoretical properties of general diffusion (interpolation) paths and their Langevin Monte Carlo implementation, referred to as diffusion annealed Langevin Monte Carlo (DALMC), under weak conditions on the data distribution. Specifically, we analyse and provide non-asymptotic error…

  • Poisson Hierarchical Indian Buffet Processes for Within and Across Group Sharing of Latent Features-With Indications for Microbiome Species Sampling Models

    Poisson Hierarchical Indian Buffet Processes for Within and Across Group Sharing of Latent Features-With Indications for Microbiome Species Sampling Models arXiv:2502.01919v1 Announce Type: new Abstract: In this work, we present a comprehensive Bayesian posterior analysis of what we term Poisson Hierarchical Indian Buffet Processes, designed for complex random sparse count species sampling models that allow…

  • Statistical Verification of Linear Classifiers

    Statistical Verification of Linear Classifiers arXiv:2501.14430v1 Announce Type: new Abstract: We propose a homogeneity test closely related to the concept of linear separability between two samples. Using the test one can answer the question whether a linear classifier is merely “random” or effectively captures differences between two classes. We focus on establishing upper bounds for…

  • Simulation of Random LR Fuzzy Intervals

    Simulation of Random LR Fuzzy Intervals arXiv:2501.10482v1 Announce Type: new Abstract: Random fuzzy variables join the modeling of the impreciseness (due to their “fuzzy part”) and randomness. Statistical samples of such objects are widely used, and their direct, numerically effective generation is therefore necessary. Usually, these samples consist of triangular or trapezoidal fuzzy numbers. In…

  • Generative Models with ELBOs Converging to Entropy Sums

    Generative Models with ELBOs Converging to Entropy Sums arXiv:2501.09022v1 Announce Type: new Abstract: The evidence lower bound (ELBO) is one of the most central objectives for probabilistic unsupervised learning. For the ELBOs of several generative models and model classes, we here prove convergence to entropy sums. As one result, we provide a list of generative…

  • Avoiding subtraction and division of stochastic signals using normalizing flows: NFdeconvolve

    Avoiding subtraction and division of stochastic signals using normalizing flows: NFdeconvolve arXiv:2501.08288v1 Announce Type: new Abstract: Across the scientific realm, we find ourselves subtracting or dividing stochastic signals. For instance, consider a stochastic realization, $x$, generated from the addition or multiplication of two stochastic signals $a$ and $b$, namely $x=a+b$ or $x = ab$. For…

  • Robust random graph matching in dense graphs via vector approximate message passing

    Robust random graph matching in dense graphs via vector approximate message passing arXiv:2412.16457v1 Announce Type: new Abstract: In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation…

  • Generative Modeling with Diffusion

    Generative Modeling with Diffusion arXiv:2412.10948v1 Announce Type: new Abstract: We introduce the diffusion model as a method to generate new samples. Generative models have been recently adopted for tasks such as art generation (Stable Diffusion, Dall-E) and text generation (ChatGPT). Diffusion models in particular apply noise to sample data and then “reverse” this noising process…

  • Nonparametric Filtering, Estimation and Classification using Neural Jump ODEs

    Nonparametric Filtering, Estimation and Classification using Neural Jump ODEs arXiv:2412.03271v1 Announce Type: new Abstract: Neural Jump ODEs model the conditional expectation between observations by neural ODEs and jump at arrival of new observations. They have demonstrated effectiveness for fully data-driven online forecasting in settings with irregular and partial observations, operating under weak regularity assumptions. This…

  • Selective Reviews of Bandit Problems in AI via a Statistical View

    Selective Reviews of Bandit Problems in AI via a Statistical View arXiv:2412.02251v1 Announce Type: new Abstract: Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes stochastic multi-armed bandit (MAB) and continuum-armed bandit (SCAB) problems, which model sequential decision-making…