Category: math.ST

  • Bayesian Modeling of Collatz Stopping Times: A Probabilistic Machine Learning Perspective

    arXiv:2603.04479v1 Announce Type: new Abstract: We study the Collatz total stopping time $\tau(n)$ over $n\le 10^7$ from a probabilistic machine learning viewpoint. Empirically, $\tau(n)$ is a skewed and heavily overdispersed count with pronounced arithmetic heterogeneity. We develop two complementary models. First, a Bayesian hierarchical…
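A minimal Python sketch of the quantity being modeled, assuming the standard Collatz map (n/2 for even n, 3n+1 for odd n) rather than the shortcut variant (3n+1)/2; the excerpt does not say which convention the paper uses. The helper names are illustrative, and the range is capped at 10^4 instead of 10^7 so it runs quickly.

```python
def total_stopping_time(n: int) -> int:
    """Number of Collatz steps (n/2 if even, 3n+1 if odd) needed to reach 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def dispersion_index(values: list[int]) -> float:
    """Sample variance divided by sample mean."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return var / m


# Small-scale check over n <= 10**4 (the paper goes up to 10**7).
taus = [total_stopping_time(n) for n in range(1, 10**4 + 1)]
print(max(taus), round(dispersion_index(taus), 1))
```

A dispersion index well above 1 indicates overdispersion relative to a Poisson count model, whose variance equals its mean.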

  • Random Features for Operator-Valued Kernels: Bridging Kernel Methods and Neural Operators

    arXiv:2603.00971v1 Announce Type: new Abstract: In this work, we investigate the generalization properties of random feature methods. Our analysis extends prior results for Tikhonov regularization to a broad class of spectral regularization techniques and further generalizes the setting to operator-valued kernels. This unified framework…
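For orientation, a sketch of the classical scalar-kernel case: random Fourier features (Rahimi and Recht) whose inner products approximate a Gaussian kernel. This is an illustration under that assumption only; the operator-valued kernels and spectral regularization analysed in the paper are not shown, and `rff_features` is a hypothetical helper name.

```python
import numpy as np


def rff_features(X, num_features, lengthscale, rng):
    """Random Fourier features z(x) with z(x) . z(y) ~ exp(-|x - y|^2 / (2 lengthscale^2))."""
    d = X.shape[1]
    # Frequencies drawn from the Gaussian kernel's spectral density N(0, I / lengthscale^2).
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = rff_features(X, num_features=20_000, lengthscale=1.0, rng=rng)
approx = Z @ Z.T  # Monte Carlo approximation of the Gram matrix
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-sq_dists / 2.0)
print(np.abs(approx - exact).max())
```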

  • Multivariate Spatio-Temporal Neural Hawkes Processes

    arXiv:2602.23629v1 Announce Type: new Abstract: We propose a Multivariate Spatio-Temporal Neural Hawkes Process for modeling complex multivariate event data with spatio-temporal dynamics. The proposed model extends continuous-time neural Hawkes processes by integrating spatial information into latent state evolution through learned temporal and spatial decay dynamics, enabling flexible modeling of excitation…

  • Flow Matching is Adaptive to Manifold Structures

    arXiv:2602.22486v1 Announce Type: new Abstract: Flow matching has emerged as a simulation-free alternative to diffusion-based generative modeling, producing samples by solving an ODE whose time-dependent velocity field is learned along an interpolation between a simple source distribution (e.g., a standard normal) and a target data distribution. Flow-based methods…
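To make the interpolation concrete: with the linear path x_t = (1 - t) x_0 + t x_1, flow matching regresses a network v(x, t) onto the conditional target x_1 - x_0, and the minimizer of that regression is the marginal velocity E[x_1 - x_0 | x_t = x]. The sketch below skips training by assuming a 1D Gaussian target N(mu, s^2) with an independent standard-normal source, where that conditional expectation has a closed form, and then integrates the ODE with explicit Euler. Function names and parameter values are illustrative.

```python
import numpy as np


def marginal_velocity(x, t, mu, s):
    """E[x1 - x0 | x_t = x] for x_t = (1 - t) x0 + t x1, x0 ~ N(0, 1), x1 ~ N(mu, s^2) independent."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2  # Var(x_t)
    cov = t * s**2 - (1.0 - t)             # Cov(x1 - x0, x_t)
    return mu + cov / var_t * (x - t * mu)


def sample_by_euler(n_samples, mu, s, n_steps, rng):
    """Push standard-normal samples through dx/dt = v(x, t) from t = 0 to t = 1."""
    x = rng.standard_normal(n_samples)
    h = 1.0 / n_steps
    for k in range(n_steps):
        x = x + h * marginal_velocity(x, k * h, mu, s)
    return x


rng = np.random.default_rng(0)
samples = sample_by_euler(20_000, mu=3.0, s=0.5, n_steps=200, rng=rng)
print(samples.mean(), samples.std())
```

Because the ODE transports the whole sample, the printed mean and standard deviation should land close to the target values mu = 3 and s = 0.5.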

  • Efficient Inference after Directionally Stable Adaptive Experiments

    arXiv:2602.21478v1 Announce Type: new Abstract: We study inference on scalar-valued pathwise differentiable targets after adaptive data collection, such as a bandit algorithm. We introduce a novel target-specific condition, directional stability, which is strictly weaker than previously imposed target-agnostic stability conditions. Under directional stability, we show that estimators that…

  • Interactive Learning of Single-Index Models via Stochastic Gradient Descent

    arXiv:2602.17876v1 Announce Type: new Abstract: Stochastic gradient descent (SGD) is a cornerstone algorithm for high-dimensional optimization, renowned for its empirical successes. Recent theoretical advances have provided a deep understanding of how SGD enables feature learning in high-dimensional nonlinear models, most notably the \textit{single-index model} with i.i.d.…

  • Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

    arXiv:2602.17894v1 Announce Type: new Abstract: Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as medical…

  • On the Generalization and Robustness in Conditional Value-at-Risk

    arXiv:2602.18053v1 Announce Type: new Abstract: Conditional Value-at-Risk (CVaR) is a widely used risk-sensitive objective for learning under rare but high-impact losses, yet its statistical behavior under heavy-tailed data remains poorly understood. Unlike expectation-based risk, CVaR depends on an endogenous, data-dependent quantile, which couples tail averaging with threshold…
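The coupling between tail averaging and a data-dependent threshold is visible in the Rockafellar-Uryasev representation CVaR_alpha(L) = min over tau of { tau + E[(L - tau)_+] / (1 - alpha) }, whose minimizer is a VaR (an alpha-quantile). A small sketch for an empirical loss sample, assuming alpha is the confidence level so that CVaR averages roughly the worst (1 - alpha) fraction of losses; conventions differ across papers, and the paper's heavy-tailed analysis is not reproduced.

```python
import numpy as np


def empirical_cvar(losses, alpha):
    """Rockafellar-Uryasev CVaR of an empirical sample; returns (cvar, tau_star).

    The objective is convex and piecewise linear in tau with kinks at the observed
    losses, so its minimum is attained at one of them; tau_star is an empirical
    alpha-quantile (VaR).
    """
    losses = np.asarray(losses, dtype=float)
    candidates = np.sort(losses)
    # Row k holds (L_i - tau)_+ for tau = candidates[k].
    excess = np.maximum(losses[None, :] - candidates[:, None], 0.0)
    objective = candidates + excess.mean(axis=1) / (1.0 - alpha)
    k = int(np.argmin(objective))
    return objective[k], candidates[k]


losses = np.arange(1, 11)  # losses 1, 2, ..., 10
cvar, threshold = empirical_cvar(losses, alpha=0.8)
print(cvar, threshold)
```

For these ten losses and alpha = 0.8 the minimum value is 9.5, the average of the two largest losses, and the returned threshold is one of the minimizing quantiles (8 or 9).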

  • Linear Regression with Unknown Truncation Beyond Gaussian Features

    arXiv:2602.12534v1 Announce Type: new Abstract: In truncated linear regression, samples $(x,y)$ are shown only when the outcome $y$ falls inside a certain survival set $S^\star$ and the goal is to estimate the unknown $d$-dimensional regressor $w^\star$. This problem has a long history of study in Statistics and…

  • Generalized Robust Adaptive-Bandwidth Multi-View Manifold Learning in High Dimensions with Noise

    arXiv:2602.10530v1 Announce Type: new Abstract: Multiview datasets are common in scientific and engineering applications, yet existing fusion methods offer limited theoretical guarantees, particularly in the presence of heterogeneous and high-dimensional noise. We propose Generalized Robust Adaptive-Bandwidth Multiview Diffusion Maps (GRAB-MDM), a new kernel-based diffusion…

  • Total Variation Rates for Riemannian Flow Matching

    arXiv:2602.05174v1 Announce Type: new Abstract: Riemannian flow matching (RFM) extends flow-based generative modeling to data supported on manifolds by learning a time-dependent tangent vector field whose flow-ODE transports a simple base distribution to the data law. We develop a nonasymptotic Total Variation (TV) convergence analysis for RFM samplers…

  • Finite-Particle Rates for Regularized Stein Variational Gradient Descent

    arXiv:2602.05172v1 Announce Type: new Abstract: We derive finite-particle rates for the regularized Stein variational gradient descent (R-SVGD) algorithm introduced by He et al. (2024) that corrects the constant-order bias of the SVGD by applying a resolvent-type preconditioner to the kernelized Wasserstein gradient. For the resulting interacting $N$-particle…
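For reference, a sketch of the plain (unregularized) SVGD update of Liu and Wang in one dimension, phi(x_i) = (1/N) sum_j [k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i)], with an RBF kernel of fixed bandwidth. This is not the R-SVGD algorithm the paper studies: the resolvent-type preconditioner is omitted, and the N(0, 1) target, step size, and bandwidth are illustrative assumptions.

```python
import numpy as np


def svgd_step(x, grad_log_p, bandwidth, step_size):
    """One plain SVGD update for 1D particles x with an RBF kernel."""
    diff = x[:, None] - x[None, :]                # diff[j, i] = x_j - x_i
    k = np.exp(-diff**2 / (2.0 * bandwidth**2))   # k[j, i] = k(x_j, x_i)
    grad_k = -diff / bandwidth**2 * k             # d/dx_j k(x_j, x_i): repulsive term
    # phi(x_i) = mean over j of [k(x_j, x_i) grad log p(x_j) + d/dx_j k(x_j, x_i)]
    phi = (k * grad_log_p(x)[:, None] + grad_k).mean(axis=0)
    return x + step_size * phi


def std_normal_score(z):
    """grad log p for the target N(0, 1)."""
    return -z


rng = np.random.default_rng(0)
particles = rng.normal(loc=5.0, scale=0.5, size=100)  # start far from the target
for _ in range(1000):
    particles = svgd_step(particles, std_normal_score, bandwidth=1.0, step_size=0.1)
print(particles.mean(), particles.std())
```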

  • Byzantine Machine Learning: MultiKrum and an optimal notion of robustness

    arXiv:2602.03899v1 Announce Type: new Abstract: Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule…
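A sketch of Krum and MultiKrum as commonly defined (Blanchard et al., 2017): each of the n submitted vectors is scored by the sum of squared distances to its n - f - 2 nearest neighbours, where f bounds the number of Byzantine workers, and MultiKrum averages the q best-scored vectors. The paper's analysis may use a different parameterization; the toy data (eight honest updates near zero, two adversarial ones at 100) are invented for illustration.

```python
import numpy as np


def krum_scores(updates, f):
    """Krum score of each row: sum of squared distances to its n - f - 2 nearest neighbours."""
    n = len(updates)
    m = n - f - 2
    if m < 1:
        raise ValueError("Krum needs n > f + 2")
    sq = ((updates[:, None, :] - updates[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(sq, np.inf)  # exclude each vector's distance to itself
    return np.sort(sq, axis=1)[:, :m].sum(axis=1)


def multikrum(updates, f, q):
    """Average the q vectors with the smallest Krum scores (q = 1 recovers Krum)."""
    chosen = np.argsort(krum_scores(updates, f))[:q]
    return updates[chosen].mean(axis=0)


rng = np.random.default_rng(0)
honest = rng.normal(scale=0.1, size=(8, 4))
byzantine = np.full((2, 4), 100.0)
updates = np.vstack([honest, byzantine])
agg = multikrum(updates, f=2, q=5)
print(agg)
```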

  • Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks

    arXiv:2602.03948v1 Announce Type: new Abstract: In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the available data are summarized solely through the degrees of the nodes in the network.…

  • Plug-In Classification of Drift Functions in Diffusion Processes Using Neural Networks

    arXiv:2602.02791v1 Announce Type: new Abstract: We study a supervised multiclass classification problem for diffusion processes, where each class is characterized by a distinct drift function and trajectories are observed at discrete times. Extending the one-dimensional multiclass framework of Denis et al. (2024) to multidimensional…

  • Latent-IMH: Efficient Bayesian Inference for Inverse Problems with Approximate Operators

    arXiv:2601.20888v1 Announce Type: new Abstract: We study sampling from posterior distributions in Bayesian linear inverse problems where $A$, the parameters to observables operator, is computationally expensive. In many applications, $A$ can be factored in a manner that facilitates the construction of a cost-effective approximation $\tilde{A}$.…

  • Efficient Learning of Stationary Diffusions with Stein-type Discrepancies

    arXiv:2601.16597v1 Announce Type: new Abstract: Learning a stationary diffusion amounts to estimating the parameters of a stochastic differential equation whose stationary distribution matches a target distribution. We build on the recently introduced kernel deviation from stationarity (KDS), which enforces stationarity by evaluating expectations of the diffusion’s generator…

  • Low-Dimensional Adaptation of Rectified Flow: A New Perspective through the Lens of Diffusion and Stochastic Localization

    arXiv:2601.15500v1 Announce Type: new Abstract: In recent years, Rectified flow (RF) has gained considerable popularity largely due to its generation efficiency and state-of-the-art performance. In this paper, we investigate the degree to which RF automatically adapts to the intrinsic…

  • On damage of interpolation to adversarial robustness in regression

    arXiv:2601.16070v1 Announce Type: new Abstract: Deep neural networks (DNNs) typically involve a large number of parameters and are trained to achieve zero or near-zero training error. Despite such interpolation, they often exhibit strong generalization performance on unseen data, a phenomenon that has motivated extensive theoretical investigations.…

  • Efficient and Minimax-optimal In-context Nonparametric Regression with Transformers

    arXiv:2601.15014v1 Announce Type: new Abstract: We study in-context learning for nonparametric regression with $\alpha$-Hölder smooth regression functions, for some $\alpha>0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer with $\Theta(\log n)$ parameters and $\Omega\bigl(n^{2\alpha/(2\alpha+d)}\log^3 n\bigr)$ pretraining sequences can achieve the minimax-optimal…

  • Approximate full conformal prediction in RKHS

    arXiv:2601.13102v1 Announce Type: new Abstract: Full conformal prediction is a framework that implicitly formulates distribution-free confidence prediction regions for a wide range of estimators. However, a classical limitation of the full conformal framework is the computation of the confidence prediction regions, which is usually impossible since it requires training…

  • Tail-Sensitive KL and Rényi Convergence of Unadjusted Hamiltonian Monte Carlo via One-Shot Couplings

    arXiv:2601.09019v1 Announce Type: new Abstract: Hamiltonian Monte Carlo (HMC) algorithms are among the most widely used sampling methods in high dimensional settings, yet their convergence properties are poorly understood in divergences that quantify relative density mismatch, such as Kullback-Leibler (KL) and Rényi…

  • Inference-Time Alignment for Diffusion Models via Doob’s Matching

    arXiv:2601.06514v1 Announce Type: new Abstract: Inference-time alignment for diffusion models aims to adapt a pre-trained diffusion model toward a target distribution without retraining the base score network, thereby preserving the generative capacity of the base model while enforcing desired properties at the inference time. A central mechanism…

  • Stochastic Deep Learning: A Probabilistic Framework for Modeling Uncertainty in Structured Temporal Data

    arXiv:2601.05227v1 Announce Type: new Abstract: I propose a novel framework that integrates stochastic differential equations (SDEs) with deep generative models to improve uncertainty quantification in machine learning applications involving structured and temporal data. This approach, termed Stochastic Latent Differential Inference (SLDI), embeds…

  • Learning Multinomial Logits in $O(n \log n)$ time

    arXiv:2601.04423v1 Announce Type: cross Abstract: A Multinomial Logit (MNL) model is composed of a finite universe of items $[n]=\{1,\ldots,n\}$, each assigned a positive weight. A query specifies an admissible subset, called a slate, and the model chooses one item from that slate with probability…
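The excerpt is cut off before the choice rule. Under the standard MNL definition (stated here as background, not taken from the truncated text), item i in slate S is chosen with probability w_i / sum over j in S of w_j. A minimal sketch with hypothetical helper names; the paper's O(n log n) learning algorithm is not reproduced.

```python
import numpy as np


def mnl_choice_probs(weights, slate):
    """MNL choice probabilities: w_i / (sum of w_j over the slate) for each item i in the slate."""
    w = np.array([weights[i] for i in slate], dtype=float)
    return dict(zip(slate, (w / w.sum()).tolist()))


def sample_choice(weights, slate, rng):
    """Draw one item from the slate according to its MNL probabilities."""
    probs = mnl_choice_probs(weights, slate)
    return rng.choice(list(probs.keys()), p=list(probs.values()))


weights = {1: 1.0, 2: 3.0, 3: 6.0}  # illustrative positive item weights
print(mnl_choice_probs(weights, [1, 2]))
print(sample_choice(weights, [1, 2, 3], np.random.default_rng(0)))
```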

  • Beyond Demand Estimation: Consumer Surplus Evaluation via Cumulative Propensity Weights

    arXiv:2601.01029v1 Announce Type: new Abstract: This paper develops a practical framework for using observational data to audit the consumer surplus effects of AI-driven decisions, specifically in targeted pricing and algorithmic lending. Traditional approaches first estimate demand functions and then integrate to compute consumer surplus, but…

  • Deep learning estimation of the spectral density of functional time series on large domains

    arXiv:2601.00284v1 Announce Type: cross Abstract: We derive an estimator of the spectral density of a functional time series that is the output of a multilayer perceptron neural network. The estimator is motivated by difficulties with the computation of existing spectral density…

  • A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue

    arXiv:2512.22282v1 Announce Type: new Abstract: Across fields such as machine learning, social science, geography, considerable attention has been given to models that factorize a nonnegative matrix into the product of two or three matrices, subject to nonnegative or row-sum-to-1…

  • Likelihood-Preserving Embeddings for Statistical Inference

    arXiv:2512.22638v1 Announce Type: new Abstract: Modern machine learning embeddings provide powerful compression of high-dimensional data, yet they typically destroy the geometric structure required for classical likelihood-based statistical inference. This paper develops a rigorous theory of likelihood-preserving embeddings: learned representations that can replace raw data in likelihood-based workflows, such as hypothesis testing,…

  • Thermodynamic Characterizations of Singular Bayesian Models: Specific Heat, Susceptibility, and Entropy Flow in Posterior Geometry

    arXiv:2512.21411v1 Announce Type: cross Abstract: Singular learning theory (SLT) \citep{watanabe2009algebraic,watanabe2018mathematical} provides a rigorous asymptotic framework for Bayesian models with non-identifiable parameterizations, yet the statistical meaning of its second-order invariant, the \emph{singular fluctuation}, has remained unclear. In this work, we show…

  • Sampling from multimodal distributions with warm starts: Non-asymptotic bounds for the Reweighted Annealed Leap-Point Sampler

    arXiv:2512.17977v1 Announce Type: new Abstract: Sampling from multimodal distributions is a central challenge in Bayesian inference and machine learning. In light of hardness results for sampling (classical MCMC methods, even with tempering, can suffer from exponential mixing times)…

  • Sharp Structure-Agnostic Lower Bounds for General Functional Estimation

    arXiv:2512.17341v1 Announce Type: new Abstract: The design of efficient nonparametric estimators has long been a central problem in statistics, machine learning, and decision making. Classical optimal procedures often rely on strong structural assumptions, which can be misspecified in practice and complicate deployment. This limitation has sparked growing…

  • Maximum Mean Discrepancy with Unequal Sample Sizes via Generalized U-Statistics

    arXiv:2512.13997v1 Announce Type: new Abstract: Existing two-sample testing techniques, particularly those based on choosing a kernel for the Maximum Mean Discrepancy (MMD), often assume equal sample sizes from the two distributions. Applying these methods in practice can require discarding valuable data, unnecessarily reducing test power.…
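The standard unbiased MMD^2 estimator (Gretton et al., 2012) already accepts unequal sample sizes m and n; the sketch below shows it with a Gaussian kernel of fixed bandwidth so the tested quantity is concrete. It is not the generalized U-statistic or kernel-selection procedure proposed in the paper, and the sample sizes and shift are illustrative.

```python
import numpy as np


def gaussian_gram(A, B, bandwidth):
    """Gram matrix exp(-|a - b|^2 / (2 bandwidth^2)) between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * bandwidth**2))


def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased MMD^2 estimate; X (m samples) and Y (n samples) may differ in size."""
    m, n = len(X), len(Y)
    Kxx = gaussian_gram(X, X, bandwidth)
    Kyy = gaussian_gram(Y, Y, bandwidth)
    Kxy = gaussian_gram(X, Y, bandwidth)
    # Drop the i == j terms so the within-sample averages are U-statistics.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1))
Y_same = rng.normal(size=(60, 1))              # same distribution, smaller sample
Y_shift = rng.normal(loc=2.0, size=(60, 1))    # shifted distribution
print(mmd2_unbiased(X, Y_same), mmd2_unbiased(X, Y_shift))
```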

  • Interval Fisher’s Discriminant Analysis and Visualisation

    arXiv:2512.11945v1 Announce Type: new Abstract: In Data Science, entities are typically represented by single valued measurements. Symbolic Data Analysis extends this framework to more complex structures, such as intervals and histograms, that express internal variability. We propose an extension of multiclass Fisher’s Discriminant Analysis to interval-valued data, using Moore’s…

  • STARK denoises spatial transcriptomics images via adaptive regularization

    arXiv:2512.10994v1 Announce Type: new Abstract: We present an approach to denoising spatial transcriptomics images that is particularly effective for uncovering cell identities in the regime of ultra-low sequencing depths, and also allows for interpolation of gene expression. The method, Spatial Transcriptomics via Adaptive Regularization and Kernels…

  • Diffusion differentiable resampling

    arXiv:2512.10401v1 Announce Type: new Abstract: This paper is concerned with differentiable resampling in the context of sequential Monte Carlo (e.g., particle filtering). We propose a new informative resampling method that is instantly pathwise differentiable, based on an ensemble score diffusion model. We prove that our diffusion resampling method provides a consistent estimate…

  • Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming

    arXiv:2512.08948v1 Announce Type: new Abstract: We study online statistical inference for the solutions of stochastic optimization problems with equality and inequality constraints. Such problems are prevalent in statistics and machine learning, encompassing constrained $M$-estimation, physics-informed models, safe reinforcement learning, and algorithmic fairness. We develop…

  • Estimation of Stochastic Optimal Transport Maps

    arXiv:2512.09499v1 Announce Type: new Abstract: The optimal transport (OT) map is a geometry-driven transformation between high-dimensional probability distributions which underpins a wide range of tasks in statistics, applied probability, and machine learning. However, existing statistical theory for OT map estimation is quite restricted, hinging on Brenier’s theorem (quadratic cost,…

  • Provable Diffusion Posterior Sampling for Bayesian Inversion

    arXiv:2512.08022v1 Announce Type: new Abstract: This paper proposes a novel diffusion-based posterior sampling method within a plug-and-play (PnP) framework. Our approach constructs a probability transport from an easy-to-sample terminal distribution to the target posterior, using a warm-start strategy to initialize the particles. To approximate the posterior score, we…

  • Provable FDR Control for Deep Feature Selection: Deep MLPs and Beyond

    arXiv:2512.04696v1 Announce Type: new Abstract: We develop a flexible feature selection framework based on deep neural networks that approximately controls the false discovery rate (FDR), a measure of Type-I error. The method applies to architectures whose first layer is fully connected. From the second…

  • Towards a unified framework for guided diffusion models

    arXiv:2512.04985v1 Announce Type: new Abstract: Guided or controlled data generation with diffusion models\blfootnote{Partial preliminary results of this work appeared in International Conference on Machine Learning 2025 \citep{li2025provable}.} has become a cornerstone of modern generative modeling. Despite substantial advances in diffusion model theory, the theoretical understanding of guided…

  • A note on the impossibility of conditional PAC-efficient reasoning in large language models

    arXiv:2512.03057v1 Announce Type: new Abstract: We prove an impossibility result for conditional Probably Approximately Correct (PAC)-efficient reasoning in large language models. While recent work has established marginal PAC efficiency guarantees for composite models that switch between expensive expert models and cheaper fast…

  • Novelty detection on path space

    arXiv:2512.03243v1 Announce Type: new Abstract: We frame novelty detection on path space as a hypothesis testing problem with signature-based test statistics. Using transportation-cost inequalities of Gasteratos and Jacquier (2023), we obtain tail bounds for false positive rates that extend beyond Gaussian measures to laws of RDE solutions with smooth bounded…

  • Revisiting Theory of Contrastive Learning for Domain Generalization

    arXiv:2512.02831v1 Announce Type: new Abstract: Contrastive learning is among the most popular and powerful approaches for self-supervised representation learning, where the goal is to map semantically similar samples close together while separating dissimilar ones in the latent space. Existing theoretical methods assume that downstream task classes are…

  • Statistical-computational gap in multiple Gaussian graph alignment

    arXiv:2512.00610v1 Announce Type: new Abstract: We investigate the existence of a statistical-computational gap in multiple Gaussian graph alignment. We first generalize a previously established informational threshold from Vassaux and Massoulié (2025) to regimes where the number of observed graphs $p$ may also grow with the number of nodes…

  • Geometric Calibration and Neutral Zones for Uncertainty-Aware Multi-Class Classification

    arXiv:2511.20960v1 Announce Type: new Abstract: Modern artificial intelligence systems make critical decisions yet often fail silently when uncertain. We develop a geometric framework for post-hoc calibration of neural network probability outputs, treating probability vectors as points on the $(c-1)$-dimensional probability simplex equipped with the Fisher–Rao metric.…

  • Gradient flow for deep equilibrium single-index models

    arXiv:2511.16976v1 Announce Type: cross Abstract: Deep equilibrium models (DEQs) have recently emerged as a powerful paradigm for training infinitely deep weight-tied neural networks that achieve state of the art performance across many modern machine learning tasks. Despite their practical success, theoretically understanding the gradient descent dynamics for training…

  • Convex Clustering Redefined: Robust Learning with the Median of Means Estimator

    arXiv:2511.14784v1 Announce Type: new Abstract: Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k-means and its wide family of variants are still widely used, all of them require the…
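The median-of-means building block named in the title, as a sketch: split the sample into blocks, average each block, and take the median of the block means. The block count and the toy data with a few gross outliers are illustrative; how the paper embeds this estimator in a convex clustering objective is not shown.

```python
import numpy as np


def median_of_means(x, n_blocks, rng=None):
    """Median of the block means after splitting x into n_blocks groups."""
    x = np.asarray(x, dtype=float)
    if rng is not None:
        x = rng.permutation(x)  # random assignment of points to blocks
    block_means = [block.mean() for block in np.array_split(x, n_blocks)]
    return float(np.median(block_means))


data = np.concatenate([np.zeros(95), np.full(5, 1000.0)])  # 5 gross outliers
rng = np.random.default_rng(0)
print(data.mean(), median_of_means(data, n_blocks=11, rng=rng))
```

Here the ordinary mean is pulled to 50.0 by five outliers, while with 11 blocks at most 5 block means can be contaminated, so the median of the block means stays at 0.0.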

  • Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit

    arXiv:2511.15120v1 Announce Type: new Abstract: In deep learning, a central issue is to understand how neural networks efficiently learn high-dimensional features. To this end, we explore the gradient descent learning of a general Gaussian Multi-index model $f(\boldsymbol{x})=g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U}\in \mathbb{R}^{r\times d}$, which is the…

  • Beyond Uncertainty Sets: Leveraging Optimal Transport to Extend Conformal Predictive Distribution to Multivariate Settings

    arXiv:2511.15146v1 Announce Type: new Abstract: Conformal prediction (CP) constructs uncertainty sets for model outputs with finite-sample coverage guarantees. A candidate output is included in the prediction set if its non-conformity score is not considered extreme relative to the scores observed on…

  • Empirical Likelihood for Random Forests and Ensembles

    arXiv:2511.13934v1 Announce Type: new Abstract: We develop an empirical likelihood (EL) framework for random forests and related ensemble methods, providing a likelihood-based approach to quantify their statistical uncertainty. Exploiting the incomplete $U$-statistic structure inherent in ensemble predictions, we construct an EL statistic that is asymptotically chi-squared when subsampling…

  • Drift Estimation for Diffusion Processes Using Neural Networks Based on Discretely Observed Independent Paths

    arXiv:2511.11161v1 Announce Type: new Abstract: This paper addresses the nonparametric estimation of the drift function over a compact domain for a time-homogeneous diffusion process, based on high-frequency discrete observations from $N$ independent trajectories. We propose a neural network-based estimator and derive…

  • Precise asymptotic analysis of Sobolev training for random feature models

    arXiv:2511.03050v1 Announce Type: new Abstract: Gradient information is widely useful and available in applications, and is therefore natural to include in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training, regression with both function and gradient data…

  • Data-driven Learning of Interaction Laws in Multispecies Particle Systems with Gaussian Processes: Convergence Theory and Applications

    arXiv:2511.02053v1 Announce Type: new Abstract: We develop a Gaussian process framework for learning interaction kernels in multi-species interacting particle systems from trajectory data. Such systems provide a canonical setting for multiscale modeling, where simple microscopic interaction rules generate complex…

  • Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks

    arXiv:2511.02258v1 Announce Type: new Abstract: This paper studies the high-dimensional scaling limits of online stochastic gradient descent (SGD) for single-layer networks. Building on the seminal work of Saad and Solla, which analyzed the deterministic (ballistic) scaling limits of SGD corresponding to the gradient flow of…

  • Minimax-Optimal Two-Sample Test with Sliced Wasserstein

    arXiv:2510.27498v1 Announce Type: new Abstract: We study the problem of nonparametric two-sample testing using the sliced Wasserstein (SW) distance. While prior theoretical and empirical work indicates that the SW distance offers a promising balance between strong statistical guarantees and computational efficiency, its theoretical foundations for hypothesis testing remain limited.…
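For concreteness, a Monte Carlo sketch of the squared sliced 2-Wasserstein distance between two equal-size samples: project onto random unit directions and, in each one-dimensional projection, match sorted values. The number of projections and the equal-size restriction are simplifying assumptions; the paper's test statistic and its calibration are not reproduced.

```python
import numpy as np


def sliced_wasserstein2(X, Y, n_projections, rng):
    """Monte Carlo estimate of SW_2^2 between two equal-size samples in R^d."""
    if X.shape != Y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    d = X.shape[1]
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform directions on the sphere
    # In 1D, optimal transport between equal-size empirical measures pairs sorted values.
    px = np.sort(X @ theta.T, axis=0)
    py = np.sort(Y @ theta.T, axis=0)
    return float(((px - py) ** 2).mean())


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
delta = np.array([2.0, 0.0, 0.0, 0.0, 0.0])
print(sliced_wasserstein2(X, X, 2000, rng), sliced_wasserstein2(X, X + delta, 2000, rng))
```

Shifting every point by a vector delta shifts each sorted projection by theta . delta, so the second estimate approaches E[(theta . delta)^2] = |delta|^2 / d, which is 0.8 here.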

  • Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

    arXiv:2510.25811v1 Announce Type: new Abstract: We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn…

  • Bridging Prediction and Attribution: Identifying Forward and Backward Causal Influence Ranges Using Assimilative Causal Inference

    arXiv:2510.21889v1 Announce Type: new Abstract: Causal inference identifies cause-and-effect relationships between variables. While traditional approaches rely on data to reveal causal links, a recently developed method, assimilative causal inference (ACI), integrates observations with dynamical models. It utilizes Bayesian data assimilation…

  • Frequentist Validity of Epistemic Uncertainty Estimators

    arXiv:2510.22063v1 Announce Type: new Abstract: Decomposing prediction uncertainty into its aleatoric (irreducible) and epistemic (reducible) components is critical for the development and deployment of machine learning systems. A popular, principled measure for epistemic uncertainty is the mutual information between the response variable and model parameters. However, evaluating this measure…

  • Testing Most Influential Sets

    arXiv:2510.20372v1 Announce Type: new Abstract: Small subsets of data with disproportionate influence on model outcomes can have dramatic impacts on conclusions, with a few data points sometimes overturning key findings. While recent work has developed methods to identify these \emph{most influential sets}, no formal theory exists to determine when their influence…

  • The Coverage Principle: How Pre-training Enables Post-Training

    arXiv:2510.15020v1 Announce Type: new Abstract: Language models demonstrate remarkable abilities when pre-trained on large text corpora and fine-tuned for specific tasks, but how and why pre-training shapes the success of the final model remains poorly understood. Notably, although pre-training success is often quantified by cross entropy loss, cross-entropy…

  • The Minimax Lower Bound of Kernel Stein Discrepancy Estimation

    arXiv:2510.15058v1 Announce Type: new Abstract: Kernel Stein discrepancies (KSDs) have emerged as a powerful tool for quantifying goodness-of-fit over the last decade, featuring numerous successful applications. To the best of our knowledge, all existing KSD estimators with known rate achieve $\sqrt n$-convergence. In this work, we…
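A sketch of the usual U-statistic estimator of the squared KSD, one example of the sqrt(n)-rate estimators the abstract refers to. It assumes an RBF kernel k(x, y) = exp(-|x - y|^2 / (2h^2)) and the Langevin Stein kernel u_p(x, y) = s(x)^T s(y) k + s(x)^T grad_y k + s(y)^T grad_x k + tr(grad_x grad_y k), where s = grad log p is the target's score. The bandwidth, the N(0, 1) target, and the sample sizes are illustrative assumptions.

```python
import numpy as np


def ksd2_u(X, score, bandwidth=1.0):
    """U-statistic estimate of KSD^2 with an RBF kernel and the Langevin Stein operator."""
    n, d = X.shape
    S = score(X)                                  # (n, d) target score at each sample
    diff = X[:, None, :] - X[None, :, :]          # diff[i, j] = x_i - x_j
    r2 = (diff**2).sum(-1)
    h2 = bandwidth**2
    K = np.exp(-r2 / (2.0 * h2))
    term1 = S @ S.T                                    # s(x_i) . s(x_j)
    term2 = np.einsum("id,ijd->ij", S, diff) / h2      # s(x_i) . grad_y k / k
    term3 = -np.einsum("jd,ijd->ij", S, diff) / h2     # s(x_j) . grad_x k / k
    term4 = d / h2 - r2 / h2**2                        # tr(grad_x grad_y k) / k
    U = K * (term1 + term2 + term3 + term4)
    np.fill_diagonal(U, 0.0)                           # U-statistic: drop i == j
    return U.sum() / (n * (n - 1))


def std_normal_score(x):
    """grad log p for p = N(0, I)."""
    return -x


rng = np.random.default_rng(0)
X_null = rng.normal(size=(400, 1))           # drawn from the target
X_alt = rng.normal(loc=1.0, size=(400, 1))   # mean-shifted sample
print(ksd2_u(X_null, std_normal_score), ksd2_u(X_alt, std_normal_score))
```

In this 1D setting with h = 1, a short calculation (not from the paper) gives a population value of mu^2 / sqrt(3), about 0.577, for a shift of mu = 1, while the null value is 0.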

  • Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models

    Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models arXiv:2510.11789v1 Announce Type: new Abstract: We study the convergence rate of learning pairwise interactions in single-layer attention-style models, where tokens interact through a weight matrix and a non-linear activation function. We prove that the minimax rate is $M^{-\frac{2\beta}{2\beta+1}}$ with $M$ being the sample size, depending…

  • Quantile-Scaled Bayesian Optimization Using Rank-Only Feedback

    Quantile-Scaled Bayesian Optimization Using Rank-Only Feedback arXiv:2510.03277v1 Announce Type: new Abstract: Bayesian Optimization (BO) is widely used for optimizing expensive black-box functions, particularly in hyperparameter tuning. However, standard BO assumes access to precise objective values, which may be unavailable, noisy, or unreliable in real-world settings where only relative or rank-based feedback can be obtained. In…

  • Transformed $\ell_1$ Regularizations for Robust Principal Component Analysis: Toward a Fine-Grained Understanding

    Transformed $\ell_1$ Regularizations for Robust Principal Component Analysis: Toward a Fine-Grained Understanding arXiv:2510.03624v1 Announce Type: new Abstract: Robust Principal Component Analysis (RPCA) aims to recover a low-rank structure from noisy, partially observed data that is also corrupted by sparse, potentially large-magnitude outliers. Traditional RPCA models rely on convex relaxations, such as nuclear norm and $\ell_1$…

  • Higher-arity PAC learning, VC dimension and packing lemma

    Higher-arity PAC learning, VC dimension and packing lemma arXiv:2510.02420v1 Announce Type: new Abstract: The aim of this note is to overview some of our work in Chernikov, Towsner’20 (arXiv:2010.00726) developing higher arity VC theory (VC$_n$ dimension), including a generalization of Haussler packing lemma, and an associated tame (slice-wise) hypergraph regularity lemma; and to demonstrate that…

  • Predictive inference for time series: why is split conformal effective despite temporal dependence?

    Predictive inference for time series: why is split conformal effective despite temporal dependence? arXiv:2510.02471v1 Announce Type: new Abstract: We consider the problem of uncertainty quantification for prediction in a time series: if we use past data to forecast the next time point, can we provide valid prediction intervals around our forecasts? To avoid placing distributional…
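
    A minimal sketch of the split-conformal mechanics for one-step-ahead forecasting. It assumes a fixed, already-fitted forecaster (here the hypothetical last-value rule) and absolute forecast errors on the most recent n_cal points as conformity scores. The usual coverage guarantee relies on exchangeability, which temporal dependence breaks; that gap is what the paper addresses, so this only shows how the interval is built.

```python
import math
from typing import Callable, Sequence, Tuple


def split_conformal_interval(
    series: Sequence[float],
    forecast: Callable[[Sequence[float]], float],
    n_cal: int,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Split-conformal prediction interval for the value following `series`.

    The calibration set is the last n_cal time points; each is forecast from
    its own past, and the absolute forecast errors are the scores.
    Requires 1 <= n_cal < len(series) so every calibration forecast has data.
    """
    t = len(series)
    scores = sorted(
        abs(series[s] - forecast(series[:s])) for s in range(t - n_cal, t)
    )
    # Conformal quantile: the ceil((n_cal + 1)(1 - alpha))-th smallest score;
    # if that rank exceeds n_cal, the interval is unbounded.
    rank = math.ceil((n_cal + 1) * (1 - alpha))
    q = scores[rank - 1] if rank <= n_cal else math.inf
    point = forecast(series)
    return point - q, point + q


def last_value(past: Sequence[float]) -> float:
    """Hypothetical placeholder forecaster: predict the most recent value."""
    return past[-1]
```

    With too few calibration points for the requested level (for example n_cal = 5 at alpha = 0.1), the interval is the whole real line.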

  • Identifying All $\epsilon$-Best Arms in (Misspecified) Linear Bandits

    Identifying All $\epsilon$-Best Arms in (Misspecified) Linear Bandits arXiv:2510.00073v1 Announce Type: new Abstract: Motivated by the need to efficiently identify multiple candidates in high trial-and-error cost tasks such as drug discovery, we propose a near-optimal algorithm to identify all $\epsilon$-best arms (i.e., those at most $\epsilon$ worse than the optimum). Specifically, we introduce LinFACT, an…

  • CINDES: Classification induced neural density estimator and simulator

    CINDES: Classification induced neural density estimator and simulator arXiv:2510.00367v1 Announce Type: new Abstract: Neural network-based methods for (un)conditional density estimation have recently gained substantial attention, as various neural density estimators have outperformed classical approaches in real-data experiments. Despite these empirical successes, implementation can be challenging due to the need to ensure non-negativity and unit-mass constraints,…

  • One-shot Conditional Sampling: MMD meets Nearest Neighbors

    One-shot Conditional Sampling: MMD meets Nearest Neighbors arXiv:2509.25507v1 Announce Type: new Abstract: How can we generate samples from a conditional distribution that we never fully observe? This question arises across a broad range of applications in both modern machine learning and classical statistics, including image post-processing in computer vision, approximate posterior sampling in simulation-based inference,…

  • Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression

    Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression arXiv:2509.22794v1 Announce Type: new Abstract: We study instrumental variable regression (IVaR) under differential privacy constraints. Classical IVaR methods (like two-stage least squares regression) rely on solving moment equations that directly use sensitive covariates and instruments, creating significant risks of privacy leakage and posing challenges in designing…

  • Sample completion, structured correlation, and Netflix problems

    Sample completion, structured correlation, and Netflix problems arXiv:2509.20404v1 Announce Type: new Abstract: We develop a new high-dimensional statistical learning model which can take advantage of structured correlation in data even in the presence of randomness. We completely characterize learnability in this model in terms of VCN${}_{k,k}$-dimension (essentially $k$-dependence from Shelah’s classification theory). This model suggests…

  • A Gapped Scale-Sensitive Dimension and Lower Bounds for Offset Rademacher Complexity

    A Gapped Scale-Sensitive Dimension and Lower Bounds for Offset Rademacher Complexity arXiv:2509.20618v1 Announce Type: new Abstract: We study gapped scale-sensitive dimensions of a function class in both sequential and non-sequential settings. We demonstrate that covering numbers for any uniformly bounded class are controlled above by these gapped dimensions, generalizing the results of \cite{anthony2000function,alon1997scale}. Moreover, we…

  • Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities

    Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities arXiv:2509.15822v1 Announce Type: new Abstract: Predictions from statistical physics postulate that recovery of the communities in the Stochastic Block Model (SBM) is possible in polynomial time above, and only above, the Kesten-Stigum (KS) threshold. This conjecture has given rise to a rich literature, proving that…

  • What is a good matching of probability measures? A counterfactual lens on transport maps

    What is a good matching of probability measures? A counterfactual lens on transport maps arXiv:2509.16027v1 Announce Type: new Abstract: Coupling probability measures lies at the core of many problems in statistics and machine learning, from domain adaptation to transfer learning and causal inference. Yet, even when restricted to deterministic transports, such couplings are not identifiable:…

  • Holdout cross-validation for large non-Gaussian covariance matrix estimation using Weingarten calculus

    Holdout cross-validation for large non-Gaussian covariance matrix estimation using Weingarten calculus arXiv:2509.13923v1 Announce Type: cross Abstract: Cross-validation is one of the most widely used methods for model selection and evaluation; its efficiency for large covariance matrix estimation appears robust in practice, but little is known about the theoretical behavior of its error. In this paper,…

  • Jackknife Variance Estimation for Hájek-Dominated Generalized U-Statistics

    Jackknife Variance Estimation for Hájek-Dominated Generalized U-Statistics arXiv:2509.12356v1 Announce Type: cross Abstract: We prove ratio-consistency of the jackknife variance estimator, and certain variants, for a broad class of generalized U-statistics whose variance is asymptotically dominated by their Hájek projection, with the classical fixed-order case recovered as a special instance. This Hájek projection dominance condition unifies…
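
    To make the classical fixed-order special case concrete, a minimal sketch of the delete-one jackknife variance estimator applied to an order-2 U-statistic. The kernels and function names are illustrative and are not taken from the paper. The estimator recomputes the statistic with each observation left out and scales the spread of those leave-one-out values by (n - 1) / n.

```python
from itertools import combinations
from typing import Callable, Sequence

Kernel = Callable[[float, float], float]


def u_statistic(xs: Sequence[float], h: Kernel) -> float:
    """Order-2 U-statistic: average of the symmetric kernel h over all pairs."""
    pairs = list(combinations(xs, 2))
    return sum(h(a, b) for a, b in pairs) / len(pairs)


def jackknife_variance(xs: Sequence[float], h: Kernel) -> float:
    """Delete-one jackknife estimate of Var(U_n):
    (n - 1) / n * sum_i (U_(-i) - mean_j U_(-j))^2."""
    xs = list(xs)
    n = len(xs)
    loo = [u_statistic(xs[:i] + xs[i + 1:], h) for i in range(n)]
    loo_mean = sum(loo) / n
    return (n - 1) / n * sum((u - loo_mean) ** 2 for u in loo)


def mean_kernel(a: float, b: float) -> float:
    """h(a, b) = (a + b) / 2, whose U-statistic is the sample mean."""
    return (a + b) / 2


def variance_kernel(a: float, b: float) -> float:
    """h(a, b) = (a - b)^2 / 2, whose U-statistic is the unbiased sample variance."""
    return (a - b) ** 2 / 2
```

    For xs = [1, 2, 4, 7, 11], u_statistic(xs, variance_kernel) gives the unbiased sample variance 16.5. jackknife_variance(xs, mean_kernel) gives 3.3 = 16.5 / 5, recovering the textbook jackknife variance s^2 / n of a sample mean.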

  • Kernel-based Stochastic Approximation Framework for Nonlinear Operator Learning

    Kernel-based Stochastic Approximation Framework for Nonlinear Operator Learning arXiv:2509.11070v1 Announce Type: new Abstract: We develop a stochastic approximation framework for learning nonlinear operators between infinite-dimensional spaces utilizing general Mercer operator-valued kernels. Our framework encompasses two key classes: (i) compact kernels, which admit discrete spectral decompositions, and (ii) diagonal kernels of the form $K(x,x')=k(x,x')T$, where $k$…

  • kNNSampler: Stochastic Imputations for Recovering Missing Value Distributions

    kNNSampler: Stochastic Imputations for Recovering Missing Value Distributions arXiv:2509.08366v1 Announce Type: new Abstract: We study a missing-value imputation method, termed kNNSampler, that imputes a given unit’s missing response by randomly sampling from the observed responses of the $k$ most similar units to the given unit in terms of the observed covariates. This method can sample…
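
    A minimal sketch of the imputation rule as described in the abstract: a missing response is filled by drawing at random from the observed responses of the k units nearest in covariate space. Euclidean distance and uniform sampling among the k neighbours are assumptions, and the function name is hypothetical, not the authors' implementation.

```python
import math
import random
from typing import List, Optional, Sequence


def knn_sampler_impute(
    covariates: Sequence[Sequence[float]],
    responses: Sequence[Optional[float]],
    k: int,
    rng: random.Random,
) -> List[float]:
    """Replace each missing response (None) by a random draw from the observed
    responses of its k nearest observed units (Euclidean distance, assumed)."""
    observed = [j for j, y in enumerate(responses) if y is not None]
    imputed: List[float] = []
    for i, y in enumerate(responses):
        if y is not None:
            imputed.append(y)  # observed values are kept as-is
            continue
        nearest = sorted(
            observed, key=lambda j: math.dist(covariates[i], covariates[j])
        )[:k]
        imputed.append(responses[rng.choice(nearest)])  # stochastic draw
    return imputed
```

    Because each missing value is sampled rather than averaged, repeated calls with different seeds produce draws that reflect the spread of the neighbours' responses instead of collapsing to their mean.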

  • Fisher Random Walk: Automatic Debiasing Contextual Preference Inference for Large Language Model Evaluation

    Fisher Random Walk: Automatic Debiasing Contextual Preference Inference for Large Language Model Evaluation arXiv:2509.05852v1 Announce Type: new Abstract: Motivated by the need for rigorous and scalable evaluation of large language models, we study contextual preference inference for pairwise comparison functionals of context-dependent preference score functions across domains. Focusing on the contextual Bradley-Terry-Luce model, we develop…

  • Testing for correlation between network structure and high-dimensional node covariates

    Testing for correlation between network structure and high-dimensional node covariates arXiv:2509.03772v1 Announce Type: new Abstract: In many application domains, networks are observed with node-level features. In such settings, a common problem is to assess whether or not nodal covariates are correlated with the network structure itself. Here, we present four novel methods for addressing this…

  • Fast kernel methods: Sobolev, physics-informed, and additive models

    Fast kernel methods: Sobolev, physics-informed, and additive models arXiv:2509.02649v1 Announce Type: new Abstract: Kernel methods are powerful tools in statistical learning, but their cubic complexity in the sample size n limits their use on large-scale datasets. In this work, we introduce a scalable framework for kernel regression with O(n log n) complexity, fully leveraging GPU…

  • Partial Functional Dynamic Backdoor Diffusion-based Causal Model

    Partial Functional Dynamic Backdoor Diffusion-based Causal Model arXiv:2509.00472v1 Announce Type: new Abstract: We introduce a Partial Functional Dynamic Backdoor Diffusion-based Causal Model (PFD-BDCM), specifically designed for causal inference in the presence of unmeasured confounders with spatial heterogeneity and temporal dependency. The proposed PFD-BDCM framework addresses the restrictions of the existing approaches by uniquely integrating models…

  • Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation

    Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation arXiv:2508.20942v1 Announce Type: new Abstract: In this paper, we extend the transfer learning classification framework from regression function-based methods to decision rules. We propose a novel methodology for modeling posterior drift through Bayes decision rules. By exploiting the geometric…

  • A Sharp KL-Convergence Analysis for Diffusion Models under Minimal Assumptions

    A Sharp KL-Convergence Analysis for Diffusion Models under Minimal Assumptions arXiv:2508.16306v1 Announce Type: new Abstract: Diffusion-based generative models have emerged as highly effective methods for synthesizing high-quality samples. Recent works have focused on analyzing the convergence of their generation process with minimal assumptions, either through reverse SDEs or Probability Flow ODEs. The best known guarantees,…

  • Underdamped Langevin MCMC with third order convergence

    Underdamped Langevin MCMC with third order convergence arXiv:2508.16485v1 Announce Type: new Abstract: In this paper, we propose a new numerical method for the underdamped Langevin diffusion (ULD) and present a non-asymptotic analysis of its sampling error in the 2-Wasserstein distance when the $d$-dimensional target distribution $p(x)\propto e^{-f(x)}$ is strongly log-concave and has varying degrees of…

  • Structural Foundations for Leading Digit Laws: Beyond Probabilistic Mixtures

    Structural Foundations for Leading Digit Laws: Beyond Probabilistic Mixtures arXiv:2508.13237v1 Announce Type: new Abstract: This article presents a modern deterministic framework for the study of leading significant digit distributions in numerical data. Rather than relying on traditional probabilistic or mixture-based explanations, we demonstrate that the observed frequencies of leading digits are determined by the underlying…

  • Dimension-Free Bounds for Generalized First-Order Methods via Gaussian Coupling

    Dimension-Free Bounds for Generalized First-Order Methods via Gaussian Coupling arXiv:2508.10782v1 Announce Type: new Abstract: We establish non-asymptotic bounds on the finite-sample behavior of generalized first-order iterative algorithms — including gradient-based optimization methods and approximate message passing (AMP) — with Gaussian data matrices and full-memory, non-separable nonlinearities. The central result constructs an explicit coupling between the…

  • An Iterative Algorithm for Differentially Private $k$-PCA with Adaptive Noise

    An Iterative Algorithm for Differentially Private $k$-PCA with Adaptive Noise arXiv:2508.10879v1 Announce Type: new Abstract: Given $n$ i.i.d. random matrices $A_i \in \mathbb{R}^{d \times d}$ that share a common expectation $\Sigma$, the objective of Differentially Private Stochastic PCA is to identify a subspace of dimension $k$ that captures the largest variance directions of $\Sigma$, while…

  • On Experiments

    On Experiments arXiv:2508.08288v1 Announce Type: new Abstract: The scientific process is a means for turning the results of experiments into knowledge about the world in which we live. Much research effort has been directed toward automating this process. To do this, one needs to formulate the scientific process in a precise mathematical language. This paper…

  • Stochastic dynamics learning with state-space systems

    Stochastic dynamics learning with state-space systems arXiv:2508.07876v1 Announce Type: new Abstract: This work advances the theoretical foundations of reservoir computing (RC) by providing a unified treatment of fading memory and the echo state property (ESP) in both deterministic and stochastic settings. We investigate state-space systems, a central model class in time series learning, and establish…

  • Differentially Private Model-X Knockoffs via Johnson-Lindenstrauss Transform

    Differentially Private Model-X Knockoffs via Johnson-Lindenstrauss Transform arXiv:2508.04800v1 Announce Type: new Abstract: We introduce a novel privatization framework for high-dimensional controlled variable selection. Our framework enables rigorous False Discovery Rate (FDR) control under differential privacy constraints. While the Model-X knockoff procedure provides FDR guarantees by constructing provably exchangeable “negative control” features, existing privacy mechanisms like…

  • High-Order Error Bounds for Markovian LSA with Richardson-Romberg Extrapolation

    High-Order Error Bounds for Markovian LSA with Richardson-Romberg Extrapolation arXiv:2508.05570v1 Announce Type: new Abstract: In this paper, we study the bias and high-order error bounds of the Linear Stochastic Approximation (LSA) algorithm with Polyak-Ruppert (PR) averaging under Markovian noise. We focus on the version of the algorithm with constant step size $\alpha$ and propose a…

  • Likelihood Matching for Diffusion Models

    Likelihood Matching for Diffusion Models arXiv:2508.03636v1 Announce Type: new Abstract: We propose a Likelihood Matching approach for training diffusion models by first establishing an equivalence between the likelihood of the target data distribution and a likelihood along the sample path of the reverse diffusion. To efficiently compute the reverse sample likelihood, a quasi-likelihood is considered…

  • Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration

    Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration arXiv:2507.22170v1 Announce Type: new Abstract: Modern data analysis increasingly requires identifying shared latent structure across multiple high-dimensional datasets. A commonly used model assumes that the data matrices are noisy observations of low-rank matrices with a shared singular subspace. In this case, two…

  • Perfect Clustering in Very Sparse Diverse Multiplex Networks

    Perfect Clustering in Very Sparse Diverse Multiplex Networks arXiv:2507.19423v1 Announce Type: new Abstract: The paper studies the DIverse MultiPLEx Signed Generalized Random Dot Product Graph (DIMPLE-SGRDPG) network model (Pensky (2024)), where all layers of the network have the same collection of nodes. In addition, all layers can be partitioned into groups such that the layers…

  • Sliding Window Informative Canonical Correlation Analysis

    Sliding Window Informative Canonical Correlation Analysis arXiv:2507.17921v1 Announce Type: new Abstract: Canonical correlation analysis (CCA) is a technique for finding correlated sets of features between two datasets. In this paper, we propose a novel extension of CCA to the online, streaming data setting: Sliding Window Informative Canonical Correlation Analysis (SWICCA). Our method uses a streaming…

  • On Reconstructing Training Data From Bayesian Posteriors and Trained Models

    On Reconstructing Training Data From Bayesian Posteriors and Trained Models arXiv:2507.18372v1 Announce Type: new Abstract: Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three…

  • Fundamental limits of distributed covariance matrix estimation via a conditional strong data processing inequality

    Fundamental limits of distributed covariance matrix estimation via a conditional strong data processing inequality arXiv:2507.16953v1 Announce Type: new Abstract: Estimating high-dimensional covariance matrices is a key task across many fields. This paper explores the theoretical limits of distributed covariance estimation in a feature-split setting, where communication between agents is constrained. Specifically, we study a scenario…

  • Structural Effect and Spectral Enhancement of High-Dimensional Regularized Linear Discriminant Analysis

    Structural Effect and Spectral Enhancement of High-Dimensional Regularized Linear Discriminant Analysis arXiv:2507.16682v1 Announce Type: new Abstract: Regularized linear discriminant analysis (RLDA) is a widely used tool for classification and dimensionality reduction, but its performance in high-dimensional scenarios is inconsistent. Existing theoretical analyses of RLDA often lack clear insight into how data structure affects classification performance.…