Category: cs.LG

  • The Volterra signature

    arXiv:2603.04525v1 Announce Type: new Abstract: Modern approaches for learning from non-Markovian time series, such as recurrent neural networks, neural controlled differential equations or transformers, typically rely on implicit memory mechanisms that can be difficult to interpret or to train over long horizons. We propose the Volterra signature $\mathrm{VSig}(x;K)$ as a principled, explicit…

  • Bayesian Modeling of Collatz Stopping Times: A Probabilistic Machine Learning Perspective

    arXiv:2603.04479v1 Announce Type: new Abstract: We study the Collatz total stopping time $\tau(n)$ over $n\le 10^7$ from a probabilistic machine learning viewpoint. Empirically, $\tau(n)$ is a skewed and heavily overdispersed count with pronounced arithmetic heterogeneity. We develop two complementary models. First, a Bayesian hierarchical…

  • Dictionary Based Pattern Entropy for Causal Direction Discovery

    arXiv:2603.04473v1 Announce Type: new Abstract: Discovering causal direction from temporal observational data is particularly challenging for symbolic sequences, where functional models and noise assumptions are often unavailable. We propose a novel Dictionary Based Pattern Entropy ($DPE$) framework that infers both the direction of causation and the specific…

  • The Inductive Bias of Convolutional Neural Networks: Locality and Weight Sharing Reshape Implicit Regularization

    arXiv:2603.04807v1 Announce Type: new Abstract: We study how architectural inductive bias reshapes the implicit regularization induced by the edge-of-stability phenomenon in gradient descent. Prior work has established that for fully connected networks, the strength of this regularization is governed solely by…

  • Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions

    arXiv:2603.04635v1 Announce Type: new Abstract: Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $\epsilon$-far from all product distributions in total variation distance.…

  • The Theory behind UMAP?

    arXiv:2603.03375v1 Announce Type: new Abstract: In 2018, McInnes et al. introduced a dimensionality reduction algorithm called UMAP, which enjoys wide popularity among data scientists. Their work introduces a finite variant of a functor called the metric realization, based on an unpublished draft by Spivak. This draft contains many errors, most of…

  • Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents

    arXiv:2603.03401v1 Announce Type: new Abstract: This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify iteration increments in KGD, deriving an adaptive parameter selection strategy…

  • Learning Order Forest for Qualitative-Attribute Data Clustering

    arXiv:2603.03387v1 Announce Type: new Abstract: Clustering is a fundamental approach to understanding data patterns, wherein the intuitive Euclidean distance space is commonly adopted. However, this is not the case for implicit cluster distributions reflected by qualitative attribute values, e.g., the nominal values of attributes like symptoms, marital status,…

  • Surprisal-Rényi Free Energy

    arXiv:2603.03405v1 Announce Type: new Abstract: The forward and reverse Kullback-Leibler (KL) divergences arise as limiting objectives in learning and inference yet induce markedly different inductive biases that cannot be explained at the level of expectations alone. In this work, we introduce the Surprisal-Rényi Free Energy (SRFE), a log-moment-based functional of the likelihood…

  • Scalable Contrastive Causal Discovery under Unknown Soft Interventions

    arXiv:2603.03411v1 Announce Type: new Abstract: Observational causal discovery is only identifiable up to the Markov equivalence class. While interventions can reduce this ambiguity, in practice interventions are often soft with multiple unknown targets. In many realistic scenarios, only a single intervention regime is observed. We propose a…

  • Fisher-Geometric Diffusion in Stochastic Gradient Descent: Optimal Rates, Oracle Complexity, and Information-Theoretic Limits

    arXiv:2603.02417v1 Announce Type: new Abstract: We develop a Fisher-geometric theory of stochastic gradient descent (SGD) in which mini-batch noise is an intrinsic, loss-induced matrix, not an exogenous scalar variance. Under exchangeable sampling, the mini-batch gradient covariance is pinned down (to leading…

  • Low-Degree Method Fails to Predict Robust Subspace Recovery

    arXiv:2603.02594v1 Announce Type: new Abstract: The low-degree polynomial framework has been highly successful in predicting computational versus statistical gaps for high-dimensional problems in average-case analysis and machine learning. This success has led to the low-degree conjecture, which posits that this method captures the power and limitations of…

  • Geometric structures and deviations on James’ symmetric positive-definite matrix bicone domain

    arXiv:2603.02483v1 Announce Type: new Abstract: Symmetric positive-definite (SPD) matrix datasets play a central role across numerous scientific disciplines, including signal processing, statistics, finance, computer vision, information theory, and machine learning among others. The set of SPD matrices forms a cone which can be viewed…

  • Conformal Graph Prediction with Z-Gromov Wasserstein Distances

    arXiv:2603.02460v1 Announce Type: new Abstract: Supervised graph prediction addresses regression problems where the outputs are structured graphs. Although several approaches exist for graph-valued prediction, principled uncertainty quantification remains limited. We propose a conformal prediction framework for graph-valued outputs, providing distribution-free coverage guarantees in structured output spaces. Our method…

  • Combinatorial Sparse PCA Beyond the Spiked Identity Model

    arXiv:2603.02607v1 Announce Type: new Abstract: Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $\Sigma$, whose top eigenvector $v \in \mathbb{R}^d$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into…

  • Initialization-Aware Score-Based Diffusion Sampling

    arXiv:2603.00772v1 Announce Type: new Abstract: Score-based generative models (SGMs) aim to generate samples from a target distribution by approximating the reverse-time dynamics of a stochastic differential equation. Despite their strong empirical performance, classical samplers initialized from a Gaussian distribution require a long noising time horizon, typically inducing a large number of…

  • The Partition Principle Revisited: Non-Equal Volume Designs Achieve Minimal Expected Star Discrepancy

    arXiv:2603.00202v1 Announce Type: new Abstract: We study the expected star discrepancy under a newly designed class of non-equal volume partitions. The main contributions are twofold. First, we establish a strong partition principle for the star discrepancy, showing that our newly designed non-equal volume…

  • Time-Aware Latent Space Bayesian Optimization

    arXiv:2603.00935v1 Announce Type: new Abstract: Latent-space Bayesian optimization (LSBO) extends Bayesian optimization to structured domains, such as molecular design, by searching in the continuous latent space of a generative model. However, most LSBO methods assume a fixed objective, whereas real design campaigns often face temporal drift (e.g., evolving preferences or…

  • Random Features for Operator-Valued Kernels: Bridging Kernel Methods and Neural Operators

    arXiv:2603.00971v1 Announce Type: new Abstract: In this work, we investigate the generalization properties of random feature methods. Our analysis extends prior results for Tikhonov regularization to a broad class of spectral regularization techniques and further generalizes the setting to operator-valued kernels. This unified framework…

  • Learning with the Nash-Sutcliffe loss

    arXiv:2603.00968v1 Announce Type: new Abstract: The Nash-Sutcliffe efficiency ($\text{NSE}$) is a widely used, positively oriented relative measure for evaluating forecasts across multiple time series. However, it lacks a decision-theoretic foundation for this purpose. To address this, we examine its negatively oriented counterpart, which we refer to as Nash-Sutcliffe loss, defined…

  • Uncovering Physical Drivers of Dark Matter Halo Structures with Auxiliary-Variable-Guided Generative Models

    arXiv:2602.23518v1 Announce Type: new Abstract: Deep generative models (DGMs) compress high-dimensional data but often entangle distinct physical factors in their latent spaces. We present an auxiliary-variable-guided framework for disentangling representations of thermal Sunyaev-Zel’dovich (tSZ) maps of dark matter halos. We introduce halo mass…

  • Partition Function Estimation under Bounded f-Divergence

    arXiv:2602.23535v1 Announce Type: new Abstract: We study the statistical complexity of estimating partition functions given sample access to a proposal distribution and an unnormalized density ratio for a target distribution. While partition function estimation is a classical problem, existing guarantees typically rely on structural assumptions about the domain or…

  • Multivariate Spatio-Temporal Neural Hawkes Processes

    arXiv:2602.23629v1 Announce Type: new Abstract: We propose a Multivariate Spatio-Temporal Neural Hawkes Process for modeling complex multivariate event data with spatio-temporal dynamics. The proposed model extends continuous-time neural Hawkes processes by integrating spatial information into latent state evolution through learned temporal and spatial decay dynamics, enabling flexible modeling of excitation…

  • Fairness under Graph Uncertainty: Achieving Interventional Fairness with Partially Known Causal Graphs over Clusters of Variables

    arXiv:2602.23611v1 Announce Type: new Abstract: Algorithmic decisions about individuals require predictions that are not only accurate but also fair with respect to sensitive attributes such as gender and race. Causal notions of fairness align with legal requirements, yet many…

  • Moment Matters: Mean and Variance Causal Graph Discovery from Heteroscedastic Observational Data

    arXiv:2602.23602v1 Announce Type: new Abstract: Heteroscedasticity, where the variance of a variable changes with other variables, is pervasive in real data, and elucidating why it arises from the perspective of statistical moments is crucial in scientific knowledge discovery and decision-making. However,…

  • LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees

    arXiv:2602.22432v1 Announce Type: new Abstract: Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify uncertainty. Conformal prediction provides distribution-free marginal coverage, yet split conformal uses a single global residual quantile and can be poorly adaptive under…

  • Flow Matching is Adaptive to Manifold Structures

    arXiv:2602.22486v1 Announce Type: new Abstract: Flow matching has emerged as a simulation-free alternative to diffusion-based generative modeling, producing samples by solving an ODE whose time-dependent velocity field is learned along an interpolation between a simple source distribution (e.g., a standard normal) and a target data distribution. Flow-based methods…

  • From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference

    arXiv:2602.22492v1 Announce Type: new Abstract: In this work, we study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs), with an emphasis on statistical modeling, identifiability, and scalable inference. We first establish a general convergence…

  • Unsupervised Continual Learning for Amortized Bayesian Inference

    arXiv:2602.22884v1 Announce Type: new Abstract: Amortized Bayesian Inference (ABI) enables efficient posterior estimation using generative neural networks trained on simulated data, but often suffers from performance degradation under model misspecification. While self-consistency (SC) training on unlabeled empirical data can enhance network robustness, current approaches are limited to static,…

  • Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks

    arXiv:2602.22925v1 Announce Type: new Abstract: We study wide Bayesian neural networks focusing on the rare but statistically dominant fluctuations that govern posterior concentration, beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, providing an emerging notion of complexity and feature learning directly…

  • Counterdiabatic Hamiltonian Monte Carlo

    arXiv:2602.21272v1 Announce Type: new Abstract: Hamiltonian Monte Carlo (HMC) is a state-of-the-art method for sampling from distributions with differentiable densities, but can converge slowly when applied to challenging multimodal problems. Running HMC with a time-varying Hamiltonian, in order to interpolate from an initial tractable distribution to the…

  • Efficient Uncoupled Learning Dynamics with $\tilde{O}\!\left(T^{-1/4}\right)$ Last-Iterate Convergence in Bilinear Saddle-Point Problems over Convex Sets under Bandit Feedback

    arXiv:2602.21436v1 Announce Type: new Abstract: In this paper, we study last-iterate convergence of learning algorithms in bilinear saddle-point problems, a preferable notion of convergence that captures the day-to-day behavior of learning dynamics. We focus on the challenging…

  • Conditional neural control variates for variance reduction in Bayesian inverse problems

    arXiv:2602.21357v1 Announce Type: new Abstract: Bayesian inference for inverse problems involves computing expectations under posterior distributions (e.g., posterior means, variances, or predictive quantities), typically via Monte Carlo (MC) estimation. When the quantity of interest varies significantly under the posterior, accurate estimates demand…

  • ConformalHDC: Uncertainty-Aware Hyperdimensional Computing with Application to Neural Decoding

    arXiv:2602.21446v1 Announce Type: new Abstract: Hyperdimensional Computing (HDC) offers a computationally efficient paradigm for neuromorphic learning. Yet, it lacks rigorous uncertainty quantification, leading to open decision boundaries and, consequently, vulnerability to outliers, adversarial perturbations, and out-of-distribution inputs. To address these limitations, we introduce ConformalHDC, a unified…

  • Efficient Inference after Directionally Stable Adaptive Experiments

    arXiv:2602.21478v1 Announce Type: new Abstract: We study inference on scalar-valued pathwise differentiable targets after adaptive data collection, such as a bandit algorithm. We introduce a novel target-specific condition, directional stability, which is strictly weaker than previously imposed target-agnostic stability conditions. Under directional stability, we show that estimators that…

  • Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

    arXiv:2602.20297v1 Announce Type: new Abstract: We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing analyses do not apply to algorithms that achieve the nearly…

  • Standard Transformers Achieve the Minimax Rate in Nonparametric Regression with $C^{s,\lambda}$ Targets

    arXiv:2602.20555v1 Announce Type: new Abstract: The tremendous success of Transformer models in fields such as large language models and computer vision necessitates a rigorous theoretical investigation. To the best of our knowledge, this paper is the first work proving that standard Transformers can…

  • Selecting Optimal Variable Order in Autoregressive Ising Models

    arXiv:2602.20394v1 Announce Type: new Abstract: Autoregressive models enable tractable sampling from learned probability distributions, but their performance critically depends on the variable ordering used in the factorization via complexities of the resulting conditional distributions. We propose to learn the Markov random field describing the underlying data, and…

  • Characterizing Online and Private Learnability under Distributional Constraints via Generalized Smoothness

    arXiv:2602.20585v1 Announce Type: new Abstract: Understanding minimal assumptions that enable learning and generalization is perhaps the central question of learning theory. Several celebrated results in statistical learning theory, such as the VC theorem and Littlestone’s characterization of online learnability, establish conditions on the hypothesis…

  • Amortized Bayesian inference for actigraph time sheet data from mobile devices

    arXiv:2602.20611v1 Announce Type: new Abstract: Mobile data technologies use “actigraphs” to furnish information on health variables as a function of a subject’s movement. The advent of wearable devices and related technologies has propelled the creation of health databases consisting of human movement data to…

  • Multiclass Calibration Assessment and Recalibration of Probability Predictions via the Linear Log Odds Calibration Function

    arXiv:2602.18573v1 Announce Type: new Abstract: Machine-generated probability predictions are essential in modern classification tasks such as image classification. A model is well calibrated when its predicted probabilities correspond to observed event frequencies. Despite the need for multicategory recalibration methods, existing…

  • Stochastic Gradient Variational Inference with Price’s Gradient Estimator from Bures-Wasserstein to Parameter Space

    arXiv:2602.18718v1 Announce Type: new Abstract: For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) perform gradient descent in measure space (Bures-Wasserstein space)…

  • Bounds and Identification of Joint Probabilities of Potential Outcomes and Observed Variables under Monotonicity Assumptions

    arXiv:2602.18762v1 Announce Type: new Abstract: Evaluating joint probabilities of potential outcomes and observed variables, and their linear combinations, is a fundamental challenge in causal inference. This paper addresses the bounding and identification of these probabilities in settings with discrete treatment…

  • Implicit Bias and Convergence of Matrix Stochastic Mirror Descent

    arXiv:2602.18997v1 Announce Type: new Abstract: We investigate Stochastic Mirror Descent (SMD) with matrix parameters and vector-valued predictions, a framework relevant to multi-class classification and matrix completion problems. Focusing on the overparameterized regime, where the total number of parameters exceeds the number of training samples, we prove…

  • Federated Measurement of Demographic Disparities from Quantile Sketches

    arXiv:2602.18870v1 Announce Type: new Abstract: Many fairness goals are defined at a population level that misaligns with siloed data collection, which remains unsharable due to privacy regulations. Horizontal federated learning (FL) enables collaborative modeling across clients with aligned features without sharing raw data. We study federated auditing…

  • Topological Exploration of High-Dimensional Empirical Risk Landscapes: general approach, and applications to phase retrieval

    arXiv:2602.17779v1 Announce Type: new Abstract: We consider the landscape of empirical risk minimization for high-dimensional Gaussian single-index models (generalized linear models). The objective is to recover an unknown signal $\boldsymbol{\theta}^\star \in \mathbb{R}^d$ (where $d \gg 1$) from a loss function $\hat{R}(\boldsymbol{\theta})$…

  • Interactive Learning of Single-Index Models via Stochastic Gradient Descent

    arXiv:2602.17876v1 Announce Type: new Abstract: Stochastic gradient descent (SGD) is a cornerstone algorithm for high-dimensional optimization, renowned for its empirical successes. Recent theoretical advances have provided a deep understanding of how SGD enables feature learning in high-dimensional nonlinear models, most notably the single-index model with i.i.d.…

  • Drift Estimation for Stochastic Differential Equations with Denoising Diffusion Models

    arXiv:2602.17830v1 Announce Type: new Abstract: We study the estimation of time-homogeneous drift functions in multivariate stochastic differential equations with known diffusion coefficient, from multiple trajectories observed at high frequency over a fixed time horizon. We formulate drift estimation as a denoising problem conditional on previous…

  • Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

    arXiv:2602.17894v1 Announce Type: new Abstract: Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as medical…

  • On the Generalization and Robustness in Conditional Value-at-Risk

    arXiv:2602.18053v1 Announce Type: new Abstract: Conditional Value-at-Risk (CVaR) is a widely used risk-sensitive objective for learning under rare but high-impact losses, yet its statistical behavior under heavy-tailed data remains poorly understood. Unlike expectation-based risk, CVaR depends on an endogenous, data-dependent quantile, which couples tail averaging with threshold…

  • Beyond Procedure: Substantive Fairness in Conformal Prediction

    arXiv:2602.16794v1 Announce Type: new Abstract: Conformal prediction (CP) offers distribution-free uncertainty quantification for machine learning models, yet its interplay with fairness in downstream decision-making remains underexplored. Moving beyond CP as a standalone operation (procedural fairness), we analyze the holistic decision-making pipeline to evaluate substantive fairness, i.e., the equity of downstream…

  • Poisson-MNL Bandit: Nearly Optimal Dynamic Joint Assortment and Pricing with Decision-Dependent Customer Arrivals

    arXiv:2602.16923v1 Announce Type: new Abstract: We study dynamic joint assortment and pricing where a seller updates decisions at regular accounting/operating intervals to maximize the cumulative per-period revenue over a horizon $T$. In many settings, assortment and prices affect not only what an…

  • Anti-causal domain generalization: Leveraging unlabeled data

    arXiv:2602.17187v1 Announce Type: new Abstract: The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we…

  • Semi-Supervised Learning on Graphs using Graph Neural Networks

    arXiv:2602.17115v1 Announce Type: new Abstract: Graph neural networks (GNNs) work remarkably well in semi-supervised node regression, yet a rigorous theory explaining when and why they succeed remains lacking. To address this gap, we study an aggregate-and-readout model that encompasses several common message passing architectures: node features are…

  • MGD: Moment Guided Diffusion for Maximum Entropy Generation

    arXiv:2602.17211v1 Announce Type: new Abstract: Generating samples from limited information is a fundamental problem across scientific domains. Classical maximum entropy methods provide principled uncertainty quantification from moment constraints but require sampling via MCMC or Langevin dynamics, which typically exhibit exponential slowdown in high dimensions. In contrast, generative…

  • Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability

    arXiv:2602.15919v1 Announce Type: new Abstract: Can the privacy vulnerability of individual data points be assessed without retraining models or explicitly simulating attacks? We answer affirmatively by showing that exposure to membership inference attack (MIA) is fundamentally governed by a data point’s influence on the learned model.…

  • Robust Stochastic Gradient Posterior Sampling with Lattice Based Discretisation

    arXiv:2602.15925v1 Announce Type: new Abstract: Stochastic-gradient MCMC methods enable scalable Bayesian posterior sampling but often suffer from sensitivity to minibatch size and gradient noise. To address this, we propose Stochastic Gradient Lattice Random Walk (SGLRW), an extension of the Lattice Random Walk discretization. Unlike conventional Stochastic…

  • Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models

    arXiv:2602.16061v1 Announce Type: new Abstract: Estimating population quantities such as mean outcomes from user feedback is fundamental to platform evaluation and social science, yet feedback is often missing not at random (MNAR): users with stronger opinions are more likely to respond, so standard…

  • Empirical Cumulative Distribution Function Clustering for LLM-based Agent System Analysis

    arXiv:2602.16131v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as agents to solve complex tasks such as question answering (QA), scientific debate, and software development. A standard evaluation procedure aggregates multiple responses from LLM agents into a single final answer, often via…

  • Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs

    arXiv:2602.15091v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) architectures decompose prediction tasks into specialized expert sub-networks selected by a gating mechanism. This letter adopts a communication-theoretic view of MoE gating, modeling the gate as a stochastic channel operating under a finite information rate. Within an information-theoretic learning framework, we specialize…

  • Universal priors: solving empirical Bayes via Bayesian inference and pretraining

    arXiv:2602.15136v1 Announce Type: new Abstract: We theoretically justify the recent empirical finding of [Teh et al., 2025] that a transformer pretrained on synthetically generated data achieves strong performance on empirical Bayes (EB) problems. We take an indirect approach to this question: rather than analyzing the…

  • Functional Central Limit Theorem for Stochastic Gradient Descent

    Functional Central Limit Theorem for Stochastic Gradient Descent arXiv:2602.15538v1 Announce Type: new Abstract: We study the asymptotic shape of the trajectory of the stochastic gradient descent algorithm applied to a convex objective function. Under mild regularity assumptions, we prove a functional central limit theorem for the properly rescaled trajectory. Our result characterizes the long-term fluctuations…

  • Sparse Additive Model Pruning for Order-Based Causal Structure Learning

    Sparse Additive Model Pruning for Order-Based Causal Structure Learning arXiv:2602.15306v1 Announce Type: new Abstract: Causal structure learning, also known as causal discovery, aims to estimate causal relationships between variables as a form of a causal directed acyclic graph (DAG) from observational data. One of the major frameworks is the order-based approach that first estimates a…

  • Near-Optimal Sample Complexity for Online Constrained MDPs

    Near-Optimal Sample Complexity for Online Constrained MDPs arXiv:2602.15076v1 Announce Type: cross Abstract: Safety is a fundamental challenge in reinforcement learning (RL), particularly in real-world applications such as autonomous driving, robotics, and healthcare. To address this, Constrained Markov Decision Processes (CMDPs) are commonly used to enforce safety constraints while optimizing performance. However, existing methods often suffer…

  • Nonparametric Distribution Regression Re-calibration

    Nonparametric Distribution Regression Re-calibration arXiv:2602.13362v1 Announce Type: new Abstract: A key challenge in probabilistic regression is ensuring that predictive distributions accurately reflect true empirical uncertainty. Minimizing overall prediction error often encourages models to prioritize informativeness over calibration, producing narrow but overconfident predictions. However, in safety-critical settings, trustworthy uncertainty estimates are often more valuable than narrow…

  • Locally Private Parametric Methods for Change-Point Detection

    Locally Private Parametric Methods for Change-Point Detection arXiv:2602.13619v1 Announce Type: new Abstract: We study parametric change-point detection, where the goal is to identify distributional changes in time series, under local differential privacy. In the non-private setting, we derive improved finite-sample accuracy guarantees for a change-point detection algorithm based on the generalized log-likelihood ratio test, via…

  • A Theoretical Framework for LLM Fine-tuning Using Early Stopping for Non-random Initialization

    A Theoretical Framework for LLM Fine-tuning Using Early Stopping for Non-random Initialization arXiv:2602.13942v1 Announce Type: new Abstract: In the era of large language models (LLMs), fine-tuning pretrained models has become ubiquitous. Yet the theoretical underpinning remains an open question. A central question is why only a few epochs of fine-tuning are typically sufficient to achieve…

  • Quantifying Normality: Convergence Rate to Gaussian Limit for Stochastic Approximation and Unadjusted OU Algorithm

    Quantifying Normality: Convergence Rate to Gaussian Limit for Stochastic Approximation and Unadjusted OU Algorithm arXiv:2602.13906v1 Announce Type: new Abstract: Stochastic approximation (SA) is a method for finding the root of an operator perturbed by noise. There is a rich literature establishing the asymptotic normality of rescaled SA iterates under fairly mild conditions. However, these asymptotic…

  • Linear Regression with Unknown Truncation Beyond Gaussian Features

    Linear Regression with Unknown Truncation Beyond Gaussian Features arXiv:2602.12534v1 Announce Type: new Abstract: In truncated linear regression, samples $(x,y)$ are shown only when the outcome $y$ falls inside a certain survival set $S^\star$ and the goal is to estimate the unknown $d$-dimensional regressor $w^\star$. This problem has a long history of study in Statistics and…

  • A Regularization-Sharpness Tradeoff for Linear Interpolators

    A Regularization-Sharpness Tradeoff for Linear Interpolators arXiv:2602.12680v1 Announce Type: new Abstract: The rule of thumb regarding the relationship between the bias-variance tradeoff and model size plays a key role in classical machine learning, but is now well-known to break down in the overparameterized setting as per the double descent curve. In particular, minimum-norm interpolating estimators…

  • Blessings of Multiple Good Arms in Multi-Objective Linear Bandits

    Blessings of Multiple Good Arms in Multi-Objective Linear Bandits arXiv:2602.12901v1 Announce Type: new Abstract: The multi-objective bandit setting has traditionally been regarded as more complex than the single-objective case, as multiple objectives must be optimized simultaneously. In contrast to this prevailing view, we demonstrate that when multiple good arms exist for multiple objectives,…

  • Annealing in variational inference mitigates mode collapse: A theoretical study on Gaussian mixtures

    Annealing in variational inference mitigates mode collapse: A theoretical study on Gaussian mixtures arXiv:2602.12923v1 Announce Type: new Abstract: Mode collapse, the failure to capture one or more modes when targeting a multimodal distribution, is a central challenge in modern variational inference. In this work, we provide a mathematical analysis of annealing-based strategies for mitigating…

  • TFTF: Training-Free Targeted Flow for Conditional Sampling

    TFTF: Training-Free Targeted Flow for Conditional Sampling arXiv:2602.12932v1 Announce Type: new Abstract: We propose a training-free conditional sampling method for flow matching models based on importance sampling. Because a naïve application of importance sampling suffers from weight degeneracy in high-dimensional settings, we modify and incorporate a resampling technique in sequential Monte Carlo (SMC) during intermediate…
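
    Importance sampling is usually checked for weight degeneracy with the effective sample size (ESS) of the normalized importance weights. Generic SMC samplers resample when the ESS falls below a fraction of the particle count. The sketch below is a textbook illustration of ESS-triggered multinomial resampling. It is not the TFTF algorithm, and the function names and the 0.5 threshold are hypothetical choices.

```python
import math
import random


def normalized_weights(log_w):
    """Self-normalize log-weights, subtracting the max for numerical stability."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def effective_sample_size(log_w):
    """ESS = 1 / sum(w_i^2) for normalized weights.

    Equals N for uniform weights and 1 when a single particle holds all the mass.
    """
    w = normalized_weights(log_w)
    return 1.0 / sum(wi * wi for wi in w)


def resample_if_degenerate(particles, log_w, rng, threshold=0.5):
    """Multinomial resampling when ESS < threshold * N (hypothetical helper).

    Returns (particles, log_weights). After resampling, the log-weights are
    reset to zero, i.e. uniform weights.
    """
    n = len(particles)
    if effective_sample_size(log_w) >= threshold * n:
        return list(particles), list(log_w)
    w = normalized_weights(log_w)
    new_particles = rng.choices(particles, weights=w, k=n)
    return new_particles, [0.0] * n
```

    As an example, take particles drawn from N(0, 1) with target N(2, 1). The log-weights are then exactly 2x - 2, and in expectation the ESS shrinks to roughly e^(-4), about 2%, of N as N grows. That is the kind of degeneracy resampling is meant to repair.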

  • The Cost of Learning under Multiple Change Points

    The Cost of Learning under Multiple Change Points arXiv:2602.11406v1 Announce Type: new Abstract: We consider an online learning problem in environments with multiple change points. In contrast to the single change point problem that is widely studied using classical “high confidence” detection schemes, the multiple change point environment presents new learning-theoretic and algorithmic challenges. Specifically,…

  • Amortised and provably-robust simulation-based inference

    Amortised and provably-robust simulation-based inference arXiv:2602.11325v1 Announce Type: new Abstract: Complex simulator-based models are now routinely used to perform inference across the sciences and engineering, but existing inference methods are often unable to account for outliers and other extreme values in data which occur due to faulty measurement instruments or human error. In this paper,…

  • Provable Offline Reinforcement Learning for Structured Cyclic MDPs

    Provable Offline Reinforcement Learning for Structured Cyclic MDPs arXiv:2602.11679v1 Announce Type: new Abstract: We introduce a novel cyclic Markov decision process (MDP) framework for multi-step decision problems with heterogeneous stage-specific dynamics, transitions, and discount factors across the cycle. In this setting, offline learning is challenging: optimizing a policy at any stage shifts the state distributions…

  • Estimation of instrument and noise parameters for inverse problem based on prior diffusion model

    Estimation of instrument and noise parameters for inverse problem based on prior diffusion model arXiv:2602.11711v1 Announce Type: new Abstract: This article addresses the issue of estimating observation parameters (response and error parameters) in inverse problems. The focus is on cases where regularization is introduced in a Bayesian framework and the prior is modeled by a…

  • PAC-Bayesian Generalization Guarantees for Fairness on Stochastic and Deterministic Classifiers

    PAC-Bayesian Generalization Guarantees for Fairness on Stochastic and Deterministic Classifiers arXiv:2602.11722v1 Announce Type: new Abstract: Classical PAC generalization bounds on the prediction risk of a classifier are insufficient to provide theoretical guarantees on fairness when the goal is to learn models balancing predictive risk and fairness constraints. We propose a PAC-Bayesian framework for deriving generalization…

  • Dissecting Performative Prediction: A Comprehensive Survey

    Dissecting Performative Prediction: A Comprehensive Survey arXiv:2602.10176v1 Announce Type: new Abstract: The field of performative prediction had its beginnings in 2020 with the seminal paper “Performative Prediction” by Perdomo et al., which established a novel machine learning setup where the deployment of a predictive model causes a distribution shift in the environment, which in turn…

  • When LLMs get significantly worse: A statistical approach to detect model degradations

    When LLMs get significantly worse: A statistical approach to detect model degradations arXiv:2602.10144v1 Announce Type: new Abstract: Minimizing the inference cost and latency of foundation models has become a crucial area of research. Optimization approaches include theoretically lossless methods and others without accuracy guarantees like quantization. In all of these cases it is crucial to…

  • Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning

    Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning arXiv:2602.10273v1 Announce Type: new Abstract: Many recent reasoning gains in large language models can be explained as distribution sharpening: biasing generation toward high-likelihood trajectories already supported by the pretrained model, rather than modifying its weights. A natural formalization is the sequence-level power distribution $\pi_\alpha(y\mid x)\propto p_\theta(y\mid…

  • Causal Effect Estimation with Learned Instrument Representations

    Causal Effect Estimation with Learned Instrument Representations arXiv:2602.10370v1 Announce Type: new Abstract: Instrumental variable (IV) methods mitigate bias from unobserved confounding in observational causal inference but rely on the availability of a valid instrument, which can often be difficult or infeasible to identify in practice. In this paper, we propose a representation learning approach that…

  • Generalized Robust Adaptive-Bandwidth Multi-View Manifold Learning in High Dimensions with Noise

    Generalized Robust Adaptive-Bandwidth Multi-View Manifold Learning in High Dimensions with Noise arXiv:2602.10530v1 Announce Type: new Abstract: Multiview datasets are common in scientific and engineering applications, yet existing fusion methods offer limited theoretical guarantees, particularly in the presence of heterogeneous and high-dimensional noise. We propose Generalized Robust Adaptive-Bandwidth Multiview Diffusion Maps (GRAB-MDM), a new kernel-based diffusion…

  • Minimum Distance Summaries for Robust Neural Posterior Estimation

    Minimum Distance Summaries for Robust Neural Posterior Estimation arXiv:2602.09161v1 Announce Type: new Abstract: Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs, typically through low-dimensional summary statistics, which can then be cheaply reused for fast inference by querying it on new test observations. Because NPE is…

  • Persistent Entropy as a Detector of Phase Transitions

    Persistent Entropy as a Detector of Phase Transitions arXiv:2602.09058v1 Announce Type: new Abstract: Persistent entropy (PE) is an information-theoretic summary statistic of persistence barcodes that has been widely used to detect regime changes in complex systems. Despite its empirical success, a general theoretical understanding of when and why persistent entropy reliably detects phase transitions has…
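
    The sketch below assumes a common definition of persistent entropy, which the paper may refine. Take a barcode with finite bar lengths l_1, …, l_n, set L = l_1 + … + l_n and p_i = l_i / L, and compute the Shannon entropy E = -Σ p_i log p_i. The function name is hypothetical.

```python
import math


def persistent_entropy(barcode):
    """Persistent entropy of a barcode given as (birth, death) pairs.

    Assumes every bar is finite with death > birth. Infinite bars (for example
    the essential H0 class) must be truncated or dropped beforehand.
    """
    lengths = [death - birth for birth, death in barcode]
    if any(l <= 0 or math.isinf(l) for l in lengths):
        raise ValueError("bars must be finite with positive length")
    total = sum(lengths)
    # Shannon entropy of the normalized bar lengths p_i = l_i / L.
    return -sum((l / total) * math.log(l / total) for l in lengths)
```

    For n bars of equal length the value is log n, its maximum. A single dominant bar drives it toward 0, which is why shifts in the barcode's length profile can register as changes in E.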

  • Quantifying Epistemic Uncertainty in Diffusion Models

    Quantifying Epistemic Uncertainty in Diffusion Models arXiv:2602.09170v1 Announce Type: new Abstract: To ensure high-quality outputs, it is important to quantify the epistemic uncertainty of diffusion models. Existing methods are often unreliable because they mix epistemic and aleatoric uncertainty. We introduce a method based on Fisher information that explicitly isolates epistemic variance, producing more reliable plausibility…

  • Mutual Information Collapse Explains Disentanglement Failure in $\beta$-VAEs

    Mutual Information Collapse Explains Disentanglement Failure in $\beta$-VAEs arXiv:2602.09277v1 Announce Type: new Abstract: The $\beta$-VAE is a foundational framework for unsupervised disentanglement, using $\beta$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $\beta$ and…
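
    For reference, the standard β-VAE objective scales the KL term of the usual evidence lower bound by β. It is maximized over encoder parameters φ and decoder parameters θ. β = 1 recovers the ordinary VAE, and β > 1 pushes the approximate posterior harder toward the factorized prior p(z), at the cost of reconstruction.

```latex
\mathcal{L}_\beta(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - \beta \, D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```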

  • The Critical Horizon: Inspection Design Principles for Multi-Stage Operations and Deep Reasoning

    The Critical Horizon: Inspection Design Principles for Multi-Stage Operations and Deep Reasoning arXiv:2602.09394v1 Announce Type: new Abstract: Manufacturing lines, service journeys, supply chains, and AI reasoning chains share a common challenge: attributing a terminal outcome to the intermediate stage that caused it. We establish an information-theoretic barrier to this credit assignment problem: the signal connecting…

  • Fast and Robust Likelihood-Guided Diffusion Posterior Sampling with Amortized Variational Inference

    Fast and Robust Likelihood-Guided Diffusion Posterior Sampling with Amortized Variational Inference arXiv:2602.07102v1 Announce Type: new Abstract: Zero-shot diffusion posterior sampling offers a flexible framework for inverse problems by accommodating arbitrary degradation operators at test time, but incurs high computational cost due to repeated likelihood-guided updates. In contrast, previous amortized diffusion approaches enable fast inference by…

  • Discrete Adjoint Matching

    Discrete Adjoint Matching arXiv:2602.07132v1 Announce Type: new Abstract: Computation methods for solving entropy-regularized reward optimization — a class of problems widely used for fine-tuning generative models — have advanced rapidly. Among those, Adjoint Matching (AM, Domingo-Enrich et al., 2025) has proven highly effective in continuous state spaces with differentiable rewards. Transferring these practical successes to…

  • Flow-Based Conformal Predictive Distributions

    Flow-Based Conformal Predictive Distributions arXiv:2602.07633v1 Announce Type: new Abstract: Conformal prediction provides a distribution-free framework for uncertainty quantification via prediction sets with exact finite-sample coverage. In low dimensions these sets are easy to interpret, but in high-dimensional or structured output spaces they are difficult to represent and use, which can limit their ability to integrate…
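
    As a one-dimensional baseline for the prediction sets described here, the sketch below computes a standard split-conformal interval for scalar regression. It assumes absolute-residual nonconformity scores on a held-out calibration set. It is not the flow-based method of the paper, and the function name is hypothetical.

```python
import math


def split_conformal_interval(cal_residuals, y_hat, alpha=0.1):
    """Split-conformal interval [y_hat - q, y_hat + q] (hypothetical helper).

    cal_residuals: absolute residuals |y_i - f(x_i)| on a calibration set that
    was not used to fit f. q is the ceil((n + 1) * (1 - alpha))-th smallest
    residual, which gives marginal coverage >= 1 - alpha under exchangeability.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the only valid set is everything.
        return (-math.inf, math.inf)
    q = sorted(cal_residuals)[k - 1]
    return (y_hat - q, y_hat + q)
```

    In one dimension the resulting set is a single interval and easy to use. Representing such sets is what becomes hard in the high-dimensional or structured output spaces the abstract mentions.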

  • Scalable Mean-Field Variational Inference via Preconditioned Primal-Dual Optimization

    Scalable Mean-Field Variational Inference via Preconditioned Primal-Dual Optimization arXiv:2602.07632v1 Announce Type: new Abstract: In this work, we investigate the large-scale mean-field variational inference (MFVI) problem from a mini-batch primal-dual perspective. By reformulating MFVI as a constrained finite-sum problem, we develop a novel primal-dual algorithm based on an augmented Lagrangian formulation, termed primal-dual variational inference (PD-VI).…

  • On Generation in Metric Spaces

    On Generation in Metric Spaces arXiv:2602.07710v1 Announce Type: new Abstract: We study generation in separable metric instance spaces. We extend the language generation framework from Kleinberg and Mullainathan [2024] beyond countable domains by defining novelty through metric separation and allowing asymmetric novelty parameters for the adversary and the generator. We introduce the $(\varepsilon,\varepsilon')$-closure dimension, a…

  • Deep networks learn to parse uniform-depth context-free languages from local statistics

    Deep networks learn to parse uniform-depth context-free languages from local statistics arXiv:2602.06065v1 Announce Type: new Abstract: Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. Studies of the internal representations of Large Language Models (LLMs) support their ability to parse text…

  • Algebraic Robustness Verification of Neural Networks

    Algebraic Robustness Verification of Neural Networks arXiv:2602.06105v1 Announce Type: new Abstract: We formulate formal robustness verification of neural networks as an algebraic optimization problem. We leverage the Euclidean Distance (ED) degree, which is the generic number of complex critical points of the distance minimization problem to a classifier’s decision boundary, as an architecture-dependent measure of…

  • Inheritance Between Feedforward and Convolutional Networks via Model Projection

    Inheritance Between Feedforward and Convolutional Networks via Model Projection arXiv:2602.06245v1 Announce Type: new Abstract: Techniques for feedforward networks (FFNs) and convolutional networks (CNNs) are frequently reused across families, but the relationship between the underlying model classes is rarely made explicit. We introduce a unified node-level formalization with tensor-valued activations and show that generalized feedforward networks…

  • High-Dimensional Limit of Stochastic Gradient Flow via Dynamical Mean-Field Theory

    High-Dimensional Limit of Stochastic Gradient Flow via Dynamical Mean-Field Theory arXiv:2602.06320v1 Announce Type: new Abstract: Modern machine learning models are typically trained via multi-pass stochastic gradient descent (SGD) with small batch sizes, and understanding their dynamics in high dimensions is of great interest. However, an analytical framework for describing the high-dimensional asymptotic behavior of multi-pass…

  • Time-uniform conformal and PAC prediction

    Time-uniform conformal and PAC prediction arXiv:2602.06297v1 Announce Type: new Abstract: Given that machine learning algorithms are increasingly being deployed to aid in high-stakes decision-making, uncertainty quantification methods that wrap around these black-box models such as conformal prediction have received much attention in recent years. In sequential settings, where data are observed/generated in a…

  • Total Variation Rates for Riemannian Flow Matching

    Total Variation Rates for Riemannian Flow Matching arXiv:2602.05174v1 Announce Type: new Abstract: Riemannian flow matching (RFM) extends flow-based generative modeling to data supported on manifolds by learning a time-dependent tangent vector field whose flow-ODE transports a simple base distribution to the data law. We develop a nonasymptotic Total Variation (TV) convergence analysis for RFM samplers…