Category: stat.ML
-
A Framework for Non-Linear Attention via Modern Hopfield Networks
arXiv:2506.11043v1 Announce Type: new Abstract: In this work we propose an energy functional along the lines of Modern Hopfield Networks (MHN), the stationary points of which correspond to the attention due to Vaswani et al. [12], thus unifying both frameworks. The minima of this landscape form…
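For orientation, the attention/Hopfield correspondence this entry builds on is visible in the standard one-step modern Hopfield update, xi <- X^T softmax(beta X xi). This update coincides with single-query softmax attention when the stored patterns X serve as both keys and values and beta = 1/sqrt(d). The NumPy sketch below shows only that classical correspondence. It is not the energy functional proposed in the paper, and the function names are illustrative.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    # Numerically stable softmax along the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def hopfield_update(X: np.ndarray, xi: np.ndarray, beta: float) -> np.ndarray:
    """One modern Hopfield step: xi_new = X^T softmax(beta * X @ xi).

    X stores N patterns as rows (shape N x d); xi is the current state (d,).
    """
    return X.T @ softmax(beta * (X @ xi))


def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V
```

With beta = 1/sqrt(d), `hopfield_update(X, q, 1 / np.sqrt(d))` equals `attention(q[None, :], X, X)[0]`. With a larger beta, the update pulls a noisy query toward the nearest stored pattern.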
-
Fast Bayesian Optimization of Function Networks with Partial Evaluations
arXiv:2506.11456v1 Announce Type: new Abstract: Bayesian optimization of function networks (BOFN) is a framework for optimizing expensive-to-evaluate objective functions structured as networks, where some nodes’ outputs serve as inputs for others. Many real-world applications, such as manufacturing and drug discovery, involve function networks with additional properties…
-
Collaborative Prediction: To Join or To Disjoin Datasets
arXiv:2506.11271v1 Announce Type: new Abstract: With the recent rise of generative Artificial Intelligence (AI), the need to select high-quality datasets to improve machine learning models has garnered increasing attention. However, some parts of this topic remain underexplored, even for simple prediction models. In this work, we study…
-
On the performance of multi-fidelity and reduced-dimensional neural emulators for inference of physiologic boundary conditions
arXiv:2506.11683v1 Announce Type: new Abstract: Solving inverse problems in cardiovascular modeling is particularly challenging due to the high computational cost of running high-fidelity simulations. In this work, we focus on Bayesian parameter estimation and explore different methods to reduce the…
-
Using Deep Operators to Create Spatio-temporal Surrogates for Dynamical Systems under Uncertainty
arXiv:2506.11761v1 Announce Type: new Abstract: Spatio-temporal data, which consists of responses or measurements gathered at different times and positions, is ubiquitous across diverse applications of civil infrastructure. While SciML methods have made significant progress in tackling the issue of response prediction for individual…
-
Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes
arXiv:2506.10101v1 Announce Type: new Abstract: In this paper, we establish sample complexity bounds for learning high-dimensional simplices in $\mathbb{R}^K$ from noisy data. Specifically, we consider $n$ i.i.d. samples uniformly drawn from an unknown simplex in $\mathbb{R}^K$, each corrupted by additive Gaussian noise of unknown variance. We…
-
Momentum Multi-Marginal Schrödinger Bridge Matching
arXiv:2506.10168v1 Announce Type: new Abstract: Understanding complex systems by inferring trajectories from sparse sample snapshots is a fundamental challenge in a wide range of domains, e.g., single-cell biology, meteorology, and economics. Despite advancements in Bridge and Flow matching frameworks, current methodologies rely on pairwise interpolation between adjacent snapshots. This hinders…
-
Distributionally-Constrained Adversaries in Online Learning
arXiv:2506.10293v1 Announce Type: new Abstract: There has been much recent interest in understanding the continuum from adversarial to stochastic settings in online learning, with various frameworks including smoothed settings proposed to bridge this gap. We consider the more general and flexible framework of distributionally constrained adversaries in which instances are…
-
Measuring Semantic Information Production in Generative Diffusion Models
arXiv:2506.10433v1 Announce Type: new Abstract: It is well known that semantic and structural features of the generated images emerge at different times during the reverse dynamics of diffusion, a phenomenon that has been connected to physical phase transitions in magnets and other materials. In this paper, we…
-
Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration
arXiv:2506.10572v1 Announce Type: new Abstract: Controlling the output probabilities of softmax-based models is a common problem in modern machine learning. Although the $\mathrm{Softmax}$ function provides soft control via its temperature parameter, it lacks the ability to enforce hard constraints, such as box constraints, on output probabilities,…
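A hard box constraint changes the softmax in a way temperature alone cannot. As an illustration, assume (our assumption, not necessarily the paper's definition) that a box-constrained softmax returns the maximizer of <z, p> + tau * H(p) over probability vectors with lower <= p <= upper. The KKT conditions then give p_i = clip(exp(z_i / tau - nu), lower_i, upper_i) for a scalar nu fixed by sum(p) = 1, which the sketch finds by bisection. `box_constrained_softmax` and its parameters are hypothetical names.

```python
import numpy as np


def box_constrained_softmax(z, lower, upper, temperature=1.0, iters=200):
    """Maximize <z, p> + temperature * H(p) over the probability simplex
    intersected with the box lower <= p <= upper (illustrative definition).

    KKT conditions give p_i = clip(exp(z_i / temperature - nu), lower_i, upper_i);
    the total mass is non-increasing in nu, so nu is found by bisection
    such that sum(p) = 1.
    """
    z = np.asarray(z, dtype=float) / temperature
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if lower.sum() > 1.0 or upper.sum() < 1.0:
        raise ValueError("infeasible box: need sum(lower) <= 1 <= sum(upper)")

    def mass(nu):
        return np.clip(np.exp(z - nu), lower, upper).sum()

    # Far below min(z) every entry is clipped to `upper` (mass >= 1);
    # far above max(z) every entry is (essentially) clipped to `lower`.
    lo, hi = z.min() - 50.0, z.max() + 50.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid  # too much mass: shift nu up
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    return np.clip(np.exp(z - nu), lower, upper)
```

With lower = 0 and upper = 1, no clipping occurs and the result is the ordinary softmax. Tightening `upper` caps a class's probability, and the excess is redistributed among the unclipped classes, which stay proportional to exp(z_i / tau).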
-
Know What You Don’t Know: Uncertainty Calibration of Process Reward Models
arXiv:2506.09338v1 Announce Type: new Abstract: Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poorly calibrated and often overestimate success probabilities. To address this, we present…
-
Attention-Bayesian Hybrid Approach to Modular Multiple Particle Tracking
arXiv:2506.09441v1 Announce Type: new Abstract: Tracking multiple particles in noisy and cluttered scenes remains challenging due to a combinatorial explosion of trajectory hypotheses, which scales super-exponentially with the number of particles and frames. The transformer architecture has shown a significant improvement in robustness against this high combinatorial…
-
Evasion Attacks Against Bayesian Predictive Models
arXiv:2506.09640v1 Announce Type: new Abstract: There is an increasing interest in analyzing the behavior of machine learning systems against adversarial attacks. However, most of the research in adversarial machine learning has focused on studying weaknesses against evasion or poisoning attacks to predictive models in classical setups, with the susceptibility…
-
LLM-Powered CPI Prediction Inference with Online Text Time Series
arXiv:2506.09516v1 Announce Type: new Abstract: Forecasting the Consumer Price Index (CPI) is an important yet challenging task in economics, where most existing approaches rely on low-frequency, survey-based data. With the recent advances of large language models (LLMs), there is growing potential to leverage high-frequency online text…
-
Scaling Laws for Uncertainty in Deep Learning
arXiv:2506.09648v1 Announce Type: new Abstract: Deep learning has recently revealed the existence of scaling laws, demonstrating that model performance follows predictable trends based on dataset and model sizes. Inspired by these findings and fascinating phenomena emerging in the over-parameterized regime, we examine a parallel direction: do similar scaling…
-
Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting
arXiv:2506.08049v1 Announce Type: new Abstract: Subseasonal-to-seasonal (S2S) forecasting, which predicts climate conditions from several weeks to months in advance, presents significant challenges due to the chaotic dynamics of atmospheric systems and complex interactions across multiple scales. Current approaches often fail to explicitly model underlying physical processes and teleconnections…
-
Constrained Pareto Set Identification with Bandit Feedback
arXiv:2506.08127v1 Announce Type: new Abstract: In this paper, we address the problem of identifying the Pareto Set under feasibility constraints in a multivariate bandit setting. Specifically, given a $K$-armed bandit with unknown means $\mu_1, \dots, \mu_K \in \mathbb{R}^d$, the goal is to identify the set of arms whose…
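The target that the bandit algorithm must learn can be made concrete. When the means are known, the Pareto set consists of the arms whose mean vectors no other arm dominates. The sketch assumes maximization in every coordinate. The abstract is cut off before it states the constraint form, so the sketch represents feasibility by a hypothetical boolean mask `feasible`. It computes the target set only and does not implement the identification algorithm.

```python
import numpy as np


def dominates(a: np.ndarray, b: np.ndarray) -> bool:
    # a Pareto-dominates b (maximization): no worse everywhere, better somewhere.
    return bool(np.all(a >= b) and np.any(a > b))


def pareto_set(means, feasible=None) -> list:
    """Indices of feasible arms not dominated by any other feasible arm.

    means: array of shape (K, d), one mean vector per arm.
    feasible: optional boolean mask of length K (hypothetical stand-in for
    the paper's feasibility constraints); all arms are feasible if omitted.
    """
    means = np.asarray(means, dtype=float)
    K = len(means)
    if feasible is None:
        feasible = np.ones(K, dtype=bool)
    candidates = [i for i in range(K) if feasible[i]]
    return [
        i for i in candidates
        if not any(dominates(means[j], means[i]) for j in candidates if j != i)
    ]
```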
-
WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection
arXiv:2506.08066v1 Announce Type: new Abstract: Change Point Detection (CPD) aims to identify moments of abrupt distribution shifts in data streams. Real-world high-dimensional CPD remains challenging due to data pattern complexity and violation of common assumptions. Resorting to standalone deep neural networks, the current state-of-the-art detectors…
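As a rough illustration of a window-based Wasserstein score, one can compare the empirical distributions of the w points before and after each candidate time. This is not the WWAggr aggregation itself, whose details are cut off above. For two equal-size one-dimensional samples, the 1-Wasserstein distance equals the mean absolute difference of the sorted samples. The sketch assumes a univariate stream, and `window_scores` and `w` are hypothetical names.

```python
import numpy as np


def wasserstein_1d(a: np.ndarray, b: np.ndarray) -> float:
    # W1 between two equal-size 1-D empirical distributions equals the
    # mean absolute difference between their sorted samples.
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


def window_scores(x, w: int) -> np.ndarray:
    """Score each split t by W1 between x[t-w:t] and x[t:t+w] (NaN at edges)."""
    x = np.asarray(x, dtype=float)
    scores = np.full(len(x), np.nan)
    for t in range(w, len(x) - w + 1):
        scores[t] = wasserstein_1d(x[t - w:t], x[t:t + w])
    return scores
```

The argmax of the scores is then a natural single-change estimate. An ensemble method would instead combine several detectors' outputs.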
-
Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces
arXiv:2506.08325v1 Announce Type: new Abstract: Depth measures are powerful tools for defining level sets in emerging, non-standard, and complex random objects such as high-dimensional multivariate data, functional data, and random graphs. Despite their favorable theoretical properties, the integration of…
-
Asymptotic Normality of Infinite Centered Random Forests -Application to Imbalanced Classification
arXiv:2506.08548v1 Announce Type: new Abstract: Many classification tasks involve imbalanced data, in which a class is largely underrepresented. Several techniques consist of creating a rebalanced dataset on which a classifier is trained. In this paper, we theoretically study such a procedure, when the classifier…
-
Direct Fisher Score Estimation for Likelihood Maximization
arXiv:2506.06542v1 Announce Type: new Abstract: We study the problem of likelihood maximization when the likelihood function is intractable but model simulations are readily available. We propose a sequential, gradient-based optimization method that directly models the Fisher score based on a local score matching technique which uses simulations from…
-
On the Fundamental Impossibility of Hallucination Control in Large Language Models
arXiv:2506.06382v1 Announce Type: new Abstract: This paper explains why it is impossible to create large language models that do not hallucinate and what trade-offs we should be looking for. It presents a formal impossibility theorem demonstrating that no inference mechanism can simultaneously…
-
Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations
arXiv:2506.06613v1 Announce Type: new Abstract: Learning distribution families over $\mathbb{R}^d$ is a fundamental problem in unsupervised learning and statistics. A central question in this setting is whether a given family of distributions possesses sufficient structure to be (at least) information-theoretically learnable and, if so, to…
-
Continuous Semi-Implicit Models
arXiv:2506.06778v1 Announce Type: new Abstract: Semi-implicit distributions have shown great promise in variational inference and generative modeling. Hierarchical semi-implicit models, which stack multiple semi-implicit layers, enhance the expressiveness of semi-implicit distributions and can be used to accelerate diffusion models given pretrained score networks. However, their sequential training often suffers from slow convergence.…
-
The Currents of Conflict: Decomposing Conflict Trends with Gaussian Processes
arXiv:2506.06828v1 Announce Type: new Abstract: I present a novel approach to estimating the temporal and spatial patterns of violent conflict. I show how we can use highly temporally and spatially disaggregated data on conflict events in tandem with Gaussian processes to estimate temporospatial conflict trends.…
-
Nonlinear Causal Discovery through a Sequential Edge Orientation Approach
arXiv:2506.05590v1 Announce Type: new Abstract: Recent advances have established the identifiability of a directed acyclic graph (DAG) under additive noise models (ANMs), spurring the development of various causal discovery methods. However, most existing methods make restrictive model assumptions, rely heavily on general independence tests, or require…
-
Online Conformal Model Selection for Nonstationary Time Series
arXiv:2506.05544v1 Announce Type: new Abstract: This paper introduces the MPS (Model Prediction Set), a novel framework for online model selection for nonstationary time series. Classical model selection methods, such as information criteria and cross-validation, rely heavily on the stationarity assumption and often fail in dynamic environments which…
-
Multilevel neural simulation-based inference
arXiv:2506.06087v1 Announce Type: new Abstract: Neural simulation-based inference (SBI) is a popular set of methods for Bayesian inference when models are only available in the form of a simulator. These methods are widely used in the sciences and engineering, where writing down a likelihood can be significantly more challenging than constructing…
-
Adaptive stable distribution and Hurst exponent by method of moments moving estimator for nonstationary time series
arXiv:2506.05354v1 Announce Type: cross Abstract: Nonstationarity of real-life time series requires model adaptation. Classical approaches such as ARMA-ARCH assume some arbitrarily chosen type of dependence. To avoid their bias, we focus on a novel, more agnostic approach: moving…
-
Zeroth-Order Optimization Finds Flat Minima
arXiv:2506.05454v1 Announce Type: cross Abstract: Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known about the implicit…
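A common zeroth-order building block is the two-point Gaussian-smoothing estimator g = [(f(x + mu u) - f(x - mu u)) / (2 mu)] u with u ~ N(0, I). It is an unbiased estimate of the gradient of the smoothed function f_mu(x) = E[f(x + mu u)]. For twice-differentiable f, f_mu(x) is approximately f(x) + (mu^2 / 2) tr(Hessian of f at x), which gives one informal reason why such methods might favor flat minima. The sketch below is a generic ZO-SGD loop under these assumptions. It does not reproduce the specific algorithm or analysis of the paper.

```python
import numpy as np


def zo_gradient(f, x, mu, rng, num_dirs=1):
    """Two-point Gaussian-smoothing gradient estimate of f at x: the average over
    directions u ~ N(0, I) of (f(x + mu u) - f(x - mu u)) / (2 mu) * u.
    Unbiased for the gradient of f_mu(x) = E[f(x + mu u)]."""
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs


def zo_sgd(f, x0, lr=0.01, mu=1e-3, steps=2000, seed=0):
    # Plain SGD driven only by function evaluations of f.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= lr * zo_gradient(f, x, mu, rng)
    return x
```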
-
On the Wasserstein Geodesic Principal Component Analysis of probability measures
arXiv:2506.04480v1 Announce Type: new Abstract: This paper focuses on Geodesic Principal Component Analysis (GPCA) on a collection of probability distributions using the Otto-Wasserstein geometry. The goal is to identify geodesic curves in the space of probability measures that best capture the modes of variation of…
-
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning
arXiv:2506.04626v1 Announce Type: new Abstract: Motivated by real-world settings where data collection and policy deployment (whether for a single agent or across multiple agents) are costly, we study the problem of on-policy single-agent reinforcement learning (RL) and federated RL (FRL) with a…
-
Distributional encoding for Gaussian process regression with qualitative inputs
arXiv:2506.04813v1 Announce Type: new Abstract: Gaussian Process (GP) regression is a popular and sample-efficient approach for many engineering applications, where observations are expensive to acquire, and is also a central ingredient of Bayesian optimization (BO), a widely used method for the optimization of black-box functions. However,…
-
Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models
arXiv:2506.04945v1 Announce Type: new Abstract: Estimating causal effects of joint interventions on multiple variables is crucial in many domains, but obtaining data from such simultaneous interventions can be challenging. Our study explores how to learn joint interventional effects using only observational data and single-variable interventions.…
-
Nonlinear Causal Discovery for Grouped Data
arXiv:2506.05120v1 Announce Type: new Abstract: Inferring cause-effect relationships from observational data has gained significant attention in recent years, but most methods are limited to scalar random variables. In many important domains, including neuroscience, psychology, social science, and industrial manufacturing, the causal units of interest are groups of variables rather…
-
SubSearch: Robust Estimation and Outlier Detection for Stochastic Block Models via Subgraph Search
arXiv:2506.03657v1 Announce Type: new Abstract: Community detection is a fundamental task in graph analysis, with methods often relying on fitting models like the Stochastic Block Model (SBM) to observed networks. While many algorithms can accurately estimate SBM parameters when the input graph…
-
Models of Heavy-Tailed Mechanistic Universality
arXiv:2506.03470v1 Announce Type: new Abstract: Recent theoretical and empirical successes in deep learning, including the celebrated neural scaling laws, are punctuated by the observation that many objects of interest tend to exhibit some form of heavy-tailed or power law behavior. In particular, the prevalence of heavy-tailed spectral densities in Jacobians,…
-
Position: There Is No Free Bayesian Uncertainty Quantification
arXiv:2506.03670v1 Announce Type: new Abstract: Due to their intuitive appeal, Bayesian methods of modeling and uncertainty quantification have become popular in modern machine and deep learning. When providing a prior distribution over the parameter space, it is straightforward to obtain a distribution over the parameters that is…
-
Latent Guided Sampling for Combinatorial Optimization
arXiv:2506.03672v1 Announce Type: new Abstract: Combinatorial Optimization problems are widespread in domains such as logistics, manufacturing, and drug discovery, yet their NP-hard nature makes them computationally challenging. Recent Neural Combinatorial Optimization methods leverage deep learning to learn solution strategies, trained via Supervised or Reinforcement Learning (RL). While promising, these…
-
Infinitesimal Higher-Order Spectral Variations in Rectangular Real Random Matrices
arXiv:2506.03764v1 Announce Type: new Abstract: We present a theoretical framework for deriving the general $n$-th order Fréchet derivatives of singular values in real rectangular matrices, by leveraging reduced resolvent operators from Kato’s analytic perturbation theory for self-adjoint operators. Deriving closed-form expressions for higher-order derivatives of singular…
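The higher-order formulas generalize a classical first-order fact. If sigma_i is a simple, nonzero singular value of A with left and right singular vectors u_i and v_i, then d/dt sigma_i(A + tE) at t = 0 equals u_i^T E v_i. The sketch below computes only this first-order term and includes a central finite difference for checking it. It does not implement the paper's resolvent-based n-th order expressions.

```python
import numpy as np


def singular_value_derivative(A: np.ndarray, E: np.ndarray, i: int) -> float:
    """First-order derivative of the i-th singular value of A in direction E.

    d/dt sigma_i(A + t E) at t = 0 equals u_i^T E v_i, assuming sigma_i is
    simple and nonzero. The sign ambiguity of (u_i, v_i) cancels in the product.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return float(U[:, i] @ E @ Vt[i, :])


def finite_difference(A: np.ndarray, E: np.ndarray, i: int, t: float = 1e-6) -> float:
    # Central difference of sigma_i along E, used only to check the formula.
    s_plus = np.linalg.svd(A + t * E, compute_uv=False)[i]
    s_minus = np.linalg.svd(A - t * E, compute_uv=False)[i]
    return float((s_plus - s_minus) / (2 * t))
```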
-
Assumption-free stability for ranking problems
arXiv:2506.02257v1 Announce Type: new Abstract: In this work, we consider ranking problems among a finite set of candidates: for instance, selecting the top-$k$ items among a larger list of candidates or obtaining the full ranking of all items in the set. These problems are often unstable, in the sense that…
-
Enabling Probabilistic Learning on Manifolds through Double Diffusion Maps
arXiv:2506.02254v1 Announce Type: new Abstract: We present a generative learning framework for probabilistic sampling based on an extension of the Probabilistic Learning on Manifolds (PLoM) approach, which is designed to generate statistically consistent realizations of a random vector in a finite-dimensional Euclidean space, informed by a…
-
MoCA: Multi-modal Cross-masked Autoencoder for Digital Health Measurements
arXiv:2506.02260v1 Announce Type: new Abstract: The growing prevalence of digital health technologies has led to the generation of complex multi-modal data, such as physical activity measurements simultaneously collected from various sensors of mobile and wearable devices. These data hold immense potential for advancing health studies, but current…
-
Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression
arXiv:2506.02336v1 Announce Type: new Abstract: We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monotonic reduction of the optimization objective, achieving exponential convergence in $\widetilde{\mathcal{O}}(\kappa)$ steps with $\kappa$ being the condition…
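The setting can be reproduced in a few lines: gradient descent with a constant stepsize on the l2-regularized logistic loss, here with labels in {-1, +1}. The sketch records the objective at each step so that monotonicity can be checked. The dataset, lambda, and stepsize in the test are illustrative choices, and the code does not reproduce the paper's large-stepsize analysis.

```python
import numpy as np


def objective(w, X, y, lam):
    # Mean logistic loss log(1 + exp(-y x^T w)) plus (lam / 2) * ||w||^2.
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * w @ w


def gradient(w, X, y, lam):
    margins = y * (X @ w)
    # d/dm log(1 + e^{-m}) = -1 / (1 + e^{m}), computed stably.
    coef = -np.exp(-np.logaddexp(0.0, margins))
    return X.T @ (coef * y) / len(y) + lam * w


def gradient_descent(X, y, lam, stepsize, steps):
    # Constant-stepsize GD from w = 0; history[k] is the objective before step k.
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(steps):
        history.append(objective(w, X, y, lam))
        w = w - stepsize * gradient(w, X, y, lam)
    return w, history
```

The stepsize is the knob the abstract is about. For stepsizes below 2/L, where L is the smoothness constant of the objective, the classical descent lemma guarantees a monotone decrease.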
-
Tensor State Space-based Dynamic Multilayer Network Modeling
arXiv:2506.02413v1 Announce Type: new Abstract: Understanding the complex interactions within dynamic multilayer networks is critical for advancements in various scientific domains. Existing models often fail to capture such networks’ temporal and cross-layer dynamics. This paper introduces a novel Tensor State Space Model for Dynamic Multilayer Networks (TSSDMN), utilizing…
-
Minimax Rates for the Estimation of Eigenpairs of Weighted Laplace-Beltrami Operators on Manifolds
arXiv:2506.00171v1 Announce Type: new Abstract: We study the problem of estimating eigenpairs of elliptic differential operators from samples of a distribution $\rho$ supported on a manifold $M$. The operators discussed in the paper are relevant in unsupervised learning and in particular are…
-
Overfitting has a limitation: a model-independent generalization error bound based on Rényi entropy
arXiv:2506.00182v1 Announce Type: new Abstract: Will further scaling up of machine learning models continue to bring success? A significant challenge in answering this question lies in understanding generalization error, which is the impact of overfitting. Understanding generalization error behavior of increasingly large-scale…
-
Riemannian Principal Component Analysis
arXiv:2506.00226v1 Announce Type: new Abstract: This paper proposes an innovative extension of Principal Component Analysis (PCA) that transcends the traditional assumption of data lying in Euclidean space, enabling its application to data on Riemannian manifolds. The primary challenge addressed is the lack of vector space operations on such manifolds. Fletcher et…
-
Bayesian Data Sketching for Varying Coefficient Regression Models
arXiv:2506.00270v1 Announce Type: new Abstract: Varying coefficient models are popular for estimating nonlinear regression functions in functional data models. Their Bayesian variants have received limited attention in large data applications, primarily due to prohibitively slow posterior computations using Markov chain Monte Carlo (MCMC) algorithms. We introduce Bayesian…
-
Gibbs randomness-compression proposition: An efficient deep learning
arXiv:2505.23869v1 Announce Type: new Abstract: A proposition that connects randomness and compression is put forward via Gibbs entropy over a set of measurement vectors associated with a compression process. The proposition states that a lossy compression process is equivalent to directed randomness that preserves information content. The proposition originated…
-
Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning
arXiv:2505.23783v1 Announce Type: new Abstract: In-Context Learning (ICL) allows Large Language Models (LLMs) to adapt to new tasks with just a few examples, but their predictions often suffer from systematic biases, leading to unstable performance in classification. While calibration techniques have been proposed to…
-
Conformal Object Detection by Sequential Risk Control
arXiv:2505.24038v1 Announce Type: new Abstract: Recent advances in object detectors have led to their adoption for industrial uses. However, their deployment in critical applications is hindered by the inherent lack of reliability of neural networks and the complex structure of object detection models. To address these challenges, we…
-
Performative Risk Control: Calibrating Models for Reliable Deployment under Performativity
arXiv:2505.24097v1 Announce Type: new Abstract: Calibrating blackbox machine learning models to achieve risk control is crucial to ensure reliable decision-making. A rich line of literature has been studying how to calibrate a model so that its predictions satisfy explicit finite-sample statistical guarantees under a fixed,…
-
A Mathematical Perspective On Contrastive Learning
arXiv:2505.24134v1 Announce Type: new Abstract: Multimodal contrastive learning is a methodology for linking different data modalities; the canonical example is linking image and text data. The methodology is typically framed as the identification of a set of encoders, one for each modality, that align representations within a common latent…
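One common concrete instance of this framing, used for example in CLIP-style models, is a symmetric InfoNCE loss on paired embeddings. Matched image/text pairs lie on the diagonal of a cosine-similarity matrix, and each row and each column is treated as a softmax classification problem. The sketch takes already-computed embeddings and omits the encoders. It is not the paper's mathematical formulation, and the default temperature is an arbitrary choice.

```python
import numpy as np


def log_softmax(z: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log-softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def symmetric_infonce(img: np.ndarray, txt: np.ndarray, temperature: float = 0.1) -> float:
    """CLIP-style symmetric contrastive loss for paired embeddings.

    img, txt: arrays of shape (n, k); row i of each forms a matched pair.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (n, n) scaled cosine similarities
    diag = np.arange(len(img))
    loss_img_to_txt = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_txt_to_img = -log_softmax(logits, axis=0)[diag, diag].mean()
    return float(0.5 * (loss_img_to_txt + loss_txt_to_img))
```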
-
Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games
arXiv:2505.22781v1 Announce Type: new Abstract: We introduce Mean-Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean-Field Games (MFG) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting,…
-
Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features
arXiv:2505.22997v1 Announce Type: new Abstract: Traditional classifiers often assume feature independence or rely on overly simplistic relationships, leading to poor performance in settings where real-world dependencies matter. We introduce the Deep Copula Classifier (DCC), a generative model that separates the learning…
-
Highly Efficient and Effective LLMs with Multi-Boolean Architectures
arXiv:2505.22811v1 Announce Type: new Abstract: Weight binarization has emerged as a promising strategy to drastically reduce the complexity of large language models (LLMs). It is mainly classified into two approaches: post-training binarization and finetuning with training-aware binarization methods. The first approach, while having low complexity, leads to…
-
JAPAN: Joint Adaptive Prediction Areas with Normalising-Flows
arXiv:2505.23196v1 Announce Type: new Abstract: Conformal prediction provides a model-agnostic framework for uncertainty quantification with finite-sample validity guarantees, making it an attractive tool for constructing reliable prediction sets. However, existing approaches commonly rely on residual-based conformity scores, which impose geometric constraints and struggle when the underlying distribution is…
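For reference, the residual-based baseline mentioned here is split conformal prediction with the score |y - f(x)|. One takes the ceil((n + 1)(1 - alpha))-th smallest calibration residual q and reports f(x) +/- q. The resulting set is always a symmetric interval of fixed width, which is the kind of geometric constraint the abstract points to. The sketch assumes a scalar response and a pre-fitted predictor. It is not the JAPAN method.

```python
import numpy as np


def split_conformal_interval(residuals_cal, predictions, alpha=0.1):
    """Residual-based split conformal intervals.

    residuals_cal: |y - f(x)| on a held-out calibration set of size n.
    predictions: f(x) at the test points.
    Returns (lower, upper) = f(x) -/+ q, where q is the
    ceil((n + 1)(1 - alpha))-th smallest calibration residual.
    """
    r = np.sort(np.asarray(residuals_cal, dtype=float))
    n = len(r)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else r[k - 1]  # too few calibration points -> trivial set
    predictions = np.asarray(predictions, dtype=float)
    return predictions - q, predictions + q
```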
-
Stable Thompson Sampling: Valid Inference via Variance Inflation
arXiv:2505.23260v1 Announce Type: new Abstract: We consider the problem of statistical inference when the data is collected via a Thompson Sampling-type algorithm. While Thompson Sampling (TS) is known to be both asymptotically optimal and empirically effective, its adaptive sampling scheme poses challenges for constructing confidence intervals for…
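For context, below is a minimal Gaussian Thompson Sampling loop with a flat prior and a known noise level. To mirror the idea in the title, it multiplies each arm's posterior variance by a constant `inflation` factor before sampling. That constant is a hypothetical stand-in, since the abstract is truncated before it describes the actual inflation scheme. The sketch also contains no inference procedure.

```python
import numpy as np


def gaussian_thompson_sampling(means, horizon, inflation=1.0, noise_sd=1.0, seed=0):
    """Gaussian Thompson Sampling with a flat prior and known noise sd.

    Arm a's posterior is N(sample mean, noise_sd**2 / n_a); its variance is
    multiplied by `inflation` before sampling (inflation=1 is standard TS).
    Returns pull counts and sample means per arm.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(horizon):
        if t < K:
            arm = t  # pull each arm once so every posterior is defined
        else:
            post_mean = sums / counts
            post_sd = np.sqrt(inflation) * noise_sd / np.sqrt(counts)
            arm = int(np.argmax(rng.normal(post_mean, post_sd)))
        reward = rng.normal(means[arm], noise_sd)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums / counts
```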
-
A Kernelised Stein Discrepancy for Assessing the Fit of Inhomogeneous Random Graph Models
arXiv:2505.21580v1 Announce Type: new Abstract: Complex data are often represented as a graph, which in turn can often be viewed as a realisation of a random graph, such as an inhomogeneous random graph model (IRG). For general fast goodness-of-fit tests in…
-
STACI: Spatio-Temporal Aleatoric Conformal Inference
arXiv:2505.21658v1 Announce Type: new Abstract: Fitting Gaussian Processes (GPs) provides interpretable aleatoric uncertainty quantification for estimation of spatio-temporal fields. Spatio-temporal deep learning models, while scalable, typically assume a simplistic independent covariance matrix for the response, failing to capture the underlying correlation structure. However, spatio-temporal GPs suffer from issues of scalability…
-
Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference
arXiv:2505.21721v1 Announce Type: new Abstract: We prove that, given a mean-field location-scale variational family, black-box variational inference (BBVI) with the reparametrization gradient converges at an almost dimension-independent rate. Specifically, for strongly log-concave and log-smooth targets, the number of iterations for BBVI with a sub-Gaussian family to achieve…
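The algorithm under analysis can be sketched for the simplest location-scale family, a mean-field Gaussian q = N(m, diag(s^2)) with s = exp(log_s). Writing z = m + s * eps with eps ~ N(0, I), the ELBO is E[log p(z)] + sum(log_s) + const. The reparametrization gradients are therefore E[grad log p(z)] for m and E[grad log p(z) * eps] * s + 1 for log_s. The constant learning rate and sample count below are illustrative assumptions and do not follow the step-size schedule analyzed in the paper.

```python
import numpy as np


def bbvi_mean_field(grad_log_p, dim, steps=3000, lr=0.01, num_samples=8, seed=0):
    """Mean-field Gaussian BBVI with reparametrization gradients.

    q = N(m, diag(s**2)), s = exp(log_s); z = m + s * eps, eps ~ N(0, I).
    ELBO = E[log p(z)] + sum(log_s) + const, hence
      dELBO/dm     = E[grad_log_p(z)]
      dELBO/dlog_s = E[grad_log_p(z) * eps] * s + 1
    Expectations are replaced by Monte Carlo averages; plain gradient ascent.
    """
    rng = np.random.default_rng(seed)
    m = np.zeros(dim)
    log_s = np.zeros(dim)
    for _ in range(steps):
        eps = rng.standard_normal((num_samples, dim))
        s = np.exp(log_s)
        z = m + s * eps
        g = np.array([grad_log_p(zi) for zi in z])  # shape (num_samples, dim)
        m = m + lr * g.mean(axis=0)
        log_s = log_s + lr * ((g * eps).mean(axis=0) * s + 1.0)
    return m, np.exp(log_s)
```

For a Gaussian target, the fixed point is m = target mean and s = target standard deviation, which makes the sketch easy to check.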
-
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks
arXiv:2505.21791v1 Announce Type: new Abstract: Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of…
-
A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging
arXiv:2505.21796v1 Announce Type: new Abstract: Polyak-Ruppert averaging is a widely used technique to achieve the optimal asymptotic variance of stochastic approximation (SA) algorithms, yet its high-probability performance guarantees remain underexplored in general settings. In this paper, we present a general framework for establishing…
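As a minimal illustration of the technique, and not of the paper's framework, the sketch runs a scalar Robbins-Monro iteration that tracks E[X] with step size a (t + 1)^(-rho), rho in (1/2, 1). It returns both the last iterate and the Polyak-Ruppert average of all iterates. The sampling distribution and constants used in the test are arbitrary.

```python
import numpy as np


def sa_with_polyak_averaging(sample, theta0, steps, a=1.0, rho=0.6, seed=0):
    """Robbins-Monro iteration for the root of h(theta) = E[X] - theta:
        theta_{t+1} = theta_t + a * (t + 1)**(-rho) * (X_t - theta_t),
    returned together with the Polyak-Ruppert average of theta_1..theta_T.
    `sample(rng)` draws one observation X_t.
    """
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    running_sum = 0.0
    for t in range(steps):
        step = a * (t + 1) ** (-rho)
        theta += step * (sample(rng) - theta)
        running_sum += theta
    return theta, running_sum / steps
```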
-
Differentially private ratio statistics
Differentially private ratio statistics arXiv:2505.20351v1 Announce Type: new Abstract: Ratio statistics–such as relative risk and odds ratios–play a central role in hypothesis testing, model evaluation, and decision-making across many areas of machine learning, including causal inference and fairness analysis. However, despite privacy concerns surrounding many datasets and despite increasing adoption of differential privacy, differentially private…
-
Learning with Expected Signatures: Theory and Applications
Learning with Expected Signatures: Theory and Applications arXiv:2505.20465v1 Announce Type: new Abstract: The expected signature maps a collection of data streams to a lower dimensional representation, with a remarkable property: the resulting feature tensor can fully characterize the data generating distribution. This “model-free” embedding has been successfully leveraged to build multiple domain-agnostic machine learning (ML)…
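As a rough illustration of "mapping a collection of data streams to a feature tensor", the sketch below computes the signature of each piecewise-linear stream, truncated at level 2. It uses Chen's identity, where a linear segment with increment delta has level-2 term delta (x) delta / 2, and averages the results across streams to get an empirical expected signature. The truncation depth and the function names are assumptions made for illustration. The paper's estimators may differ.

```python
from typing import List, Sequence, Tuple

def signature_level2(path: Sequence[Sequence[float]]) -> Tuple[List[float], List[List[float]]]:
    """Levels 1 and 2 of the signature of a piecewise-linear path (truncation is an assumption)."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for a, b in zip(path[:-1], path[1:]):
        delta = [b[k] - a[k] for k in range(d)]
        # Chen's identity: S2(x * seg) = S2(x) + S1(x) (x) delta + delta (x) delta / 2
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):  # update level 1 only after level 2 has used the old value
            s1[i] += delta[i]
    return s1, s2

def expected_signature_level2(paths):
    """Empirical expected signature: average the truncated signatures over the streams."""
    sigs = [signature_level2(p) for p in paths]
    n, d = len(sigs), len(sigs[0][0])
    e1 = [sum(s[0][i] for s in sigs) / n for i in range(d)]
    e2 = [[sum(s[1][i][j] for s in sigs) / n for j in range(d)] for i in range(d)]
    return e1, e2

e1, e2 = expected_signature_level2([[[0, 0], [1, 2]], [[0, 0], [1, 0], [1, 1]]])
```

Here `e1` is `[1.0, 1.5]` and `e2` is `[[0.5, 1.0], [0.5, 1.25]]`. The asymmetry of `e2` comes from the Lévy area of the second, L-shaped stream.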
-
Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models
Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models arXiv:2505.20536v1 Announce Type: new Abstract: This paper studies the task of estimating heterogeneous treatment effects in causal panel data models, in the presence of covariate effects. We propose a novel Covariate-Adjusted Deep Causal Learning (CoDEAL) for panel data models, that employs flexible model structures and powerful…
-
Balancing Performance and Costs in Best Arm Identification
Balancing Performance and Costs in Best Arm Identification arXiv:2505.20583v1 Announce Type: new Abstract: We consider the problem of identifying the best arm in a multi-armed bandit model. Despite a wealth of literature in the traditional fixed budget and fixed confidence regimes of the best arm identification problem, it still remains a mystery to most practitioners…
-
Preconditioned Langevin Dynamics with Score-Based Generative Models for Infinite-Dimensional Linear Bayesian Inverse Problems
Preconditioned Langevin Dynamics with Score-Based Generative Models for Infinite-Dimensional Linear Bayesian Inverse Problems arXiv:2505.18276v1 Announce Type: new Abstract: Designing algorithms for solving high-dimensional Bayesian inverse problems directly in infinite-dimensional function spaces – where such problems are naturally formulated – is crucial to ensure stability and convergence as the discretization of the underlying problem is refined.…
-
Operator Learning for Schrödinger Equation: Unitarity, Error Bounds, and Time Generalization
Operator Learning for Schrödinger Equation: Unitarity, Error Bounds, and Time Generalization arXiv:2505.18288v1 Announce Type: new Abstract: We consider the problem of learning the evolution operator for the time-dependent Schrödinger equation, where the Hamiltonian may vary with time. Existing neural network-based surrogates often ignore fundamental properties of the Schrödinger equation, such as linearity and unitarity, and…
-
On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective
On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective arXiv:2505.18346v1 Announce Type: new Abstract: Weak-to-strong generalization, where a student model trained on imperfect labels generated by a weaker teacher nonetheless surpasses that teacher, has been widely observed but the mechanisms that enable it have remained poorly understood. In this paper, through a theoretical analysis of…
-
Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling
Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling arXiv:2505.18327v1 Announce Type: new Abstract: Constrained stochastic nonlinear optimization problems have attracted significant attention for their ability to model complex real-world scenarios in physics, economics, and biology. As datasets continue to grow, online inference methods have become crucial for enabling real-time decision-making without the need…
-
Identifiability of latent causal graphical models without pure children
Identifiability of latent causal graphical models without pure children arXiv:2505.18410v1 Announce Type: new Abstract: This paper considers a challenging problem of identifying a causal graphical model under the presence of latent variables. While various identifiability conditions have been proposed in the literature, they often require multiple pure children per latent variable or restrictions on the…
-
Liouville PDE-based sliced-Wasserstein flow for fair regression
Liouville PDE-based sliced-Wasserstein flow for fair regression arXiv:2505.17204v1 Announce Type: new Abstract: The sliced Wasserstein flow (SWF), a nonparametric and implicit generative gradient flow, is applied to fair regression. We have improved the SWF in a few aspects. First, the stochastic diffusive term from the Fokker-Planck equation-based Monte Carlo is transformed to Liouville partial differential…
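The sliced Wasserstein flow is built on the sliced Wasserstein distance. This distance projects both distributions onto random directions and averages the closed-form 1-D Wasserstein distances, which reduce to comparing sorted projections. Below is a Monte Carlo sketch of the sliced 2-Wasserstein distance between two equal-size point clouds. It covers only that distance. It does not implement the flow, the Fokker-Planck or Liouville PDE scheme, or the fairness objective, and the function name and projection count are assumptions.

```python
import math
import random

def sliced_wasserstein2(xs, ys, n_projections=200, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between two equal-size empirical samples."""
    assert len(xs) == len(ys), "this sketch assumes equal sample sizes"
    rng = random.Random(seed)
    d = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]  # uniformly random direction on the sphere
        px = sorted(sum(t * v for t, v in zip(theta, x)) for x in xs)
        py = sorted(sum(t * v for t, v in zip(theta, y)) for y in ys)
        # 1-D W2^2 between equal-size empirical measures: match sorted projections
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_projections)

xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
ys = [[x + 3.0, y + 4.0] for x, y in xs]
```

Shifting a cloud by a vector c gives a sliced distance of about |c| / sqrt(d). For `xs` and `ys` above that is 5 / sqrt(2), roughly 3.54, up to Monte Carlo error.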
-
Learning Probabilities of Causation from Finite Population Data
Learning Probabilities of Causation from Finite Population Data arXiv:2505.17133v1 Announce Type: new Abstract: Probabilities of causation play a crucial role in modern decision-making. This paper addresses the challenge of predicting probabilities of causation for subpopulations with insufficient data using machine learning models. Tian and Pearl first defined and derived tight bounds for three fundamental probabilities…
-
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine arXiv:2505.17283v1 Announce Type: new Abstract: Randomized clinical trials often require large patient cohorts before drawing definitive conclusions, yet abundant observational data from parallel studies remains underutilized due to confounding and hidden biases. To bridge this gap, we propose Deconfounded Warm-Start Thompson Sampling (DWTS), a practical approach…
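As background on the base algorithm, here is a sketch of plain Beta-Bernoulli Thompson sampling. The optional prior pseudo-counts show one simple, assumed way that prior information could warm-start the posterior. This is not DWTS: the deconfounding step and the paper's actual warm-start construction are not represented, and the function name and parameters are hypothetical.

```python
import random

def thompson_bernoulli(true_probs, n_rounds, prior_alpha=None, prior_beta=None, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit; returns pull counts per arm.
    Optional prior pseudo-counts act as a (hypothetical) warm start."""
    rng = random.Random(seed)
    k = len(true_probs)
    a = list(prior_alpha) if prior_alpha is not None else [1.0] * k  # pseudo-successes
    b = list(prior_beta) if prior_beta is not None else [1.0] * k    # pseudo-failures
    pulls = [0] * k
    for _ in range(n_rounds):
        # draw one plausible success rate per arm from its Beta posterior
        samples = [rng.betavariate(a[i], b[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        a[arm] += reward       # conjugate posterior update
        b[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.8], 2000)
pulls_warm = thompson_bernoulli([0.2, 0.8], 200, prior_alpha=[2.0, 8.0], prior_beta=[8.0, 2.0])
```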
-
Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation
Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation arXiv:2505.17288v1 Announce Type: new Abstract: Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, referred to as supervised fine-tuning, involves training a new…
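Best-of-N, the counterpart to supervised fine-tuning in this comparison, leaves the generator unchanged and selects at inference time: draw N candidates and keep the one a reward function scores highest. The sketch uses a hypothetical bit string "model" that emits uniform random bits, plus a hypothetical reward that counts matches against a target string. Both are illustrative assumptions and not the paper's setup.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")

def best_of_n(generate: Callable[[random.Random], T],
              reward: Callable[[T], float], n: int, rng: random.Random) -> T:
    """Draw n candidates from the generator and return the highest-reward one."""
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=reward)

# Hypothetical toy: a generator of uniform random bits scored against a fixed target.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]

def random_bits(rng: random.Random) -> List[int]:
    return [rng.randint(0, 1) for _ in TARGET]

def match_reward(bits: List[int]) -> float:
    return float(sum(b == t for b, t in zip(bits, TARGET)))

best = best_of_n(random_bits, match_reward, 32, random.Random(1))
```

Because the first candidate is exactly what a single draw would have produced, Best-of-N can never score below one sample from the same seed.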
-
Optimal Transport with Heterogeneously Missing Data
Optimal Transport with Heterogeneously Missing Data arXiv:2505.17291v1 Announce Type: new Abstract: We consider the problem of solving the optimal transport problem between two empirical distributions with missing values. Our main assumption is that the data is missing completely at random (MCAR), but we allow for heterogeneous missingness probabilities across features and across the two distributions.…
-
PO-Flow: Flow-based Generative Models for Sampling Potential Outcomes and Counterfactuals
PO-Flow: Flow-based Generative Models for Sampling Potential Outcomes and Counterfactuals arXiv:2505.16051v1 Announce Type: new Abstract: We propose PO-Flow, a novel continuous normalizing flow (CNF) framework for causal inference that jointly models potential outcomes and counterfactuals. Trained via flow matching, PO-Flow provides a unified framework for individualized potential outcome prediction, counterfactual predictions, and uncertainty-aware density learning.…
-
CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision
CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision arXiv:2505.15927v1 Announce Type: new Abstract: Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which provides intermediate reasoning steps together with the final output, has emerged as a powerful empirical technique, underpinning much of the…
-
Oh SnapMMD! Forecasting Stochastic Dynamics Beyond the Schrödinger Bridge’s End
Oh SnapMMD! Forecasting Stochastic Dynamics Beyond the Schrödinger Bridge’s End arXiv:2505.16082v1 Announce Type: new Abstract: Scientists often want to make predictions beyond the observed time horizon of “snapshot” data following latent stochastic dynamics. For example, in time course single-cell mRNA profiling, scientists have access to cellular transcriptional state measurements (snapshots) from different biological replicates at…
-
Dimension-adapted Momentum Outscales SGD
Dimension-adapted Momentum Outscales SGD arXiv:2505.16098v1 Announce Type: new Abstract: We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target…
-
Exponential Convergence of CAVI for Bayesian PCA
Exponential Convergence of CAVI for Bayesian PCA arXiv:2505.16145v1 Announce Type: new Abstract: Probabilistic principal component analysis (PCA) and its Bayesian variant (BPCA) are widely used for dimension reduction in machine learning and statistics. The main advantage of probabilistic PCA over the traditional formulation is allowing uncertainty quantification. The parameters of BPCA are typically learned using…
-
Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective
Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective arXiv:2505.14808v1 Announce Type: new Abstract: This work aims to demystify the out-of-distribution (OOD) capabilities of in-context learning (ICL) by studying linear regression tasks parameterized with low-rank covariance matrices. With such a parameterization, we can model distribution shifts as a varying angle between the subspace of the…
-
LOBSTUR: A Local Bootstrap Framework for Tuning Unsupervised Representations in Graph Neural Networks
LOBSTUR: A Local Bootstrap Framework for Tuning Unsupervised Representations in Graph Neural Networks arXiv:2505.14867v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) are increasingly used in conjunction with unsupervised learning techniques to learn powerful node representations, but their deployment is hindered by their high sensitivity to hyperparameter tuning and the absence of established methodologies for…
-
Convergence of Adam in Deep ReLU Networks via Directional Complexity and Kakeya Bounds
Convergence of Adam in Deep ReLU Networks via Directional Complexity and Kakeya Bounds arXiv:2505.15013v1 Announce Type: new Abstract: First-order adaptive optimization methods like Adam are the default choices for training modern deep neural networks. Despite their empirical success, the theoretical understanding of these methods in non-smooth settings, particularly in Deep ReLU networks, remains limited. ReLU…
-
A Linear Approach to Data Poisoning
A Linear Approach to Data Poisoning arXiv:2505.15175v1 Announce Type: new Abstract: We investigate the theoretical foundations of data poisoning attacks in machine learning models. Our analysis reveals that the Hessian with respect to the input serves as a diagnostic tool for detecting poisoning, exhibiting spectral signatures that characterize compromised datasets. We use random matrix theory…
-
Infinite hierarchical contrastive clustering for personal digital envirotyping
Infinite hierarchical contrastive clustering for personal digital envirotyping arXiv:2505.15022v1 Announce Type: new Abstract: Daily environments have profound influence on our health and behavior. Recent work has shown that digital envirotyping, where computer vision is applied to images of daily environments taken during ecological momentary assessment (EMA), can be used to identify meaningful relationships between environmental…
-
Continuous Domain Generalization
Continuous Domain Generalization arXiv:2505.13519v1 Announce Type: new Abstract: Real-world data distributions often shift continuously across multiple latent factors such as time, geography, and socioeconomic context. However, existing domain generalization approaches typically treat domains as discrete or evolving along a single axis (e.g., time), which fails to capture the complex, multi-dimensional nature of real-world variation. This…
-
Data Balancing Strategies: A Survey of Resampling and Augmentation Methods
Data Balancing Strategies: A Survey of Resampling and Augmentation Methods arXiv:2505.13518v1 Announce Type: new Abstract: Imbalanced data poses a significant obstacle in machine learning, as an unequal distribution of class labels often results in skewed predictions and diminished model accuracy. To mitigate this problem, various resampling strategies have been developed, encompassing both oversampling and undersampling…
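Random oversampling is the simplest of the oversampling strategies the survey covers. It duplicates randomly chosen minority-class examples until every class matches the majority count. A minimal sketch in plain Python follows. The function name is hypothetical, and real pipelines typically rely on a library implementation.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate random examples of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]  # positions of this class
        for _ in range(target - count):
            i = rng.choice(idx)  # sample with replacement from the class
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

X2, y2 = random_oversample([[0], [1], [2], [3], [4]], [0, 0, 0, 0, 1])
```

Duplication adds no new information and can encourage overfitting to the repeated points, which is why the survey also covers synthetic augmentation methods.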
-
Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback
Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback arXiv:2505.13562v1 Announce Type: new Abstract: Learning in games is a fundamental problem in machine learning and artificial intelligence, with numerous applications (Silver et al., 2016; Schrittwieser et al., 2020). This work investigates two-player zero-sum matrix games with an unknown payoff matrix and bandit feedback, where each player observes their actions and the…
-
Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles
Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles arXiv:2505.13585v1 Announce Type: new Abstract: This work introduces a new method called scalable Bayesian Monte Carlo (SBMC). The model interpolates between a point estimator and the posterior, and the algorithm is a parallel implementation of a consistent (asymptotically unbiased) Bayesian deep learning algorithm: sequential Monte…
-
Backward Conformal Prediction
Backward Conformal Prediction arXiv:2505.13732v1 Announce Type: new Abstract: We introduce Backward Conformal Prediction, a method that guarantees conformal coverage while providing flexible control over the size of prediction sets. Unlike standard conformal prediction, which fixes the coverage level and allows the conformal set size to vary, our approach defines a rule that constrains how prediction…
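For contrast with the backward approach, here is a sketch of the standard split conformal baseline the abstract describes. It fixes the miscoverage level alpha, takes the ceil((n+1)(1-alpha))-th smallest calibration residual as the threshold, and lets the interval width follow from that. It assumes absolute-residual scores for regression. It does not implement Backward Conformal Prediction, and the function names are hypothetical.

```python
import math

def split_conformal_quantile(residuals, alpha):
    """Threshold for standard split conformal prediction at miscoverage level alpha."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points for this alpha: trivial set
    return sorted(residuals)[k - 1]

def conformal_interval(prediction, residuals, alpha):
    """Symmetric interval prediction +/- q built from absolute calibration residuals."""
    q = split_conformal_quantile(residuals, alpha)
    return prediction - q, prediction + q

residuals = [float(i) for i in range(1, 31)]  # hypothetical |y - y_hat| on 30 calibration points
```

With these 30 residuals and alpha = 0.1, the threshold is the 28th smallest residual, so a point prediction of 5.0 yields the interval (-23.0, 33.0). The coverage level is fixed and the set size is whatever the data imply, which is the trade-off the backward method reverses.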
-
The Stochastic Occupation Kernel (SOCK) Method for Learning Stochastic Differential Equations
The Stochastic Occupation Kernel (SOCK) Method for Learning Stochastic Differential Equations arXiv:2505.11622v1 Announce Type: new Abstract: We present a novel kernel-based method for learning multivariate stochastic differential equations (SDEs). The method follows a two-step procedure: we first estimate the drift term function, then the (matrix-valued) diffusion function given the drift. Occupation kernels are integral functionals…
-
Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles
Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles arXiv:2505.11671v1 Announce Type: new Abstract: Sequential Monte Carlo (SMC) methods offer a principled approach to Bayesian uncertainty quantification but are traditionally limited by the need for full-batch gradient evaluations. We introduce a scalable variant by incorporating Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)…
-
Missing Data Imputation by Reducing Mutual Information with Rectified Flows
Missing Data Imputation by Reducing Mutual Information with Rectified Flows arXiv:2505.11749v1 Announce Type: new Abstract: This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and their corresponding missing mask. Inspired by GAN-based approaches, which train generators to decrease the predictability of missingness patterns, our method…
-
Thompson Sampling-like Algorithms for Stochastic Rising Bandits
Thompson Sampling-like Algorithms for Stochastic Rising Bandits arXiv:2505.12092v1 Announce Type: new Abstract: Stochastic rising rested bandit (SRRB) is a setting where the arms’ expected rewards increase as they are pulled. It models scenarios in which the performances of the different options grow as an effect of an underlying learning process (e.g., online model selection). Even…
-
Multi-Attribute Graph Estimation with Sparse-Group Non-Convex Penalties
Multi-Attribute Graph Estimation with Sparse-Group Non-Convex Penalties arXiv:2505.11984v1 Announce Type: new Abstract: We consider the problem of inferring the conditional independence graph (CIG) of high-dimensional Gaussian vectors from multi-attribute data. Most existing methods for graph estimation are based on single-attribute models where one associates a scalar random variable with each node. In multi-attribute graphical models,…