Category: stat.ML

A Framework for Non-Linear Attention via Modern Hopfield Networks

arXiv:2506.11043v1 Announce Type: new Abstract: In this work we propose an energy functional along the lines of Modern Hopfield Networks (MHN), the stationary points of which correspond to the attention due to Vaswani et al. [12], thus unifying both frameworks. The minima of this landscape form…
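As background for the correspondence this abstract builds on, here is a minimal NumPy sketch of the classical modern-Hopfield/attention link, not the energy functional proposed in the paper. One retrieval step ξ ← X softmax(β Xᵀξ) with β = 1/√d returns exactly the single-query scaled dot-product attention output when the stored patterns X serve as both keys and values. The function names are illustrative, not from the paper.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def hopfield_update(X: np.ndarray, xi: np.ndarray, beta: float) -> np.ndarray:
    # One retrieval step of a modern Hopfield network.
    # X: (d, N) stored patterns as columns; xi: (d,) current state / query.
    return X @ softmax(beta * (X.T @ xi))


def attention(q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    # Scaled dot-product attention for a single query q of shape (d,),
    # keys K of shape (N, d) and values V of shape (N, d_v).
    d = q.shape[0]
    weights = softmax(K @ q / np.sqrt(d))
    return weights @ V


rng = np.random.default_rng(0)
d, N = 8, 5
X = rng.normal(size=(d, N))  # stored patterns
xi = rng.normal(size=d)      # query

hop = hopfield_update(X, xi, beta=1.0 / np.sqrt(d))
att = attention(xi, K=X.T, V=X.T)
print(np.allclose(hop, att))  # True
```

Both functions compute the same weighted sum of stored patterns; the paper's contribution concerns a different energy whose stationary points give a non-linear generalization, which this sketch does not attempt.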

June 16, 2025
Fast Bayesian Optimization of Function Networks with Partial Evaluations

arXiv:2506.11456v1 Announce Type: new Abstract: Bayesian optimization of function networks (BOFN) is a framework for optimizing expensive-to-evaluate objective functions structured as networks, where some nodes’ outputs serve as inputs for others. Many real-world applications, such as manufacturing and drug discovery, involve function networks with additional properties…

June 16, 2025
Collaborative Prediction: To Join or To Disjoin Datasets

arXiv:2506.11271v1 Announce Type: new Abstract: With the recent rise of generative Artificial Intelligence (AI), the need to select high-quality datasets to improve machine learning models has garnered increasing attention. However, some parts of this topic remain underexplored, even for simple prediction models. In this work, we study…

June 16, 2025
On the performance of multi-fidelity and reduced-dimensional neural emulators for inference of physiologic boundary conditions

arXiv:2506.11683v1 Announce Type: new Abstract: Solving inverse problems in cardiovascular modeling is particularly challenging due to the high computational cost of running high-fidelity simulations. In this work, we focus on Bayesian parameter estimation and explore different methods to reduce the…

June 16, 2025
Using Deep Operators to Create Spatio-temporal Surrogates for Dynamical Systems under Uncertainty

arXiv:2506.11761v1 Announce Type: new Abstract: Spatio-temporal data, which consists of responses or measurements gathered at different times and positions, is ubiquitous across diverse applications of civil infrastructure. While SciML methods have made significant progress in tackling the issue of response prediction for individual…

June 16, 2025
Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes

arXiv:2506.10101v1 Announce Type: new Abstract: In this paper, we establish sample complexity bounds for learning high-dimensional simplices in $\mathbb{R}^K$ from noisy data. Specifically, we consider $n$ i.i.d. samples uniformly drawn from an unknown simplex in $\mathbb{R}^K$, each corrupted by additive Gaussian noise of unknown variance. We…

June 13, 2025
Momentum Multi-Marginal Schrödinger Bridge Matching

arXiv:2506.10168v1 Announce Type: new Abstract: Understanding complex systems by inferring trajectories from sparse sample snapshots is a fundamental challenge in a wide range of domains, e.g., single-cell biology, meteorology, and economics. Despite advancements in Bridge and Flow matching frameworks, current methodologies rely on pairwise interpolation between adjacent snapshots. This hinders…

June 13, 2025
Measuring Semantic Information Production in Generative Diffusion Models

arXiv:2506.10433v1 Announce Type: new Abstract: It is well known that semantic and structural features of the generated images emerge at different times during the reverse dynamics of diffusion, a phenomenon that has been connected to physical phase transitions in magnets and other materials. In this paper, we…

June 13, 2025
Distributionally-Constrained Adversaries in Online Learning

arXiv:2506.10293v1 Announce Type: new Abstract: There has been much recent interest in understanding the continuum from adversarial to stochastic settings in online learning, with various frameworks including smoothed settings proposed to bridge this gap. We consider the more general and flexible framework of distributionally constrained adversaries in which instances are…

June 13, 2025
Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration

arXiv:2506.10572v1 Announce Type: new Abstract: Controlling the output probabilities of softmax-based models is a common problem in modern machine learning. Although the $\mathrm{Softmax}$ function provides soft control via its temperature parameter, it lacks the ability to enforce hard constraints, such as box constraints, on output probabilities,…
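To make the "soft control via temperature" point concrete, here is a minimal sketch of ordinary temperature-scaled softmax, not the paper's box-constrained variant. T is a single scalar that rescales all logits together, so it can flatten or sharpen the output but offers no way to clamp an individual probability to a prescribed interval. The logits are arbitrary illustrative values.

```python
import numpy as np


def softmax_with_temperature(logits: np.ndarray, T: float) -> np.ndarray:
    # Temperature-scaled softmax: T > 1 flattens the distribution, T < 1 sharpens it.
    z = logits / T
    z = z - z.max()  # shift for numerical stability; the output is unchanged
    e = np.exp(z)
    return e / e.sum()


logits = np.array([4.0, 1.0, 0.0])
for T in (0.5, 1.0, 2.0, 5.0):
    p = softmax_with_temperature(logits, T)
    # Every probability moves as T changes; there is no per-class bound to set.
    print(f"T={T}: {np.round(p, 3)}")
```

Temperature scaling never changes the ranking of classes, which is why it is popular for post-hoc calibration, but the only degree of freedom is T itself.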

June 13, 2025
Know What You Don’t Know: Uncertainty Calibration of Process Reward Models

arXiv:2506.09338v1 Announce Type: new Abstract: Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poorly calibrated and often overestimate success probabilities. To address this, we present…

June 12, 2025
Attention-Bayesian Hybrid Approach to Modular Multiple Particle Tracking

arXiv:2506.09441v1 Announce Type: new Abstract: Tracking multiple particles in noisy and cluttered scenes remains challenging due to a combinatorial explosion of trajectory hypotheses, which scales super-exponentially with the number of particles and frames. The transformer architecture has shown a significant improvement in robustness against this high combinatorial…

June 12, 2025
Evasion Attacks Against Bayesian Predictive Models

arXiv:2506.09640v1 Announce Type: new Abstract: There is an increasing interest in analyzing the behavior of machine learning systems against adversarial attacks. However, most of the research in adversarial machine learning has focused on studying weaknesses against evasion or poisoning attacks to predictive models in classical setups, with the susceptibility…

June 12, 2025
LLM-Powered CPI Prediction Inference with Online Text Time Series

arXiv:2506.09516v1 Announce Type: new Abstract: Forecasting the Consumer Price Index (CPI) is an important yet challenging task in economics, where most existing approaches rely on low-frequency, survey-based data. With the recent advances of large language models (LLMs), there is growing potential to leverage high-frequency online text…

June 12, 2025
Scaling Laws for Uncertainty in Deep Learning

arXiv:2506.09648v1 Announce Type: new Abstract: Deep learning has recently revealed the existence of scaling laws, demonstrating that model performance follows predictable trends based on dataset and model sizes. Inspired by these findings and fascinating phenomena emerging in the over-parameterized regime, we examine a parallel direction: do similar scaling…

June 12, 2025
Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting

arXiv:2506.08049v1 Announce Type: new Abstract: Subseasonal-to-seasonal (S2S) forecasting, which predicts climate conditions from several weeks to months in advance, presents significant challenges due to the chaotic dynamics of atmospheric systems and complex interactions across multiple scales. Current approaches often fail to explicitly model underlying physical processes and teleconnections…

June 11, 2025
Constrained Pareto Set Identification with Bandit Feedback

arXiv:2506.08127v1 Announce Type: new Abstract: In this paper, we address the problem of identifying the Pareto Set under feasibility constraints in a multivariate bandit setting. Specifically, given a $K$-armed bandit with unknown means $\mu_1, \dots, \mu_K \in \mathbb{R}^d$, the goal is to identify the set of arms whose…

June 11, 2025
WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection

arXiv:2506.08066v1 Announce Type: new Abstract: Change Point Detection (CPD) aims to identify moments of abrupt distribution shifts in data streams. Real-world high-dimensional CPD remains challenging due to data pattern complexity and violation of common assumptions. Resorting to standalone deep neural networks, the current state-of-the-art detectors…

June 11, 2025
Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces

arXiv:2506.08325v1 Announce Type: new Abstract: Depth measures are powerful tools for defining level sets in emerging, non-standard, and complex random objects such as high-dimensional multivariate data, functional data, and random graphs. Despite their favorable theoretical properties, the integration of…

June 11, 2025
Asymptotic Normality of Infinite Centered Random Forests -Application to Imbalanced Classification

arXiv:2506.08548v1 Announce Type: new Abstract: Many classification tasks involve imbalanced data, in which a class is largely underrepresented. Several techniques consist of creating a rebalanced dataset on which a classifier is trained. In this paper, we theoretically study such a procedure, when the classifier…

June 11, 2025
Direct Fisher Score Estimation for Likelihood Maximization

arXiv:2506.06542v1 Announce Type: new Abstract: We study the problem of likelihood maximization when the likelihood function is intractable but model simulations are readily available. We propose a sequential, gradient-based optimization method that directly models the Fisher score based on a local score matching technique which uses simulations from…

June 10, 2025
On the Fundamental Impossibility of Hallucination Control in Large Language Models

arXiv:2506.06382v1 Announce Type: new Abstract: This paper explains why it is impossible to create large language models that do not hallucinate and what trade-offs we should be looking for. It presents a formal impossibility theorem demonstrating that no inference mechanism can simultaneously…

June 10, 2025
Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations

arXiv:2506.06613v1 Announce Type: new Abstract: Learning distribution families over $\mathbb{R}^d$ is a fundamental problem in unsupervised learning and statistics. A central question in this setting is whether a given family of distributions possesses sufficient structure to be (at least) information-theoretically learnable and, if so, to…

June 10, 2025
Continuous Semi-Implicit Models

arXiv:2506.06778v1 Announce Type: new Abstract: Semi-implicit distributions have shown great promise in variational inference and generative modeling. Hierarchical semi-implicit models, which stack multiple semi-implicit layers, enhance the expressiveness of semi-implicit distributions and can be used to accelerate diffusion models given pretrained score networks. However, their sequential training often suffers from slow convergence.…

June 10, 2025
The Currents of Conflict: Decomposing Conflict Trends with Gaussian Processes

arXiv:2506.06828v1 Announce Type: new Abstract: I present a novel approach to estimating the temporal and spatial patterns of violent conflict. I show how we can use highly temporally and spatially disaggregated data on conflict events in tandem with Gaussian processes to estimate temporospatial conflict trends.…

June 10, 2025
Nonlinear Causal Discovery through a Sequential Edge Orientation Approach

arXiv:2506.05590v1 Announce Type: new Abstract: Recent advances have established the identifiability of a directed acyclic graph (DAG) under additive noise models (ANMs), spurring the development of various causal discovery methods. However, most existing methods make restrictive model assumptions, rely heavily on general independence tests, or require…

June 9, 2025
Online Conformal Model Selection for Nonstationary Time Series

arXiv:2506.05544v1 Announce Type: new Abstract: This paper introduces the MPS (Model Prediction Set), a novel framework for online model selection for nonstationary time series. Classical model selection methods, such as information criteria and cross-validation, rely heavily on the stationarity assumption and often fail in dynamic environments which…

June 9, 2025
Multilevel neural simulation-based inference

arXiv:2506.06087v1 Announce Type: new Abstract: Neural simulation-based inference (SBI) is a popular set of methods for Bayesian inference when models are only available in the form of a simulator. These methods are widely used in the sciences and engineering, where writing down a likelihood can be significantly more challenging than constructing…

June 9, 2025
Adaptive stable distribution and Hurst exponent by method of moments moving estimator for nonstationary time series

arXiv:2506.05354v1 Announce Type: cross Abstract: Nonstationarity of real-life time series requires model adaptation. Classical approaches such as ARMA-ARCH assume some arbitrarily chosen dependence type. To avoid their bias, we focus on a novel, more agnostic approach: moving…

June 9, 2025
Zeroth-Order Optimization Finds Flat Minima

arXiv:2506.05454v1 Announce Type: cross Abstract: Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known about the implicit…
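For context, here is a minimal sketch of one common zeroth-order gradient estimator, the two-point Gaussian-smoothing estimator, which uses only function evaluations. The paper's specific estimator and its flat-minima analysis are not reproduced; `zo_gradient`, the smoothing radius `mu` and the sample count are illustrative assumptions.

```python
import numpy as np


def zo_gradient(f, x: np.ndarray, mu: float, num_samples: int,
                rng: np.random.Generator) -> np.ndarray:
    # Two-point Gaussian-smoothing estimator, averaged over random directions:
    #   g = (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u,  with u ~ N(0, I).
    # Only function values of f are used; no gradients are computed.
    g = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape[0])
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_samples


def f(x: np.ndarray) -> float:
    # Simple quadratic test function; its true gradient is x itself.
    return 0.5 * float(x @ x)


rng = np.random.default_rng(0)
x0 = np.array([1.0, -2.0, 0.5])
g_hat = zo_gradient(f, x0, mu=1e-3, num_samples=20000, rng=rng)
print(g_hat, "vs true gradient", x0)
```

On this quadratic the central difference equals u·x exactly, so the estimate is unbiased and the only error is Monte Carlo noise, which shrinks as `num_samples` grows.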

June 9, 2025
On the Wasserstein Geodesic Principal Component Analysis of probability measures

arXiv:2506.04480v1 Announce Type: new Abstract: This paper focuses on Geodesic Principal Component Analysis (GPCA) on a collection of probability distributions using the Otto-Wasserstein geometry. The goal is to identify geodesic curves in the space of probability measures that best capture the modes of variation of…

June 6, 2025
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning

arXiv:2506.04626v1 Announce Type: new Abstract: Motivated by real-world settings where data collection and policy deployment, whether for a single agent or across multiple agents, are costly, we study the problem of on-policy single-agent reinforcement learning (RL) and federated RL (FRL) with a…

June 6, 2025
Distributional encoding for Gaussian process regression with qualitative inputs

arXiv:2506.04813v1 Announce Type: new Abstract: Gaussian Process (GP) regression is a popular and sample-efficient approach for many engineering applications, where observations are expensive to acquire, and is also a central ingredient of Bayesian optimization (BO), a widely used method for the optimization of black-box functions. However,…

June 6, 2025
Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models

arXiv:2506.04945v1 Announce Type: new Abstract: Estimating causal effects of joint interventions on multiple variables is crucial in many domains, but obtaining data from such simultaneous interventions can be challenging. Our study explores how to learn joint interventional effects using only observational data and single-variable interventions.…

June 6, 2025
Nonlinear Causal Discovery for Grouped Data

arXiv:2506.05120v1 Announce Type: new Abstract: Inferring cause-effect relationships from observational data has gained significant attention in recent years, but most methods are limited to scalar random variables. In many important domains, including neuroscience, psychology, social science, and industrial manufacturing, the causal units of interest are groups of variables rather…

June 6, 2025
SubSearch: Robust Estimation and Outlier Detection for Stochastic Block Models via Subgraph Search

arXiv:2506.03657v1 Announce Type: new Abstract: Community detection is a fundamental task in graph analysis, with methods often relying on fitting models like the Stochastic Block Model (SBM) to observed networks. While many algorithms can accurately estimate SBM parameters when the input graph…

June 5, 2025
Models of Heavy-Tailed Mechanistic Universality

arXiv:2506.03470v1 Announce Type: new Abstract: Recent theoretical and empirical successes in deep learning, including the celebrated neural scaling laws, are punctuated by the observation that many objects of interest tend to exhibit some form of heavy-tailed or power law behavior. In particular, the prevalence of heavy-tailed spectral densities in Jacobians,…

June 5, 2025
Position: There Is No Free Bayesian Uncertainty Quantification

arXiv:2506.03670v1 Announce Type: new Abstract: Due to their intuitive appeal, Bayesian methods of modeling and uncertainty quantification have become popular in modern machine and deep learning. When providing a prior distribution over the parameter space, it is straightforward to obtain a distribution over the parameters that is…

June 5, 2025
Latent Guided Sampling for Combinatorial Optimization

arXiv:2506.03672v1 Announce Type: new Abstract: Combinatorial Optimization problems are widespread in domains such as logistics, manufacturing, and drug discovery, yet their NP-hard nature makes them computationally challenging. Recent Neural Combinatorial Optimization methods leverage deep learning to learn solution strategies, trained via Supervised or Reinforcement Learning (RL). While promising, these…

June 5, 2025
Infinitesimal Higher-Order Spectral Variations in Rectangular Real Random Matrices

arXiv:2506.03764v1 Announce Type: new Abstract: We present a theoretical framework for deriving the general $n$-th order Fréchet derivatives of singular values in real rectangular matrices, by leveraging reduced resolvent operators from Kato’s analytic perturbation theory for self-adjoint operators. Deriving closed-form expressions for higher-order derivatives of singular…

June 5, 2025
Assumption-free stability for ranking problems

arXiv:2506.02257v1 Announce Type: new Abstract: In this work, we consider ranking problems among a finite set of candidates: for instance, selecting the top-$k$ items among a larger list of candidates or obtaining the full ranking of all items in the set. These problems are often unstable, in the sense that…

June 4, 2025
Enabling Probabilistic Learning on Manifolds through Double Diffusion Maps

arXiv:2506.02254v1 Announce Type: new Abstract: We present a generative learning framework for probabilistic sampling based on an extension of the Probabilistic Learning on Manifolds (PLoM) approach, which is designed to generate statistically consistent realizations of a random vector in a finite-dimensional Euclidean space, informed by a…

June 4, 2025
MoCA: Multi-modal Cross-masked Autoencoder for Digital Health Measurements

arXiv:2506.02260v1 Announce Type: new Abstract: The growing prevalence of digital health technologies has led to the generation of complex multi-modal data, such as physical activity measurements simultaneously collected from various sensors of mobile and wearable devices. These data hold immense potential for advancing health studies, but current…

June 4, 2025
Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression

arXiv:2506.02336v1 Announce Type: new Abstract: We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monotonic reduction of the optimization objective, achieving exponential convergence in $\widetilde{\mathcal{O}}(\kappa)$ steps with $\kappa$ being the condition…
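A minimal sketch of the setting in the abstract: plain GD with a constant stepsize on the $\ell_2$-regularized logistic loss, with labels in {-1, +1}. The synthetic separable data, the regularization strength `lam` and the stepsize are illustrative assumptions; the recorded objective history is what one would inspect to see whether the decrease is monotone.

```python
import numpy as np


def objective_and_grad(w, X, y, lam):
    # L(w) = mean_i log(1 + exp(-y_i x_i^T w)) + (lam / 2) ||w||^2, labels y_i in {-1, +1}.
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * float(w @ w)
    # d/dm log(1 + exp(-m)) = -sigmoid(-m); sigmoid(-m) = 0.5 * (1 - tanh(m / 2)) avoids overflow.
    s = 0.5 * (1.0 - np.tanh(margins / 2.0))
    grad = -(X.T @ (y * s)) / len(y) + lam * w
    return loss, grad


def gradient_descent(X, y, lam, stepsize, num_steps):
    # Plain GD with a constant stepsize, starting from w = 0; records the objective.
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(num_steps):
        loss, grad = objective_and_grad(w, X, y, lam)
        history.append(loss)
        w = w - stepsize * grad
    return w, history


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)  # linearly separable by construction
w, history = gradient_descent(X, y, lam=1e-2, stepsize=0.5, num_steps=500)
print(history[0], history[-1])
```

With this small stepsize the objective decreases monotonically. Raising `stepsize` well beyond that range moves into the large-stepsize regime the abstract studies; this sketch makes no claim about what any particular run there will show.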

June 4, 2025
Tensor State Space-based Dynamic Multilayer Network Modeling

arXiv:2506.02413v1 Announce Type: new Abstract: Understanding the complex interactions within dynamic multilayer networks is critical for advancements in various scientific domains. Existing models often fail to capture such networks’ temporal and cross-layer dynamics. This paper introduces a novel Tensor State Space Model for Dynamic Multilayer Networks (TSSDMN), utilizing…

June 4, 2025
Minimax Rates for the Estimation of Eigenpairs of Weighted Laplace-Beltrami Operators on Manifolds

arXiv:2506.00171v1 Announce Type: new Abstract: We study the problem of estimating eigenpairs of elliptic differential operators from samples of a distribution $\rho$ supported on a manifold $M$. The operators discussed in the paper are relevant in unsupervised learning and in particular are…

June 3, 2025
Overfitting has a limitation: a model-independent generalization error bound based on Rényi entropy

arXiv:2506.00182v1 Announce Type: new Abstract: Will further scaling up of machine learning models continue to bring success? A significant challenge in answering this question lies in understanding generalization error, which is the impact of overfitting. Understanding generalization error behavior of increasingly large-scale…

June 3, 2025
Riemannian Principal Component Analysis

arXiv:2506.00226v1 Announce Type: new Abstract: This paper proposes an innovative extension of Principal Component Analysis (PCA) that transcends the traditional assumption of data lying in Euclidean space, enabling its application to data on Riemannian manifolds. The primary challenge addressed is the lack of vector space operations on such manifolds. Fletcher et…

June 3, 2025
Beyond Winning: Margin of Victory Relative to Expectation Unlocks Accurate Skill Ratings

arXiv:2506.00348v1 Announce Type: new Abstract: Knowledge of accurate relative skills in any competitive system is essential, but foundational approaches such as Elo discard extremely relevant performance data by concentrating exclusively on binary outcomes. While margin of victory (MOV) extensions exist, they often lack…
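For reference, here is a minimal sketch of the standard Elo update the abstract contrasts with; K = 32 and the base-10, 400-point logistic scale are common conventions assumed here. Only the match outcome (1, 0.5 or 0) enters the update, which is the information loss the abstract points to.

```python
def elo_expected(r_a: float, r_b: float) -> float:
    # Expected score of player A against B in the standard logistic Elo model.
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    # score_a is 1 for a win, 0.5 for a draw and 0 for a loss. Only the outcome
    # enters, so a narrow win and a blowout yield exactly the same update.
    delta = k * (score_a - elo_expected(r_a, r_b))
    return r_a + delta, r_b - delta


print(elo_update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
print(elo_update(1600.0, 1400.0, 1.0))  # favourite gains less for the same win
```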

June 3, 2025
Bayesian Data Sketching for Varying Coefficient Regression Models

arXiv:2506.00270v1 Announce Type: new Abstract: Varying coefficient models are popular for estimating nonlinear regression functions in functional data models. Their Bayesian variants have received limited attention in large data applications, primarily due to prohibitively slow posterior computations using Markov chain Monte Carlo (MCMC) algorithms. We introduce Bayesian…

June 3, 2025
Gibbs randomness-compression proposition: An efficient deep learning

arXiv:2505.23869v1 Announce Type: new Abstract: A proposition that connects randomness and compression is put forward via Gibbs entropy over a set of measurement vectors associated with a compression process. The proposition states that a lossy compression process is equivalent to directed randomness that preserves information content. The proposition originated…

June 2, 2025
Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

arXiv:2505.23783v1 Announce Type: new Abstract: In-Context Learning (ICL) allows Large Language Models (LLMs) to adapt to new tasks with just a few examples, but their predictions often suffer from systematic biases, leading to unstable performance in classification. While calibration techniques have been proposed to…

June 2, 2025
Conformal Object Detection by Sequential Risk Control

arXiv:2505.24038v1 Announce Type: new Abstract: Recent advances in object detectors have led to their adoption for industrial uses. However, their deployment in critical applications is hindered by the inherent lack of reliability of neural networks and the complex structure of object detection models. To address these challenges, we…

June 2, 2025
Performative Risk Control: Calibrating Models for Reliable Deployment under Performativity

arXiv:2505.24097v1 Announce Type: new Abstract: Calibrating blackbox machine learning models to achieve risk control is crucial to ensure reliable decision-making. A rich line of literature has been studying how to calibrate a model so that its predictions satisfy explicit finite-sample statistical guarantees under a fixed,…

June 2, 2025
A Mathematical Perspective On Contrastive Learning

arXiv:2505.24134v1 Announce Type: new Abstract: Multimodal contrastive learning is a methodology for linking different data modalities; the canonical example is linking image and text data. The methodology is typically framed as the identification of a set of encoders, one for each modality, that align representations within a common latent…

June 2, 2025
Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games

arXiv:2505.22781v1 Announce Type: new Abstract: We introduce Mean-Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean-Field Games (MFG) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting,…

May 30, 2025
Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features

arXiv:2505.22997v1 Announce Type: new Abstract: Traditional classifiers often assume feature independence or rely on overly simplistic relationships, leading to poor performance in settings where real-world dependencies matter. We introduce the Deep Copula Classifier (DCC), a generative model that separates the learning…

May 30, 2025
Highly Efficient and Effective LLMs with Multi-Boolean Architectures

arXiv:2505.22811v1 Announce Type: new Abstract: Weight binarization has emerged as a promising strategy to drastically reduce the complexity of large language models (LLMs). It is mainly classified into two approaches: post-training binarization and finetuning with training-aware binarization methods. The first approach, while having low complexity, leads to…

May 30, 2025
JAPAN: Joint Adaptive Prediction Areas with Normalising-Flows

arXiv:2505.23196v1 Announce Type: new Abstract: Conformal prediction provides a model-agnostic framework for uncertainty quantification with finite-sample validity guarantees, making it an attractive tool for constructing reliable prediction sets. However, existing approaches commonly rely on residual-based conformity scores, which impose geometric constraints and struggle when the underlying distribution is…
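As a baseline for the residual-based scores mentioned above, here is a minimal sketch of standard split conformal prediction with absolute-residual scores, not the paper's normalising-flow method. The resulting set is always a symmetric interval [ŷ - q, ŷ + q] around the point prediction, which illustrates the kind of fixed geometry such scores impose. The function name is illustrative.

```python
import math

import numpy as np


def split_conformal_radius(cal_residuals, alpha: float) -> float:
    # Conformity score s_i = |y_i - yhat_i| on a held-out calibration set of size n.
    # The radius q is the ceil((n + 1)(1 - alpha))-th smallest score; under
    # exchangeability, [yhat - q, yhat + q] covers a fresh y with prob >= 1 - alpha.
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return float(scores[k - 1])


q = split_conformal_radius(np.arange(1.0, 10.0), alpha=0.1)  # n = 9 residuals 1..9
print(q)  # 9.0
```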

May 30, 2025
Stable Thompson Sampling: Valid Inference via Variance Inflation

arXiv:2505.23260v1 Announce Type: new Abstract: We consider the problem of statistical inference when the data is collected via a Thompson Sampling-type algorithm. While Thompson Sampling (TS) is known to be both asymptotically optimal and empirically effective, its adaptive sampling scheme poses challenges for constructing confidence intervals for…
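A minimal sketch of the data-collection scheme in question: Beta-Bernoulli Thompson Sampling with uniform Beta(1, 1) priors. The arm means, horizon and seed are illustrative assumptions, and the paper's variance-inflation correction is not shown. The point is only that the per-arm counts, and hence the sample means one would naively build confidence intervals from, are themselves chosen adaptively.

```python
import numpy as np


def thompson_sampling(true_means, num_rounds: int, rng: np.random.Generator):
    # Beta-Bernoulli Thompson Sampling with Beta(1, 1) priors: each round, draw one
    # posterior sample per arm, pull the arm with the largest draw, observe a
    # Bernoulli reward and update that arm's Beta posterior.
    k = len(true_means)
    successes = np.zeros(k)
    failures = np.zeros(k)
    for _ in range(num_rounds):
        theta = rng.beta(successes + 1.0, failures + 1.0)
        arm = int(np.argmax(theta))
        reward = float(rng.random() < true_means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
    pulls = successes + failures
    return pulls, successes, failures


rng = np.random.default_rng(0)
pulls, successes, failures = thompson_sampling([0.2, 0.5, 0.8], num_rounds=2000, rng=rng)
# Naive per-arm means; the number of pulls behind each one was chosen adaptively.
naive_means = np.divide(successes, pulls, out=np.full_like(pulls, np.nan), where=pulls > 0)
print(pulls, naive_means)
```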

May 30, 2025
A Kernelised Stein Discrepancy for Assessing the Fit of Inhomogeneous Random Graph Models

arXiv:2505.21580v1 Announce Type: new Abstract: Complex data are often represented as a graph, which in turn can often be viewed as a realisation of a random graph, such as of an inhomogeneous random graph model (IRG). For general fast goodness-of-fit tests in…

May 29, 2025
STACI: Spatio-Temporal Aleatoric Conformal Inference

arXiv:2505.21658v1 Announce Type: new Abstract: Fitting Gaussian Processes (GPs) provides interpretable aleatoric uncertainty quantification for estimation of spatio-temporal fields. Spatio-temporal deep learning models, while scalable, typically assume a simplistic independent covariance matrix for the response, failing to capture the underlying correlation structure. However, spatio-temporal GPs suffer from issues of scalability…

May 29, 2025
Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference

arXiv:2505.21721v1 Announce Type: new Abstract: We prove that, given a mean-field location-scale variational family, black-box variational inference (BBVI) with the reparametrization gradient converges at an almost dimension-independent rate. Specifically, for strongly log-concave and log-smooth targets, the number of iterations for BBVI with a sub-Gaussian family to achieve…

May 29, 2025
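The mean-field BBVI entry above studies the reparametrization gradient for a location-scale family. Below is a minimal sketch with a mean-field Gaussian family, a fixed step size, and a target whose score function is supplied directly. These choices and the name `bbvi_mean_field` are assumptions for illustration, not the paper's setup.

```python
import numpy as np

def bbvi_mean_field(grad_log_p, dim, n_iter=2000, lr=0.05, n_mc=8, seed=0):
    """Maximize the ELBO over q = N(m, diag(s^2)) using reparametrization gradients."""
    rng = np.random.default_rng(seed)
    m = np.zeros(dim)
    log_s = np.zeros(dim)
    for _ in range(n_iter):
        eps = rng.standard_normal((n_mc, dim))
        s = np.exp(log_s)
        z = m + s * eps                      # reparametrization: z ~ q
        g = grad_log_p(z)                    # score of the target, shape (n_mc, dim)
        grad_m = g.mean(axis=0)
        # Chain rule through z = m + s * eps, plus d/d(log s) of the Gaussian entropy (= 1).
        grad_log_s = (g * eps * s).mean(axis=0) + 1.0
        m += lr * grad_m                     # gradient ascent on the ELBO
        log_s += lr * grad_log_s
    return m, np.exp(log_s)
```

For a Gaussian target with diagonal covariance, the optimum of the family is the target itself. This makes the sketch easy to check.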
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks

Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks arXiv:2505.21791v1 Announce Type: new Abstract: Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of…

May 29, 2025
A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging

A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging arXiv:2505.21796v1 Announce Type: new Abstract: Polyak-Ruppert averaging is a widely used technique to achieve the optimal asymptotic variance of stochastic approximation (SA) algorithms, yet its high-probability performance guarantees remain underexplored in general settings. In this paper, we present a general framework for establishing…

May 29, 2025
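Polyak-Ruppert averaging, named in the entry above, averages the iterates of a stochastic approximation (SA) recursion. Below is a minimal sketch for the simplest SA problem, estimating a mean E[X] from a stream. It assumes a step size eta_t = step0 * t^(-decay) with decay in (1/2, 1), a common pairing with averaging, and the function and parameter names are hypothetical.

```python
import random

def sa_with_polyak_averaging(samples, step0=1.0, decay=0.6):
    """SA estimate of E[X] with Polyak-Ruppert averaging.

    Update: theta <- theta - eta_t * (theta - x_t), eta_t = step0 * t**(-decay).
    Returns (last iterate, average of all iterates).
    """
    theta = 0.0
    avg = 0.0
    for t, x in enumerate(samples, start=1):
        eta = step0 * t ** (-decay)
        theta -= eta * (theta - x)
        # Running mean of theta_1, ..., theta_t, updated in O(1).
        avg += (theta - avg) / t
    return theta, avg

if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(2.0, 1.0) for _ in range(20000)]
    last, averaged = sa_with_polyak_averaging(data)
    print(last, averaged)  # the averaged iterate is typically the less noisy of the two
```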
Differentially private ratio statistics

Differentially private ratio statistics arXiv:2505.20351v1 Announce Type: new Abstract: Ratio statistics–such as relative risk and odds ratios–play a central role in hypothesis testing, model evaluation, and decision-making across many areas of machine learning, including causal inference and fairness analysis. However, despite privacy concerns surrounding many datasets and despite increasing adoption of differential privacy, differentially private…

May 28, 2025
Learning with Expected Signatures: Theory and Applications

Learning with Expected Signatures: Theory and Applications arXiv:2505.20465v1 Announce Type: new Abstract: The expected signature maps a collection of data streams to a lower dimensional representation, with a remarkable property: the resulting feature tensor can fully characterize the data generating distribution. This “model-free” embedding has been successfully leveraged to build multiple domain-agnostic machine learning (ML)…

May 28, 2025
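The expected signature in the entry above is the mean of path signatures over a collection of streams. Below is a minimal sketch that assumes piecewise-linear streams and truncates at depth 2, since the full signature is an infinite sequence of tensors. It builds each signature segment by segment with Chen's identity, and the helper names are hypothetical.

```python
import numpy as np

def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path (rows are points)."""
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity: concatenate with a linear segment whose own
        # signature is (delta, outer(delta, delta) / 2); uses s1 before the update.
        s2 = s2 + np.outer(s1, delta) + np.outer(delta, delta) / 2
        s1 = s1 + delta
    return s1, s2

def expected_signature_depth2(paths):
    """Average the depth-2 signatures over a collection of streams."""
    sigs = [signature_depth2(p) for p in paths]
    return (np.mean([s[0] for s in sigs], axis=0),
            np.mean([s[1] for s in sigs], axis=0))
```

The antisymmetric part of level 2 (the Lévy area) captures the order of movements along different coordinates. Level 1 only records the net increment.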
Kernel Quantile Embeddings and Associated Probability Metrics

Kernel Quantile Embeddings and Associated Probability Metrics arXiv:2505.20433v1 Announce Type: new Abstract: Embedding probability distributions into reproducing kernel Hilbert spaces (RKHS) has enabled powerful nonparametric methods such as the maximum mean discrepancy (MMD), a statistical distance with strong theoretical and computational properties. At its core, the MMD relies on kernel mean embeddings to represent distributions…

May 28, 2025
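For reference against the quantile embeddings proposed in the entry above, the standard kernel-mean-embedding MMD can be estimated as follows. This sketch assumes a Gaussian kernel with a fixed, hand-picked bandwidth and uses the unbiased U-statistic estimator of MMD². The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples x (m x d) and y (n x d)."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Exclude the diagonal k(x_i, x_i) terms from the within-sample averages.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * kxy.mean()
```

Because the estimator is unbiased, it can be slightly negative when both samples come from the same distribution.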
Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models

Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models arXiv:2505.20536v1 Announce Type: new Abstract: This paper studies the task of estimating heterogeneous treatment effects in causal panel data models, in the presence of covariate effects. We propose a novel Covariate-Adjusted Deep Causal Learning (CoDEAL) for panel data models, that employs flexible model structures and powerful…

May 28, 2025
Balancing Performance and Costs in Best Arm Identification

Balancing Performance and Costs in Best Arm Identification arXiv:2505.20583v1 Announce Type: new Abstract: We consider the problem of identifying the best arm in a multi-armed bandit model. Despite a wealth of literature in the traditional fixed budget and fixed confidence regimes of the best arm identification problem, it still remains a mystery to most practitioners…

May 28, 2025
Preconditioned Langevin Dynamics with Score-Based Generative Models for Infinite-Dimensional Linear Bayesian Inverse Problems

Preconditioned Langevin Dynamics with Score-Based Generative Models for Infinite-Dimensional Linear Bayesian Inverse Problems arXiv:2505.18276v1 Announce Type: new Abstract: Designing algorithms for solving high-dimensional Bayesian inverse problems directly in infinite-dimensional function spaces – where such problems are naturally formulated – is crucial to ensure stability and convergence as the discretization of the underlying problem is refined.…

May 27, 2025
Operator Learning for Schrödinger Equation: Unitarity, Error Bounds, and Time Generalization

Operator Learning for Schrödinger Equation: Unitarity, Error Bounds, and Time Generalization arXiv:2505.18288v1 Announce Type: new Abstract: We consider the problem of learning the evolution operator for the time-dependent Schrödinger equation, where the Hamiltonian may vary with time. Existing neural network-based surrogates often ignore fundamental properties of the Schrödinger equation, such as linearity and unitarity, and…

May 27, 2025
On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective

On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective arXiv:2505.18346v1 Announce Type: new Abstract: Weak-to-strong generalization, where a student model trained on imperfect labels generated by a weaker teacher nonetheless surpasses that teacher, has been widely observed, but the mechanisms that enable it have remained poorly understood. In this paper, through a theoretical analysis of…

May 27, 2025
Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling

Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling arXiv:2505.18327v1 Announce Type: new Abstract: Constrained stochastic nonlinear optimization problems have attracted significant attention for their ability to model complex real-world scenarios in physics, economics, and biology. As datasets continue to grow, online inference methods have become crucial for enabling real-time decision-making without the need…

May 27, 2025
Identifiability of latent causal graphical models without pure children

Identifiability of latent causal graphical models without pure children arXiv:2505.18410v1 Announce Type: new Abstract: This paper considers a challenging problem of identifying a causal graphical model under the presence of latent variables. While various identifiability conditions have been proposed in the literature, they often require multiple pure children per latent variable or restrictions on the…

May 27, 2025
Learning Probabilities of Causation from Finite Population Data

Learning Probabilities of Causation from Finite Population Data arXiv:2505.17133v1 Announce Type: new Abstract: Probabilities of causation play a crucial role in modern decision-making. This paper addresses the challenge of predicting probabilities of causation for subpopulations with insufficient data using machine learning models. Tian and Pearl first defined and derived tight bounds for three fundamental probabilities…

May 26, 2025
Liouville PDE-based sliced-Wasserstein flow for fair regression

Liouville PDE-based sliced-Wasserstein flow for fair regression arXiv:2505.17204v1 Announce Type: new Abstract: The sliced Wasserstein flow (SWF), a nonparametric and implicit generative gradient flow, is applied to fair regression. We have improved the SWF in a few aspects. First, the stochastic diffusive term from the Fokker-Planck equation-based Monte Carlo is transformed to Liouville partial differential…

May 26, 2025
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine

Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine arXiv:2505.17283v1 Announce Type: new Abstract: Randomized clinical trials often require large patient cohorts before drawing definitive conclusions, yet abundant observational data from parallel studies remains underutilized due to confounding and hidden biases. To bridge this gap, we propose Deconfounded Warm-Start Thompson Sampling (DWTS), a practical approach…

May 26, 2025
Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation

Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation arXiv:2505.17288v1 Announce Type: new Abstract: Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, referred to as supervised fine-tuning, involves training a new…

May 26, 2025
Optimal Transport with Heterogeneously Missing Data

Optimal Transport with Heterogeneously Missing Data arXiv:2505.17291v1 Announce Type: new Abstract: We consider the problem of solving the optimal transport problem between two empirical distributions with missing values. Our main assumption is that the data is missing completely at random (MCAR), but we allow for heterogeneous missingness probabilities across features and across the two distributions.…

May 26, 2025
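The entry above concerns optimal transport between two empirical distributions. As a point of reference only, the sketch below solves entropically regularized OT with Sinkhorn iterations on fully observed data. This is a swapped-in standard technique and does not handle missing values. The function name, the squared Euclidean cost, and `eps` are assumptions.

```python
import numpy as np

def sinkhorn(x, y, eps=0.5, n_iter=1000):
    """Entropic OT between uniform empirical measures on the rows of x and y.

    Returns (transport plan, transport cost <plan, C>) for squared Euclidean cost C.
    """
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    gibbs = np.exp(-cost / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = a / (gibbs @ v)
        v = b / (gibbs.T @ u)
    plan = u[:, None] * gibbs * v[None, :]
    return plan, float((plan * cost).sum())
```

Small `eps` brings the plan closer to unregularized OT but slows convergence and risks underflow in `gibbs`. Log-domain implementations are the usual remedy.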
PO-Flow: Flow-based Generative Models for Sampling Potential Outcomes and Counterfactuals

PO-Flow: Flow-based Generative Models for Sampling Potential Outcomes and Counterfactuals arXiv:2505.16051v1 Announce Type: new Abstract: We propose PO-Flow, a novel continuous normalizing flow (CNF) framework for causal inference that jointly models potential outcomes and counterfactuals. Trained via flow matching, PO-Flow provides a unified framework for individualized potential outcome prediction, counterfactual predictions, and uncertainty-aware density learning.…

May 23, 2025
CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision

CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision arXiv:2505.15927v1 Announce Type: new Abstract: Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which provides intermediate reasoning steps together with the final output, has emerged as a powerful empirical technique, underpinning much of the…

May 23, 2025
Oh SnapMMD! Forecasting Stochastic Dynamics Beyond the Schrödinger Bridge’s End

Oh SnapMMD! Forecasting Stochastic Dynamics Beyond the Schrödinger Bridge’s End arXiv:2505.16082v1 Announce Type: new Abstract: Scientists often want to make predictions beyond the observed time horizon of “snapshot” data following latent stochastic dynamics. For example, in time course single-cell mRNA profiling, scientists have access to cellular transcriptional state measurements (snapshots) from different biological replicates at…

May 23, 2025
Dimension-adapted Momentum Outscales SGD

Dimension-adapted Momentum Outscales SGD arXiv:2505.16098v1 Announce Type: new Abstract: We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target…

May 23, 2025
Exponential Convergence of CAVI for Bayesian PCA

Exponential Convergence of CAVI for Bayesian PCA arXiv:2505.16145v1 Announce Type: new Abstract: Probabilistic principal component analysis (PCA) and its Bayesian variant (BPCA) are widely used for dimension reduction in machine learning and statistics. The main advantage of probabilistic PCA over the traditional formulation is allowing uncertainty quantification. The parameters of BPCA are typically learned using…

May 23, 2025
Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective

Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective arXiv:2505.14808v1 Announce Type: new Abstract: This work aims to demystify the out-of-distribution (OOD) capabilities of in-context learning (ICL) by studying linear regression tasks parameterized with low-rank covariance matrices. With such a parameterization, we can model distribution shifts as a varying angle between the subspace of the…

May 22, 2025
LOBSTUR: A Local Bootstrap Framework for Tuning Unsupervised Representations in Graph Neural Networks

LOBSTUR: A Local Bootstrap Framework for Tuning Unsupervised Representations in Graph Neural Networks arXiv:2505.14867v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) are increasingly used in conjunction with unsupervised learning techniques to learn powerful node representations, but their deployment is hindered by their high sensitivity to hyperparameter tuning and the absence of established methodologies for…

May 22, 2025
Convergence of Adam in Deep ReLU Networks via Directional Complexity and Kakeya Bounds

Convergence of Adam in Deep ReLU Networks via Directional Complexity and Kakeya Bounds arXiv:2505.15013v1 Announce Type: new Abstract: First-order adaptive optimization methods like Adam are the default choices for training modern deep neural networks. Despite their empirical success, the theoretical understanding of these methods in non-smooth settings, particularly in Deep ReLU networks, remains limited. ReLU…

May 22, 2025
A Linear Approach to Data Poisoning

A Linear Approach to Data Poisoning arXiv:2505.15175v1 Announce Type: new Abstract: We investigate the theoretical foundations of data poisoning attacks in machine learning models. Our analysis reveals that the Hessian with respect to the input serves as a diagnostic tool for detecting poisoning, exhibiting spectral signatures that characterize compromised datasets. We use random matrix theory…

May 22, 2025
Infinite hierarchical contrastive clustering for personal digital envirotyping

Infinite hierarchical contrastive clustering for personal digital envirotyping arXiv:2505.15022v1 Announce Type: new Abstract: Daily environments have profound influence on our health and behavior. Recent work has shown that digital envirotyping, where computer vision is applied to images of daily environments taken during ecological momentary assessment (EMA), can be used to identify meaningful relationships between environmental…

May 22, 2025
Continuous Domain Generalization

Continuous Domain Generalization arXiv:2505.13519v1 Announce Type: new Abstract: Real-world data distributions often shift continuously across multiple latent factors such as time, geography, and socioeconomic context. However, existing domain generalization approaches typically treat domains as discrete or evolving along a single axis (e.g., time), which fails to capture the complex, multi-dimensional nature of real-world variation. This…

May 21, 2025
Data Balancing Strategies: A Survey of Resampling and Augmentation Methods

Data Balancing Strategies: A Survey of Resampling and Augmentation Methods arXiv:2505.13518v1 Announce Type: new Abstract: Imbalanced data poses a significant obstacle in machine learning, as an unequal distribution of class labels often results in skewed predictions and diminished model accuracy. To mitigate this problem, various resampling strategies have been developed, encompassing both oversampling and undersampling…

May 21, 2025
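Random oversampling is the simplest of the oversampling strategies the survey above refers to. Below is a minimal sketch that assumes classification data held in Python lists and duplicates rows with replacement until every class matches the majority count. `random_oversample` is a hypothetical helper. In practice, resampling should be applied to the training split only.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(idx)          # sample an existing row of this class
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out
```

Exact duplicates can encourage overfitting to minority points. Synthetic oversamplers interpolate new points instead of copying existing ones.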
Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback

Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback arXiv:2505.13562v1 Announce Type: new Abstract: Learning in games is a fundamental problem in machine learning and artificial intelligence, with numerous applications (Silver et al., 2016; Schrittwieser et al., 2020). This work investigates two-player zero-sum matrix games with an unknown payoff matrix and bandit feedback, where each player observes their actions and the…

May 21, 2025
Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles

Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles arXiv:2505.13585v1 Announce Type: new Abstract: This work introduces a new method called scalable Bayesian Monte Carlo (SBMC). The model interpolates between a point estimator and the posterior, and the algorithm is a parallel implementation of a consistent (asymptotically unbiased) Bayesian deep learning algorithm: sequential Monte…

May 21, 2025
Backward Conformal Prediction

Backward Conformal Prediction arXiv:2505.13732v1 Announce Type: new Abstract: We introduce Backward Conformal Prediction, a method that guarantees conformal coverage while providing flexible control over the size of prediction sets. Unlike standard conformal prediction, which fixes the coverage level and allows the conformal set size to vary, our approach defines a rule that constrains how prediction…

May 21, 2025
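The entry above contrasts its method with standard conformal prediction, which fixes the coverage level and lets the set size vary. Below is a minimal sketch of that standard baseline for regression: split conformal with absolute-residual scores, assuming residuals computed on a held-out calibration set. It is not the backward method itself, and the function name is hypothetical.

```python
import math

def split_conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Split conformal interval from calibration residuals |y_i - f(x_i)|."""
    n = len(cal_residuals)
    # Finite-sample-corrected rank of the calibration quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    q = sorted(cal_residuals)[k - 1]
    return (y_pred - q, y_pred + q)
```

Under exchangeability, the ceil((n + 1)(1 - alpha)) rank gives marginal coverage of at least 1 - alpha. The interval width is whatever that quantile turns out to be, which is the behavior the backward approach reverses.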
The Stochastic Occupation Kernel (SOCK) Method for Learning Stochastic Differential Equations

The Stochastic Occupation Kernel (SOCK) Method for Learning Stochastic Differential Equations arXiv:2505.11622v1 Announce Type: new Abstract: We present a novel kernel-based method for learning multivariate stochastic differential equations (SDEs). The method follows a two-step procedure: we first estimate the drift term function, then the (matrix-valued) diffusion function given the drift. Occupation kernels are integral functionals…

May 20, 2025
Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles

Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles arXiv:2505.11671v1 Announce Type: new Abstract: Sequential Monte Carlo (SMC) methods offer a principled approach to Bayesian uncertainty quantification but are traditionally limited by the need for full-batch gradient evaluations. We introduce a scalable variant by incorporating Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)…

May 20, 2025
Missing Data Imputation by Reducing Mutual Information with Rectified Flows

Missing Data Imputation by Reducing Mutual Information with Rectified Flows arXiv:2505.11749v1 Announce Type: new Abstract: This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and their corresponding missing mask. Inspired by GAN-based approaches, which train generators to decrease the predictability of missingness patterns, our method…

May 20, 2025
Thompson Sampling-like Algorithms for Stochastic Rising Bandits

Thompson Sampling-like Algorithms for Stochastic Rising Bandits arXiv:2505.12092v1 Announce Type: new Abstract: Stochastic rising rested bandit (SRRB) is a setting where the arms’ expected rewards increase as they are pulled. It models scenarios in which the performances of the different options grow as an effect of an underlying learning process (e.g., online model selection). Even…

May 20, 2025
Multi-Attribute Graph Estimation with Sparse-Group Non-Convex Penalties

Multi-Attribute Graph Estimation with Sparse-Group Non-Convex Penalties arXiv:2505.11984v1 Announce Type: new Abstract: We consider the problem of inferring the conditional independence graph (CIG) of high-dimensional Gaussian vectors from multi-attribute data. Most existing methods for graph estimation are based on single-attribute models where one associates a scalar random variable with each node. In multi-attribute graphical models,…

May 20, 2025