Category: stat.ML

Towards a perturbation-based explanation for medical AI as differentiable programs

Towards a perturbation-based explanation for medical AI as differentiable programs arXiv:2502.14001v1 Announce Type: new Abstract: Recent advancement in machine learning algorithms reaches a point where medical devices can be equipped with artificial intelligence (AI) models for diagnostic support and routine automation in clinical settings. In medicine and healthcare, there is a particular demand for sufficient…

February 21, 2025
New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition

New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition arXiv:2502.14060v1 Announce Type: new Abstract: We study fundamental limits of first-order stochastic optimization in a range of nonconvex settings, including L-smooth functions satisfying Quasar-Convexity (QC), Quadratic Growth (QG), and Restricted Secant Inequalities (RSI). While the convergence properties of standard algorithms are well-understood in deterministic regimes,…

February 21, 2025
Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs

Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs arXiv:2502.14121v1 Announce Type: new Abstract: Designing modern industrial systems requires balancing several competing objectives, such as profitability, resilience, and sustainability, while accounting for complex interactions between technological, economic, and environmental factors. Multi-objective optimization (MOO) methods are commonly used to navigate…

February 21, 2025
Conformal Prediction under Lévy-Prokhorov Distribution Shifts: Robustness to Local and Global Perturbations

Conformal Prediction under Lévy-Prokhorov Distribution Shifts: Robustness to Local and Global Perturbations arXiv:2502.14105v1 Announce Type: new Abstract: Conformal prediction provides a powerful framework for constructing prediction intervals with finite-sample guarantees, yet its robustness under distribution shifts remains a significant challenge. This paper addresses this limitation by modeling distribution shifts using Lévy-Prokhorov (LP) ambiguity sets, which…

February 21, 2025
Prediction-Powered Adaptive Shrinkage Estimation

Prediction-Powered Adaptive Shrinkage Estimation arXiv:2502.14166v1 Announce Type: new Abstract: Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI’s benefits for individual statistical tasks, modern applications require answering numerous parallel statistical questions. We introduce Prediction-Powered Adaptive Shrinkage (PAS),…

February 21, 2025
Model selection for behavioral learning data and applications to contextual bandits

Model selection for behavioral learning data and applications to contextual bandits arXiv:2502.13186v1 Announce Type: new Abstract: Learning for animals or humans is the process that leads to behaviors better adapted to the environment. This process highly depends on the individual that learns and is usually observed only through the individual’s actions. This article presents ways…

February 20, 2025
Task Shift: From Classification to Regression in Overparameterized Linear Models

Task Shift: From Classification to Regression in Overparameterized Linear Models arXiv:2502.13285v1 Announce Type: new Abstract: Modern machine learning methods have recently demonstrated remarkable capability to generalize under task shift, where latent knowledge is transferred to a different, often more difficult, task under a similar data distribution. We investigate this phenomenon in an overparameterized linear regression…

February 20, 2025
An Efficient Permutation-Based Kernel Two-Sample Test

An Efficient Permutation-Based Kernel Two-Sample Test arXiv:2502.13570v1 Announce Type: new Abstract: Two-sample hypothesis testing, determining whether two sets of data are drawn from the same distribution, is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing, maximum mean discrepancy (MMD) has gained popularity as a test statistic due…

February 20, 2025
Identifying metric structures of deep latent variable models

Identifying metric structures of deep latent variable models arXiv:2502.13757v1 Announce Type: new Abstract: Deep latent variable models learn condensed representations of data that, hopefully, reflect the inner workings of the studied phenomena. Unfortunately, these latent representations are not statistically identifiable, meaning they cannot be uniquely determined. Domain experts, therefore, need to tread carefully when interpreting…

February 20, 2025
Graph Signal Inference by Learning Narrowband Spectral Kernels

Graph Signal Inference by Learning Narrowband Spectral Kernels arXiv:2502.13686v1 Announce Type: new Abstract: While a common assumption in graph signal analysis is the smoothness of the signals or the band-limitedness of their spectrum, in many instances the spectrum of real graph data may be concentrated at multiple regions of the spectrum, possibly including mid-to-high-frequency components.…

February 20, 2025
Suboptimal Shapley Value Explanations

Suboptimal Shapley Value Explanations arXiv:2502.12209v1 Announce Type: new Abstract: Deep Neural Networks (DNNs) have demonstrated strong capacity in supporting a wide variety of applications. Shapley value has emerged as a prominent tool to analyze feature importance to help people understand the inference process of deep neural models. Computing Shapley value function requires choosing a baseline…

February 19, 2025
The Majority Vote Paradigm Shift: When Popular Meets Optimal

The Majority Vote Paradigm Shift: When Popular Meets Optimal arXiv:2502.12581v1 Announce Type: new Abstract: Reliably labelling data typically requires annotations from multiple human workers. However, humans are far from being perfect. Hence, it is a common practice to aggregate labels gathered from multiple annotators to make a more confident estimate of the true label. Among…

February 19, 2025
Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation

Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation arXiv:2502.12607v1 Announce Type: new Abstract: We propose Duality Gap KIP (DGKIP), an extension of the Kernel Inducing Points (KIP) method for dataset distillation. While existing dataset distillation methods often rely on bi-level optimization, DGKIP eliminates the need for such optimization by leveraging duality theory in…

February 19, 2025
Green LIME: Improving AI Explainability through Design of Experiments

Green LIME: Improving AI Explainability through Design of Experiments arXiv:2502.12753v1 Announce Type: new Abstract: In artificial intelligence (AI), the complexity of many models and processes often surpasses human interpretability, making it challenging to understand why a specific prediction is made. This lack of transparency is particularly problematic in critical fields like healthcare, where trust in…

February 19, 2025
Federated Variational Inference for Bayesian Mixture Models

Federated Variational Inference for Bayesian Mixture Models arXiv:2502.12684v1 Announce Type: new Abstract: We present a federated learning approach for Bayesian model-based clustering of large-scale binary and categorical datasets. We introduce a principled ‘divide and conquer’ inference procedure using variational inference with local merge and delete moves within batches of the data in parallel, followed by…

February 19, 2025
Forecasting time series with constraints

Forecasting time series with constraints arXiv:2502.10485v1 Announce Type: new Abstract: Time series forecasting presents unique challenges that limit the effectiveness of traditional machine learning algorithms. To address these limitations, various approaches have incorporated linear constraints into learning algorithms, such as generalized additive models and hierarchical forecasting. In this paper, we propose a unified framework for…

February 18, 2025
Weighted quantization using MMD: From mean field to mean shift via gradient flows

Weighted quantization using MMD: From mean field to mean shift via gradient flows arXiv:2502.10600v1 Announce Type: new Abstract: Approximating a probability distribution using a set of particles is a fundamental problem in machine learning and statistics, with applications including clustering and quantization. Formally, we seek a finite weighted mixture of Dirac measures that best approximates…

February 18, 2025
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm

Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm arXiv:2502.10650v1 Announce Type: new Abstract: Advances in deep learning and representation learning have transformed item factor analysis (IFA) in the item response theory (IRT) literature by enabling more efficient and accurate parameter estimation. Variational Autoencoders (VAEs) have been one of the most…

February 18, 2025
Batch-Adaptive Annotations for Causal Inference with Complex-Embedded Outcomes

Batch-Adaptive Annotations for Causal Inference with Complex-Embedded Outcomes arXiv:2502.10605v1 Announce Type: new Abstract: Estimating the causal effects of an intervention on outcomes is crucial. But often in domains such as healthcare and social services, this critical information about outcomes is documented by unstructured text, e.g. clinical notes in healthcare or case notes in social services.…

February 18, 2025
Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training

Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training arXiv:2502.10793v1 Announce Type: new Abstract: Existing methods for measuring training sample influence on models only provide static, overall measurements, overlooking how sample influence changes during training. We propose Dynamic Influence Tracker (DIT), which captures the time-varying sample influence across arbitrary time windows during training. DIT offers…

February 18, 2025
Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs

Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs arXiv:2502.09832v1 Announce Type: new Abstract: In this paper, assuming a natural strengthening of the low-degree conjecture, we provide evidence of computational hardness for two problems: (1) the (partial) matching recovery problem in the sparse correlated Erdős-Rényi graphs $\mathcal{G}(n,q;\rho)$ when the edge-density $q=n^{-1+o(1)}$ and…

February 17, 2025
On Volume Minimization in Conformal Regression

On Volume Minimization in Conformal Regression arXiv:2502.09985v1 Announce Type: new Abstract: We study the question of volume optimality in split conformal regression, a topic still poorly understood in comparison to coverage control. Using the fact that the calibration step can be seen as an empirical volume minimization problem, we first derive a finite-sample upper-bound on…

February 17, 2025
Estimation of the Learning Coefficient Using Empirical Loss

Estimation of the Learning Coefficient Using Empirical Loss arXiv:2502.09998v1 Announce Type: new Abstract: The learning coefficient plays a crucial role in analyzing the performance of information criteria, such as the Widely Applicable Information Criterion (WAIC) and the Widely Applicable Bayesian Information Criterion (WBIC), which Sumio Watanabe developed to assess model generalization ability. In regular statistical…

February 17, 2025
Improved Online Confidence Bounds for Multinomial Logistic Bandits

Improved Online Confidence Bounds for Multinomial Logistic Bandits arXiv:2502.10020v1 Announce Type: new Abstract: In this paper, we propose an improved online confidence bound for multinomial logistic (MNL) models and apply this result to MNL bandits, achieving variance-dependent optimal regret. Recently, Lee & Oh (2024) established an online confidence bound for MNL models and achieved nearly…

February 17, 2025
Combinatorial Reinforcement Learning with Preference Feedback

Combinatorial Reinforcement Learning with Preference Feedback arXiv:2502.10158v1 Announce Type: new Abstract: In this paper, we consider combinatorial reinforcement learning with preference feedback, where a learning agent sequentially offers an action, an assortment of multiple items, to a user, whose preference feedback follows a multinomial logistic (MNL) model. This framework allows us to model real-world scenarios, particularly those…

February 17, 2025
A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection

A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection arXiv:2502.08695v1 Announce Type: new Abstract: Bayesian nonparametric methods are naturally suited to the problem of out-of-distribution (OOD) detection. However, these techniques have largely been eschewed in favor of simpler methods based on distances between pre-trained or learned embeddings of data points. Here we…

February 14, 2025
Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition

Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition arXiv:2502.09047v1 Announce Type: new Abstract: A common pursuit in modern statistical learning is to attain satisfactory generalization out of the source data distribution (OOD). In theory, the challenge remains unsolved even under the canonical setting of covariate shift for the linear model.…

February 14, 2025
Off-Policy Evaluation for Recommendations with Missing-Not-At-Random Rewards

Off-Policy Evaluation for Recommendations with Missing-Not-At-Random Rewards arXiv:2502.08993v1 Announce Type: new Abstract: Unbiased recommender learning (URL) and off-policy evaluation/learning (OPE/L) techniques are effective in addressing the data bias caused by display position and logging policies, thereby consistently improving the performance of recommendations. However, when both biases exist in the logged data, these estimators may suffer…

February 14, 2025
Non-asymptotic Analysis of Diffusion Annealed Langevin Monte Carlo for Generative Modelling

Non-asymptotic Analysis of Diffusion Annealed Langevin Monte Carlo for Generative Modelling arXiv:2502.09306v1 Announce Type: new Abstract: We investigate the theoretical properties of general diffusion (interpolation) paths and their Langevin Monte Carlo implementation, referred to as diffusion annealed Langevin Monte Carlo (DALMC), under weak conditions on the data distribution. Specifically, we analyse and provide non-asymptotic error…

February 14, 2025
A Differentiable Rank-Based Objective For Better Feature Learning

A Differentiable Rank-Based Objective For Better Feature Learning arXiv:2502.09445v1 Announce Type: new Abstract: In this paper, we leverage existing statistical methods to better understand feature learning from data. We tackle this by modifying the model-free variable selection method, Feature Ordering by Conditional Independence (FOCI), which is introduced in \cite{azadkia2021simple}. While FOCI is based on a…

February 14, 2025
SNAP: Sequential Non-Ancestor Pruning for Targeted Causal Effect Estimation With an Unknown Graph

SNAP: Sequential Non-Ancestor Pruning for Targeted Causal Effect Estimation With an Unknown Graph arXiv:2502.07857v1 Announce Type: new Abstract: Causal discovery can be computationally demanding for large numbers of variables. If we only wish to estimate the causal effects on a small subset of target variables, we might not need to learn the causal graph for…

February 13, 2025
Discrete Markov Probabilistic Models

Discrete Markov Probabilistic Models arXiv:2502.07939v1 Announce Type: new Abstract: This paper introduces the Discrete Markov Probabilistic Model (DMPM), a novel algorithm for discrete data generation. The algorithm operates in the space of bits $\{0,1\}^d$, where the noising process is a continuous-time Markov chain that can be sampled exactly via a Poissonian clock that flips labels…

February 13, 2025
The Observational Partial Order of Causal Structures with Latent Variables

The Observational Partial Order of Causal Structures with Latent Variables arXiv:2502.07891v1 Announce Type: new Abstract: For two causal structures with the same set of visible variables, one is said to observationally dominate the other if the set of distributions over the visible variables realizable by the first contains the set of distributions over the visible…

February 13, 2025
Optimizing Likelihoods via Mutual Information: Bridging Simulation-Based Inference and Bayesian Optimal Experimental Design

Optimizing Likelihoods via Mutual Information: Bridging Simulation-Based Inference and Bayesian Optimal Experimental Design arXiv:2502.08004v1 Announce Type: new Abstract: Simulation-based inference (SBI) is a method to perform inference on a variety of complex scientific models with challenging inference (inverse) problems. Bayesian Optimal Experimental Design (BOED) aims to efficiently use experimental resources to make better inferences. Various…

February 13, 2025
Multi-View Oriented GPLVM: Expressiveness and Efficiency

Multi-View Oriented GPLVM: Expressiveness and Efficiency arXiv:2502.08253v1 Announce Type: new Abstract: The multi-view Gaussian process latent variable model (MV-GPLVM) aims to learn a unified representation from multi-view data but is hindered by challenges such as limited kernel expressiveness and low computational efficiency. To overcome these issues, we first introduce a new duality between the spectral…

February 13, 2025
Confidence Intervals for Evaluation of Data Mining

Confidence Intervals for Evaluation of Data Mining arXiv:2502.07016v1 Announce Type: new Abstract: In data mining, when binary prediction rules are used to predict a binary outcome, many performance measures are used in a vast array of literature for the purposes of evaluation and comparison. Some examples include classification accuracy, precision, recall, F measures, and Jaccard…

February 12, 2025
Epistemic Uncertainty in Conformal Scores: A Unified Approach

Epistemic Uncertainty in Conformal Scores: A Unified Approach arXiv:2502.06995v1 Announce Type: new Abstract: Conformal prediction methods create prediction bands with distribution-free guarantees but do not explicitly capture epistemic uncertainty, which can lead to overconfident predictions in data-sparse regions. Although recent conformal scores have been developed to address this limitation, they are typically designed for specific…

February 12, 2025
Generative Distribution Prediction: A Unified Approach to Multimodal Learning

Generative Distribution Prediction: A Unified Approach to Multimodal Learning arXiv:2502.07090v1 Announce Type: new Abstract: Accurate prediction with multimodal data, encompassing tabular, textual, and visual inputs or outputs, is fundamental to advancing analytics in diverse application domains. Traditional approaches often struggle to integrate heterogeneous data types while maintaining high predictive accuracy. We introduce Generative Distribution Prediction (GDP), a…

February 12, 2025
Online Covariance Matrix Estimation in Sketched Newton Methods

Online Covariance Matrix Estimation in Sketched Newton Methods arXiv:2502.07114v1 Announce Type: new Abstract: Given the ubiquity of streaming data, online algorithms have been widely used for parameter estimation, with second-order methods particularly standing out for their efficiency and robustness. In this paper, we study an online sketched Newton method that leverages a randomized sketching technique…

February 12, 2025
Riemannian Proximal Sampler for High-accuracy Sampling on Manifolds

Riemannian Proximal Sampler for High-accuracy Sampling on Manifolds arXiv:2502.07265v1 Announce Type: new Abstract: We introduce the Riemannian Proximal Sampler, a method for sampling from densities defined on Riemannian manifolds. The performance of this sampler critically depends on two key oracles: the Manifold Brownian Increments (MBI) oracle and the Riemannian Heat-kernel (RHK) oracle. We establish high-accuracy…

February 12, 2025
Online Covariance Estimation in Nonsmooth Stochastic Approximation

Online Covariance Estimation in Nonsmooth Stochastic Approximation arXiv:2502.05305v1 Announce Type: new Abstract: We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting covariance matrix in the local minimax sense of Hájek and Le Cam.…

February 11, 2025
On the Convergence and Stability of Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning, and Online Decision Transformers

On the Convergence and Stability of Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning, and Online Decision Transformers arXiv:2502.05672v1 Announce Type: new Abstract: This article provides a rigorous analysis of convergence and stability of Episodic Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning and Online Decision Transformers. These algorithms performed competitively across various benchmarks, from games to robotic tasks,…

February 11, 2025
dynoGP: Deep Gaussian Processes for dynamic system identification

dynoGP: Deep Gaussian Processes for dynamic system identification arXiv:2502.05620v1 Announce Type: new Abstract: In this work, we present a novel approach to system identification for dynamical systems, based on a specific class of Deep Gaussian Processes (Deep GPs). These models are constructed by interconnecting linear dynamic GPs (equivalent to stochastic linear time-invariant dynamical systems) and…

February 11, 2025
Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction

Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction arXiv:2502.05676v1 Announce Type: new Abstract: Ensuring model calibration is critical for reliable predictions, yet popular distribution-free methods, such as histogram binning and isotonic regression, provide only asymptotic guarantees. We introduce a unified framework for Venn and Venn-Abers calibration, generalizing Vovk’s binary classification approach to arbitrary…

February 11, 2025
TD(0) Learning converges for Polynomial mixing and non-linear functions

TD(0) Learning converges for Polynomial mixing and non-linear functions arXiv:2502.05706v1 Announce Type: new Abstract: Theoretical work on Temporal Difference (TD) learning has provided finite-sample and high-probability guarantees for data generated from Markov chains. However, these bounds typically require linear function approximation, instance-dependent step sizes, algorithmic modifications, and restrictive mixing rates. We present theoretical findings for…

February 11, 2025
Sparsity-Based Interpolation of External, Internal and Swap Regret

Sparsity-Based Interpolation of External, Internal and Swap Regret arXiv:2502.04543v1 Announce Type: new Abstract: Focusing on the expert problem in online learning, this paper studies the interpolation of several performance metrics via $\phi$-regret minimization, which measures the performance of an algorithm by its regret with respect to an arbitrary action modification rule $\phi$. With $d$ experts…

February 10, 2025
Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect

Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect arXiv:2502.04673v1 Announce Type: new Abstract: Estimation and inference for the Average Treatment Effect (ATE) is a cornerstone of causal inference and often serves as the foundation for developing procedures for more complicated settings. Although traditionally analyzed in a batch setting, recent advances in martingale theory…

February 10, 2025
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond

Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond arXiv:2502.04575v1 Announce Type: new Abstract: Given an unnormalized probability density $\pi\propto\mathrm{e}^{-V}$, estimating its normalizing constant $Z=\int_{\mathbb{R}^d}\mathrm{e}^{-V(x)}\mathrm{d}x$ or free energy $F=-\log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. It is challenging especially in high dimensions…

February 10, 2025
A Meta-learner for Heterogeneous Effects in Difference-in-Differences

A Meta-learner for Heterogeneous Effects in Difference-in-Differences arXiv:2502.04699v1 Announce Type: new Abstract: We address the problem of estimating heterogeneous treatment effects in panel data, adopting the popular Difference-in-Differences (DiD) framework under the conditional parallel trends assumption. We propose a novel doubly robust meta-learner for the Conditional Average Treatment Effect on the Treated (CATT), reducing the…

February 10, 2025
PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders

PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders arXiv:2502.04730v1 Announce Type: new Abstract: Learning informative representations of phylogenetic tree structures is essential for analyzing evolutionary relationships. Classical distance-based methods have been widely used to project phylogenetic trees into Euclidean space, but they are often sensitive to the choice of distance metric and may lack…

February 10, 2025
Two in context learning tasks with complex functions

Two in context learning tasks with complex functions arXiv:2502.03503v1 Announce Type: new Abstract: We examine two in context learning (ICL) tasks with mathematical functions in several train and test settings for transformer models. Our study generalizes work on linear functions by showing that small transformers, even models with attention layers only, can approximate arbitrary polynomial…

February 7, 2025
Multivariate Conformal Prediction using Optimal Transport

Multivariate Conformal Prediction using Optimal Transport arXiv:2502.03609v1 Announce Type: new Abstract: Conformal prediction (CP) quantifies the uncertainty of machine learning models by constructing sets of plausible outputs. These sets are constructed by leveraging a so-called conformity score, a quantity computed using the input point of interest, a prediction model, and past observations. CP sets are…

February 7, 2025
Online Learning Algorithms in Hilbert Spaces with $\beta$- and $\phi$-Mixing Sequences

Online Learning Algorithms in Hilbert Spaces with $\beta$- and $\phi$-Mixing Sequences arXiv:2502.03551v1 Announce Type: new Abstract: In this paper, we study an online algorithm in a reproducing kernel Hilbert space (RKHS) based on a class of dependent processes, called the mixing process. For such a process, the degree of dependence is measured by various mixing…

February 7, 2025
Rule-based Evolving Fuzzy System for Time Series Forecasting: New Perspectives Based on Type-2 Fuzzy Sets Measures Approach

Rule-based Evolving Fuzzy System for Time Series Forecasting: New Perspectives Based on Type-2 Fuzzy Sets Measures Approach arXiv:2502.03650v1 Announce Type: new Abstract: Real-world data contain uncertainty and variations that can be correlated to external variables, known as randomness. An alternative cause of randomness is chaos, which can be an important component of chaotic time series.…

February 7, 2025
Guiding Two-Layer Neural Network Lipschitzness via Gradient Descent Learning Rate Constraints

Guiding Two-Layer Neural Network Lipschitzness via Gradient Descent Learning Rate Constraints arXiv:2502.03792v1 Announce Type: new Abstract: We demonstrate that applying an eventual decay to the learning rate (LR) in empirical risk minimization (ERM), where the mean-squared-error loss is minimized using standard gradient descent (GD) for training a two-layer neural network with Lipschitz activation functions, ensures…

February 7, 2025
Networks with Finite VC Dimension: Pro and Contra

Networks with Finite VC Dimension: Pro and Contra arXiv:2502.02679v1 Announce Type: new Abstract: Approximation and learning of classifiers of large data sets by neural networks in terms of high-dimensional geometry and statistical learning theory are investigated. The influence of the VC dimension of sets of input-output functions of networks on approximation capabilities is compared with…

February 6, 2025
Achievable distributional robustness when the robust risk is only partially identified

Achievable distributional robustness when the robust risk is only partially identified arXiv:2502.02710v1 Announce Type: new Abstract: In safety-critical applications, machine learning models should generalize well under worst-case distribution shifts, that is, have a small robust risk. Invariance-based algorithms can provably take advantage of structural assumptions on the shifts when the training distributions are heterogeneous enough…

February 6, 2025
Algorithms with Calibrated Machine Learning Predictions

Algorithms with Calibrated Machine Learning Predictions arXiv:2502.02861v1 Announce Type: new Abstract: The field of algorithms with predictions incorporates machine learning advice in the design of online algorithms to improve real-world performance. While this theoretical framework often assumes uniform reliability across all predictions, modern machine learning models can now provide instance-level uncertainty estimates. In this paper,…

February 6, 2025
Gap-Dependent Bounds for Federated $Q$-learning

Gap-Dependent Bounds for Federated $Q$-learning arXiv:2502.02859v1 Announce Type: new Abstract: We present the first gap-dependent analysis of regret and communication cost for on-policy federated $Q$-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing FRL methods focus on worst-case scenarios, leading to $\sqrt{T}$-type regret bounds and communication cost bounds with a $\log T$ term scaling…

February 6, 2025
Uncertainty Quantification with the Empirical Neural Tangent Kernel

Uncertainty Quantification with the Empirical Neural Tangent Kernel arXiv:2502.02870v1 Announce Type: new Abstract: While neural networks have demonstrated impressive performance across various tasks, accurately quantifying uncertainty in their predictions is essential to ensure their trustworthiness and enable widespread adoption in critical systems. Several Bayesian uncertainty quantification (UQ) methods exist that are either cheap or reliable,…

February 6, 2025
Doubly Robust Monte Carlo Tree Search

Doubly Robust Monte Carlo Tree Search arXiv:2502.01672v1 Announce Type: new Abstract: We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS) to enhance sample efficiency and decision quality in complex environments. Our approach introduces a hybrid estimator that combines MCTS…

February 5, 2025
Graph Canonical Correlation Analysis

Graph Canonical Correlation Analysis arXiv:2502.01780v1 Announce Type: new Abstract: Canonical correlation analysis (CCA) is a widely used technique for estimating associations between two sets of multi-dimensional variables. Recent advancements in CCA methods have expanded their application to decipher the interactions of multiomics datasets, imaging-omics datasets, and more. However, conventional CCA methods are limited in their…

February 5, 2025
Poisson Hierarchical Indian Buffet Processes for Within and Across Group Sharing of Latent Features-With Indications for Microbiome Species Sampling Models

Poisson Hierarchical Indian Buffet Processes for Within and Across Group Sharing of Latent Features-With Indications for Microbiome Species Sampling Models arXiv:2502.01919v1 Announce Type: new Abstract: In this work, we present a comprehensive Bayesian posterior analysis of what we term Poisson Hierarchical Indian Buffet Processes, designed for complex random sparse count species sampling models that allow…

February 5, 2025
Local minima of the empirical risk in high dimension: General theorems and convex examples

Local minima of the empirical risk in high dimension: General theorems and convex examples arXiv:2502.01953v1 Announce Type: new Abstract: We consider a general model for high-dimensional empirical risk minimization whereby the data $\mathbf{x}_i$ are $d$-dimensional isotropic Gaussian vectors, the model is parametrized by $\mathbf{\Theta}\in\mathbb{R}^{d\times k}$, and the loss depends on the data via the projection…

February 5, 2025
Theoretical and Practical Analysis of Fréchet Regression via Comparison Geometry

Theoretical and Practical Analysis of Fréchet Regression via Comparison Geometry arXiv:2502.01995v1 Announce Type: new Abstract: Fréchet regression extends classical regression methods to non-Euclidean metric spaces, enabling the analysis of data relationships on complex structures such as manifolds and graphs. This work establishes a rigorous theoretical analysis for Fréchet regression through the lens of comparison geometry…

February 5, 2025
Learning Difference-of-Convex Regularizers for Inverse Problems: A Flexible Framework with Theoretical Guarantees

Learning Difference-of-Convex Regularizers for Inverse Problems: A Flexible Framework with Theoretical Guarantees arXiv:2502.00240v1 Announce Type: new Abstract: Learning effective regularization is crucial for solving ill-posed inverse problems, which arise in a wide range of scientific and engineering applications. While data-driven methods that parameterize regularizers using deep neural networks have demonstrated strong empirical performance, they often…

February 4, 2025
Supervised Quadratic Feature Analysis: An Information Geometry Approach to Dimensionality Reduction

Supervised Quadratic Feature Analysis: An Information Geometry Approach to Dimensionality Reduction arXiv:2502.00168v1 Announce Type: new Abstract: Supervised dimensionality reduction aims to map labeled data to a low-dimensional feature space while maximizing class discriminability. Despite the availability of methods for learning complex non-linear features (e.g. Deep Learning), there is an enduring demand for dimensionality reduction methods…

February 4, 2025
Learning to Fuse Temporal Proximity Networks: A Case Study in Chimpanzee Social Interactions

Learning to Fuse Temporal Proximity Networks: A Case Study in Chimpanzee Social Interactions arXiv:2502.00302v1 Announce Type: new Abstract: How can we identify groups of primate individuals which could be conjectured to drive social structure? To address this question, one of us has collected a time series of data for social interactions between chimpanzees. Here we…

February 4, 2025
Decentralized Inference for Distributed Geospatial Data Using Low-Rank Models

Decentralized Inference for Distributed Geospatial Data Using Low-Rank Models arXiv:2502.00309v1 Announce Type: new Abstract: Advancements in information technology have enabled the creation of massive spatial datasets, driving the need for scalable and efficient computational methodologies. While offering viable solutions, centralized frameworks are limited by vulnerabilities such as single-point failures and communication bottlenecks. This paper presents…

February 4, 2025
Variance Reduction via Resampling and Experience Replay

Variance Reduction via Resampling and Experience Replay arXiv:2502.00520v1 Announce Type: new Abstract: Experience replay is a foundational technique in reinforcement learning that enhances learning stability by storing past experiences in a replay buffer and reusing them during training. Despite its practical success, its theoretical properties remain underexplored. In this paper, we present a theoretical framework…

February 4, 2025
A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization

A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization arXiv:2501.18756v1 Announce Type: new Abstract: Bayesian optimization is a widely used method for optimizing expensive black-box functions, with Expected Improvement being one of the most commonly used acquisition functions. In contrast, information-theoretic acquisition functions aim to reduce uncertainty about the function’s optimum and…

February 3, 2025
Adaptivity and Convergence of Probability Flow ODEs in Diffusion Generative Models

Adaptivity and Convergence of Probability Flow ODEs in Diffusion Generative Models arXiv:2501.18863v1 Announce Type: new Abstract: Score-based generative models, which transform noise into data by learning to reverse a diffusion process, have become a cornerstone of modern generative AI. This paper contributes to establishing theoretical guarantees for the probability flow ODE, a widely used diffusion-based…

February 3, 2025
Trustworthy Evaluation of Generative AI Models

Trustworthy Evaluation of Generative AI Models arXiv:2501.18897v1 Announce Type: new Abstract: Generative AI (GenAI) models have recently achieved remarkable empirical performance in various applications; however, their evaluations still lack uncertainty quantification. In this paper, we propose a method to compare two generative models based on an unbiased estimator of their relative performance gap. Statistically, our…

February 3, 2025
Optimizing Through Change: Bounds and Recommendations for Time-Varying Bayesian Optimization Algorithms

Optimizing Through Change: Bounds and Recommendations for Time-Varying Bayesian Optimization Algorithms arXiv:2501.18963v1 Announce Type: new Abstract: Time-Varying Bayesian Optimization (TVBO) is the go-to framework for optimizing a time-varying, expensive, noisy black-box function. However, most of the solutions proposed so far either rely on unrealistic assumptions on the nature of the objective function or do not…

February 3, 2025
Optimal Transport-based Conformal Prediction

Optimal Transport-based Conformal Prediction arXiv:2501.18991v1 Announce Type: new Abstract: Conformal Prediction (CP) is a principled framework for quantifying uncertainty in blackbox learning models, by constructing prediction sets with finite-sample coverage guarantees. Traditional approaches rely on scalar nonconformity scores, which fail to fully exploit the geometric structure of multivariate outputs, such as in multi-output regression or…

February 3, 2025
Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection

Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection arXiv:2501.17889v1 Announce Type: new Abstract: Variable selection plays a crucial role in enhancing modeling effectiveness across diverse fields, addressing the challenges posed by high-dimensional datasets of correlated variables. This work introduces a novel approach namely Knockoff with over-parameterization (Knoop) to enhance Knockoff filters for variable…

January 31, 2025
Heterogeneous Multi-Player Multi-Armed Bandits Robust To Adversarial Attacks

Heterogeneous Multi-Player Multi-Armed Bandits Robust To Adversarial Attacks arXiv:2501.17882v1 Announce Type: new Abstract: We consider a multi-player multi-armed bandit setting in the presence of adversaries that attempt to negatively affect the rewards received by the players in the system. The reward distributions for any given arm are heterogeneous across the players. In the event of…

January 31, 2025
U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms

U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms arXiv:2501.18084v1 Announce Type: new Abstract: Across various domains, the growing advocacy for open science and open-source machine learning has made an increasing number of models publicly available. These models allow practitioners to integrate them into their own contexts, reducing the need for extensive data labeling, training, and calibration.…

January 31, 2025
Optimal Survey Design for Private Mean Estimation

Optimal Survey Design for Private Mean Estimation arXiv:2501.18121v1 Announce Type: new Abstract: This work identifies the first privacy-aware stratified sampling scheme that minimizes the variance for general private mean estimation under the Laplace, Discrete Laplace (DLap) and Truncated-Uniform-Laplace (TuLap) mechanisms within the framework of differential privacy (DP). We view stratified sampling as a subsampling operation,…

January 31, 2025
Random Feature Representation Boosting

Random Feature Representation Boosting arXiv:2501.18283v1 Announce Type: new Abstract: We introduce Random Feature Representation Boosting (RFRBoost), a novel method for constructing deep residual random feature neural networks (RFNNs) using boosting theory. RFRBoost uses random features at each layer to learn the functional gradient of the network representation, enhancing performance while preserving the convex optimization benefits…

January 31, 2025
Near-Optimal Algorithms for Omniprediction

Near-Optimal Algorithms for Omniprediction arXiv:2501.17205v1 Announce Type: new Abstract: Omnipredictors are simple prediction functions that encode loss-minimizing predictions with respect to a hypothesis class $H$, simultaneously for every loss function within a class of losses $L$. In this work, we give near-optimal learning algorithms for omniprediction, in both the online and offline settings. To begin,…

January 30, 2025
Testing Conditional Mean Independence Using Generative Neural Networks

Testing Conditional Mean Independence Using Generative Neural Networks arXiv:2501.17345v1 Announce Type: new Abstract: Conditional mean independence (CMI) testing is crucial for statistical tasks including model determination and variable importance evaluation. In this work, we introduce a novel population CMI measure and a bootstrap-based testing procedure that utilizes deep generative neural networks to estimate the conditional…

January 30, 2025
A Survey on Cluster-based Federated Learning

A Survey on Cluster-based Federated Learning arXiv:2501.17512v1 Announce Type: new Abstract: As the industrial and commercial use of Federated Learning (FL) has expanded, so has the need for optimized algorithms. In settings where FL clients’ data is not independent and identically distributed (non-IID) and has highly heterogeneous distributions, the baseline FL approach seems to fall short.…

January 30, 2025
Exact characterization of $\epsilon$-Safe Decision Regions for exponential family distributions and Multi Cost SVM approximation

Exact characterization of $\epsilon$-Safe Decision Regions for exponential family distributions and Multi Cost SVM approximation arXiv:2501.17731v1 Announce Type: new Abstract: Probabilistic guarantees on the prediction of data-driven classifiers are necessary to define models that can be considered reliable. This is a key requirement for modern machine learning in which the goodness of a system is…

January 30, 2025
Sequential Learning of the Pareto Front for Multi-objective Bandits

Sequential Learning of the Pareto Front for Multi-objective Bandits arXiv:2501.17513v1 Announce Type: new Abstract: We study the problem of sequential learning of the Pareto front in multi-objective multi-armed bandits. An agent is faced with K possible arms to pull. At each turn she picks one, and receives a vector-valued reward. When she thinks she has…

January 30, 2025
Nonparametric Sparse Online Learning of the Koopman Operator

Nonparametric Sparse Online Learning of the Koopman Operator arXiv:2501.16489v1 Announce Type: new Abstract: The Koopman operator provides a powerful framework for representing the dynamics of general nonlinear dynamical systems. Data-driven techniques to learn the Koopman operator typically assume that the chosen function space is closed under system dynamics. In this paper, we study the Koopman…

January 29, 2025
Variational Schrödinger Momentum Diffusion

Variational Schrödinger Momentum Diffusion arXiv:2501.16675v1 Announce Type: new Abstract: The momentum Schrödinger Bridge (mSB) has emerged as a leading method for accelerating generative diffusion processes and reducing transport costs. However, the lack of simulation-free properties inevitably results in high training costs and affects scalability. To obtain a trade-off between transport properties and scalability, we introduce…

January 29, 2025
Exponential Family Attention

Exponential Family Attention arXiv:2501.16790v1 Announce Type: new Abstract: The self-attention mechanism is the backbone of the transformer neural network underlying most large language models. It can capture complex word patterns and long-range dependencies in natural language. This paper introduces exponential family attention (EFA), a probabilistic generative model that extends self-attention to handle high-dimensional sequence, spatial,…

January 29, 2025
Towards the Generalization of Multi-view Learning: An Information-theoretical Analysis

Towards the Generalization of Multi-view Learning: An Information-theoretical Analysis arXiv:2501.16768v1 Announce Type: new Abstract: Multi-view learning has drawn widespread attention for its efficacy in leveraging cross-view consensus and complementarity information to achieve a comprehensive representation of data. While multi-view learning has undergone vigorous development and achieved remarkable success, the theoretical understanding of its generalization behavior…

January 29, 2025
Marginal and Conditional Importance Measures from Machine Learning Models and Their Relationship with Conditional Average Treatment Effect

Marginal and Conditional Importance Measures from Machine Learning Models and Their Relationship with Conditional Average Treatment Effect arXiv:2501.16988v1 Announce Type: new Abstract: Interpreting black-box machine learning models is challenging due to their strong dependence on data and inherently non-parametric nature. This paper reintroduces the concept of importance through “Marginal Variable Importance Metric” (MVIM), a model-agnostic…

January 29, 2025
ED-Filter: Dynamic Feature Filtering for Eating Disorder Classification

ED-Filter: Dynamic Feature Filtering for Eating Disorder Classification arXiv:2501.14785v1 Announce Type: new Abstract: Eating disorders (ED) are critical psychiatric problems that have alarmed the mental health community. Mental health professionals are increasingly recognizing the utility of data derived from social media platforms such as Twitter. However, high dimensionality and extensive feature sets of Twitter data…

January 28, 2025
Explaining Categorical Feature Interactions Using Graph Covariance and LLMs

Explaining Categorical Feature Interactions Using Graph Covariance and LLMs arXiv:2501.14932v1 Announce Type: new Abstract: Modern datasets often consist of numerous samples with abundant features and associated timestamps. Analyzing such datasets to uncover underlying events typically requires complex statistical methods and substantial domain expertise. A notable example, and the primary data focus of this paper, is…

January 28, 2025
Median of Forests for Robust Density Estimation

Median of Forests for Robust Density Estimation arXiv:2501.15157v1 Announce Type: new Abstract: Robust density estimation refers to the consistent estimation of the density function even when the data is contaminated by outliers. We find that existing forest density estimation at a certain point is inherently resistant to the outliers outside the cells containing the point,…

January 28, 2025
Conformal Inference of Individual Treatment Effects Using Conditional Density Estimates

Conformal Inference of Individual Treatment Effects Using Conditional Density Estimates arXiv:2501.14933v1 Announce Type: new Abstract: In an era where diverse and complex data are increasingly accessible, the precise prediction of individual treatment effects (ITE) becomes crucial across fields such as healthcare, economics, and public policy. Current state-of-the-art approaches, while providing valid prediction intervals through Conformal…

January 28, 2025
A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges

A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges arXiv:2501.15196v1 Announce Type: new Abstract: Time series anomaly detection presents various challenges due to the sequential and dynamic nature of time-dependent data. Traditional unsupervised methods frequently encounter difficulties in generalization, often overfitting to known normal patterns observed during training and…

January 28, 2025
Distributionally Robust Coreset Selection under Covariate Shift

Distributionally Robust Coreset Selection under Covariate Shift arXiv:2501.14253v1 Announce Type: new Abstract: Coreset selection, which involves selecting a small subset from an existing training dataset, is an approach to reducing training data, and various approaches have been proposed for this method. In practical situations where these methods are employed, it is often the case that…

January 27, 2025
EFiGP: Eigen-Fourier Physics-Informed Gaussian Process for Inference of Dynamic Systems

EFiGP: Eigen-Fourier Physics-Informed Gaussian Process for Inference of Dynamic Systems arXiv:2501.14107v1 Announce Type: new Abstract: Parameter estimation and trajectory reconstruction for data-driven dynamical systems governed by ordinary differential equations (ODEs) are essential tasks in fields such as biology, engineering, and physics. These inverse problems — estimating ODE parameters from observational data — are particularly challenging…

January 27, 2025
Statistical Verification of Linear Classifiers

Statistical Verification of Linear Classifiers arXiv:2501.14430v1 Announce Type: new Abstract: We propose a homogeneity test closely related to the concept of linear separability between two samples. Using the test one can answer the question whether a linear classifier is merely “random” or effectively captures differences between two classes. We focus on establishing upper bounds for…

January 27, 2025
coverforest: Conformal Predictions with Random Forest in Python

coverforest: Conformal Predictions with Random Forest in Python arXiv:2501.14570v1 Announce Type: new Abstract: Conformal prediction provides a framework for uncertainty quantification, specifically in the forms of prediction intervals and sets with distribution-free guaranteed coverage. While recent cross-conformal techniques such as CV+ and Jackknife+-after-bootstrap achieve better data efficiency than traditional split conformal methods, they incur substantial…

January 27, 2025
Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization

Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization arXiv:2501.14635v1 Announce Type: new Abstract: The optimal transport barycenter (a.k.a. Wasserstein barycenter) is a fundamental notion of averaging that extends from the Euclidean space to the Wasserstein space of probability distributions. Computation of the unregularized barycenter for discretized probability distributions on point clouds is a challenging task when…

January 27, 2025