Category: stat.ML
-
Towards a perturbation-based explanation for medical AI as differentiable programs
Towards a perturbation-based explanation for medical AI as differentiable programs arXiv:2502.14001v1 Announce Type: new Abstract: Recent advances in machine learning algorithms have reached a point where medical devices can be equipped with artificial intelligence (AI) models for diagnostic support and routine automation in clinical settings. In medicine and healthcare, there is a particular demand for sufficient…
-
New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition
New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition arXiv:2502.14060v1 Announce Type: new Abstract: We study fundamental limits of first-order stochastic optimization in a range of nonconvex settings, including L-smooth functions satisfying Quasar-Convexity (QC), Quadratic Growth (QG), and Restricted Secant Inequalities (RSI). While the convergence properties of standard algorithms are well-understood in deterministic regimes,…
-
Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs
Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs arXiv:2502.14121v1 Announce Type: new Abstract: Designing modern industrial systems requires balancing several competing objectives, such as profitability, resilience, and sustainability, while accounting for complex interactions between technological, economic, and environmental factors. Multi-objective optimization (MOO) methods are commonly used to navigate…
-
Conformal Prediction under Lévy-Prokhorov Distribution Shifts: Robustness to Local and Global Perturbations
Conformal Prediction under Lévy-Prokhorov Distribution Shifts: Robustness to Local and Global Perturbations arXiv:2502.14105v1 Announce Type: new Abstract: Conformal prediction provides a powerful framework for constructing prediction intervals with finite-sample guarantees, yet its robustness under distribution shifts remains a significant challenge. This paper addresses this limitation by modeling distribution shifts using Lévy-Prokhorov (LP) ambiguity sets, which…
-
Prediction-Powered Adaptive Shrinkage Estimation
Prediction-Powered Adaptive Shrinkage Estimation arXiv:2502.14166v1 Announce Type: new Abstract: Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI’s benefits for individual statistical tasks, modern applications require answering numerous parallel statistical questions. We introduce Prediction-Powered Adaptive Shrinkage (PAS),…
-
Model selection for behavioral learning data and applications to contextual bandits
Model selection for behavioral learning data and applications to contextual bandits arXiv:2502.13186v1 Announce Type: new Abstract: Learning for animals or humans is the process that leads to behaviors better adapted to the environment. This process highly depends on the individual that learns and is usually observed only through the individual’s actions. This article presents ways…
-
Task Shift: From Classification to Regression in Overparameterized Linear Models
Task Shift: From Classification to Regression in Overparameterized Linear Models arXiv:2502.13285v1 Announce Type: new Abstract: Modern machine learning methods have recently demonstrated remarkable capability to generalize under task shift, where latent knowledge is transferred to a different, often more difficult, task under a similar data distribution. We investigate this phenomenon in an overparameterized linear regression…
-
An Efficient Permutation-Based Kernel Two-Sample Test
An Efficient Permutation-Based Kernel Two-Sample Test arXiv:2502.13570v1 Announce Type: new Abstract: Two-sample hypothesis testing, determining whether two sets of data are drawn from the same distribution, is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing, maximum mean discrepancy (MMD) has gained popularity as a test statistic due…
-
Identifying metric structures of deep latent variable models
Identifying metric structures of deep latent variable models arXiv:2502.13757v1 Announce Type: new Abstract: Deep latent variable models learn condensed representations of data that, hopefully, reflect the inner workings of the studied phenomena. Unfortunately, these latent representations are not statistically identifiable, meaning they cannot be uniquely determined. Domain experts, therefore, need to tread carefully when interpreting…
-
Graph Signal Inference by Learning Narrowband Spectral Kernels
Graph Signal Inference by Learning Narrowband Spectral Kernels arXiv:2502.13686v1 Announce Type: new Abstract: While a common assumption in graph signal analysis is the smoothness of the signals or the band-limitedness of their spectrum, in many instances the spectrum of real graph data may be concentrated at multiple regions of the spectrum, possibly including mid-to-high-frequency components.…
-
Suboptimal Shapley Value Explanations
Suboptimal Shapley Value Explanations arXiv:2502.12209v1 Announce Type: new Abstract: Deep Neural Networks (DNNs) have demonstrated strong capacity in supporting a wide variety of applications. Shapley value has emerged as a prominent tool to analyze feature importance to help people understand the inference process of deep neural models. Computing Shapley value function requires choosing a baseline…
-
The Majority Vote Paradigm Shift: When Popular Meets Optimal
The Majority Vote Paradigm Shift: When Popular Meets Optimal arXiv:2502.12581v1 Announce Type: new Abstract: Reliably labelling data typically requires annotations from multiple human workers. However, humans are far from being perfect. Hence, it is a common practice to aggregate labels gathered from multiple annotators to make a more confident estimate of the true label. Among…
-
Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation
Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation arXiv:2502.12607v1 Announce Type: new Abstract: We propose Duality Gap KIP (DGKIP), an extension of the Kernel Inducing Points (KIP) method for dataset distillation. While existing dataset distillation methods often rely on bi-level optimization, DGKIP eliminates the need for such optimization by leveraging duality theory in…
-
Green LIME: Improving AI Explainability through Design of Experiments
Green LIME: Improving AI Explainability through Design of Experiments arXiv:2502.12753v1 Announce Type: new Abstract: In artificial intelligence (AI), the complexity of many models and processes often surpasses human interpretability, making it challenging to understand why a specific prediction is made. This lack of transparency is particularly problematic in critical fields like healthcare, where trust in…
-
Federated Variational Inference for Bayesian Mixture Models
Federated Variational Inference for Bayesian Mixture Models arXiv:2502.12684v1 Announce Type: new Abstract: We present a federated learning approach for Bayesian model-based clustering of large-scale binary and categorical datasets. We introduce a principled ‘divide and conquer’ inference procedure using variational inference with local merge and delete moves within batches of the data in parallel, followed by…
-
Forecasting time series with constraints
Forecasting time series with constraints arXiv:2502.10485v1 Announce Type: new Abstract: Time series forecasting presents unique challenges that limit the effectiveness of traditional machine learning algorithms. To address these limitations, various approaches have incorporated linear constraints into learning algorithms, such as generalized additive models and hierarchical forecasting. In this paper, we propose a unified framework for…
-
Weighted quantization using MMD: From mean field to mean shift via gradient flows
Weighted quantization using MMD: From mean field to mean shift via gradient flows arXiv:2502.10600v1 Announce Type: new Abstract: Approximating a probability distribution using a set of particles is a fundamental problem in machine learning and statistics, with applications including clustering and quantization. Formally, we seek a finite weighted mixture of Dirac measures that best approximates…
-
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm arXiv:2502.10650v1 Announce Type: new Abstract: Advances in deep learning and representation learning have transformed item factor analysis (IFA) in the item response theory (IRT) literature by enabling more efficient and accurate parameter estimation. Variational Autoencoders (VAEs) have been one of the most…
-
Batch-Adaptive Annotations for Causal Inference with Complex-Embedded Outcomes
Batch-Adaptive Annotations for Causal Inference with Complex-Embedded Outcomes arXiv:2502.10605v1 Announce Type: new Abstract: Estimating the causal effects of an intervention on outcomes is crucial. But often in domains such as healthcare and social services, this critical information about outcomes is documented by unstructured text, e.g. clinical notes in healthcare or case notes in social services.…
-
Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training
Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training arXiv:2502.10793v1 Announce Type: new Abstract: Existing methods for measuring training sample influence on models only provide static, overall measurements, overlooking how sample influence changes during training. We propose Dynamic Influence Tracker (DIT), which captures the time-varying sample influence across arbitrary time windows during training. DIT offers…
-
Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs
Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs arXiv:2502.09832v1 Announce Type: new Abstract: In this paper, assuming a natural strengthening of the low-degree conjecture, we provide evidence of computational hardness for two problems: (1) the (partial) matching recovery problem in the sparse correlated Erdős-Rényi graphs $\mathcal{G}(n,q;\rho)$ when the edge-density $q=n^{-1+o(1)}$ and…
-
On Volume Minimization in Conformal Regression
On Volume Minimization in Conformal Regression arXiv:2502.09985v1 Announce Type: new Abstract: We study the question of volume optimality in split conformal regression, a topic still poorly understood in comparison to coverage control. Using the fact that the calibration step can be seen as an empirical volume minimization problem, we first derive a finite-sample upper-bound on…
-
Estimation of the Learning Coefficient Using Empirical Loss
Estimation of the Learning Coefficient Using Empirical Loss arXiv:2502.09998v1 Announce Type: new Abstract: The learning coefficient plays a crucial role in analyzing the performance of information criteria, such as the Widely Applicable Information Criterion (WAIC) and the Widely Applicable Bayesian Information Criterion (WBIC), which Sumio Watanabe developed to assess model generalization ability. In regular statistical…
-
Improved Online Confidence Bounds for Multinomial Logistic Bandits
Improved Online Confidence Bounds for Multinomial Logistic Bandits arXiv:2502.10020v1 Announce Type: new Abstract: In this paper, we propose an improved online confidence bound for multinomial logistic (MNL) models and apply this result to MNL bandits, achieving variance-dependent optimal regret. Recently, Lee & Oh (2024) established an online confidence bound for MNL models and achieved nearly…
-
Combinatorial Reinforcement Learning with Preference Feedback
Combinatorial Reinforcement Learning with Preference Feedback arXiv:2502.10158v1 Announce Type: new Abstract: In this paper, we consider combinatorial reinforcement learning with preference feedback, where a learning agent sequentially offers an action (an assortment of multiple items) to a user, whose preference feedback follows a multinomial logistic (MNL) model. This framework allows us to model real-world scenarios, particularly those…
-
A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection
A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection arXiv:2502.08695v1 Announce Type: new Abstract: Bayesian nonparametric methods are naturally suited to the problem of out-of-distribution (OOD) detection. However, these techniques have largely been eschewed in favor of simpler methods based on distances between pre-trained or learned embeddings of data points. Here we…
-
Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition
Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition arXiv:2502.09047v1 Announce Type: new Abstract: A common pursuit in modern statistical learning is to attain satisfactory generalization out of the source data distribution (OOD). In theory, the challenge remains unsolved even under the canonical setting of covariate shift for the linear model.…
-
Off-Policy Evaluation for Recommendations with Missing-Not-At-Random Rewards
Off-Policy Evaluation for Recommendations with Missing-Not-At-Random Rewards arXiv:2502.08993v1 Announce Type: new Abstract: Unbiased recommender learning (URL) and off-policy evaluation/learning (OPE/L) techniques are effective in addressing the data bias caused by display position and logging policies, thereby consistently improving the performance of recommendations. However, when both biases exist in the logged data, these estimators may suffer…
-
Non-asymptotic Analysis of Diffusion Annealed Langevin Monte Carlo for Generative Modelling
Non-asymptotic Analysis of Diffusion Annealed Langevin Monte Carlo for Generative Modelling arXiv:2502.09306v1 Announce Type: new Abstract: We investigate the theoretical properties of general diffusion (interpolation) paths and their Langevin Monte Carlo implementation, referred to as diffusion annealed Langevin Monte Carlo (DALMC), under weak conditions on the data distribution. Specifically, we analyse and provide non-asymptotic error…
-
A Differentiable Rank-Based Objective For Better Feature Learning
A Differentiable Rank-Based Objective For Better Feature Learning arXiv:2502.09445v1 Announce Type: new Abstract: In this paper, we leverage existing statistical methods to better understand feature learning from data. We tackle this by modifying the model-free variable selection method, Feature Ordering by Conditional Independence (FOCI), which is introduced in \cite{azadkia2021simple}. While FOCI is based on a…
-
SNAP: Sequential Non-Ancestor Pruning for Targeted Causal Effect Estimation With an Unknown Graph
SNAP: Sequential Non-Ancestor Pruning for Targeted Causal Effect Estimation With an Unknown Graph arXiv:2502.07857v1 Announce Type: new Abstract: Causal discovery can be computationally demanding for large numbers of variables. If we only wish to estimate the causal effects on a small subset of target variables, we might not need to learn the causal graph for…
-
Discrete Markov Probabilistic Models
Discrete Markov Probabilistic Models arXiv:2502.07939v1 Announce Type: new Abstract: This paper introduces the Discrete Markov Probabilistic Model (DMPM), a novel algorithm for discrete data generation. The algorithm operates in the space of bits $\{0,1\}^d$, where the noising process is a continuous-time Markov chain that can be sampled exactly via a Poissonian clock that flips labels…
-
The Observational Partial Order of Causal Structures with Latent Variables
The Observational Partial Order of Causal Structures with Latent Variables arXiv:2502.07891v1 Announce Type: new Abstract: For two causal structures with the same set of visible variables, one is said to observationally dominate the other if the set of distributions over the visible variables realizable by the first contains the set of distributions over the visible…
-
Optimizing Likelihoods via Mutual Information: Bridging Simulation-Based Inference and Bayesian Optimal Experimental Design
Optimizing Likelihoods via Mutual Information: Bridging Simulation-Based Inference and Bayesian Optimal Experimental Design arXiv:2502.08004v1 Announce Type: new Abstract: Simulation-based inference (SBI) is a method to perform inference on a variety of complex scientific models with challenging inference (inverse) problems. Bayesian Optimal Experimental Design (BOED) aims to efficiently use experimental resources to make better inferences. Various…
-
Multi-View Oriented GPLVM: Expressiveness and Efficiency
Multi-View Oriented GPLVM: Expressiveness and Efficiency arXiv:2502.08253v1 Announce Type: new Abstract: The multi-view Gaussian process latent variable model (MV-GPLVM) aims to learn a unified representation from multi-view data but is hindered by challenges such as limited kernel expressiveness and low computational efficiency. To overcome these issues, we first introduce a new duality between the spectral…
-
Confidence Intervals for Evaluation of Data Mining
Confidence Intervals for Evaluation of Data Mining arXiv:2502.07016v1 Announce Type: new Abstract: In data mining, when binary prediction rules are used to predict a binary outcome, many performance measures are used in a vast array of literature for the purposes of evaluation and comparison. Some examples include classification accuracy, precision, recall, F measures, and Jaccard…
-
Epistemic Uncertainty in Conformal Scores: A Unified Approach
Epistemic Uncertainty in Conformal Scores: A Unified Approach arXiv:2502.06995v1 Announce Type: new Abstract: Conformal prediction methods create prediction bands with distribution-free guarantees but do not explicitly capture epistemic uncertainty, which can lead to overconfident predictions in data-sparse regions. Although recent conformal scores have been developed to address this limitation, they are typically designed for specific…
-
Generative Distribution Prediction: A Unified Approach to Multimodal Learning
Generative Distribution Prediction: A Unified Approach to Multimodal Learning arXiv:2502.07090v1 Announce Type: new Abstract: Accurate prediction with multimodal data, encompassing tabular, textual, and visual inputs or outputs, is fundamental to advancing analytics in diverse application domains. Traditional approaches often struggle to integrate heterogeneous data types while maintaining high predictive accuracy. We introduce Generative Distribution Prediction (GDP), a…
-
Online Covariance Matrix Estimation in Sketched Newton Methods
Online Covariance Matrix Estimation in Sketched Newton Methods arXiv:2502.07114v1 Announce Type: new Abstract: Given the ubiquity of streaming data, online algorithms have been widely used for parameter estimation, with second-order methods particularly standing out for their efficiency and robustness. In this paper, we study an online sketched Newton method that leverages a randomized sketching technique…
-
Riemannian Proximal Sampler for High-accuracy Sampling on Manifolds
Riemannian Proximal Sampler for High-accuracy Sampling on Manifolds arXiv:2502.07265v1 Announce Type: new Abstract: We introduce the Riemannian Proximal Sampler, a method for sampling from densities defined on Riemannian manifolds. The performance of this sampler critically depends on two key oracles: the Manifold Brownian Increments (MBI) oracle and the Riemannian Heat-kernel (RHK) oracle. We establish high-accuracy…
-
Online Covariance Estimation in Nonsmooth Stochastic Approximation
Online Covariance Estimation in Nonsmooth Stochastic Approximation arXiv:2502.05305v1 Announce Type: new Abstract: We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting covariance matrix in the local minimax sense of Hájek and Le Cam.…
-
On the Convergence and Stability of Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning, and Online Decision Transformers
On the Convergence and Stability of Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning, and Online Decision Transformers arXiv:2502.05672v1 Announce Type: new Abstract: This article provides a rigorous analysis of convergence and stability of Episodic Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning and Online Decision Transformers. These algorithms performed competitively across various benchmarks, from games to robotic tasks,…
-
dynoGP: Deep Gaussian Processes for dynamic system identification
dynoGP: Deep Gaussian Processes for dynamic system identification arXiv:2502.05620v1 Announce Type: new Abstract: In this work, we present a novel approach to system identification for dynamical systems, based on a specific class of Deep Gaussian Processes (Deep GPs). These models are constructed by interconnecting linear dynamic GPs (equivalent to stochastic linear time-invariant dynamical systems) and…
-
Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction
Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction arXiv:2502.05676v1 Announce Type: new Abstract: Ensuring model calibration is critical for reliable predictions, yet popular distribution-free methods, such as histogram binning and isotonic regression, provide only asymptotic guarantees. We introduce a unified framework for Venn and Venn-Abers calibration, generalizing Vovk’s binary classification approach to arbitrary…
-
TD(0) Learning converges for Polynomial mixing and non-linear functions
TD(0) Learning converges for Polynomial mixing and non-linear functions arXiv:2502.05706v1 Announce Type: new Abstract: Theoretical work on Temporal Difference (TD) learning has provided finite-sample and high-probability guarantees for data generated from Markov chains. However, these bounds typically require linear function approximation, instance-dependent step sizes, algorithmic modifications, and restrictive mixing rates. We present theoretical findings for…
-
Sparsity-Based Interpolation of External, Internal and Swap Regret
Sparsity-Based Interpolation of External, Internal and Swap Regret arXiv:2502.04543v1 Announce Type: new Abstract: Focusing on the expert problem in online learning, this paper studies the interpolation of several performance metrics via $\phi$-regret minimization, which measures the performance of an algorithm by its regret with respect to an arbitrary action modification rule $\phi$. With $d$ experts…
-
Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect
Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect arXiv:2502.04673v1 Announce Type: new Abstract: Estimation and inference for the Average Treatment Effect (ATE) is a cornerstone of causal inference and often serves as the foundation for developing procedures for more complicated settings. Although traditionally analyzed in a batch setting, recent advances in martingale theory…
-
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond arXiv:2502.04575v1 Announce Type: new Abstract: Given an unnormalized probability density $\pi\propto\mathrm{e}^{-V}$, estimating its normalizing constant $Z=\int_{\mathbb{R}^d}\mathrm{e}^{-V(x)}\mathrm{d}x$ or free energy $F=-\log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. It is challenging especially in high dimensions…
-
PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders
PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders arXiv:2502.04730v1 Announce Type: new Abstract: Learning informative representations of phylogenetic tree structures is essential for analyzing evolutionary relationships. Classical distance-based methods have been widely used to project phylogenetic trees into Euclidean space, but they are often sensitive to the choice of distance metric and may lack…
-
Two in context learning tasks with complex functions
Two in context learning tasks with complex functions arXiv:2502.03503v1 Announce Type: new Abstract: We examine two in context learning (ICL) tasks with mathematical functions in several train and test settings for transformer models. Our study generalizes work on linear functions by showing that small transformers, even models with attention layers only, can approximate arbitrary polynomial…
-
Multivariate Conformal Prediction using Optimal Transport
Multivariate Conformal Prediction using Optimal Transport arXiv:2502.03609v1 Announce Type: new Abstract: Conformal prediction (CP) quantifies the uncertainty of machine learning models by constructing sets of plausible outputs. These sets are constructed by leveraging a so-called conformity score, a quantity computed using the input point of interest, a prediction model, and past observations. CP sets are…
-
Online Learning Algorithms in Hilbert Spaces with $\beta-$ and $\phi-$Mixing Sequences
Online Learning Algorithms in Hilbert Spaces with $\beta-$ and $\phi-$Mixing Sequences arXiv:2502.03551v1 Announce Type: new Abstract: In this paper, we study an online algorithm in a reproducing kernel Hilbert space (RKHS) based on a class of dependent processes, called the mixing process. For such a process, the degree of dependence is measured by various mixing…
-
Rule-based Evolving Fuzzy System for Time Series Forecasting: New Perspectives Based on Type-2 Fuzzy Sets Measures Approach
Rule-based Evolving Fuzzy System for Time Series Forecasting: New Perspectives Based on Type-2 Fuzzy Sets Measures Approach arXiv:2502.03650v1 Announce Type: new Abstract: Real-world data contain uncertainty and variations that can be correlated to external variables, known as randomness. An alternative cause of randomness is chaos, which can be an important component of chaotic time series.…
-
Guiding Two-Layer Neural Network Lipschitzness via Gradient Descent Learning Rate Constraints
Guiding Two-Layer Neural Network Lipschitzness via Gradient Descent Learning Rate Constraints arXiv:2502.03792v1 Announce Type: new Abstract: We demonstrate that applying an eventual decay to the learning rate (LR) in empirical risk minimization (ERM), where the mean-squared-error loss is minimized using standard gradient descent (GD) for training a two-layer neural network with Lipschitz activation functions, ensures…
-
Networks with Finite VC Dimension: Pro and Contra
Networks with Finite VC Dimension: Pro and Contra arXiv:2502.02679v1 Announce Type: new Abstract: Approximation and learning of classifiers of large data sets by neural networks in terms of high-dimensional geometry and statistical learning theory are investigated. The influence of the VC dimension of sets of input-output functions of networks on approximation capabilities is compared with…
-
Achievable distributional robustness when the robust risk is only partially identified
Achievable distributional robustness when the robust risk is only partially identified arXiv:2502.02710v1 Announce Type: new Abstract: In safety-critical applications, machine learning models should generalize well under worst-case distribution shifts, that is, have a small robust risk. Invariance-based algorithms can provably take advantage of structural assumptions on the shifts when the training distributions are heterogeneous enough…
-
Algorithms with Calibrated Machine Learning Predictions
Algorithms with Calibrated Machine Learning Predictions arXiv:2502.02861v1 Announce Type: new Abstract: The field of algorithms with predictions incorporates machine learning advice in the design of online algorithms to improve real-world performance. While this theoretical framework often assumes uniform reliability across all predictions, modern machine learning models can now provide instance-level uncertainty estimates. In this paper,…
-
Gap-Dependent Bounds for Federated $Q$-learning
Gap-Dependent Bounds for Federated $Q$-learning arXiv:2502.02859v1 Announce Type: new Abstract: We present the first gap-dependent analysis of regret and communication cost for on-policy federated $Q$-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing FRL methods focus on worst-case scenarios, leading to $\sqrt{T}$-type regret bounds and communication cost bounds with a $\log T$ term scaling…
-
Uncertainty Quantification with the Empirical Neural Tangent Kernel
Uncertainty Quantification with the Empirical Neural Tangent Kernel arXiv:2502.02870v1 Announce Type: new Abstract: While neural networks have demonstrated impressive performance across various tasks, accurately quantifying uncertainty in their predictions is essential to ensure their trustworthiness and enable widespread adoption in critical systems. Several Bayesian uncertainty quantification (UQ) methods exist that are either cheap or reliable,…
-
Doubly Robust Monte Carlo Tree Search
Doubly Robust Monte Carlo Tree Search arXiv:2502.01672v1 Announce Type: new Abstract: We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS) to enhance sample efficiency and decision quality in complex environments. Our approach introduces a hybrid estimator that combines MCTS…
-
Graph Canonical Correlation Analysis
Graph Canonical Correlation Analysis arXiv:2502.01780v1 Announce Type: new Abstract: Canonical correlation analysis (CCA) is a widely used technique for estimating associations between two sets of multi-dimensional variables. Recent advancements in CCA methods have expanded their application to decipher the interactions of multiomics datasets, imaging-omics datasets, and more. However, conventional CCA methods are limited in their…
-
Poisson Hierarchical Indian Buffet Processes for Within and Across Group Sharing of Latent Features-With Indications for Microbiome Species Sampling Models
Poisson Hierarchical Indian Buffet Processes for Within and Across Group Sharing of Latent Features-With Indications for Microbiome Species Sampling Models arXiv:2502.01919v1 Announce Type: new Abstract: In this work, we present a comprehensive Bayesian posterior analysis of what we term Poisson Hierarchical Indian Buffet Processes, designed for complex random sparse count species sampling models that allow…
-
Local minima of the empirical risk in high dimension: General theorems and convex examples
Local minima of the empirical risk in high dimension: General theorems and convex examples arXiv:2502.01953v1 Announce Type: new Abstract: We consider a general model for high-dimensional empirical risk minimization whereby the data $\mathbf{x}_i$ are $d$-dimensional isotropic Gaussian vectors, the model is parametrized by $\mathbf{\Theta}\in\mathbb{R}^{d\times k}$, and the loss depends on the data via the projection…
-
Theoretical and Practical Analysis of Fréchet Regression via Comparison Geometry
Theoretical and Practical Analysis of Fréchet Regression via Comparison Geometry arXiv:2502.01995v1 Announce Type: new Abstract: Fréchet regression extends classical regression methods to non-Euclidean metric spaces, enabling the analysis of data relationships on complex structures such as manifolds and graphs. This work establishes a rigorous theoretical analysis for Fréchet regression through the lens of comparison geometry…
-
Learning Difference-of-Convex Regularizers for Inverse Problems: A Flexible Framework with Theoretical Guarantees
Learning Difference-of-Convex Regularizers for Inverse Problems: A Flexible Framework with Theoretical Guarantees arXiv:2502.00240v1 Announce Type: new Abstract: Learning effective regularization is crucial for solving ill-posed inverse problems, which arise in a wide range of scientific and engineering applications. While data-driven methods that parameterize regularizers using deep neural networks have demonstrated strong empirical performance, they often…
-
Supervised Quadratic Feature Analysis: An Information Geometry Approach to Dimensionality Reduction
Supervised Quadratic Feature Analysis: An Information Geometry Approach to Dimensionality Reduction arXiv:2502.00168v1 Announce Type: new Abstract: Supervised dimensionality reduction aims to map labeled data to a low-dimensional feature space while maximizing class discriminability. Despite the availability of methods for learning complex non-linear features (e.g. Deep Learning), there is an enduring demand for dimensionality reduction methods…
-
Decentralized Inference for Distributed Geospatial Data Using Low-Rank Models
Decentralized Inference for Distributed Geospatial Data Using Low-Rank Models arXiv:2502.00309v1 Announce Type: new Abstract: Advancements in information technology have enabled the creation of massive spatial datasets, driving the need for scalable and efficient computational methodologies. While offering viable solutions, centralized frameworks are limited by vulnerabilities such as single-point failures and communication bottlenecks. This paper presents…
-
Variance Reduction via Resampling and Experience Replay
Variance Reduction via Resampling and Experience Replay arXiv:2502.00520v1 Announce Type: new Abstract: Experience replay is a foundational technique in reinforcement learning that enhances learning stability by storing past experiences in a replay buffer and reusing them during training. Despite its practical success, its theoretical properties remain underexplored. In this paper, we present a theoretical framework…
-
Adaptivity and Convergence of Probability Flow ODEs in Diffusion Generative Models
Adaptivity and Convergence of Probability Flow ODEs in Diffusion Generative Models arXiv:2501.18863v1 Announce Type: new Abstract: Score-based generative models, which transform noise into data by learning to reverse a diffusion process, have become a cornerstone of modern generative AI. This paper contributes to establishing theoretical guarantees for the probability flow ODE, a widely used diffusion-based…
-
A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization
A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization arXiv:2501.18756v1 Announce Type: new Abstract: Bayesian optimization is a widely used method for optimizing expensive black-box functions, with Expected Improvement being one of the most commonly used acquisition functions. In contrast, information-theoretic acquisition functions aim to reduce uncertainty about the function’s optimum and…
-
Trustworthy Evaluation of Generative AI Models
Trustworthy Evaluation of Generative AI Models arXiv:2501.18897v1 Announce Type: new Abstract: Generative AI (GenAI) models have recently achieved remarkable empirical performance in various applications; however, their evaluations still lack uncertainty quantification. In this paper, we propose a method to compare two generative models based on an unbiased estimator of their relative performance gap. Statistically, our…
-
Optimizing Through Change: Bounds and Recommendations for Time-Varying Bayesian Optimization Algorithms
Optimizing Through Change: Bounds and Recommendations for Time-Varying Bayesian Optimization Algorithms arXiv:2501.18963v1 Announce Type: new Abstract: Time-Varying Bayesian Optimization (TVBO) is the go-to framework for optimizing a time-varying, expensive, noisy black-box function. However, most of the solutions proposed so far either rely on unrealistic assumptions on the nature of the objective function or do not…
-
Optimal Transport-based Conformal Prediction
Optimal Transport-based Conformal Prediction arXiv:2501.18991v1 Announce Type: new Abstract: Conformal Prediction (CP) is a principled framework for quantifying uncertainty in blackbox learning models, by constructing prediction sets with finite-sample coverage guarantees. Traditional approaches rely on scalar nonconformity scores, which fail to fully exploit the geometric structure of multivariate outputs, such as in multi-output regression or…
-
Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection
Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection arXiv:2501.17889v1 Announce Type: new Abstract: Variable selection plays a crucial role in enhancing modeling effectiveness across diverse fields, addressing the challenges posed by high-dimensional datasets of correlated variables. This work introduces a novel approach namely Knockoff with over-parameterization (Knoop) to enhance Knockoff filters for variable…
-
Heterogeneous Multi-Player Multi-Armed Bandits Robust To Adversarial Attacks
Heterogeneous Multi-Player Multi-Armed Bandits Robust To Adversarial Attacks arXiv:2501.17882v1 Announce Type: new Abstract: We consider a multi-player multi-armed bandit setting in the presence of adversaries that attempt to negatively affect the rewards received by the players in the system. The reward distributions for any given arm are heterogeneous across the players. In the event of…
-
U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms
U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms arXiv:2501.18084v1 Announce Type: new Abstract: Across various domains, the growing advocacy for open science and open-source machine learning has made an increasing number of models publicly available. These models allow practitioners to integrate them into their own contexts, reducing the need for extensive data labeling, training, and calibration.…
-
Optimal Survey Design for Private Mean Estimation
Optimal Survey Design for Private Mean Estimation arXiv:2501.18121v1 Announce Type: new Abstract: This work identifies the first privacy-aware stratified sampling scheme that minimizes the variance for general private mean estimation under the Laplace, Discrete Laplace (DLap) and Truncated-Uniform-Laplace (TuLap) mechanisms within the framework of differential privacy (DP). We view stratified sampling as a subsampling operation,…
-
Random Feature Representation Boosting
Random Feature Representation Boosting arXiv:2501.18283v1 Announce Type: new Abstract: We introduce Random Feature Representation Boosting (RFRBoost), a novel method for constructing deep residual random feature neural networks (RFNNs) using boosting theory. RFRBoost uses random features at each layer to learn the functional gradient of the network representation, enhancing performance while preserving the convex optimization benefits…
-
Near-Optimal Algorithms for Omniprediction
Near-Optimal Algorithms for Omniprediction arXiv:2501.17205v1 Announce Type: new Abstract: Omnipredictors are simple prediction functions that encode loss-minimizing predictions with respect to a hypothesis class $H$, simultaneously for every loss function within a class of losses $L$. In this work, we give near-optimal learning algorithms for omniprediction, in both the online and offline settings. To begin,…
-
Testing Conditional Mean Independence Using Generative Neural Networks
Testing Conditional Mean Independence Using Generative Neural Networks arXiv:2501.17345v1 Announce Type: new Abstract: Conditional mean independence (CMI) testing is crucial for statistical tasks including model determination and variable importance evaluation. In this work, we introduce a novel population CMI measure and a bootstrap-based testing procedure that utilizes deep generative neural networks to estimate the conditional…
-
A Survey on Cluster-based Federated Learning
A Survey on Cluster-based Federated Learning arXiv:2501.17512v1 Announce Type: new Abstract: As the industrial and commercial use of Federated Learning (FL) has expanded, so has the need for optimized algorithms. In settings where FL clients’ data is non-independently and identically distributed (non-IID) and with highly heterogeneous distributions, the baseline FL approach seems to fall short.…
-
Exact characterization of $\epsilon$-Safe Decision Regions for exponential family distributions and Multi Cost SVM approximation
Exact characterization of $\epsilon$-Safe Decision Regions for exponential family distributions and Multi Cost SVM approximation arXiv:2501.17731v1 Announce Type: new Abstract: Probabilistic guarantees on the prediction of data-driven classifiers are necessary to define models that can be considered reliable. This is a key requirement for modern machine learning in which the goodness of a system is…
-
Sequential Learning of the Pareto Front for Multi-objective Bandits
Sequential Learning of the Pareto Front for Multi-objective Bandits arXiv:2501.17513v1 Announce Type: new Abstract: We study the problem of sequential learning of the Pareto front in multi-objective multi-armed bandits. An agent is faced with K possible arms to pull. At each turn she picks one, and receives a vector-valued reward. When she thinks she has…
-
Nonparametric Sparse Online Learning of the Koopman Operator
Nonparametric Sparse Online Learning of the Koopman Operator arXiv:2501.16489v1 Announce Type: new Abstract: The Koopman operator provides a powerful framework for representing the dynamics of general nonlinear dynamical systems. Data-driven techniques to learn the Koopman operator typically assume that the chosen function space is closed under system dynamics. In this paper, we study the Koopman…
-
Variational Schrödinger Momentum Diffusion
Variational Schrödinger Momentum Diffusion arXiv:2501.16675v1 Announce Type: new Abstract: The momentum Schrödinger Bridge (mSB) has emerged as a leading method for accelerating generative diffusion processes and reducing transport costs. However, the lack of simulation-free properties inevitably results in high training costs and affects scalability. To obtain a trade-off between transport properties and scalability, we introduce…
-
Exponential Family Attention
Exponential Family Attention arXiv:2501.16790v1 Announce Type: new Abstract: The self-attention mechanism is the backbone of the transformer neural network underlying most large language models. It can capture complex word patterns and long-range dependencies in natural language. This paper introduces exponential family attention (EFA), a probabilistic generative model that extends self-attention to handle high-dimensional sequence, spatial,…
-
Towards the Generalization of Multi-view Learning: An Information-theoretical Analysis
Towards the Generalization of Multi-view Learning: An Information-theoretical Analysis arXiv:2501.16768v1 Announce Type: new Abstract: Multiview learning has drawn widespread attention for its efficacy in leveraging cross-view consensus and complementarity information to achieve a comprehensive representation of data. While multi-view learning has undergone vigorous development and achieved remarkable success, the theoretical understanding of its generalization behavior…
-
Marginal and Conditional Importance Measures from Machine Learning Models and Their Relationship with Conditional Average Treatment Effect
Marginal and Conditional Importance Measures from Machine Learning Models and Their Relationship with Conditional Average Treatment Effect arXiv:2501.16988v1 Announce Type: new Abstract: Interpreting black-box machine learning models is challenging due to their strong dependence on data and inherently non-parametric nature. This paper reintroduces the concept of importance through “Marginal Variable Importance Metric” (MVIM), a model-agnostic…
-
ED-Filter: Dynamic Feature Filtering for Eating Disorder Classification
ED-Filter: Dynamic Feature Filtering for Eating Disorder Classification arXiv:2501.14785v1 Announce Type: new Abstract: Eating disorders (ED) are critical psychiatric problems that have alarmed the mental health community. Mental health professionals are increasingly recognizing the utility of data derived from social media platforms such as Twitter. However, high dimensionality and extensive feature sets of Twitter data…
-
Explaining Categorical Feature Interactions Using Graph Covariance and LLMs
Explaining Categorical Feature Interactions Using Graph Covariance and LLMs arXiv:2501.14932v1 Announce Type: new Abstract: Modern datasets often consist of numerous samples with abundant features and associated timestamps. Analyzing such datasets to uncover underlying events typically requires complex statistical methods and substantial domain expertise. A notable example, and the primary data focus of this paper, is…
-
Median of Forests for Robust Density Estimation
Median of Forests for Robust Density Estimation arXiv:2501.15157v1 Announce Type: new Abstract: Robust density estimation refers to the consistent estimation of the density function even when the data is contaminated by outliers. We find that existing forest density estimation at a certain point is inherently resistant to the outliers outside the cells containing the point,…
-
Conformal Inference of Individual Treatment Effects Using Conditional Density Estimates
Conformal Inference of Individual Treatment Effects Using Conditional Density Estimates arXiv:2501.14933v1 Announce Type: new Abstract: In an era where diverse and complex data are increasingly accessible, the precise prediction of individual treatment effects (ITE) becomes crucial across fields such as healthcare, economics, and public policy. Current state-of-the-art approaches, while providing valid prediction intervals through Conformal…
-
A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges
A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges arXiv:2501.15196v1 Announce Type: new Abstract: Time series anomaly detection presents various challenges due to the sequential and dynamic nature of time-dependent data. Traditional unsupervised methods frequently encounter difficulties in generalization, often overfitting to known normal patterns observed during training and…
-
Distributionally Robust Coreset Selection under Covariate Shift
Distributionally Robust Coreset Selection under Covariate Shift arXiv:2501.14253v1 Announce Type: new Abstract: Coreset selection, which involves selecting a small subset from an existing training dataset, is an approach to reducing training data, and various approaches have been proposed for this method. In practical situations where these methods are employed, it is often the case that…
-
EFiGP: Eigen-Fourier Physics-Informed Gaussian Process for Inference of Dynamic Systems
EFiGP: Eigen-Fourier Physics-Informed Gaussian Process for Inference of Dynamic Systems arXiv:2501.14107v1 Announce Type: new Abstract: Parameter estimation and trajectory reconstruction for data-driven dynamical systems governed by ordinary differential equations (ODEs) are essential tasks in fields such as biology, engineering, and physics. These inverse problems — estimating ODE parameters from observational data — are particularly challenging…
-
Statistical Verification of Linear Classifiers
Statistical Verification of Linear Classifiers arXiv:2501.14430v1 Announce Type: new Abstract: We propose a homogeneity test closely related to the concept of linear separability between two samples. Using the test one can answer the question whether a linear classifier is merely “random” or effectively captures differences between two classes. We focus on establishing upper bounds for…
-
coverforest: Conformal Predictions with Random Forest in Python
coverforest: Conformal Predictions with Random Forest in Python arXiv:2501.14570v1 Announce Type: new Abstract: Conformal prediction provides a framework for uncertainty quantification, specifically in the forms of prediction intervals and sets with distribution-free guaranteed coverage. While recent cross-conformal techniques such as CV+ and Jackknife+-after-bootstrap achieve better data efficiency than traditional split conformal methods, they incur substantial…