Category: cs.LG
-
On the Wasserstein Geodesic Principal Component Analysis of probability measures
On the Wasserstein Geodesic Principal Component Analysis of probability measures arXiv:2506.04480v1 Announce Type: new Abstract: This paper focuses on Geodesic Principal Component Analysis (GPCA) on a collection of probability distributions using the Otto-Wasserstein geometry. The goal is to identify geodesic curves in the space of probability measures that best capture the modes of variation of…
-
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning arXiv:2506.04626v1 Announce Type: new Abstract: Motivated by real-world settings where data collection and policy deployment — whether for a single agent or across multiple agents — are costly, we study the problem of on-policy single-agent reinforcement learning (RL) and federated RL (FRL) with a…
-
Distributional encoding for Gaussian process regression with qualitative inputs
Distributional encoding for Gaussian process regression with qualitative inputs arXiv:2506.04813v1 Announce Type: new Abstract: Gaussian Process (GP) regression is a popular and sample-efficient approach for many engineering applications, where observations are expensive to acquire, and is also a central ingredient of Bayesian optimization (BO), a highly prevailing method for the optimization of black-box functions. However,…
-
Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models
Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models arXiv:2506.04945v1 Announce Type: new Abstract: Estimating causal effects of joint interventions on multiple variables is crucial in many domains, but obtaining data from such simultaneous interventions can be challenging. Our study explores how to learn joint interventional effects using only observational data and single-variable interventions.…
-
Nonlinear Causal Discovery for Grouped Data
Nonlinear Causal Discovery for Grouped Data arXiv:2506.05120v1 Announce Type: new Abstract: Inferring cause-effect relationships from observational data has gained significant attention in recent years, but most methods are limited to scalar random variables. In many important domains, including neuroscience, psychology, social science, and industrial manufacturing, the causal units of interest are groups of variables rather…
-
SubSearch: Robust Estimation and Outlier Detection for Stochastic Block Models via Subgraph Search
SubSearch: Robust Estimation and Outlier Detection for Stochastic Block Models via Subgraph Search arXiv:2506.03657v1 Announce Type: new Abstract: Community detection is a fundamental task in graph analysis, with methods often relying on fitting models like the Stochastic Block Model (SBM) to observed networks. While many algorithms can accurately estimate SBM parameters when the input graph…
-
Models of Heavy-Tailed Mechanistic Universality
Models of Heavy-Tailed Mechanistic Universality arXiv:2506.03470v1 Announce Type: new Abstract: Recent theoretical and empirical successes in deep learning, including the celebrated neural scaling laws, are punctuated by the observation that many objects of interest tend to exhibit some form of heavy-tailed or power law behavior. In particular, the prevalence of heavy-tailed spectral densities in Jacobians,…
-
Position: There Is No Free Bayesian Uncertainty Quantification
Position: There Is No Free Bayesian Uncertainty Quantification arXiv:2506.03670v1 Announce Type: new Abstract: Due to their intuitive appeal, Bayesian methods of modeling and uncertainty quantification have become popular in modern machine and deep learning. When providing a prior distribution over the parameter space, it is straightforward to obtain a distribution over the parameters that is…
-
Latent Guided Sampling for Combinatorial Optimization
Latent Guided Sampling for Combinatorial Optimization arXiv:2506.03672v1 Announce Type: new Abstract: Combinatorial Optimization problems are widespread in domains such as logistics, manufacturing, and drug discovery, yet their NP-hard nature makes them computationally challenging. Recent Neural Combinatorial Optimization methods leverage deep learning to learn solution strategies, trained via Supervised or Reinforcement Learning (RL). While promising, these…
-
Infinitesimal Higher-Order Spectral Variations in Rectangular Real Random Matrices
Infinitesimal Higher-Order Spectral Variations in Rectangular Real Random Matrices arXiv:2506.03764v1 Announce Type: new Abstract: We present a theoretical framework for deriving the general $n$-th order Fréchet derivatives of singular values in real rectangular matrices, by leveraging reduced resolvent operators from Kato’s analytic perturbation theory for self-adjoint operators. Deriving closed-form expressions for higher-order derivatives of singular…
-
Assumption-free stability for ranking problems
Assumption-free stability for ranking problems arXiv:2506.02257v1 Announce Type: new Abstract: In this work, we consider ranking problems among a finite set of candidates: for instance, selecting the top-$k$ items among a larger list of candidates or obtaining the full ranking of all items in the set. These problems are often unstable, in the sense that…
-
Enabling Probabilistic Learning on Manifolds through Double Diffusion Maps
Enabling Probabilistic Learning on Manifolds through Double Diffusion Maps arXiv:2506.02254v1 Announce Type: new Abstract: We present a generative learning framework for probabilistic sampling based on an extension of the Probabilistic Learning on Manifolds (PLoM) approach, which is designed to generate statistically consistent realizations of a random vector in a finite-dimensional Euclidean space, informed by a…
-
MoCA: Multi-modal Cross-masked Autoencoder for Digital Health Measurements
MoCA: Multi-modal Cross-masked Autoencoder for Digital Health Measurements arXiv:2506.02260v1 Announce Type: new Abstract: The growing prevalence of digital health technologies has led to the generation of complex multi-modal data, such as physical activity measurements simultaneously collected from various sensors of mobile and wearable devices. These data hold immense potential for advancing health studies, but current…
-
Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression
Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression arXiv:2506.02336v1 Announce Type: new Abstract: We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monotonic reduction of the optimization objective, achieving exponential convergence in $\widetilde{\mathcal{O}}(\kappa)$ steps with $\kappa$ being the condition…
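The setting named in this abstract, constant-stepsize gradient descent on an l2-regularized logistic regression objective, can be illustrated with a minimal sketch. This is not the paper's method or analysis. The toy dataset, the regularization weight `lam`, the stepsize, and all function names are assumptions chosen for illustration. The stepsize used here is deliberately small, so it only shows the classical monotone-decrease regime that the abstract contrasts against.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objective(w, data, lam):
    """Mean logistic loss plus (lam / 2) * ||w||^2, with labels y in {-1, +1}."""
    total = 0.0
    for x, y in data:
        m = y * sum(wi * xi for wi, xi in zip(w, x))
        # log(1 + exp(-m)), written to avoid overflow for large |m|
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total / len(data) + 0.5 * lam * sum(wi * wi for wi in w)


def gradient(w, data, lam):
    """Gradient of `objective` with respect to w."""
    g = [lam * wi for wi in w]
    n = len(data)
    for x, y in data:
        m = y * sum(wi * xi for wi, xi in zip(w, x))
        # d/dw log(1 + exp(-m)) = -sigmoid(-m) * y * x
        c = -y * _sigmoid(-m) / n
        for j, xj in enumerate(x):
            g[j] += c * xj
    return g


def gradient_descent(data, lam, stepsize, steps):
    """Run GD from w = 0 with a constant stepsize; return (w, objective values)."""
    w = [0.0] * len(data[0][0])
    losses = [objective(w, data, lam)]
    for _ in range(steps):
        g = gradient(w, data, lam)
        w = [wi - stepsize * gi for wi, gi in zip(w, g)]
        losses.append(objective(w, data, lam))
    return w, losses


# Hypothetical, linearly separable toy data: (features, label)
toy_data = [
    ((1.0, 2.0), 1),
    ((2.0, 1.0), 1),
    ((-1.0, -1.5), -1),
    ((-2.0, -0.5), -1),
]

w, losses = gradient_descent(toy_data, lam=0.1, stepsize=0.1, steps=200)
print("final weights:", w)
print("first and last objective values:", losses[0], losses[-1])
```

For this toy data the objective is smooth with constant at most lam + max ||x||^2 / 4 = 1.35, so a stepsize of 0.1 falls within the small-stepsize range where each step is guaranteed not to increase the objective.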
-
Tensor State Space-based Dynamic Multilayer Network Modeling
Tensor State Space-based Dynamic Multilayer Network Modeling arXiv:2506.02413v1 Announce Type: new Abstract: Understanding the complex interactions within dynamic multilayer networks is critical for advancements in various scientific domains. Existing models often fail to capture such networks’ temporal and cross-layer dynamics. This paper introduces a novel Tensor State Space Model for Dynamic Multilayer Networks (TSSDMN), utilizing…
-
Minimax Rates for the Estimation of Eigenpairs of Weighted Laplace-Beltrami Operators on Manifolds
Minimax Rates for the Estimation of Eigenpairs of Weighted Laplace-Beltrami Operators on Manifolds arXiv:2506.00171v1 Announce Type: new Abstract: We study the problem of estimating eigenpairs of elliptic differential operators from samples of a distribution $\rho$ supported on a manifold $M$. The operators discussed in the paper are relevant in unsupervised learning and in particular are…
-
Overfitting has a limitation: a model-independent generalization error bound based on Rényi entropy
Overfitting has a limitation: a model-independent generalization error bound based on Rényi entropy arXiv:2506.00182v1 Announce Type: new Abstract: Will further scaling up of machine learning models continue to bring success? A significant challenge in answering this question lies in understanding generalization error, which is the impact of overfitting. Understanding generalization error behavior of increasingly large-scale…
-
Riemannian Principal Component Analysis
Riemannian Principal Component Analysis arXiv:2506.00226v1 Announce Type: new Abstract: This paper proposes an innovative extension of Principal Component Analysis (PCA) that transcends the traditional assumption of data lying in Euclidean space, enabling its application to data on Riemannian manifolds. The primary challenge addressed is the lack of vector space operations on such manifolds. Fletcher et…
-
Bayesian Data Sketching for Varying Coefficient Regression Models
Bayesian Data Sketching for Varying Coefficient Regression Models arXiv:2506.00270v1 Announce Type: new Abstract: Varying coefficient models are popular for estimating nonlinear regression functions in functional data models. Their Bayesian variants have received limited attention in large data applications, primarily due to prohibitively slow posterior computations using Markov chain Monte Carlo (MCMC) algorithms. We introduce Bayesian…
-
Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning
Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning arXiv:2505.23783v1 Announce Type: new Abstract: In-Context Learning (ICL) allows Large Language Models (LLMs) to adapt to new tasks with just a few examples, but their predictions often suffer from systematic biases, leading to unstable performances in classification. While calibration techniques are proposed to…
-
Gibbs randomness-compression proposition: An efficient deep learning
Gibbs randomness-compression proposition: An efficient deep learning arXiv:2505.23869v1 Announce Type: new Abstract: A proposition that connects randomness and compression put forward via Gibbs entropy over set of measurement vectors associated with a compression process. The proposition states that a lossy compression process is equivalent to directed randomness that preserves information content. The proposition originated…
-
Conformal Object Detection by Sequential Risk Control
Conformal Object Detection by Sequential Risk Control arXiv:2505.24038v1 Announce Type: new Abstract: Recent advances in object detectors have led to their adoption for industrial uses. However, their deployment in critical applications is hindered by the inherent lack of reliability of neural networks and the complex structure of object detection models. To address these challenges, we…
-
Performative Risk Control: Calibrating Models for Reliable Deployment under Performativity
Performative Risk Control: Calibrating Models for Reliable Deployment under Performativity arXiv:2505.24097v1 Announce Type: new Abstract: Calibrating blackbox machine learning models to achieve risk control is crucial to ensure reliable decision-making. A rich line of literature has been studying how to calibrate a model so that its predictions satisfy explicit finite-sample statistical guarantees under a fixed,…
-
A Mathematical Perspective On Contrastive Learning
A Mathematical Perspective On Contrastive Learning arXiv:2505.24134v1 Announce Type: new Abstract: Multimodal contrastive learning is a methodology for linking different data modalities; the canonical example is linking image and text data. The methodology is typically framed as the identification of a set of encoders, one for each modality, that align representations within a common latent…
-
Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games
Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games arXiv:2505.22781v1 Announce Type: new Abstract: We introduce Mean-Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean-Field Games (MFG) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting,…
-
Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features
Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features arXiv:2505.22997v1 Announce Type: new Abstract: Traditional classifiers often assume feature independence or rely on overly simplistic relationships, leading to poor performance in settings where real-world dependencies matter. We introduce the Deep Copula Classifier (DCC), a generative model that separates the learning…
-
Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Highly Efficient and Effective LLMs with Multi-Boolean Architectures arXiv:2505.22811v1 Announce Type: new Abstract: Weight binarization has emerged as a promising strategy to drastically reduce the complexity of large language models (LLMs). It is mainly classified into two approaches: post-training binarization and finetuning with training-aware binarization methods. The first approach, while having low complexity, leads to…
-
JAPAN: Joint Adaptive Prediction Areas with Normalising-Flows
JAPAN: Joint Adaptive Prediction Areas with Normalising-Flows arXiv:2505.23196v1 Announce Type: new Abstract: Conformal prediction provides a model-agnostic framework for uncertainty quantification with finite-sample validity guarantees, making it an attractive tool for constructing reliable prediction sets. However, existing approaches commonly rely on residual-based conformity scores, which impose geometric constraints and struggle when the underlying distribution is…
-
Stable Thompson Sampling: Valid Inference via Variance Inflation
Stable Thompson Sampling: Valid Inference via Variance Inflation arXiv:2505.23260v1 Announce Type: new Abstract: We consider the problem of statistical inference when the data is collected via a Thompson Sampling-type algorithm. While Thompson Sampling (TS) is known to be both asymptotically optimal and empirically effective, its adaptive sampling scheme poses challenges for constructing confidence intervals for…
-
A Kernelised Stein Discrepancy for Assessing the Fit of Inhomogeneous Random Graph Models
A Kernelised Stein Discrepancy for Assessing the Fit of Inhomogeneous Random Graph Models arXiv:2505.21580v1 Announce Type: new Abstract: Complex data are often represented as a graph, which in turn can often be viewed as a realisation of a random graph, such as of an inhomogeneous random graph model (IRG). For general fast goodness-of-fit tests in…
-
STACI: Spatio-Temporal Aleatoric Conformal Inference
STACI: Spatio-Temporal Aleatoric Conformal Inference arXiv:2505.21658v1 Announce Type: new Abstract: Fitting Gaussian Processes (GPs) provides interpretable aleatoric uncertainty quantification for estimation of spatio-temporal fields. Spatio-temporal deep learning models, while scalable, typically assume a simplistic independent covariance matrix for the response, failing to capture the underlying correlation structure. However, spatio-temporal GPs suffer from issues of scalability…
-
Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference
Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference arXiv:2505.21721v1 Announce Type: new Abstract: We prove that, given a mean-field location-scale variational family, black-box variational inference (BBVI) with the reparametrization gradient converges at an almost dimension-independent rate. Specifically, for strongly log-concave and log-smooth targets, the number of iterations for BBVI with a sub-Gaussian family to achieve…
-
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks arXiv:2505.21791v1 Announce Type: new Abstract: Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of…
-
A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging
A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging arXiv:2505.21796v1 Announce Type: new Abstract: Polyak-Ruppert averaging is a widely used technique to achieve the optimal asymptotic variance of stochastic approximation (SA) algorithms, yet its high-probability performance guarantees remain underexplored in general settings. In this paper, we present a general framework for establishing…
-
Differentially private ratio statistics
Differentially private ratio statistics arXiv:2505.20351v1 Announce Type: new Abstract: Ratio statistics, such as relative risk and odds ratios, play a central role in hypothesis testing, model evaluation, and decision-making across many areas of machine learning, including causal inference and fairness analysis. However, despite privacy concerns surrounding many datasets and despite increasing adoption of differential privacy, differentially private…
-
Learning with Expected Signatures: Theory and Applications
Learning with Expected Signatures: Theory and Applications arXiv:2505.20465v1 Announce Type: new Abstract: The expected signature maps a collection of data streams to a lower dimensional representation, with a remarkable property: the resulting feature tensor can fully characterize the data generating distribution. This “model-free” embedding has been successfully leveraged to build multiple domain-agnostic machine learning (ML)…
-
Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models
Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models arXiv:2505.20536v1 Announce Type: new Abstract: This paper studies the task of estimating heterogeneous treatment effects in causal panel data models, in the presence of covariate effects. We propose a novel Covariate-Adjusted Deep Causal Learning (CoDEAL) for panel data models, that employs flexible model structures and powerful…
-
Balancing Performance and Costs in Best Arm Identification
Balancing Performance and Costs in Best Arm Identification arXiv:2505.20583v1 Announce Type: new Abstract: We consider the problem of identifying the best arm in a multi-armed bandit model. Despite a wealth of literature in the traditional fixed budget and fixed confidence regimes of the best arm identification problem, it still remains a mystery to most practitioners…
-
Preconditioned Langevin Dynamics with Score-Based Generative Models for Infinite-Dimensional Linear Bayesian Inverse Problems
Preconditioned Langevin Dynamics with Score-Based Generative Models for Infinite-Dimensional Linear Bayesian Inverse Problems arXiv:2505.18276v1 Announce Type: new Abstract: Designing algorithms for solving high-dimensional Bayesian inverse problems directly in infinite-dimensional function spaces – where such problems are naturally formulated – is crucial to ensure stability and convergence as the discretization of the underlying problem is refined.…
-
Operator Learning for Schrödinger Equation: Unitarity, Error Bounds, and Time Generalization
Operator Learning for Schrödinger Equation: Unitarity, Error Bounds, and Time Generalization arXiv:2505.18288v1 Announce Type: new Abstract: We consider the problem of learning the evolution operator for the time-dependent Schrödinger equation, where the Hamiltonian may vary with time. Existing neural network-based surrogates often ignore fundamental properties of the Schrödinger equation, such as linearity and unitarity, and…
-
On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective
On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective arXiv:2505.18346v1 Announce Type: new Abstract: Weak-to-strong generalization, where a student model trained on imperfect labels generated by a weaker teacher nonetheless surpasses that teacher, has been widely observed but the mechanisms that enable it have remained poorly understood. In this paper, through a theoretical analysis of…
-
Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling
Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling arXiv:2505.18327v1 Announce Type: new Abstract: Constrained stochastic nonlinear optimization problems have attracted significant attention for their ability to model complex real-world scenarios in physics, economics, and biology. As datasets continue to grow, online inference methods have become crucial for enabling real-time decision-making without the need…
-
Identifiability of latent causal graphical models without pure children
Identifiability of latent causal graphical models without pure children arXiv:2505.18410v1 Announce Type: new Abstract: This paper considers a challenging problem of identifying a causal graphical model under the presence of latent variables. While various identifiability conditions have been proposed in the literature, they often require multiple pure children per latent variable or restrictions on the…
-
Liouville PDE-based sliced-Wasserstein flow for fair regression
Liouville PDE-based sliced-Wasserstein flow for fair regression arXiv:2505.17204v1 Announce Type: new Abstract: The sliced Wasserstein flow (SWF), a nonparametric and implicit generative gradient flow, is applied to fair regression. We have improved the SWF in a few aspects. First, the stochastic diffusive term from the Fokker-Planck equation-based Monte Carlo is transformed to Liouville partial differential…
-
Learning Probabilities of Causation from Finite Population Data
Learning Probabilities of Causation from Finite Population Data arXiv:2505.17133v1 Announce Type: new Abstract: Probabilities of causation play a crucial role in modern decision-making. This paper addresses the challenge of predicting probabilities of causation for subpopulations with insufficient data using machine learning models. Tian and Pearl first defined and derived tight bounds for three fundamental probabilities…
-
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine arXiv:2505.17283v1 Announce Type: new Abstract: Randomized clinical trials often require large patient cohorts before drawing definitive conclusions, yet abundant observational data from parallel studies remains underutilized due to confounding and hidden biases. To bridge this gap, we propose Deconfounded Warm-Start Thompson Sampling (DWTS), a practical approach…
-
Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation
Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation arXiv:2505.17288v1 Announce Type: new Abstract: Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, referred to as supervised fine-tuning, involves training a new…
-
Optimal Transport with Heterogeneously Missing Data
Optimal Transport with Heterogeneously Missing Data arXiv:2505.17291v1 Announce Type: new Abstract: We consider the problem of solving the optimal transport problem between two empirical distributions with missing values. Our main assumption is that the data is missing completely at random (MCAR), but we allow for heterogeneous missingness probabilities across features and across the two distributions.…
-
PO-Flow: Flow-based Generative Models for Sampling Potential Outcomes and Counterfactuals
PO-Flow: Flow-based Generative Models for Sampling Potential Outcomes and Counterfactuals arXiv:2505.16051v1 Announce Type: new Abstract: We propose PO-Flow, a novel continuous normalizing flow (CNF) framework for causal inference that jointly models potential outcomes and counterfactuals. Trained via flow matching, PO-Flow provides a unified framework for individualized potential outcome prediction, counterfactual predictions, and uncertainty-aware density learning.…
-
CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision
CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision arXiv:2505.15927v1 Announce Type: new Abstract: Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which provides intermediate reasoning steps together with the final output, has emerged as a powerful empirical technique, underpinning much of the…
-
Oh SnapMMD! Forecasting Stochastic Dynamics Beyond the Schrödinger Bridge’s End
Oh SnapMMD! Forecasting Stochastic Dynamics Beyond the Schrödinger Bridge’s End arXiv:2505.16082v1 Announce Type: new Abstract: Scientists often want to make predictions beyond the observed time horizon of “snapshot” data following latent stochastic dynamics. For example, in time course single-cell mRNA profiling, scientists have access to cellular transcriptional state measurements (snapshots) from different biological replicates at…
-
Dimension-adapted Momentum Outscales SGD
Dimension-adapted Momentum Outscales SGD arXiv:2505.16098v1 Announce Type: new Abstract: We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target…
-
Exponential Convergence of CAVI for Bayesian PCA
Exponential Convergence of CAVI for Bayesian PCA arXiv:2505.16145v1 Announce Type: new Abstract: Probabilistic principal component analysis (PCA) and its Bayesian variant (BPCA) are widely used for dimension reduction in machine learning and statistics. The main advantage of probabilistic PCA over the traditional formulation is allowing uncertainty quantification. The parameters of BPCA are typically learned using…
-
Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective
Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective arXiv:2505.14808v1 Announce Type: new Abstract: This work aims to demystify the out-of-distribution (OOD) capabilities of in-context learning (ICL) by studying linear regression tasks parameterized with low-rank covariance matrices. With such a parameterization, we can model distribution shifts as a varying angle between the subspace of the…
-
LOBSTUR: A Local Bootstrap Framework for Tuning Unsupervised Representations in Graph Neural Networks
LOBSTUR: A Local Bootstrap Framework for Tuning Unsupervised Representations in Graph Neural Networks arXiv:2505.14867v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) are increasingly used in conjunction with unsupervised learning techniques to learn powerful node representations, but their deployment is hindered by their high sensitivity to hyperparameter tuning and the absence of established methodologies for…
-
Convergence of Adam in Deep ReLU Networks via Directional Complexity and Kakeya Bounds
Convergence of Adam in Deep ReLU Networks via Directional Complexity and Kakeya Bounds arXiv:2505.15013v1 Announce Type: new Abstract: First-order adaptive optimization methods like Adam are the default choices for training modern deep neural networks. Despite their empirical success, the theoretical understanding of these methods in non-smooth settings, particularly in Deep ReLU networks, remains limited. ReLU…
-
A Linear Approach to Data Poisoning
A Linear Approach to Data Poisoning arXiv:2505.15175v1 Announce Type: new Abstract: We investigate the theoretical foundations of data poisoning attacks in machine learning models. Our analysis reveals that the Hessian with respect to the input serves as a diagnostic tool for detecting poisoning, exhibiting spectral signatures that characterize compromised datasets. We use random matrix theory…
-
Infinite hierarchical contrastive clustering for personal digital envirotyping
Infinite hierarchical contrastive clustering for personal digital envirotyping arXiv:2505.15022v1 Announce Type: new Abstract: Daily environments have profound influence on our health and behavior. Recent work has shown that digital envirotyping, where computer vision is applied to images of daily environments taken during ecological momentary assessment (EMA), can be used to identify meaningful relationships between environmental…
-
Continuous Domain Generalization
Continuous Domain Generalization arXiv:2505.13519v1 Announce Type: new Abstract: Real-world data distributions often shift continuously across multiple latent factors such as time, geography, and socioeconomic context. However, existing domain generalization approaches typically treat domains as discrete or evolving along a single axis (e.g., time), which fails to capture the complex, multi-dimensional nature of real-world variation. This…
-
Data Balancing Strategies: A Survey of Resampling and Augmentation Methods
Data Balancing Strategies: A Survey of Resampling and Augmentation Methods arXiv:2505.13518v1 Announce Type: new Abstract: Imbalanced data poses a significant obstacle in machine learning, as an unequal distribution of class labels often results in skewed predictions and diminished model accuracy. To mitigate this problem, various resampling strategies have been developed, encompassing both oversampling and undersampling…
-
Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback
Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback arXiv:2505.13562v1 Announce Type: new Abstract: Learning in games is a fundamental problem in machine learning and artificial intelligence, with numerous applications \citep{silver2016mastering,schrittwieser2020mastering}. This work investigates two-player zero-sum matrix games with an unknown payoff matrix and bandit feedback, where each player observes their actions and the…
-
Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles
Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles arXiv:2505.13585v1 Announce Type: new Abstract: This work introduces a new method called scalable Bayesian Monte Carlo (SBMC). The model interpolates between a point estimator and the posterior, and the algorithm is a parallel implementation of a consistent (asymptotically unbiased) Bayesian deep learning algorithm: sequential Monte…
-
Backward Conformal Prediction
Backward Conformal Prediction arXiv:2505.13732v1 Announce Type: new Abstract: We introduce Backward Conformal Prediction, a method that guarantees conformal coverage while providing flexible control over the size of prediction sets. Unlike standard conformal prediction, which fixes the coverage level and allows the conformal set size to vary, our approach defines a rule that constrains how prediction…
-
The Stochastic Occupation Kernel (SOCK) Method for Learning Stochastic Differential Equations
The Stochastic Occupation Kernel (SOCK) Method for Learning Stochastic Differential Equations arXiv:2505.11622v1 Announce Type: new Abstract: We present a novel kernel-based method for learning multivariate stochastic differential equations (SDEs). The method follows a two-step procedure: we first estimate the drift term function, then the (matrix-valued) diffusion function given the drift. Occupation kernels are integral functionals…
-
Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles
Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles arXiv:2505.11671v1 Announce Type: new Abstract: Sequential Monte Carlo (SMC) methods offer a principled approach to Bayesian uncertainty quantification but are traditionally limited by the need for full-batch gradient evaluations. We introduce a scalable variant by incorporating Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)…
-
Missing Data Imputation by Reducing Mutual Information with Rectified Flows
Missing Data Imputation by Reducing Mutual Information with Rectified Flows arXiv:2505.11749v1 Announce Type: new Abstract: This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and their corresponding missing mask. Inspired by GAN-based approaches, which train generators to decrease the predictability of missingness patterns, our method…
-
Multi-Attribute Graph Estimation with Sparse-Group Non-Convex Penalties
Multi-Attribute Graph Estimation with Sparse-Group Non-Convex Penalties arXiv:2505.11984v1 Announce Type: new Abstract: We consider the problem of inferring the conditional independence graph (CIG) of high-dimensional Gaussian vectors from multi-attribute data. Most existing methods for graph estimation are based on single-attribute models where one associates a scalar random variable with each node. In multi-attribute graphical models,…
-
Thompson Sampling-like Algorithms for Stochastic Rising Bandits
Thompson Sampling-like Algorithms for Stochastic Rising Bandits arXiv:2505.12092v1 Announce Type: new Abstract: Stochastic rising rested bandit (SRRB) is a setting where the arms’ expected rewards increase as they are pulled. It models scenarios in which the performances of the different options grow as an effect of an underlying learning process (e.g., online model selection). Even…
-
An Exponential Averaging Process with Strong Convergence Properties
An Exponential Averaging Process with Strong Convergence Properties arXiv:2505.10605v1 Announce Type: new Abstract: Averaging, or smoothing, is a fundamental approach to obtain stable, de-noised estimates from noisy observations. In certain scenarios, observations made along trajectories of random dynamical systems are of particular interest. One popular smoothing technique for such a scenario is exponential moving averaging…
-
Minimax learning rates for estimating binary classifiers under margin conditions
Minimax learning rates for estimating binary classifiers under margin conditions arXiv:2505.10628v1 Announce Type: new Abstract: We study classification problems using binary estimators where the decision boundary is described by horizon functions and where the data distribution satisfies a geometric margin condition. We establish upper and lower bounds for the minimax learning rate over broad function…
-
Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization
Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization arXiv:2505.11089v1 Announce Type: new Abstract: In this paper, we consider a score-based Integer Programming (IP) approach for solving the Bayesian Network Structure Learning (BNSL) problem. State-of-the-art BNSL IP formulations suffer from the exponentially large number of variables and constraints. A standard approach in IP…
-
Supervised Models Can Generalize Also When Trained on Random Label
Supervised Models Can Generalize Also When Trained on Random Label arXiv:2505.11006v1 Announce Type: new Abstract: The success of unsupervised learning raises the question of whether also supervised models can be trained without using the information in the output $y$. In this paper, we demonstrate that this is indeed possible. The key step is to formulate…
-
Nash: Neural Adaptive Shrinkage for Structured High-Dimensional Regression
Nash: Neural Adaptive Shrinkage for Structured High-Dimensional Regression arXiv:2505.11143v1 Announce Type: new Abstract: Sparse linear regression is a fundamental tool in data analysis. However, traditional approaches often fall short when covariates exhibit structure or arise from heterogeneous sources. In biomedical applications, covariates may stem from distinct modalities or be structured according to an underlying graph.…
-
On Measuring Intrinsic Causal Attributions in Deep Neural Networks
On Measuring Intrinsic Causal Attributions in Deep Neural Networks arXiv:2505.09660v1 Announce Type: new Abstract: Quantifying the causal influence of input features within neural networks has become a topic of increasing interest. Existing approaches typically assess direct, indirect, and total causal effects. This work treats NNs as structural causal models (SCMs) and extends our focus to…
-
LatticeVision: Image to Image Networks for Modeling Non-Stationary Spatial Data
LatticeVision: Image to Image Networks for Modeling Non-Stationary Spatial Data arXiv:2505.09803v1 Announce Type: new Abstract: In many scientific and industrial applications, we are given a handful of instances (a ‘small ensemble’) of a spatially distributed quantity (a ‘field’) but would like to acquire many more. For example, a large ensemble of global temperature sensitivity fields…
-
Learning Multi-Attribute Differential Graphs with Non-Convex Penalties
Learning Multi-Attribute Differential Graphs with Non-Convex Penalties arXiv:2505.09748v1 Announce Type: new Abstract: We consider the problem of estimating differences in two multi-attribute Gaussian graphical models (GGMs) which are known to have similar structure, using a penalized D-trace loss function with non-convex penalties. The GGM structure is encoded in its precision (inverse covariance) matrix. Existing methods…
-
A Scalable Gradient-Based Optimization Framework for Sparse Minimum-Variance Portfolio Selection
A Scalable Gradient-Based Optimization Framework for Sparse Minimum-Variance Portfolio Selection arXiv:2505.10099v1 Announce Type: new Abstract: Portfolio optimization involves selecting asset weights to minimize a risk-reward objective, such as the portfolio variance in the classical minimum-variance framework. Sparse portfolio selection extends this by imposing a cardinality constraint: only $k$ assets from a universe of $p$ may…
-
Path Gradients after Flow Matching
Path Gradients after Flow Matching arXiv:2505.10139v1 Announce Type: new Abstract: Boltzmann Generators have emerged as a promising machine learning tool for generating samples from equilibrium distributions of molecular systems using Normalizing Flows and importance weighting. Recently, Flow Matching has helped speed up Continuous Normalizing Flows (CNFs), scale them to more complex molecular systems, and minimize…
-
Lower Bounds on the MMSE of Adversarially Inferring Sensitive Features
Lower Bounds on the MMSE of Adversarially Inferring Sensitive Features arXiv:2505.09004v1 Announce Type: new Abstract: We propose an adversarial evaluation framework for sensitive feature inference based on minimum mean-squared error (MMSE) estimation with a finite sample size and linear predictive models. Our approach establishes theoretical lower bounds on the true MMSE of inferring sensitive features…
-
Online Learning of Neural Networks
Online Learning of Neural Networks arXiv:2505.09167v1 Announce Type: new Abstract: We study online learning of feedforward neural networks with the sign activation function that implement functions from the unit ball in $\mathbb{R}^d$ to a finite label set $\{1, \ldots, Y\}$. First, we characterize a margin condition that is sufficient and in some cases necessary for…
-
Risk Bounds For Distributional Regression
Risk Bounds For Distributional Regression arXiv:2505.09075v1 Announce Type: new Abstract: This work examines risk bounds for nonparametric distributional regression estimators. For convex-constrained distributional regression, general upper bounds are established for the continuous ranked probability score (CRPS) and the worst-case mean squared error (MSE) across the domain. These theoretical results are applied to isotonic and trend…
-
Optimal Transport-Based Domain Adaptation for Rotated Linear Regression
Optimal Transport-Based Domain Adaptation for Rotated Linear Regression arXiv:2505.09229v1 Announce Type: new Abstract: Optimal Transport (OT) has proven effective for domain adaptation (DA) by aligning distributions across domains with differing statistical properties. Building on the approach of Courty et al. (2016), who mapped source data to the target domain for improved model transfer, we focus…
-
Fairness-aware Bayes optimal functional classification
Fairness-aware Bayes optimal functional classification arXiv:2505.09471v1 Announce Type: new Abstract: Algorithmic fairness has become a central topic in machine learning, and mitigating disparities across different subpopulations has emerged as a rapidly growing research area. In this paper, we systematically study the classification of functional data under fairness constraints, ensuring the disparity level of the classifier…
-
Wasserstein Distributionally Robust Nonparametric Regression
Wasserstein Distributionally Robust Nonparametric Regression arXiv:2505.07967v1 Announce Type: new Abstract: Distributionally robust optimization has become a powerful tool for prediction and decision-making under model uncertainty. By focusing on the local worst-case risk, it enhances robustness by identifying the most unfavorable distribution within a predefined ambiguity set. While extensive research has been conducted in parametric settings,…
-
Diffusion-based supervised learning of generative models for efficient sampling of multimodal distributions
Diffusion-based supervised learning of generative models for efficient sampling of multimodal distributions arXiv:2505.07825v1 Announce Type: new Abstract: We propose a hybrid generative model for efficient sampling of high-dimensional, multimodal probability distributions for Bayesian inference. Traditional Monte Carlo methods, such as the Metropolis-Hastings and Langevin Monte Carlo sampling methods, are effective for sampling from single-mode distributions…
-
Sharp Gaussian approximations for Decentralized Federated Learning
Sharp Gaussian approximations for Decentralized Federated Learning arXiv:2505.08125v1 Announce Type: new Abstract: Federated Learning has gained traction in privacy-sensitive collaborative environments, with local SGD emerging as a key optimization method in decentralized settings. While its convergence properties are well-studied, asymptotic statistical guarantees beyond convergence remain limited. In this paper, we present two generalized Gaussian approximation…
-
SIM-Shapley: A Stable and Computationally Efficient Approach to Shapley Value Approximation
SIM-Shapley: A Stable and Computationally Efficient Approach to Shapley Value Approximation arXiv:2505.08198v1 Announce Type: new Abstract: Explainable artificial intelligence (XAI) is essential for trustworthy machine learning (ML), particularly in high-stakes domains such as healthcare and finance. Shapley value (SV) methods provide a principled framework for feature attribution in complex models but incur high computational costs,…
-
Lie Group Symmetry Discovery and Enforcement Using Vector Fields
Lie Group Symmetry Discovery and Enforcement Using Vector Fields arXiv:2505.08219v1 Announce Type: new Abstract: Symmetry-informed machine learning can exhibit advantages over machine learning which fails to account for symmetry. Additionally, recent attention has been given to continuous symmetry discovery using vector fields which serve as infinitesimal generators for Lie group symmetries. In this paper, we…
-
Fair Representation Learning for Continuous Sensitive Attributes using Expectation of Integral Probability Metrics
Fair Representation Learning for Continuous Sensitive Attributes using Expectation of Integral Probability Metrics arXiv:2505.06435v1 Announce Type: new Abstract: AI fairness, also known as algorithmic fairness, aims to ensure that algorithms operate without bias or discrimination towards any individual or group. Among various AI algorithms, the Fair Representation Learning (FRL) approach has gained significant interest in…
-
High-Dimensional Importance-Weighted Information Criteria: Theory and Optimality
High-Dimensional Importance-Weighted Information Criteria: Theory and Optimality arXiv:2505.06531v1 Announce Type: new Abstract: Imori and Ing (2025) proposed the importance-weighted orthogonal greedy algorithm (IWOGA) for model selection in high-dimensional misspecified regression models under covariate shift. To determine the number of IWOGA iterations, they introduced the high-dimensional importance-weighted information criterion (HDIWIC). They argued that the combined use…
-
Learning Guarantee of Reward Modeling Using Deep Neural Networks
Learning Guarantee of Reward Modeling Using Deep Neural Networks arXiv:2505.06601v1 Announce Type: new Abstract: In this work, we study the learning theory of reward modeling with pairwise comparison data using deep neural networks. We establish a novel non-asymptotic regret bound for deep reward estimators in a non-parametric setting, which depends explicitly on the network architecture.…
-
Feature Representation Transferring to Lightweight Models via Perception Coherence
Feature Representation Transferring to Lightweight Models via Perception Coherence arXiv:2505.06595v1 Announce Type: new Abstract: In this paper, we propose a method for transferring feature representation to lightweight student models from larger teacher models. We mathematically define a new notion called \textit{perception coherence}. Based on this notion, we propose a loss function, which takes into account…
-
Optimal Regret of Bernoulli Bandits under Global Differential Privacy
Optimal Regret of Bernoulli Bandits under Global Differential Privacy arXiv:2505.05613v1 Announce Type: new Abstract: As sequential learning algorithms are increasingly applied to real life, ensuring data privacy while maintaining their utilities emerges as a timely question. In this context, regret minimisation in stochastic bandits under $\epsilon$-global Differential Privacy (DP) has been widely studied. Unlike bandits…
-
DaringFed: A Dynamic Bayesian Persuasion Pricing for Online Federated Learning under Two-sided Incomplete Information
DaringFed: A Dynamic Bayesian Persuasion Pricing for Online Federated Learning under Two-sided Incomplete Information arXiv:2505.05842v1 Announce Type: cross Abstract: Online Federated Learning (OFL) is a real-time learning paradigm that sequentially executes parameter aggregation immediately for each random arriving client. To motivate clients to participate in OFL, it is crucial to offer appropriate incentives to offset…
-
Safe-EF: Error Feedback for Nonsmooth Constrained Optimization
Safe-EF: Error Feedback for Nonsmooth Constrained Optimization arXiv:2505.06053v1 Announce Type: cross Abstract: Federated learning faces severe communication bottlenecks due to the high dimensionality of model updates. Communication compression with contractive compressors (e.g., Top-K) is often preferable in practice but can degrade performance without proper handling. Error feedback (EF) mitigates such issues but has been largely…
-
Mixed-Integer Optimization for Responsible Machine Learning
Mixed-Integer Optimization for Responsible Machine Learning arXiv:2505.05857v1 Announce Type: cross Abstract: In the last few decades, Machine Learning (ML) has achieved significant success across domains ranging from healthcare, sustainability, and the social sciences, to criminal justice and finance. But its deployment in increasingly sophisticated, critical, and sensitive areas affecting individuals, the groups they belong to,…
-
Generalization Analysis for Contrastive Representation Learning under Non-IID Settings
Generalization Analysis for Contrastive Representation Learning under Non-IID Settings arXiv:2505.04937v1 Announce Type: new Abstract: Contrastive Representation Learning (CRL) has achieved impressive success in various domains in recent years. Nevertheless, the theoretical understanding of the generalization behavior of CRL is limited. Moreover, to the best of our knowledge, the current literature only analyzes generalization bounds under…
-
Learning Linearized Models from Nonlinear Systems under Initialization Constraints with Finite Data
Learning Linearized Models from Nonlinear Systems under Initialization Constraints with Finite Data arXiv:2505.04954v1 Announce Type: new Abstract: The identification of a linear system model from data has wide applications in control theory. The existing work that provides finite sample guarantees for linear system identification typically uses data from a single long system trajectory under i.i.d.…