Category: cs.LG
-
Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature
Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature arXiv:2503.06079v1 Announce Type: new Abstract: Despite the significance of probabilistic time-series forecasting models, their evaluation metrics often involve intractable integrations. The most widely used metric, the continuous ranked probability score (CRPS), is a strictly proper scoring function; however, its computation requires approximation. We found…
-
On Statistical Estimation of Edge-Reinforced Random Walks
On Statistical Estimation of Edge-Reinforced Random Walks arXiv:2503.06115v1 Announce Type: new Abstract: Reinforced random walks (RRWs), including vertex-reinforced random walks (VRRWs) and edge-reinforced random walks (ERRWs), model random walks where the transition probabilities evolve based on prior visitation history~\cite{mgr, fmk, tarres, volkov}. These models have found applications in various areas, such as network representation learning~\cite{xzzs},…
-
Double Debiased Machine Learning for Mediation Analysis with Continuous Treatments
Double Debiased Machine Learning for Mediation Analysis with Continuous Treatments arXiv:2503.06156v1 Announce Type: new Abstract: Uncovering causal mediation effects is of significant value to practitioners seeking to isolate the direct treatment effect from the potential mediated effect. We propose a double machine learning (DML) algorithm for mediation analysis that supports continuous treatments. To estimate the…
-
Bayesian Optimization for Robust Identification of Ornstein-Uhlenbeck Model
Bayesian Optimization for Robust Identification of Ornstein-Uhlenbeck Model arXiv:2503.06381v1 Announce Type: new Abstract: This paper deals with the identification of the stochastic Ornstein-Uhlenbeck (OU) process error model, which is characterized by an inverse time constant, and the unknown variances of the process and observation noises. Although the availability of the explicit expression of the log-likelihood…
-
A Practical Introduction to Kernel Discrepancies: MMD, HSIC & KSD
A Practical Introduction to Kernel Discrepancies: MMD, HSIC & KSD arXiv:2503.04820v1 Announce Type: new Abstract: This article provides a practical introduction to kernel discrepancies, focusing on the Maximum Mean Discrepancy (MMD), the Hilbert-Schmidt Independence Criterion (HSIC), and the Kernel Stein Discrepancy (KSD). Various estimators for these discrepancies are presented, including the commonly-used V-statistics and U-statistics,…
-
Boltzmann convolutions and Welford mean-variance layers with an application to time series forecasting and classification
Boltzmann convolutions and Welford mean-variance layers with an application to time series forecasting and classification arXiv:2503.04956v1 Announce Type: new Abstract: In this paper we propose a novel problem called the ForeClassing problem where the loss of a classification decision is only observed at a future time point after the classification decision has to be made.…
-
A characterization of sample adaptivity in UCB data
A characterization of sample adaptivity in UCB data arXiv:2503.04855v1 Announce Type: new Abstract: We characterize a joint CLT of the number of pulls and the sample mean reward of the arms in a stochastic two-armed bandit environment under UCB algorithms. Several implications of this result are in place: (1) a nonstandard CLT of the number…
-
Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits
Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits arXiv:2503.05098v1 Announce Type: new Abstract: Information-directed sampling (IDS) is a powerful framework for solving bandit problems which has shown strong results in both Bayesian and frequentist settings. However, frequentist IDS, like many other bandit algorithms, requires that one have prior knowledge of a (relatively) tight upper bound on…
-
Topology-Aware Conformal Prediction for Stream Networks
Topology-Aware Conformal Prediction for Stream Networks arXiv:2503.04981v1 Announce Type: new Abstract: Stream networks, a unique class of spatiotemporal graphs, exhibit complex directional flow constraints and evolving dependencies, making uncertainty quantification a critical yet challenging task. Traditional conformal prediction methods struggle in this setting due to the need for joint predictions across multiple interdependent locations and…
-
Reheated Gradient-based Discrete Sampling for Combinatorial Optimization
Reheated Gradient-based Discrete Sampling for Combinatorial Optimization arXiv:2503.04047v1 Announce Type: new Abstract: Recently, gradient-based discrete sampling has emerged as a highly efficient, general-purpose solver for various combinatorial optimization (CO) problems, achieving performance comparable to or surpassing the popular data-driven approaches. However, we identify a critical issue in these methods, which we term “wandering in contours”.…
-
Conformal Prediction with Upper and Lower Bound Models
Conformal Prediction with Upper and Lower Bound Models arXiv:2503.04071v1 Announce Type: new Abstract: This paper studies a Conformal Prediction (CP) methodology for building prediction intervals in a regression setting, given only deterministic lower and upper bounds on the target variable. It proposes a new CP mechanism (CPUL) that goes beyond post-processing by adopting a model…
-
Generalization in Federated Learning: A Conditional Mutual Information Framework
Generalization in Federated Learning: A Conditional Mutual Information Framework arXiv:2503.04091v1 Announce Type: new Abstract: Federated Learning (FL) is a widely adopted privacy-preserving distributed learning framework, yet its generalization performance remains less explored compared to centralized learning. In FL, the generalization error consists of two components: the out-of-sample gap, which measures the gap between the empirical…
-
Learning Causal Response Representations through Direct Effect Analysis
Learning Causal Response Representations through Direct Effect Analysis arXiv:2503.04358v1 Announce Type: new Abstract: We propose a novel approach for learning causal response representations. Our method aims to extract directions in which a multidimensional outcome is most directly caused by a treatment variable. By bridging conditional independence testing with causal representation learning, we formulate an optimisation…
-
Applications of Entropy in Data Analysis and Machine Learning: A Review
Applications of Entropy in Data Analysis and Machine Learning: A Review arXiv:2503.02921v1 Announce Type: new Abstract: Since its origin in the thermodynamics of the 19th century, the concept of entropy has also permeated other fields of physics and mathematics, such as Classical and Quantum Statistical Mechanics, Information Theory, Probability Theory, Ergodic Theory and the Theory…
-
LAPD: Langevin-Assisted Bayesian Active Learning for Physical Discovery
LAPD: Langevin-Assisted Bayesian Active Learning for Physical Discovery arXiv:2503.02983v1 Announce Type: new Abstract: Discovering physical laws from data is a fundamental challenge in scientific research, particularly when high-quality data are scarce or costly to obtain. Traditional methods for identifying dynamical systems often struggle with noise sensitivity, inefficiency in data usage, and the inability to quantify…
-
PAC Learning with Improvements
PAC Learning with Improvements arXiv:2503.03184v1 Announce Type: new Abstract: One of the most basic lower bounds in machine learning is that in nearly any nontrivial setting, it takes $\textit{at least}$ $1/\epsilon$ samples to learn to error $\epsilon$ (and more, if the classifier being learned is complex). However, suppose that data points are agents who have…
-
Convergence Rates for Softmax Gating Mixture of Experts
Convergence Rates for Softmax Gating Mixture of Experts arXiv:2503.03213v1 Announce Type: new Abstract: Mixture of experts (MoE) has recently emerged as an effective framework to advance the efficiency and scalability of machine learning models by softly dividing complex tasks among multiple specialized sub-models termed experts. Central to the success of MoE is an adaptive softmax…
-
Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations
Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations arXiv:2503.03283v1 Announce Type: new Abstract: Drawing parallels with the way biological networks are studied, we adapt the treatment–control paradigm to explainable artificial intelligence research and enrich it through multi-parametric input alterations. In this study, we propose a framework for investigating…
-
Mathematical Foundation of Interpretable Equivariant Surrogate Models
Mathematical Foundation of Interpretable Equivariant Surrogate Models arXiv:2503.01942v1 Announce Type: new Abstract: This paper introduces a rigorous mathematical framework for neural network explainability, and more broadly for the explainability of equivariant operators called Group Equivariant Operators (GEOs) based on Group Equivariant Non-Expansive Operators (GENEOs) transformations. The central concept involves quantifying the distance between GEOs by…
-
Gradient-free stochastic optimization for additive models
Gradient-free stochastic optimization for additive models arXiv:2503.02131v1 Announce Type: new Abstract: We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-Łojasiewicz or the strong convexity condition. Additionally, we assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the Hölder family…
-
Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification
Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification arXiv:2503.02110v1 Announce Type: new Abstract: We provide a complete characterization of the entire regularization curve of a modified two-part-code Minimum Description Length (MDL) learning rule for binary classification, based on an arbitrary prior or description language. \citet{GL} previously established the lack of asymptotic…
-
Online Inference for Quantiles by Constant Learning-Rate Stochastic Gradient Descent
Online Inference for Quantiles by Constant Learning-Rate Stochastic Gradient Descent arXiv:2503.02178v1 Announce Type: new Abstract: This paper proposes an online inference method of the stochastic gradient descent (SGD) with a constant learning rate for quantile loss functions with theoretical guarantees. Since the quantile loss function is neither smooth nor strongly convex, we view such SGD…
-
Decentralized Reinforcement Learning for Multi-Agent Multi-Resource Allocation via Dynamic Cluster Agreements
Decentralized Reinforcement Learning for Multi-Agent Multi-Resource Allocation via Dynamic Cluster Agreements arXiv:2503.02437v1 Announce Type: new Abstract: This paper addresses the challenge of allocating heterogeneous resources among multiple agents in a decentralized manner. Our proposed method, LGTC-IPPO, builds upon Independent Proximal Policy Optimization (IPPO) by integrating dynamic cluster consensus, a mechanism that allows agents to form…
-
Approaching the Harm of Gradient Attacks While Only Flipping Labels
Approaching the Harm of Gradient Attacks While Only Flipping Labels arXiv:2503.00140v1 Announce Type: new Abstract: Availability attacks are one of the strongest forms of training-phase attacks in machine learning, making the model unusable. While prior work in distributed ML has demonstrated such effect via gradient attacks and, more recently, data poisoning, we ask: can similar…
-
An interpretation of the Brownian bridge as a physics-informed prior for the Poisson equation
An interpretation of the Brownian bridge as a physics-informed prior for the Poisson equation arXiv:2503.00213v1 Announce Type: new Abstract: Physics-informed machine learning is one of the most commonly used methods for fusing physical knowledge in the form of partial differential equations with experimental data. The idea is to construct a loss function where the physical…
-
Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits
Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits arXiv:2503.00273v1 Announce Type: new Abstract: We study the evolution of information in interactive decision making through the lens of a stochastic multi-armed bandit problem. Focusing on a fundamental example where a unique optimal arm outperforms the rest by a fixed margin, we…
-
LNUCB-TA: Linear-nonlinear Hybrid Bandit Learning with Temporal Attention
LNUCB-TA: Linear-nonlinear Hybrid Bandit Learning with Temporal Attention arXiv:2503.00387v1 Announce Type: new Abstract: Existing contextual multi-armed bandit (MAB) algorithms fail to effectively capture both long-term trends and local patterns across all arms, leading to suboptimal performance in environments with rapidly changing reward structures. They also rely on static exploration rates, which do not dynamically adjust…
-
Generalization Bounds for Equivariant Networks on Markov Data
Generalization Bounds for Equivariant Networks on Markov Data arXiv:2503.00292v1 Announce Type: new Abstract: Equivariant neural networks play a pivotal role in analyzing datasets with symmetry properties, particularly in complex data structures. However, integrating equivariance with Markov properties presents notable challenges due to the inherent dependencies within such data. Previous research has primarily concentrated on establishing…
-
Transfer Learning through Enhanced Sufficient Representation: Enriching Source Domain Knowledge with Target Data
Transfer Learning through Enhanced Sufficient Representation: Enriching Source Domain Knowledge with Target Data arXiv:2502.20414v1 Announce Type: new Abstract: Transfer learning is an important approach for addressing the challenges posed by limited data availability in various applications. It accomplishes this by transferring knowledge from well-established source domains to a less familiar target domain. However, traditional transfer…
-
Efficient Risk-sensitive Planning via Entropic Risk Measures
Efficient Risk-sensitive Planning via Entropic Risk Measures arXiv:2502.20423v1 Announce Type: new Abstract: Risk-sensitive planning aims to identify policies maximizing some tail-focused metrics in Markov Decision Processes (MDPs). Such an optimization task can be very costly for the most widely used and interpretable metrics such as threshold probabilities or (Conditional) Values at Risk. Indeed, previous work…
-
Amortized Conditional Independence Testing
Amortized Conditional Independence Testing arXiv:2502.20925v1 Announce Type: new Abstract: Testing for the conditional independence structure in data is a fundamental and critical task in statistics and machine learning, which finds natural applications in causal discovery – a highly relevant problem to many scientific disciplines. Existing methods seek to design explicit test statistics that quantify the…
-
Learning Dynamics of Deep Linear Networks Beyond the Edge of Stability
Learning Dynamics of Deep Linear Networks Beyond the Edge of Stability arXiv:2502.20531v1 Announce Type: new Abstract: Deep neural networks trained using gradient descent with a fixed learning rate $\eta$ often operate in the regime of “edge of stability” (EOS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold $2/\eta$. In this work,…
-
Post-Hoc Uncertainty Quantification in Pre-Trained Neural Networks via Activation-Level Gaussian Processes
Post-Hoc Uncertainty Quantification in Pre-Trained Neural Networks via Activation-Level Gaussian Processes arXiv:2502.20966v1 Announce Type: new Abstract: Uncertainty quantification in neural networks through methods such as Dropout, Bayesian neural networks and Laplace approximations is either prone to underfitting or computationally demanding, rendering these approaches impractical for large-scale datasets. In this work, we address these shortcomings by…
-
Practical Evaluation of Copula-based Survival Metrics: Beyond the Independent Censoring Assumption
Practical Evaluation of Copula-based Survival Metrics: Beyond the Independent Censoring Assumption arXiv:2502.19460v1 Announce Type: new Abstract: Conventional survival metrics, such as Harrell’s concordance index and the Brier Score, rely on the independent censoring assumption for valid inference in the presence of right-censored data. However, when instances are censored for reasons related to the event of…
-
Advancing calibration for stochastic agent-based models in epidemiology with Stein variational inference and Gaussian process surrogates
Advancing calibration for stochastic agent-based models in epidemiology with Stein variational inference and Gaussian process surrogates arXiv:2502.19550v1 Announce Type: new Abstract: Accurate calibration of stochastic agent-based models (ABMs) in epidemiology is crucial to make them useful in public health policy decisions and interventions. Traditional calibration methods, e.g., Markov Chain Monte Carlo (MCMC), that yield a…
-
Fast Debiasing of the LASSO Estimator
Fast Debiasing of the LASSO Estimator arXiv:2502.19825v1 Announce Type: new Abstract: In high-dimensional sparse regression, the \textsc{Lasso} estimator offers excellent theoretical guarantees but is well-known to produce biased estimates. To address this, \cite{Javanmard2014} introduced a method to “debias” the \textsc{Lasso} estimates for a random sub-Gaussian sensing matrix $\boldsymbol{A}$. Their approach relies on computing an “approximate…
-
Multiple Linked Tensor Factorization
Multiple Linked Tensor Factorization arXiv:2502.20286v1 Announce Type: new Abstract: In biomedical research and other fields, it is now common to generate high content data that are both multi-source and multi-way. Multi-source data are collected from different high-throughput technologies while multi-way data are collected over multiple dimensions, yielding multiple tensor arrays. Integrative analysis of these data…
-
Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula
Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula arXiv:2502.20003v1 Announce Type: new Abstract: The analytic characterization of the high-dimensional behavior of optimization for Generalized Linear Models (GLMs) with Gaussian data has been a central focus in statistics and probability in recent years. While convex cases, such as the LASSO,…
-
Applications of Statistical Field Theory in Deep Learning
Applications of Statistical Field Theory in Deep Learning arXiv:2502.18553v1 Announce Type: new Abstract: Deep learning algorithms have made incredible strides in the past decade yet due to the complexity of these algorithms, the science of deep learning remains in its early stages. Being an experimentally driven field, it is natural to seek a theory of…
-
Learning and Computation of $\Phi$-Equilibria at the Frontier of Tractability
Learning and Computation of $\Phi$-Equilibria at the Frontier of Tractability arXiv:2502.18582v1 Announce Type: new Abstract: $\Phi$-equilibria, and the associated notion of $\Phi$-regret, are a powerful and flexible framework at the heart of online learning and game theory, whereby enriching the set of deviations $\Phi$ begets stronger notions of rationality. Recently, Daskalakis, Farina, Fishelson,…
-
Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood
Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood arXiv:2502.19086v1 Announce Type: new Abstract: We introduce the use of Gaussian Processes (GPs) for the probabilistic forecasting of intermittent time series. The model is trained in a Bayesian framework that accounts for the uncertainty about the latent function and marginalizes it out when making predictions.…
-
Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data
Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data arXiv:2502.18756v1 Announce Type: new Abstract: Motivation: Biomedical studies increasingly produce multi-view high-dimensional datasets (e.g., multi-omics) that demand integrative analysis. Existing canonical correlation analysis (CCA) and generalized CCA methods address at most two of the following three key aspects simultaneously: (i) nonlinear dependence, (ii) sparsity for…
-
Enhancing Gradient-based Discrete Sampling via Parallel Tempering
Enhancing Gradient-based Discrete Sampling via Parallel Tempering arXiv:2502.19240v1 Announce Type: new Abstract: While gradient-based discrete samplers are effective in sampling from complex distributions, they are susceptible to getting trapped in local minima, particularly in high-dimensional, multimodal discrete distributions, owing to the discontinuities inherent in these landscapes. To circumvent this issue, we combine parallel tempering, also…
-
Are GNNs doomed by the topology of their input graph?
Are GNNs doomed by the topology of their input graph? arXiv:2502.17739v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable success in learning from graph-structured data. However, the influence of the input graph’s topology on GNN behavior remains poorly understood. In this work, we explore whether GNNs are inherently limited by the structure…
-
An Overview of Large Language Models for Statisticians
An Overview of Large Language Models for Statisticians arXiv:2502.17814v1 Announce Type: new Abstract: Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI), exhibiting remarkable capabilities across diverse tasks such as text generation, reasoning, and decision-making. While their success has primarily been driven by advances in computational power and deep learning architectures,…
-
Conformal Prediction Under Generalized Covariate Shift with Posterior Drift
Conformal Prediction Under Generalized Covariate Shift with Posterior Drift arXiv:2502.17744v1 Announce Type: new Abstract: In many real applications of statistical learning, collecting sufficiently many training data is often expensive, time-consuming, or even unrealistic. In this case, a transfer learning approach, which aims to leverage knowledge from a related source domain to improve the learning performance…
-
Golden Ratio Mixing of Real and Synthetic Data for Stabilizing Generative Model Training
Golden Ratio Mixing of Real and Synthetic Data for Stabilizing Generative Model Training arXiv:2502.18049v1 Announce Type: new Abstract: Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and developing more effective training strategies…
-
Near-Optimal Approximations for Bayesian Inference in Function Space
Near-Optimal Approximations for Bayesian Inference in Function Space arXiv:2502.18279v1 Announce Type: new Abstract: We propose a scalable inference algorithm for Bayes posteriors defined on a reproducing kernel Hilbert space (RKHS). Given a likelihood function and a Gaussian random element representing the prior, the corresponding Bayes posterior measure $\Pi_{\text{B}}$ can be obtained as the stationary distribution…
-
Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements
Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements arXiv:2502.16008v1 Announce Type: new Abstract: We consider the problem of exact recovery of a $k$-sparse binary vector from generalized linear measurements (such as logistic regression). We analyze the linear estimation algorithm (Plan, Vershynin, Yudovina, 2017), and also show information theoretic lower bounds on the number…
-
A Review of Causal Decision Making
A Review of Causal Decision Making arXiv:2502.16156v1 Announce Type: new Abstract: To make effective decisions, it is important to have a thorough understanding of the causal relationships among actions, environments, and outcomes. This review aims to surface three crucial aspects of decision-making through a causal lens: 1) the discovery of causal relationships through causal structure…
-
Statistical Inference in Reinforcement Learning: A Selective Survey
Statistical Inference in Reinforcement Learning: A Selective Survey arXiv:2502.16195v1 Announce Type: new Abstract: Reinforcement learning (RL) is concerned with how intelligent agents take actions in a given environment to maximize the cumulative reward they receive. In healthcare, applying RL algorithms could assist patients in improving their health status. In ride-sharing platforms, applying RL algorithms could…
-
Rectifying Conformity Scores for Better Conditional Coverage
Rectifying Conformity Scores for Better Conditional Coverage arXiv:2502.16336v1 Announce Type: new Abstract: We present a new method for generating confidence sets within the split conformal prediction framework. Our method performs a trainable transformation of any given conformity score to improve conditional coverage while ensuring exact marginal coverage. The transformation is based on an estimate of…
-
Subspace Recovery in Winsorized PCA: Insights into Accuracy and Robustness
Subspace Recovery in Winsorized PCA: Insights into Accuracy and Robustness arXiv:2502.16391v1 Announce Type: new Abstract: In this paper, we explore the theoretical properties of subspace recovery using Winsorized Principal Component Analysis (WPCA), utilizing a common data transformation technique that caps extreme values to mitigate the impact of outliers. Despite the widespread use of winsorization in…
-
Modifying Final Splits of Classification Tree for Fine-tuning Subpopulation Target in Policy Making
Modifying Final Splits of Classification Tree for Fine-tuning Subpopulation Target in Policy Making arXiv:2502.15072v1 Announce Type: new Abstract: Policymakers often use Classification and Regression Trees (CART) to partition populations based on binary outcomes and target subpopulations whose probability of the binary event exceeds a threshold. However, classic CART and knowledge distillation method whose student model…
-
Variational phylogenetic inference with products over bipartitions
Variational phylogenetic inference with products over bipartitions arXiv:2502.15110v1 Announce Type: new Abstract: Bayesian phylogenetics requires accurate and efficient approximation of posterior distributions over trees. In this work, we develop a variational Bayesian approach for ultrametric phylogenetic trees. We present a novel variational family based on coalescent times of a single-linkage clustering and derive a closed-form…
-
Tensor Product Neural Networks for Functional ANOVA Model
Tensor Product Neural Networks for Functional ANOVA Model arXiv:2502.15215v1 Announce Type: new Abstract: Interpretability for machine learning models is becoming more and more important as machine learning models become more complex. The functional ANOVA model, which decomposes a high-dimensional function into a sum of lower-dimensional functions, so-called components, is one of the most…
-
Fréchet Cumulative Covariance Net for Deep Nonlinear Sufficient Dimension Reduction with Random Objects
Fréchet Cumulative Covariance Net for Deep Nonlinear Sufficient Dimension Reduction with Random Objects arXiv:2502.15374v1 Announce Type: new Abstract: Nonlinear sufficient dimension reduction \citep{libing_generalSDR}, which constructs nonlinear low-dimensional representations to summarize essential features of high-dimensional data, is an important branch of representation learning. However, most existing methods are not applicable when the response variables are complex non-Euclidean…
-
Towards a perturbation-based explanation for medical AI as differentiable programs
Towards a perturbation-based explanation for medical AI as differentiable programs arXiv:2502.14001v1 Announce Type: new Abstract: Recent advances in machine learning algorithms have reached a point where medical devices can be equipped with artificial intelligence (AI) models for diagnostic support and routine automation in clinical settings. In medicine and healthcare, there is a particular demand for sufficient…
-
New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition
New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition arXiv:2502.14060v1 Announce Type: new Abstract: We study fundamental limits of first-order stochastic optimization in a range of nonconvex settings, including L-smooth functions satisfying Quasar-Convexity (QC), Quadratic Growth (QG), and Restricted Secant Inequalities (RSI). While the convergence properties of standard algorithms are well-understood in deterministic regimes,…
-
Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs
Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs arXiv:2502.14121v1 Announce Type: new Abstract: Designing modern industrial systems requires balancing several competing objectives, such as profitability, resilience, and sustainability, while accounting for complex interactions between technological, economic, and environmental factors. Multi-objective optimization (MOO) methods are commonly used to navigate…
-
Conformal Prediction under Lévy-Prokhorov Distribution Shifts: Robustness to Local and Global Perturbations
Conformal Prediction under Lévy-Prokhorov Distribution Shifts: Robustness to Local and Global Perturbations arXiv:2502.14105v1 Announce Type: new Abstract: Conformal prediction provides a powerful framework for constructing prediction intervals with finite-sample guarantees, yet its robustness under distribution shifts remains a significant challenge. This paper addresses this limitation by modeling distribution shifts using Lévy-Prokhorov (LP) ambiguity sets, which…
-
Prediction-Powered Adaptive Shrinkage Estimation
Prediction-Powered Adaptive Shrinkage Estimation arXiv:2502.14166v1 Announce Type: new Abstract: Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI’s benefits for individual statistical tasks, modern applications require answering numerous parallel statistical questions. We introduce Prediction-Powered Adaptive Shrinkage (PAS),…
-
Model selection for behavioral learning data and applications to contextual bandits
Model selection for behavioral learning data and applications to contextual bandits arXiv:2502.13186v1 Announce Type: new Abstract: Learning for animals or humans is the process that leads to behaviors better adapted to the environment. This process highly depends on the individual that learns and is usually observed only through the individual’s actions. This article presents ways…
-
Task Shift: From Classification to Regression in Overparameterized Linear Models
Task Shift: From Classification to Regression in Overparameterized Linear Models arXiv:2502.13285v1 Announce Type: new Abstract: Modern machine learning methods have recently demonstrated remarkable capability to generalize under task shift, where latent knowledge is transferred to a different, often more difficult, task under a similar data distribution. We investigate this phenomenon in an overparameterized linear regression…
-
An Efficient Permutation-Based Kernel Two-Sample Test
An Efficient Permutation-Based Kernel Two-Sample Test arXiv:2502.13570v1 Announce Type: new Abstract: Two-sample hypothesis testing, determining whether two sets of data are drawn from the same distribution, is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing, maximum mean discrepancy (MMD) has gained popularity as a test statistic due…
-
Identifying metric structures of deep latent variable models
Identifying metric structures of deep latent variable models arXiv:2502.13757v1 Announce Type: new Abstract: Deep latent variable models learn condensed representations of data that, hopefully, reflect the inner workings of the studied phenomena. Unfortunately, these latent representations are not statistically identifiable, meaning they cannot be uniquely determined. Domain experts, therefore, need to tread carefully when interpreting…
-
Graph Signal Inference by Learning Narrowband Spectral Kernels
Graph Signal Inference by Learning Narrowband Spectral Kernels arXiv:2502.13686v1 Announce Type: new Abstract: While a common assumption in graph signal analysis is the smoothness of the signals or the band-limitedness of their spectrum, in many instances the spectrum of real graph data may be concentrated at multiple regions of the spectrum, possibly including mid-to-high-frequency components.…
-
Suboptimal Shapley Value Explanations
Suboptimal Shapley Value Explanations arXiv:2502.12209v1 Announce Type: new Abstract: Deep Neural Networks (DNNs) have demonstrated strong capacity in supporting a wide variety of applications. Shapley value has emerged as a prominent tool to analyze feature importance to help people understand the inference process of deep neural models. Computing Shapley value function requires choosing a baseline…
-
The Majority Vote Paradigm Shift: When Popular Meets Optimal
The Majority Vote Paradigm Shift: When Popular Meets Optimal arXiv:2502.12581v1 Announce Type: new Abstract: Reliably labelling data typically requires annotations from multiple human workers. However, humans are far from being perfect. Hence, it is a common practice to aggregate labels gathered from multiple annotators to make a more confident estimate of the true label. Among…
-
Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation
Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation arXiv:2502.12607v1 Announce Type: new Abstract: We propose Duality Gap KIP (DGKIP), an extension of the Kernel Inducing Points (KIP) method for dataset distillation. While existing dataset distillation methods often rely on bi-level optimization, DGKIP eliminates the need for such optimization by leveraging duality theory in…
-
Green LIME: Improving AI Explainability through Design of Experiments
Green LIME: Improving AI Explainability through Design of Experiments arXiv:2502.12753v1 Announce Type: new Abstract: In artificial intelligence (AI), the complexity of many models and processes often surpasses human interpretability, making it challenging to understand why a specific prediction is made. This lack of transparency is particularly problematic in critical fields like healthcare, where trust in…
-
Federated Variational Inference for Bayesian Mixture Models
Federated Variational Inference for Bayesian Mixture Models arXiv:2502.12684v1 Announce Type: new Abstract: We present a federated learning approach for Bayesian model-based clustering of large-scale binary and categorical datasets. We introduce a principled ‘divide and conquer’ inference procedure using variational inference with local merge and delete moves within batches of the data in parallel, followed by…
-
Forecasting time series with constraints
Forecasting time series with constraints arXiv:2502.10485v1 Announce Type: new Abstract: Time series forecasting presents unique challenges that limit the effectiveness of traditional machine learning algorithms. To address these limitations, various approaches have incorporated linear constraints into learning algorithms, such as generalized additive models and hierarchical forecasting. In this paper, we propose a unified framework for…
-
Weighted quantization using MMD: From mean field to mean shift via gradient flows
Weighted quantization using MMD: From mean field to mean shift via gradient flows arXiv:2502.10600v1 Announce Type: new Abstract: Approximating a probability distribution using a set of particles is a fundamental problem in machine learning and statistics, with applications including clustering and quantization. Formally, we seek a finite weighted mixture of Dirac measures that best approximates…
-
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm arXiv:2502.10650v1 Announce Type: new Abstract: Advances in deep learning and representation learning have transformed item factor analysis (IFA) in the item response theory (IRT) literature by enabling more efficient and accurate parameter estimation. Variational Autoencoders (VAEs) have been one of the most…
-
Batch-Adaptive Annotations for Causal Inference with Complex-Embedded Outcomes
Batch-Adaptive Annotations for Causal Inference with Complex-Embedded Outcomes arXiv:2502.10605v1 Announce Type: new Abstract: Estimating the causal effects of an intervention on outcomes is crucial. But often in domains such as healthcare and social services, this critical information about outcomes is documented by unstructured text, e.g. clinical notes in healthcare or case notes in social services.…
-
Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training
Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training arXiv:2502.10793v1 Announce Type: new Abstract: Existing methods for measuring training sample influence on models only provide static, overall measurements, overlooking how sample influence changes during training. We propose Dynamic Influence Tracker (DIT), which captures the time-varying sample influence across arbitrary time windows during training. DIT offers…
-
Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs
Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs arXiv:2502.09832v1 Announce Type: new Abstract: In this paper, assuming a natural strengthening of the low-degree conjecture, we provide evidence of computational hardness for two problems: (1) the (partial) matching recovery problem in the sparse correlated Erdős-Rényi graphs $\mathcal{G}(n,q;\rho)$ when the edge-density $q=n^{-1+o(1)}$ and…
-
On Volume Minimization in Conformal Regression
On Volume Minimization in Conformal Regression arXiv:2502.09985v1 Announce Type: new Abstract: We study the question of volume optimality in split conformal regression, a topic still poorly understood in comparison to coverage control. Using the fact that the calibration step can be seen as an empirical volume minimization problem, we first derive a finite-sample upper-bound on…
-
Estimation of the Learning Coefficient Using Empirical Loss
Estimation of the Learning Coefficient Using Empirical Loss arXiv:2502.09998v1 Announce Type: new Abstract: The learning coefficient plays a crucial role in analyzing the performance of information criteria, such as the Widely Applicable Information Criterion (WAIC) and the Widely Applicable Bayesian Information Criterion (WBIC), which Sumio Watanabe developed to assess model generalization ability. In regular statistical…
-
Improved Online Confidence Bounds for Multinomial Logistic Bandits
Improved Online Confidence Bounds for Multinomial Logistic Bandits arXiv:2502.10020v1 Announce Type: new Abstract: In this paper, we propose an improved online confidence bound for multinomial logistic (MNL) models and apply this result to MNL bandits, achieving variance-dependent optimal regret. Recently, Lee & Oh (2024) established an online confidence bound for MNL models and achieved nearly…
-
Combinatorial Reinforcement Learning with Preference Feedback
Combinatorial Reinforcement Learning with Preference Feedback arXiv:2502.10158v1 Announce Type: new Abstract: In this paper, we consider combinatorial reinforcement learning with preference feedback, where a learning agent sequentially offers an action (an assortment of multiple items) to a user, whose preference feedback follows a multinomial logistic (MNL) model. This framework allows us to model real-world scenarios, particularly those…
-
A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection
A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection arXiv:2502.08695v1 Announce Type: new Abstract: Bayesian nonparametric methods are naturally suited to the problem of out-of-distribution (OOD) detection. However, these techniques have largely been eschewed in favor of simpler methods based on distances between pre-trained or learned embeddings of data points. Here we…
-
Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition
Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition arXiv:2502.09047v1 Announce Type: new Abstract: A common pursuit in modern statistical learning is to attain satisfactory generalization out of the source data distribution (OOD). In theory, the challenge remains unsolved even under the canonical setting of covariate shift for the linear model.…
-
Off-Policy Evaluation for Recommendations with Missing-Not-At-Random Rewards
Off-Policy Evaluation for Recommendations with Missing-Not-At-Random Rewards arXiv:2502.08993v1 Announce Type: new Abstract: Unbiased recommender learning (URL) and off-policy evaluation/learning (OPE/L) techniques are effective in addressing the data bias caused by display position and logging policies, thereby consistently improving the performance of recommendations. However, when both biases exist in the logged data, these estimators may suffer…
-
Non-asymptotic Analysis of Diffusion Annealed Langevin Monte Carlo for Generative Modelling
Non-asymptotic Analysis of Diffusion Annealed Langevin Monte Carlo for Generative Modelling arXiv:2502.09306v1 Announce Type: new Abstract: We investigate the theoretical properties of general diffusion (interpolation) paths and their Langevin Monte Carlo implementation, referred to as diffusion annealed Langevin Monte Carlo (DALMC), under weak conditions on the data distribution. Specifically, we analyse and provide non-asymptotic error…
-
A Differentiable Rank-Based Objective For Better Feature Learning
A Differentiable Rank-Based Objective For Better Feature Learning arXiv:2502.09445v1 Announce Type: new Abstract: In this paper, we leverage existing statistical methods to better understand feature learning from data. We tackle this by modifying the model-free variable selection method, Feature Ordering by Conditional Independence (FOCI), which is introduced in \cite{azadkia2021simple}. While FOCI is based on a…
-
SNAP: Sequential Non-Ancestor Pruning for Targeted Causal Effect Estimation With an Unknown Graph
SNAP: Sequential Non-Ancestor Pruning for Targeted Causal Effect Estimation With an Unknown Graph arXiv:2502.07857v1 Announce Type: new Abstract: Causal discovery can be computationally demanding for large numbers of variables. If we only wish to estimate the causal effects on a small subset of target variables, we might not need to learn the causal graph for…
-
Discrete Markov Probabilistic Models
Discrete Markov Probabilistic Models arXiv:2502.07939v1 Announce Type: new Abstract: This paper introduces the Discrete Markov Probabilistic Model (DMPM), a novel algorithm for discrete data generation. The algorithm operates in the space of bits $\{0,1\}^d$, where the noising process is a continuous-time Markov chain that can be sampled exactly via a Poissonian clock that flips labels…
-
The Observational Partial Order of Causal Structures with Latent Variables
The Observational Partial Order of Causal Structures with Latent Variables arXiv:2502.07891v1 Announce Type: new Abstract: For two causal structures with the same set of visible variables, one is said to observationally dominate the other if the set of distributions over the visible variables realizable by the first contains the set of distributions over the visible…
-
Optimizing Likelihoods via Mutual Information: Bridging Simulation-Based Inference and Bayesian Optimal Experimental Design
Optimizing Likelihoods via Mutual Information: Bridging Simulation-Based Inference and Bayesian Optimal Experimental Design arXiv:2502.08004v1 Announce Type: new Abstract: Simulation-based inference (SBI) is a method to perform inference on a variety of complex scientific models with challenging inference (inverse) problems. Bayesian Optimal Experimental Design (BOED) aims to efficiently use experimental resources to make better inferences. Various…
-
Multi-View Oriented GPLVM: Expressiveness and Efficiency
Multi-View Oriented GPLVM: Expressiveness and Efficiency arXiv:2502.08253v1 Announce Type: new Abstract: The multi-view Gaussian process latent variable model (MV-GPLVM) aims to learn a unified representation from multi-view data but is hindered by challenges such as limited kernel expressiveness and low computational efficiency. To overcome these issues, we first introduce a new duality between the spectral…
-
Confidence Intervals for Evaluation of Data Mining
Confidence Intervals for Evaluation of Data Mining arXiv:2502.07016v1 Announce Type: new Abstract: In data mining, when binary prediction rules are used to predict a binary outcome, many performance measures are used in a vast array of literature for the purposes of evaluation and comparison. Some examples include classification accuracy, precision, recall, F measures, and Jaccard…
-
Epistemic Uncertainty in Conformal Scores: A Unified Approach
Epistemic Uncertainty in Conformal Scores: A Unified Approach arXiv:2502.06995v1 Announce Type: new Abstract: Conformal prediction methods create prediction bands with distribution-free guarantees but do not explicitly capture epistemic uncertainty, which can lead to overconfident predictions in data-sparse regions. Although recent conformal scores have been developed to address this limitation, they are typically designed for specific…
-
Generative Distribution Prediction: A Unified Approach to Multimodal Learning
Generative Distribution Prediction: A Unified Approach to Multimodal Learning arXiv:2502.07090v1 Announce Type: new Abstract: Accurate prediction with multimodal data, encompassing tabular, textual, and visual inputs or outputs, is fundamental to advancing analytics in diverse application domains. Traditional approaches often struggle to integrate heterogeneous data types while maintaining high predictive accuracy. We introduce Generative Distribution Prediction (GDP), a…
-
Online Covariance Matrix Estimation in Sketched Newton Methods
Online Covariance Matrix Estimation in Sketched Newton Methods arXiv:2502.07114v1 Announce Type: new Abstract: Given the ubiquity of streaming data, online algorithms have been widely used for parameter estimation, with second-order methods particularly standing out for their efficiency and robustness. In this paper, we study an online sketched Newton method that leverages a randomized sketching technique…
-
Riemannian Proximal Sampler for High-accuracy Sampling on Manifolds
Riemannian Proximal Sampler for High-accuracy Sampling on Manifolds arXiv:2502.07265v1 Announce Type: new Abstract: We introduce the Riemannian Proximal Sampler, a method for sampling from densities defined on Riemannian manifolds. The performance of this sampler critically depends on two key oracles: the Manifold Brownian Increments (MBI) oracle and the Riemannian Heat-kernel (RHK) oracle. We establish high-accuracy…
-
Online Covariance Estimation in Nonsmooth Stochastic Approximation
Online Covariance Estimation in Nonsmooth Stochastic Approximation arXiv:2502.05305v1 Announce Type: new Abstract: We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting covariance matrix in the local minimax sense of Hájek and Le Cam.…
-
dynoGP: Deep Gaussian Processes for dynamic system identification
dynoGP: Deep Gaussian Processes for dynamic system identification arXiv:2502.05620v1 Announce Type: new Abstract: In this work, we present a novel approach to system identification for dynamical systems, based on a specific class of Deep Gaussian Processes (Deep GPs). These models are constructed by interconnecting linear dynamic GPs (equivalent to stochastic linear time-invariant dynamical systems) and…