Category: stat.ML

  • TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models

    arXiv:2507.10643v1 Announce Type: new Abstract: Existing post-hoc model-agnostic methods generate external explanations for opaque models, primarily by locally attributing the model output to its input features. However, they often lack an explicit and systematic framework for quantifying the contribution of individual features. Building…

  • Robust Multi-Manifold Clustering via Simplex Paths

    arXiv:2507.10710v1 Announce Type: new Abstract: This article introduces a novel, geometric approach for multi-manifold clustering (MMC), i.e. for clustering a collection of potentially intersecting, d-dimensional manifolds into the individual manifold components. We first compute a locality graph on d-simplices, using the dihedral angle in between adjacent simplices as the…

  • GOLFS: Feature Selection via Combining Both Global and Local Information for High Dimensional Clustering

    arXiv:2507.10956v1 Announce Type: new Abstract: It is important to identify the discriminative features for high dimensional clustering. However, due to the lack of cluster labels, the regularization methods developed for supervised feature selection cannot be directly applied. To learn the…

  • Interpretable Bayesian Tensor Network Kernel Machines with Automatic Rank and Feature Selection

    arXiv:2507.11136v1 Announce Type: new Abstract: Tensor Network (TN) Kernel Machines speed up model learning by representing parameters as low-rank TNs, reducing computation and memory use. However, most TN-based Kernel methods are deterministic and ignore parameter uncertainty. Further, they require manual tuning of model…

  • How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction

    arXiv:2507.11161v1 Announce Type: new Abstract: In recent years, contrastive learning has achieved state-of-the-art performance in the territory of self-supervised representation learning. Many previous works have attempted to provide the theoretical understanding underlying the success of contrastive learning. Almost all of them rely…

  • The Bayesian Approach to Continual Learning: An Overview

    arXiv:2507.08922v1 Announce Type: new Abstract: Continual learning is an online paradigm where a learner continually accumulates knowledge from different tasks encountered over sequential time steps. Importantly, the learner is required to extend and update its knowledge without forgetting about the learning experience acquired from the past, and…

  • Physics-informed machine learning: A mathematical framework with applications to time series forecasting

    arXiv:2507.08906v1 Announce Type: new Abstract: Physics-informed machine learning (PIML) is an emerging framework that integrates physical knowledge into machine learning models. This physical prior often takes the form of a partial differential equation (PDE) system that the regression function must satisfy. In the…

  • Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization

    arXiv:2507.09093v1 Announce Type: new Abstract: We study convergence in high-probability of SGD-type methods in non-convex optimization and the presence of heavy-tailed noise. To combat the heavy-tailed noise, a general black-box nonlinear framework is considered, subsuming nonlinearities like sign, clipping, normalization and their smooth counterparts.…

  • Fixed-Confidence Multiple Change Point Identification under Bandit Feedback

    arXiv:2507.08994v1 Announce Type: new Abstract: Piecewise constant functions describe a variety of real-world phenomena in domains ranging from chemistry to manufacturing. In practice, it is often required to confidently identify the locations of the abrupt changes in these functions as quickly as possible. For this, we introduce…

  • CoVAE: Consistency Training of Variational Autoencoders

    arXiv:2507.09103v1 Announce Type: new Abstract: Current state-of-the-art generative approaches frequently rely on a two-stage training procedure, where an autoencoder (often a VAE) first performs dimensionality reduction, followed by training a generative model on the learned latent space. While effective, this introduces computational overhead and increased sampling times. We challenge…

  • Mallows Model with Learned Distance Metrics: Sampling and Maximum Likelihood Estimation

    arXiv:2507.08108v1 Announce Type: new Abstract: The Mallows model is a widely used probabilistic framework for learning from ranking data, with applications ranging from recommendation systems and voting to aligning language models with human preferences [chen2024mallows, kleinberg2021algorithmic, rafailov2024direct]. Under this model, observed rankings are noisy perturbations of a…

  • CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk

    arXiv:2507.08150v1 Announce Type: new Abstract: Accurate uncertainty quantification is critical for reliable predictive modeling, especially in regression tasks. Existing methods typically address either aleatoric uncertainty from measurement noise or epistemic uncertainty from limited data, but not necessarily both in a balanced way. We propose CLEAR, a calibration…

  • MIRRAMS: Towards Training Models Robust to Missingness Distribution Shifts

    arXiv:2507.08280v1 Announce Type: new Abstract: In real-world data analysis, missingness distributional shifts between training and test input datasets frequently occur, posing a significant challenge to achieving robust prediction performance. In this study, we propose a novel deep learning framework designed to address such shifts in missingness…

  • Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial Attacks

    arXiv:2507.08261v1 Announce Type: new Abstract: Batch normalization (BN) is a ubiquitous operation in deep neural networks used primarily to achieve stability and regularization during network training. BN involves feature map centering and scaling using sample means and variances, respectively. Since these statistics…

  • Optimal and Practical Batched Linear Bandit Algorithm

    arXiv:2507.08438v1 Announce Type: new Abstract: We study the linear bandit problem under limited adaptivity, known as the batched linear bandit. While existing approaches can achieve near-optimal regret in theory, they are often computationally prohibitive or underperform in practice. We propose BLAE, a novel batched algorithm that integrates arm…

  • Topological Machine Learning with Unreduced Persistence Diagrams

    arXiv:2507.07156v1 Announce Type: new Abstract: Supervised machine learning pipelines trained on features derived from persistent homology have been experimentally observed to ignore much of the information contained in a persistence diagram. Computing persistence diagrams is often the most computationally demanding step in such a pipeline, however. To explore…

  • Class conditional conformal prediction for multiple inputs by p-value aggregation

    arXiv:2507.07150v1 Announce Type: new Abstract: Conformal prediction methods are statistical tools designed to quantify uncertainty and generate predictive sets with guaranteed coverage probabilities. This work introduces an innovative refinement to these methods for classification tasks, specifically tailored for scenarios where multiple observations (multi-inputs) of a…

  • Bayesian Double Descent

    arXiv:2507.07338v1 Announce Type: new Abstract: Double descent is a phenomenon of over-parameterized statistical models. Our goal is to view double descent from a Bayesian perspective. Over-parameterized models such as deep neural networks have an interesting re-descending property in their risk characteristics. This is a recent phenomenon in machine learning and has been…

  • Hess-MC2: Sequential Monte Carlo Squared using Hessian Information and Second Order Proposals

    arXiv:2507.07461v1 Announce Type: new Abstract: When performing Bayesian inference using Sequential Monte Carlo (SMC) methods, two considerations arise: the accuracy of the posterior approximation and computational efficiency. To address computational demands, Sequential Monte Carlo Squared (SMC$^2$) is well-suited for high-performance computing (HPC) environments.…

  • Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting

    arXiv:2507.07469v1 Announce Type: new Abstract: Time-series models like ARIMA remain widely used for forecasting but are limited by linear assumptions and high computational cost on large and complex datasets. We propose Galerkin-ARIMA, which generalizes the AR component of ARIMA and replaces it with a flexible…

  • On the Hardness of Unsupervised Domain Adaptation: Optimal Learners and Information-Theoretic Perspective

    arXiv:2507.06552v1 Announce Type: new Abstract: This paper studies the hardness of unsupervised domain adaptation (UDA) under covariate shift. We model the uncertainty that the learner faces by a distribution $\pi$ in the ground-truth triples $(p, q, f)$, which we call a UDA…

  • Semi-parametric Functional Classification via Path Signatures Logistic Regression

    arXiv:2507.06637v1 Announce Type: new Abstract: We propose Path Signatures Logistic Regression (PSLR), a semi-parametric framework for classifying vector-valued functional data with scalar covariates. Classical functional logistic regression models rely on linear assumptions and fixed basis expansions, which limit flexibility and degrade performance under irregular sampling. PSLR overcomes…

  • Fast Gaussian Processes under Monotonicity Constraints

    arXiv:2507.06677v1 Announce Type: new Abstract: Gaussian processes (GPs) are widely used as surrogate models for complicated functions in scientific and engineering applications. In many cases, prior knowledge about the function to be approximated, such as monotonicity, is available and can be leveraged to improve model fidelity. Incorporating such constraints…

  • Conformal Prediction for Long-Tailed Classification

    arXiv:2507.06867v1 Announce Type: new Abstract: Many real-world classification problems, such as plant identification, have extremely long-tailed class distributions. In order for prediction sets to be useful in such settings, they should (i) provide good class-conditional coverage, ensuring that rare classes are not systematically omitted from the prediction sets, and (ii)…

  • Adaptive collaboration for online personalized distributed learning with heterogeneous clients

    arXiv:2507.06844v1 Announce Type: new Abstract: We study the problem of online personalized decentralized learning with $N$ statistically heterogeneous clients collaborating to accelerate local training. An important challenge in this setting is to select relevant collaborators to reduce gradient variance while mitigating the introduced bias. To…

  • Temporal Conformal Prediction (TCP): A Distribution-Free Statistical and Machine Learning Framework for Adaptive Risk Forecasting

    arXiv:2507.05470v1 Announce Type: new Abstract: We propose Temporal Conformal Prediction (TCP), a novel framework for constructing prediction intervals in financial time-series with guaranteed finite-sample validity. TCP integrates quantile regression with a conformal calibration layer that adapts online via a decaying…

  • Enjoying Non-linearity in Multinomial Logistic Bandits

    arXiv:2507.05306v1 Announce Type: new Abstract: We consider the multinomial logistic bandit problem, a variant of generalized linear bandits where a learner interacts with an environment by selecting actions to maximize expected rewards based on probabilistic feedback from multiple possible outcomes. In the binary setting, recent work has focused on…

  • A Malliavin calculus approach to score functions in diffusion generative models

    arXiv:2507.05550v1 Announce Type: new Abstract: Score-based diffusion generative models have recently emerged as a powerful tool for modelling complex data distributions. These models aim at learning the score function, which defines a map from a known probability distribution to the target data distribution via…

  • Property Elicitation on Imprecise Probabilities

    arXiv:2507.05857v1 Announce Type: new Abstract: Property elicitation studies which attributes of a probability distribution can be determined by minimising a risk. We investigate a generalisation of property elicitation to imprecise probabilities (IP). This investigation is motivated by multi-distribution learning, which takes the classical machine learning paradigm of minimising a single…

  • Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis

    arXiv:2507.05913v1 Announce Type: new Abstract: A simple yet effective method for inference-time alignment of generative models is Best-of-$N$ (BoN), where $N$ outcomes are sampled from a reference policy, evaluated using a proxy reward model, and the highest-scoring one is selected. While prior work argues that…

  • Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation

    arXiv:2507.03169v1 Announce Type: new Abstract: The rise of generative AI search engines is disrupting traditional SEO, with Gartner predicting a 25% reduction in conventional search usage by 2026. This necessitates new approaches for web content visibility in AI-driven search environments. We present a domain-specific fine-tuning approach…

  • LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference

    arXiv:2507.03271v1 Announce Type: new Abstract: Causal forest methods are powerful tools in causal inference. Similar to traditional random forest in machine learning, causal forest independently considers each causal tree. However, this independence consideration increases the likelihood that classification errors in one…

  • Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data

    arXiv:2507.03681v1 Announce Type: new Abstract: Randomized trials are typically designed to detect average treatment effects but often lack the statistical power to uncover effect heterogeneity over patient characteristics, limiting their value for personalized decision-making. To address this, we propose the QR-learner, a model-agnostic…

  • Determination of Particle-Size Distributions from Light-Scattering Measurement Using Constrained Gaussian Process Regression

    arXiv:2507.03736v1 Announce Type: new Abstract: In this work, we propose a novel methodology for robustly estimating particle size distributions from optical scattering measurements using constrained Gaussian process regression. The estimation of particle size distributions is commonly formulated as a Fredholm integral equation of…

  • Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis

    arXiv:2507.03756v1 Announce Type: new Abstract: The success of denoising diffusion models raises important questions regarding their generalisation behaviour, particularly in high-dimensional settings. Notably, it has been shown that when training and sampling are performed perfectly, these models memorise training data, implying that some form of…

  • Adaptive Iterative Soft-Thresholding Algorithm with the Median Absolute Deviation

    arXiv:2507.02084v1 Announce Type: new Abstract: The adaptive Iterative Soft-Thresholding Algorithm (ISTA) has been a popular algorithm for finding a desirable solution to the LASSO problem without explicitly tuning the regularization parameter $\lambda$. Although the adaptive ISTA is a successful practical algorithm, few theoretical results exist.…

  • Hybrid least squares for learning functions from highly noisy data

    arXiv:2507.02215v1 Announce Type: new Abstract: Motivated by the need for efficient estimation of conditional expectations, we consider a least-squares function approximation problem with heavily polluted data. Existing methods that are powerful in the small noise regime are suboptimal when large noise is present. We propose…

  • Transfer Learning for Matrix Completion

    arXiv:2507.02248v1 Announce Type: new Abstract: In this paper, we explore knowledge transfer in the setting of matrix completion, which aims to enhance the estimation of a low-rank target matrix with auxiliary data available. We propose a transfer learning procedure given prior information on which source datasets are favorable. We…

  • It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation

    arXiv:2507.02275v1 Announce Type: new Abstract: Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a…

  • Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited

    arXiv:2507.02377v1 Announce Type: new Abstract: Inducing-point-based sparse variational Gaussian processes have become the standard workhorse for scaling up GP models. Recent advances show that these methods can be improved by introducing a diagonal scaling matrix to the conditional posterior density given the inducing points. This paper first…

  • Asymptotic convexity of wide and shallow neural networks

    arXiv:2507.01044v1 Announce Type: new Abstract: For a simple model of shallow and wide neural networks, we show that the epigraph of its input-output map as a function of the network parameters approximates the epigraph of a convex function in a precise sense. This leads to a plausible explanation…

  • Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles

    arXiv:2507.01542v1 Announce Type: new Abstract: Gaussian mixture models (GMMs) are ubiquitous in statistical learning, particularly for unsupervised problems. While full GMMs suffer from the overparameterization of their covariance matrices in high-dimensional spaces, spherical GMMs (with isotropic covariance matrices) certainly lack flexibility to fit certain anisotropic distributions. Connecting…

  • A generative modeling / Physics-Informed Neural Network approach to random differential equations

    arXiv:2507.01687v1 Announce Type: new Abstract: The integration of Scientific Machine Learning (SciML) techniques with uncertainty quantification (UQ) represents a rapidly evolving frontier in computational science. This work advances Physics-Informed Neural Networks (PINNs) by incorporating probabilistic frameworks to effectively model uncertainty in complex systems.…

  • When Less Is More: Binary Feedback Can Outperform Ordinal Comparisons in Ranking Recovery

    arXiv:2507.01613v1 Announce Type: new Abstract: Paired comparison data, where users evaluate items in pairs, play a central role in ranking and preference learning tasks. While ordinal comparison data intuitively offer richer information than binary comparisons, this paper challenges that conventional wisdom. We…

  • Proof of a perfect platonic representation hypothesis

    arXiv:2507.01098v1 Announce Type: cross Abstract: In this note, we elaborate on and explain in detail the proof given by Ziyin et al. (2025) of the “perfect” Platonic Representation Hypothesis (PRH) for the embedded deep linear network model (EDLN). We show that if trained with SGD, two EDLNs with…

  • Enhancing Interpretability in Generative Modeling: Statistically Disentangled Latent Spaces Guided by Generative Factors in Scientific Datasets

    arXiv:2507.00298v1 Announce Type: new Abstract: This study addresses the challenge of statistically extracting generative factors from complex, high-dimensional datasets in unsupervised or semi-supervised settings. We investigate encoder-decoder-based generative models for nonlinear dimensionality reduction, focusing on disentangling low-dimensional latent variables…

  • Disentangled Feature Importance

    arXiv:2507.00260v1 Announce Type: new Abstract: Feature importance quantification faces a fundamental challenge: when predictors are correlated, standard methods systematically underestimate their contributions. We prove that major existing approaches target identical population functionals under squared-error loss, revealing why they share this correlation-induced bias. To address this limitation, we introduce Disentangled Feature Importance (DFI),…

  • GRAND: Graph Release with Assured Node Differential Privacy

    arXiv:2507.00402v1 Announce Type: new Abstract: Differential privacy is a well-established framework for safeguarding sensitive information in data. While extensively applied across various domains, its application to network data, particularly at the node level, remains underexplored. Existing methods for node-level privacy either focus exclusively on query-based…

  • Forward Reverse Kernel Regression for the Schrödinger bridge problem

    arXiv:2507.00640v1 Announce Type: new Abstract: In this paper, we study the Schrödinger Bridge Problem (SBP), which is central to entropic optimal transport. For general reference processes and begin–endpoint distributions, we propose a forward-reverse iterative Monte Carlo procedure to approximate the Schrödinger potentials in a nonparametric way.…

  • An in depth look at the Procrustes-Wasserstein distance: properties and barycenters

    arXiv:2507.00894v1 Announce Type: new Abstract: Due to its invariance to rigid transformations such as rotations and reflections, Procrustes-Wasserstein (PW) was introduced in the literature as an optimal transport (OT) distance, alternative to Wasserstein and more suited to tasks such as the alignment and comparison…

  • Strategic A/B testing via Maximum Probability-driven Two-armed Bandit

    arXiv:2506.22536v1 Announce Type: new Abstract: Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because of their…

  • Adjoint Schrödinger Bridge Sampler

    arXiv:2506.22565v1 Announce Type: new Abstract: Computational methods for learning to sample from the Boltzmann distribution, where the target distribution is known only up to an unnormalized energy function, have advanced significantly recently. Due to the lack of explicit target samples, however, prior diffusion-based methods, known as diffusion samplers, often…

  • Bayesian Invariance Modeling of Multi-Environment Data

    arXiv:2506.22675v1 Announce Type: new Abstract: Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features, i.e., those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this…

  • CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation

    arXiv:2506.22963v1 Announce Type: new Abstract: Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on…

  • AICO: Feature Significance Tests for Supervised Learning

    arXiv:2506.23396v1 Announce Type: new Abstract: The opacity of many supervised learning algorithms remains a key challenge, hindering scientific discovery and limiting broader deployment, particularly in high-stakes domains. This paper develops model- and distribution-agnostic significance tests to assess the influence of input features in any regression or classification…

  • Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19

    arXiv:2506.21739v1 Announce Type: new Abstract: Authors Yi-Cheng Chen, Ping-En Lu, Cheng-Shang Chang, and Tzu-Hsuan Liu use the Finite Impulse Response (FIR) linear system filtering method to track and predict the number of people infected and recovered from COVID-19, in a…

  • Critically-Damped Higher-Order Langevin Dynamics

    arXiv:2506.21741v1 Announce Type: new Abstract: Denoising Diffusion Probabilistic Models represent an entirely new class of generative AI methods that have yet to be fully explored. Critical damping has been successfully introduced in Critically-Damped Langevin Dynamics (CLD) and Critically-Damped Third-Order Langevin Dynamics (TOLD++), but has not yet been applied to dynamics of…

  • TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics

    arXiv:2506.21757v1 Announce Type: new Abstract: Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images but typically suffer from inefficient sampling. Many solver designs and noise scheduling strategies have been proposed to dramatically improve sampling speeds. In this paper, we introduce a new sampling method that is…

  • Thompson Sampling in Function Spaces via Neural Operators

    arXiv:2506.21894v1 Announce Type: new Abstract: We propose an extension of Thompson sampling to optimization problems over function spaces where the objective is a known functional of an unknown operator’s output. We assume that functional evaluations are inexpensive, while queries to the operator (such as running a high-fidelity…

  • Classification with Reject Option: Distribution-free Error Guarantees via Conformal Prediction

    arXiv:2506.21802v1 Announce Type: new Abstract: Machine learning (ML) models always make a prediction, even when they are likely to be wrong. This causes problems in practical applications, as we do not know if we should trust a prediction. ML with reject option addresses this issue…

  • The final solution of the Hitchhiker’s problem #5

    arXiv:2506.20672v1 Announce Type: new Abstract: A recent survey, nicknamed “Hitchhiker’s Guide”, J.J. Arias-García, R. Mesiar, and B. De Baets, A hitchhiker’s guide to quasi-copulas, Fuzzy Sets and Systems 393 (2020) 1-28, has raised the rating of quasi-copula problems in the dependence modeling community in spite of the…

  • Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon

    arXiv:2506.20779v1 Announce Type: new Abstract: We study the implicit bias of flatness / low (loss) curvature and its effects on generalization in two-layer overparameterized ReLU networks with multivariate inputs, a problem well motivated by the minima stability and…

  • Active Learning for Manifold Gaussian Process Regression

    arXiv:2506.20928v1 Announce Type: new Abstract: This paper introduces an active learning framework for manifold Gaussian Process (GP) regression, combining manifold learning with strategic data selection to improve accuracy in high-dimensional spaces. Our method jointly optimizes a neural network for dimensionality reduction and a Gaussian process regressor in the…

  • Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics

    Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics arXiv:2506.20935v1 Announce Type: new Abstract: Forecasting geopolitical conflict from data sources like the Global Database of Events, Language, and Tone (GDELT) is a critical challenge for national security. The inherent sparsity, burstiness,…

  • Lower Bounds on the Size of Markov Equivalence Classes

    Lower Bounds on the Size of Markov Equivalence Classes arXiv:2506.20933v1 Announce Type: new Abstract: Causal discovery algorithms typically recover causal graphs only up to their Markov equivalence classes unless additional parametric assumptions are made. The sizes of these equivalence classes reflect the limits of what can be learned about the underlying causal graph from purely…
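
    For very small graphs, the size of a Markov equivalence class can be computed by brute force using the Verma and Pearl characterisation: two DAGs are Markov equivalent iff they share the same skeleton and the same v-structures. The sketch below (helper names are hypothetical) re-orients every edge of a DAG and counts the acyclic orientations with unchanged v-structures. It is exponential in the number of edges and only illustrates the quantity the paper bounds.

```python
from itertools import product

def is_acyclic(n, edges):
    """Kahn's algorithm: a directed graph on nodes 0..n-1 is a DAG iff all nodes get popped."""
    indeg = [0] * n
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    stack = [i for i in range(n) if indeg[i] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return seen == n

def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b and a, b non-adjacent."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    adjacent = {frozenset(e) for e in edges}
    return {(a, c, b) for c, ps in parents.items() for a in ps for b in ps
            if a < b and frozenset((a, b)) not in adjacent}

def mec_size(n, dag_edges):
    """Count DAGs with the same skeleton and v-structures as dag_edges."""
    target = v_structures(dag_edges)
    count = 0
    for flips in product([False, True], repeat=len(dag_edges)):
        cand = [(v, u) if f else (u, v) for (u, v), f in zip(dag_edges, flips)]
        if is_acyclic(n, cand) and v_structures(cand) == target:
            count += 1
    return count
```

    For example, the chain 0 → 1 → 2 has an equivalence class of size 3, while the collider 0 → 1 ← 2 is alone in its class.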

  • Data-Driven Dynamic Factor Modeling via Manifold Learning

    Data-Driven Dynamic Factor Modeling via Manifold Learning arXiv:2506.19945v1 Announce Type: new Abstract: We propose a data-driven dynamic factor framework where a response variable depends on a high-dimensional set of covariates, without imposing any parametric model on the joint dynamics. Leveraging Anisotropic Diffusion Maps, a nonlinear manifold learning technique introduced by Singer and Coifman, our framework…

  • A Principled Path to Fitted Distributional Evaluation

    A Principled Path to Fitted Distributional Evaluation arXiv:2506.20048v1 Announce Type: new Abstract: In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used fitted-Q evaluation — developed for expectation-based reinforcement learning — to…

  • Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

    Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives arXiv:2506.20114v1 Announce Type: new Abstract: Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an…

  • Valid Selection among Conformal Sets

    Valid Selection among Conformal Sets arXiv:2506.20173v1 Announce Type: new Abstract: Conformal prediction offers a distribution-free framework for constructing prediction sets with coverage guarantees. In practice, multiple valid conformal prediction sets may be available, arising from different models or methodologies. However, selecting the most desirable set, such as the smallest, can invalidate the coverage guarantees. To…

  • POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes

    POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes arXiv:2506.20406v1 Announce Type: new Abstract: Dynamic treatment regimes (DTRs) provide a principled framework for optimizing sequential decision-making in domains where decisions must adapt over time in response to individual trajectories, such as healthcare, education, and digital interventions. However, existing statistical methods often rely on…

  • Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions

    Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions arXiv:2506.19010v1 Announce Type: new Abstract: Causal decomposition analysis aims to assess the effect of modifying risk factors on reducing social disparities in outcomes. Recently, this analysis has incorporated individual characteristics when modifying risk factors by utilizing optimal treatment regimes (OTRs). Since the…

  • When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets

    When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets arXiv:2506.19031v1 Announce Type: new Abstract: While diffusion models generate high-quality images via probability flow, the theoretical understanding of this process remains incomplete. A key question is when probability flow converges to training samples or more general points on the data manifold.…

  • Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality

    Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality arXiv:2506.19144v1 Announce Type: new Abstract: This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures reflect the intrinsic dimensionality of the underlying function, thereby mitigating the curse of dimensionality. Our…

  • Rare dense solutions clusters in asymmetric binary perceptrons — local entropy via fully lifted RDT

    Rare dense solutions clusters in asymmetric binary perceptrons — local entropy via fully lifted RDT arXiv:2506.19276v1 Announce Type: new Abstract: We study the classical asymmetric binary perceptron (ABP) and the associated local entropy (LE) as a potential source of its algorithmic hardness. Isolation of typical ABP solutions in the SAT phase seemingly suggests a universal algorithmic hardness. Paradoxically, efficient…

  • Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks

    Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks arXiv:2506.19695v1 Announce Type: new Abstract: This paper studies the $\ell^p$-Lipschitz constants of ReLU neural networks $\Phi: \mathbb{R}^d \to \mathbb{R}$ with random parameters for $p \in [1,\infty]$. The distribution of the weights follows a variant of the He initialization and the biases are drawn…
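
    Because such a network is piecewise linear, its $\ell^p$-Lipschitz constant equals the supremum of $\|\nabla\Phi(x)\|_q$ with $1/p + 1/q = 1$, so input gradients at sampled points give a cheap lower bound. A sketch under stated assumptions: weights $N(0, 2/\text{fan-in})$ as in standard He initialization, biases $N(0, 0.1^2)$ and a Gaussian linear read-out are all illustrative choices; the paper's exact variant and bias distribution are not given in the truncated abstract.

```python
import numpy as np

def he_relu_net(d, width, depth, rng):
    """Random ReLU net R^d -> R. Weight/bias/read-out distributions are assumptions."""
    dims = [d] + [width] * depth
    Ws = [rng.normal(0.0, np.sqrt(2.0 / m), size=(k, m)) for m, k in zip(dims[:-1], dims[1:])]
    bs = [rng.normal(0.0, 0.1, size=k) for k in dims[1:]]
    w_out = rng.normal(0.0, np.sqrt(1.0 / width), size=width)
    return Ws, bs, w_out

def forward(net, x):
    Ws, bs, w_out = net
    h = x
    for W, b in zip(Ws, bs):
        h = np.maximum(W @ h + b, 0.0)
    return w_out @ h

def input_gradient(net, x):
    """Backpropagate through the active ReLU pattern at x."""
    Ws, bs, w_out = net
    h, masks = x, []
    for W, b in zip(Ws, bs):
        pre = W @ h + b
        masks.append(pre > 0)
        h = np.maximum(pre, 0.0)
    g = w_out
    for W, m in zip(reversed(Ws), reversed(masks)):
        g = W.T @ (g * m)
    return g

def lipschitz_lower_bound(net, p, n_samples, rng):
    """max over sampled x of ||grad Phi(x)||_q, with q the dual exponent of p."""
    if p == 1:
        q = np.inf
    elif np.isinf(p):
        q = 1.0
    else:
        q = p / (p - 1.0)
    d = net[0][0].shape[1]
    xs = rng.normal(size=(n_samples, d))
    return max(np.linalg.norm(input_gradient(net, x), ord=q) for x in xs)
```

    For $p = 2$, the product of the layers' spectral norms and $\|w_{\text{out}}\|_2$ is a crude upper bound, so the two numbers bracket the true constant.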

  • Coupled Entropy: A Goldilocks Generalization?

    Coupled Entropy: A Goldilocks Generalization? arXiv:2506.17229v1 Announce Type: new Abstract: Nonextensive Statistical Mechanics (NSM) has developed into a powerful toolset for modeling and analyzing complex systems. Despite its many successes, a puzzle arose early in its development. The constraints on the Tsallis entropy are in the form of an escort distribution with elements proportional to…
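
    For reference, the standard Tsallis entropy $S_q(p) = (1 - \sum_i p_i^q)/(q - 1)$ and the escort distribution $P_i = p_i^q / \sum_j p_j^q$ can be computed directly. This sketch covers only these textbook definitions, not the coupled entropy the paper proposes.

```python
import numpy as np

def tsallis_entropy(p, q):
    """S_q(p) = (1 - sum p_i^q) / (q - 1); the q = 1 case is the Shannon limit."""
    p = np.asarray(p, dtype=float)
    if q == 1.0:
        nz = p[p > 0]
        return -np.sum(nz * np.log(nz))
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def escort(p, q):
    """Escort distribution P_i = p_i^q / sum_j p_j^q."""
    w = np.asarray(p, dtype=float) ** q
    return w / w.sum()
```

    For $q > 1$ the escort distribution puts relatively more weight on the most probable outcomes, and $S_q$ tends to the Shannon entropy as $q \to 1$.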

  • Differentiable neural network representation of multi-well, locally-convex potentials

    Differentiable neural network representation of multi-well, locally-convex potentials arXiv:2506.17242v1 Announce Type: new Abstract: Multi-well potentials are ubiquitous in science, modeling phenomena such as phase transitions, dynamic instabilities, and multimodal behavior across physics, chemistry, and biology. In contrast to non-smooth minimum-of-mixture representations, we propose a differentiable and convex formulation based on a log-sum-exponential (LSE) mixture of…
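
    The contrast between a non-smooth minimum over wells and a log-sum-exponential can be illustrated with a generic soft-min over convex quadratic wells, $E_\beta(x) = -\beta^{-1}\log\sum_i \exp(-\beta E_i(x))$. The well shapes, parameter names and this exact soft-min form are chosen for illustration and are not the paper's parameterization. The soft-min satisfies $\min_i E_i - \log(n)/\beta \le E_\beta \le \min_i E_i$, so it approaches the hard minimum as $\beta$ grows while remaining smooth.

```python
import numpy as np

def soft_min_energy(x, centers, curvatures, offsets, beta):
    """Return (soft-min energy, hard-min energy) at each row of x.

    Wells (illustrative): E_i(x) = 0.5 * k_i * ||x - c_i||^2 + e_i, each convex.
    """
    x = np.atleast_2d(x)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    E = 0.5 * curvatures * d2 + offsets            # shape (n_points, n_wells)
    z = -beta * E
    zmax = z.max(axis=1, keepdims=True)            # stabilised log-sum-exp
    lse = zmax[:, 0] + np.log(np.exp(z - zmax).sum(axis=1))
    return -lse / beta, E.min(axis=1)
```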

  • Gaussian Processes and Reproducing Kernels: Connections and Equivalences

    Gaussian Processes and Reproducing Kernels: Connections and Equivalences arXiv:2506.17366v1 Announce Type: new Abstract: This monograph studies the relations between two approaches using positive definite kernels: probabilistic methods using Gaussian processes, and non-probabilistic methods using reproducing kernel Hilbert spaces (RKHS). They are widely studied and used in machine learning, statistics, and numerical analysis. Connections and equivalences…
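
    One well-known equivalence of this kind: the GP posterior mean with noise variance $\sigma^2$ coincides with the kernel ridge regression estimator for $\min_f \frac{1}{n}\sum_i (y_i - f(x_i))^2 + \lambda\|f\|_{\mathcal{H}}^2$ when $\sigma^2 = n\lambda$. A small numerical check, with an RBF kernel and synthetic data chosen for illustration:

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
Xs = np.linspace(-3, 3, 7)[:, None]
n, sigma2 = len(X), 0.01
K, Ks = rbf(X, X), rbf(Xs, X)

# GP regression: posterior mean k(x, X) (K + sigma^2 I)^{-1} y
gp_mean = Ks @ np.linalg.solve(K + sigma2 * np.eye(n), y)

# Kernel ridge regression via the representer theorem: k(x, X) (K + n*lam I)^{-1} y
lam = sigma2 / n
krr_mean = Ks @ np.linalg.solve(K + n * lam * np.eye(n), y)
```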

  • Scalable Machine Learning Algorithms using Path Signatures

    Scalable Machine Learning Algorithms using Path Signatures arXiv:2506.17634v1 Announce Type: new Abstract: The interface between stochastic analysis and machine learning is a rapidly evolving field, with path signatures – iterated integrals that provide faithful, hierarchical representations of paths – offering a principled and universal feature map for sequential and structured data. Rooted in rough path…
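
    For a piecewise-linear path, the first two signature levels can be accumulated segment by segment with Chen's identity: appending a segment with increment $\Delta$ adds $S^{(1)} \otimes \Delta + \tfrac12 \Delta \otimes \Delta$ to level 2, where $S^{(1)}$ is the level-1 signature so far. A minimal sketch truncated at depth 2 (not the scalable algorithms the paper studies):

```python
import numpy as np

def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path (rows are points)."""
    inc = np.diff(np.asarray(path, dtype=float), axis=0)
    d = inc.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in inc:
        # Chen's identity for concatenating one linear segment
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2
```

    For the unit square traversed counter-clockwise, level 1 is zero and the Lévy area $\tfrac12(S^{12} - S^{21})$ equals 1, the enclosed area.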

  • Derandomizing Simultaneous Confidence Regions for Band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes

    Derandomizing Simultaneous Confidence Regions for Band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes arXiv:2506.17764v1 Announce Type: new Abstract: Band-limited functions are fundamental objects that are widely used in systems theory and signal processing. In this paper we refine a recent nonparametric, nonasymptotic method for constructing simultaneous confidence regions for band-limited functions from noisy input-output…

  • From Local Interactions to Global Operators: Scalable Gaussian Process Operator for Physical Systems

    From Local Interactions to Global Operators: Scalable Gaussian Process Operator for Physical Systems arXiv:2506.15906v1 Announce Type: new Abstract: Operator learning offers a powerful paradigm for solving parametric partial differential equations (PDEs), but scaling probabilistic neural operators such as the recently proposed Gaussian Process Operators (GPOs) to high-dimensional, data-intensive regimes remains a significant challenge. In this…

  • Sampling conditioned diffusions via Pathspace Projected Monte Carlo

    Sampling conditioned diffusions via Pathspace Projected Monte Carlo arXiv:2506.15743v1 Announce Type: new Abstract: We present an algorithm to sample stochastic differential equations conditioned on rather general constraints, including integral constraints, endpoint constraints, and stochastic integral constraints. The algorithm is a pathspace Metropolis-adjusted manifold sampling scheme, which samples stochastic paths on the submanifold of realizations that…

  • Diffusion-Based Hypothesis Testing and Change-Point Detection

    Diffusion-Based Hypothesis Testing and Change-Point Detection arXiv:2506.16089v1 Announce Type: new Abstract: Score-based methods have recently seen increasing popularity in modeling and generation. Methods have been constructed to perform hypothesis testing and change-point detection with score functions, but these methods are in general not as powerful as their likelihood-based peers. Recent works consider generalizing the score-based…

  • CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization

    CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization arXiv:2506.16189v1 Announce Type: new Abstract: We study the problem of conformal prediction (CP) under geometric data shifts, where data samples are susceptible to transformations such as rotations or flips. While CP endows prediction models with post-hoc uncertainty quantification and formal coverage guarantees, their practicality breaks under distribution…

  • Random feature approximation for general spectral methods

    Random feature approximation for general spectral methods arXiv:2506.16283v1 Announce Type: new Abstract: Random feature approximation is arguably one of the most widely used techniques for kernel methods in large-scale learning algorithms. In this work, we analyze the generalization properties of random feature methods, extending previous results for Tikhonov regularization to a broad class of spectral…
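
    The classic instance is random Fourier features for the Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / (2\ell^2))$: with $W \sim N(0, \ell^{-2} I)$ and $b \sim U[0, 2\pi]$, the map $z(x) = \sqrt{2/D}\cos(W^\top x + b)$ satisfies $\mathbb{E}[z(x)^\top z(y)] = k(x, y)$. The sketch shows only the feature map; the features could then be fed to ridge (Tikhonov) regression or another spectral filter.

```python
import numpy as np

def rff_map(X, n_features, lengthscale, rng):
    """Random Fourier features approximating the Gaussian kernel with the given lengthscale."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

    The approximation error of $z(x)^\top z(y)$ shrinks like $O(D^{-1/2})$, which is why the number of features $D$ becomes a key quantity in generalization analyses.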

  • Optimal Convergence Rates of Deep Neural Network Classifiers

    Optimal Convergence Rates of Deep Neural Network Classifiers arXiv:2506.14899v1 Announce Type: new Abstract: In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s \in [0,\infty]$) and the compositional assumption. This assumption requires the conditional class probability function of the data distribution to be the composition of…

  • Double Machine Learning for Conditional Moment Restrictions: IV regression, Proximal Causal Learning and Beyond

    Double Machine Learning for Conditional Moment Restrictions: IV regression, Proximal Causal Learning and Beyond arXiv:2506.14950v1 Announce Type: new Abstract: Solving conditional moment restrictions (CMRs) is a key problem considered in statistics, causal inference, and econometrics, where the aim is to solve for a function of interest that satisfies some conditional moment equalities. Specifically, many techniques…

  • Performative Validity of Recourse Explanations

    Performative Validity of Recourse Explanations arXiv:2506.15366v1 Announce Type: new Abstract: When applicants get rejected by an algorithmic decision system, recourse explanations provide actionable suggestions for how to change their input features to get a positive evaluation. A crucial yet overlooked phenomenon is that recourse explanations are performative: When many applicants act according to their recommendations,…

  • An Observation on Lloyd’s k-Means Algorithm in High Dimensions

    An Observation on Lloyd’s k-Means Algorithm in High Dimensions arXiv:2506.14952v1 Announce Type: new Abstract: Clustering and estimating cluster means are core problems in statistics and machine learning, with k-means and Expectation Maximization (EM) being two widely used algorithms. In this work, we provide a theoretical explanation for the failure of k-means in high-dimensional settings with…
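
    For reference, Lloyd's algorithm alternates nearest-center assignment with center updates until the assignments stop changing. A plain NumPy sketch; initializing from randomly chosen data points is an assumption here (many implementations use k-means++):

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, rng=None):
    """Lloyd's algorithm with centers initialised from k random data points."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assignment step: nearest center in squared Euclidean distance
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # update step: cluster means (keep the old center if a cluster empties)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # final assignment consistent with the returned centers
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return centers, labels
```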

  • Time-dependent density estimation using binary classifiers

    Time-dependent density estimation using binary classifiers arXiv:2506.15505v1 Announce Type: new Abstract: We propose a data-driven method to learn the time-dependent probability density of a multivariate stochastic process from sample paths, assuming that the initial probability density is known and can be evaluated. Our method uses a novel time-dependent binary classifier trained using a contrastive estimation-based…

  • Beyond Shapley Values: Cooperative Games for the Interpretation of Machine Learning Models

    Beyond Shapley Values: Cooperative Games for the Interpretation of Machine Learning Models arXiv:2506.13900v1 Announce Type: new Abstract: Cooperative game theory has become a cornerstone of post-hoc interpretability in machine learning, largely through the use of Shapley values. Yet, despite their widespread adoption, Shapley-based methods often rest on axiomatic justifications whose relevance to feature attribution remains…
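
    The Shapley value of player $i$ averages its marginal contribution over coalitions, $\phi_i = \sum_{S \subseteq N\setminus\{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,[v(S\cup\{i\}) - v(S)]$. An exact but exponential-time sketch that accepts any set function v (here a plain Python callable on frozensets, an illustrative choice):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions (exponential in len(players))."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                coalition = frozenset(S)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi[i] = total
    return phi
```

    For the game where v(S) = 1 only if S contains both a and b, this returns 0.5 for a and b and 0 for a third player c.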

  • Rademacher learning rates for iterated random functions

    Rademacher learning rates for iterated random functions arXiv:2506.13946v1 Announce Type: new Abstract: Most existing literature on supervised machine learning assumes that the training dataset is drawn from an i.i.d. sample. However, many real-world problems exhibit temporal dependence and strong correlations between the marginal distributions of the data-generating process, suggesting that the i.i.d. assumption is often…

  • Meta Optimality for Demographic Parity Constrained Regression via Post-Processing

    Meta Optimality for Demographic Parity Constrained Regression via Post-Processing arXiv:2506.13947v1 Announce Type: new Abstract: We address the regression problem under the constraint of demographic parity, a commonly used fairness definition. Recent studies have revealed fair minimax optimal regression algorithms, the most accurate algorithms that adhere to the fairness constraint. However, these analyses are tightly coupled…

  • Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Theoretically-Grounded and Practical Framework with Synthetic Anomalies

    Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Theoretically-Grounded and Practical Framework with Synthetic Anomalies arXiv:2506.13955v1 Announce Type: new Abstract: Anomaly detection (AD) is a critical task across domains such as cybersecurity and healthcare. In the unsupervised setting, an effective and theoretically-grounded principle is to train classifiers to distinguish normal data from (synthetic) anomalies. We extend…

  • Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms

    Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms arXiv:2506.13984v1 Announce Type: new Abstract: In this paper, we develop a wide class of Mirror Descent (MD) algorithms, which play a key role in machine learning. To this end, we formulate a constrained optimization problem in which we exploit the Bregman divergence with the Tempesta multi-parametric deformation logarithm…
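
    For orientation: with the ordinary logarithm as mirror map (negative Shannon entropy on the probability simplex), mirror descent reduces to the exponentiated-gradient update $x_i \leftarrow x_i e^{-\eta g_i} / \sum_j x_j e^{-\eta g_j}$. The sketch implements only this classical case; the paper replaces the logarithm with Tempesta's multi-parametric deformation, which is not reproduced here.

```python
import numpy as np

def exponentiated_gradient(grad, x0, step, n_iter):
    """Mirror descent on the simplex with the (ordinary) entropic mirror map."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        # gradient step in the dual (log) space, then normalisation,
        # which is the KL/Bregman projection back onto the simplex
        x = x * np.exp(-step * grad(x))
        x /= x.sum()
    return x
```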

  • Temporal cross-validation impacts multivariate time series subsequence anomaly detection evaluation

    Temporal cross-validation impacts multivariate time series subsequence anomaly detection evaluation arXiv:2506.12183v1 Announce Type: new Abstract: Evaluating anomaly detection in multivariate time series (MTS) requires careful consideration of temporal dependencies, particularly when detecting subsequence anomalies common in fault detection scenarios. While time series cross-validation (TSCV) techniques aim to preserve temporal ordering during model evaluation, their impact…
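
    A minimal expanding-window split of the kind used in time series cross-validation: each test block lies strictly after its training block, optionally separated by a gap to limit leakage across the boundary. The function name and the equal-size folds are illustrative assumptions; scikit-learn's `TimeSeriesSplit` offers a similar scheme.

```python
def expanding_window_splits(n_samples, n_splits, gap=0):
    """Yield (train_indices, test_indices) with training always preceding testing."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold
        test_start = train_end + gap
        test_end = min(test_start + fold, n_samples)
        yield range(0, train_end), range(test_start, test_end)
```

    With 100 time steps and 4 splits, the first fold trains on steps 0-19 and tests on 20-39, and the last trains on 0-79 and tests on 80-99.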

  • Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory

    Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory arXiv:2506.12350v1 Announce Type: new Abstract: Despite its empirical success, Reinforcement Learning from Human Feedback (RLHF) has been shown to violate almost all the fundamental axioms in social choice theory — such as majority consistency, pairwise majority consistency, and Condorcet consistency. This raises…

  • A Transfer Learning Framework for Multilayer Networks via Model Averaging

    A Transfer Learning Framework for Multilayer Networks via Model Averaging arXiv:2506.12455v1 Announce Type: new Abstract: Link prediction in multilayer networks is a key challenge in applications such as recommendation systems and protein-protein interaction prediction. While many techniques have been developed, most rely on assumptions about shared structures and require access to raw auxiliary data, limiting…

  • On the existence of consistent adversarial attacks in high-dimensional linear classification

    On the existence of consistent adversarial attacks in high-dimensional linear classification arXiv:2506.12454v1 Announce Type: new Abstract: What fundamentally distinguishes an adversarial attack from a misclassification due to limited model expressivity or finite data? In this work, we investigate this question in the setting of high-dimensional binary classification, where statistical effects due to limited data availability…

  • Dependent Randomized Rounding for Budget Constrained Experimental Design

    Dependent Randomized Rounding for Budget Constrained Experimental Design arXiv:2506.12677v1 Announce Type: new Abstract: Policymakers in resource-constrained settings require experimental designs that satisfy strict budget limits while ensuring precise estimation of treatment effects. We propose a framework that applies a dependent randomized rounding procedure to convert assignment probabilities into binary treatment decisions. Our proposed solution preserves…
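
    A sketch of classical pairwise dependent rounding, shown as one possible instance of the idea and not necessarily the paper's procedure. It repeatedly picks two fractional assignment probabilities and shifts mass between them so that at least one becomes 0 or 1, choosing the direction at random so that each marginal is preserved in expectation. Because every step keeps the total unchanged, probabilities summing to an integer budget yield exactly that many treated units.

```python
import numpy as np

def dependent_rounding(p, rng, tol=1e-12):
    """Round probabilities p to 0/1 while preserving marginals and (integer) sums."""
    p = np.array(p, dtype=float)
    while True:
        frac = np.flatnonzero((p > tol) & (p < 1 - tol))
        if len(frac) < 2:
            break
        i, j = frac[:2]
        alpha = min(1 - p[i], p[j])   # shift alpha from j to i
        beta = min(p[i], 1 - p[j])    # or shift beta from i to j
        # probabilities chosen so that E[change in p_i] = 0
        if rng.random() < beta / (alpha + beta):
            p[i] += alpha
            p[j] -= alpha
        else:
            p[i] -= beta
            p[j] += beta
    # at most one fractional entry can remain (non-integer total): round it independently
    for k in np.flatnonzero((p > tol) & (p < 1 - tol)):
        p[k] = 1.0 if rng.random() < p[k] else 0.0
    return np.round(p).astype(int)
```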