Category: stat.ML
-
On the Identifiability of Regime-Switching Models with Multi-Lag Dependencies
arXiv:2601.03325v1 Announce Type: new Abstract: Identifiability is central to the interpretability of deep latent variable models, ensuring parameterisations are uniquely determined by the data-generating distribution. However, it remains underexplored for deep regime-switching time series. We develop a general theoretical framework for multi-lag Regime-Switching Models (RSMs), encompassing…
-
Microeconomic Foundations of Multi-Agent Learning
arXiv:2601.03451v1 Announce Type: new Abstract: Modern AI systems increasingly operate inside markets and institutions where data, behavior, and incentives are endogenous. This paper develops an economic foundation for multi-agent learning by studying a principal-agent interaction in a Markov decision process with strategic externalities, where both the principal and the agent…
-
Online Learning with Limited Information in the Sliding Window Model
arXiv:2601.03533v1 Announce Type: new Abstract: Motivated by recent work on the experts problem in the streaming model, we consider the experts problem in the sliding window model. The sliding window model is a well-studied model that captures applications such as traffic monitoring, epidemic tracking, and…
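The paper's streaming and sliding-window algorithms are memory-bounded and are not reproduced here. As a rough point of reference only, the sketch below runs the classical Hedge (multiplicative weights) rule on the losses of the most recent `window` rounds. The function name, the full-memory implementation and the learning rate `eta` are illustrative assumptions, not the paper's method.

```python
import numpy as np

def sliding_window_hedge(losses, window, eta):
    """Hypothetical sketch: Hedge whose weights use only the last `window` rounds.

    losses: array of shape (T, n_experts) with entries in [0, 1].
    Returns the probability vector played at each round, shape (T, n_experts).
    Note: this stores every loss, so it ignores the memory limits studied in the paper.
    """
    losses = np.asarray(losses, dtype=float)
    T, n = losses.shape
    plays = np.empty((T, n))
    for t in range(T):
        # Cumulative loss of each expert over the (at most) `window` previous rounds.
        start = max(0, t - window)
        recent = losses[start:t].sum(axis=0)
        # Subtract the minimum before exponentiating for numerical stability.
        w = np.exp(-eta * (recent - recent.min()))
        plays[t] = w / w.sum()
    return plays
```

Old rounds drop out of the window, so the weights can move to an expert that only becomes good late in the stream.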
-
A Theoretical and Empirical Taxonomy of Imbalance in Binary Classification
arXiv:2601.04149v1 Announce Type: new Abstract: Class imbalance significantly degrades classification performance, yet its effects are rarely analyzed from a unified theoretical perspective. We propose a principled framework based on three fundamental scales: the imbalance coefficient $\eta$, the sample–dimension ratio $\kappa$, and the intrinsic separability $\Delta$.…
-
A path to natural language through tokenisation and transformers
arXiv:2601.03368v1 Announce Type: cross Abstract: Natural languages exhibit striking regularities in their statistical structure, including notably the emergence of Zipf’s and Heaps’ laws. Despite this, it remains broadly unclear how these properties relate to the modern tokenisation schemes used in contemporary transformer models. In this note,…
-
Mitigating Long-Tailed Anomaly Score Distributions with Importance-Weighted Loss
arXiv:2601.02440v1 Announce Type: new Abstract: Anomaly detection is crucial in industrial applications for identifying rare and unseen patterns to ensure system reliability. Traditional models, trained on a single class of normal data, struggle with real-world distributions where normal data exhibit diverse patterns, leading to class imbalance and…
-
Fast Conformal Prediction using Conditional Interquantile Intervals
arXiv:2601.02769v1 Announce Type: new Abstract: We introduce Conformal Interquantile Regression (CIR), a conformal regression method that efficiently constructs near-minimal prediction intervals with guaranteed coverage. CIR leverages black-box machine learning models to estimate outcome distributions through interquantile ranges, transforming these estimates into compact prediction intervals while achieving approximate conditional…
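The truncated abstract does not describe how CIR is constructed. For orientation, the sketch below shows the standard split-conformal calibration of a quantile-regression interval (conformalized quantile regression, Romano et al., 2019). This is a related, widely used baseline, not CIR itself. It assumes that some black-box model has already produced lower and upper quantile predictions on a held-out calibration set; the function names are hypothetical.

```python
import math
import numpy as np

def cqr_margin(lo_cal, hi_cal, y_cal, alpha):
    """Split-conformal margin for quantile-regression intervals (CQR baseline).

    lo_cal, hi_cal: lower/upper quantile predictions on a held-out calibration set.
    Returns q_hat such that [lo - q_hat, hi + q_hat] has marginal coverage
    >= 1 - alpha for an exchangeable test point.
    """
    lo_cal, hi_cal, y_cal = (np.asarray(a, dtype=float) for a in (lo_cal, hi_cal, y_cal))
    n = len(y_cal)
    # Conformity score: how far y lies outside the interval (negative if inside).
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    # Finite-sample corrected rank ceil((n + 1)(1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return float(np.sort(scores)[k - 1])

def cqr_interval(lo_test, hi_test, q_hat):
    """Widen (or shrink, if q_hat < 0) the predicted interval by the calibrated margin."""
    return np.asarray(lo_test, dtype=float) - q_hat, np.asarray(hi_test, dtype=float) + q_hat
```

Coverage here is marginal. Methods that target approximate conditional coverage, as CIR claims to, need more than this single global margin.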
-
Self-Supervised Learning from Noisy and Incomplete Data
arXiv:2601.03244v1 Announce Type: new Abstract: Many important problems in science and engineering involve inferring a signal from noisy and/or incomplete observations, where the observation process is known. Historically, this problem has been tackled using hand-crafted regularization (e.g., sparsity, total-variation) to obtain meaningful estimates. Recent data-driven methods often offer…
-
Detecting and Mitigating Treatment Leakage in Text-Based Causal Inference: Distillation and Sensitivity Analysis
arXiv:2601.02400v1 Announce Type: cross Abstract: Text-based causal inference increasingly employs textual data as proxies for unobserved confounders, yet this approach introduces a previously undertheorized source of bias: treatment leakage. Treatment leakage occurs when text intended to capture confounding information also contains signals…
-
First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data
arXiv:2601.02523v1 Announce Type: cross Abstract: Artificial intelligence has advanced rapidly through large neural networks trained on massive datasets using thousands of GPUs or TPUs. Such training can occupy entire data centers for weeks and requires enormous computational and energy resources. Yet the optimization algorithms behind…
-
Beyond Demand Estimation: Consumer Surplus Evaluation via Cumulative Propensity Weights
arXiv:2601.01029v1 Announce Type: new Abstract: This paper develops a practical framework for using observational data to audit the consumer surplus effects of AI-driven decisions, specifically in targeted pricing and algorithmic lending. Traditional approaches first estimate demand functions and then integrate to compute consumer surplus, but…
-
Fibonacci-Driven Recursive Ensembles: Algorithms, Convergence, and Learning Dynamics
arXiv:2601.01055v1 Announce Type: new Abstract: This paper develops the algorithmic and dynamical foundations of recursive ensemble learning driven by Fibonacci-type update flows. In contrast with classical boosting (Freund and Schapire, 1997; Friedman, 2001), where the ensemble evolves through first-order additive updates, we study second-order recursive architectures in…
-
Neural Networks on Symmetric Spaces of Noncompact Type
arXiv:2601.01097v1 Announce Type: new Abstract: Recent works have demonstrated promising performances of neural networks on hyperbolic spaces and symmetric positive definite (SPD) manifolds. These spaces belong to a family of Riemannian manifolds referred to as symmetric spaces of noncompact type. In this paper, we propose a novel…
-
Conformal Blindness: A Note on $A$-Cryptic change-points
arXiv:2601.01147v1 Announce Type: new Abstract: Conformal Test Martingales (CTMs) are a standard method within the Conformal Prediction framework for testing the crucial assumption of data exchangeability by monitoring deviations from uniformity in the p-value sequence. Although exchangeability implies uniform p-values, the converse does not hold. This raises the…
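A conformal test martingale combines conformal p-values with a betting function. The sketch below computes smoothed conformal p-values and feeds them to the classical power martingale with $\epsilon = 0.5$. The identity nonconformity score $a_i = z_i$ is a toy assumption chosen for brevity. Under exchangeability the p-values are i.i.d. uniform, so the martingale stays small with high probability, and large values count as evidence against exchangeability. The sketch does not model the paper's $A$-cryptic change-points.

```python
import numpy as np

def conformal_p_values(z, rng):
    """Smoothed conformal p-values, using the toy score a_i = z_i (assumption)."""
    z = np.asarray(z, dtype=float)
    p = np.empty(len(z))
    for n in range(len(z)):
        seen = z[: n + 1]
        greater = np.sum(seen > z[n])
        equal = np.sum(seen == z[n])  # counts z[n] itself, so equal >= 1
        theta = 1.0 - rng.uniform()   # tie-breaking draw in (0, 1]
        p[n] = (greater + theta * equal) / (n + 1)
    return p

def power_martingale(p, eps=0.5):
    """Running values of the power martingale prod_i eps * p_i**(eps - 1)."""
    # Each factor integrates to 1 over p ~ U(0, 1), so under exchangeability
    # (i.i.d. uniform p-values) the running product is a test martingale.
    return np.cumprod(eps * np.asarray(p) ** (eps - 1.0))
```

On a strictly increasing sequence each new point is the most extreme so far. Its p-value is then $\theta_n / n$, and the martingale grows rapidly.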
-
Evidence Slopes and Effective Dimension in Singular Linear Models
arXiv:2601.01238v1 Announce Type: new Abstract: Bayesian model selection commonly relies on Laplace approximation or the Bayesian Information Criterion (BIC), which assume that the effective model dimension equals the number of parameters. Singular learning theory replaces this assumption with the real log canonical threshold (RLCT), an effective…
-
Active learning for data-driven reduced models of parametric differential systems with Bayesian operator inference
arXiv:2601.00038v1 Announce Type: new Abstract: This work develops an active learning framework to intelligently enrich data-driven reduced-order models (ROMs) of parametric dynamical systems, which can serve as the foundation of virtual assets in a digital twin. Data-driven ROMs are explainable, computationally…
-
Detecting Unobserved Confounders: A Kernelized Regression Approach
arXiv:2601.00200v1 Announce Type: new Abstract: Detecting unobserved confounders is crucial for reliable causal inference in observational studies. Existing methods require either linearity assumptions or multiple heterogeneous environments, limiting applicability to nonlinear single-environment settings. To bridge this gap, we propose Kernel Regression Confounder Detection (KRCD), a novel method for…
-
Generative Conditional Missing Imputation Networks
arXiv:2601.00517v1 Announce Type: new Abstract: In this study, we introduce a sophisticated generative conditional strategy designed to impute missing values within datasets, an area of considerable importance in statistical analysis. Specifically, we initially elucidate the theoretical underpinnings of the Generative Conditional Missing Imputation Networks (GCMI), demonstrating its robust properties in…
-
Deep learning estimation of the spectral density of functional time series on large domains
arXiv:2601.00284v1 Announce Type: cross Abstract: We derive an estimator of the spectral density of a functional time series that is the output of a multilayer perceptron neural network. The estimator is motivated by difficulties with the computation of existing spectral density…
-
Identification and Estimation under Multiple Versions of Treatment: Mixture-of-Experts Approach
arXiv:2601.00287v1 Announce Type: cross Abstract: The Stable Unit Treatment Value Assumption (SUTVA) includes the condition that there are no multiple versions of treatment in causal inference. Though we could not control the implementation of treatment in observational studies, multiple versions may exist in the treatment.…
-
Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting
arXiv:2512.23805v1 Announce Type: new Abstract: Fitted Q-evaluation (FQE) is a central method for off-policy evaluation in reinforcement learning, but it generally requires Bellman completeness: that the hypothesis class is closed under the evaluation Bellman operator. This requirement is challenging because enlarging the hypothesis class can worsen…
-
Energy-Tweedie: Score meets Score, Energy meets Energy
arXiv:2512.23818v1 Announce Type: new Abstract: Denoising and score estimation have long been known to be linked via the classical Tweedie’s formula. In this work, we first extend the latter to a wider range of distributions often called “energy models” and denoted elliptical distributions in this work. Next, we…
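For reference, here is the classical Gaussian case that the paper generalizes. If $y = x + \sigma \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$, then $\mathbb{E}[x \mid y] = y + \sigma^2 \nabla_y \log p_Y(y)$. The sketch below checks this numerically for a Gaussian prior $x \sim \mathcal{N}(m, s^2)$, where both the score of $y$ and the posterior mean have closed forms. The parameter values are arbitrary, and the sketch does not implement the paper's elliptical extension.

```python
import numpy as np

def tweedie_denoise(y, score, sigma):
    """Tweedie's formula: E[x | y] = y + sigma^2 * grad log p_Y(y) for Gaussian noise."""
    return y + sigma**2 * score(y)

# Gaussian check: x ~ N(m, s^2) implies y ~ N(m, s^2 + sigma^2), so the score is closed-form.
m, s, sigma = 1.0, 2.0, 0.5
gaussian_score = lambda v: -(v - m) / (s**2 + sigma**2)
y = np.array([-1.0, 0.0, 3.0])
# Conjugate-Gaussian posterior mean, for comparison.
posterior_mean = (s**2 * y + sigma**2 * m) / (s**2 + sigma**2)
```

Substituting the score gives $y - \sigma^2 (y - m)/(s^2 + \sigma^2) = (s^2 y + \sigma^2 m)/(s^2 + \sigma^2)$, which is the conjugate posterior mean.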
-
Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration
arXiv:2512.23927v1 Announce Type: new Abstract: Fitted Q-iteration (FQI) and its entropy-regularized variant, soft FQI, are central tools for value-based model-free offline reinforcement learning, but can behave poorly under function approximation and distribution shift. In the entropy-regularized setting, we show that the soft Bellman operator is locally…
-
Implicit geometric regularization in flow matching via density weighted Stein operators
arXiv:2512.23956v1 Announce Type: new Abstract: Flow Matching (FM) has emerged as a powerful paradigm for continuous normalizing flows, yet standard FM implicitly performs an unweighted $L^2$ regression over the entire ambient space. In high dimensions, this leads to a fundamental inefficiency: the vast majority…
-
Constructive Approximation of Random Process via Stochastic Interpolation Neural Network Operators
arXiv:2512.24106v1 Announce Type: new Abstract: In this paper, we construct a class of stochastic interpolation neural network operators (SINNOs) with random coefficients activated by sigmoidal functions. We establish their boundedness, interpolation accuracy, and approximation capabilities in the mean square sense, in probability, as well…
-
A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue
arXiv:2512.22282v1 Announce Type: new Abstract: Across fields such as machine learning, social science, geography, considerable attention has been given to models that factorize a nonnegative matrix into the product of two or three matrices, subject to nonnegative or row-sum-to-1…
-
A General Weighting Theory for Ensemble Learning: Beyond Variance Reduction via Spectral and Geometric Structure
arXiv:2512.22286v1 Announce Type: new Abstract: Ensemble learning is traditionally justified as a variance-reduction strategy, explaining its strong performance for unstable predictors such as decision trees. This explanation, however, does not account for ensembles constructed from intrinsically stable estimators, including smoothing splines,…
-
On Fibonacci Ensembles: An Alternative Approach to Ensemble Learning Inspired by the Timeless Architecture of the Golden Ratio
arXiv:2512.22284v1 Announce Type: new Abstract: Nature rarely reveals her secrets bluntly, yet in the Fibonacci sequence she grants us a glimpse of her quiet architecture of growth, harmony, and recursive stability (Koshy, 2001; Livio, 2002). From spiral galaxies to…
-
Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds
arXiv:2512.22473v1 Announce Type: new Abstract: Transformers empirically perform precise probabilistic reasoning in carefully constructed “Bayesian wind tunnels” and in large-scale language models, yet the mechanisms by which gradient-based learning creates the required internal geometry remain opaque. We provide a complete first-order analysis of how cross-entropy training…
-
An approach to Fisher-Rao metric for infinite dimensional non-parametric information geometry
arXiv:2512.21451v1 Announce Type: new Abstract: Being infinite dimensional, non-parametric information geometry has long faced an “intractability barrier” due to the fact that the Fisher-Rao metric is now a functional incurring difficulties in defining its inverse. This paper introduces a novel framework to resolve the…
-
Residual Prior Diffusion: A Probabilistic Framework Integrating Coarse Latent Priors with Diffusion Models
arXiv:2512.21593v1 Announce Type: new Abstract: Diffusion models have become a central tool in deep generative modeling, but standard formulations rely on a single network and a single diffusion schedule to transform a simple prior, typically a standard normal distribution, into the target…
-
Tilt Matching for Scalable Sampling and Fine-Tuning
arXiv:2512.21829v1 Announce Type: new Abstract: We propose a simple, scalable algorithm for using stochastic interpolants to sample from unnormalized densities and for fine-tuning generative models. The approach, Tilt Matching, arises from a dynamical equation relating the flow matching velocity to one targeting the same distribution tilted by a…
-
Automated Pollen Recognition in Optical and Holographic Microscopy Images
arXiv:2512.08589v1 Announce Type: cross Abstract: This study explores the application of deep learning to improve and automate pollen grain detection and classification in both optical and holographic microscopy images, with a particular focus on veterinary cytology use cases. We used YOLOv8s for object detection and MobileNetV3L…
-
Thermodynamic Characterizations of Singular Bayesian Models: Specific Heat, Susceptibility, and Entropy Flow in Posterior Geometry
arXiv:2512.21411v1 Announce Type: cross Abstract: Singular learning theory (SLT) (Watanabe, 2009, 2018) provides a rigorous asymptotic framework for Bayesian models with non-identifiable parameterizations, yet the statistical meaning of its second-order invariant, the singular fluctuation, has remained unclear. In this work, we show…
-
Fast and Exact Least Absolute Deviations Line Fitting via Piecewise Affine Lower-Bounding
arXiv:2512.20682v1 Announce Type: new Abstract: Least-absolute-deviations (LAD) line fitting is robust to outliers but computationally more involved than least squares regression. Although the literature includes linear and near-linear time algorithms for the LAD line fitting problem, these methods are difficult to implement and,…
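The LAD objective is piecewise linear in the intercept and slope. A standard linear-programming argument shows that, when the $x_i$ are not all equal, some optimal line passes through two data points with distinct abscissae. The $O(n^3)$ enumeration below uses this fact only to get a simple exact reference, for example to test a faster solver on small inputs. It is not the paper's piecewise affine lower-bounding algorithm, and the function name is hypothetical.

```python
import itertools
import numpy as np

def lad_line_bruteforce(x, y):
    """Exact LAD line y = a + b x by enumerating lines through pairs of points.

    Assumes the x values are not all equal; runs in O(n^3) time.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    best_cost, best_a, best_b = np.inf, None, None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not of the form y = a + b x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        cost = np.abs(y - a - b * x).sum()  # sum of absolute residuals
        if cost < best_cost:
            best_cost, best_a, best_b = cost, a, b
    return best_a, best_b
```

A single large outlier leaves the fitted line unchanged, which illustrates the robustness mentioned in the abstract.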
-
Diffusion Models in Simulation-Based Inference: A Tutorial Review
arXiv:2512.20685v1 Announce Type: new Abstract: Diffusion models have recently emerged as powerful learners for simulation-based inference (SBI), enabling fast and accurate estimation of latent parameters from simulated and real data. Their score-based formulation offers a flexible way to learn conditional or joint distributions over parameters and observations,…
-
Weighted MCC: A Robust Measure of Multiclass Classifier Performance for Observations with Individual Weights
arXiv:2512.20811v1 Announce Type: new Abstract: Several performance measures are used to evaluate binary and multiclass classification tasks. But individual observations may often have distinct weights, and none of these measures are sensitive to such varying weights. We propose a new weighted…
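The truncated abstract does not define the paper's weighted MCC. The sketch below assumes one natural construction for illustration: each observation adds its weight, instead of a count of 1, to the confusion matrix, and the multiclass MCC formula of Gorodkin (2004) is then applied. Let $s$ be the total weight, $c$ the correctly classified weight, and $t_k$ and $p_k$ the per-class true and predicted totals. The formula is then $\mathrm{MCC} = (c\,s - \sum_k p_k t_k) / \sqrt{(s^2 - \sum_k p_k^2)(s^2 - \sum_k t_k^2)}$.

```python
import numpy as np

def weighted_mcc(y_true, y_pred, weights, n_classes):
    """Multiclass MCC (Gorodkin's R_K) on a weight-accumulated confusion matrix.

    Hypothetical weighting scheme: observation i adds weights[i] to cell (true, pred).
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    C = np.zeros((n_classes, n_classes))
    np.add.at(C, (y_true, y_pred), np.asarray(weights, dtype=float))
    s = C.sum()           # total weight
    c = np.trace(C)       # correctly classified weight
    t = C.sum(axis=1)     # weighted true-class totals
    p = C.sum(axis=0)     # weighted predicted-class totals
    num = c * s - p @ t
    den = np.sqrt((s**2 - p @ p) * (s**2 - t @ t))
    return num / den if den > 0 else 0.0  # convention when MCC is undefined
```

With unit weights this reduces to the usual MCC. Rescaling all weights by the same constant leaves the value unchanged.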
-
Enhancing diffusion models with Gaussianization preprocessing
arXiv:2512.21020v1 Announce Type: new Abstract: Diffusion models are a class of generative models that have demonstrated remarkable success in tasks such as image generation. However, one of the bottlenecks of these models is slow sampling due to the delay before the onset of trajectory bifurcation, at which point substantial…
-
Learning from Neighbors with PHIBP: Predicting Infectious Disease Dynamics in Data-Sparse Environments
arXiv:2512.21005v1 Announce Type: new Abstract: Modeling sparse count data, which arise across numerous scientific fields, presents significant statistical challenges. This chapter addresses these challenges in the context of infectious disease prediction, with a focus on predicting outbreaks in geographic regions that have historically…
-
Robust Causal Directionality Inference in Quantum Inference under MNAR Observation and High-Dimensional Noise
arXiv:2512.19746v1 Announce Type: new Abstract: In quantum mechanics, observation actively shapes the system, paralleling the statistical notion of Missing Not At Random (MNAR). This study introduces a unified framework for robust causal directionality inference in quantum engineering, determining whether relations are system$\to$observation,…
-
Quasiprobabilistic Density Ratio Estimation with a Reverse Engineered Classification Loss Function
arXiv:2512.19913v1 Announce Type: new Abstract: We consider a generalization of the classifier-based density-ratio estimation task to a quasiprobabilistic setting where probability densities can be negative. The problem with most loss functions used for this task is that they implicitly define a relationship between the…
-
Semiparametric KSD test: unifying score and distance-based approaches for goodness-of-fit testing
arXiv:2512.20007v1 Announce Type: new Abstract: Goodness-of-fit (GoF) tests are fundamental for assessing model adequacy. Score-based tests are appealing because they require fitting the model only once under the null. However, extending them to powerful nonparametric alternatives is difficult due to the lack of suitable…
-
Gaussian Process Assisted Meta-learning for Image Classification and Object Detection Models
arXiv:2512.20021v1 Announce Type: new Abstract: Collecting operationally realistic data to inform machine learning models can be costly. Before collecting new data, it is helpful to understand where a model is deficient. For example, object detectors trained on images of rare objects may not be…
-
Generative Bayesian Hyperparameter Tuning
arXiv:2512.20051v1 Announce Type: new Abstract: Hyper-parameter selection is a central practical problem in modern machine learning, governing regularization strength, model capacity, and robustness choices. Cross-validation is often computationally prohibitive at scale, while fully Bayesian hyper-parameter learning can be difficult due to the cost of posterior sampling. We develop a generative…
-
Sampling from multimodal distributions with warm starts: Non-asymptotic bounds for the Reweighted Annealed Leap-Point Sampler
arXiv:2512.17977v1 Announce Type: new Abstract: Sampling from multimodal distributions is a central challenge in Bayesian inference and machine learning. In light of hardness results for sampling (classical MCMC methods, even with tempering, can suffer from exponential mixing times)…
-
Causal Inference as Distribution Adaptation: Optimizing ATE Risk under Propensity Uncertainty
arXiv:2512.18083v1 Announce Type: new Abstract: Standard approaches to causal inference, such as Outcome Regression and Inverse Probability Weighted Regression Adjustment (IPWRA), are typically derived through the lens of missing data imputation and identification theory. In this work, we unify these methods from a Machine…
-
Unsupervised Feature Selection via Robust Autoencoder and Adaptive Graph Learning
arXiv:2512.18720v1 Announce Type: new Abstract: Effective feature selection is essential for high-dimensional data analysis and machine learning. Unsupervised feature selection (UFS) aims to simultaneously cluster data and identify the most discriminative features. Most existing UFS methods linearly project features into a pseudo-label space for clustering,…
-
On Conditional Stochastic Interpolation for Generative Nonlinear Sufficient Dimension Reduction
arXiv:2512.18971v1 Announce Type: new Abstract: Identifying low-dimensional sufficient structures in nonlinear sufficient dimension reduction (SDR) has long been a fundamental yet challenging problem. Most existing methods lack theoretical guarantees of exhaustiveness in identifying lower dimensional structures, either at the population level or at the sample…
-
Cluster-Based Generalized Additive Models Informed by Random Fourier Features
arXiv:2512.19373v1 Announce Type: new Abstract: Explainable machine learning aims to strike a balance between prediction accuracy and model transparency, particularly in settings where black-box predictive models, such as deep neural networks or kernel-based methods, achieve strong empirical performance but remain difficult to interpret. This work introduces…
-
Disentangled representations via score-based variational autoencoders
arXiv:2512.17127v1 Announce Type: new Abstract: We present the Score-based Autoencoder for Multiscale Inference (SAMI), a method for unsupervised representation learning that combines the theoretical frameworks of diffusion models and VAEs. By unifying their respective evidence lower bounds, SAMI formulates a principled objective that learns representations through score-based guidance of…
-
Sharp Structure-Agnostic Lower Bounds for General Functional Estimation
arXiv:2512.17341v1 Announce Type: new Abstract: The design of efficient nonparametric estimators has long been a central problem in statistics, machine learning, and decision making. Classical optimal procedures often rely on strong structural assumptions, which can be misspecified in practice and complicate deployment. This limitation has sparked growing…
-
Generative modeling of conditional probability distributions on the level-sets of collective variables
arXiv:2512.17374v1 Announce Type: new Abstract: Given a probability distribution $\mu$ in $\mathbb{R}^d$ represented by data, we study in this paper the generative modeling of its conditional probability distributions on the level-sets of a collective variable $\xi: \mathbb{R}^d \rightarrow \mathbb{R}^k$, where $1 \le k…
-
Fast and Robust: Computationally Efficient Covariance Estimation for Sub-Weibull Vectors
arXiv:2512.17632v1 Announce Type: new Abstract: High-dimensional covariance estimation is notoriously sensitive to outliers. While statistically optimal estimators exist for general heavy-tailed distributions, they often rely on computationally expensive techniques like semidefinite programming or iterative M-estimation ($O(d^3)$). In this work, we target the specific regime of…
-
Perfect reconstruction of sparse signals using nonconvexity control and one-step RSB message passing
arXiv:2512.17426v1 Announce Type: new Abstract: We consider sparse signal reconstruction via minimization of the smoothly clipped absolute deviation (SCAD) penalty, and develop one-step replica-symmetry-breaking (1RSB) extensions of approximate message passing (AMP), termed 1RSB-AMP. Starting from the 1RSB formulation of belief propagation, we…
-
BayesSum: Bayesian Quadrature in Discrete Spaces
arXiv:2512.16105v1 Announce Type: new Abstract: This paper addresses the challenging computational problem of estimating intractable expectations over discrete domains. Existing approaches, including Monte Carlo and Russian Roulette estimators, are consistent but often require a large number of samples to achieve accurate results. We propose a novel estimator, BayesSum, which…
-
DAG Learning from Zero-Inflated Count Data Using Continuous Optimization
arXiv:2512.16233v1 Announce Type: new Abstract: We address network structure learning from zero-inflated count data by casting each node as a zero-inflated generalized linear model and optimizing a smooth, score-based objective under a directed acyclic graph constraint. Our Zero-Inflated Continuous Optimization (ZICO) approach uses node-wise likelihoods with…
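The abstract says ZICO optimizes a smooth objective under a DAG constraint, but it does not say which constraint. A common choice in continuous structure learning is a smooth function $h(W)$ of the weighted adjacency matrix that vanishes exactly on acyclic graphs. The sketch below uses the polynomial form $h(W) = \operatorname{tr}[(I + W \circ W / d)^d] - d$, a variant of the NOTEARS trace-exponential constraint similar to the one used in DAG-GNN. Treat it as an assumed stand-in, not ZICO's formulation. The constraint works because $W \circ W$ is entrywise nonnegative and $\operatorname{tr}[(W \circ W)^k] > 0$ exactly when there is a directed cycle of length $k \le d$.

```python
import numpy as np

def acyclicity(W):
    """Smooth acyclicity measure h(W) = tr[(I + W*W/d)^d] - d; zero iff W is a DAG."""
    W = np.asarray(W, dtype=float)
    d = W.shape[0]
    M = np.eye(d) + (W * W) / d  # elementwise square keeps entries nonnegative
    return np.trace(np.linalg.matrix_power(M, d)) - d
```

In a continuous optimizer, $h(W) = 0$ is typically imposed through an augmented Lagrangian or a penalty on the score-based objective.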
-
Advantages and limitations in the use of transfer learning for individual treatment effects in causal machine learning
arXiv:2512.16489v1 Announce Type: new Abstract: Generalizing causal knowledge across diverse environments is challenging, especially when estimates from large-scale datasets must be applied to smaller or systematically different contexts, where external validity is critical. Model-based estimators of individual treatment…
-
Riemannian Stochastic Interpolants for Amorphous Particle Systems
arXiv:2512.16607v1 Announce Type: new Abstract: Modern generative models hold great promise for accelerating diverse tasks involving the simulation of physical systems, but they must be adapted to the specific constraints of each domain. Significant progress has been made for biomolecules and crystalline materials. Here, we address amorphous materials…
-
On The Hidden Biases of Flow Matching Samplers
arXiv:2512.16768v1 Announce Type: new Abstract: We study the implicit bias of flow matching (FM) samplers via the lens of empirical flow matching. Although population FM may produce gradient-field velocities resembling optimal transport (OT), we show that the empirical FM minimizer is almost never a gradient field, even…
-
Online Partitioned Local Depth for semi-supervised applications
arXiv:2512.15436v1 Announce Type: new Abstract: We introduce an extension of the partitioned local depth (PaLD) algorithm that is adapted to online applications such as semi-supervised prediction. The new algorithm we present, online PaLD, is well-suited to situations where it is possible to pre-compute a cohesion network from…
-
A Teacher-Student Perspective on the Dynamics of Learning Near the Optimal Point
arXiv:2512.15606v1 Announce Type: new Abstract: Near an optimal learning point of a neural network, the learning performance of gradient descent dynamics is dictated by the Hessian matrix of the loss function with respect to the network parameters. We characterize the Hessian eigenspectrum for…
-
High-Dimensional Partial Least Squares: Spectral Analysis and Fundamental Limitations
arXiv:2512.15684v1 Announce Type: new Abstract: Partial Least Squares (PLS) is a widely used method for data integration, designed to extract latent components shared across paired high-dimensional datasets. Despite decades of practical success, a precise theoretical understanding of its behavior in high-dimensional regimes remains limited. In this…
-
Model inference for ranking from pairwise comparisons
arXiv:2512.15269v1 Announce Type: cross Abstract: We consider the problem of ranking objects from noisy pairwise comparisons, for example, ranking tennis players from the outcomes of matches. We follow a standard approach to this problem and assume that each object has an unobserved strength and that the outcome of…
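One standard instance of this setup is the Bradley-Terry model, in which object $i$ beats object $j$ with probability $\pi_i / (\pi_i + \pi_j)$. The sketch below fits the strengths $\pi$ with the classical MM (minorize-maximize) iteration. It assumes that every object wins at least once and that the "beats" graph is strongly connected, so the maximum-likelihood estimate exists. The paper's actual question, which model to infer, is not addressed here, and the function name is hypothetical.

```python
import numpy as np

def bradley_terry_mm(wins, n_iter=200):
    """Fit Bradley-Terry strengths by the MM iteration.

    wins[i, j] = number of comparisons in which object i beat object j.
    Returns strengths normalised to sum to 1.
    """
    wins = np.asarray(wins, dtype=float)
    n_games = wins + wins.T          # total comparisons between each pair
    total_wins = wins.sum(axis=1)
    pi = np.full(len(wins), 1.0 / len(wins))
    for _ in range(n_iter):
        denom = n_games / (pi[:, None] + pi[None, :])
        np.fill_diagonal(denom, 0.0)  # an object is never compared with itself
        pi = total_wins / denom.sum(axis=1)
        pi /= pi.sum()                # strengths are only identified up to scale
    return pi
```

At a fixed point, each object's observed wins equal its expected wins under the fitted model. This is the maximum-likelihood condition.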
-
A Bayesian latent class reinforcement learning framework to capture adaptive, feedback-driven travel behaviour
A Bayesian latent class reinforcement learning framework to capture adaptive, feedback-driven travel behaviour arXiv:2512.14713v1 Announce Type: cross Abstract: Many travel decisions involve a degree of experience formation, where individuals learn their preferences over time. At the same time, there is extensive scope for heterogeneity across individual travellers, both in their underlying preferences and in how…
-
Maximum Mean Discrepancy with Unequal Sample Sizes via Generalized U-Statistics
Maximum Mean Discrepancy with Unequal Sample Sizes via Generalized U-Statistics arXiv:2512.13997v1 Announce Type: new Abstract: Existing two-sample testing techniques, particularly those based on choosing a kernel for the Maximum Mean Discrepancy (MMD), often assume equal sample sizes from the two distributions. Applying these methods in practice can require discarding valuable data, unnecessarily reducing test power.…
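For reference, the textbook unbiased MMD² estimator already allows unequal sample sizes m and n. The within-sample kernel averages are U-statistics over distinct pairs, and the cross term averages over all m·n pairs. The sketch below uses a Gaussian kernel on scalar data with an assumed bandwidth of 1. It is not the generalized U-statistic construction proposed in the paper, and it computes no test threshold.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; the bandwidth is an assumed default."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased estimate of MMD^2 between samples of sizes m and n (m != n allowed)."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two points")
    # Within-sample U-statistics skip the diagonal terms i == j.
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    # The cross term is a plain average over all m * n pairs.
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(40)]
ys_same = [rng.gauss(0.0, 1.0) for _ in range(90)]
ys_shift = [rng.gauss(2.0, 1.0) for _ in range(90)]
print(mmd2_unbiased(xs, ys_same), mmd2_unbiased(xs, ys_shift))
```

Because the estimate is unbiased, it can be slightly negative when the two distributions coincide.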
-
One Permutation Is All You Need: Fast, Reliable Variable Importance and Model Stress-Testing
One Permutation Is All You Need: Fast, Reliable Variable Importance and Model Stress-Testing arXiv:2512.13892v1 Announce Type: new Abstract: Reliable estimation of feature contributions in machine learning models is essential for trust, transparency and regulatory compliance, especially when models are proprietary or otherwise operate as black boxes. While permutation-based methods are a standard tool for this…
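The generic permutation scheme that the abstract builds on works like this: shuffle one feature column, re-score the black-box model, and report the increase in loss. This sketch makes two assumed choices, a single shuffle per feature and mean squared error as the loss. It is not the paper's estimator, and it makes no claim about the reliability properties the authors study.

```python
import random


def mean_squared_error(model, X, y):
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def single_permutation_importance(model, X, y, seed=0):
    """Loss increase when each feature column is shuffled once.

    model: any callable mapping a feature row (list) to a prediction, so
    proprietary or black-box predictors can be scored without their internals.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
        importances.append(mean_squared_error(model, X_perm, y) - baseline)
    return importances


def black_box(row):
    """Toy predictor that ignores the second feature."""
    return 3.0 * row[0]


data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(200)]
y = [3.0 * row[0] for row in X]
print(single_permutation_importance(black_box, X, y))
```

A feature the model never reads gets an importance of exactly zero. Shuffling the feature it does read inflates the loss.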
-
On the Hardness of Conditional Independence Testing In Practice
On the Hardness of Conditional Independence Testing In Practice arXiv:2512.14000v1 Announce Type: new Abstract: Tests of conditional independence (CI) underpin a number of important problems in machine learning and statistics, from causal discovery to evaluation of predictor fairness and out-of-distribution robustness. Shah and Peters (2020) showed that, contrary to the unconditional case, no universally finite-sample…
-
Weighted Conformal Prediction Provides Adaptive and Valid Mask-Conditional Coverage for General Missing Data Mechanisms
Weighted Conformal Prediction Provides Adaptive and Valid Mask-Conditional Coverage for General Missing Data Mechanisms arXiv:2512.14221v1 Announce Type: new Abstract: Conformal prediction (CP) offers a principled framework for uncertainty quantification, but it fails to guarantee coverage when faced with missing covariates. In addressing the heterogeneity induced by various missing patterns, Mask-Conditional Valid (MCV) Coverage has emerged…
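As background, weighted conformal prediction replaces the usual empirical quantile of calibration scores with a weighted one, and places the test point's weight at +∞. The sketch below computes that quantile for given nonnegative weights. How the weights are chosen for each missingness mask is the paper's contribution and is not reproduced here. With all weights equal to 1, the quantile reduces to the ⌈(n+1)(1−α)⌉-th smallest score of ordinary split conformal prediction.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Level-(1 - alpha) quantile of sum_i w_i * delta(s_i) + w_test * delta(+inf),
    with the weights normalised to sum to one. Weights are assumed nonnegative."""
    total = sum(weights) + test_weight
    threshold = (1.0 - alpha) * total
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= threshold:
            return score
    # Not enough calibration mass below the test point's +inf atom: unbounded set.
    return math.inf


# Equal weights: the 18th smallest of 19 scores when alpha = 0.1.
calibration_scores = [float(k) for k in range(1, 20)]
print(weighted_conformal_quantile(calibration_scores, [1.0] * 19, 1.0, 0.1))
```

With absolute residuals as scores, the returned quantile q gives the interval [ŷ − q, ŷ + q] around a point prediction ŷ.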
-
Improving the Accuracy of Amortized Model Comparison with Self-Consistency
Improving the Accuracy of Amortized Model Comparison with Self-Consistency arXiv:2512.14308v1 Announce Type: new Abstract: Amortized Bayesian inference (ABI) offers fast, scalable approximations to posterior densities by training neural surrogates on data simulated from the statistical model. However, ABI methods are highly sensitive to model misspecification: when observed data fall outside the training distribution (generative scope…
-
Interval Fisher’s Discriminant Analysis and Visualisation
Interval Fisher’s Discriminant Analysis and Visualisation arXiv:2512.11945v1 Announce Type: new Abstract: In Data Science, entities are typically represented by single-valued measurements. Symbolic Data Analysis extends this framework to more complex structures, such as intervals and histograms, that express internal variability. We propose an extension of multiclass Fisher’s Discriminant Analysis to interval-valued data, using Moore’s…
-
Hellinger loss function for Generative Adversarial Networks
Hellinger loss function for Generative Adversarial Networks arXiv:2512.12267v1 Announce Type: new Abstract: We propose Hellinger-type loss functions for training Generative Adversarial Networks (GANs), motivated by the boundedness, symmetry, and robustness properties of the Hellinger distance. We define an adversarial objective based on this divergence and study its statistical properties within a general parametric framework. We…
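For discrete distributions P and Q, the Hellinger distance is H(P, Q) = sqrt(1 − Σ_i sqrt(p_i q_i)), which equals sqrt(½ Σ_i (sqrt(p_i) − sqrt(q_i))²). The sketch below computes it directly. It illustrates the boundedness and symmetry that the abstract cites as motivation. It is not the adversarial GAN objective defined in the paper.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two probability vectors over the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    bhattacharyya = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bhattacharyya))


p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(hellinger_distance(p, q))  # always in [0, 1]; 0 only when p == q
```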
-
Co-Hub Node Based Multiview Graph Learning with Theoretical Guarantees
Co-Hub Node Based Multiview Graph Learning with Theoretical Guarantees arXiv:2512.12435v1 Announce Type: new Abstract: Identifying the graphical structure underlying the observed multivariate data is essential in numerous applications. Current methodologies are predominantly confined to inferring a single graph under the assumption that the observed data are homogeneous. However, many contexts involve heterogeneous datasets that feature…
-
Towards a pretrained deep learning estimator of the Linfoot informational correlation
Towards a pretrained deep learning estimator of the Linfoot informational correlation arXiv:2512.12358v1 Announce Type: new Abstract: We develop a supervised deep-learning approach to estimate mutual information between two continuous random variables. As labels, we use the Linfoot informational correlation, a transformation of mutual information that has many important properties. Our method is based on ground…
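The Linfoot informational correlation maps mutual information I (in nats) into [0, 1] via r = sqrt(1 − exp(−2I)). For a bivariate Gaussian with correlation ρ, I = −½ log(1 − ρ²), so r recovers |ρ|, and the snippet checks this identity. It only evaluates the transformation. It does not estimate mutual information from data, which is the task the paper's network addresses.

```python
import math


def linfoot_correlation(mutual_information):
    """Linfoot informational correlation r = sqrt(1 - exp(-2 I)), with I in nats."""
    if mutual_information < 0:
        raise ValueError("mutual information is nonnegative")
    return math.sqrt(1.0 - math.exp(-2.0 * mutual_information))


def gaussian_mutual_information(rho):
    """Mutual information (nats) of a bivariate Gaussian with correlation rho, |rho| < 1."""
    return -0.5 * math.log(1.0 - rho ** 2)


print(linfoot_correlation(gaussian_mutual_information(0.6)))  # recovers |rho| = 0.6 up to rounding
```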
-
Efficient Level-Crossing Probability Calculation for Gaussian Process Modeled Data
Efficient Level-Crossing Probability Calculation for Gaussian Process Modeled Data arXiv:2512.12442v1 Announce Type: new Abstract: Almost all scientific data have uncertainties originating from different sources. Gaussian process regression (GPR) models are a natural way to model data with Gaussian-distributed uncertainties. GPR also has the benefit of reducing I/O bandwidth and storage requirements for large scientific simulations.…
-
STARK denoises spatial transcriptomics images via adaptive regularization
STARK denoises spatial transcriptomics images via adaptive regularization arXiv:2512.10994v1 Announce Type: new Abstract: We present an approach to denoising spatial transcriptomics images that is particularly effective for uncovering cell identities in the regime of ultra-low sequencing depths, and also allows for interpolation of gene expression. The method — Spatial Transcriptomics via Adaptive Regularization and Kernels…
-
An Efficient Variant of One-Class SVM with Lifelong Online Learning Guarantees
An Efficient Variant of One-Class SVM with Lifelong Online Learning Guarantees arXiv:2512.11052v1 Announce Type: new Abstract: We study outlier (a.k.a., anomaly) detection for single-pass non-stationary streaming data. In the well-studied offline or batch outlier detection problem, traditional methods such as kernel One-Class SVM (OCSVM) are both computationally heavy and prone to large false-negative (Type II)…
-
Provable Recovery of Locally Important Signed Features and Interactions from Random Forest
Provable Recovery of Locally Important Signed Features and Interactions from Random Forest arXiv:2512.11081v1 Announce Type: new Abstract: Feature and Interaction Importance (FII) methods are essential in supervised learning for assessing the relevance of input variables and their interactions in complex prediction models. In many domains, such as personalized medicine, local interpretations for individual predictions are…
-
TPV: Parameter Perturbations Through the Lens of Test Prediction Variance
TPV: Parameter Perturbations Through the Lens of Test Prediction Variance arXiv:2512.11089v1 Announce Type: new Abstract: We identify test prediction variance (TPV) — the first-order sensitivity of model outputs to parameter perturbations around a trained solution — as a unifying quantity that links several classical observations about generalization in deep networks. TPV is a fully label-free…
-
Data-Driven Model Reduction using WeldNet: Windowed Encoders for Learning Dynamics
Data-Driven Model Reduction using WeldNet: Windowed Encoders for Learning Dynamics arXiv:2512.11090v1 Announce Type: new Abstract: Many problems in science and engineering involve time-dependent, high-dimensional datasets arising from complex physical processes, which are costly to simulate. In this work, we propose WeldNet: Windowed Encoders for Learning Dynamics, a data-driven nonlinear model reduction framework to build…
-
LxCIM: a new rank-based binary classifier performance metric invariant to local exchange of classes
LxCIM: a new rank-based binary classifier performance metric invariant to local exchange of classes arXiv:2512.10053v1 Announce Type: new Abstract: Binary classification is one of the oldest, most prevalent, and most studied problems in machine learning. However, the metrics used to evaluate model performance have received comparatively little attention. The area under the receiver operating characteristic curve…
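The ROC AUC mentioned in the abstract is itself a rank statistic. It equals the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half (the normalised Mann–Whitney U). The quadratic-time sketch below computes that quantity. LxCIM itself is not reproduced.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparisons; labels are 1 (positive) or 0 (negative)."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    concordant = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5  # ties count as half
    return concordant / (len(positives) * len(negatives))


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 3 of 4 pairs ordered correctly -> 0.75
```

Swapping the class labels maps the AUC to 1 − AUC.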
-
The Interplay of Statistics and Noisy Optimization: Learning Linear Predictors with Random Data Weights
The Interplay of Statistics and Noisy Optimization: Learning Linear Predictors with Random Data Weights arXiv:2512.10188v1 Announce Type: new Abstract: We analyze gradient descent with randomly weighted data points in a linear regression model, under a generic weighting distribution. This includes various forms of stochastic gradient descent and importance sampling, but also extends to weighting distributions with…
-
Diffusion differentiable resampling
Diffusion differentiable resampling arXiv:2512.10401v1 Announce Type: new Abstract: This paper is concerned with differentiable resampling in the context of sequential Monte Carlo (e.g., particle filtering). We propose a new informative resampling method that is instantly pathwise differentiable, based on an ensemble score diffusion model. We prove that our diffusion resampling method provides a consistent estimate…
-
Error Analysis of Generalized Langevin Equations with Approximated Memory Kernels
Error Analysis of Generalized Langevin Equations with Approximated Memory Kernels arXiv:2512.10256v1 Announce Type: new Abstract: We analyze prediction error in stochastic dynamical systems with memory, focusing on generalized Langevin equations (GLEs) formulated as stochastic Volterra equations. We establish that, under a strongly convex potential, trajectory discrepancies decay at a rate determined by the decay of…
-
Supervised Learning of Random Neural Architectures Structured by Latent Random Fields on Compact Boundaryless Multiply-Connected Manifolds
Supervised Learning of Random Neural Architectures Structured by Latent Random Fields on Compact Boundaryless Multiply-Connected Manifolds arXiv:2512.10407v1 Announce Type: new Abstract: This paper introduces a new probabilistic framework for supervised learning in neural systems. It is designed to model complex, uncertain systems whose random outputs are strongly non-Gaussian given deterministic inputs. The architecture itself is…
-
Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming
Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming arXiv:2512.08948v1 Announce Type: new Abstract: We study online statistical inference for the solutions of stochastic optimization problems with equality and inequality constraints. Such problems are prevalent in statistics and machine learning, encompassing constrained $M$-estimation, physics-informed models, safe reinforcement learning, and algorithmic fairness. We develop…
-
WTNN: Weibull-Tailored Neural Networks for survival analysis
WTNN: Weibull-Tailored Neural Networks for survival analysis arXiv:2512.09163v1 Announce Type: new Abstract: The Weibull distribution is a commonly adopted choice for modeling the survival of systems subject to maintenance over time. When only proxy indicators and censored observations are available, it becomes necessary to express the distribution’s parameters as functions of time-dependent covariates. Deep neural…
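The Weibull survival function is S(t) = exp(−(t/λ)^k), with hazard h(t) = (k/λ)(t/λ)^{k−1}. Under right censoring, an observed failure contributes log h(t) + log S(t), and a censored time contributes log S(t) alone. The sketch evaluates this log-likelihood for a fixed shape k and scale λ. In a covariate-dependent model like the one the abstract describes, a network would output k and λ for each subject. That wiring, and the names used here, are assumptions rather than WTNN's architecture.

```python
import math


def weibull_censored_loglik(times, observed, shape, scale):
    """Right-censored Weibull log-likelihood.

    times: positive event or censoring times.
    observed: True where a failure was observed, False where the time is censored.
    shape (k) and scale (lambda) must be positive.
    """
    loglik = 0.0
    for t, failed in zip(times, observed):
        z = t / scale
        loglik -= z ** shape  # log S(t) = -(t / lambda)^k
        if failed:
            # log h(t) = log(k / lambda) + (k - 1) * log(t / lambda)
            loglik += math.log(shape / scale) + (shape - 1.0) * math.log(z)
    return loglik


# With shape 1 the Weibull reduces to an exponential whose mean equals the scale.
print(weibull_censored_loglik([1.0, 3.0], [True, False], shape=1.0, scale=2.0))
```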
-
Robust and Sparse Estimation of Unbounded Density Ratio under Heavy Contamination
Robust and Sparse Estimation of Unbounded Density Ratio under Heavy Contamination arXiv:2512.09266v1 Announce Type: new Abstract: We examine the non-asymptotic properties of robust density ratio estimation (DRE) in contaminated settings. Weighted DRE is the most promising among existing methods, exhibiting doubly strong robustness from an asymptotic perspective. This study demonstrates that Weighted DRE achieves sparse…
-
Impact of Positional Encoding: Clean and Adversarial Rademacher Complexity for Transformers under In-Context Regression
Impact of Positional Encoding: Clean and Adversarial Rademacher Complexity for Transformers under In-Context Regression arXiv:2512.09275v1 Announce Type: new Abstract: Positional encoding (PE) is a core architectural component of Transformers, yet its impact on the Transformer’s generalization and robustness remains unclear. In this work, we provide the first generalization analysis for a single-layer Transformer under in-context…
-
Estimation of Stochastic Optimal Transport Maps
Estimation of Stochastic Optimal Transport Maps arXiv:2512.09499v1 Announce Type: new Abstract: The optimal transport (OT) map is a geometry-driven transformation between high-dimensional probability distributions which underpins a wide range of tasks in statistics, applied probability, and machine learning. However, existing statistical theory for OT map estimation is quite restricted, hinging on Brenier’s theorem (quadratic cost,…
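In one dimension the Brenier setting has a closed form. For the quadratic cost, the optimal map is the monotone rearrangement T = F_Q^{-1} ∘ F_P. Between two empirical measures with the same number of equally weighted atoms, this map simply matches sorted samples. The sketch shows only this deterministic baseline and does not cover the stochastic maps the paper studies.

```python
def monotone_ot_map(source, target):
    """Empirical 1-D optimal transport map for convex costs such as |x - y|^2.

    Assumes equal sample sizes and distinct source values; returns a dict
    sending the i-th smallest source point to the i-th smallest target point.
    """
    if len(source) != len(target):
        raise ValueError("equal sample sizes are assumed")
    return dict(zip(sorted(source), sorted(target)))


def transport_cost(mapping):
    """Average squared displacement of a matching given as a dict."""
    return sum((x - y) ** 2 for x, y in mapping.items()) / len(mapping)


mapping = monotone_ot_map([3.0, 1.0, 2.0], [10.0, 30.0, 20.0])
print(mapping)  # maps 1.0 -> 10.0, 2.0 -> 20.0, 3.0 -> 30.0
```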
-
Functional Random Forest with Adaptive Cost-Sensitive Splitting for Imbalanced Functional Data Classification
Functional Random Forest with Adaptive Cost-Sensitive Splitting for Imbalanced Functional Data Classification arXiv:2512.07888v1 Announce Type: new Abstract: Classification of functional data, where observations are curves or trajectories, poses unique challenges, particularly under severe class imbalance. Traditional Random Forest algorithms, while robust for tabular data, often fail to capture the intrinsic structure of functional observations and…
-
Provable Diffusion Posterior Sampling for Bayesian Inversion
Provable Diffusion Posterior Sampling for Bayesian Inversion arXiv:2512.08022v1 Announce Type: new Abstract: This paper proposes a novel diffusion-based posterior sampling method within a plug-and-play (PnP) framework. Our approach constructs a probability transport from an easy-to-sample terminal distribution to the target posterior, using a warm-start strategy to initialize the particles. To approximate the posterior score, we…
-
Worst-case generation via minimax optimization in Wasserstein space
Worst-case generation via minimax optimization in Wasserstein space arXiv:2512.08176v1 Announce Type: new Abstract: Worst-case generation plays a critical role in evaluating robustness and stress-testing systems under distribution shifts, in applications ranging from machine learning models to power grids and medical prediction systems. We develop a generative modeling framework for worst-case generation for a pre-specified risk,…
-
Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis
Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis arXiv:2512.08601v1 Announce Type: new Abstract: Since the 1990s, considerable empirical work has been carried out to train statistical models, such as neural networks (NNs), as learned heuristics for combinatorial optimization (CO) problems. When successful, such an approach eliminates the need for experts…
-
Bayesian Optimization for Function-Valued Responses under Min-Max Criteria
Bayesian Optimization for Function-Valued Responses under Min-Max Criteria arXiv:2512.07868v1 Announce Type: cross Abstract: Bayesian optimization is widely used for optimizing expensive black-box functions, but most existing approaches focus on scalar responses. In many scientific and engineering settings the response is functional, varying smoothly over an index such as time or wavelength, which makes classical…
-
Contextual Strongly Convex Simulation Optimization: Optimize then Predict with Inexact Solutions
Contextual Strongly Convex Simulation Optimization: Optimize then Predict with Inexact Solutions arXiv:2512.06270v1 Announce Type: new Abstract: In this work, we study contextual strongly convex simulation optimization and adopt an “optimize then predict” (OTP) approach for real-time decision making. In the offline stage, simulation optimization is conducted across a set of covariates to approximate the optimal-solution…
-
Modeling Spatio-temporal Extremes via Conditional Variational Autoencoders
Modeling Spatio-temporal Extremes via Conditional Variational Autoencoders arXiv:2512.06348v1 Announce Type: new Abstract: Extreme weather events are widely studied in fields such as agriculture, ecology, and meteorology. The spatio-temporal co-occurrence of extreme events can strengthen or weaken under changing climate conditions. In this paper, we propose a novel approach to model spatio-temporal extremes by integrating climate…
-
Canonical Tail Dependence for Soft Extremal Clustering of Multichannel Brain Signals
Canonical Tail Dependence for Soft Extremal Clustering of Multichannel Brain Signals arXiv:2512.06435v1 Announce Type: new Abstract: We develop a novel characterization of extremal dependence between two cortical regions of the brain when its signals display extremely large amplitudes. We show that connectivity in the tails of the distribution reveals unique features of extreme events (e.g.,…
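A standard summary of extremal dependence between two channels is the upper tail dependence coefficient χ(u) = P(F_Y(Y) > u | F_X(X) > u), evaluated for u close to 1. The sketch estimates it from ranks at a single assumed threshold u. It is a generic bivariate diagnostic, not the canonical tail dependence measure or the clustering procedure proposed in the paper.

```python
def ranks(values):
    """1-based ranks, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def empirical_chi(x, y, u=0.95):
    """Empirical chi(u): fraction of X-exceedances that are also Y-exceedances."""
    n = len(x)
    ux = [r / (n + 1) for r in ranks(x)]  # pseudo-observations in (0, 1)
    uy = [r / (n + 1) for r in ranks(y)]
    x_exceed = [i for i in range(n) if ux[i] > u]
    if not x_exceed:
        raise ValueError("no exceedances above u; lower u or add data")
    joint = sum(1 for i in x_exceed if uy[i] > u)
    return joint / len(x_exceed)


signal = [float(t) for t in range(100)]
print(empirical_chi(signal, signal), empirical_chi(signal, [-v for v in signal]))
```

Perfectly comonotone channels give χ = 1 and countermonotone channels give χ = 0. Independent channels give values near 1 − u.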
-
ADAM Optimization with Adaptive Batch Selection
ADAM Optimization with Adaptive Batch Selection arXiv:2512.06795v1 Announce Type: new Abstract: Adam is a widely used optimizer in neural network training due to its adaptive learning rate. However, because different data samples influence model updates to varying degrees, treating them equally can lead to inefficient convergence. To address this, a prior work proposed adapting the…
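For reference, the standard Adam update keeps exponential moving averages of the gradient and of its square. It corrects their initialisation bias and scales each coordinate's step by the inverse square root of the second moment. The sketch below runs full-batch Adam on a toy quadratic. It does not implement the adaptive batch selection studied in the paper, which decides which samples enter each gradient.

```python
import math


def adam_minimize(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Full-batch Adam given a gradient function; returns the final iterate."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (mean) estimate
    v = [0.0] * len(x)  # second-moment (uncentred) estimate
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias-corrected moments
            v_hat = v[i] / (1.0 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def quad_grad(x):
    """Gradient of the toy objective f(x) = (x0 - 3)^2 + (x1 + 1)^2, minimised at (3, -1)."""
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]


print(adam_minimize(quad_grad, [0.0, 0.0]))
```

All data points enter every gradient with equal weight here. Adaptive batch selection would instead change which samples contribute to `grad` at each step.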
-
Latent Nonlinear Denoising Score Matching for Enhanced Learning of Structured Distributions
Latent Nonlinear Denoising Score Matching for Enhanced Learning of Structured Distributions arXiv:2512.06615v1 Announce Type: new Abstract: We present latent nonlinear denoising score matching (LNDSM), a novel training objective for score-based generative models that integrates nonlinear forward dynamics with the VAE-based latent SGM framework. This combination is achieved by reformulating the cross-entropy term using the approximate…