Category: cs.LG

  • An Adaptive Sampling Framework for Detecting Localized Concept Drift under Label Scarcity

    arXiv:2511.02452v1 Announce Type: new Abstract: Concept drift and label scarcity are two critical challenges limiting the robustness of predictive models in dynamic industrial environments. Existing drift detection methods often assume global shifts and rely on dense supervision, making them ill-suited for regression tasks…

  • A new class of Markov random fields enabling lightweight sampling

    arXiv:2511.02373v1 Announce Type: new Abstract: This work addresses the problem of efficient sampling of Markov random fields (MRF). The sampling of Potts or Ising MRF is most often based on Gibbs sampling, and is thus computationally expensive. We consider in this work how to circumvent…

  • Gradient Boosted Mixed Models: Flexible Joint Estimation of Mean and Variance Components for Clustered Data

    arXiv:2511.00217v1 Announce Type: new Abstract: Linear mixed models are widely used for clustered data, but their reliance on parametric forms limits flexibility in complex and high-dimensional settings. In contrast, gradient boosting methods achieve high predictive accuracy through nonparametric estimation, but…

  • A Streaming Sparse Cholesky Method for Derivative-Informed Gaussian Process Surrogates Within Digital Twin Applications

    arXiv:2511.00366v1 Announce Type: new Abstract: Digital twins are developed to model the behavior of a specific physical asset (or twin), and they can consist of high-fidelity physics-based models or surrogates. A highly accurate surrogate is often preferred over multi-physics models as…

  • Accuracy estimation of neural networks by extreme value theory

    arXiv:2511.00490v1 Announce Type: new Abstract: Neural networks are able to approximate any continuous function on a compact set. However, it is not obvious how to quantify the error of the neural network, i.e., the remaining bias between the function and the neural network. Here, we propose…

  • Perturbations in the Orthogonal Complement Subspace for Efficient Out-of-Distribution Detection

    arXiv:2511.00849v1 Announce Type: new Abstract: Out-of-distribution (OOD) detection is essential for deploying deep learning models in open-world environments. Existing approaches, such as energy-based scoring and gradient-projection methods, typically rely on high-dimensional representations to separate in-distribution (ID) and OOD samples. We introduce P-OCS (Perturbations in the…

  • SOCRATES: Simulation Optimization with Correlated Replicas and Adaptive Trajectory Evaluations

    arXiv:2511.00685v1 Announce Type: new Abstract: The field of simulation optimization (SO) encompasses various methods developed to optimize complex, expensive-to-sample stochastic systems. Established methods include, but are not limited to, ranking-and-selection for finite alternatives and surrogate-based methods for continuous domains, with broad applications in engineering and…

  • Overspecified Mixture Discriminant Analysis: Exponential Convergence, Statistical Guarantees, and Remote Sensing Applications

    arXiv:2510.27056v1 Announce Type: new Abstract: This study explores the classification error of Mixture Discriminant Analysis (MDA) in scenarios where the number of mixture components exceeds those present in the actual data distribution, a condition known as overspecification. We use a two-component Gaussian mixture…

  • On the Equivalence of Optimal Transport Problem and Action Matching with Optimal Vector Fields

    arXiv:2510.27385v1 Announce Type: new Abstract: The Flow Matching (FM) method in generative modeling maps arbitrary probability distributions by constructing an interpolation between them and then learning the vector field that defines the ODE for this interpolation. Recently, it was shown that FM can…

  • Minimax-Optimal Two-Sample Test with Sliced Wasserstein

    arXiv:2510.27498v1 Announce Type: new Abstract: We study the problem of nonparametric two-sample testing using the sliced Wasserstein (SW) distance. While prior theoretical and empirical work indicates that the SW distance offers a promising balance between strong statistical guarantees and computational efficiency, its theoretical foundations for hypothesis testing remain limited.…

  • Interpretable Model-Aware Counterfactual Explanations for Random Forest

    arXiv:2510.27397v1 Announce Type: new Abstract: Despite their enormous predictive power, machine learning models are often unsuitable for applications in regulated industries such as finance, due to their limited capacity to provide explanations. While model-agnostic frameworks such as Shapley values have proved to be convenient and popular, they rarely…

  • Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

    arXiv:2510.25811v1 Announce Type: new Abstract: We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn…

  • $L_1$-norm Regularized Indefinite Kernel Logistic Regression

    arXiv:2510.26043v1 Announce Type: new Abstract: Kernel logistic regression (KLR) is a powerful classification method widely applied across diverse domains. In many real-world scenarios, indefinite kernels capture more domain-specific structural information than positive definite kernels. This paper proposes a novel $L_1$-norm regularized indefinite kernel logistic regression (RIKLR) model, which extends…

  • Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation

    arXiv:2510.26026v1 Announce Type: new Abstract: Reliable uncertainty quantification is crucial for reinforcement learning (RL) in high-stakes settings. We propose a unified conformal prediction framework for infinite-horizon policy evaluation that constructs distribution-free prediction intervals for returns in both on-policy and off-policy settings. Our method integrates distributional…

  • Bias-Corrected Data Synthesis for Imbalanced Learning

    arXiv:2510.26046v1 Announce Type: new Abstract: Imbalanced data, where the positive samples represent only a small proportion compared to the negative samples, makes it challenging for classification problems to balance the false positive and false negative rates. A common approach to addressing the challenge involves generating synthetic data for the…

  • Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems

    arXiv:2510.26061v1 Announce Type: new Abstract: We propose a data-driven framework for efficiently solving quadratic programming (QP) problems by reducing the number of variables in high-dimensional QPs using instance-specific projection. A graph neural network-based model is designed to generate projections tailored to each QP instance, enabling…

  • Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical Guarantees

    arXiv:2510.24754v1 Announce Type: new Abstract: Uncertain knowledge graph embedding (UnKGE) methods learn vector representations that capture both structural and uncertainty information to predict scores of unseen triples. However, existing methods produce only point estimates, without quantifying predictive uncertainty, limiting their reliability in high-stakes applications where…

  • Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm

    arXiv:2510.24815v1 Announce Type: new Abstract: Tree ensembles have demonstrated state-of-the-art predictive performance across a wide range of problems involving tabular data. Nevertheless, the black-box nature of tree ensembles is a strong limitation, especially for applications with critical decisions at stake. The Hoeffding or ANOVA…

  • Generative Bayesian Optimization: Generative Models as Acquisition Functions

    arXiv:2510.25240v1 Announce Type: new Abstract: We present a general strategy for turning generative models into candidate solution samplers for batch Bayesian optimization (BO). The use of generative models for BO enables large batch scaling as generative sampling, optimization of non-continuous design spaces, and high-dimensional and combinatorial design.…

  • Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains

    arXiv:2510.25514v1 Announce Type: new Abstract: We study the convergence of off-policy TD(0) with linear function approximation when used to approximate the expected discounted reward in a Markov chain. It is well known that the combination of off-policy learning and function approximation can lead…

  • Using latent representations to link disjoint longitudinal data for mixed-effects regression

    arXiv:2510.25531v1 Announce Type: new Abstract: Many rare diseases offer limited established treatment options, leading patients to switch therapies when new medications emerge. To analyze the impact of such treatment switches within the low sample size limitations of rare disease trials, it is important to…

  • Beyond Normality: Reliable A/B Testing with Non-Gaussian Data

    arXiv:2510.23666v1 Announce Type: new Abstract: A/B testing has become the cornerstone of decision-making in online markets, guiding how platforms launch new features, optimize pricing strategies, and improve user experience. In practice, we typically employ the pairwise $t$-test to compare outcomes between the treatment and control groups, thereby…

  • VIKING: Deep variational inference with stochastic projections

    arXiv:2510.23684v1 Announce Type: new Abstract: Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. Where a Bayesian treatment is usually associated with high-quality predictions and uncertainties, the practical reality has been the opposite, with unstable training, poor predictive power, and subpar calibration. Building upon…

  • Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis

    arXiv:2510.23935v1 Announce Type: new Abstract: Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constraints at the prediction level, without addressing underlying biases in data representations. In this work, we…

  • Bayesian neural networks with interpretable priors from Mercer kernels

    arXiv:2510.23745v1 Announce Type: new Abstract: Quantifying the uncertainty in the output of a neural network is essential for deployment in scientific or engineering applications where decisions must be made under limited or noisy data. Bayesian neural networks (BNNs) provide a framework for this purpose by constructing…

  • Score-based constrained generative modeling via Langevin diffusions with boundary conditions

    arXiv:2510.23985v1 Announce Type: new Abstract: Score-based generative models based on stochastic differential equations (SDEs) achieve impressive performance in sampling from unknown distributions, but often fail to satisfy underlying constraints. We propose a constrained generative model using kinetic (underdamped) Langevin dynamics with specular reflection of velocity…

  • Input Adaptive Bayesian Model Averaging

    arXiv:2510.22054v1 Announce Type: new Abstract: This paper studies prediction with multiple candidate models, where the goal is to combine their outputs. This task is especially challenging in heterogeneous settings, where different models may be better suited to different inputs. We propose input adaptive Bayesian Model Averaging (IA-BMA), a Bayesian method…

  • Bridging Prediction and Attribution: Identifying Forward and Backward Causal Influence Ranges Using Assimilative Causal Inference

    arXiv:2510.21889v1 Announce Type: new Abstract: Causal inference identifies cause-and-effect relationships between variables. While traditional approaches rely on data to reveal causal links, a recently developed method, assimilative causal inference (ACI), integrates observations with dynamical models. It utilizes Bayesian data assimilation…

  • Differentially Private High-dimensional Variable Selection via Integer Programming

    arXiv:2510.22062v1 Announce Type: new Abstract: Sparse variable selection improves interpretability and generalization in high-dimensional learning by selecting a small subset of informative features. Recent advances in Mixed Integer Programming (MIP) have enabled solving large-scale non-private sparse regression – known as Best Subset Selection (BSS) – with millions…

  • Frequentist Validity of Epistemic Uncertainty Estimators

    arXiv:2510.22063v1 Announce Type: new Abstract: Decomposing prediction uncertainty into its aleatoric (irreducible) and epistemic (reducible) components is critical for the development and deployment of machine learning systems. A popular, principled measure for epistemic uncertainty is the mutual information between the response variable and model parameters. However, evaluating this measure…

  • MMbeddings: Parameter-Efficient, Low-Overfitting Probabilistic Embeddings Inspired by Nonlinear Mixed Models

    arXiv:2510.22198v1 Announce Type: new Abstract: We present MMbeddings, a probabilistic embedding approach that reinterprets categorical embeddings through the lens of nonlinear mixed models, effectively bridging classical statistical theory with modern deep learning. By treating embeddings as latent random effects within a variational autoencoder framework, our…

  • Exponential Convergence Guarantees for Iterative Markovian Fitting

    arXiv:2510.20871v1 Announce Type: new Abstract: The Schrödinger Bridge (SB) problem has become a fundamental tool in computational optimal transport and generative modeling. To address this problem, ideal methods such as Iterative Proportional Fitting and Iterative Markovian Fitting (IMF) have been proposed, alongside practical approximations like Diffusion Schrödinger Bridge and…

  • Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization

    arXiv:2510.20883v1 Announce Type: new Abstract: Adversarial training has emerged as a key technique to enhance model robustness against adversarial input perturbations. Many of the existing methods rely on computationally expensive min-max problems that limit their application in practice. We propose a novel formulation of adversarial…

  • A Short Note on Upper Bounds for Graph Neural Operator Convergence Rate

    arXiv:2510.20954v1 Announce Type: new Abstract: Graphons, as limits of graph sequences, provide a framework for analyzing the asymptotic behavior of graph neural operators. Spectral convergence of sampled graphs to graphons yields operator-level convergence rates, enabling transferability analyses of GNNs. This note summarizes known…

  • Doubly-Regressing Approach for Subgroup Fairness

    arXiv:2510.21091v1 Announce Type: new Abstract: Algorithmic fairness is a socially crucial topic in real-world applications of AI. Among many notions of fairness, subgroup fairness is widely studied when multiple sensitive attributes (e.g., gender, race, age) are present. However, as the number of sensitive attributes grows, the number of subgroups increases…

  • Enforcing Calibration in Multi-Output Probabilistic Regression with Pre-rank Regularization

    arXiv:2510.21273v1 Announce Type: new Abstract: Probabilistic models must be well calibrated to support reliable decision-making. While calibration in single-output regression is well studied, defining and achieving multivariate calibration in multi-output regression remains considerably more challenging. The existing literature on multivariate calibration primarily focuses on diagnostic tools…

  • Compositional Generation for Long-Horizon Coupled PDEs

    arXiv:2510.20141v1 Announce Type: new Abstract: Simulating coupled PDE systems is computationally intensive, and prior efforts have largely focused on training surrogates on the joint (coupled) data, which requires a large amount of data. In this paper, we study compositional diffusion approaches where diffusion models are only trained on the…

  • Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models

    arXiv:2510.19999v1 Announce Type: new Abstract: We present a novel enhanced cyclic coordinate descent (ECCD) framework for solving generalized linear models with elastic net constraints that reduces training time in comparison to existing state-of-the-art methods. We redesign the CD method by performing a Taylor expansion…

  • Neural Networks for Censored Expectile Regression Based on Data Augmentation

    arXiv:2510.20344v1 Announce Type: new Abstract: Expectile regression neural networks (ERNNs) are powerful tools for capturing heterogeneity and complex nonlinear structures in data. However, most existing research has primarily focused on fully observed data, with limited attention paid to scenarios involving censored observations. In this paper,…

  • Testing Most Influential Sets

    arXiv:2510.20372v1 Announce Type: new Abstract: Small subsets of data with disproportionate influence on model outcomes can have dramatic impacts on conclusions, with a few data points sometimes overturning key findings. While recent work has developed methods to identify these most influential sets, no formal theory exists to determine when their influence…

  • Learning Decentralized Routing Policies via Graph Attention-based Multi-Agent Reinforcement Learning in Lunar Delay-Tolerant Networks

    arXiv:2510.20436v1 Announce Type: new Abstract: We present a fully decentralized routing framework for multi-robot exploration missions operating under the constraints of a Lunar Delay-Tolerant Network (LDTN). In this setting, autonomous rovers must relay collected data to a lander under intermittent connectivity…

  • Signature Kernel Scoring Rule as Spatio-Temporal Diagnostic for Probabilistic Forecasting

    arXiv:2510.19110v1 Announce Type: new Abstract: Modern weather forecasting has increasingly transitioned from numerical weather prediction (NWP) to data-driven machine learning forecasting techniques. While these new models produce probabilistic forecasts to quantify uncertainty, their training and evaluation may remain hindered by conventional scoring rules, primarily MSE,…

  • Calibrated Principal Component Regression

    arXiv:2510.19020v1 Announce Type: new Abstract: We propose a new method for statistical inference in generalized linear models. In the overparameterized regime, Principal Component Regression (PCR) reduces variance by projecting high-dimensional data to a low-dimensional principal subspace before fitting. However, PCR incurs truncation bias whenever the true regression vector has mass outside…

  • Extreme Event Aware ($\eta$-) Learning

    arXiv:2510.19161v1 Announce Type: new Abstract: Quantifying and predicting rare and extreme events persists as a crucial yet challenging task in understanding complex dynamical systems. Many practical challenges arise from the infrequency and severity of these events, including the considerable variance of simple sampling methods and the substantial computational cost of…

  • Topology of Currencies: Persistent Homology for FX Co-movements: A Comparative Clustering Study

    arXiv:2510.19306v1 Announce Type: new Abstract: This study investigates whether Topological Data Analysis (TDA) can provide additional insights beyond traditional statistical methods in clustering currency behaviours. We focus on the foreign exchange (FX) market, which is a complex system often exhibiting non-linear and high-dimensional…

  • Graphical model for tensor factorization by sparse sampling

    arXiv:2510.17886v1 Announce Type: new Abstract: We consider tensor factorizations based on sparse measurements of the tensor components. The measurements are designed in a way that the underlying graph of interactions is a random graph. The setup will be useful in cases where a substantial amount of data…

  • Learning Time-Varying Graphs from Incomplete Graph Signals

    arXiv:2510.17903v1 Announce Type: new Abstract: This paper tackles the challenging problem of jointly inferring time-varying network topologies and imputing missing data from partially observed graph signals. We propose a unified non-convex optimization framework to simultaneously recover a sequence of graph Laplacian matrices while reconstructing the unobserved signal entries.…

  • Generalization Below the Edge of Stability: The Role of Data Geometry

    arXiv:2510.18120v1 Announce Type: new Abstract: Understanding generalization in overparameterized neural networks hinges on the interplay between the data geometry, neural architecture, and training dynamics. In this paper, we theoretically explore how data geometry controls this implicit bias. This paper presents theoretical results for overparameterized…

  • Arbitrated Indirect Treatment Comparisons

    arXiv:2510.18071v1 Announce Type: new Abstract: Matching-adjusted indirect comparison (MAIC) has been increasingly employed in health technology assessments (HTA). By reweighting subjects from a trial with individual participant data (IPD) to match the covariate summary statistics of another trial with only aggregate data (AgD), MAIC facilitates the estimation of a treatment effect…

  • Beating the Winner’s Curse via Inference-Aware Policy Optimization

    arXiv:2510.18161v1 Announce Type: new Abstract: There has been a surge of recent interest in automatically learning policies to target treatment decisions based on rich individual covariates. A common approach is to train a machine learning model to predict counterfactual outcomes, and then select the policy that optimizes…

  • Learning density ratios in causal inference using Bregman-Riesz regression

    arXiv:2510.16127v1 Announce Type: new Abstract: The ratio of two probability density functions is a fundamental quantity that appears in many areas of statistics and machine learning, including causal inference, reinforcement learning, covariate shift, outlier detection, independence testing, importance sampling, and diffusion modeling. Naively estimating the numerator…

  • A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators

    arXiv:2510.16419v1 Announce Type: new Abstract: While significant progress has been made in heterogeneous treatment effect (HTE) estimation, the evaluation of HTE estimators remains underdeveloped. In this article, we propose a robust evaluation framework based on relative error, which quantifies performance differences between two HTE estimators.…

  • Personalized Collaborative Learning with Affinity-Based Variance Reduction

    arXiv:2510.16232v1 Announce Type: new Abstract: Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels: gaining collaborative speedup when agents are similar, without performance degradation when…

  • From Reviews to Actionable Insights: An LLM-Based Approach for Attribute and Feature Extraction

    arXiv:2510.16551v1 Announce Type: new Abstract: This research proposes a systematic, large language model (LLM) approach for extracting product and service attributes, features, and associated sentiments from customer reviews. Grounded in marketing theory, the framework distinguishes perceptual attributes from actionable features, producing interpretable…

  • From Universal Approximation Theorem to Tropical Geometry of Multi-Layer Perceptrons

    arXiv:2510.15012v1 Announce Type: new Abstract: We revisit the Universal Approximation Theorem (UAT) through the lens of the tropical geometry of neural networks and introduce a constructive, geometry-aware initialization for sigmoidal multi-layer perceptrons (MLPs). Tropical geometry shows that Rectified Linear Unit (ReLU) networks admit decision functions with…

  • Reliable data clustering with Bayesian community detection

    arXiv:2510.15013v1 Announce Type: new Abstract: From neuroscience and genomics to systems biology and ecology, researchers rely on clustering similarity data to uncover modular structure. Yet widely used clustering methods, such as hierarchical clustering, k-means, and WGCNA, lack principled model selection, leaving them susceptible to noise. A common workaround…

  • The Coverage Principle: How Pre-training Enables Post-Training

    arXiv:2510.15020v1 Announce Type: new Abstract: Language models demonstrate remarkable abilities when pre-trained on large text corpora and fine-tuned for specific tasks, but how and why pre-training shapes the success of the final model remains poorly understood. Notably, although pre-training success is often quantified by cross entropy loss, cross-entropy…

  • The Tree-SNE Tree Exists

    arXiv:2510.15014v1 Announce Type: new Abstract: The clustering and visualisation of high-dimensional data is a ubiquitous task in modern data science. Popular techniques include nonlinear dimensionality reduction methods like t-SNE or UMAP. These methods face the ‘scale-problem’ of clustering: when dealing with the MNIST dataset, do we want to distinguish different digits…

  • The Minimax Lower Bound of Kernel Stein Discrepancy Estimation

    arXiv:2510.15058v1 Announce Type: new Abstract: Kernel Stein discrepancies (KSDs) have emerged as a powerful tool for quantifying goodness-of-fit over the last decade, featuring numerous successful applications. To the best of our knowledge, all existing KSD estimators with known rate achieve $\sqrt{n}$-convergence. In this work, we…

  • Exact Dynamics of Multi-class Stochastic Gradient Descent

    arXiv:2510.14074v1 Announce Type: new Abstract: We develop a framework for analyzing the training and learning rate dynamics on a variety of high-dimensional optimization problems trained using one-pass stochastic gradient descent (SGD) with data generated from multiple anisotropic classes. We give exact expressions for a large class of…

  • deFOREST: Fusing Optical and Radar satellite data for Enhanced Sensing of Tree-loss

    arXiv:2510.14092v1 Announce Type: new Abstract: In this paper we develop a deforestation detection pipeline that incorporates optical and Synthetic Aperture Radar (SAR) data. A crucial component of the pipeline is the construction of anomaly maps of the optical data, which is done using…

  • High-Dimensional BWDM: A Robust Nonparametric Clustering Validation Index for Large-Scale Data

    arXiv:2510.14145v1 Announce Type: new Abstract: Determining the appropriate number of clusters in unsupervised learning is a central problem in statistics and data science. Traditional validity indices, such as Calinski-Harabasz, Silhouette, and Davies-Bouldin, depend on centroid-based distances and therefore degrade in high-dimensional or contaminated data. This…

  • Personalized federated learning, Row-wise fusion regularization, Multivariate modeling, Sparse estimation

    Personalized federated learning, Row-wise fusion regularization, Multivariate modeling, Sparse estimation arXiv:2510.14413v1 Announce Type: new Abstract: We study personalized federated learning for multivariate responses where client models are heterogeneous yet share variable-level structure. Existing entry-wise penalties ignore cross-response dependence, while matrix-wise fusion over-couples clients. We propose a Sparse Row-wise Fusion (SROF) regularizer that clusters row vectors…

  • A novel Information-Driven Strategy for Optimal Regression Assessment

    A novel Information-Driven Strategy for Optimal Regression Assessment arXiv:2510.14222v1 Announce Type: new Abstract: In Machine Learning (ML), a regression algorithm aims to minimize a loss function based on data. An assessment method in this context seeks to quantify the discrepancy between the optimal response for an input-output system and the estimate produced by a learned…

  • Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space

    Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space arXiv:2510.12916v1 Announce Type: new Abstract: Systems of interacting continuous-time Markov chains are a powerful model class, but inference is typically intractable in high dimensional settings. Auxiliary information, such as noisy observations, is typically only available at discrete times, and incorporating it via…

  • Simplicial Gaussian Models: Representation and Inference

    Simplicial Gaussian Models: Representation and Inference arXiv:2510.12983v1 Announce Type: new Abstract: Probabilistic graphical models (PGMs) are powerful tools for representing statistical dependencies through graphs in high-dimensional systems. However, they are limited to pairwise interactions. In this work, we propose the simplicial Gaussian model (SGM), which extends Gaussian PGM to simplicial complexes. SGM jointly models random…

  • Conformal Inference for Open-Set and Imbalanced Classification

    Conformal Inference for Open-Set and Imbalanced Classification arXiv:2510.13037v1 Announce Type: new Abstract: This paper presents a conformal prediction method for classification in highly imbalanced and open-set settings, where there are many possible classes and not all may be represented in the data. Existing approaches require a finite, known label space and typically involve random sample…

  • A Multi-dimensional Semantic Surprise Framework Based on Low-Entropy Semantic Manifolds for Fine-Grained Out-of-Distribution Detection

    A Multi-dimensional Semantic Surprise Framework Based on Low-Entropy Semantic Manifolds for Fine-Grained Out-of-Distribution Detection arXiv:2510.13093v1 Announce Type: new Abstract: Out-of-Distribution (OOD) detection is a cornerstone for the safe deployment of AI systems in the open world. However, existing methods treat OOD detection as a binary classification problem, a cognitive flattening that fails to distinguish between…

  • Gaussian Certified Unlearning in High Dimensions: A Hypothesis Testing Approach

    Gaussian Certified Unlearning in High Dimensions: A Hypothesis Testing Approach arXiv:2510.13094v1 Announce Type: new Abstract: Machine unlearning seeks to efficiently remove the influence of selected data while preserving generalization. Significant progress has been made in low dimensions $(p \ll n)$, but high dimensions pose serious theoretical challenges as standard optimization assumptions of $\Omega(1)$ strong convexity…

  • Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models

    Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models arXiv:2510.11789v1 Announce Type: new Abstract: We study the convergence rate of learning pairwise interactions in single-layer attention-style models, where tokens interact through a weight matrix and a non-linear activation function. We prove that the minimax rate is $M^{-\frac{2\beta}{2\beta+1}}$ with $M$ being the sample size, depending…

  • On Thompson Sampling and Bilateral Uncertainty in Additive Bayesian Optimization

    On Thompson Sampling and Bilateral Uncertainty in Additive Bayesian Optimization arXiv:2510.11792v1 Announce Type: new Abstract: In Bayesian Optimization (BO), additive assumptions can mitigate the twin difficulties of modeling and searching a complex function in high dimension. However, common acquisition functions, like the Additive Lower Confidence Bound, ignore pairwise covariances between dimensions, which we’ll call \textit{bilateral…

  • Active Subspaces in Infinite Dimension

    Active Subspaces in Infinite Dimension arXiv:2510.11871v1 Announce Type: new Abstract: Active subspace analysis uses the leading eigenspace of the gradient’s second moment to conduct supervised dimension reduction. In this article, we extend this methodology to real-valued functionals on Hilbert space. We define an operator which coincides with the active subspace matrix when applied to a…

  • High-Probability Bounds For Heterogeneous Local Differential Privacy

    High-Probability Bounds For Heterogeneous Local Differential Privacy arXiv:2510.11895v1 Announce Type: new Abstract: We study statistical estimation under local differential privacy (LDP) when users may hold heterogeneous privacy levels and accuracy must be guaranteed with high probability. Departing from the common in-expectation analyses, and for one-dimensional and multi-dimensional mean estimation problems, we develop finite sample upper…

  • Simplifying Optimal Transport through Schatten-$p$ Regularization

    Simplifying Optimal Transport through Schatten-$p$ Regularization arXiv:2510.11910v1 Announce Type: new Abstract: We propose a new general framework for recovering low-rank structure in optimal transport using Schatten-$p$ norm regularization. Our approach extends existing methods that promote sparse and interpretable transport maps or plans, while providing a unified and principled family of convex programs that encourage low-dimensional…

  • Learning with Incomplete Context: Linear Contextual Bandits with Pretrained Imputation

    Learning with Incomplete Context: Linear Contextual Bandits with Pretrained Imputation arXiv:2510.09908v1 Announce Type: new Abstract: The rise of large-scale pretrained models has made it feasible to generate predictive or synthetic features at low cost, raising the question of how to incorporate such surrogate predictions into downstream decision-making. We study this problem in the setting of…

  • Calibrating Generative Models

    Calibrating Generative Models arXiv:2510.10020v1 Announce Type: new Abstract: Generative models frequently suffer miscalibration, wherein class probabilities and other statistics of the sampling distribution deviate from desired values. We frame calibration as a constrained optimization problem and seek the closest model in Kullback-Leibler divergence satisfying calibration constraints. To address the intractability of imposing these constraints exactly,…

  • Kernel Treatment Effects with Adaptively Collected Data

    Kernel Treatment Effects with Adaptively Collected Data arXiv:2510.10245v1 Announce Type: new Abstract: Adaptive experiments improve efficiency by adjusting treatment assignments based on past outcomes, but this adaptivity breaks the i.i.d. assumptions that underpin classical asymptotics. At the same time, many questions of interest are distributional, extending beyond average effects. Kernel treatment effects (KTE) provide a…

  • Neural variational inference for cutting feedback during uncertainty propagation

    Neural variational inference for cutting feedback during uncertainty propagation arXiv:2510.10268v1 Announce Type: new Abstract: In many scientific applications, uncertainty of estimates from an earlier (upstream) analysis needs to be propagated in subsequent (downstream) Bayesian analysis, without feedback. Cutting feedback methods, also termed cut-Bayes, achieve this by constructing a cut-posterior distribution that prevents backward information flow.…

  • On some practical challenges of conformal prediction

    On some practical challenges of conformal prediction arXiv:2510.10324v1 Announce Type: new Abstract: Conformal prediction is a model-free machine learning method for creating prediction regions with a guaranteed coverage probability level. However, a data scientist often faces three challenges in practice: (i) the determination of a conformal prediction region is only approximate, jeopardizing the finite-sample validity…

  • A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization

    A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization arXiv:2510.08916v1 Announce Type: new Abstract: The representer theorem is a cornerstone of kernel methods, which aim to estimate latent functions in reproducing kernel Hilbert spaces (RKHSs) in a nonparametric manner. Its significance lies in converting inherently infinite-dimensional optimization problems into finite-dimensional ones over dual…

  • Gradient-Guided Furthest Point Sampling for Robust Training Set Selection

    Gradient-Guided Furthest Point Sampling for Robust Training Set Selection arXiv:2510.08906v1 Announce Type: new Abstract: Smart training set selection procedures enable the reduction of data needs and improve predictive robustness in machine learning problems relevant to chemistry. We introduce Gradient Guided Furthest Point Sampling (GGFPS), a simple extension of Furthest Point Sampling (FPS) that leverages molecular…

  • Mirror Flow Matching with Heavy-Tailed Priors for Generative Modeling on Convex Domains

    Mirror Flow Matching with Heavy-Tailed Priors for Generative Modeling on Convex Domains arXiv:2510.08929v1 Announce Type: new Abstract: We study generative modeling on convex domains using flow matching and mirror maps, and identify two fundamental challenges. First, standard log-barrier mirror maps induce heavy-tailed dual distributions, leading to ill-posed dynamics. Second, coupling with Gaussian priors performs poorly…

  • Distributionally robust approximation property of neural networks

    Distributionally robust approximation property of neural networks arXiv:2510.09177v1 Announce Type: new Abstract: The universal approximation property uniformly with respect to weakly compact families of measures is established for several classes of neural networks. To that end, we prove that these neural networks are dense in Orlicz spaces, thereby extending classical universal approximation theorems even beyond…

  • A unified Bayesian framework for adversarial robustness

    A unified Bayesian framework for adversarial robustness arXiv:2510.09288v1 Announce Type: new Abstract: The vulnerability of machine learning models to adversarial attacks remains a critical security challenge. Traditional defenses, such as adversarial training, typically robustify models by minimizing a worst-case loss. However, these deterministic approaches do not account for uncertainty in the adversary’s attack. While stochastic…

  • Evaluating and Learning Optimal Dynamic Treatment Regimes under Truncation by Death

    Evaluating and Learning Optimal Dynamic Treatment Regimes under Truncation by Death arXiv:2510.07501v1 Announce Type: new Abstract: Truncation by death, a prevalent challenge in critical care, renders traditional dynamic treatment regime (DTR) evaluation inapplicable due to ill-defined potential outcomes. We introduce a principal stratification-based method, focusing on the always-survivor value function. We derive a semiparametrically efficient,…

  • From Data to Rewards: a Bilevel Optimization Perspective on Maximum Likelihood Estimation

    From Data to Rewards: a Bilevel Optimization Perspective on Maximum Likelihood Estimation arXiv:2510.07624v1 Announce Type: new Abstract: Generative models form the backbone of modern machine learning, underpinning state-of-the-art systems in text, vision, and multimodal applications. While Maximum Likelihood Estimation has traditionally served as the dominant training paradigm, recent work has highlighted its limitations, particularly in…

  • When Robustness Meets Conservativeness: Conformalized Uncertainty Calibration for Balanced Decision Making

    When Robustness Meets Conservativeness: Conformalized Uncertainty Calibration for Balanced Decision Making arXiv:2510.07750v1 Announce Type: new Abstract: Robust optimization safeguards decisions against uncertainty by optimizing against worst-case scenarios, yet its effectiveness hinges on a prespecified robustness level that is often chosen ad hoc, leading to either insufficient protection or overly conservative and costly solutions. Recent approaches…

  • A Honest Cross-Validation Estimator for Prediction Performance

    A Honest Cross-Validation Estimator for Prediction Performance arXiv:2510.07649v1 Announce Type: new Abstract: Cross-validation is a standard tool for obtaining an honest assessment of the performance of a prediction model. The commonly used version repeatedly splits data, trains the prediction model on the training set, evaluates the model performance on the test set, and averages the…

  • Surrogate Graph Partitioning for Spatial Prediction

    Surrogate Graph Partitioning for Spatial Prediction arXiv:2510.07832v1 Announce Type: new Abstract: Spatial prediction refers to the estimation of unobserved values from spatially distributed observations. Although recent advances have improved the capacity to model diverse observation types, adoption in practice remains limited in industries that demand interpretability. To mitigate this gap, surrogate models that explain black-box…

  • Online Matching via Reinforcement Learning: An Expert Policy Orchestration Strategy

    Online Matching via Reinforcement Learning: An Expert Policy Orchestration Strategy arXiv:2510.06515v1 Announce Type: new Abstract: Online matching problems arise in many complex systems, from cloud services and online marketplaces to organ exchange networks, where timely, principled decisions are critical for maintaining high system performance. Traditional heuristics in these settings are simple and interpretable but typically…

  • A General Constructive Upper Bound on Shallow Neural Nets Complexity

    A General Constructive Upper Bound on Shallow Neural Nets Complexity arXiv:2510.06372v1 Announce Type: new Abstract: We provide an upper bound on the number of neurons required in a shallow neural network to approximate a continuous function on a compact set with a given accuracy. This method, inspired by a specific proof of the Stone-Weierstrass theorem,…

  • Q-Learning with Fine-Grained Gap-Dependent Regret

    Q-Learning with Fine-Grained Gap-Dependent Regret arXiv:2510.06647v1 Announce Type: new Abstract: We study fine-grained gap-dependent regret bounds for model-free reinforcement learning in episodic tabular Markov Decision Processes. Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain coarse and fail to fully capture the structure of suboptimality gaps. We address this limitation by establishing…

  • Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix

    Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix arXiv:2510.06685v1 Announce Type: new Abstract: Self-attention layers have become fundamental building blocks of modern deep neural networks, yet their theoretical understanding remains limited, particularly from the perspective of random matrix theory. In this work, we provide a rigorous analysis of the singular value spectrum of…

  • Bayesian Nonparametric Dynamical Clustering of Time Series

    Bayesian Nonparametric Dynamical Clustering of Time Series arXiv:2510.06919v1 Announce Type: new Abstract: We present a method that models the evolution of an unbounded number of time series clusters by switching among an unknown number of regimes with linear dynamics. We develop a Bayesian non-parametric approach using a hierarchical Dirichlet process as a prior on the…

  • Minima and Critical Points of the Bethe Free Energy Are Invariant Under Deformation Retractions of Factor Graphs

    Minima and Critical Points of the Bethe Free Energy Are Invariant Under Deformation Retractions of Factor Graphs arXiv:2510.05380v1 Announce Type: new Abstract: In graphical models, factor graphs, and more generally energy-based models, the interactions between variables are encoded by a graph, a hypergraph, or, in the most general case, a partially ordered set (poset). Inference…

  • Refereed Learning

    Refereed Learning arXiv:2510.05440v1 Announce Type: new Abstract: We initiate an investigation of learning tasks in a setting where the learner is given access to two competing provers, only one of which is honest. Specifically, we consider the power of such learners in assessing purported properties of opaque models. Following prior work that considers the power…

  • Domain-Shift-Aware Conformal Prediction for Large Language Models

    Domain-Shift-Aware Conformal Prediction for Large Language Models arXiv:2510.05566v1 Announce Type: new Abstract: Large language models have achieved impressive performance across diverse tasks. However, their tendency to produce overconfident and factually incorrect outputs, known as hallucinations, poses risks in real world applications. Conformal prediction provides finite-sample, distribution-free coverage guarantees, but standard conformal prediction breaks down under…

  • A Probabilistic Basis for Low-Rank Matrix Learning

    A Probabilistic Basis for Low-Rank Matrix Learning arXiv:2510.05447v1 Announce Type: new Abstract: Low rank inference on matrices is widely conducted by optimizing a cost function augmented with a penalty proportional to the nuclear norm $\Vert \cdot \Vert_*$. However, despite the assortment of computational methods for such problems, there is a surprising lack of understanding of…

  • Bilevel optimization for learning hyperparameters: Application to solving PDEs and inverse problems with Gaussian processes

    Bilevel optimization for learning hyperparameters: Application to solving PDEs and inverse problems with Gaussian processes arXiv:2510.05568v1 Announce Type: new Abstract: Methods for solving scientific computing and inference problems, such as kernel- and neural network-based approaches for partial differential equations (PDEs), inverse problems, and supervised learning tasks, depend crucially on the choice of hyperparameters. Specifically, the…