Category: stat.ML

  • Total Variation Rates for Riemannian Flow Matching

    Total Variation Rates for Riemannian Flow Matching arXiv:2602.05174v1 Announce Type: new Abstract: Riemannian flow matching (RFM) extends flow-based generative modeling to data supported on manifolds by learning a time-dependent tangent vector field whose flow-ODE transports a simple base distribution to the data law. We develop a nonasymptotic Total Variation (TV) convergence analysis for RFM samplers…

  • Finite-Particle Rates for Regularized Stein Variational Gradient Descent

    Finite-Particle Rates for Regularized Stein Variational Gradient Descent arXiv:2602.05172v1 Announce Type: new Abstract: We derive finite-particle rates for the regularized Stein variational gradient descent (R-SVGD) algorithm introduced by He et al. (2024) that corrects the constant-order bias of the SVGD by applying a resolvent-type preconditioner to the kernelized Wasserstein gradient. For the resulting interacting $N$-particle…

  • Logarithmic-time Schedules for Scaling Language Models with Momentum

    Logarithmic-time Schedules for Scaling Language Models with Momentum arXiv:2602.05298v1 Announce Type: new Abstract: In practice, the hyperparameters $(\beta_1, \beta_2)$ and weight-decay $\lambda$ in AdamW are typically kept at fixed values. Is there any reason to do otherwise? We show that for large-scale language model training, the answer is yes: by exploiting the power-law structure of…

  • Radon–Wasserstein Gradient Flows for Interacting-Particle Sampling in High Dimensions

    Radon–Wasserstein Gradient Flows for Interacting-Particle Sampling in High Dimensions arXiv:2602.05227v1 Announce Type: new Abstract: Gradient flows of the Kullback–Leibler (KL) divergence, such as the Fokker–Planck equation and Stein Variational Gradient Descent, evolve a distribution toward a target density known only up to a normalizing constant. We introduce new gradient flows of the KL divergence with…

  • Decision-Focused Sequential Experimental Design: A Directional Uncertainty-Guided Approach

    Decision-Focused Sequential Experimental Design: A Directional Uncertainty-Guided Approach arXiv:2602.05340v1 Announce Type: new Abstract: We consider the sequential experimental design problem in the predict-then-optimize paradigm. In this paradigm, the outputs of the prediction model are used as coefficient vectors in a downstream linear optimization problem. Traditional sequential experimental design aims to control the input variables (features)…

  • A Hitchhiker’s Guide to Poisson Gradient Estimation

    A Hitchhiker’s Guide to Poisson Gradient Estimation arXiv:2602.03896v1 Announce Type: new Abstract: Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: Exponential Arrival Time (EAT) simulation and Gumbel-SoftMax (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical…

  • Transcendental Regularization of Finite Mixtures: Theoretical Guarantees and Practical Limitations

    Transcendental Regularization of Finite Mixtures: Theoretical Guarantees and Practical Limitations arXiv:2602.03889v1 Announce Type: new Abstract: Finite mixture models are widely used for unsupervised learning, but maximum likelihood estimation via EM suffers from degeneracy as components collapse. We introduce transcendental regularization, a penalized likelihood framework with analytic barrier functions that prevent degeneracy while maintaining asymptotic efficiency. The…

  • Byzantine Machine Learning: MultiKrum and an optimal notion of robustness

    Byzantine Machine Learning: MultiKrum and an optimal notion of robustness arXiv:2602.03899v1 Announce Type: new Abstract: Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule…

  • Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks

    Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks arXiv:2602.03948v1 Announce Type: new Abstract: In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the available data are summarized solely through the degrees of the nodes in the network.…

  • Learning Multi-type heterogeneous interacting particle systems

    Learning Multi-type heterogeneous interacting particle systems arXiv:2602.03954v1 Announce Type: new Abstract: We propose a framework for the joint inference of network topology, multi-type interaction kernels, and latent type assignments in heterogeneous interacting particle systems from multi-trajectory data. This learning task is a challenging non-convex mixed-integer optimization problem, which we address through a novel three-stage approach.…

  • Rethinking Test-Time Training: Tilting The Latent Distribution For Few-Shot Source-Free Adaptation

    Rethinking Test-Time Training: Tilting The Latent Distribution For Few-Shot Source-Free Adaptation arXiv:2602.02633v1 Announce Type: new Abstract: Often, constraints arise in deployment settings where even lightweight parameter updates e.g. parameter-efficient fine-tuning could induce model shift or tuning instability. We study test-time adaptation of foundation models for few-shot classification under a completely frozen-model regime, where additionally, no…

  • Relaxed Triangle Inequality for Kullback-Leibler Divergence Between Multivariate Gaussian Distributions

    Relaxed Triangle Inequality for Kullback-Leibler Divergence Between Multivariate Gaussian Distributions arXiv:2602.02577v1 Announce Type: new Abstract: The Kullback-Leibler (KL) divergence is not a proper distance metric and does not satisfy the triangle inequality, posing theoretical challenges in certain practical applications. Existing work has demonstrated that KL divergence between multivariate Gaussian distributions follows a relaxed triangle inequality.…

  • Near-Universal Multiplicative Updates for Nonnegative Einsum Factorization

    Near-Universal Multiplicative Updates for Nonnegative Einsum Factorization arXiv:2602.02759v1 Announce Type: new Abstract: Despite the ubiquity of multiway data across scientific domains, there are few user-friendly tools that fit tailored nonnegative tensor factorizations. Researchers may use gradient-based automatic differentiation (which often struggles in nonnegative settings), choose between a limited set of methods with mature implementations, or…

  • Training-Free Self-Correction for Multimodal Masked Diffusion Models

    Training-Free Self-Correction for Multimodal Masked Diffusion Models arXiv:2602.02927v1 Announce Type: new Abstract: Masked diffusion models have emerged as a powerful framework for text and multimodal generation. However, their sampling procedure updates multiple tokens simultaneously and treats generated tokens as immutable, which may lead to error accumulation when early mistakes cannot be revised. In this work,…

  • Plug-In Classification of Drift Functions in Diffusion Processes Using Neural Networks

    Plug-In Classification of Drift Functions in Diffusion Processes Using Neural Networks arXiv:2602.02791v1 Announce Type: new Abstract: We study a supervised multiclass classification problem for diffusion processes, where each class is characterized by a distinct drift function and trajectories are observed at discrete times. Extending the one-dimensional multiclass framework of Denis et al. (2024) to multidimensional…

  • Neuron Block Dynamics for XOR Classification with Zero-Margin

    Neuron Block Dynamics for XOR Classification with Zero-Margin arXiv:2602.00172v1 Announce Type: new Abstract: The ability of neural networks to learn useful features through stochastic gradient descent (SGD) is a cornerstone of their success. Most theoretical analyses focus on regression or on classification tasks with a positive margin, where worst-case gradient bounds suffice. In contrast, we…

  • Uncertainty-Aware Multimodal Learning via Conformal Shapley Intervals

    Uncertainty-Aware Multimodal Learning via Conformal Shapley Intervals arXiv:2602.00171v1 Announce Type: new Abstract: Multimodal learning combines information from multiple data modalities to improve predictive performance. However, modalities often contribute unequally and in a data dependent way, making it unclear which data modalities are genuinely informative and to what extent their contributions can be trusted. Quantifying modality…

  • Singular Bayesian Neural Networks

    Singular Bayesian Neural Networks arXiv:2602.00387v1 Announce Type: new Abstract: Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We argue this cost is often unnecessary, particularly when weight matrices exhibit fast singular value decay. By parameterizing weights as $W = AB^{\top}$ with $A \in \mathbb{R}^{m \times r}$, $B \in…

  • Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation

    Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation arXiv:2602.00413v1 Announce Type: new Abstract: Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generative models to maximize a given reward function, these approaches require extensive computational resources and may not generalize…

  • Reinforcement Learning for Control Systems with Time Delays: A Comprehensive Survey

    Reinforcement Learning for Control Systems with Time Delays: A Comprehensive Survey arXiv:2602.00399v1 Announce Type: new Abstract: In the last decade, Reinforcement Learning (RL) has achieved remarkable success in the control and decision-making of complex dynamical systems. However, most RL algorithms rely on the Markov Decision Process assumption, which is violated in practical cyber-physical systems affected…

  • Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation

    Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation arXiv:2601.22367v1 Announce Type: new Abstract: Generalized Bayesian Inference (GBI) tempers a loss with a temperature $\beta>0$ to mitigate overconfidence and improve robustness under model misspecification, but existing GBI methods typically rely on costly MCMC or SDE-based samplers and must be re-run for each new dataset…

  • Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models

    Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models arXiv:2601.22336v1 Announce Type: new Abstract: Large-scale AI evaluation increasingly relies on aggregating binary judgments from $K$ annotators, including LLMs used as judges. Most classical methods, e.g., Dawid-Skene or (weighted) majority voting, assume annotators are conditionally independent given the true label $Y\in\{0,1\}$, an assumption often violated by LLM…

  • It’s all the (Exponential) Family: An Equivalence between Maximum Likelihood Estimation and Control Variates for Sketching Algorithms

    It’s all the (Exponential) Family: An Equivalence between Maximum Likelihood Estimation and Control Variates for Sketching Algorithms arXiv:2601.22378v1 Announce Type: new Abstract: Maximum likelihood estimators (MLE) and control variate estimators (CVE) have been used in conjunction with known information across sketching algorithms and applications in machine learning. We prove that under certain conditions in an…

  • Simulation-based Bayesian inference with ameliorative learned summary statistics — Part I

    Simulation-based Bayesian inference with ameliorative learned summary statistics — Part I arXiv:2601.22441v1 Announce Type: new Abstract: This paper, which is Part 1 of a two-part paper series, considers a simulation-based inference with learned summary statistics, in which such a learned summary statistic serves as an empirical-likelihood with ameliorative effects in the Bayesian setting, when the…

  • Corrected Samplers for Discrete Flow Models

    Corrected Samplers for Discrete Flow Models arXiv:2601.22519v1 Announce Type: new Abstract: Discrete flow models (DFMs) have been proposed to learn the data distribution on a finite state space, offering a flexible framework as an alternative to discrete diffusion models. A line of recent work has studied samplers for discrete diffusion models, such as tau-leaping and…

  • Latent-IMH: Efficient Bayesian Inference for Inverse Problems with Approximate Operators

    Latent-IMH: Efficient Bayesian Inference for Inverse Problems with Approximate Operators arXiv:2601.20888v1 Announce Type: new Abstract: We study sampling from posterior distributions in Bayesian linear inverse problems where $A$, the parameters to observables operator, is computationally expensive. In many applications, $A$ can be factored in a manner that facilitates the construction of a cost-effective approximation $\tilde{A}$.…

  • Efficient Causal Structure Learning via Modular Subgraph Integration

    Efficient Causal Structure Learning via Modular Subgraph Integration arXiv:2601.21014v1 Announce Type: new Abstract: Learning causal structures from observational data remains a fundamental yet computationally intensive task, particularly in high-dimensional settings where existing methods face challenges such as the super-exponential growth of the search space and increasing computational demands. To address this, we introduce VISTA (Voting-based…

  • A Diffusive Classification Loss for Learning Energy-based Generative Models

    A Diffusive Classification Loss for Learning Energy-based Generative Models arXiv:2601.21025v1 Announce Type: new Abstract: Score-based generative models have recently achieved remarkable success. While they are usually parameterized by the score, an alternative way is to use a series of time-dependent energy-based models (EBMs), where the score is obtained from the negative input-gradient of the energy.…

  • Diffusion-based Annealed Boltzmann Generators : benefits, pitfalls and hopes

    Diffusion-based Annealed Boltzmann Generators : benefits, pitfalls and hopes arXiv:2601.21026v1 Announce Type: new Abstract: Sampling configurations at thermodynamic equilibrium is a central challenge in statistical physics. Boltzmann Generators (BGs) tackle it by combining a generative model with a Monte Carlo (MC) correction step to obtain asymptotically unbiased samples from an unnormalized target. Most current BGs…

  • An efficient, accurate, and interpretable machine learning method for computing probability of failure

    An efficient, accurate, and interpretable machine learning method for computing probability of failure arXiv:2601.21089v1 Announce Type: new Abstract: We introduce a novel machine learning method called the Penalized Profile Support Vector Machine based on the Gabriel edited set for the computation of the probability of failure for a complex system as determined by a threshold…

  • Deep Neural Networks as Iterated Function Systems and a Generalization Bound

    Deep Neural Networks as Iterated Function Systems and a Generalization Bound arXiv:2601.19958v1 Announce Type: new Abstract: Deep neural networks (DNNs) achieve remarkable performance on a wide range of tasks, yet their mathematical analysis remains fragmented: stability and generalization are typically studied in disparate frameworks and on a case-by-case basis. Architecturally, DNNs rely on the recursive…

  • Minimax Rates for Hyperbolic Hierarchical Learning

    Minimax Rates for Hyperbolic Hierarchical Learning arXiv:2601.20047v1 Announce Type: new Abstract: We prove an exponential separation in sample complexity between Euclidean and hyperbolic representations for learning on hierarchical data under standard Lipschitz regularization. For depth-$R$ hierarchies with branching factor $m$, we first establish a geometric obstruction for Euclidean space: any bounded-radius embedding forces volumetric collapse,…

  • Efficient Evaluation of LLM Performance with Statistical Guarantees

    Efficient Evaluation of LLM Performance with Statistical Guarantees arXiv:2601.20251v1 Announce Type: new Abstract: Exhaustively evaluating many large language models (LLMs) on a large suite of benchmarks is expensive. We cast benchmarking as finite-population inference and, under a fixed query budget, seek tight confidence intervals (CIs) for model accuracy with valid frequentist coverage. We propose Factorized…

  • Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging

    Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging arXiv:2601.20269v1 Announce Type: new Abstract: Machine learning models in high-stakes applications, such as recidivism prediction and automated personnel selection, often exhibit systematic performance disparities across sensitive subpopulations, raising critical concerns regarding algorithmic bias. Fairness auditing addresses these risks through two primary functions: certification, which verifies adherence to…

  • Physics-informed Blind Reconstruction of Dense Fields from Sparse Measurements using Neural Networks with a Differentiable Simulator

    Physics-informed Blind Reconstruction of Dense Fields from Sparse Measurements using Neural Networks with a Differentiable Simulator arXiv:2601.20496v1 Announce Type: new Abstract: Generating dense physical fields from sparse measurements is a fundamental question in sampling, signal processing, and many other applications. State-of-the-art methods either use spatial statistics or rely on examples of dense fields in the…

  • Statistical Inference for Explainable Boosting Machines

    Statistical Inference for Explainable Boosting Machines arXiv:2601.18857v1 Announce Type: new Abstract: Explainable boosting machines (EBMs) are popular “glass-box” models that learn a set of univariate functions using boosting trees. These achieve explainability through visualizations of each feature’s effect. However, unlike linear model coefficients, uncertainty quantification for the learned univariate functions requires computationally intensive bootstrapping, making…

  • Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration

    Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration arXiv:2601.18907v1 Announce Type: new Abstract: Q-learning and SARSA are foundational reinforcement learning algorithms whose practical success depends critically on step-size calibration. Step-sizes that are too large can cause numerical instability, while step-sizes that are too small can lead to slow progress. We propose implicit variants…

  • Collaborative Compressors in Distributed Mean Estimation with Limited Communication Budget

    Collaborative Compressors in Distributed Mean Estimation with Limited Communication Budget arXiv:2601.18950v1 Announce Type: new Abstract: Distributed high dimensional mean estimation is a common aggregation routine used often in distributed optimization methods. Most of these applications call for a communication-constrained setting where vectors, whose mean is to be estimated, have to be compressed before sharing. One…

  • Convergence of Muon with Newton-Schulz

    Convergence of Muon with Newton-Schulz arXiv:2601.19156v1 Announce Type: new Abstract: We analyze Muon as originally proposed and used in practice — using the momentum orthogonalization with a few Newton-Schulz steps. The prior theoretical results replace this key step in Muon with an exact SVD-based polar factor. We prove that Muon with Newton-Schulz converges to a…

  • Double Fairness Policy Learning: Integrating Action Fairness and Outcome Fairness in Decision-making

    Double Fairness Policy Learning: Integrating Action Fairness and Outcome Fairness in Decision-making arXiv:2601.19186v1 Announce Type: new Abstract: Fairness is a central pillar of trustworthy machine learning, especially in domains where accuracy- or profit-driven optimization is insufficient. While most fairness research focuses on supervised learning, fairness in policy learning remains less explored. Because policy learning is…

  • Data-Driven Information-Theoretic Causal Bounds under Unmeasured Confounding

    Data-Driven Information-Theoretic Causal Bounds under Unmeasured Confounding arXiv:2601.17160v1 Announce Type: new Abstract: We develop a data-driven information-theoretic framework for sharp partial identification of causal effects under unmeasured confounding. Existing approaches often rely on restrictive assumptions, such as bounded or discrete outcomes; require external inputs (for example, instrumental variables, proxies, or user-specified sensitivity parameters); necessitate full…

  • Error Analysis of Bayesian Inverse Problems with Generative Priors

    Error Analysis of Bayesian Inverse Problems with Generative Priors arXiv:2601.17374v1 Announce Type: new Abstract: Data-driven methods for the solution of inverse problems have become widely popular in recent years thanks to the rise of machine learning techniques. A popular approach concerns the training of a generative model on additional data to learn a bespoke prior…

  • “Rebuilding” Statistics in the Age of AI: A Town Hall Discussion on Culture, Infrastructure, and Training

    “Rebuilding” Statistics in the Age of AI: A Town Hall Discussion on Culture, Infrastructure, and Training arXiv:2601.17510v1 Announce Type: new Abstract: This article presents the full, original record of the 2024 Joint Statistical Meetings (JSM) town hall, “Statistics in the Age of AI,” which convened leading statisticians to discuss how the field is evolving in…

  • Boosting methods for interval-censored data with regression and classification

    Boosting methods for interval-censored data with regression and classification arXiv:2601.17973v1 Announce Type: new Abstract: Boosting has garnered significant interest across both machine learning and statistical communities. Traditional boosting algorithms, designed for fully observed random samples, often struggle with real-world problems, particularly with interval-censored data. This type of data is common in survival analysis and time-to-event…

  • A Cherry-Picking Approach to Large Load Shaping for More Effective Carbon Reduction

    A Cherry-Picking Approach to Large Load Shaping for More Effective Carbon Reduction arXiv:2601.17990v1 Announce Type: new Abstract: Shaping multi-megawatt loads, such as data centers, impacts generator dispatch on the electric grid, which in turn affects system CO2 emissions and energy cost. Substantiating the effectiveness of prevalent load shaping strategies, such as those based on grid-level…

  • Distributional Computational Graphs: Error Bounds

    Distributional Computational Graphs: Error Bounds arXiv:2601.16250v1 Announce Type: new Abstract: We study a general framework of distributional computational graphs: computational graphs whose inputs are probability distributions rather than point values. We analyze the discretization error that arises when these graphs are evaluated using finite approximations of continuous probability distributions. Such an approximation might be the…

  • Perfect Clustering for Sparse Directed Stochastic Block Models

    Perfect Clustering for Sparse Directed Stochastic Block Models arXiv:2601.16427v1 Announce Type: new Abstract: Exact recovery in stochastic block models (SBMs) is well understood in undirected settings, but remains considerably less developed for directed and sparse networks, particularly when the number of communities diverges. Spectral methods for directed SBMs often lack stability in asymmetric, low-degree regimes,…

  • Efficient Learning of Stationary Diffusions with Stein-type Discrepancies

    Efficient Learning of Stationary Diffusions with Stein-type Discrepancies arXiv:2601.16597v1 Announce Type: new Abstract: Learning a stationary diffusion amounts to estimating the parameters of a stochastic differential equation whose stationary distribution matches a target distribution. We build on the recently introduced kernel deviation from stationarity (KDS), which enforces stationarity by evaluating expectations of the diffusion’s generator…

  • Towards Latent Diffusion Suitable For Text

    Towards Latent Diffusion Suitable For Text arXiv:2601.16220v1 Announce Type: cross Abstract: Language diffusion models aim to improve sampling speed and coherence over autoregressive LLMs. We introduce Neural Flow Diffusion Models for language generation, an extension of NFDM that enables the straightforward application of continuous diffusion models to discrete state spaces. NFDM learns a multivariate forward…

  • Long-Term Probabilistic Forecast of Vegetation Conditions Using Climate Attributes in the Four Corners Region

    Long-Term Probabilistic Forecast of Vegetation Conditions Using Climate Attributes in the Four Corners Region arXiv:2601.16347v1 Announce Type: cross Abstract: Weather conditions can drastically alter the state of crops and rangelands, and in turn, impact the incomes and food security of individuals worldwide. Satellite-based remote sensing offers an effective way to monitor vegetation and climate variables…

  • Robust X-Learner: Breaking the Curse of Imbalance and Heavy Tails via Robust Cross-Imputation

    Robust X-Learner: Breaking the Curse of Imbalance and Heavy Tails via Robust Cross-Imputation arXiv:2601.15360v1 Announce Type: new Abstract: Estimating Heterogeneous Treatment Effects (HTE) in industrial applications such as AdTech and healthcare presents a dual challenge: extreme class imbalance and heavy-tailed outcome distributions. While the X-Learner framework effectively addresses imbalance through cross-imputation, we demonstrate that it…

  • Non-Stationary Functional Bilevel Optimization

    Non-Stationary Functional Bilevel Optimization arXiv:2601.15363v1 Announce Type: new Abstract: Functional bilevel optimization (FBO) provides a powerful framework for hierarchical learning in function spaces, yet current methods are limited to static offline settings and perform suboptimally in online, non-stationary scenarios. We propose SmoothFBO, the first algorithm for non-stationary FBO with both theoretical guarantees and practical scalability.…

  • Low-Dimensional Adaptation of Rectified Flow: A New Perspective through the Lens of Diffusion and Stochastic Localization

    Low-Dimensional Adaptation of Rectified Flow: A New Perspective through the Lens of Diffusion and Stochastic Localization arXiv:2601.15500v1 Announce Type: new Abstract: In recent years, Rectified flow (RF) has gained considerable popularity largely due to its generation efficiency and state-of-the-art performance. In this paper, we investigate the degree to which RF automatically adapts to the intrinsic…

  • On damage of interpolation to adversarial robustness in regression

    On damage of interpolation to adversarial robustness in regression arXiv:2601.16070v1 Announce Type: new Abstract: Deep neural networks (DNNs) typically involve a large number of parameters and are trained to achieve zero or near-zero training error. Despite such interpolation, they often exhibit strong generalization performance on unseen data, a phenomenon that has motivated extensive theoretical investigations.…

  • Synthetic Augmentation in Imbalanced Learning: When It Helps, When It Hurts, and How Much to Add

    Synthetic Augmentation in Imbalanced Learning: When It Helps, When It Hurts, and How Much to Add arXiv:2601.16120v1 Announce Type: new Abstract: Imbalanced classification, where one class is observed far less frequently than the other, often causes standard training procedures to prioritize the majority class and perform poorly on rare but important cases. A classic and…

  • Meta Flow Maps enable scalable reward alignment

    Meta Flow Maps enable scalable reward alignment arXiv:2601.14430v1 Announce Type: new Abstract: Controlling generative models is computationally expensive. This is because optimal alignment with a reward function–whether via inference-time steering or fine-tuning–requires estimating the value function. This task demands access to the conditional posterior $p_{1|t}(x_1|x_t)$, the distribution of clean data $x_1$ consistent with an intermediate…

  • Large Data Limits of Laplace Learning for Gaussian Measure Data in Infinite Dimensions

    Large Data Limits of Laplace Learning for Gaussian Measure Data in Infinite Dimensions arXiv:2601.14515v1 Announce Type: new Abstract: Laplace learning is a semi-supervised method, a solution for finding missing labels from a partially labeled dataset utilizing the geometry given by the unlabeled data points. The method minimizes a Dirichlet energy defined on a (discrete) graph…

  • Communication-Efficient Federated Risk Difference Estimation for Time-to-Event Clinical Outcomes

    Communication-Efficient Federated Risk Difference Estimation for Time-to-Event Clinical Outcomes arXiv:2601.14609v1 Announce Type: new Abstract: Privacy-preserving model co-training in medical research is often hindered by server-dependent architectures incompatible with protected hospital data systems and by the predominant focus on relative effect measures (hazard ratios) which lack clinical interpretability for absolute survival risk assessment. We propose FedRD,…

  • Semi-Supervised Mixture Models under the Concept of Missing at Random with Margin Confidence and Aranda Ordaz Function

    Semi-Supervised Mixture Models under the Concept of Missing at Random with Margin Confidence and Aranda Ordaz Function arXiv:2601.14631v1 Announce Type: new Abstract: This paper presents a semi-supervised learning framework for Gaussian mixture modelling under a Missing at Random (MAR) mechanism. The method explicitly parameterizes the missingness mechanism by modelling the probability of missingness as a…

  • Efficient and Minimax-optimal In-context Nonparametric Regression with Transformers

    Efficient and Minimax-optimal In-context Nonparametric Regression with Transformers arXiv:2601.15014v1 Announce Type: new Abstract: We study in-context learning for nonparametric regression with $\alpha$-Hölder smooth regression functions, for some $\alpha>0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer with $\Theta(\log n)$ parameters and $\Omega\bigl(n^{2\alpha/(2\alpha+d)}\log^3 n\bigr)$ pretraining sequences can achieve the minimax-optimal…

  • Gradient-based Active Learning with Gaussian Processes for Global Sensitivity Analysis

    Gradient-based Active Learning with Gaussian Processes for Global Sensitivity Analysis arXiv:2601.11790v1 Announce Type: new Abstract: Global sensitivity analysis of complex numerical simulators is often limited by the small number of model evaluations that can be afforded. In such settings, surrogate models built from a limited set of simulations can substantially reduce the computational burden, provided…

  • A Kernel Approach for Semi-implicit Variational Inference

    A Kernel Approach for Semi-implicit Variational Inference arXiv:2601.12023v1 Announce Type: new Abstract: Semi-implicit variational inference (SIVI) enhances the expressiveness of variational families through hierarchical semi-implicit distributions, but the intractability of their densities makes standard ELBO-based optimization biased. Recent score-matching approaches to SIVI (SIVI-SM) address this issue via a minimax formulation, at the expense of an…

  • On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization

    On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization arXiv:2601.12238v1 Announce Type: new Abstract: While momentum-based acceleration has been studied extensively in deterministic optimization problems, its behavior in nonstationary environments — where the data distribution and optimal parameters drift over time — remains underexplored. We analyze the tracking performance of Stochastic Gradient Descent…

  • A Theory of Diversity for Random Matrices with Applications to In-Context Learning of Schrödinger Equations

    A Theory of Diversity for Random Matrices with Applications to In-Context Learning of Schrödinger Equations arXiv:2601.12587v1 Announce Type: new Abstract: We address the following question: given a collection $\{\mathbf{A}^{(1)}, \dots, \mathbf{A}^{(N)}\}$ of independent $d \times d$ random matrices drawn from a common distribution $\mathbb{P}$, what is the probability that the centralizer of $\{\mathbf{A}^{(1)}, \dots, \mathbf{A}^{(N)}\}$…

  • Approximate full conformal prediction in RKHS

    Approximate full conformal prediction in RKHS arXiv:2601.13102v1 Announce Type: new Abstract: Full conformal prediction is a framework that implicitly formulates distribution-free confidence prediction regions for a wide range of estimators. However, a classical limitation of the full conformal framework is the computation of the confidence prediction regions, which is usually impossible since it requires training…

  • Mass Distribution versus Density Distribution in the Context of Clustering

    Mass Distribution versus Density Distribution in the Context of Clustering arXiv:2601.10759v1 Announce Type: new Abstract: This paper investigates two fundamental descriptors of data, i.e., density distribution versus mass distribution, in the context of clustering. Density distribution has been the de facto descriptor of data distribution since the introduction of statistics. We show that density distribution…

  • Memorize Early, Then Query: Inlier-Memorization-Guided Active Outlier Detection

    Memorize Early, Then Query: Inlier-Memorization-Guided Active Outlier Detection arXiv:2601.10993v1 Announce Type: new Abstract: Outlier detection (OD) aims to identify abnormal instances, known as outliers or anomalies, by learning typical patterns of normal data, or inliers. Performing OD under an unsupervised regime, without any information about anomalous instances in the training data, is challenging. A recently observed phenomenon,…

  • Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach

    Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach arXiv:2601.11016v1 Announce Type: new Abstract: In this paper, we introduce a framework for contextual distributionally robust optimization (DRO) that considers the causal and continuous structure of the underlying distribution by developing interpretable and tractable decision rules that prescribe decisions using covariates.…

  • Split-and-Conquer: Distributed Factor Modeling for High-Dimensional Matrix-Variate Time Series

    Split-and-Conquer: Distributed Factor Modeling for High-Dimensional Matrix-Variate Time Series arXiv:2601.11091v1 Announce Type: new Abstract: In this paper, we propose a distributed framework for reducing the dimensionality of high-dimensional, large-scale, heterogeneous matrix-variate time series data using a factor model. The data are first partitioned column-wise (or row-wise) and allocated to node servers, where each node estimates…

  • Fine Tuning a Simulation-Driven Estimator

    Fine Tuning a Simulation-Driven Estimator arXiv:2504.04480v2 Announce Type: cross Abstract: Many industries now deploy high-fidelity simulators (digital twins) to represent physical systems, yet their parameters must be calibrated to match the true system. This motivated the construction of simulation-driven parameter estimators, built by generating synthetic observations for sampled parameter values and learning a supervised mapping…

  • Accelerated Regularized Wasserstein Proximal Sampling Algorithms

    Accelerated Regularized Wasserstein Proximal Sampling Algorithms arXiv:2601.09848v1 Announce Type: new Abstract: We consider sampling from a Gibbs distribution by evolving a finite number of particles using a particular score estimator rather than Brownian motion. To accelerate the particles, we consider a second-order score-based ODE, similar to Nesterov acceleration. In contrast to traditional kernel density score…

  • CROCS: A Two-Stage Clustering Framework for Behaviour-Centric Consumer Segmentation with Smart Meter Data

    CROCS: A Two-Stage Clustering Framework for Behaviour-Centric Consumer Segmentation with Smart Meter Data arXiv:2601.10494v1 Announce Type: new Abstract: With grid operators confronting rising uncertainty from renewable integration and a broader push toward electrification, Demand-Side Management (DSM) — particularly Demand Response (DR) — has attracted significant attention as a cost-effective mechanism for balancing modern electricity systems.…

  • Coarsening Causal DAG Models

    Coarsening Causal DAG Models arXiv:2601.10531v1 Announce Type: new Abstract: Directed acyclic graphical (DAG) models are a powerful tool for representing causal relationships among jointly distributed random variables, especially concerning data from across different experimental settings. However, it is not always practical or desirable to estimate a causal model at the granularity of given features in…

  • Parametric RDT approach to computational gap of symmetric binary perceptron

    Parametric RDT approach to computational gap of symmetric binary perceptron arXiv:2601.10628v1 Announce Type: new Abstract: We study potential presence of statistical-computational gaps (SCG) in symmetric binary perceptrons (SBP) via a parametric utilization of \emph{fully lifted random duality theory} (fl-RDT) [96]. A structural change from decreasingly to arbitrarily ordered $c$-sequence (a key fl-RDT parametric component) is…

  • Classification Imbalance as Transfer Learning

    Classification Imbalance as Transfer Learning arXiv:2601.10630v1 Announce Type: new Abstract: Classification imbalance arises when one class is much rarer than the other. We frame this setting as transfer learning under label (prior) shift between an imbalanced source distribution induced by the observed data and a balanced target distribution under which performance is evaluated. Within this…

  • Tail-Sensitive KL and Rényi Convergence of Unadjusted Hamiltonian Monte Carlo via One-Shot Couplings

    Tail-Sensitive KL and Rényi Convergence of Unadjusted Hamiltonian Monte Carlo via One-Shot Couplings arXiv:2601.09019v1 Announce Type: new Abstract: Hamiltonian Monte Carlo (HMC) algorithms are among the most widely used sampling methods in high dimensional settings, yet their convergence properties are poorly understood in divergences that quantify relative density mismatch, such as Kullback-Leibler (KL) and Rényi…

  • Horseshoe Mixtures-of-Experts (HS-MoE)

    Horseshoe Mixtures-of-Experts (HS-MoE) arXiv:2601.09043v1 Announce Type: new Abstract: Horseshoe mixtures-of-experts (HS-MoE) models provide a Bayesian framework for sparse expert selection in mixture-of-experts architectures. We combine the horseshoe prior’s adaptive global-local shrinkage with input-dependent gating, yielding data-adaptive sparsity in expert usage. Our primary methodological contribution is a particle learning algorithm for sequential inference, in which the…

  • MLCBART: Multilabel Classification with Bayesian Additive Regression Trees

    MLCBART: Multilabel Classification with Bayesian Additive Regression Trees arXiv:2601.08964v1 Announce Type: cross Abstract: Multilabel Classification (MLC) deals with the simultaneous classification of multiple binary labels. The task is challenging because, not only may there be arbitrarily different and complex relationships between predictor variables and each label, but associations among labels may exist even after accounting…

  • SCaLE: Switching Cost aware Learning and Exploration

    SCaLE: Switching Cost aware Learning and Exploration arXiv:2601.09042v1 Announce Type: cross Abstract: This work addresses the fundamental problem of unbounded metric movement costs in bandit online convex optimization, by considering high-dimensional dynamic quadratic hitting costs and $\ell_2$-norm switching costs in a noisy bandit feedback model. For a general class of stochastic environments, we provide the…

  • Efficient Clustering in Stochastic Bandits

    Efficient Clustering in Stochastic Bandits arXiv:2601.09162v1 Announce Type: cross Abstract: We study the Bandit Clustering (BC) problem under the fixed confidence setting, where the objective is to group a collection of data sequences (arms) into clusters through sequential sampling from adaptively selected arms at each time step while ensuring a fixed error probability at the…

  • Decentralized Online Convex Optimization with Unknown Feedback Delays

    Decentralized Online Convex Optimization with Unknown Feedback Delays arXiv:2601.07901v1 Announce Type: new Abstract: Decentralized online convex optimization (D-OCO), where multiple agents within a network collaboratively learn optimal decisions in real-time, arises naturally in applications such as federated learning, sensor networks, and multi-agent control. In this paper, we study D-OCO under unknown, time- and agent-varying feedback delays.…

  • A Statistical Assessment of Amortized Inference Under Signal-to-Noise Variation and Distribution Shift

    A Statistical Assessment of Amortized Inference Under Signal-to-Noise Variation and Distribution Shift arXiv:2601.07944v1 Announce Type: new Abstract: Since the turn of the century, approximate Bayesian inference has steadily evolved as new computational techniques have been incorporated to handle increasingly complex and large-scale predictive problems. The recent success of deep neural networks and foundation models has…

  • Towards A Unified PAC-Bayesian Framework for Norm-based Generalization Bounds

    Towards A Unified PAC-Bayesian Framework for Norm-based Generalization Bounds arXiv:2601.08100v1 Announce Type: new Abstract: Understanding the generalization behavior of deep neural networks remains a fundamental challenge in modern statistical learning theory. Among existing approaches, PAC-Bayesian norm-based bounds have demonstrated particular promise due to their data-dependent nature and their ability to capture algorithmic and geometric properties…

  • Structural Dimension Reduction in Bayesian Networks

    Structural Dimension Reduction in Bayesian Networks arXiv:2601.08236v1 Announce Type: new Abstract: This work introduces a novel technique, named structural dimension reduction, to collapse a Bayesian network onto a minimum and localized one while ensuring that probabilistic inferences between the original and reduced networks remain consistent. To this end, we propose a new combinatorial structure in…

  • Robust low-rank estimation with multiple binary responses using pairwise AUC loss

    Robust low-rank estimation with multiple binary responses using pairwise AUC loss arXiv:2601.08618v1 Announce Type: new Abstract: Multiple binary responses arise in many modern data-analytic problems. Although fitting separate logistic regressions for each response is computationally attractive, it ignores shared structure and can be statistically inefficient, especially in high-dimensional and class-imbalanced regimes. Low-rank models offer a…

  • Physics-informed Gaussian Process Regression in Solving Eigenvalue Problem of Linear Operators

    Physics-informed Gaussian Process Regression in Solving Eigenvalue Problem of Linear Operators arXiv:2601.06462v1 Announce Type: new Abstract: Applying Physics-Informed Gaussian Process Regression to the eigenvalue problem $(\mathcal{L}-\lambda)u = 0$ poses a fundamental challenge, where the null source term results in a trivial predictive mean and a degenerate marginal likelihood. Drawing inspiration from system identification, we construct…

  • Inference-Time Alignment for Diffusion Models via Doob’s Matching

    Inference-Time Alignment for Diffusion Models via Doob’s Matching arXiv:2601.06514v1 Announce Type: new Abstract: Inference-time alignment for diffusion models aims to adapt a pre-trained diffusion model toward a target distribution without retraining the base score network, thereby preserving the generative capacity of the base model while enforcing desired properties at the inference time. A central mechanism…

  • Dimension-reduced outcome-weighted learning for estimating individualized treatment regimes in observational studies

    Dimension-reduced outcome-weighted learning for estimating individualized treatment regimes in observational studies arXiv:2601.06782v1 Announce Type: new Abstract: Individualized treatment regimes (ITRs) aim to improve clinical outcomes by assigning treatment based on patient-specific characteristics. However, existing methods often struggle with high-dimensional covariates, limiting accuracy, interpretability, and real-world applicability. We propose a novel sufficient dimension reduction approach that…

  • Constrained Density Estimation via Optimal Transport

    Constrained Density Estimation via Optimal Transport arXiv:2601.06830v1 Announce Type: new Abstract: A novel framework for density estimation under expectation constraints is proposed. The framework minimizes the Wasserstein distance between the estimated density and a prior, subject to the constraints that the expected value of a set of functions adopts or exceeds given values. The framework…

  • The Impact of Anisotropic Covariance Structure on the Training Dynamics and Generalization Error of Linear Networks

    The Impact of Anisotropic Covariance Structure on the Training Dynamics and Generalization Error of Linear Networks arXiv:2601.06961v1 Announce Type: new Abstract: The success of deep neural networks largely depends on the statistical structure of the training data. While learning dynamics and generalization on isotropic data are well-established, the impact of pronounced anisotropy on these crucial…

  • Machine learning assisted state prediction of misspecified linear dynamical system via modal reduction

    Machine learning assisted state prediction of misspecified linear dynamical system via modal reduction arXiv:2601.05297v1 Announce Type: new Abstract: Accurate prediction of structural dynamics is imperative for preserving digital twin fidelity throughout operational lifetimes. Parametric models with fixed nominal parameters often omit critical physical effects due to simplifications in geometry, material behavior, damping, or boundary conditions,…

  • A Bayesian Generative Modeling Approach for Arbitrary Conditional Inference

    A Bayesian Generative Modeling Approach for Arbitrary Conditional Inference arXiv:2601.05355v1 Announce Type: new Abstract: Modern data analysis increasingly requires flexible conditional inference P(X_B | X_A) where (X_A, X_B) is an arbitrary partition of observed variable X. Existing conditional inference methods lack this flexibility as they are tied to a fixed conditioning structure and cannot perform…

  • A brief note on learning problem with global perspectives

    A brief note on learning problem with global perspectives arXiv:2601.05441v1 Announce Type: new Abstract: This brief note considers the problem of learning with dynamic-optimizing principal-agent setting, in which the agents are allowed to have global perspectives about the learning process, i.e., the ability to view things according to their relative importances or in their true…

  • Multi-task Modeling for Engineering Applications with Sparse Data

    Multi-task Modeling for Engineering Applications with Sparse Data arXiv:2601.05910v1 Announce Type: new Abstract: Modern engineering and scientific workflows often require simultaneous predictions across related tasks and fidelity levels, where high-fidelity data is scarce and expensive, while low-fidelity data is more abundant. This paper introduces a Multi-Task Gaussian Processes (MTGP) framework tailored for engineering systems characterized…

  • Detecting Stochasticity in Discrete Signals via Nonparametric Excursion Theorem

    Detecting Stochasticity in Discrete Signals via Nonparametric Excursion Theorem arXiv:2601.06009v1 Announce Type: new Abstract: We develop a practical framework for distinguishing diffusive stochastic processes from deterministic signals using only a single discrete time series. Our approach is based on classical excursion and crossing theorems for continuous semimartingales, which correlates number $N_\varepsilon$ of excursions of magnitude…

  • ROOFS: RObust biOmarker Feature Selection

    ROOFS: RObust biOmarker Feature Selection arXiv:2601.05151v1 Announce Type: new Abstract: Feature selection (FS) is essential for biomarker discovery and in the analysis of biomedical datasets. However, challenges such as high-dimensional feature space, low sample size, multicollinearity, and missing values make FS non-trivial. Moreover, FS performances vary across datasets and predictive tasks. We propose roofs, a…

  • CAOS: Conformal Aggregation of One-Shot Predictors

    CAOS: Conformal Aggregation of One-Shot Predictors arXiv:2601.05219v1 Announce Type: new Abstract: One-shot prediction enables rapid adaptation of pretrained foundation models to new tasks using only one labeled example, but lacks principled uncertainty quantification. While conformal prediction provides finite-sample coverage guarantees, standard split conformal methods are inefficient in the one-shot setting due to data splitting and…

  • Stochastic Deep Learning: A Probabilistic Framework for Modeling Uncertainty in Structured Temporal Data

    Stochastic Deep Learning: A Probabilistic Framework for Modeling Uncertainty in Structured Temporal Data arXiv:2601.05227v1 Announce Type: new Abstract: I propose a novel framework that integrates stochastic differential equations (SDEs) with deep generative models to improve uncertainty quantification in machine learning applications involving structured and temporal data. This approach, termed Stochastic Latent Differential Inference (SLDI), embeds…

  • Learning Multinomial Logits in $O(n \log n)$ time

    Learning Multinomial Logits in $O(n \log n)$ time arXiv:2601.04423v1 Announce Type: cross Abstract: A Multinomial Logit (MNL) model is composed of a finite universe of items $[n]=\{1,\dots, n\}$, each assigned a positive weight. A query specifies an admissible subset — called a slate — and the model chooses one item from that slate with probability…

  • Aligned explanations in neural networks

    Aligned explanations in neural networks arXiv:2601.04378v1 Announce Type: cross Abstract: Feature attribution is the dominant paradigm for explaining deep neural networks. However, most existing methods only loosely reflect the model’s prediction-making process, thereby merely white-painting the black box. We argue that explanatory alignment is a key aspect of trustworthiness in prediction tasks: explanations must be…