Category: cs.AI

  • Learning Order Forest for Qualitative-Attribute Data Clustering

    Learning Order Forest for Qualitative-Attribute Data Clustering arXiv:2603.03387v1 Announce Type: new Abstract: Clustering is a fundamental approach to understanding data patterns, wherein the intuitive Euclidean distance space is commonly adopted. However, this is not the case for implicit cluster distributions reflected by qualitative attribute values, e.g., the nominal values of attributes like symptoms, marital status,…

  • From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference

    From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference arXiv:2602.22492v1 Announce Type: new Abstract: In this work, we study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs), with an emphasis on statistical modeling, identifiability, and scalable inference. We first establish a general convergence…

  • Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability

    Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability arXiv:2602.15919v1 Announce Type: new Abstract: Can the privacy vulnerability of individual data points be assessed without retraining models or explicitly simulating attacks? We answer affirmatively by showing that exposure to membership inference attack (MIA) is fundamentally governed by a data point’s influence on the learned model.…

  • Nonparametric Distribution Regression Re-calibration

    Nonparametric Distribution Regression Re-calibration arXiv:2602.13362v1 Announce Type: new Abstract: A key challenge in probabilistic regression is ensuring that predictive distributions accurately reflect true empirical uncertainty. Minimizing overall prediction error often encourages models to prioritize informativeness over calibration, producing narrow but overconfident predictions. However, in safety-critical settings, trustworthy uncertainty estimates are often more valuable than narrow…

  • Metabolic cost of information processing in Poisson variational autoencoders

    Metabolic cost of information processing in Poisson variational autoencoders arXiv:2602.13421v1 Announce Type: new Abstract: Computation in biological systems is fundamentally energy-constrained, yet standard theories of computation treat energy as freely available. Here, we argue that variational free energy minimization under a Poisson assumption offers a principled path toward an energy-aware theory of computation. Our key…

  • Provable Offline Reinforcement Learning for Structured Cyclic MDPs

    Provable Offline Reinforcement Learning for Structured Cyclic MDPs arXiv:2602.11679v1 Announce Type: new Abstract: We introduce a novel cyclic Markov decision process (MDP) framework for multi-step decision problems with heterogeneous stage-specific dynamics, transitions, and discount factors across the cycle. In this setting, offline learning is challenging: optimizing a policy at any stage shifts the state distributions…

  • When LLMs get significantly worse: A statistical approach to detect model degradations

    When LLMs get significantly worse: A statistical approach to detect model degradations arXiv:2602.10144v1 Announce Type: new Abstract: Minimizing the inference cost and latency of foundation models has become a crucial area of research. Optimization approaches include theoretically lossless methods and others without accuracy guarantees like quantization. In all of these cases it is crucial to…

  • Persistent Entropy as a Detector of Phase Transitions

    Persistent Entropy as a Detector of Phase Transitions arXiv:2602.09058v1 Announce Type: new Abstract: Persistent entropy (PE) is an information-theoretic summary statistic of persistence barcodes that has been widely used to detect regime changes in complex systems. Despite its empirical success, a general theoretical understanding of when and why persistent entropy reliably detects phase transitions has…

  • Quantifying Epistemic Uncertainty in Diffusion Models

    Quantifying Epistemic Uncertainty in Diffusion Models arXiv:2602.09170v1 Announce Type: new Abstract: To ensure high quality outputs, it is important to quantify the epistemic uncertainty of diffusion models. Existing methods are often unreliable because they mix epistemic and aleatoric uncertainty. We introduce a method based on Fisher information that explicitly isolates epistemic variance, producing more reliable plausibility…

  • The Critical Horizon: Inspection Design Principles for Multi-Stage Operations and Deep Reasoning

    The Critical Horizon: Inspection Design Principles for Multi-Stage Operations and Deep Reasoning arXiv:2602.09394v1 Announce Type: new Abstract: Manufacturing lines, service journeys, supply chains, and AI reasoning chains share a common challenge: attributing a terminal outcome to the intermediate stage that caused it. We establish an information-theoretic barrier to this credit assignment problem: the signal connecting…

  • Fast and Robust Likelihood-Guided Diffusion Posterior Sampling with Amortized Variational Inference

    Fast and Robust Likelihood-Guided Diffusion Posterior Sampling with Amortized Variational Inference arXiv:2602.07102v1 Announce Type: new Abstract: Zero-shot diffusion posterior sampling offers a flexible framework for inverse problems by accommodating arbitrary degradation operators at test time, but incurs high computational cost due to repeated likelihood-guided updates. In contrast, previous amortized diffusion approaches enable fast inference by…

  • Total Variation Rates for Riemannian Flow Matching

    Total Variation Rates for Riemannian Flow Matching arXiv:2602.05174v1 Announce Type: new Abstract: Riemannian flow matching (RFM) extends flow-based generative modeling to data supported on manifolds by learning a time-dependent tangent vector field whose flow-ODE transports a simple base distribution to the data law. We develop a nonasymptotic Total Variation (TV) convergence analysis for RFM samplers…

  • Byzantine Machine Learning: MultiKrum and an optimal notion of robustness

    Byzantine Machine Learning: MultiKrum and an optimal notion of robustness arXiv:2602.03899v1 Announce Type: new Abstract: Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule…

  • Data-Driven Information-Theoretic Causal Bounds under Unmeasured Confounding

    Data-Driven Information-Theoretic Causal Bounds under Unmeasured Confounding arXiv:2601.17160v1 Announce Type: new Abstract: We develop a data-driven information-theoretic framework for sharp partial identification of causal effects under unmeasured confounding. Existing approaches often rely on restrictive assumptions, such as bounded or discrete outcomes; require external inputs (for example, instrumental variables, proxies, or user-specified sensitivity parameters); necessitate full…

  • “Rebuilding” Statistics in the Age of AI: A Town Hall Discussion on Culture, Infrastructure, and Training

    “Rebuilding” Statistics in the Age of AI: A Town Hall Discussion on Culture, Infrastructure, and Training arXiv:2601.17510v1 Announce Type: new Abstract: This article presents the full, original record of the 2024 Joint Statistical Meetings (JSM) town hall, “Statistics in the Age of AI,” which convened leading statisticians to discuss how the field is evolving in…

  • Low-Dimensional Adaptation of Rectified Flow: A New Perspective through the Lens of Diffusion and Stochastic Localization

    Low-Dimensional Adaptation of Rectified Flow: A New Perspective through the Lens of Diffusion and Stochastic Localization arXiv:2601.15500v1 Announce Type: new Abstract: In recent years, Rectified flow (RF) has gained considerable popularity largely due to its generation efficiency and state-of-the-art performance. In this paper, we investigate the degree to which RF automatically adapts to the intrinsic…

  • Communication-Efficient Federated Risk Difference Estimation for Time-to-Event Clinical Outcomes

    Communication-Efficient Federated Risk Difference Estimation for Time-to-Event Clinical Outcomes arXiv:2601.14609v1 Announce Type: new Abstract: Privacy-preserving model co-training in medical research is often hindered by server-dependent architectures incompatible with protected hospital data systems and by the predominant focus on relative effect measures (hazard ratios) which lack clinical interpretability for absolute survival risk assessment. We propose FedRD,…

  • Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach

    Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach arXiv:2601.11016v1 Announce Type: new Abstract: In this paper, we introduce a framework for contextual distributionally robust optimization (DRO) that considers the causal and continuous structure of the underlying distribution by developing interpretable and tractable decision rules that prescribe decisions using covariates.…

  • Decentralized Online Convex Optimization with Unknown Feedback Delays

    Decentralized Online Convex Optimization with Unknown Feedback Delays arXiv:2601.07901v1 Announce Type: new Abstract: Decentralized online convex optimization (D-OCO), where multiple agents within a network collaboratively learn optimal decisions in real-time, arises naturally in applications such as federated learning, sensor networks, and multi-agent control. In this paper, we study D-OCO under unknown, time- and agent-varying feedback delays.…

  • A Bayesian Generative Modeling Approach for Arbitrary Conditional Inference

    A Bayesian Generative Modeling Approach for Arbitrary Conditional Inference arXiv:2601.05355v1 Announce Type: new Abstract: Modern data analysis increasingly requires flexible conditional inference P(X_B | X_A) where (X_A, X_B) is an arbitrary partition of observed variable X. Existing conditional inference methods lack this flexibility as they are tied to a fixed conditioning structure and cannot perform…

  • CAOS: Conformal Aggregation of One-Shot Predictors

    CAOS: Conformal Aggregation of One-Shot Predictors arXiv:2601.05219v1 Announce Type: new Abstract: One-shot prediction enables rapid adaptation of pretrained foundation models to new tasks using only one labeled example, but lacks principled uncertainty quantification. While conformal prediction provides finite-sample coverage guarantees, standard split conformal methods are inefficient in the one-shot setting due to data splitting and…

  • Microeconomic Foundations of Multi-Agent Learning

    Microeconomic Foundations of Multi-Agent Learning arXiv:2601.03451v1 Announce Type: new Abstract: Modern AI systems increasingly operate inside markets and institutions where data, behavior, and incentives are endogenous. This paper develops an economic foundation for multi-agent learning by studying a principal-agent interaction in a Markov decision process with strategic externalities, where both the principal and the agent…

  • Mitigating Long-Tailed Anomaly Score Distributions with Importance-Weighted Loss

    Mitigating Long-Tailed Anomaly Score Distributions with Importance-Weighted Loss arXiv:2601.02440v1 Announce Type: new Abstract: Anomaly detection is crucial in industrial applications for identifying rare and unseen patterns to ensure system reliability. Traditional models, trained on a single class of normal data, struggle with real-world distributions where normal data exhibit diverse patterns, leading to class imbalance and…

  • Beyond Demand Estimation: Consumer Surplus Evaluation via Cumulative Propensity Weights

    Beyond Demand Estimation: Consumer Surplus Evaluation via Cumulative Propensity Weights arXiv:2601.01029v1 Announce Type: new Abstract: This paper develops a practical framework for using observational data to audit the consumer surplus effects of AI-driven decisions, specifically in targeted pricing and algorithmic lending. Traditional approaches first estimate demand functions and then integrate to compute consumer surplus, but…

  • Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds

    Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds arXiv:2512.22473v1 Announce Type: new Abstract: Transformers empirically perform precise probabilistic reasoning in carefully constructed “Bayesian wind tunnels” and in large-scale language models, yet the mechanisms by which gradient-based learning creates the required internal geometry remain opaque. We provide a complete first-order analysis of how cross-entropy training…

  • Residual Prior Diffusion: A Probabilistic Framework Integrating Coarse Latent Priors with Diffusion Models

    Residual Prior Diffusion: A Probabilistic Framework Integrating Coarse Latent Priors with Diffusion Models arXiv:2512.21593v1 Announce Type: new Abstract: Diffusion models have become a central tool in deep generative modeling, but standard formulations rely on a single network and a single diffusion schedule to transform a simple prior, typically a standard normal distribution, into the target…

  • One Permutation Is All You Need: Fast, Reliable Variable Importance and Model Stress-Testing

    One Permutation Is All You Need: Fast, Reliable Variable Importance and Model Stress-Testing arXiv:2512.13892v1 Announce Type: new Abstract: Reliable estimation of feature contributions in machine learning models is essential for trust, transparency and regulatory compliance, especially when models are proprietary or otherwise operate as black boxes. While permutation-based methods are a standard tool for this…

  • Functional Random Forest with Adaptive Cost-Sensitive Splitting for Imbalanced Functional Data Classification

    Functional Random Forest with Adaptive Cost-Sensitive Splitting for Imbalanced Functional Data Classification arXiv:2512.07888v1 Announce Type: new Abstract: Classification of functional data where observations are curves or trajectories poses unique challenges, particularly under severe class imbalance. Traditional Random Forest algorithms, while robust for tabular data, often fail to capture the intrinsic structure of functional observations and…

  • Bayesian Optimization for Function-Valued Responses under Min-Max Criteria

    Bayesian Optimization for Function-Valued Responses under Min-Max Criteria arXiv:2512.07868v1 Announce Type: cross Abstract: Bayesian optimization is widely used for optimizing expensive black box functions, but most existing approaches focus on scalar responses. In many scientific and engineering settings the response is functional, varying smoothly over an index such as time or wavelength, which makes classical…

  • How to Tame Your LLM: Semantic Collapse in Continuous Systems

    How to Tame Your LLM: Semantic Collapse in Continuous Systems arXiv:2512.05162v1 Announce Type: new Abstract: We develop a general theory of semantic dynamics for large language models by formalizing them as Continuous State Machines (CSMs): smooth dynamical systems whose latent manifolds evolve under probabilistic transition operators. The associated transfer operator $P: L^2(M,\mu) \to L^2(M,\mu)$ encodes…

  • A note on the impossibility of conditional PAC-efficient reasoning in large language models

    A note on the impossibility of conditional PAC-efficient reasoning in large language models arXiv:2512.03057v1 Announce Type: new Abstract: We prove an impossibility result for conditional Probably Approximately Correct (PAC)-efficient reasoning in large language models. While recent work has established marginal PAC efficiency guarantees for composite models that switch between expensive expert models and cheaper fast…

  • Spatiotemporal Pyramid Flow Matching for Climate Emulation

    Spatiotemporal Pyramid Flow Matching for Climate Emulation arXiv:2512.02268v1 Announce Type: cross Abstract: Generative models have the potential to transform the way we emulate Earth’s changing climate. Previous generative approaches rely on weather-scale autoregression for climate emulation, but this is inherently slow for long climate horizons and has yet to demonstrate stable rollouts under nonstationary forcings.…

  • FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection

    FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection arXiv:2511.19476v1 Announce Type: new Abstract: Coreset selection compresses large datasets into compact, representative subsets, reducing the energy and computational burden of training deep neural networks. Existing methods are either: (i) DNN-based, which are tied to model-specific parameters and introduce architectural bias; or (ii) DNN-free, which rely on…

  • DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing

    DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing arXiv:2511.17038v1 Announce Type: cross Abstract: From a Bayesian perspective, score-based diffusion solves inverse problems through joint inference, embedding the likelihood with the prior to guide the sampling process. However, this formulation fails to explain its practical behavior: the prior offers limited guidance, while reconstruction is largely…

  • Implicit Bias of the JKO Scheme

    Implicit Bias of the JKO Scheme arXiv:2511.14827v1 Announce Type: new Abstract: Wasserstein gradient flow provides a general framework for minimizing an energy functional $J$ over the space of probability measures on a Riemannian manifold $(M,g)$. Its canonical time-discretization, the Jordan-Kinderlehrer-Otto (JKO) scheme, produces for any step size $\eta>0$ a sequence of probability distributions $\rho_k^\eta$ that…

  • Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit

    Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit arXiv:2511.15120v1 Announce Type: new Abstract: In deep learning, a central issue is to understand how neural networks efficiently learn high-dimensional features. To this end, we explore the gradient descent learning of a general Gaussian Multi-index model $f(\boldsymbol{x})=g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U}\in \mathbb{R}^{r\times d}$, which is the…

  • Self-adaptive weighting and sampling for physics-informed neural networks

    Self-adaptive weighting and sampling for physics-informed neural networks arXiv:2511.05452v1 Announce Type: new Abstract: Physics-informed deep learning has emerged as a promising framework for solving partial differential equations (PDEs). Nevertheless, training these models on complex problems remains challenging, often leading to limited accuracy and efficiency. In this work, we introduce a hybrid adaptive sampling and weighting…

  • Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems

    Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems arXiv:2510.26061v1 Announce Type: new Abstract: We propose a data-driven framework for efficiently solving quadratic programming (QP) problems by reducing the number of variables in high-dimensional QPs using instance-specific projection. A graph neural network-based model is designed to generate projections tailored to each QP instance, enabling…

  • Using latent representations to link disjoint longitudinal data for mixed-effects regression

    Using latent representations to link disjoint longitudinal data for mixed-effects regression arXiv:2510.25531v1 Announce Type: new Abstract: Many rare diseases offer limited established treatment options, leading patients to switch therapies when new medications emerge. To analyze the impact of such treatment switches within the low sample size limitations of rare disease trials, it is important to…

  • Frequentist Validity of Epistemic Uncertainty Estimators

    Frequentist Validity of Epistemic Uncertainty Estimators arXiv:2510.22063v1 Announce Type: new Abstract: Decomposing prediction uncertainty into its aleatoric (irreducible) and epistemic (reducible) components is critical for the development and deployment of machine learning systems. A popular, principled measure for epistemic uncertainty is the mutual information between the response variable and model parameters. However, evaluating this measure…

  • From Universal Approximation Theorem to Tropical Geometry of Multi-Layer Perceptrons

    From Universal Approximation Theorem to Tropical Geometry of Multi-Layer Perceptrons arXiv:2510.15012v1 Announce Type: new Abstract: We revisit the Universal Approximation Theorem (UAT) through the lens of the tropical geometry of neural networks and introduce a constructive, geometry-aware initialization for sigmoidal multi-layer perceptrons (MLPs). Tropical geometry shows that Rectified Linear Unit (ReLU) networks admit decision functions with…

  • The Coverage Principle: How Pre-training Enables Post-Training

    The Coverage Principle: How Pre-training Enables Post-Training arXiv:2510.15020v1 Announce Type: new Abstract: Language models demonstrate remarkable abilities when pre-trained on large text corpora and fine-tuned for specific tasks, but how and why pre-training shapes the success of the final model remains poorly understood. Notably, although pre-training success is often quantified by cross entropy loss, cross-entropy…

  • A Multi-dimensional Semantic Surprise Framework Based on Low-Entropy Semantic Manifolds for Fine-Grained Out-of-Distribution Detection

    A Multi-dimensional Semantic Surprise Framework Based on Low-Entropy Semantic Manifolds for Fine-Grained Out-of-Distribution Detection arXiv:2510.13093v1 Announce Type: new Abstract: Out-of-Distribution (OOD) detection is a cornerstone for the safe deployment of AI systems in the open world. However, existing methods treat OOD detection as a binary classification problem, a cognitive flattening that fails to distinguish between…

  • Bayesian Nonparametric Dynamical Clustering of Time Series

    Bayesian Nonparametric Dynamical Clustering of Time Series arXiv:2510.06919v1 Announce Type: new Abstract: We present a method that models the evolution of an unbounded number of time series clusters by switching among an unknown number of regimes with linear dynamics. We develop a Bayesian non-parametric approach using a hierarchical Dirichlet process as a prior on the…

  • Domain-Shift-Aware Conformal Prediction for Large Language Models

    Domain-Shift-Aware Conformal Prediction for Large Language Models arXiv:2510.05566v1 Announce Type: new Abstract: Large language models have achieved impressive performance across diverse tasks. However, their tendency to produce overconfident and factually incorrect outputs, known as hallucinations, poses risks in real world applications. Conformal prediction provides finite-sample, distribution-free coverage guarantees, but standard conformal prediction breaks down under…

  • Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting

    Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting arXiv:2510.01414v1 Announce Type: new Abstract: This paper analyzes the generalization error of minimum-norm interpolating solutions in linear regression using spiked covariance data models. The paper characterizes how varying spike strengths and target-spike alignments can affect risk, especially in overparameterized settings. The study presents…

  • Identifying All $\epsilon$-Best Arms in (Misspecified) Linear Bandits

    Identifying All $\epsilon$-Best Arms in (Misspecified) Linear Bandits arXiv:2510.00073v1 Announce Type: new Abstract: Motivated by the need to efficiently identify multiple candidates in high trial-and-error cost tasks such as drug discovery, we propose a near-optimal algorithm to identify all $\epsilon$-best arms (i.e., those at most $\epsilon$ worse than the optimum). Specifically, we introduce LinFACT, an…

  • Variance-Bounded Evaluation without Ground Truth: VB-Score

    Variance-Bounded Evaluation without Ground Truth: VB-Score arXiv:2509.22751v1 Announce Type: new Abstract: Reliable evaluation is a central challenge in machine learning when tasks lack ground truth labels or involve ambiguity and noise. Conventional frameworks, rooted in the Cranfield paradigm and label-based metrics, fail in such cases because they cannot assess how robustly a system performs under…

  • Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression

    Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression arXiv:2509.22794v1 Announce Type: new Abstract: We study instrumental variable regression (IVaR) under differential privacy constraints. Classical IVaR methods (like two-stage least squares regression) rely on solving moment equations that directly use sensitive covariates and instruments, creating significant risks of privacy leakage and posing challenges in designing…

  • A theoretical guarantee for SyncRank

    A theoretical guarantee for SyncRank arXiv:2509.22766v1 Announce Type: new Abstract: We present a theoretical and empirical analysis of the SyncRank algorithm for recovering a global ranking from noisy pairwise comparisons. By adopting a complex-valued data model where the true ranking is encoded in the phases of a unit-modulus vector, we establish a sharp non-asymptotic recovery…

  • Near-Optimal Experiment Design in Linear non-Gaussian Cyclic Models

    Near-Optimal Experiment Design in Linear non-Gaussian Cyclic Models arXiv:2509.21423v1 Announce Type: new Abstract: We study the problem of causal structure learning from a combination of observational and interventional data generated by a linear non-Gaussian structural equation model that might contain cycles. Recent results show that using mere observational data identifies the causal graph only up…

  • Towards a Physics Foundation Model

    Towards a Physics Foundation Model arXiv:2509.13805v1 Announce Type: cross Abstract: Foundation models have revolutionized natural language processing through a “train once, deploy anywhere” paradigm, where a single pre-trained model adapts to countless downstream tasks without retraining. Access to a Physics Foundation Model (PFM) would be transformative — democratizing access to high-fidelity simulations, accelerating scientific discovery,…

  • Causal-Symbolic Meta-Learning (CSML): Inducing Causal World Models for Few-Shot Generalization

    Causal-Symbolic Meta-Learning (CSML): Inducing Causal World Models for Few-Shot Generalization arXiv:2509.12387v1 Announce Type: cross Abstract: Modern deep learning models excel at pattern recognition but remain fundamentally limited by their reliance on spurious correlations, leading to poor generalization and a demand for massive datasets. We argue that a key ingredient for human-like intelligence (robust, sample-efficient learning) stems from…

  • Uncertainty Estimation using Variance-Gated Distributions

    Uncertainty Estimation using Variance-Gated Distributions arXiv:2509.08846v1 Announce Type: cross Abstract: Evaluation of per-sample uncertainty quantification from neural networks is essential for decision-making involving high-risk applications. A common approach is to use the predictive distribution from Bayesian or approximation models and decompose the corresponding predictive uncertainty into epistemic (model-related) and aleatoric (data-related) components. However, additive decomposition…

  • Instance-Optimal Matrix Multiplicative Weight Update and Its Quantum Applications

    Instance-Optimal Matrix Multiplicative Weight Update and Its Quantum Applications arXiv:2509.08911v1 Announce Type: cross Abstract: The Matrix Multiplicative Weight Update (MMWU) is a seminal online learning algorithm with numerous applications. Applied to the matrix version of the Learning from Expert Advice (LEA) problem on the $d$-dimensional spectraplex, it is well known that MMWU achieves the minimax-optimal…

  • Diffusion Generative Models Meet Compressed Sensing, with Applications to Image Data and Financial Time Series

    Diffusion Generative Models Meet Compressed Sensing, with Applications to Image Data and Financial Time Series arXiv:2509.03898v1 Announce Type: new Abstract: This paper develops dimension reduction techniques for accelerating diffusion model inference in the context of synthetic data generation. The idea is to integrate compressed sensing into diffusion models: (i) compress the data into a latent…

  • BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

    BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design arXiv:2508.21184v1 Announce Type: cross Abstract: We propose a general-purpose approach for improving the ability of Large Language Models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to…

  • The Information Dynamics of Generative Diffusion

    The Information Dynamics of Generative Diffusion arXiv:2508.19897v1 Announce Type: new Abstract: Generative diffusion models have emerged as a powerful class of models in machine learning, yet a unified theoretical understanding of their operation is still developing. This perspective paper provides an integrated perspective on generative diffusion by connecting their dynamic, information-theoretic, and thermodynamic properties under…

  • Track Component Failure Detection Using Data Analytics over existing STDS Track Circuit data

    Track Component Failure Detection Using Data Analytics over existing STDS Track Circuit data arXiv:2508.11693v1 Announce Type: cross Abstract: Track Circuits (TC) are the main signalling devices used to detect the presence of a train on a rail track. They have been used since the 19th century, and nowadays there are many types depending on the…

  • Can synthetic data reproduce real-world findings in epidemiology? A replication study using tree-based generative AI

    Can synthetic data reproduce real-world findings in epidemiology? A replication study using tree-based generative AI arXiv:2508.14936v1 Announce Type: cross Abstract: Generative artificial intelligence for synthetic data generation holds substantial potential to address practical challenges in epidemiology. However, many current methods suffer from limited quality, high computational demands, and complexity for non-experts. Furthermore, common evaluation strategies…

  • Preference Models assume Proportional Hazards of Utilities

    Preference Models assume Proportional Hazards of Utilities arXiv:2508.13189v1 Announce Type: new Abstract: Approaches for estimating preferences from human annotated data typically involve inducing a distribution over a ranked list of choices such as the Plackett-Luce model. Indeed, modern AI alignment tools such as Reward Modelling and Direct Preference Optimization are based on the statistical assumptions…

  • An Introduction to Sliced Optimal Transport

    An Introduction to Sliced Optimal Transport arXiv:2508.12519v1 Announce Type: new Abstract: Sliced Optimal Transport (SOT) is a rapidly developing branch of optimal transport (OT) that exploits the tractability of one-dimensional OT problems. By combining tools from OT, integral geometry, and computational statistics, SOT enables fast and scalable computation of distances, barycenters, and kernels for probability…

  • ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization

    ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization arXiv:2508.11551v1 Announce Type: new Abstract: Determining the optimal data mixture for large language model training remains a challenging problem with an outsized impact on performance. In practice, language model developers continue to rely on heuristic exploration since no learning-based approach has emerged as a…

  • Supervised Dynamic Dimension Reduction with Deep Neural Network

    Supervised Dynamic Dimension Reduction with Deep Neural Network arXiv:2508.03546v1 Announce Type: new Abstract: This paper studies the problem of dimension reduction, tailored to improving time series forecasting with high-dimensional predictors. We propose a novel Supervised Deep Dynamic Principal component analysis (SDDP) framework that incorporates the target variable and lagged observations into the factor extraction process.…

  • LVM-GP: Uncertainty-Aware PDE Solver via coupling latent variable model and Gaussian process

    LVM-GP: Uncertainty-Aware PDE Solver via coupling latent variable model and Gaussian process arXiv:2507.22493v1 Announce Type: new Abstract: We propose a novel probabilistic framework, termed LVM-GP, for uncertainty quantification in solving forward and inverse partial differential equations (PDEs) with noisy data. The core idea is to construct a stochastic mapping from the input to a high-dimensional…

  • Flow Stochastic Segmentation Networks

    Flow Stochastic Segmentation Networks arXiv:2507.18838v1 Announce Type: cross Abstract: We introduce the Flow Stochastic Segmentation Network (Flow-SSN), a generative segmentation model family featuring discrete-time autoregressive and modern continuous-time flow variants. We prove fundamental limitations of the low-rank parameterisation of previous methods and show that Flow-SSNs can estimate arbitrarily high-rank pixel-wise covariances without assuming the rank…

  • Bayesian preference elicitation for decision support in multiobjective optimization

    Bayesian preference elicitation for decision support in multiobjective optimization arXiv:2507.16999v1 Announce Type: new Abstract: We present a novel approach to help decision-makers efficiently identify preferred solutions from the Pareto set of a multi-objective optimization problem. Our method uses a Bayesian model to estimate the decision-maker’s utility function based on pairwise comparisons. Aided by this model,…

  • Estimating Treatment Effects with Independent Component Analysis

    Estimating Treatment Effects with Independent Component Analysis arXiv:2507.16467v1 Announce Type: new Abstract: The field of causal inference has developed a variety of methods to accurately estimate treatment effects in the presence of nuisance. Meanwhile, the field of identifiability theory has developed methods like Independent Component Analysis (ICA) to identify latent sources and mixing weights from…

  • Statistical and Algorithmic Foundations of Reinforcement Learning

    Statistical and Algorithmic Foundations of Reinforcement Learning arXiv:2507.14444v1 Announce Type: new Abstract: As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL…

  • Diffusion Models for Time Series Forecasting: A Survey

    Diffusion Models for Time Series Forecasting: A Survey arXiv:2507.14507v1 Announce Type: new Abstract: Diffusion models, initially developed for image synthesis, demonstrate remarkable generative capabilities. Recently, their application has expanded to time series forecasting (TSF), yielding promising results. In this survey, we first introduce the standard diffusion models and their prevalent variants, explaining their adaptation to…

  • Distribution-Free Uncertainty-Aware Virtual Sensing via Conformalized Neural Operators

    Distribution-Free Uncertainty-Aware Virtual Sensing via Conformalized Neural Operators arXiv:2507.11574v1 Announce Type: cross Abstract: Robust uncertainty quantification (UQ) remains a critical barrier to the safe deployment of deep learning in real-time virtual sensing, particularly in high-stakes domains where sparse, noisy, or non-collocated sensor data are the norm. We introduce the Conformalized Monte Carlo Operator (CMCO), a…

  • TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models

    TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models arXiv:2507.10643v1 Announce Type: new Abstract: Existing post-hoc model-agnostic methods generate external explanations for opaque models, primarily by locally attributing the model output to its input features. However, they often lack an explicit and systematic framework for quantifying the contribution of individual features. Building…

  • Enjoying Non-linearity in Multinomial Logistic Bandits

    Enjoying Non-linearity in Multinomial Logistic Bandits arXiv:2507.05306v1 Announce Type: new Abstract: We consider the multinomial logistic bandit problem, a variant of generalized linear bandits where a learner interacts with an environment by selecting actions to maximize expected rewards based on probabilistic feedback from multiple possible outcomes. In the binary setting, recent work has focused on…

  • Valid Selection among Conformal Sets

    Valid Selection among Conformal Sets arXiv:2506.20173v1 Announce Type: new Abstract: Conformal prediction offers a distribution-free framework for constructing prediction sets with coverage guarantees. In practice, multiple valid conformal prediction sets may be available, arising from different models or methodologies. However, selecting the most desirable set, such as the smallest, can invalidate the coverage guarantees. To…

  • CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization

    CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization arXiv:2506.16189v1 Announce Type: new Abstract: We study the problem of conformal prediction (CP) under geometric data shifts, where data samples are susceptible to transformations such as rotations or flips. While CP endows prediction models with post-hoc uncertainty quantification and formal coverage guarantees, their practicality breaks under distribution…

  • Beyond Shapley Values: Cooperative Games for the Interpretation of Machine Learning Models

    Beyond Shapley Values: Cooperative Games for the Interpretation of Machine Learning Models arXiv:2506.13900v1 Announce Type: new Abstract: Cooperative game theory has become a cornerstone of post-hoc interpretability in machine learning, largely through the use of Shapley values. Yet, despite their widespread adoption, Shapley-based methods often rest on axiomatic justifications whose relevance to feature attribution remains…

  • Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms

    Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms arXiv:2506.13984v1 Announce Type: new Abstract: In this paper, we develop a wide class of Mirror Descent (MD) algorithms, which play a key role in machine learning. For this purpose we formulate the constrained optimization problem, in which we exploit the Bregman divergence with the Tempesta multi-parametric deformation logarithm…

  • Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory

    Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory arXiv:2506.12350v1 Announce Type: new Abstract: Despite its empirical success, Reinforcement Learning from Human Feedback (RLHF) has been shown to violate almost all the fundamental axioms in social choice theory — such as majority consistency, pairwise majority consistency, and Condorcet consistency. This raises…

  • Know What You Don’t Know: Uncertainty Calibration of Process Reward Models

    Know What You Don’t Know: Uncertainty Calibration of Process Reward Models arXiv:2506.09338v1 Announce Type: new Abstract: Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poorly calibrated and often overestimate success probabilities. To address this, we present…

  • Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting

    Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting arXiv:2506.08049v1 Announce Type: new Abstract: Subseasonal-to-seasonal (S2S) forecasting, which predicts climate conditions from several weeks to months in advance, presents significant challenges due to the chaotic dynamics of atmospheric systems and complex interactions across multiple scales. Current approaches often fail to explicitly model underlying physical processes and teleconnections…

  • WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection

    WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection arXiv:2506.08066v1 Announce Type: new Abstract: Change Point Detection (CPD) aims to identify moments of abrupt distribution shifts in data streams. Real-world high-dimensional CPD remains challenging due to data pattern complexity and violation of common assumptions. Resorting to standalone deep neural networks, the current state-of-the-art detectors…

  • On the Fundamental Impossibility of Hallucination Control in Large Language Models

    On the Fundamental Impossibility of Hallucination Control in Large Language Models arXiv:2506.06382v1 Announce Type: new Abstract: This paper explains why it is impossible to create large language models that do not hallucinate and what trade-offs we should be looking for. It presents a formal impossibility theorem demonstrating that no inference mechanism can simultaneously…

  • Zeroth-Order Optimization Finds Flat Minima

    Zeroth-Order Optimization Finds Flat Minima arXiv:2506.05454v1 Announce Type: cross Abstract: Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known on the implicit…

  • Beyond Winning: Margin of Victory Relative to Expectation Unlocks Accurate Skill Ratings

    Beyond Winning: Margin of Victory Relative to Expectation Unlocks Accurate Skill Ratings arXiv:2506.00348v1 Announce Type: new Abstract: Knowledge of accurate relative skills in any competitive system is essential, but foundational approaches such as ELO discard extremely relevant performance data by concentrating exclusively on binary outcomes. While margin of victory (MOV) extensions exist, they often lack…

  • Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

    Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning arXiv:2505.23783v1 Announce Type: new Abstract: In-Context Learning (ICL) allows Large Language Models (LLMs) to adapt to new tasks with just a few examples, but their predictions often suffer from systematic biases, leading to unstable performance in classification. While calibration techniques are proposed to…

  • Learning Probabilities of Causation from Finite Population Data

    Learning Probabilities of Causation from Finite Population Data arXiv:2505.17133v1 Announce Type: new Abstract: Probabilities of causation play a crucial role in modern decision-making. This paper addresses the challenge of predicting probabilities of causation for subpopulations with insufficient data using machine learning models. Tian and Pearl first defined and derived tight bounds for three fundamental probabilities…

  • Continuous Domain Generalization

    Continuous Domain Generalization arXiv:2505.13519v1 Announce Type: new Abstract: Real-world data distributions often shift continuously across multiple latent factors such as time, geography, and socioeconomic context. However, existing domain generalization approaches typically treat domains as discrete or evolving along a single axis (e.g., time), which fails to capture the complex, multi-dimensional nature of real-world variation. This…

  • Data Balancing Strategies: A Survey of Resampling and Augmentation Methods

    Data Balancing Strategies: A Survey of Resampling and Augmentation Methods arXiv:2505.13518v1 Announce Type: new Abstract: Imbalanced data poses a significant obstacle in machine learning, as an unequal distribution of class labels often results in skewed predictions and diminished model accuracy. To mitigate this problem, various resampling strategies have been developed, encompassing both oversampling and undersampling…

  • Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback

    Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback arXiv:2505.13562v1 Announce Type: new Abstract: Learning in games is a fundamental problem in machine learning and artificial intelligence, with numerous applications (Silver et al., 2016; Schrittwieser et al., 2020). This work investigates two-player zero-sum matrix games with an unknown payoff matrix and bandit feedback, where each player observes their actions and the…

  • Optimal Transport for Machine Learners

    Optimal Transport for Machine Learners arXiv:2505.06589v1 Announce Type: new Abstract: Optimal Transport is a foundational mathematical theory that connects optimization, partial differential equations, and probability. It offers a powerful framework for comparing probability distributions and has recently become an important tool in machine learning, especially for designing and evaluating generative models. These course notes cover…

  • Feature Representation Transferring to Lightweight Models via Perception Coherence

    Feature Representation Transferring to Lightweight Models via Perception Coherence arXiv:2505.06595v1 Announce Type: new Abstract: In this paper, we propose a method for transferring feature representation to lightweight student models from larger teacher models. We mathematically define a new notion called perception coherence. Based on this notion, we propose a loss function, which takes into account…

  • Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs

    Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs arXiv:2505.03814v1 Announce Type: new Abstract: As foundation models continue to scale, the size of trained models grows exponentially, presenting significant challenges for their evaluation. Current evaluation practices involve curating increasingly large datasets to assess the performance of large language models (LLMs). However, there is a lack of…

  • Decoding Latent Spaces: Assessing the Interpretability of Time Series Foundation Models for Visual Analytics

    Decoding Latent Spaces: Assessing the Interpretability of Time Series Foundation Models for Visual Analytics arXiv:2504.20099v1 Announce Type: cross Abstract: The present study explores the interpretability of latent spaces produced by time series foundation models, focusing on their potential for visual analysis tasks. Specifically, we evaluate the MOMENT family of models, a set of transformer-based, pre-trained…

  • (Im)possibility of Automated Hallucination Detection in Large Language Models

    (Im)possibility of Automated Hallucination Detection in Large Language Models arXiv:2504.17004v1 Announce Type: cross Abstract: Is automated hallucination detection possible? In this work, we introduce a theoretical framework to analyze the feasibility of automatically detecting hallucinations produced by large language models (LLMs). Inspired by the classical Gold-Angluin framework for language identification and its recent adaptation to…

  • Towards Accurate Forecasting of Renewable Energy : Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France

    Towards Accurate Forecasting of Renewable Energy : Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France arXiv:2504.16100v1 Announce Type: cross Abstract: Accurate prediction of non-dispatchable renewable energy sources is essential for grid stability and price prediction. Regional power supply forecasts are usually indirect through a bottom-up approach of plant-level forecasts,…

  • Physics-Informed Inference Time Scaling via Simulation-Calibrated Scientific Machine Learning

    Physics-Informed Inference Time Scaling via Simulation-Calibrated Scientific Machine Learning arXiv:2504.16172v1 Announce Type: cross Abstract: High-dimensional partial differential equations (PDEs) pose significant computational challenges across fields ranging from quantum chemistry to economics and finance. Although scientific machine learning (SciML) techniques offer approximate solutions, they often suffer from bias and neglect crucial physical insights. Inspired by inference-time…

  • How Private is Your Attention? Bridging Privacy with In-Context Learning

    How Private is Your Attention? Bridging Privacy with In-Context Learning arXiv:2504.16000v1 Announce Type: new Abstract: In-context learning (ICL), the ability of transformer-based models to perform new tasks from examples provided at inference time, has emerged as a hallmark of modern language models. While recent works have investigated the mechanisms underlying ICL, its feasibility under formal privacy constraints…

  • Significativity Indices for Agreement Values

    Significativity Indices for Agreement Values arXiv:2504.15325v1 Announce Type: cross Abstract: Agreement measures, such as Cohen’s kappa or intraclass correlation, gauge the matching between two or more classifiers. They are used in a wide range of contexts from medicine, where they evaluate the effectiveness of medical treatments and clinical trials, to artificial intelligence, where they can…

  • Near-optimal algorithms for private estimation and sequential testing of collision probability

    Near-optimal algorithms for private estimation and sequential testing of collision probability arXiv:2504.13804v1 Announce Type: new Abstract: We present new algorithms for estimating and testing collision probability, a fundamental measure of the spread of a discrete distribution that is widely used in many scientific fields. We describe an algorithm that satisfies $(\alpha, \beta)$-local differential privacy and…