Category: cs.LG

  • Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation

    Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation arXiv:2507.03169v1 Announce Type: new Abstract: The rise of generative AI search engines is disrupting traditional SEO, with Gartner predicting a 25% reduction in conventional search usage by 2026. This necessitates new approaches for web content visibility in AI-driven search environments. We present a domain-specific fine-tuning approach…

  • LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference

    LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference arXiv:2507.03271v1 Announce Type: new Abstract: Causal forest methods are powerful tools in causal inference. Similar to traditional random forest in machine learning, causal forest independently considers each causal tree. However, this independence consideration increases the likelihood that classification errors in one…

  • Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data

    Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data arXiv:2507.03681v1 Announce Type: new Abstract: Randomized trials are typically designed to detect average treatment effects but often lack the statistical power to uncover effect heterogeneity over patient characteristics, limiting their value for personalized decision-making. To address this, we propose the QR-learner, a model-agnostic…

  • Determination of Particle-Size Distributions from Light-Scattering Measurement Using Constrained Gaussian Process Regression

    Determination of Particle-Size Distributions from Light-Scattering Measurement Using Constrained Gaussian Process Regression arXiv:2507.03736v1 Announce Type: new Abstract: In this work, we propose a novel methodology for robustly estimating particle size distributions from optical scattering measurements using constrained Gaussian process regression. The estimation of particle size distributions is commonly formulated as a Fredholm integral equation of…

  • Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis

    Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis arXiv:2507.03756v1 Announce Type: new Abstract: The success of denoising diffusion models raises important questions regarding their generalisation behaviour, particularly in high-dimensional settings. Notably, it has been shown that when training and sampling are performed perfectly, these models memorise training data — implying that some form of…

  • Hybrid least squares for learning functions from highly noisy data

    Hybrid least squares for learning functions from highly noisy data arXiv:2507.02215v1 Announce Type: new Abstract: Motivated by the need for efficient estimation of conditional expectations, we consider a least-squares function approximation problem with heavily polluted data. Existing methods that are powerful in the small noise regime are suboptimal when large noise is present. We propose…

  • Adaptive Iterative Soft-Thresholding Algorithm with the Median Absolute Deviation

    Adaptive Iterative Soft-Thresholding Algorithm with the Median Absolute Deviation arXiv:2507.02084v1 Announce Type: new Abstract: The adaptive Iterative Soft-Thresholding Algorithm (ISTA) has been a popular algorithm for finding a desirable solution to the LASSO problem without explicitly tuning the regularization parameter $\lambda$. Although the adaptive ISTA is a successful practical algorithm, few theoretical results exist.…

  • Transfer Learning for Matrix Completion

    Transfer Learning for Matrix Completion arXiv:2507.02248v1 Announce Type: new Abstract: In this paper, we explore the knowledge transfer under the setting of matrix completion, which aims to enhance the estimation of a low-rank target matrix with auxiliary data available. We propose a transfer learning procedure given prior information on which source datasets are favorable. We…

  • It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation

    It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation arXiv:2507.02275v1 Announce Type: new Abstract: Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a…

  • Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited

    Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited arXiv:2507.02377v1 Announce Type: new Abstract: Inducing-point-based sparse variational Gaussian processes have become the standard workhorse for scaling up GP models. Recent advances show that these methods can be improved by introducing a diagonal scaling matrix to the conditional posterior density given the inducing points. This paper first…

  • Asymptotic convexity of wide and shallow neural networks

    Asymptotic convexity of wide and shallow neural networks arXiv:2507.01044v1 Announce Type: new Abstract: For a simple model of shallow and wide neural networks, we show that the epigraph of its input-output map as a function of the network parameters approximates the epigraph of a convex function in a precise sense. This leads to a plausible explanation…

  • Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles

    Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles arXiv:2507.01542v1 Announce Type: new Abstract: Gaussian mixture models (GMMs) are ubiquitous in statistical learning, particularly for unsupervised problems. While full GMMs suffer from the overparameterization of their covariance matrices in high-dimensional spaces, spherical GMMs (with isotropic covariance matrices) certainly lack flexibility to fit certain anisotropic distributions. Connecting…

  • A generative modeling / Physics-Informed Neural Network approach to random differential equations

    A generative modeling / Physics-Informed Neural Network approach to random differential equations arXiv:2507.01687v1 Announce Type: new Abstract: The integration of Scientific Machine Learning (SciML) techniques with uncertainty quantification (UQ) represents a rapidly evolving frontier in computational science. This work advances Physics-Informed Neural Networks (PINNs) by incorporating probabilistic frameworks to effectively model uncertainty in complex systems.…

  • When Less Is More: Binary Feedback Can Outperform Ordinal Comparisons in Ranking Recovery

    When Less Is More: Binary Feedback Can Outperform Ordinal Comparisons in Ranking Recovery arXiv:2507.01613v1 Announce Type: new Abstract: Paired comparison data, where users evaluate items in pairs, play a central role in ranking and preference learning tasks. While ordinal comparison data intuitively offer richer information than binary comparisons, this paper challenges that conventional wisdom. We…

  • Proof of a perfect platonic representation hypothesis

    Proof of a perfect platonic representation hypothesis arXiv:2507.01098v1 Announce Type: cross Abstract: In this note, we elaborate on and explain in detail the proof given by Ziyin et al. (2025) of the “perfect” Platonic Representation Hypothesis (PRH) for the embedded deep linear network model (EDLN). We show that if trained with SGD, two EDLNs with…

  • Disentangled Feature Importance

    Disentangled Feature Importance arXiv:2507.00260v1 Announce Type: new Abstract: Feature importance quantification faces a fundamental challenge: when predictors are correlated, standard methods systematically underestimate their contributions. We prove that major existing approaches target identical population functionals under squared-error loss, revealing why they share this correlation-induced bias. To address this limitation, we introduce Disentangled Feature Importance (DFI),…

  • Enhancing Interpretability in Generative Modeling: Statistically Disentangled Latent Spaces Guided by Generative Factors in Scientific Datasets

    Enhancing Interpretability in Generative Modeling: Statistically Disentangled Latent Spaces Guided by Generative Factors in Scientific Datasets arXiv:2507.00298v1 Announce Type: new Abstract: This study addresses the challenge of statistically extracting generative factors from complex, high-dimensional datasets in unsupervised or semi-supervised settings. We investigate encoder-decoder-based generative models for nonlinear dimensionality reduction, focusing on disentangling low-dimensional latent variables…

  • GRAND: Graph Release with Assured Node Differential Privacy

    GRAND: Graph Release with Assured Node Differential Privacy arXiv:2507.00402v1 Announce Type: new Abstract: Differential privacy is a well-established framework for safeguarding sensitive information in data. While extensively applied across various domains, its application to network data — particularly at the node level — remains underexplored. Existing methods for node-level privacy either focus exclusively on query-based…

  • Forward Reverse Kernel Regression for the Schrödinger bridge problem

    Forward Reverse Kernel Regression for the Schrödinger bridge problem arXiv:2507.00640v1 Announce Type: new Abstract: In this paper, we study the Schrödinger Bridge Problem (SBP), which is central to entropic optimal transport. For general reference processes and begin–endpoint distributions, we propose a forward-reverse iterative Monte Carlo procedure to approximate the Schrödinger potentials in a nonparametric way.…

  • An in depth look at the Procrustes-Wasserstein distance: properties and barycenters

    An in depth look at the Procrustes-Wasserstein distance: properties and barycenters arXiv:2507.00894v1 Announce Type: new Abstract: Due to its invariance to rigid transformations such as rotations and reflections, Procrustes-Wasserstein (PW) was introduced in the literature as an optimal transport (OT) distance, alternative to Wasserstein and more suited to tasks such as the alignment and comparison…

  • Strategic A/B testing via Maximum Probability-driven Two-armed Bandit

    Strategic A/B testing via Maximum Probability-driven Two-armed Bandit arXiv:2506.22536v1 Announce Type: new Abstract: Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because of their…

  • Adjoint Schrödinger Bridge Sampler

    Adjoint Schrödinger Bridge Sampler arXiv:2506.22565v1 Announce Type: new Abstract: Computational methods for learning to sample from the Boltzmann distribution — where the target distribution is known only up to an unnormalized energy function — have advanced significantly recently. Due to the lack of explicit target samples, however, prior diffusion-based methods, known as diffusion samplers, often…

  • Bayesian Invariance Modeling of Multi-Environment Data

    Bayesian Invariance Modeling of Multi-Environment Data arXiv:2506.22675v1 Announce Type: new Abstract: Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features – those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this…

  • CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation

    CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation arXiv:2506.22963v1 Announce Type: new Abstract: Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on…

  • AICO: Feature Significance Tests for Supervised Learning

    AICO: Feature Significance Tests for Supervised Learning arXiv:2506.23396v1 Announce Type: new Abstract: The opacity of many supervised learning algorithms remains a key challenge, hindering scientific discovery and limiting broader deployment — particularly in high-stakes domains. This paper develops model- and distribution-agnostic significance tests to assess the influence of input features in any regression or classification…

  • Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19

    Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19 arXiv:2506.21739v1 Announce Type: new Abstract: Authors Yi-Cheng Chen, Ping-En Lu, Cheng-Shang Chang, and Tzu-Hsuan Liu use the Finite Impulse Response (FIR) linear system filtering method to track and predict the number of people infected and recovered from COVID-19, in a…

  • Critically-Damped Higher-Order Langevin Dynamics

    Critically-Damped Higher-Order Langevin Dynamics arXiv:2506.21741v1 Announce Type: new Abstract: Denoising Diffusion Probabilistic Models represent an entirely new class of generative AI methods that have yet to be fully explored. Critical damping has been successfully introduced in Critically-Damped Langevin Dynamics (CLD) and Critically-Damped Third-Order Langevin Dynamics (TOLD++), but has not yet been applied to dynamics of…

  • TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics

    TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics arXiv:2506.21757v1 Announce Type: new Abstract: Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images but typically suffer from inefficient sampling. Many solver designs and noise scheduling strategies have been proposed to dramatically improve sampling speeds. In this paper, we introduce a new sampling method that is…

  • Thompson Sampling in Function Spaces via Neural Operators

    Thompson Sampling in Function Spaces via Neural Operators arXiv:2506.21894v1 Announce Type: new Abstract: We propose an extension of Thompson sampling to optimization problems over function spaces where the objective is a known functional of an unknown operator’s output. We assume that functional evaluations are inexpensive, while queries to the operator (such as running a high-fidelity…

  • Classification with Reject Option: Distribution-free Error Guarantees via Conformal Prediction

    Classification with Reject Option: Distribution-free Error Guarantees via Conformal Prediction arXiv:2506.21802v1 Announce Type: new Abstract: Machine learning (ML) models always make a prediction, even when they are likely to be wrong. This causes problems in practical applications, as we do not know if we should trust a prediction. ML with reject option addresses this issue…

  • The final solution of the Hitchhiker’s problem #5

    The final solution of the Hitchhiker’s problem #5 arXiv:2506.20672v1 Announce Type: new Abstract: A recent survey, nicknamed “Hitchhiker’s Guide”, J.J. Arias-García, R. Mesiar, and B. De Baets, A hitchhiker’s guide to quasi-copulas, Fuzzy Sets and Systems 393 (2020) 1-28, has raised the rating of quasi-copula problems in the dependence modeling community in spite of the…

  • Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon

    Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon arXiv:2506.20779v1 Announce Type: new Abstract: We study the implicit bias of flatness / low (loss) curvature and its effects on generalization in two-layer overparameterized ReLU networks with multivariate inputs — a problem well motivated by the minima stability and…

  • Active Learning for Manifold Gaussian Process Regression

    Active Learning for Manifold Gaussian Process Regression arXiv:2506.20928v1 Announce Type: new Abstract: This paper introduces an active learning framework for manifold Gaussian Process (GP) regression, combining manifold learning with strategic data selection to improve accuracy in high-dimensional spaces. Our method jointly optimizes a neural network for dimensionality reduction and a Gaussian process regressor in the…

  • Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics

    Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics arXiv:2506.20935v1 Announce Type: new Abstract: Forecasting geopolitical conflict from data sources like the Global Database of Events, Language, and Tone (GDELT) is a critical challenge for national security. The inherent sparsity, burstiness,…

  • Lower Bounds on the Size of Markov Equivalence Classes

    Lower Bounds on the Size of Markov Equivalence Classes arXiv:2506.20933v1 Announce Type: new Abstract: Causal discovery algorithms typically recover causal graphs only up to their Markov equivalence classes unless additional parametric assumptions are made. The sizes of these equivalence classes reflect the limits of what can be learned about the underlying causal graph from purely…

  • Data-Driven Dynamic Factor Modeling via Manifold Learning

    Data-Driven Dynamic Factor Modeling via Manifold Learning arXiv:2506.19945v1 Announce Type: new Abstract: We propose a data-driven dynamic factor framework where a response variable depends on a high-dimensional set of covariates, without imposing any parametric model on the joint dynamics. Leveraging Anisotropic Diffusion Maps, a nonlinear manifold learning technique introduced by Singer and Coifman, our framework…

  • A Principled Path to Fitted Distributional Evaluation

    A Principled Path to Fitted Distributional Evaluation arXiv:2506.20048v1 Announce Type: new Abstract: In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used fitted-Q evaluation — developed for expectation-based reinforcement learning — to…

  • Valid Selection among Conformal Sets

    Valid Selection among Conformal Sets arXiv:2506.20173v1 Announce Type: new Abstract: Conformal prediction offers a distribution-free framework for constructing prediction sets with coverage guarantees. In practice, multiple valid conformal prediction sets may be available, arising from different models or methodologies. However, selecting the most desirable set, such as the smallest, can invalidate the coverage guarantees. To…

  • Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

    Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives arXiv:2506.20114v1 Announce Type: new Abstract: Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an…

  • POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes

    POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes arXiv:2506.20406v1 Announce Type: new Abstract: Dynamic treatment regimes (DTRs) provide a principled framework for optimizing sequential decision-making in domains where decisions must adapt over time in response to individual trajectories, such as healthcare, education, and digital interventions. However, existing statistical methods often rely on…

  • Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions

    Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions arXiv:2506.19010v1 Announce Type: new Abstract: Causal decomposition analysis aims to assess the effect of modifying risk factors on reducing social disparities in outcomes. Recently, this analysis has incorporated individual characteristics when modifying risk factors by utilizing optimal treatment regimes (OTRs). Since the…

  • When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets

    When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets arXiv:2506.19031v1 Announce Type: new Abstract: While diffusion models generate high-quality images via probability flow, the theoretical understanding of this process remains incomplete. A key question is when probability flow converges to training samples or more general points on the data manifold.…

  • Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality

    Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality arXiv:2506.19144v1 Announce Type: new Abstract: This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures reflect the intrinsic dimensionality of the underlying function, thereby mitigating the curse of dimensionality. Our…

  • Rare dense solutions clusters in asymmetric binary perceptrons — local entropy via fully lifted RDT

    Rare dense solutions clusters in asymmetric binary perceptrons — local entropy via fully lifted RDT arXiv:2506.19276v1 Announce Type: new Abstract: We study the classical asymmetric binary perceptron (ABP) and the associated local entropy (LE) as a potential source of its algorithmic hardness. Isolation of typical ABP solutions in the SAT phase seemingly suggests a universal algorithmic hardness. Paradoxically, efficient…

  • Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks

    Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks arXiv:2506.19695v1 Announce Type: new Abstract: This paper studies the $\ell^p$-Lipschitz constants of ReLU neural networks $\Phi: \mathbb{R}^d \to \mathbb{R}$ with random parameters for $p \in [1,\infty]$. The distribution of the weights follows a variant of the He initialization and the biases are drawn…

  • Coupled Entropy: A Goldilocks Generalization?

    Coupled Entropy: A Goldilocks Generalization? arXiv:2506.17229v1 Announce Type: new Abstract: Nonextensive Statistical Mechanics (NSM) has developed into a powerful toolset for modeling and analyzing complex systems. Despite its many successes, a puzzle arose early in its development. The constraints on the Tsallis entropy are in the form of an escort distribution with elements proportional to…

  • Differentiable neural network representation of multi-well, locally-convex potentials

    Differentiable neural network representation of multi-well, locally-convex potentials arXiv:2506.17242v1 Announce Type: new Abstract: Multi-well potentials are ubiquitous in science, modeling phenomena such as phase transitions, dynamic instabilities, and multimodal behavior across physics, chemistry, and biology. In contrast to non-smooth minimum-of-mixture representations, we propose a differentiable and convex formulation based on a log-sum-exponential (LSE) mixture of…

  • Gaussian Processes and Reproducing Kernels: Connections and Equivalences

    Gaussian Processes and Reproducing Kernels: Connections and Equivalences arXiv:2506.17366v1 Announce Type: new Abstract: This monograph studies the relations between two approaches using positive definite kernels: probabilistic methods using Gaussian processes, and non-probabilistic methods using reproducing kernel Hilbert spaces (RKHS). They are widely studied and used in machine learning, statistics, and numerical analysis. Connections and equivalences…

  • Scalable Machine Learning Algorithms using Path Signatures

    Scalable Machine Learning Algorithms using Path Signatures arXiv:2506.17634v1 Announce Type: new Abstract: The interface between stochastic analysis and machine learning is a rapidly evolving field, with path signatures – iterated integrals that provide faithful, hierarchical representations of paths – offering a principled and universal feature map for sequential and structured data. Rooted in rough path…

  • Derandomizing Simultaneous Confidence Regions for Band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes

    Derandomizing Simultaneous Confidence Regions for Band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes arXiv:2506.17764v1 Announce Type: new Abstract: Band-limited functions are fundamental objects that are widely used in systems theory and signal processing. In this paper we refine a recent nonparametric, nonasymptotic method for constructing simultaneous confidence regions for band-limited functions from noisy input-output…

  • From Local Interactions to Global Operators: Scalable Gaussian Process Operator for Physical Systems

    From Local Interactions to Global Operators: Scalable Gaussian Process Operator for Physical Systems arXiv:2506.15906v1 Announce Type: new Abstract: Operator learning offers a powerful paradigm for solving parametric partial differential equations (PDEs), but scaling probabilistic neural operators such as the recently proposed Gaussian Processes Operators (GPOs) to high-dimensional, data-intensive regimes remains a significant challenge. In this…

  • Sampling conditioned diffusions via Pathspace Projected Monte Carlo

    Sampling conditioned diffusions via Pathspace Projected Monte Carlo arXiv:2506.15743v1 Announce Type: new Abstract: We present an algorithm to sample stochastic differential equations conditioned on rather general constraints, including integral constraints, endpoint constraints, and stochastic integral constraints. The algorithm is a pathspace Metropolis-adjusted manifold sampling scheme, which samples stochastic paths on the submanifold of realizations that…

  • Diffusion-Based Hypothesis Testing and Change-Point Detection

    Diffusion-Based Hypothesis Testing and Change-Point Detection arXiv:2506.16089v1 Announce Type: new Abstract: Score-based methods have recently seen increasing popularity in modeling and generation. Methods have been constructed to perform hypothesis testing and change-point detection with score functions, but these methods are in general not as powerful as their likelihood-based peers. Recent works consider generalizing the score-based…

  • CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization

    CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization arXiv:2506.16189v1 Announce Type: new Abstract: We study the problem of conformal prediction (CP) under geometric data shifts, where data samples are susceptible to transformations such as rotations or flips. While CP endows prediction models with post-hoc uncertainty quantification and formal coverage guarantees, their practicality breaks under distribution…

  • Random feature approximation for general spectral methods

    Random feature approximation for general spectral methods arXiv:2506.16283v1 Announce Type: new Abstract: Random feature approximation is arguably one of the most widely used techniques for kernel methods in large-scale learning algorithms. In this work, we analyze the generalization properties of random feature methods, extending previous results for Tikhonov regularization to a broad class of spectral…

  • Optimal Convergence Rates of Deep Neural Network Classifiers

    Optimal Convergence Rates of Deep Neural Network Classifiers arXiv:2506.14899v1 Announce Type: new Abstract: In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s \in [0,\infty]$) and the compositional assumption. This assumption requires the conditional class probability function of the data distribution to be the composition of…

  • Double Machine Learning for Conditional Moment Restrictions: IV regression, Proximal Causal Learning and Beyond

    Double Machine Learning for Conditional Moment Restrictions: IV regression, Proximal Causal Learning and Beyond arXiv:2506.14950v1 Announce Type: new Abstract: Solving conditional moment restrictions (CMRs) is a key problem considered in statistics, causal inference, and econometrics, where the aim is to solve for a function of interest that satisfies some conditional moment equalities. Specifically, many techniques…

  • Performative Validity of Recourse Explanations

    Performative Validity of Recourse Explanations arXiv:2506.15366v1 Announce Type: new Abstract: When applicants get rejected by an algorithmic decision system, recourse explanations provide actionable suggestions for how to change their input features to get a positive evaluation. A crucial yet overlooked phenomenon is that recourse explanations are performative: When many applicants act according to their recommendations,…

  • An Observation on Lloyd’s k-Means Algorithm in High Dimensions

    An Observation on Lloyd’s k-Means Algorithm in High Dimensions arXiv:2506.14952v1 Announce Type: new Abstract: Clustering and estimating cluster means are core problems in statistics and machine learning, with k-means and Expectation Maximization (EM) being two widely used algorithms. In this work, we provide a theoretical explanation for the failure of k-means in high-dimensional settings with…

  • Time-dependent density estimation using binary classifiers

    Time-dependent density estimation using binary classifiers arXiv:2506.15505v1 Announce Type: new Abstract: We propose a data-driven method to learn the time-dependent probability density of a multivariate stochastic process from sample paths, assuming that the initial probability density is known and can be evaluated. Our method uses a novel time-dependent binary classifier trained using a contrastive estimation-based…

  • Beyond Shapley Values: Cooperative Games for the Interpretation of Machine Learning Models

    Beyond Shapley Values: Cooperative Games for the Interpretation of Machine Learning Models arXiv:2506.13900v1 Announce Type: new Abstract: Cooperative game theory has become a cornerstone of post-hoc interpretability in machine learning, largely through the use of Shapley values. Yet, despite their widespread adoption, Shapley-based methods often rest on axiomatic justifications whose relevance to feature attribution remains…

  • Rademacher learning rates for iterated random functions

    Rademacher learning rates for iterated random functions arXiv:2506.13946v1 Announce Type: new Abstract: Most existing literature on supervised machine learning assumes that the training dataset is drawn from an i.i.d. sample. However, many real-world problems exhibit temporal dependence and strong correlations between the marginal distributions of the data-generating process, suggesting that the i.i.d. assumption is often…

  • Meta Optimality for Demographic Parity Constrained Regression via Post-Processing

    Meta Optimality for Demographic Parity Constrained Regression via Post-Processing arXiv:2506.13947v1 Announce Type: new Abstract: We address the regression problem under the constraint of demographic parity, a commonly used fairness definition. Recent studies have revealed fair minimax optimal regression algorithms, the most accurate algorithms that adhere to the fairness constraint. However, these analyses are tightly coupled…

  • Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Theoretically-Grounded and Practical Framework with Synthetic Anomalies

    Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Theoretically-Grounded and Practical Framework with Synthetic Anomalies arXiv:2506.13955v1 Announce Type: new Abstract: Anomaly detection (AD) is a critical task across domains such as cybersecurity and healthcare. In the unsupervised setting, an effective and theoretically-grounded principle is to train classifiers to distinguish normal data from (synthetic) anomalies. We extend…

  • Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms

    Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms arXiv:2506.13984v1 Announce Type: new Abstract: In this paper, we develop a wide class of Mirror Descent (MD) algorithms, which play a key role in machine learning. For this purpose we formulate the constrained optimization problem, in which we exploit the Bregman divergence with the Tempesta multi-parametric deformation logarithm…

  • Temporal cross-validation impacts multivariate time series subsequence anomaly detection evaluation

    Temporal cross-validation impacts multivariate time series subsequence anomaly detection evaluation arXiv:2506.12183v1 Announce Type: new Abstract: Evaluating anomaly detection in multivariate time series (MTS) requires careful consideration of temporal dependencies, particularly when detecting subsequence anomalies common in fault detection scenarios. While time series cross-validation (TSCV) techniques aim to preserve temporal ordering during model evaluation, their impact…

  • Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory

    Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory arXiv:2506.12350v1 Announce Type: new Abstract: Despite its empirical success, Reinforcement Learning from Human Feedback (RLHF) has been shown to violate almost all the fundamental axioms in social choice theory — such as majority consistency, pairwise majority consistency, and Condorcet consistency. This raises…

  • A Transfer Learning Framework for Multilayer Networks via Model Averaging

    A Transfer Learning Framework for Multilayer Networks via Model Averaging arXiv:2506.12455v1 Announce Type: new Abstract: Link prediction in multilayer networks is a key challenge in applications such as recommendation systems and protein-protein interaction prediction. While many techniques have been developed, most rely on assumptions about shared structures and require access to raw auxiliary data, limiting…

  • On the existence of consistent adversarial attacks in high-dimensional linear classification

    On the existence of consistent adversarial attacks in high-dimensional linear classification arXiv:2506.12454v1 Announce Type: new Abstract: What fundamentally distinguishes an adversarial attack from a misclassification due to limited model expressivity or finite data? In this work, we investigate this question in the setting of high-dimensional binary classification, where statistical effects due to limited data availability…

  • Dependent Randomized Rounding for Budget Constrained Experimental Design

    Dependent Randomized Rounding for Budget Constrained Experimental Design arXiv:2506.12677v1 Announce Type: new Abstract: Policymakers in resource-constrained settings require experimental designs that satisfy strict budget limits while ensuring precise estimation of treatment effects. We propose a framework that applies a dependent randomized rounding procedure to convert assignment probabilities into binary treatment decisions. Our proposed solution preserves…

  • A Framework for Non-Linear Attention via Modern Hopfield Networks

    A Framework for Non-Linear Attention via Modern Hopfield Networks arXiv:2506.11043v1 Announce Type: new Abstract: In this work we propose an energy functional along the lines of Modern Hopfield Networks (MHN), the stationary points of which correspond to the attention due to Vaswani et al. [12], thus unifying both frameworks. The minima of this landscape form…

  • Fast Bayesian Optimization of Function Networks with Partial Evaluations

    Fast Bayesian Optimization of Function Networks with Partial Evaluations arXiv:2506.11456v1 Announce Type: new Abstract: Bayesian optimization of function networks (BOFN) is a framework for optimizing expensive-to-evaluate objective functions structured as networks, where some nodes’ outputs serve as inputs for others. Many real-world applications, such as manufacturing and drug discovery, involve function networks with additional properties…

  • Collaborative Prediction: To Join or To Disjoin Datasets

    Collaborative Prediction: To Join or To Disjoin Datasets arXiv:2506.11271v1 Announce Type: new Abstract: With the recent rise of generative Artificial Intelligence (AI), the need to select high-quality datasets to improve machine learning models has garnered increasing attention. However, parts of this topic remain underexplored, even for simple prediction models. In this work, we study…

  • On the performance of multi-fidelity and reduced-dimensional neural emulators for inference of physiologic boundary conditions

    On the performance of multi-fidelity and reduced-dimensional neural emulators for inference of physiologic boundary conditions arXiv:2506.11683v1 Announce Type: new Abstract: Solving inverse problems in cardiovascular modeling is particularly challenging due to the high computational cost of running high-fidelity simulations. In this work, we focus on Bayesian parameter estimation and explore different methods to reduce the…

  • Using Deep Operators to Create Spatio-temporal Surrogates for Dynamical Systems under Uncertainty

    Using Deep Operators to Create Spatio-temporal Surrogates for Dynamical Systems under Uncertainty arXiv:2506.11761v1 Announce Type: new Abstract: Spatio-temporal data, which consists of responses or measurements gathered at different times and positions, is ubiquitous across diverse applications of civil infrastructure. While SciML methods have made significant progress in tackling the issue of response prediction for individual…

  • Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes

    Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes arXiv:2506.10101v1 Announce Type: new Abstract: In this paper, we establish sample complexity bounds for learning high-dimensional simplices in $\mathbb{R}^K$ from noisy data. Specifically, we consider $n$ i.i.d. samples uniformly drawn from an unknown simplex in $\mathbb{R}^K$, each corrupted by additive Gaussian noise of unknown variance. We…

  • Momentum Multi-Marginal Schrödinger Bridge Matching

    Momentum Multi-Marginal Schrödinger Bridge Matching arXiv:2506.10168v1 Announce Type: new Abstract: Understanding complex systems by inferring trajectories from sparse sample snapshots is a fundamental challenge in a wide range of domains, e.g., single-cell biology, meteorology, and economics. Despite advancements in Bridge and Flow matching frameworks, current methodologies rely on pairwise interpolation between adjacent snapshots. This hinders…

  • Measuring Semantic Information Production in Generative Diffusion Models

    Measuring Semantic Information Production in Generative Diffusion Models arXiv:2506.10433v1 Announce Type: new Abstract: It is well known that semantic and structural features of the generated images emerge at different times during the reverse dynamics of diffusion, a phenomenon that has been connected to physical phase transitions in magnets and other materials. In this paper, we…

  • Distributionally-Constrained Adversaries in Online Learning

    Distributionally-Constrained Adversaries in Online Learning arXiv:2506.10293v1 Announce Type: new Abstract: There has been much recent interest in understanding the continuum from adversarial to stochastic settings in online learning, with various frameworks including smoothed settings proposed to bridge this gap. We consider the more general and flexible framework of distributionally constrained adversaries in which instances are…

  • Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration

    Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration arXiv:2506.10572v1 Announce Type: new Abstract: Controlling the output probabilities of softmax-based models is a common problem in modern machine learning. Although the $\mathrm{Softmax}$ function provides soft control via its temperature parameter, it lacks the ability to enforce hard constraints, such as box constraints, on output probabilities,…

  • Know What You Don’t Know: Uncertainty Calibration of Process Reward Models

    Know What You Don’t Know: Uncertainty Calibration of Process Reward Models arXiv:2506.09338v1 Announce Type: new Abstract: Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poorly calibrated and often overestimate success probabilities. To address this, we present…

  • Attention-Bayesian Hybrid Approach to Modular Multiple Particle Tracking

    Attention-Bayesian Hybrid Approach to Modular Multiple Particle Tracking arXiv:2506.09441v1 Announce Type: new Abstract: Tracking multiple particles in noisy and cluttered scenes remains challenging due to a combinatorial explosion of trajectory hypotheses, which scales super-exponentially with the number of particles and frames. The transformer architecture has shown a significant improvement in robustness against this high combinatorial…

  • Evasion Attacks Against Bayesian Predictive Models

    Evasion Attacks Against Bayesian Predictive Models arXiv:2506.09640v1 Announce Type: new Abstract: There is an increasing interest in analyzing the behavior of machine learning systems against adversarial attacks. However, most of the research in adversarial machine learning has focused on studying weaknesses against evasion or poisoning attacks to predictive models in classical setups, with the susceptibility…

  • LLM-Powered CPI Prediction Inference with Online Text Time Series

    LLM-Powered CPI Prediction Inference with Online Text Time Series arXiv:2506.09516v1 Announce Type: new Abstract: Forecasting the Consumer Price Index (CPI) is an important yet challenging task in economics, where most existing approaches rely on low-frequency, survey-based data. With the recent advances of large language models (LLMs), there is growing potential to leverage high-frequency online text…

  • Scaling Laws for Uncertainty in Deep Learning

    Scaling Laws for Uncertainty in Deep Learning arXiv:2506.09648v1 Announce Type: new Abstract: Deep learning has recently revealed the existence of scaling laws, demonstrating that model performance follows predictable trends based on dataset and model sizes. Inspired by these findings and fascinating phenomena emerging in the over-parameterized regime, we examine a parallel direction: do similar scaling…

  • Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting

    Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting arXiv:2506.08049v1 Announce Type: new Abstract: Subseasonal-to-seasonal (S2S) forecasting, which predicts climate conditions from several weeks to months in advance, presents significant challenges due to the chaotic dynamics of atmospheric systems and complex interactions across multiple scales. Current approaches often fail to explicitly model underlying physical processes and teleconnections…

  • Constrained Pareto Set Identification with Bandit Feedback

    Constrained Pareto Set Identification with Bandit Feedback arXiv:2506.08127v1 Announce Type: new Abstract: In this paper, we address the problem of identifying the Pareto Set under feasibility constraints in a multivariate bandit setting. Specifically, given a $K$-armed bandit with unknown means $\mu_1, \dots, \mu_K \in \mathbb{R}^d$, the goal is to identify the set of arms whose…

  • WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection

    WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection arXiv:2506.08066v1 Announce Type: new Abstract: Change Point Detection (CPD) aims to identify moments of abrupt distribution shifts in data streams. Real-world high-dimensional CPD remains challenging due to data pattern complexity and violation of common assumptions. Resorting to standalone deep neural networks, the current state-of-the-art detectors…

  • Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces

    Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces arXiv:2506.08325v1 Announce Type: new Abstract: Depth measures are powerful tools for defining level sets in emerging, non-standard, and complex random objects such as high-dimensional multivariate data, functional data, and random graphs. Despite their favorable theoretical properties, the integration of…

  • Asymptotic Normality of Infinite Centered Random Forests -Application to Imbalanced Classification

    Asymptotic Normality of Infinite Centered Random Forests -Application to Imbalanced Classification arXiv:2506.08548v1 Announce Type: new Abstract: Many classification tasks involve imbalanced data, in which a class is largely underrepresented. Several techniques consist in creating a rebalanced dataset on which a classifier is trained. In this paper, we study theoretically such a procedure, when the classifier…

  • Direct Fisher Score Estimation for Likelihood Maximization

    Direct Fisher Score Estimation for Likelihood Maximization arXiv:2506.06542v1 Announce Type: new Abstract: We study the problem of likelihood maximization when the likelihood function is intractable but model simulations are readily available. We propose a sequential, gradient-based optimization method that directly models the Fisher score based on a local score matching technique which uses simulations from…

  • On the Fundamental Impossibility of Hallucination Control in Large Language Models

    On the Fundamental Impossibility of Hallucination Control in Large Language Models arXiv:2506.06382v1 Announce Type: new Abstract: This paper explains why it is impossible to create large language models that do not hallucinate and what are the trade-offs we should be looking for. It presents a formal impossibility theorem demonstrating that no inference mechanism can simultaneously…

  • Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations

    Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations arXiv:2506.06613v1 Announce Type: new Abstract: Learning distribution families over $mathbb{R}^d$ is a fundamental problem in unsupervised learning and statistics. A central question in this setting is whether a given family of distributions possesses sufficient structure to be (at least) information-theoretically learnable and, if so, to…

  • Continuous Semi-Implicit Models

    Continuous Semi-Implicit Models arXiv:2506.06778v1 Announce Type: new Abstract: Semi-implicit distributions have shown great promise in variational inference and generative modeling. Hierarchical semi-implicit models, which stack multiple semi-implicit layers, enhance the expressiveness of semi-implicit distributions and can be used to accelerate diffusion models given pretrained score networks. However, their sequential training often suffers from slow convergence.…

  • The Currents of Conflict: Decomposing Conflict Trends with Gaussian Processes

    The Currents of Conflict: Decomposing Conflict Trends with Gaussian Processes arXiv:2506.06828v1 Announce Type: new Abstract: I present a novel approach to estimating the temporal and spatial patterns of violent conflict. I show how we can use highly temporally and spatially disaggregated data on conflict events in tandem with Gaussian processes to estimate temporospatial conflict trends.…

  • Online Conformal Model Selection for Nonstationary Time Series

    Online Conformal Model Selection for Nonstationary Time Series arXiv:2506.05544v1 Announce Type: new Abstract: This paper introduces the MPS (Model Prediction Set), a novel framework for online model selection for nonstationary time series. Classical model selection methods, such as information criteria and cross-validation, rely heavily on the stationarity assumption and often fail in dynamic environments which…

  • Nonlinear Causal Discovery through a Sequential Edge Orientation Approach

    Nonlinear Causal Discovery through a Sequential Edge Orientation Approach arXiv:2506.05590v1 Announce Type: new Abstract: Recent advances have established the identifiability of a directed acyclic graph (DAG) under additive noise models (ANMs), spurring the development of various causal discovery methods. However, most existing methods make restrictive model assumptions, rely heavily on general independence tests, or require…

  • Multilevel neural simulation-based inference

    Multilevel neural simulation-based inference arXiv:2506.06087v1 Announce Type: new Abstract: Neural simulation-based inference (SBI) is a popular set of methods for Bayesian inference when models are only available in the form of a simulator. These methods are widely used in the sciences and engineering, where writing down a likelihood can be significantly more challenging than constructing…

  • Adaptive stable distribution and Hurst exponent by method of moments moving estimator for nonstationary time series

    Adaptive stable distribution and Hurst exponent by method of moments moving estimator for nonstationary time series arXiv:2506.05354v1 Announce Type: cross Abstract: Nonstationarity of real-life time series requires model adaptation. Classical approaches like ARMA-ARCH assume some arbitrarily chosen dependence type. To avoid their bias, we focus on a novel, more agnostic approach: moving…

  • Zeroth-Order Optimization Finds Flat Minima

    Zeroth-Order Optimization Finds Flat Minima arXiv:2506.05454v1 Announce Type: cross Abstract: Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known about the implicit…