Category: stat.ML

  • Predicting Forced Responses of Probability Distributions via the Fluctuation-Dissipation Theorem and Generative Modeling

    Predicting Forced Responses of Probability Distributions via the Fluctuation-Dissipation Theorem and Generative Modeling arXiv:2504.13333v1 Announce Type: new Abstract: We present a novel data-driven framework for estimating the response of higher-order moments of nonlinear stochastic systems to small external perturbations. The classical Generalized Fluctuation-Dissipation Theorem (GFDT) links the unperturbed steady-state distribution to the system’s linear response.…

  • Gradient-Free Sequential Bayesian Experimental Design via Interacting Particle Systems

    Gradient-Free Sequential Bayesian Experimental Design via Interacting Particle Systems arXiv:2504.13320v1 Announce Type: new Abstract: We introduce a gradient-free framework for Bayesian Optimal Experimental Design (BOED) in sequential settings, aimed at complex systems where gradient information is unavailable. Our method combines Ensemble Kalman Inversion (EKI) for design optimization with the Affine-Invariant Langevin Dynamics (ALDI) sampler for…

  • On the minimax optimality of Flow Matching through the connection to kernel density estimation

    On the minimax optimality of Flow Matching through the connection to kernel density estimation arXiv:2504.13336v1 Announce Type: new Abstract: Flow Matching has recently gained attention in generative modeling as a simple and flexible alternative to diffusion models, the current state of the art. While existing statistical guarantees adapt tools from the analysis of diffusion models,…

  • On the Convergence of Irregular Sampling in Reproducing Kernel Hilbert Spaces

    On the Convergence of Irregular Sampling in Reproducing Kernel Hilbert Spaces arXiv:2504.13623v1 Announce Type: new Abstract: We analyse the convergence of sampling algorithms for functions in reproducing kernel Hilbert spaces (RKHS). To this end, we discuss approximation properties of kernel regression under minimalistic assumptions on both the kernel and the input data. We first prove…

  • Near-optimal algorithms for private estimation and sequential testing of collision probability

    Near-optimal algorithms for private estimation and sequential testing of collision probability arXiv:2504.13804v1 Announce Type: new Abstract: We present new algorithms for estimating and testing collision probability, a fundamental measure of the spread of a discrete distribution that is widely used in many scientific fields. We describe an algorithm that satisfies $(\alpha, \beta)$-local differential privacy and…

  • Robust and Scalable Variational Bayes

    Robust and Scalable Variational Bayes arXiv:2504.12528v1 Announce Type: new Abstract: We propose a robust and scalable framework for variational Bayes (VB) that effectively handles outliers and contamination of arbitrary nature in large datasets. Our approach divides the dataset into disjoint subsets, computes the posterior for each subset, and applies VB approximation independently to these posteriors.…

  • Resonances in reflective Hamiltonian Monte Carlo

    Resonances in reflective Hamiltonian Monte Carlo arXiv:2504.12374v1 Announce Type: new Abstract: In high dimensions, reflective Hamiltonian Monte Carlo with inexact reflections exhibits slow mixing when the particle ensemble is initialised from a Dirac delta distribution and the uniform distribution is targeted. By quantifying the instantaneous non-uniformity of the distribution with the Sinkhorn divergence, we elucidate…

  • Spectral Algorithms under Covariate Shift

    Spectral Algorithms under Covariate Shift arXiv:2504.12625v1 Announce Type: new Abstract: Spectral algorithms leverage spectral regularization techniques to analyze and process data, providing a flexible framework for addressing supervised learning problems. To deepen our understanding of their performance in real-world scenarios where the distributions of training and test data may differ, we conduct a rigorous investigation…

  • When do Random Forests work?

    When do Random Forests work? arXiv:2504.12860v1 Announce Type: new Abstract: We study the effectiveness of randomizing split-directions in random forests. Prior literature has shown that, on the one hand, randomization can reduce variance through decorrelation, and, on the other hand, randomization regularizes and works in low signal-to-noise ratio (SNR) environments. First, we bring together and…

  • Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time

    Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time arXiv:2504.13110v1 Announce Type: new Abstract: We study the approximation gap between the dynamics of a polynomial-width neural network and its infinite-width counterpart, both trained using projected gradient descent in the mean-field scaling regime. We demonstrate how to tightly bound this approximation gap through a differential…

  • FEAT: Free energy Estimators with Adaptive Transport

    FEAT: Free energy Estimators with Adaptive Transport arXiv:2504.11516v1 Announce Type: new Abstract: We present Free energy Estimators with Adaptive Transport (FEAT), a novel framework for free energy estimation — a critical challenge across scientific domains. FEAT leverages learned transports implemented via stochastic interpolants and provides consistent, minimum-variance estimators based on escorted Jarzynski equality and controlled…

  • Normalizing Flow Regression for Bayesian Inference with Offline Likelihood Evaluations

    Normalizing Flow Regression for Bayesian Inference with Offline Likelihood Evaluations arXiv:2504.11554v1 Announce Type: new Abstract: Bayesian inference with computationally expensive likelihood evaluations remains a significant challenge in many scientific domains. We propose normalizing flow regression (NFR), a novel offline inference method for approximating posterior distributions. Unlike traditional surrogate approaches that require additional sampling or inference…

  • Towards Interpretable Deep Generative Models via Causal Representation Learning

    Towards Interpretable Deep Generative Models via Causal Representation Learning arXiv:2504.11609v1 Announce Type: new Abstract: Recent developments in generative artificial intelligence (AI) rely on machine learning techniques such as deep learning and generative modeling to achieve state-of-the-art performance across wide-ranging domains. These methods’ surprising performance is due in part to their ability to learn implicit “representations”…

  • Discrimination-free Insurance Pricing with Privatized Sensitive Attributes

    Discrimination-free Insurance Pricing with Privatized Sensitive Attributes arXiv:2504.11775v1 Announce Type: new Abstract: Fairness has emerged as a critical consideration in the landscape of machine learning algorithms, particularly as AI continues to transform decision-making across societal domains. To ensure that these algorithms are free from bias and do not discriminate against individuals based on sensitive attributes…

  • Generalized probabilistic canonical correlation analysis for multi-modal data integration with full or partial observations

    Generalized probabilistic canonical correlation analysis for multi-modal data integration with full or partial observations arXiv:2504.11610v1 Announce Type: new Abstract: Background: The integration and analysis of multi-modal data are increasingly essential across various domains including bioinformatics. As the volume and complexity of such data grow, there is a pressing need for computational models that not only…

  • AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse

    AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse arXiv:2504.10540v1 Announce Type: new Abstract: Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference, limiting their practicality. While existing acceleration methods exploit the well-known U-shaped similarity pattern between adjacent steps through caching mechanisms, they lack…

  • Differentially Private Geodesic and Linear Regression

    Differentially Private Geodesic and Linear Regression arXiv:2504.11304v1 Announce Type: new Abstract: In statistical applications it has become increasingly common to encounter data structures that live on non-linear spaces such as manifolds. Classical linear regression, one of the most fundamental methodologies of statistical learning, captures the relationship between an independent variable and a response variable which…

  • Beyond Worst-Case Online Classification: VC-Based Regret Bounds for Relaxed Benchmarks

    Beyond Worst-Case Online Classification: VC-Based Regret Bounds for Relaxed Benchmarks arXiv:2504.10598v1 Announce Type: new Abstract: We revisit online binary classification by shifting the focus from competing with the best-in-class binary loss to competing against relaxed benchmarks that capture smoothed notions of optimality. Instead of measuring regret relative to the exact minimal binary error — a…

  • Formalising Anti-Discrimination Law in Automated Decision Systems

    Formalising Anti-Discrimination Law in Automated Decision Systems arXiv:2407.00400v2 Announce Type: cross Abstract: Algorithmic discrimination is a critical concern as machine learning models are used in high-stakes decision-making in legally protected contexts. Although substantial research on algorithmic bias and discrimination has led to the development of fairness metrics, several critical legal issues remain unaddressed in practice.…

  • Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling

    Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling arXiv:2504.10612v1 Announce Type: cross Abstract: Generative models often map noise to data by matching flows or scores, but these approaches become cumbersome for incorporating partial observations or additional priors. Inspired by recent advances in Wasserstein gradient flows, we propose Energy Matching, a framework that…

  • Double Machine Learning for Causal Inference under Shared-State Interference

    Double Machine Learning for Causal Inference under Shared-State Interference arXiv:2504.08836v1 Announce Type: new Abstract: Researchers and practitioners often wish to measure treatment effects in settings where units interact via markets and recommendation systems. In these settings, units are affected by certain shared states, like prices, algorithmic recommendations or social signals. We formalize this structure, calling…

  • An Incremental Non-Linear Manifold Approximation Method

    An Incremental Non-Linear Manifold Approximation Method arXiv:2504.09068v1 Announce Type: new Abstract: Analyzing high-dimensional data presents challenges due to the “curse of dimensionality”, making computations intensive. Dimension reduction techniques, categorized as linear or non-linear, simplify such data. Non-linear methods are particularly essential for efficiently visualizing and processing complex data structures in interactive and graphical applications. This…

  • Improving the evaluation of samplers on multi-modal targets

    Improving the evaluation of samplers on multi-modal targets arXiv:2504.08916v1 Announce Type: new Abstract: Addressing multi-modality constitutes one of the major challenges of sampling. In this reflection paper, we advocate for a more systematic evaluation of samplers towards two sources of difficulty that are mode separation and dimension. For this, we propose a synthetic experimental setting…

  • Dose-finding design based on level set estimation in phase I cancer clinical trials

    Dose-finding design based on level set estimation in phase I cancer clinical trials arXiv:2504.09157v1 Announce Type: new Abstract: The primary objective of phase I cancer clinical trials is to evaluate the safety of a new experimental treatment and to find the maximum tolerated dose (MTD). We show that the MTD estimation problem can be regarded…

  • No-Regret Generative Modeling via Parabolic Monge-Ampère PDE

    No-Regret Generative Modeling via Parabolic Monge-Ampère PDE arXiv:2504.09279v1 Announce Type: new Abstract: We introduce a novel generative modeling framework based on a discretized parabolic Monge-Ampère PDE, which emerges as a continuous limit of the Sinkhorn algorithm commonly used in optimal transport. Our method performs iterative refinement in the space of Brenier maps using a mirror…

  • Can SGD Select Good Fishermen? Local Convergence under Self-Selection Biases and Beyond

    Can SGD Select Good Fishermen? Local Convergence under Self-Selection Biases and Beyond arXiv:2504.07133v1 Announce Type: new Abstract: We revisit the problem of estimating $k$ linear regressors with self-selection bias in $d$ dimensions with the maximum selection criterion, as introduced by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [CDIZ23, STOC’23]. Our main result is a $\operatorname{poly}(d,k,1/\varepsilon) + {k}^{O(k)}$…

  • Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents

    Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents arXiv:2504.07347v1 Announce Type: new Abstract: As demand for Large Language Models (LLMs) and AI agents rapidly grows, optimizing systems for efficient LLM inference becomes critical. While significant efforts have targeted system-level engineering, little is explored through a mathematical modeling and queuing perspective. In this paper, we…

  • Performance of Rank-One Tensor Approximation on Incomplete Data

    Performance of Rank-One Tensor Approximation on Incomplete Data arXiv:2504.07818v1 Announce Type: new Abstract: We are interested in the estimation of a rank-one tensor signal when only a portion $\varepsilon$ of its noisy observation is available. We show that the study of this problem can be reduced to that of a random matrix model whose spectral…

  • Gradient-based Sample Selection for Faster Bayesian Optimization

    Gradient-based Sample Selection for Faster Bayesian Optimization arXiv:2504.07742v1 Announce Type: new Abstract: Bayesian optimization (BO) is an effective technique for black-box optimization. However, its applicability is typically limited to moderate-budget problems due to the cubic complexity in computing the Gaussian process (GP) surrogate model. In large-budget scenarios, directly employing the standard GP model faces significant…

  • Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows

    Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows arXiv:2504.07820v1 Announce Type: new Abstract: Negative distance kernels $K(x,y) := -\|x-y\|$ were used in the definition of maximum mean discrepancies (MMDs) in statistics and lead to favorable numerical results in various applications. In particular, so-called slicing techniques for handling high-dimensional kernel summations profit…

  • Deep spatio-temporal point processes: Advances and new directions

    Deep spatio-temporal point processes: Advances and new directions arXiv:2504.06364v1 Announce Type: new Abstract: Spatio-temporal point processes (STPPs) model discrete events distributed in time and space, with important applications in areas such as criminology, seismology, epidemiology, and social networks. Traditional models often rely on parametric kernels, limiting their ability to capture heterogeneous, nonstationary dynamics. Recent innovations…

  • Sparsified-Learning for Heavy-Tailed Locally Stationary Processes

    Sparsified-Learning for Heavy-Tailed Locally Stationary Processes arXiv:2504.06477v1 Announce Type: new Abstract: Sparsified Learning is ubiquitous in many machine learning tasks. It aims to regularize the objective function by adding a penalization term that considers the constraints made on the learned parameters. This paper considers the problem of learning heavy-tailed LSP. We develop a flexible and…

  • Deep Fair Learning: A Unified Framework for Fine-tuning Representations with Sufficient Networks

    Deep Fair Learning: A Unified Framework for Fine-tuning Representations with Sufficient Networks arXiv:2504.06470v1 Announce Type: new Abstract: Ensuring fairness in machine learning is a critical and challenging task, as biased data representations often lead to unfair predictions. To address this, we propose Deep Fair Learning, a framework that integrates nonlinear sufficient dimension reduction with deep…

  • StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization

    StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization arXiv:2504.05804v1 Announce Type: cross Abstract: The integration of large language models (LLMs) into information retrieval systems introduces new attack surfaces, particularly for adversarial ranking manipulations. We present StealthRank, a novel adversarial ranking attack that manipulates LLM-driven product recommendation systems while maintaining textual fluency and stealth. Unlike existing…

  • A Metropolis-Adjusted Langevin Algorithm for Sampling Jeffreys Prior

    A Metropolis-Adjusted Langevin Algorithm for Sampling Jeffreys Prior arXiv:2504.06372v1 Announce Type: cross Abstract: Inference and estimation are fundamental aspects of statistics, system identification and machine learning. For most inference problems, prior knowledge is available on the system to be modeled, and Bayesian analysis is a natural framework to impose such prior information in the form…

  • Hyperflows: Pruning Reveals the Importance of Weights

    Hyperflows: Pruning Reveals the Importance of Weights arXiv:2504.05349v1 Announce Type: new Abstract: Network pruning is used to reduce inference latency and power consumption in large neural networks. However, most existing methods struggle to accurately assess the importance of individual weights due to their inherent interrelatedness, leading to poor performance, especially at extreme sparsity levels. We…

  • Survey on Algorithms for multi-index models

    Survey on Algorithms for multi-index models arXiv:2504.05426v1 Announce Type: new Abstract: We review the literature on algorithms for estimating the index space in a multi-index model. The primary focus is on computationally efficient (polynomial-time) algorithms in Gaussian space, the assumptions under which consistency is guaranteed by these methods, and their sample complexity. In many cases,…

  • Actuarial Learning for Pension Fund Mortality Forecasting

    Actuarial Learning for Pension Fund Mortality Forecasting arXiv:2504.05881v1 Announce Type: new Abstract: For the assessment of the financial soundness of a pension fund, it is necessary to take into account mortality forecasting so that longevity risk is consistently incorporated into future cash flows. In this article, we employ machine learning models applied to actuarial science…

  • Improved Inference of Inverse Ising Problems under Missing Observations in Restricted Boltzmann Machines

    Improved Inference of Inverse Ising Problems under Missing Observations in Restricted Boltzmann Machines arXiv:2504.05643v1 Announce Type: new Abstract: Restricted Boltzmann machines (RBMs) are energy-based models analogous to the Ising model and are widely applied in statistical machine learning. The standard inverse Ising problem with a complete dataset requires computing both data and model expectations and…

  • Matched Topological Subspace Detector

    Matched Topological Subspace Detector arXiv:2504.05892v1 Announce Type: new Abstract: Topological spaces, represented by simplicial complexes, capture richer relationships than graphs by modeling interactions not only between nodes but also among higher-order entities, such as edges or triangles. This motivates the representation of information defined in irregular domains as topological signals. By leveraging the spectral dualities…

  • Batch Bayesian Optimization for High-Dimensional Experimental Design: Simulation and Visualization

    Batch Bayesian Optimization for High-Dimensional Experimental Design: Simulation and Visualization arXiv:2504.03943v1 Announce Type: new Abstract: Bayesian Optimization (BO) is increasingly used to guide experimental optimization tasks. To elucidate BO behavior in noisy and high-dimensional settings typical for materials science applications, we perform batch BO of two six-dimensional test functions: an Ackley function representing a needle-in-a-haystack…

  • Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning

    Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning arXiv:2504.03784v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a key technique for aligning the output of large language models (LLMs) with human preferences. To learn the reward function, most existing RLHF algorithms use the Bradley-Terry model, which relies…

  • Spatially-Heterogeneous Causal Bayesian Networks for Seismic Multi-Hazard Estimation: A Variational Approach with Gaussian Processes and Normalizing Flows

    Spatially-Heterogeneous Causal Bayesian Networks for Seismic Multi-Hazard Estimation: A Variational Approach with Gaussian Processes and Normalizing Flows arXiv:2504.04013v1 Announce Type: new Abstract: Post-earthquake hazard and impact estimation are critical for effective disaster response, yet current approaches face significant limitations. Traditional models employ fixed parameters regardless of geographical context, misrepresenting how seismic effects vary across diverse…

  • Computational Efficient Informative Nonignorable Matrix Completion: A Row- and Column-Wise Matrix U-Statistic Pseudo-Likelihood Approach

    Computational Efficient Informative Nonignorable Matrix Completion: A Row- and Column-Wise Matrix U-Statistic Pseudo-Likelihood Approach arXiv:2504.04016v1 Announce Type: new Abstract: In this study, we establish a unified framework to deal with the high dimensional matrix completion problem under flexible nonignorable missing mechanisms. Although the matrix completion problem has attracted much attention over the years, there are…

  • Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes

    Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes arXiv:2504.04105v1 Announce Type: new Abstract: We study gradient descent (GD) for logistic regression on linearly separable data with stepsizes that adapt to the current risk, scaled by a constant hyperparameter $\eta$. We show that after at most $1/\gamma^2$ burn-in steps, GD…

  • ConfEviSurrogate: A Conformalized Evidential Surrogate Model for Uncertainty Quantification

    ConfEviSurrogate: A Conformalized Evidential Surrogate Model for Uncertainty Quantification arXiv:2504.02919v1 Announce Type: new Abstract: Surrogate models, crucial for approximating complex simulation data across sciences, inherently carry uncertainties that range from simulation noise to model prediction errors. Without rigorous uncertainty quantification, predictions become unreliable and hence hinder analysis. While methods like Monte Carlo dropout and ensemble…

  • High-dimensional ridge regression with random features for non-identically distributed data with a variance profile

    High-dimensional ridge regression with random features for non-identically distributed data with a variance profile arXiv:2504.03035v1 Announce Type: new Abstract: The behavior of the random feature model in the high-dimensional regression framework has become a popular issue of interest in the machine learning literature. This model is generally considered for feature vectors $x_i = \Sigma^{1/2} x_i'$,…

  • A computational transition for detecting multivariate shuffled linear regression by low-degree polynomials

    A computational transition for detecting multivariate shuffled linear regression by low-degree polynomials arXiv:2504.03097v1 Announce Type: new Abstract: In this paper, we study the problem of multivariate shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we investigate the model $Y=\tfrac{1}{\sqrt{1+\sigma^2}}(\Pi_* X Q_* +…

  • Accelerating Particle-based Energetic Variational Inference

    Accelerating Particle-based Energetic Variational Inference arXiv:2504.03158v1 Announce Type: new Abstract: In this work, we propose a novel particle-based variational inference (ParVI) method that accelerates the EVI-Im. Inspired by energy quadratization (EQ) and operator splitting techniques for gradient flows, our approach efficiently drives particles towards the target distribution. Unlike EVI-Im, which employs the implicit Euler method…

  • Bayesian Optimization of Robustness Measures Using Randomized GP-UCB-based Algorithms under Input Uncertainty

    Bayesian Optimization of Robustness Measures Using Randomized GP-UCB-based Algorithms under Input Uncertainty arXiv:2504.03172v1 Announce Type: new Abstract: Bayesian optimization based on Gaussian process upper confidence bound (GP-UCB) has a theoretical guarantee for optimizing black-box functions. Black-box functions often have input uncertainty, but even in this case, GP-UCB can be extended to optimize evaluation measures called…

  • Analytical Discovery of Manifold with Machine Learning

    Analytical Discovery of Manifold with Machine Learning arXiv:2504.02511v1 Announce Type: new Abstract: Understanding low-dimensional structures within high-dimensional data is crucial for visualization, interpretation, and denoising in complex datasets. Despite the advancements in manifold learning techniques, key challenges, such as limited global insight and the lack of interpretable analytical descriptions, remain unresolved. In this work, we introduce a…

  • Dynamic Assortment Selection and Pricing with Censored Preference Feedback

    Dynamic Assortment Selection and Pricing with Censored Preference Feedback arXiv:2504.02324v1 Announce Type: new Abstract: In this study, we investigate the problem of dynamic multi-product selection and pricing by introducing a novel framework based on a censored multinomial logit (C-MNL) choice model. In this model, sellers present a set of products with prices, and buyers filter…

  • Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting

    Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting arXiv:2504.02518v1 Announce Type: new Abstract: Probabilistic electricity price forecasting (PEPF) is a key task for market participants in short-term electricity markets. The increasing availability of high-frequency data and the need for real-time decision-making in energy markets require online estimation methods for efficient model updating.…

  • On Model Protection in Federated Learning against Eavesdropping Attacks

    On Model Protection in Federated Learning against Eavesdropping Attacks arXiv:2504.02114v1 Announce Type: cross Abstract: In this study, we investigate the protection offered by federated learning algorithms against eavesdropping adversaries. In our model, the adversary is capable of intercepting model updates transmitted from clients to the server, enabling it to create its own estimate of the…

  • Towards Interpretable Soft Prompts

    Towards Interpretable Soft Prompts arXiv:2504.02144v1 Announce Type: cross Abstract: Soft prompts have been popularized as a cheap and easy way to improve task-specific LLM performance beyond few-shot prompts. Despite their origin as an automated prompting method, however, soft prompts and other trainable prompts remain a black-box method with no immediately interpretable connections to prompting. We…

  • Fair Sufficient Representation Learning

    Fair Sufficient Representation Learning arXiv:2504.01030v1 Announce Type: new Abstract: The main objective of fair statistical modeling and machine learning is to minimize or eliminate biases that may arise from the data or the model itself, ensuring that predictions and decisions are not unjustly influenced by sensitive attributes such as race, gender, age, or other protected…

  • Estimating Unbounded Density Ratios: Applications in Error Control under Covariate Shift

    Estimating Unbounded Density Ratios: Applications in Error Control under Covariate Shift arXiv:2504.01031v1 Announce Type: new Abstract: The density ratio is an important metric for evaluating the relative likelihood of two probability distributions, with extensive applications in statistics and machine learning. However, existing estimation theories for density ratios often depend on stringent regularity conditions, mainly focusing…

  • Density estimation via mixture discrepancy and moments

    Density estimation via mixture discrepancy and moments arXiv:2504.01570v1 Announce Type: new Abstract: With the aim of generalizing histogram statistics to higher dimensional cases, density estimation via discrepancy based sequential partition (DSP) has been proposed [D. Li, K. Yang, W. Wong, Advances in Neural Information Processing Systems (2016) 1099-1107] to learn an adaptive piecewise constant approximation…

  • Denoising guarantees for optimized sampling schemes in compressed sensing

    Denoising guarantees for optimized sampling schemes in compressed sensing arXiv:2504.01046v1 Announce Type: new Abstract: Compressed sensing with subsampled unitary matrices benefits from optimized sampling schemes, which feature improved theoretical guarantees and empirical performance relative to uniform subsampling. We provide, in a first of its kind in compressed sensing, theoretical guarantees showing that the error caused…

  • Sparse Gaussian Neural Processes

    Sparse Gaussian Neural Processes arXiv:2504.01650v1 Announce Type: new Abstract: Despite significant recent advances in probabilistic meta-learning, it is common for practitioners to avoid using deep learning models due to a comparative lack of interpretability. Instead, many practitioners simply use non-meta-models such as Gaussian processes with interpretable priors, and conduct the tedious procedure of training their…

  • Privacy-Preserving Transfer Learning for Community Detection using Locally Distributed Multiple Networks

    Privacy-Preserving Transfer Learning for Community Detection using Locally Distributed Multiple Networks arXiv:2504.00890v1 Announce Type: new Abstract: This paper develops a new spectral clustering-based method called TransNet for transfer learning in community detection of network data. Our goal is to improve the clustering performance of the target network using auxiliary source networks, which are heterogeneous, privacy-preserved,…

  • Communication-Efficient l_0 Penalized Least Square

    Communication-Efficient l_0 Penalized Least Square arXiv:2504.00722v1 Announce Type: new Abstract: In this paper, we propose a communication-efficient penalized regression algorithm for high-dimensional sparse linear regression models with massive data. This approach incorporates an optimized distributed system communication algorithm, named CESDAR algorithm, based on the Enhanced Support Detection and Root finding algorithm. The CESDAR algorithm leverages…

  • A formula for the area of a triangle: Useless, but explicitly in Deep Sets form

    A formula for the area of a triangle: Useless, but explicitly in Deep Sets form arXiv:2503.22786v1 Announce Type: cross Abstract: Any permutation-invariant function of data points $\vec{r}_i$ can be written in the form $\rho(\sum_i\phi(\vec{r}_i))$ for suitable functions $\rho$ and $\phi$. This form – known in the machine-learning literature as Deep Sets – also generates a…

  • Nuclear Microreactor Control with Deep Reinforcement Learning

    Nuclear Microreactor Control with Deep Reinforcement Learning arXiv:2504.00156v1 Announce Type: cross Abstract: The economic feasibility of nuclear microreactors will depend on minimizing operating costs through advancements in autonomous control, especially when these microreactors are operating alongside other types of energy systems (e.g., renewable energy). This study explores the application of deep reinforcement learning (RL) for…

  • Backdoor Detection through Replicated Execution of Outsourced Training

    Backdoor Detection through Replicated Execution of Outsourced Training arXiv:2504.00170v1 Announce Type: cross Abstract: It is common practice to outsource the training of machine learning models to cloud providers. Clients who do so gain from the cloud’s economies of scale, but implicitly assume trust: the server should not deviate from the client’s training procedure. A malicious…

  • DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization

    DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization arXiv:2503.23430v1 Announce Type: new Abstract: Domain generalization (DG) aims to learn models that can generalize well to unseen domains by training only on a set of source domains. Sharpness-Aware Minimization (SAM) has been a popular approach for this, aiming to find flat minima in the total loss landscape.…

  • Accelerated Stein Variational Gradient Flow

    Accelerated Stein Variational Gradient Flow arXiv:2503.23462v1 Announce Type: new Abstract: Stein variational gradient descent (SVGD) is a kernel-based particle method for sampling from a target distribution, e.g., in generative modeling and Bayesian inference. SVGD does not require estimating the gradient of the log-density, which is called score estimation. In practice, SVGD can be slow compared…

  • Scalable Geometric Learning with Correlation-Based Functional Brain Networks

    Scalable Geometric Learning with Correlation-Based Functional Brain Networks arXiv:2503.23653v1 Announce Type: new Abstract: The correlation matrix is a central representation of functional brain networks in neuroimaging. Traditional analyses often treat pairwise interactions independently in a Euclidean setting, overlooking the intrinsic geometry of correlation matrices. While earlier attempts have embraced the quotient geometry of the correlation…

  • Learning a Single Index Model from Anisotropic Data with vanilla Stochastic Gradient Descent

    Learning a Single Index Model from Anisotropic Data with vanilla Stochastic Gradient Descent arXiv:2503.23642v1 Announce Type: new Abstract: We investigate the problem of learning a Single Index Model (SIM) – a popular model for studying the ability of neural networks to learn features – from anisotropic Gaussian inputs by training a neuron using vanilla Stochastic Gradient…

  • Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions

    Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions arXiv:2503.23896v1 Announce Type: new Abstract: Deep neural networks learn structured features from complex, non-Gaussian inputs, but the mechanisms behind this process remain poorly understood. Our work is motivated by the observation that the first-layer filters learnt by deep convolutional neural networks…

  • Structured and sparse partial least squares coherence for multivariate cortico-muscular analysis

    Structured and sparse partial least squares coherence for multivariate cortico-muscular analysis arXiv:2503.21802v1 Announce Type: cross Abstract: Multivariate cortico-muscular analysis has recently emerged as a promising approach for evaluating the corticospinal neural pathway. However, current multivariate approaches encounter challenges such as high dimensionality and limited sample sizes, thus restricting their further applications. In this paper, we…

  • Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment

    Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment arXiv:2503.21878v1 Announce Type: cross Abstract: Inference-time computation provides an important axis for scaling language model performance, but naively scaling compute through techniques like Best-of-$N$ sampling can cause performance to degrade due to reward hacking. Toward a theoretical understanding of how to best…

  • An Artificial Trend Index for Private Consumption Using Google Trends

    An Artificial Trend Index for Private Consumption Using Google Trends arXiv:2503.21981v1 Announce Type: cross Abstract: In recent years, the use of databases that analyze trends, sentiments or news to make economic projections or create indicators has gained significant popularity, particularly with the Google Trends platform. This article explores the potential of Google search data to…

  • Rolled Gaussian process models for curves on manifolds

    Rolled Gaussian process models for curves on manifolds arXiv:2503.21980v1 Announce Type: cross Abstract: Given a planar curve, imagine rolling a sphere along that curve without slipping or twisting, and by this means tracing out a curve on the sphere. It is well known that such a rolling operation induces a local isometry between the sphere…

  • Improving Equivariant Networks with Probabilistic Symmetry Breaking

    Improving Equivariant Networks with Probabilistic Symmetry Breaking arXiv:2503.21985v1 Announce Type: cross Abstract: Equivariance encodes known symmetries into neural networks, often enhancing generalization. However, equivariant networks cannot break symmetries: the output of an equivariant network must, by definition, have at least the same self-symmetries as the input. This poses an important problem, both (1) for prediction…

  • Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models

    Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models arXiv:2503.20807v1 Announce Type: new Abstract: Fine-tuning Large Language Models (LLMs) on some task-specific datasets has been a primary use of LLMs. However, it has been empirically observed that this approach to enhancing capability inevitably compromises safety, a phenomenon also known as the safety-capability trade-off in LLM fine-tuning.…

  • Squared families: Searching beyond regular probability models

    Squared families: Searching beyond regular probability models arXiv:2503.21128v1 Announce Type: new Abstract: We introduce squared families, which are families of probability densities obtained by squaring a linear transformation of a statistic. Squared families are singular, however their singularity can easily be handled so that they form regular models. After handling the singularity, squared families possess…

  • Debiasing Kernel-Based Generative Models

    Debiasing Kernel-Based Generative Models arXiv:2503.20825v1 Announce Type: new Abstract: We propose a novel two-stage framework of generative models named Debiasing Kernel-Based Generative Models (DKGM) with the insights from kernel density estimation (KDE) and stochastic approximation. In the first stage of DKGM, we employ KDE to bypass the obstacles in estimating the density of data without…

  • DeepRV: pre-trained spatial priors for accelerated disease mapping

    DeepRV: pre-trained spatial priors for accelerated disease mapping arXiv:2503.21473v1 Announce Type: new Abstract: Recently introduced prior-encoding deep generative models (e.g., PriorVAE, $\pi$VAE, and PriorCVAE) have emerged as powerful tools for scalable Bayesian inference by emulating complex stochastic processes like Gaussian processes (GPs). However, these methods remain largely a proof-of-concept and inaccessible to practitioners. We propose…

  • Constraint-based causal discovery with tiered background knowledge and latent variables in single or overlapping datasets

    Constraint-based causal discovery with tiered background knowledge and latent variables in single or overlapping datasets arXiv:2503.21526v1 Announce Type: new Abstract: In this paper we consider the use of tiered background knowledge within constraint based causal discovery. Our focus is on settings relaxing causal sufficiency, i.e. allowing for latent variables which may arise because relevant information…

  • A stochastic gradient descent algorithm with random search directions

    A stochastic gradient descent algorithm with random search directions arXiv:2503.19942v1 Announce Type: new Abstract: Stochastic coordinate descent algorithms are efficient methods in which each iterate is obtained by fixing most coordinates at their values from the current iteration, and approximately minimizing the objective with respect to the remaining coordinates. However, this approach is usually restricted…

  • On the Robustness of Kernel Ridge Regression Using the Cauchy Loss Function

    On the Robustness of Kernel Ridge Regression Using the Cauchy Loss Function arXiv:2503.20120v1 Announce Type: new Abstract: Robust regression aims to develop methods for estimating an unknown regression function in the presence of outliers, heavy-tailed distributions, or contaminated data, which can severely impact performance. Most existing theoretical results in robust regression assume that the noise…

  • Learning Data-Driven Uncertainty Set Partitions for Robust and Adaptive Energy Forecasting with Missing Data

    Learning Data-Driven Uncertainty Set Partitions for Robust and Adaptive Energy Forecasting with Missing Data arXiv:2503.20410v1 Announce Type: new Abstract: Short-term forecasting models typically assume the availability of input data (features) when they are deployed and in use. However, equipment failures, disruptions, or cyberattacks may lead to missing features when such models are used operationally, which could…

  • An $(\epsilon,\delta)$-accurate level set estimation with a stopping criterion

    An $(\epsilon,\delta)$-accurate level set estimation with a stopping criterion arXiv:2503.20272v1 Announce Type: new Abstract: The level set estimation problem seeks to identify regions within a set of candidate points where an unknown and costly to evaluate function’s value exceeds a specified threshold, providing an efficient alternative to exhaustive evaluations of function values. Traditional methods often…

  • Regression-Based Estimation of Causal Effects in the Presence of Selection Bias and Confounding

    Regression-Based Estimation of Causal Effects in the Presence of Selection Bias and Confounding arXiv:2503.20546v1 Announce Type: new Abstract: We consider the problem of estimating the expected causal effect $E[Y|do(X)]$ for a target variable $Y$ when treatment $X$ is set by intervention, focusing on continuous random variables. In settings without selection bias or confounding, $E[Y|do(X)] =…

  • CAE: Repurposing the Critic as an Explorer in Deep Reinforcement Learning

    CAE: Repurposing the Critic as an Explorer in Deep Reinforcement Learning arXiv:2503.18980v1 Announce Type: new Abstract: Exploration remains a critical challenge in reinforcement learning, as many existing methods either lack theoretical guarantees or fall short of practical effectiveness. In this paper, we introduce CAE, a lightweight algorithm that repurposes the value networks in standard deep…

  • Minimum Volume Conformal Sets for Multivariate Regression

    Minimum Volume Conformal Sets for Multivariate Regression arXiv:2503.19068v1 Announce Type: new Abstract: Conformal prediction provides a principled framework for constructing predictive sets with finite-sample validity. While much of the focus has been on univariate response variables, existing multivariate methods either impose rigid geometric assumptions or rely on flexible but computationally expensive approaches that do not…

  • Centroid Decision Forest

    Centroid Decision Forest arXiv:2503.19306v1 Announce Type: new Abstract: This paper introduces the centroid decision forest (CDF), a novel ensemble learning framework that redefines the splitting strategy and tree building in the ordinary decision trees for high-dimensional classification. The splitting approach in CDF differs from the traditional decision trees in that the class separability score (CSS)…

  • Universal Architectures for the Learning of Polyhedral Norms and Convex Regularization Functionals

    Universal Architectures for the Learning of Polyhedral Norms and Convex Regularization Functionals arXiv:2503.19190v1 Announce Type: new Abstract: This paper addresses the task of learning convex regularizers to guide the reconstruction of images from limited data. By imposing that the reconstruction be amplitude-equivariant, we narrow down the class of admissible functionals to those that can be…

  • Causal Bayesian Optimization with Unknown Graphs

    Causal Bayesian Optimization with Unknown Graphs arXiv:2503.19554v1 Announce Type: new Abstract: Causal Bayesian Optimization (CBO) is a methodology designed to optimize an outcome variable by leveraging known causal relationships through targeted interventions. Traditional CBO methods require a fully and accurately specified causal graph, which is a limitation in many real-world scenarios where such graphs are…

  • A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics

    A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics arXiv:2503.17538v1 Announce Type: new Abstract: Contrastive learning — a modern approach to extract useful representations from unlabeled data by training models to distinguish similar samples from dissimilar ones — has driven significant progress in foundation models. In this work, we develop a new theoretical framework…

  • Communities in the Kuramoto Model: Dynamics and Detection via Path Signatures

    Communities in the Kuramoto Model: Dynamics and Detection via Path Signatures arXiv:2503.17546v1 Announce Type: new Abstract: The behavior of multivariate dynamical processes is often governed by underlying structural connections that relate the components of the system. For example, brain activity which is often measured via time series is determined by an underlying structural graph, where…

  • Poisson-Process Topic Model for Integrating Knowledge from Pre-trained Language Models

    Poisson-Process Topic Model for Integrating Knowledge from Pre-trained Language Models arXiv:2503.17809v1 Announce Type: new Abstract: Topic modeling is traditionally applied to word counts without accounting for the context in which words appear. Recent advancements in large language models (LLMs) offer contextualized word embeddings, which capture deeper meaning and relationships between words. We aim to leverage…

  • Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality

    Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality arXiv:2503.17865v1 Announce Type: new Abstract: The goal of the Inverse reinforcement learning (IRL) task is to identify the underlying reward function and the corresponding optimal policy from a set of expert demonstrations. While most IRL algorithms’ theoretical guarantees rely on a linear reward structure,…

  • Quantile-Based Randomized Kaczmarz for Corrupted Tensor Linear Systems

    Quantile-Based Randomized Kaczmarz for Corrupted Tensor Linear Systems arXiv:2503.18190v1 Announce Type: new Abstract: The reconstruction of tensor-valued signals from corrupted measurements, known as tensor regression, has become essential in many multi-modal applications such as hyperspectral image reconstruction and medical imaging. In this work, we address the tensor linear system problem $\mathcal{A} \mathcal{X}=\mathcal{B}$, where $\mathcal{A}$ is…

  • Procrustes Wasserstein Metric: A Modified Benamou-Brenier Approach with Applications to Latent Gaussian Distributions

    Procrustes Wasserstein Metric: A Modified Benamou-Brenier Approach with Applications to Latent Gaussian Distributions arXiv:2503.16580v1 Announce Type: new Abstract: We introduce a modified Benamou-Brenier type approach leading to a Wasserstein type distance that allows global invariance, specifically, isometries, and we show that the problem can be summarized to orthogonal transformations. This distance is defined by penalizing…

  • EarlyStopping: Implicit Regularization for Iterative Learning Procedures in Python

    EarlyStopping: Implicit Regularization for Iterative Learning Procedures in Python arXiv:2503.16753v1 Announce Type: new Abstract: Iterative learning procedures are ubiquitous in machine learning and modern statistics. Regularisation is typically required to prevent inflating the expected loss of a procedure in later iterations via the propagation of noise inherent in the data. Significant emphasis has been placed…

  • Optimal Nonlinear Online Learning under Sequential Price Competition via s-Concavity

    Optimal Nonlinear Online Learning under Sequential Price Competition via s-Concavity arXiv:2503.16737v1 Announce Type: new Abstract: We consider price competition among multiple sellers over a selling horizon of $T$ periods. In each period, sellers simultaneously offer their prices and subsequently observe their respective demand that is unobservable to competitors. The demand function for each seller depends…

  • Online Selective Conformal Prediction: Errors and Solutions

    Online Selective Conformal Prediction: Errors and Solutions arXiv:2503.16809v1 Announce Type: new Abstract: In online selective conformal inference, data arrives sequentially, and prediction intervals are constructed only when an online selection rule is met. Since online selections may break the exchangeability between the selected test datum and the rest of the data, one must correct for…

  • Sparse Additive Contextual Bandits: A Nonparametric Approach for Online Decision-making with High-dimensional Covariates

    Sparse Additive Contextual Bandits: A Nonparametric Approach for Online Decision-making with High-dimensional Covariates arXiv:2503.16941v1 Announce Type: new Abstract: Personalized services are central to today’s digital landscape, where online decision-making is commonly formulated as contextual bandit problems. Two key challenges emerge in modern applications: high-dimensional covariates and the need for nonparametric models to capture complex reward-covariate…