Category: stat.ML

  • Hierarchical clustering with maximum density paths and mixture models

    Hierarchical clustering with maximum density paths and mixture models arXiv:2503.15582v1 Announce Type: new Abstract: Hierarchical clustering is an effective and interpretable technique for analyzing structure in data, offering a nuanced understanding by revealing insights at multiple scales and resolutions. It is particularly helpful in settings where the exact number of clusters is unknown, and provides…

  • Interpretable Neural Causal Models with TRAM-DAGs

    Interpretable Neural Causal Models with TRAM-DAGs arXiv:2503.16206v1 Announce Type: new Abstract: The ultimate goal of most scientific studies is to understand the underlying causal mechanism between the involved variables. Structural causal models (SCMs) are widely used to represent such causal mechanisms. Given an SCM, causal queries on all three levels of Pearl’s causal hierarchy can…

  • Tuning Sequential Monte Carlo Samplers via Greedy Incremental Divergence Minimization

    Tuning Sequential Monte Carlo Samplers via Greedy Incremental Divergence Minimization arXiv:2503.15704v1 Announce Type: new Abstract: The performance of sequential Monte Carlo (SMC) samplers heavily depends on the tuning of the Markov kernels used in the path proposal. For SMC samplers with unadjusted Markov kernels, standard tuning objectives, such as the Metropolis-Hastings acceptance rate or the…

  • Sparse Nonparametric Contextual Bandits

    Sparse Nonparametric Contextual Bandits arXiv:2503.16382v1 Announce Type: new Abstract: This paper studies the problem of simultaneously learning relevant features and minimising regret in contextual bandit problems. We introduce and analyse a new class of contextual bandit problems, called sparse nonparametric contextual bandits, in which the expected reward function lies in the linear span of a…

  • Data-Driven Approximation of Binary-State Network Reliability Function: Algorithm Selection and Reliability Thresholds for Large-Scale Systems

    Data-Driven Approximation of Binary-State Network Reliability Function: Algorithm Selection and Reliability Thresholds for Large-Scale Systems arXiv:2503.15545v1 Announce Type: cross Abstract: Network reliability assessment is pivotal for ensuring the robustness of modern infrastructure systems, from power grids to communication networks. While exact reliability computation for binary-state networks is NP-hard, existing approximation methods face critical tradeoffs between…

  • Variational Autoencoded Multivariate Spatial Fay-Herriot Models

    Variational Autoencoded Multivariate Spatial Fay-Herriot Models arXiv:2503.14710v1 Announce Type: new Abstract: Small area estimation models are essential for estimating population characteristics in regions with limited sample sizes, thereby supporting policy decisions, demographic studies, and resource allocation, among other use cases. The spatial Fay-Herriot model is one such approach that incorporates spatial dependence to improve estimation…

  • The Hardness of Validating Observational Studies with Experimental Data

    The Hardness of Validating Observational Studies with Experimental Data arXiv:2503.14795v1 Announce Type: new Abstract: Observational data is often readily available in large quantities, but can lead to biased causal effect estimates due to the presence of unobserved confounding. Recent works attempt to remove this bias by supplementing observational data with experimental data, which, when available,…

  • Interpretability of Graph Neural Networks to Assert Effects of Global Change Drivers on Ecological Networks

    Interpretability of Graph Neural Networks to Assert Effects of Global Change Drivers on Ecological Networks arXiv:2503.15107v1 Announce Type: new Abstract: Pollinators play a crucial role for plant reproduction, either in natural ecosystem or in human-modified landscape. Global change drivers, including climate change or land use modifications, can alter the plant-pollinator interactions. To assert the potential influence…

  • Nonlinear Bayesian Update via Ensemble Kernel Regression with Clustering and Subsampling

    Nonlinear Bayesian Update via Ensemble Kernel Regression with Clustering and Subsampling arXiv:2503.15160v1 Announce Type: new Abstract: Nonlinear Bayesian update for a prior ensemble is proposed to extend traditional ensemble Kalman filtering to settings characterized by non-Gaussian priors and nonlinear measurement operators. In this framework, the observed component is first denoised via a standard Kalman update,…

  • Online federated learning framework for classification

    Online federated learning framework for classification arXiv:2503.15210v1 Announce Type: new Abstract: In this paper, we develop a novel online federated learning framework for classification, designed to handle streaming data from multiple clients while ensuring data privacy and computational efficiency. Our method leverages the generalized distance-weighted discriminant technique, making it robust to both homogeneous and heterogeneous…

  • Positivity sets of hinge functions

    Positivity sets of hinge functions arXiv:2503.13512v1 Announce Type: new Abstract: In this paper we investigate which subsets of the real plane are realisable as the set of points on which a one-layer ReLU neural network takes a positive value. In the case of cones we give a full characterisation of such sets. Furthermore, we give…

  • Micro Text Classification Based on Balanced Positive-Unlabeled Learning

    Micro Text Classification Based on Balanced Positive-Unlabeled Learning arXiv:2503.13562v1 Announce Type: new Abstract: In real-world text classification tasks, negative texts often contain a minimal proportion of negative content, which is especially problematic in areas like text quality control, legal risk screening, and sensitive information interception. This challenge manifests at two levels: at the macro level,…

  • Bayesian Kernel Regression for Functional Data

    Bayesian Kernel Regression for Functional Data arXiv:2503.13676v1 Announce Type: new Abstract: In supervised learning, the output variable to be predicted is often represented as a function, such as a spectrum or probability distribution. Despite its importance, functional output regression remains relatively unexplored. In this study, we propose a novel functional output regression model based on…

  • Optimizing ML Training with Metagradient Descent

    Optimizing ML Training with Metagradient Descent arXiv:2503.13751v1 Announce Type: new Abstract: A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an…

  • ROCK: A variational formulation for occupation kernel methods in Reproducing Kernel Hilbert Spaces

    ROCK: A variational formulation for occupation kernel methods in Reproducing Kernel Hilbert Spaces arXiv:2503.13791v1 Announce Type: new Abstract: We present a Representer Theorem result for a large class of weak formulation problems. We provide examples of applications of our formulation both in traditional machine learning and numerical methods as well as in new and emerging…

  • Ranking and Selection with Simultaneous Input Data Collection

    Ranking and Selection with Simultaneous Input Data Collection arXiv:2503.11773v1 Announce Type: new Abstract: In this paper, we propose a general and novel formulation of ranking and selection with the existence of streaming input data. The collection of multiple streams of such data may consume different types of resources, and hence can be conducted simultaneously. To…

  • Bayes and Biased Estimators Without Hyper-parameter Estimation: Comparable Performance to the Empirical-Bayes-Based Regularized Estimator

    Bayes and Biased Estimators Without Hyper-parameter Estimation: Comparable Performance to the Empirical-Bayes-Based Regularized Estimator arXiv:2503.11854v1 Announce Type: new Abstract: Regularized system identification has become a significant complement to more classical system identification. It has been numerically shown that kernel-based regularized estimators often perform better than the maximum likelihood estimator in terms of minimizing mean squared…

  • Support Collapse of Deep Gaussian Processes with Polynomial Kernels for a Wide Regime of Hyperparameters

    Support Collapse of Deep Gaussian Processes with Polynomial Kernels for a Wide Regime of Hyperparameters arXiv:2503.12266v1 Announce Type: new Abstract: We analyze the prior that a Deep Gaussian Process with polynomial kernels induces. We observe that, even for relatively small depths, averaging effects occur within such a Deep Gaussian Process and that the prior can…

  • SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement

    SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement arXiv:2503.12760v1 Announce Type: new Abstract: To design effective digital interventions, experimenters face the challenge of learning decision policies that balance multiple objectives using offline data. Often, they aim to develop policies that maximize goal outcomes, while ensuring there are no undesirable changes in guardrail…

  • Nonlinear Principal Component Analysis with Random Bernoulli Features for Process Monitoring

    Nonlinear Principal Component Analysis with Random Bernoulli Features for Process Monitoring arXiv:2503.12456v1 Announce Type: new Abstract: The process generates substantial amounts of data with highly complex structures, leading to the development of numerous nonlinear statistical methods. However, most of these methods rely on computations involving large-scale dense kernel matrices. This dependence poses significant challenges in…

  • Learn then Decide: A Learning Approach for Designing Data Marketplaces

    Learn then Decide: A Learning Approach for Designing Data Marketplaces arXiv:2503.10773v1 Announce Type: new Abstract: As data marketplaces become increasingly central to the digital economy, it is crucial to design efficient pricing mechanisms that optimize revenue while ensuring fair and adaptive pricing. We introduce the Maximum Auction-to-Posted Price (MAPP) mechanism, a novel two-stage approach that…

  • Exploiting Concavity Information in Gaussian Process Contextual Bandit Optimization

    Exploiting Concavity Information in Gaussian Process Contextual Bandit Optimization arXiv:2503.10836v1 Announce Type: new Abstract: The contextual bandit framework is widely used to solve sequential optimization problems where the reward of each decision depends on auxiliary context variables. In settings such as medicine, business, and engineering, the decision maker often possesses additional structural information on the…

  • On the Identifiability of Causal Abstractions

    On the Identifiability of Causal Abstractions arXiv:2503.10834v1 Announce Type: new Abstract: Causal representation learning (CRL) enhances machine learning models’ robustness and generalizability by learning structural causal models associated with data-generating processes. We focus on a family of CRL methods that uses contrastive data pairs in the observable space, generated before and after a random, unknown…

  • Mamba time series forecasting with uncertainty propagation

    Mamba time series forecasting with uncertainty propagation arXiv:2503.10873v1 Announce Type: new Abstract: State space models, such as Mamba, have recently garnered attention in time series forecasting due to their ability to capture sequence patterns. However, in electricity consumption benchmarks, Mamba forecasts exhibit a mean error of approximately 8%. Similarly, in traffic occupancy benchmarks, the mean…

  • Clustering Items through Bandit Feedback: Finding the Right Feature out of Many

    Clustering Items through Bandit Feedback: Finding the Right Feature out of Many arXiv:2503.11209v1 Announce Type: new Abstract: We study the problem of clustering a set of items based on bandit feedback. Each of the $n$ items is characterized by a feature vector, with a possibly large dimension $d$. The items are partitioned into two unknown…

  • Power Spectrum Signatures of Graphs

    Power Spectrum Signatures of Graphs arXiv:2503.09660v1 Announce Type: new Abstract: Point signatures based on the Laplacian operators on graphs, point clouds, and manifolds have become popular tools in machine learning for graphs, clustering, and shape analysis. In this work, we propose a novel point signature, the power spectrum signature, a measure on $\mathbb{R}$ defined as…

  • Explainable Bayesian deep learning through input-skip Latent Binary Bayesian Neural Networks

    Explainable Bayesian deep learning through input-skip Latent Binary Bayesian Neural Networks arXiv:2503.10496v1 Announce Type: new Abstract: Modeling natural phenomena with artificial neural networks (ANNs) often provides highly accurate predictions. However, ANNs often suffer from over-parameterization, complicating interpretation and raising uncertainty issues. Bayesian neural networks (BNNs) address the latter by representing weights as probability distributions, allowing…

  • Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures

    Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures arXiv:2503.10576v1 Announce Type: new Abstract: A common approach to generative modeling is to split model-fitting into two blocks: define first how to sample noise (e.g. Gaussian) and choose next what to do with it (e.g. using a single map or flows). We…

  • Technical Insights and Legal Considerations for Advancing Federated Learning in Bioinformatics

    Technical Insights and Legal Considerations for Advancing Federated Learning in Bioinformatics arXiv:2503.09649v1 Announce Type: cross Abstract: Federated learning leverages data across institutions to improve clinical discovery while complying with data-sharing restrictions and protecting patient privacy. As the evolution of biobanks in genetics and systems biology has proved, accessing more extensive and varied data pools leads…

  • Bags of Projected Nearest Neighbours: Competitors to Random Forests?

    Bags of Projected Nearest Neighbours: Competitors to Random Forests? arXiv:2503.09651v1 Announce Type: cross Abstract: In this paper we introduce a simple and intuitive adaptive k nearest neighbours classifier, and explore its utility within the context of bootstrap aggregating (“bagging”). The approach is based on finding discriminant subspaces which are computationally efficient to compute, and are…

  • Learning Pareto manifolds in high dimensions: How can regularization help?

    Learning Pareto manifolds in high dimensions: How can regularization help? arXiv:2503.08849v1 Announce Type: new Abstract: Simultaneously addressing multiple objectives is becoming increasingly important in modern machine learning. At the same time, data is often high-dimensional and costly to label. For a single objective such as prediction risk, conventional regularization techniques are known to improve generalization…

  • A Deep Bayesian Nonparametric Framework for Robust Mutual Information Estimation

    A Deep Bayesian Nonparametric Framework for Robust Mutual Information Estimation arXiv:2503.08902v1 Announce Type: new Abstract: Mutual Information (MI) is a crucial measure for capturing dependencies between variables, but exact computation is challenging in high dimensions with intractable likelihoods, impacting accuracy and robustness. One idea is to use an auxiliary neural network to train an MI…

  • Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms

    Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms arXiv:2503.08896v1 Announce Type: new Abstract: This paper introduces a general framework for risk-sensitive bandits that integrates the notions of risk-sensitive objectives by adopting a rich class of distortion riskmetrics. The introduced framework subsumes the various existing risk-sensitive models. An important and hitherto unknown observation is that for…

  • Self-Consistent Equation-guided Neural Networks for Censored Time-to-Event Data

    Self-Consistent Equation-guided Neural Networks for Censored Time-to-Event Data arXiv:2503.09097v1 Announce Type: new Abstract: In survival analysis, estimating the conditional survival function given predictors is often of interest. There is a growing trend in the development of deep learning methods for analyzing censored time-to-event data, especially when dealing with high-dimensional predictors that are complexly interrelated. Many…

  • Addressing pitfalls in implicit unobserved confounding synthesis using explicit block hierarchical ancestral sampling

    Addressing pitfalls in implicit unobserved confounding synthesis using explicit block hierarchical ancestral sampling arXiv:2503.09194v1 Announce Type: new Abstract: Unbiased data synthesis is crucial for evaluating causal discovery algorithms in the presence of unobserved confounding, given the scarcity of real-world datasets. A common approach, implicit parameterization, encodes unobserved confounding by modifying the off-diagonal entries of the…

  • Probabilistic Shielding for Safe Reinforcement Learning

    Probabilistic Shielding for Safe Reinforcement Learning arXiv:2503.07671v1 Announce Type: new Abstract: In real-life scenarios, a Reinforcement Learning (RL) agent aiming to maximise their reward must often also behave in a safe manner, including at training time. Thus, much attention in recent years has been given to Safe RL, where an agent aims to learn an…

  • Personalized Convolutional Dictionary Learning of Physiological Time Series

    Personalized Convolutional Dictionary Learning of Physiological Time Series arXiv:2503.07687v1 Announce Type: new Abstract: Human physiological signals tend to exhibit both global and local structures: the former are shared across a population, while the latter reflect inter-individual variability. For instance, kinetic measurements of the gait cycle during locomotion present common characteristics, although idiosyncrasies may be observed…

  • Uncertainty quantification and posterior sampling for network reconstruction

    Uncertainty quantification and posterior sampling for network reconstruction arXiv:2503.07736v1 Announce Type: new Abstract: Network reconstruction is the task of inferring the unseen interactions between elements of a system, based only on their behavior or dynamics. This inverse problem is in general ill-posed, and admits many solutions for the same observation. Nevertheless, the vast majority of…

  • Cost-Aware Optimal Pairwise Pure Exploration

    Cost-Aware Optimal Pairwise Pure Exploration arXiv:2503.07877v1 Announce Type: new Abstract: Pure exploration is one of the fundamental problems in multi-armed bandits (MAB). However, existing works mostly focus on specific pure exploration tasks, without a holistic view of the general pure exploration problem. This work fills this gap by introducing a versatile framework to study pure…

  • Pure Exploration with Feedback Graphs

    Pure Exploration with Feedback Graphs arXiv:2503.07824v1 Announce Type: new Abstract: We study the sample complexity of pure exploration in an online learning problem with a feedback graph. This graph dictates the feedback available to the learner, covering scenarios between full-information, pure bandit feedback, and settings with no feedback on the chosen action. While variants of…

  • Analyzing the Role of Permutation Invariance in Linear Mode Connectivity

    Analyzing the Role of Permutation Invariance in Linear Mode Connectivity arXiv:2503.06001v1 Announce Type: new Abstract: It was empirically observed in Entezari et al. (2021) that when accounting for the permutation invariance of neural networks, there is likely no loss barrier along the linear interpolation between two SGD solutions — a phenomenon known as linear mode…

  • Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature

    Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature arXiv:2503.06079v1 Announce Type: new Abstract: Despite the significance of probabilistic time-series forecasting models, their evaluation metrics often involve intractable integrations. The most widely used metric, the continuous ranked probability score (CRPS), is a strictly proper scoring function; however, its computation requires approximation. We found…

  • On Statistical Estimation of Edge-Reinforced Random Walks

    On Statistical Estimation of Edge-Reinforced Random Walks arXiv:2503.06115v1 Announce Type: new Abstract: Reinforced random walks (RRWs), including vertex-reinforced random walks (VRRWs) and edge-reinforced random walks (ERRWs), model random walks where the transition probabilities evolve based on prior visitation history~\cite{mgr, fmk, tarres, volkov}. These models have found applications in various areas, such as network representation learning~\cite{xzzs},…

  • Double Debiased Machine Learning for Mediation Analysis with Continuous Treatments

    Double Debiased Machine Learning for Mediation Analysis with Continuous Treatments arXiv:2503.06156v1 Announce Type: new Abstract: Uncovering causal mediation effects is of significant value to practitioners seeking to isolate the direct treatment effect from the potential mediated effect. We propose a double machine learning (DML) algorithm for mediation analysis that supports continuous treatments. To estimate the…

  • Bayesian Optimization for Robust Identification of Ornstein-Uhlenbeck Model

    Bayesian Optimization for Robust Identification of Ornstein-Uhlenbeck Model arXiv:2503.06381v1 Announce Type: new Abstract: This paper deals with the identification of the stochastic Ornstein-Uhlenbeck (OU) process error model, which is characterized by an inverse time constant, and the unknown variances of the process and observation noises. Although the availability of the explicit expression of the log-likelihood…

  • A Practical Introduction to Kernel Discrepancies: MMD, HSIC & KSD

    A Practical Introduction to Kernel Discrepancies: MMD, HSIC & KSD arXiv:2503.04820v1 Announce Type: new Abstract: This article provides a practical introduction to kernel discrepancies, focusing on the Maximum Mean Discrepancy (MMD), the Hilbert-Schmidt Independence Criterion (HSIC), and the Kernel Stein Discrepancy (KSD). Various estimators for these discrepancies are presented, including the commonly-used V-statistics and U-statistics,…

  • Boltzmann convolutions and Welford mean-variance layers with an application to time series forecasting and classification

    Boltzmann convolutions and Welford mean-variance layers with an application to time series forecasting and classification arXiv:2503.04956v1 Announce Type: new Abstract: In this paper we propose a novel problem called the ForeClassing problem where the loss of a classification decision is only observed at a future time point after the classification decision has to be made.…

  • A characterization of sample adaptivity in UCB data

    A characterization of sample adaptivity in UCB data arXiv:2503.04855v1 Announce Type: new Abstract: We characterize a joint CLT of the number of pulls and the sample mean reward of the arms in a stochastic two-armed bandit environment under UCB algorithms. Several implications of this result are in place: (1) a nonstandard CLT of the number…

  • Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits

    Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits arXiv:2503.05098v1 Announce Type: new Abstract: Information-directed sampling (IDS) is a powerful framework for solving bandit problems which has shown strong results in both Bayesian and frequentist settings. However, frequentist IDS, like many other bandit algorithms, requires that one have prior knowledge of a (relatively) tight upper bound on…

  • Topology-Aware Conformal Prediction for Stream Networks

    Topology-Aware Conformal Prediction for Stream Networks arXiv:2503.04981v1 Announce Type: new Abstract: Stream networks, a unique class of spatiotemporal graphs, exhibit complex directional flow constraints and evolving dependencies, making uncertainty quantification a critical yet challenging task. Traditional conformal prediction methods struggle in this setting due to the need for joint predictions across multiple interdependent locations and…

  • Reheated Gradient-based Discrete Sampling for Combinatorial Optimization

    Reheated Gradient-based Discrete Sampling for Combinatorial Optimization arXiv:2503.04047v1 Announce Type: new Abstract: Recently, gradient-based discrete sampling has emerged as a highly efficient, general-purpose solver for various combinatorial optimization (CO) problems, achieving performance comparable to or surpassing the popular data-driven approaches. However, we identify a critical issue in these methods, which we term “wandering in contours”.…

  • Conformal Prediction with Upper and Lower Bound Models

    Conformal Prediction with Upper and Lower Bound Models arXiv:2503.04071v1 Announce Type: new Abstract: This paper studies a Conformal Prediction (CP) methodology for building prediction intervals in a regression setting, given only deterministic lower and upper bounds on the target variable. It proposes a new CP mechanism (CPUL) that goes beyond post-processing by adopting a model…

  • Generalization in Federated Learning: A Conditional Mutual Information Framework

    Generalization in Federated Learning: A Conditional Mutual Information Framework arXiv:2503.04091v1 Announce Type: new Abstract: Federated Learning (FL) is a widely adopted privacy-preserving distributed learning framework, yet its generalization performance remains less explored compared to centralized learning. In FL, the generalization error consists of two components: the out-of-sample gap, which measures the gap between the empirical…

  • Learning Causal Response Representations through Direct Effect Analysis

    Learning Causal Response Representations through Direct Effect Analysis arXiv:2503.04358v1 Announce Type: new Abstract: We propose a novel approach for learning causal response representations. Our method aims to extract directions in which a multidimensional outcome is most directly caused by a treatment variable. By bridging conditional independence testing with causal representation learning, we formulate an optimisation…

  • Time-varying Factor Augmented Vector Autoregression with Grouped Sparse Autoencoder

    Time-varying Factor Augmented Vector Autoregression with Grouped Sparse Autoencoder arXiv:2503.04386v1 Announce Type: new Abstract: Recent economic events, including the global financial crisis and COVID-19 pandemic, have exposed limitations in linear Factor Augmented Vector Autoregressive (FAVAR) models for forecasting and structural analysis. Nonlinear dimension techniques, particularly autoencoders, have emerged as promising alternatives in a FAVAR framework,…

  • Applications of Entropy in Data Analysis and Machine Learning: A Review

    Applications of Entropy in Data Analysis and Machine Learning: A Review arXiv:2503.02921v1 Announce Type: new Abstract: Since its origin in the thermodynamics of the 19th century, the concept of entropy has also permeated other fields of physics and mathematics, such as Classical and Quantum Statistical Mechanics, Information Theory, Probability Theory, Ergodic Theory and the Theory…

  • LAPD: Langevin-Assisted Bayesian Active Learning for Physical Discovery

    LAPD: Langevin-Assisted Bayesian Active Learning for Physical Discovery arXiv:2503.02983v1 Announce Type: new Abstract: Discovering physical laws from data is a fundamental challenge in scientific research, particularly when high-quality data are scarce or costly to obtain. Traditional methods for identifying dynamical systems often struggle with noise sensitivity, inefficiency in data usage, and the inability to quantify…

  • PAC Learning with Improvements

    PAC Learning with Improvements arXiv:2503.03184v1 Announce Type: new Abstract: One of the most basic lower bounds in machine learning is that in nearly any nontrivial setting, it takes $\textit{at least}$ $1/\epsilon$ samples to learn to error $\epsilon$ (and more, if the classifier being learned is complex). However, suppose that data points are agents who have…

  • Convergence Rates for Softmax Gating Mixture of Experts

    Convergence Rates for Softmax Gating Mixture of Experts arXiv:2503.03213v1 Announce Type: new Abstract: Mixture of experts (MoE) has recently emerged as an effective framework to advance the efficiency and scalability of machine learning models by softly dividing complex tasks among multiple specialized sub-models termed experts. Central to the success of MoE is an adaptive softmax…

  • Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations

    Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations arXiv:2503.03283v1 Announce Type: new Abstract: Drawing parallels with the way biological networks are studied, we adapt the treatment–control paradigm to explainable artificial intelligence research and enrich it through multi-parametric input alterations. In this study, we propose a framework for investigating…

  • Mathematical Foundation of Interpretable Equivariant Surrogate Models

    Mathematical Foundation of Interpretable Equivariant Surrogate Models arXiv:2503.01942v1 Announce Type: new Abstract: This paper introduces a rigorous mathematical framework for neural network explainability, and more broadly for the explainability of equivariant operators called Group Equivariant Operators (GEOs) based on Group Equivariant Non-Expansive Operators (GENEOs) transformations. The central concept involves quantifying the distance between GEOs by…

  • Gradient-free stochastic optimization for additive models

    Gradient-free stochastic optimization for additive models arXiv:2503.02131v1 Announce Type: new Abstract: We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-Łojasiewicz or the strong convexity condition. Additionally, we assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the Hölder family…

  • Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification

    Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification arXiv:2503.02110v1 Announce Type: new Abstract: We provide a complete characterization of the entire regularization curve of a modified two-part-code Minimum Description Length (MDL) learning rule for binary classification, based on an arbitrary prior or description language. \citet{GL} previously established the lack of asymptotic…

  • Online Inference for Quantiles by Constant Learning-Rate Stochastic Gradient Descent

    Online Inference for Quantiles by Constant Learning-Rate Stochastic Gradient Descent arXiv:2503.02178v1 Announce Type: new Abstract: This paper proposes an online inference method of the stochastic gradient descent (SGD) with a constant learning rate for quantile loss functions with theoretical guarantees. Since the quantile loss function is neither smooth nor strongly convex, we view such SGD…

  • Decentralized Reinforcement Learning for Multi-Agent Multi-Resource Allocation via Dynamic Cluster Agreements

    Decentralized Reinforcement Learning for Multi-Agent Multi-Resource Allocation via Dynamic Cluster Agreements arXiv:2503.02437v1 Announce Type: new Abstract: This paper addresses the challenge of allocating heterogeneous resources among multiple agents in a decentralized manner. Our proposed method, LGTC-IPPO, builds upon Independent Proximal Policy Optimization (IPPO) by integrating dynamic cluster consensus, a mechanism that allows agents to form…

  • Approaching the Harm of Gradient Attacks While Only Flipping Labels

    Approaching the Harm of Gradient Attacks While Only Flipping Labels arXiv:2503.00140v1 Announce Type: new Abstract: Availability attacks are one of the strongest forms of training-phase attacks in machine learning, making the model unusable. While prior work in distributed ML has demonstrated such effect via gradient attacks and, more recently, data poisoning, we ask: can similar…

  • An interpretation of the Brownian bridge as a physics-informed prior for the Poisson equation

    An interpretation of the Brownian bridge as a physics-informed prior for the Poisson equation arXiv:2503.00213v1 Announce Type: new Abstract: Physics-informed machine learning is one of the most commonly used methods for fusing physical knowledge in the form of partial differential equations with experimental data. The idea is to construct a loss function where the physical…

  • Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits

    Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits arXiv:2503.00273v1 Announce Type: new Abstract: We study the evolution of information in interactive decision making through the lens of a stochastic multi-armed bandit problem. Focusing on a fundamental example where a unique optimal arm outperforms the rest by a fixed margin, we…

  • LNUCB-TA: Linear-nonlinear Hybrid Bandit Learning with Temporal Attention

    LNUCB-TA: Linear-nonlinear Hybrid Bandit Learning with Temporal Attention arXiv:2503.00387v1 Announce Type: new Abstract: Existing contextual multi-armed bandit (MAB) algorithms fail to effectively capture both long-term trends and local patterns across all arms, leading to suboptimal performance in environments with rapidly changing reward structures. They also rely on static exploration rates, which do not dynamically adjust…

  • Generalization Bounds for Equivariant Networks on Markov Data

    Generalization Bounds for Equivariant Networks on Markov Data arXiv:2503.00292v1 Announce Type: new Abstract: Equivariant neural networks play a pivotal role in analyzing datasets with symmetry properties, particularly in complex data structures. However, integrating equivariance with Markov properties presents notable challenges due to the inherent dependencies within such data. Previous research has primarily concentrated on establishing…

  • Transfer Learning through Enhanced Sufficient Representation: Enriching Source Domain Knowledge with Target Data

    Transfer Learning through Enhanced Sufficient Representation: Enriching Source Domain Knowledge with Target Data arXiv:2502.20414v1 Announce Type: new Abstract: Transfer learning is an important approach for addressing the challenges posed by limited data availability in various applications. It accomplishes this by transferring knowledge from well-established source domains to a less familiar target domain. However, traditional transfer…

  • Efficient Risk-sensitive Planning via Entropic Risk Measures

    Efficient Risk-sensitive Planning via Entropic Risk Measures arXiv:2502.20423v1 Announce Type: new Abstract: Risk-sensitive planning aims to identify policies maximizing some tail-focused metrics in Markov Decision Processes (MDPs). Such an optimization task can be very costly for the most widely used and interpretable metrics such as threshold probabilities or (Conditional) Values at Risk. Indeed, previous work…

  • Amortized Conditional Independence Testing

    Amortized Conditional Independence Testing arXiv:2502.20925v1 Announce Type: new Abstract: Testing for the conditional independence structure in data is a fundamental and critical task in statistics and machine learning, which finds natural applications in causal discovery – a highly relevant problem to many scientific disciplines. Existing methods seek to design explicit test statistics that quantify the…

  • Learning Dynamics of Deep Linear Networks Beyond the Edge of Stability

    Learning Dynamics of Deep Linear Networks Beyond the Edge of Stability arXiv:2502.20531v1 Announce Type: new Abstract: Deep neural networks trained using gradient descent with a fixed learning rate $\eta$ often operate in the regime of “edge of stability” (EOS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold $2/\eta$. In this work,…

  • Post-Hoc Uncertainty Quantification in Pre-Trained Neural Networks via Activation-Level Gaussian Processes

    Post-Hoc Uncertainty Quantification in Pre-Trained Neural Networks via Activation-Level Gaussian Processes arXiv:2502.20966v1 Announce Type: new Abstract: Uncertainty quantification in neural networks through methods such as Dropout, Bayesian neural networks and Laplace approximations is either prone to underfitting or computationally demanding, rendering these approaches impractical for large-scale datasets. In this work, we address these shortcomings by…

  • Practical Evaluation of Copula-based Survival Metrics: Beyond the Independent Censoring Assumption

    Practical Evaluation of Copula-based Survival Metrics: Beyond the Independent Censoring Assumption arXiv:2502.19460v1 Announce Type: new Abstract: Conventional survival metrics, such as Harrell’s concordance index and the Brier Score, rely on the independent censoring assumption for valid inference in the presence of right-censored data. However, when instances are censored for reasons related to the event of…

  • Advancing calibration for stochastic agent-based models in epidemiology with Stein variational inference and Gaussian process surrogates

    Advancing calibration for stochastic agent-based models in epidemiology with Stein variational inference and Gaussian process surrogates arXiv:2502.19550v1 Announce Type: new Abstract: Accurate calibration of stochastic agent-based models (ABMs) in epidemiology is crucial to make them useful in public health policy decisions and interventions. Traditional calibration methods, e.g., Markov Chain Monte Carlo (MCMC), that yield a…

  • Fast Debiasing of the LASSO Estimator

    Fast Debiasing of the LASSO Estimator arXiv:2502.19825v1 Announce Type: new Abstract: In high-dimensional sparse regression, the Lasso estimator offers excellent theoretical guarantees but is well-known to produce biased estimates. To address this, \cite{Javanmard2014} introduced a method to “debias” the Lasso estimates for a random sub-Gaussian sensing matrix $\boldsymbol{A}$. Their approach relies on computing an “approximate…

  • Multiple Linked Tensor Factorization

    Multiple Linked Tensor Factorization arXiv:2502.20286v1 Announce Type: new Abstract: In biomedical research and other fields, it is now common to generate high content data that are both multi-source and multi-way. Multi-source data are collected from different high-throughput technologies while multi-way data are collected over multiple dimensions, yielding multiple tensor arrays. Integrative analysis of these data…

  • Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula

    Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula arXiv:2502.20003v1 Announce Type: new Abstract: The analytic characterization of the high-dimensional behavior of optimization for Generalized Linear Models (GLMs) with Gaussian data has been a central focus in statistics and probability in recent years. While convex cases, such as the LASSO,…

  • Applications of Statistical Field Theory in Deep Learning

    Applications of Statistical Field Theory in Deep Learning arXiv:2502.18553v1 Announce Type: new Abstract: Deep learning algorithms have made incredible strides in the past decade yet due to the complexity of these algorithms, the science of deep learning remains in its early stages. Being an experimentally driven field, it is natural to seek a theory of…

  • Learning and Computation of $\Phi$-Equilibria at the Frontier of Tractability

    Learning and Computation of $\Phi$-Equilibria at the Frontier of Tractability arXiv:2502.18582v1 Announce Type: new Abstract: $\Phi$-equilibria — and the associated notion of $\Phi$-regret — are a powerful and flexible framework at the heart of online learning and game theory, whereby enriching the set of deviations $\Phi$ begets stronger notions of rationality. Recently, Daskalakis, Farina, Fishelson,…

  • Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood

    Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood arXiv:2502.19086v1 Announce Type: new Abstract: We introduce the use of Gaussian Processes (GPs) for the probabilistic forecasting of intermittent time series. The model is trained in a Bayesian framework that accounts for the uncertainty about the latent function and marginalizes it out when making predictions.…

  • Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data

    Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data arXiv:2502.18756v1 Announce Type: new Abstract: Motivation: Biomedical studies increasingly produce multi-view high-dimensional datasets (e.g., multi-omics) that demand integrative analysis. Existing canonical correlation analysis (CCA) and generalized CCA methods address at most two of the following three key aspects simultaneously: (i) nonlinear dependence, (ii) sparsity for…

  • Enhancing Gradient-based Discrete Sampling via Parallel Tempering

    Enhancing Gradient-based Discrete Sampling via Parallel Tempering arXiv:2502.19240v1 Announce Type: new Abstract: While gradient-based discrete samplers are effective in sampling from complex distributions, they are susceptible to getting trapped in local minima, particularly in high-dimensional, multimodal discrete distributions, owing to the discontinuities inherent in these landscapes. To circumvent this issue, we combine parallel tempering, also…

  • Are GNNs doomed by the topology of their input graph?

    Are GNNs doomed by the topology of their input graph? arXiv:2502.17739v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable success in learning from graph-structured data. However, the influence of the input graph’s topology on GNN behavior remains poorly understood. In this work, we explore whether GNNs are inherently limited by the structure…

  • An Overview of Large Language Models for Statisticians

    An Overview of Large Language Models for Statisticians arXiv:2502.17814v1 Announce Type: new Abstract: Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI), exhibiting remarkable capabilities across diverse tasks such as text generation, reasoning, and decision-making. While their success has primarily been driven by advances in computational power and deep learning architectures,…

  • Conformal Prediction Under Generalized Covariate Shift with Posterior Drift

    Conformal Prediction Under Generalized Covariate Shift with Posterior Drift arXiv:2502.17744v1 Announce Type: new Abstract: In many real applications of statistical learning, collecting sufficiently many training data is often expensive, time-consuming, or even unrealistic. In this case, a transfer learning approach, which aims to leverage knowledge from a related source domain to improve the learning performance…

  • Golden Ratio Mixing of Real and Synthetic Data for Stabilizing Generative Model Training

    Golden Ratio Mixing of Real and Synthetic Data for Stabilizing Generative Model Training arXiv:2502.18049v1 Announce Type: new Abstract: Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and developing more effective training strategies…

  • Near-Optimal Approximations for Bayesian Inference in Function Space

    Near-Optimal Approximations for Bayesian Inference in Function Space arXiv:2502.18279v1 Announce Type: new Abstract: We propose a scalable inference algorithm for Bayes posteriors defined on a reproducing kernel Hilbert space (RKHS). Given a likelihood function and a Gaussian random element representing the prior, the corresponding Bayes posterior measure $\Pi_{\text{B}}$ can be obtained as the stationary distribution…

  • Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements

    Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements arXiv:2502.16008v1 Announce Type: new Abstract: We consider the problem of exact recovery of a $k$-sparse binary vector from generalized linear measurements (such as logistic regression). We analyze the linear estimation algorithm (Plan, Vershynin, Yudovina, 2017), and also show information theoretic lower bounds on the number…

  • A Review of Causal Decision Making

    A Review of Causal Decision Making arXiv:2502.16156v1 Announce Type: new Abstract: To make effective decisions, it is important to have a thorough understanding of the causal relationships among actions, environments, and outcomes. This review aims to surface three crucial aspects of decision-making through a causal lens: 1) the discovery of causal relationships through causal structure…

  • Statistical Inference in Reinforcement Learning: A Selective Survey

    Statistical Inference in Reinforcement Learning: A Selective Survey arXiv:2502.16195v1 Announce Type: new Abstract: Reinforcement learning (RL) is concerned with how intelligent agents take actions in a given environment to maximize the cumulative reward they receive. In healthcare, applying RL algorithms could assist patients in improving their health status. In ride-sharing platforms, applying RL algorithms could…

  • Rectifying Conformity Scores for Better Conditional Coverage

    Rectifying Conformity Scores for Better Conditional Coverage arXiv:2502.16336v1 Announce Type: new Abstract: We present a new method for generating confidence sets within the split conformal prediction framework. Our method performs a trainable transformation of any given conformity score to improve conditional coverage while ensuring exact marginal coverage. The transformation is based on an estimate of…

  • Subspace Recovery in Winsorized PCA: Insights into Accuracy and Robustness

    Subspace Recovery in Winsorized PCA: Insights into Accuracy and Robustness arXiv:2502.16391v1 Announce Type: new Abstract: In this paper, we explore the theoretical properties of subspace recovery using Winsorized Principal Component Analysis (WPCA), utilizing a common data transformation technique that caps extreme values to mitigate the impact of outliers. Despite the widespread use of winsorization in…

  • Modifying Final Splits of Classification Tree for Fine-tuning Subpopulation Target in Policy Making

    Modifying Final Splits of Classification Tree for Fine-tuning Subpopulation Target in Policy Making arXiv:2502.15072v1 Announce Type: new Abstract: Policymakers often use Classification and Regression Trees (CART) to partition populations based on binary outcomes and target subpopulations whose probability of the binary event exceeds a threshold. However, classic CART and knowledge distillation method whose student model…

  • Variational phylogenetic inference with products over bipartitions

    Variational phylogenetic inference with products over bipartitions arXiv:2502.15110v1 Announce Type: new Abstract: Bayesian phylogenetics requires accurate and efficient approximation of posterior distributions over trees. In this work, we develop a variational Bayesian approach for ultrametric phylogenetic trees. We present a novel variational family based on coalescent times of a single-linkage clustering and derive a closed-form…

  • Tensor Product Neural Networks for Functional ANOVA Model

    Tensor Product Neural Networks for Functional ANOVA Model arXiv:2502.15215v1 Announce Type: new Abstract: Interpretability for machine learning models is becoming more and more important as machine learning models become more complex. The functional ANOVA model, which decomposes a high-dimensional function into a sum of lower-dimensional functions, so-called components, is one of the most…

  • Fréchet Cumulative Covariance Net for Deep Nonlinear Sufficient Dimension Reduction with Random Objects

    Fréchet Cumulative Covariance Net for Deep Nonlinear Sufficient Dimension Reduction with Random Objects arXiv:2502.15374v1 Announce Type: new Abstract: Nonlinear sufficient dimension reduction \citep{libing_generalSDR}, which constructs nonlinear low-dimensional representations to summarize essential features of high-dimensional data, is an important branch of representation learning. However, most existing methods are not applicable when the response variables are complex non-Euclidean…

  • Jeffrey’s update rule as a minimizer of Kullback-Leibler divergence

    Jeffrey’s update rule as a minimizer of Kullback-Leibler divergence arXiv:2502.15504v1 Announce Type: new Abstract: In this paper, we show a more concise and high-level proof than the original one, derived by researcher Bart Jacobs, for the following theorem: in the context of Bayesian update rules for learning or updating internal states that produce predictions,…