Category: stat.ML

  • Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data

    arXiv:2501.13483v1 Announce Type: new Abstract: Neural amortized Bayesian inference (ABI) can solve probabilistic inverse problems orders of magnitude faster than classical methods. However, neural ABI is not yet sufficiently robust for widespread and safe applicability. In particular, when performing inference on observations outside of the…

  • LITE: Efficiently Estimating Gaussian Probability of Maximality

    arXiv:2501.13535v1 Announce Type: new Abstract: We consider the problem of computing the probability of maximality (PoM) of a Gaussian random vector, i.e., the probability for each dimension to be maximal. This is a key challenge in applications ranging from Bayesian optimization to reinforcement learning, where the PoM not…
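    As a plain reference point for the PoM defined above (this is not the LITE estimator), a minimal Monte Carlo sketch in Python. It assumes independent coordinates with given means and standard deviations, i.e. a diagonal covariance, which is an illustrative simplification not taken from the abstract; it draws samples and counts how often each dimension is the largest. The function name `pom_monte_carlo` is hypothetical.

```python
import random

def pom_monte_carlo(means, stds, n_samples=100_000, seed=0):
    """Estimate P(X_i = max_j X_j) for independent X_i ~ N(means[i], stds[i]**2).

    Assumption: diagonal covariance. A correlated Gaussian would need
    correlated draws (e.g. via a Cholesky factor) instead.
    """
    rng = random.Random(seed)
    counts = [0] * len(means)
    for _ in range(n_samples):
        draw = [rng.gauss(m, s) for m, s in zip(means, stds)]
        # index of the largest coordinate in this draw (ties have probability zero)
        counts[max(range(len(draw)), key=draw.__getitem__)] += 1
    return [c / n_samples for c in counts]

print(pom_monte_carlo([0.0, 0.0, 1.0], [1.0, 1.0, 1.0]))
```

    The estimates sum to one and their Monte Carlo error shrinks like one over the square root of `n_samples`, which is the cost that dedicated PoM estimators aim to avoid.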

  • Learning under Commission and Omission Event Outliers

    arXiv:2501.13599v1 Announce Type: new Abstract: Event streams are an important data format in real life. The events are usually expected to follow some regular patterns over time. However, the patterns could be contaminated by unexpected absences or occurrences of events. In this paper, we adopt the temporal point…

  • Bayesian Model Parameter Learning in Linear Inverse Problems with Application in EEG Focal Source Imaging

    arXiv:2501.13109v1 Announce Type: cross Abstract: Inverse problems can be described as limited-data problems in which the signal of interest cannot be observed directly. A physics-based forward model that relates the signal with the observations is typically needed. Unfortunately, unknown model…

  • A dimensionality reduction technique based on the Gromov-Wasserstein distance

    arXiv:2501.13732v1 Announce Type: new Abstract: Analyzing relationships between objects is a pivotal problem within data science. In this context, dimensionality reduction (DR) techniques are employed to generate smaller and more manageable data representations. This paper proposes a new method for dimensionality reduction, based on optimal transportation…

  • Ultralow-dimensionality reduction for identifying critical transitions by spatial-temporal PCA

    arXiv:2501.12582v1 Announce Type: new Abstract: Discovering dominant patterns and exploring dynamic behaviors, especially critical state transitions and tipping points, in high-dimensional time-series data are challenging tasks in the study of real-world complex systems, which demand interpretable data representations to facilitate comprehension of both spatial and temporal information…

  • Sequential Change Point Detection via Denoising Score Matching

    arXiv:2501.12667v1 Announce Type: new Abstract: Sequential change-point detection plays a critical role in numerous real-world applications, where timely identification of distributional shifts can greatly mitigate adverse outcomes. Classical methods commonly rely on parametric density assumptions of pre- and post-change distributions, limiting their effectiveness for high-dimensional, complex data…
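    For contrast with the score-based approach, a minimal sketch of the kind of classical parametric detector the abstract refers to: Page's CUSUM for a known shift in the mean of a Gaussian with known variance. The pre- and post-change means, the variance, and the threshold are illustrative assumptions; the function name `cusum_gaussian` is hypothetical.

```python
def cusum_gaussian(xs, mu0, mu1, sigma=1.0, threshold=5.0):
    """Page's CUSUM for a change from N(mu0, sigma^2) to N(mu1, sigma^2).

    Returns the first index at which the statistic exceeds `threshold`,
    or None if no alarm is raised.
    """
    s = 0.0
    for t, x in enumerate(xs):
        # log-likelihood ratio log p1(x) / p0(x) for two equal-variance Gaussians
        llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2.0)
        # clip at zero so that only evidence in favor of a change accumulates
        s = max(0.0, s + llr)
        if s > threshold:
            return t
    return None

stream = [0.0] * 50 + [1.0] * 50  # change at index 50
print(cusum_gaussian(stream, mu0=0.0, mu1=1.0))
```

    The detector needs both densities in closed form to compute `llr`, which is exactly the parametric assumption that becomes restrictive for high-dimensional, complex data.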

  • On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration

    arXiv:2501.12785v1 Announce Type: new Abstract: This paper tackles the efficiency and stability issues in learning from observations (LfO). We commence by investigating how reward functions and policies generalize in LfO. Subsequently, the built-in reinforcement learning (RL) approach in generative adversarial imitation from observation (GAIfO)…

  • Singular leaning coefficients and efficiency in learning theory

    arXiv:2501.12747v1 Announce Type: new Abstract: Singular learning models with non-positive Fisher information matrices include neural networks, reduced-rank regression, Boltzmann machines, normal mixture models, and others. These models have been widely used in the development of learning machines. However, theoretical analysis is still in its early stages. In…

  • Fixed-Budget Change Point Identification in Piecewise Constant Bandits

    arXiv:2501.12957v1 Announce Type: new Abstract: We study the piecewise constant bandit problem where the expected reward is a piecewise constant function with one change point (discontinuity) across the action space $[0,1]$ and the learner’s aim is to locate the change point. Under the assumption of a fixed…

  • Extension of Symmetrized Neural Network Operators with Fractional and Mixed Activation Functions

    arXiv:2501.10496v1 Announce Type: new Abstract: We propose a novel extension to symmetrized neural network operators by incorporating fractional and mixed activation functions. This study addresses the limitations of existing models in approximating higher-order smooth functions, particularly in complex and high-dimensional spaces. Our framework…

  • Simulation of Random LR Fuzzy Intervals

    arXiv:2501.10482v1 Announce Type: new Abstract: Random fuzzy variables combine the modeling of imprecision (due to their “fuzzy part”) and randomness. Statistical samples of such objects are widely used, and their direct, numerically effective generation is therefore necessary. Usually, these samples consist of triangular or trapezoidal fuzzy numbers. In…

  • Multi-Output Conformal Regression: A Unified Comparative Study with New Conformity Scores

    arXiv:2501.10533v1 Announce Type: new Abstract: Quantifying uncertainty in multivariate regression is essential in many real-world applications, yet existing methods for constructing prediction regions often face limitations such as the inability to capture complex dependencies, lack of coverage guarantees, or high computational cost. Conformal prediction…
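    To make the setting concrete, a minimal split-conformal sketch for multi-output regression using one simple conformity score, the max-norm of the residual vector, which yields a hyperrectangular prediction region. This score and the function names are illustrative choices, not the new conformity scores proposed in the paper; the point predictor is assumed to be fitted on data disjoint from the calibration set.

```python
import math

def max_norm_score(y_true, y_pred):
    """Conformity score: largest absolute residual across output dimensions."""
    return max(abs(a - b) for a, b in zip(y_true, y_pred))

def conformal_radius(cal_true, cal_pred, alpha=0.1):
    """Split-conformal threshold computed on a held-out calibration set.

    The region for a new input is the box {y : max_k |y_k - yhat_k| <= radius},
    which contains the true output with probability >= 1 - alpha when the
    calibration and test points are exchangeable.
    """
    scores = sorted(max_norm_score(t, p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]
```

    A box ignores dependence between outputs, which is one of the limitations the abstract lists; richer scores trade this simplicity for tighter, non-rectangular regions.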

  • DPERC: Direct Parameter Estimation for Mixed Data

    arXiv:2501.10540v1 Announce Type: new Abstract: The covariance matrix is a foundation in numerous statistical and machine-learning applications such as Principal Component Analysis, correlation heatmaps, etc. However, missing values within datasets present a formidable obstacle to accurately estimating this matrix. While imputation methods offer one avenue for addressing this…

  • Model-Robust and Adaptive-Optimal Transfer Learning for Tackling Concept Shifts in Nonparametric Regression

    arXiv:2501.10870v1 Announce Type: new Abstract: When concept shifts and sample scarcity are present in the target domain of interest, nonparametric regression learners often struggle to generalize effectively. The technique of transfer learning remedies these issues by leveraging data or pre-trained models from similar…

  • SBAMDT: Bayesian Additive Decision Trees with Adaptive Soft Semi-multivariate Split Rules

    arXiv:2501.09900v1 Announce Type: new Abstract: Bayesian Additive Regression Trees [BART, Chipman et al., 2010] have gained significant popularity due to their remarkable predictive performance and ability to quantify uncertainty. However, standard decision tree models rely on recursive data splits at each decision node, using…

  • Tracking student skills real-time through a continuous-variable dynamic Bayesian network

    arXiv:2501.10050v1 Announce Type: new Abstract: The field of Knowledge Tracing is focused on predicting the success rate of a student for a given skill. Modern methods like Deep Knowledge Tracing provide accurate estimates given enough data, but being based on neural networks they struggle to…

  • Statistical Inference for Sequential Feature Selection after Domain Adaptation

    arXiv:2501.09933v1 Announce Type: new Abstract: In high-dimensional regression, feature selection methods, such as sequential feature selection (SeqFS), are commonly used to identify relevant features. When data is limited, domain adaptation (DA) becomes crucial for transferring knowledge from a related source domain to a target domain, improving…

  • Contributions to the Decision Theoretic Foundations of Machine Learning and Robust Statistics under Weakly Structured Information

    arXiv:2501.10195v1 Announce Type: new Abstract: This habilitation thesis is cumulative and therefore collects and connects research that I (together with several co-authors) have conducted over the last few years. Thus, the absolute core of the work is formed…

  • Provably Safeguarding a Classifier from OOD and Adversarial Samples: an Extreme Value Theory Approach

    arXiv:2501.10202v1 Announce Type: new Abstract: This paper introduces a novel method, Sample-efficient Probabilistic Detection using Extreme Value Theory (SPADE), which transforms a classifier into an abstaining classifier, offering provable protection against out-of-distribution and adversarial samples. The approach is based on a…

  • Generative Models with ELBOs Converging to Entropy Sums

    arXiv:2501.09022v1 Announce Type: new Abstract: The evidence lower bound (ELBO) is one of the most central objectives for probabilistic unsupervised learning. For the ELBOs of several generative models and model classes, we here prove convergence to entropy sums. As one result, we provide a list of generative…
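    As a reminder of the quantity involved, a small discrete example with illustrative numbers (not from the paper): for a latent variable z with two states, the ELBO can be written as E_q[log p(x, z)] plus the entropy H(q). It lower-bounds log p(x), with equality exactly when q is the true posterior p(z | x). The function name `elbo` and the joint probabilities are hypothetical.

```python
import math

def elbo(q, joint):
    """ELBO = E_q[log p(x, z)] + H(q) for a discrete latent z.

    q[z]     : variational probabilities over latent states
    joint[z] : p(x, z) for the observed x and each latent state z
    """
    expected_log_joint = sum(qz * math.log(pz) for qz, pz in zip(q, joint) if qz > 0)
    entropy = -sum(qz * math.log(qz) for qz in q if qz > 0)
    return expected_log_joint + entropy

joint = [0.12, 0.28]                          # hypothetical p(x, z=0), p(x, z=1)
log_evidence = math.log(sum(joint))           # log p(x)
posterior = [p / sum(joint) for p in joint]   # exact p(z | x)

print(elbo([0.5, 0.5], joint), log_evidence)  # ELBO strictly below log p(x)
print(elbo(posterior, joint), log_evidence)   # equal up to rounding
```

    The gap between the two printed values is KL(q || p(z | x)), which is why maximizing the ELBO over q tightens the bound.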

  • Estimating shared subspace with AJIVE: the power and limitation of multiple data matrices

    arXiv:2501.09336v1 Announce Type: new Abstract: Integrative data analysis often requires disentangling joint and individual variations across multiple datasets, a challenge commonly addressed by the Joint and Individual Variation Explained (JIVE) model. While numerous methods have been developed to estimate the shared subspace…

  • On the convergence of noisy Bayesian Optimization with Expected Improvement

    arXiv:2501.09262v1 Announce Type: new Abstract: Expected improvement (EI) is one of the most widely-used acquisition functions in Bayesian optimization (BO). Despite its proven success in applications for decades, important open questions remain on the theoretical convergence behaviors and rates for EI. In this paper, we…
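    For reference, a sketch of the standard closed-form EI for maximization when the surrogate posterior at a candidate point is Gaussian with mean mu and standard deviation sigma, measured against an incumbent value f_best. This is the textbook noise-free formula; how the noisy setting studied in the paper defines the incumbent is not captured here, and the function name is hypothetical.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """E[max(f - f_best, 0)] for f ~ N(mu, sigma^2), maximization convention."""
    if sigma <= 0.0:
        # degenerate posterior: the improvement is deterministic
        return max(mu - f_best, 0.0)
    z = (mu - f_best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (mu - f_best) * cdf + sigma * pdf
```

    The first term rewards a high predicted mean (exploitation) and the second rewards posterior uncertainty (exploration); for fixed mu and f_best, EI grows with sigma.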

  • Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI

    arXiv:2501.09731v1 Announce Type: new Abstract: We establish a formal connection between the decades-old surrogate outcome model in biostatistics and economics and the emerging field of prediction-powered inference (PPI). The connection treats predictions from pre-trained models, prevalent in the age of AI, as cost-effective surrogates…
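    To illustrate the PPI idea the abstract connects to, a minimal sketch of the basic prediction-powered point estimate of a mean: average a model's predictions on a large unlabeled set, then correct that average with the mean residual (label minus prediction) measured on a small labeled set. This is the basic PPI mean estimator, not the surrogate-outcome estimators developed in this paper; confidence intervals are omitted and the function name is hypothetical.

```python
from statistics import fmean

def ppi_mean(preds_unlabeled, labels, preds_labeled):
    """Prediction-powered point estimate of E[Y].

    preds_unlabeled       : model predictions f(X) on unlabeled data
    labels, preds_labeled : true Y and f(X) on the small labeled set
    """
    # "rectifier": average prediction error, measured where labels exist
    rectifier = fmean(y - f for y, f in zip(labels, preds_labeled))
    return fmean(preds_unlabeled) + rectifier

# predictions that are systematically one unit too high
print(ppi_mean([3, 4, 5, 6], labels=[1, 2, 3], preds_labeled=[2, 3, 4]))
```

    The rectifier removes the model's bias, so cheap predictions can stand in for missing labels without biasing the estimate, which is the role a surrogate outcome plays.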

  • Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks

    arXiv:2501.09137v1 Announce Type: cross Abstract: We study the gradient descent (GD) dynamics of a depth-2 linear neural network with a single input and output. We show that GD converges at an explicit linear rate to a global minimum of the training…

  • A Constant Velocity Latent Dynamics Approach for Accelerating Simulation of Stiff Nonlinear Systems

    arXiv:2501.08423v1 Announce Type: new Abstract: Solving stiff ordinary differential equations (StODEs) requires sophisticated numerical solvers, which are often computationally expensive. In particular, StODEs often cannot be solved with traditional explicit time integration schemes and one must resort to costly implicit methods to…

  • Causal vs. Anticausal merging of predictors

    arXiv:2501.08426v1 Announce Type: cross Abstract: We study the differences arising from merging predictors in the causal and anticausal directions using the same data. In particular we study the asymmetries that arise in a simple model where we merge the predictors using one binary variable as target and two continuous…

  • A Theory of Optimistically Universal Online Learnability for General Concept Classes

    arXiv:2501.08551v1 Announce Type: new Abstract: We provide a full characterization of the concept classes that are optimistically universally online learnable with $\{0, 1\}$ labels. The notion of optimistically universal online learning was defined in [Hanneke, 2021] in order to understand learnability under minimal assumptions.…

  • Quantum Reservoir Computing and Risk Bounds

    arXiv:2501.08640v1 Announce Type: cross Abstract: We propose a way to bound the generalisation errors of several classes of quantum reservoirs using the Rademacher complexity. We give specific, parameter-dependent bounds for two particular quantum reservoir classes. We analyse how the generalisation bounds scale with growing numbers of qubits. Applying our…

  • Diagonal Over-parameterization in Reproducing Kernel Hilbert Spaces as an Adaptive Feature Model: Generalization and Adaptivity

    arXiv:2501.08679v1 Announce Type: cross Abstract: This paper introduces a diagonal adaptive kernel model that dynamically learns kernel eigenvalues and output coefficients simultaneously during training. Unlike fixed-kernel methods tied to the neural tangent kernel theory, the diagonal adaptive kernel model adapts…

  • Concentration of Measure for Distributions Generated via Diffusion Models

    arXiv:2501.07741v1 Announce Type: new Abstract: We show via a combination of mathematical arguments and empirical evidence that data distributions sampled from diffusion models satisfy a Concentration of Measure Property saying that any Lipschitz $1$-dimensional projection of a random vector is not too far from its mean…

  • On the use of Statistical Learning Theory for model selection in Structural Health Monitoring

    arXiv:2501.08050v1 Announce Type: new Abstract: Whenever data-based systems are employed in engineering applications, defining an optimal statistical representation is subject to the problem of model selection. This paper focusses on how well models can generalise in Structural Health Monitoring (SHM). Although…

  • On the Statistical Capacity of Deep Generative Models

    arXiv:2501.07763v1 Announce Type: new Abstract: Deep generative models are routinely used in generating samples from complex, high-dimensional distributions. Despite their apparent successes, their statistical properties are not well understood. A common assumption is that with enough training data and sufficiently large neural networks, deep generative model samples…

  • Globally Convergent Variational Inference

    arXiv:2501.08201v1 Announce Type: new Abstract: In variational inference (VI), an approximation of the posterior distribution is selected from a family of distributions through numerical optimization. With the most common variational objective function, known as the evidence lower bound (ELBO), only convergence to a local optimum can be guaranteed. In this work,…

  • Avoiding subtraction and division of stochastic signals using normalizing flows: NFdeconvolve

    arXiv:2501.08288v1 Announce Type: new Abstract: Across the scientific realm, we find ourselves subtracting or dividing stochastic signals. For instance, consider a stochastic realization, $x$, generated from the addition or multiplication of two stochastic signals $a$ and $b$, namely $x=a+b$ or $x = ab$. For…

  • Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing

    arXiv:2501.06366v1 Announce Type: new Abstract: When applied in healthcare, reinforcement learning (RL) seeks to dynamically match the right interventions to subjects to maximize population benefit. However, the learned policy may disproportionately allocate efficacious actions to one subpopulation, creating or exacerbating disparities in other socioeconomically-disadvantaged subgroups. These biases…

  • Computational and Statistical Asymptotic Analysis of the JKO Scheme for Iterative Algorithms to update distributions

    arXiv:2501.06408v1 Announce Type: new Abstract: The seminal paper of Jordan, Kinderlehrer, and Otto introduced what is now widely known as the JKO scheme, an iterative algorithmic framework for computing distributions. This scheme can be interpreted as a Wasserstein gradient flow…
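    For orientation, the standard textbook form of one JKO step (not necessarily the variant analyzed in the paper): given a step size $\tau > 0$ and an energy functional $F$ on probability measures with finite second moment, the next iterate minimizes $F$ plus a squared 2-Wasserstein proximity penalty to the current iterate. In the original setting $F(\rho) = \int \Psi \, d\rho + \int \rho \log \rho \, dx$, and the iterates approximate the Fokker-Planck equation as $\tau \to 0$.

```latex
\rho_{k+1} \in \operatorname*{arg\,min}_{\rho \in \mathcal{P}_2(\mathbb{R}^d)}
  \left\{ F(\rho) + \frac{1}{2\tau} \, W_2^2(\rho, \rho_k) \right\}
```

    This is the Wasserstein analogue of the implicit Euler (proximal point) step $x_{k+1} = \arg\min_x \{ f(x) + \tfrac{1}{2\tau} \lVert x - x_k \rVert^2 \}$, which is why the scheme is read as a discretized Wasserstein gradient flow.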

  • Variable Selection Methods for Multivariate, Functional, and Complex Biomedical Data in the AI Age

    arXiv:2501.06868v1 Announce Type: new Abstract: Many problems within personalized medicine and digital health rely on the analysis of continuous-time functional biomarkers and other complex data structures emerging from high-resolution patient monitoring. In this context, this work proposes new optimization-based variable selection…

  • Dynamic Causal Structure Discovery and Causal Effect Estimation

    arXiv:2501.06534v1 Announce Type: new Abstract: To represent the causal relationships between variables, a directed acyclic graph (DAG) is widely utilized in many areas, such as social sciences, epidemics, and genetics. Many causal structure learning approaches have been developed to learn the hidden causal structure utilizing deep-learning approaches. However,…

  • Automatic Double Reinforcement Learning in Semiparametric Markov Decision Processes with Applications to Long-Term Causal Inference

    arXiv:2501.06926v1 Announce Type: new Abstract: Double reinforcement learning (DRL) enables statistically efficient inference on the value of a policy in a nonparametric Markov Decision Process (MDP) given trajectories generated by another policy. However, this approach necessarily requires stringent overlap between…

  • Covariate Dependent Mixture of Bayesian Networks

    arXiv:2501.05745v1 Announce Type: new Abstract: Learning the structure of Bayesian networks from data provides insights into underlying processes and the causal relationships that generate the data, but its usefulness depends on the homogeneity of the data population, a condition often violated in real-world applications. In such cases, using a…

  • Outlyingness Scores with Cluster Catch Digraphs

    arXiv:2501.05530v1 Announce Type: new Abstract: This paper introduces two novel outlyingness scores (OSs) based on Cluster Catch Digraphs (CCDs): Outbound Outlyingness Score (OOS) and Inbound Outlyingness Score (IOS). These scores enhance the interpretability of outlier detection results. Both OSs employ graph-, density-, and distribution-based techniques, tailored to high-dimensional data…

  • Analog Bayesian neural networks are insensitive to the shape of the weight distribution

    arXiv:2501.05564v1 Announce Type: cross Abstract: Recent work has demonstrated that Bayesian neural networks (BNNs) trained with mean field variational inference (MFVI) can be implemented in analog hardware, promising orders of magnitude energy savings compared to the standard digital implementations. However, while Gaussians…

  • rmlnomogram: An R package to construct an explainable nomogram for any machine learning algorithms

    arXiv:2501.05772v1 Announce Type: cross Abstract: Background: Current nomograms can only be created for regression algorithms. Providing nomograms for any machine learning (ML) algorithm may accelerate model deployment in clinical settings or improve model availability. We developed an R package and web…

  • Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks

    arXiv:2501.05930v1 Announce Type: cross Abstract: We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows. Distinct from the fixed-space global optimality of non-convex…

  • Deep Transfer $Q$-Learning for Offline Non-Stationary Reinforcement Learning

    arXiv:2501.04870v1 Announce Type: new Abstract: In dynamic decision-making scenarios across business and healthcare, leveraging sample trajectories from diverse populations can significantly enhance reinforcement learning (RL) performance for specific target populations, especially when sample sizes are limited. While existing transfer learning methods primarily focus on linear regression settings,…

  • RieszBoost: Gradient Boosting for Riesz Regression

    arXiv:2501.04871v1 Announce Type: new Abstract: Answering causal questions often involves estimating linear functionals of conditional expectations, such as the average treatment effect or the effect of a longitudinal modified treatment policy. By the Riesz representation theorem, these functionals can be expressed as the expected product of the conditional expectation…
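    A small worked instance of the Riesz statement, under assumptions not in the abstract: binary treatment D, outcome Y, and a known propensity score pi(X) = P(D = 1 | X). For the average treatment effect, the Riesz representer is alpha(D, X) = D / pi(X) - (1 - D) / (1 - pi(X)), so the ATE equals E[alpha(D, X) Y]. The sketch plugs in known propensities, which gives the inverse-propensity-weighted estimate; RieszBoost instead learns alpha from data by gradient boosting, which is not shown. Function names are hypothetical.

```python
from statistics import fmean

def riesz_representer_ate(d, pi):
    """Riesz representer of the ATE functional at one observation (d in {0, 1})."""
    return d / pi - (1 - d) / (1 - pi)

def ate_ipw(ds, ys, pis):
    """Sample analogue of E[alpha(D, X) * Y] using known propensities pis."""
    return fmean(riesz_representer_ate(d, p) * y for d, y, p in zip(ds, ys, pis))

# balanced toy data with constant propensity 0.5
print(ate_ipw(ds=[1, 0, 1, 0], ys=[3, 1, 5, 1], pis=[0.5] * 4))
```

    With constant propensity 0.5 and exactly half the units treated, the estimate reduces to the difference in group means (here 4 - 1 = 3).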

  • Towards understanding the bias in decision trees

    arXiv:2501.04903v1 Announce Type: new Abstract: There is a widespread and longstanding belief that machine learning models are biased towards the majority (or negative) class when learning from imbalanced data, leading them to neglect or ignore the minority (or positive) class. In this study, we show that this belief…

  • Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression

    arXiv:2501.04898v1 Announce Type: new Abstract: We provide a convergence analysis of deep feature instrumental variable (DFIV) regression (Xu et al., 2021), a nonparametric approach to IV regression using data-adaptive features learned by deep neural networks in two stages. We prove that the DFIV algorithm…

  • Non-asymptotic analysis of the performance of the penalized least trimmed squares in sparse models

    arXiv:2501.04946v1 Announce Type: new Abstract: The least trimmed squares (LTS) estimator is a renowned robust alternative to the classic least squares estimator and is popular in location, regression, machine learning, and AI literature. Many studies exist on LTS, including its robustness,…
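    To make the trimmed objective concrete, a sketch of least trimmed squares in its simplest location-only form. There are no regression covariates and no sparsity penalty, so this is not the penalized estimator the paper analyzes. LTS minimizes the sum of the h smallest squared residuals. In one dimension, the h points closest to any candidate center form a contiguous block of the sorted data, so scanning contiguous windows and taking each window's mean finds the exact minimizer. The function name is hypothetical.

```python
def lts_location(xs, h):
    """Univariate least trimmed squares location estimate.

    Minimizes, over theta, the sum of the h smallest values of (x_i - theta)^2.
    """
    if not 1 <= h <= len(xs):
        raise ValueError("h must satisfy 1 <= h <= len(xs)")
    s = sorted(xs)
    best_theta, best_obj = None, float("inf")
    for start in range(len(s) - h + 1):
        window = s[start:start + h]  # candidate h-subset: consecutive order statistics
        mean = sum(window) / h       # optimal theta for this subset
        obj = sum((x - mean) ** 2 for x in window)
        if obj < best_obj:
            best_theta, best_obj = mean, obj
    return best_theta

data = [1.0, 2.0, 3.0, 100.0, 200.0]   # two gross outliers
print(lts_location(data, h=3))          # ignores the outliers
print(sum(data) / len(data))            # ordinary mean is pulled far away
```

    Choosing h near n/2 maximizes robustness (breakdown point close to 50%), while h = n recovers the ordinary least squares mean.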

  • Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity

    arXiv:2501.04134v1 Announce Type: new Abstract: We study the mixing time of the projected Langevin algorithm (LA) and the privacy curve of noisy Stochastic Gradient Descent (SGD), beyond nonexpansive iterations. Specifically, we derive new mixing time bounds for the projected LA…
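    For reference, a minimal one-dimensional sketch of one common form of the projected Langevin update: a gradient step on a potential f, Gaussian noise scaled by sqrt(2 * step), then Euclidean projection back onto the constraint set. The interval constraint, the quadratic potential, and the step size are illustrative assumptions; the paper's mixing-time and privacy analyses are not reproduced, and the function name is hypothetical.

```python
import math
import random

def projected_langevin(grad_f, x0, lo, hi, step, n_steps, seed=0):
    """Projected Langevin algorithm on the interval [lo, hi].

    x_{k+1} = Proj_[lo, hi]( x_k - step * grad_f(x_k) + sqrt(2 * step) * xi_k ),
    with xi_k ~ N(0, 1) drawn independently at each step.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        proposal = x - step * grad_f(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        x = min(max(proposal, lo), hi)  # Euclidean projection onto the interval
        samples.append(x)
    return samples

# targets (approximately, for small step) a density proportional to exp(-x^2 / 2) on [0, 1]
chain = projected_langevin(lambda x: x, x0=0.5, lo=0.0, hi=1.0, step=0.01, n_steps=5000)
```

    Read as noisy projected gradient descent, the same iteration is the noisy SGD update whose privacy curve the abstract mentions, with exact gradients standing in for stochastic ones.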

  • Generation from Noisy Examples

    arXiv:2501.04179v1 Announce Type: new Abstract: We continue to study the learning-theoretic foundations of generation by extending the results from Kleinberg and Mullainathan [2024] and Li et al. [2024] to account for noisy example streams. In the noiseless setting of Kleinberg and Mullainathan [2024] and Li et al. [2024], an adversary picks…

  • Statistical Uncertainty Quantification for Aggregate Performance Metrics in Machine Learning Benchmarks

    arXiv:2501.04234v1 Announce Type: new Abstract: Modern artificial intelligence is supported by machine learning models (e.g., foundation models) that are pretrained on a massive data corpus and then adapted to solve a variety of downstream tasks. To summarize performance across multiple tasks, evaluation metrics are…

  • Circuit Complexity Bounds for Visual Autoregressive Model

    arXiv:2501.04299v1 Announce Type: new Abstract: Understanding the expressive ability of a specific model is essential for grasping its capacity limitations. Recently, several studies have established circuit complexity bounds for the Transformer architecture. In addition, the Visual AutoRegressive (VAR) model has emerged as a prominent method in the field of…

  • On weight and variance uncertainty in neural networks for regression tasks

    arXiv:2501.04272v1 Announce Type: new Abstract: We consider the problem of weight uncertainty proposed by [Blundell et al. (2015). Weight uncertainty in neural networks. In International Conference on Machine Learning, 1613-1622, PMLR.] in neural networks (NNs) specialized for regression tasks. We further investigate the effect…

  • Class-Balance Bias in Regularized Regression

    arXiv:2501.03821v1 Announce Type: new Abstract: Regularized models are often sensitive to the scales of the features in the data and it has therefore become standard practice to normalize (center and scale) the features before fitting the model. But there are many different ways to normalize the features and the choice…

  • Structure-Preference Enabled Graph Embedding Generation under Differential Privacy

    arXiv:2501.03451v1 Announce Type: new Abstract: Graph embedding generation techniques aim to learn low-dimensional vectors for each node in a graph and have recently gained increasing research attention. Publishing low-dimensional node vectors enables various graph analysis tasks, such as structural equivalence and link prediction. Yet, improper publication opens…

  • Coupled Hierarchical Structure Learning using Tree-Wasserstein Distance

    arXiv:2501.03627v1 Announce Type: cross Abstract: In many applications, both data samples and features have underlying hierarchical structures. However, existing methods for learning these latent structures typically focus on either samples or features, ignoring possible coupling between them. In this paper, we introduce a coupled hierarchical structure learning method…

  • Deep Networks are Reproducing Kernel Chains

    arXiv:2501.03697v1 Announce Type: cross Abstract: Identifying an appropriate function space for deep neural networks remains a key open question. While shallow neural networks are naturally associated with Reproducing Kernel Banach Spaces (RKBS), deep networks present unique challenges. In this work, we extend RKBS to chain RKBS (cRKBS), a new…

  • Symmetry and Generalisation in Machine Learning

    arXiv:2501.03858v1 Announce Type: cross Abstract: This work is about understanding the impact of invariance and equivariance on generalisation in supervised learning. We use the perspective afforded by an averaging operator to show that for any predictor that is not equivariant, there is an equivariant predictor with strictly lower test…

  • Modeling COVID-19 spread in the USA using metapopulation SIR models coupled with graph convolutional neural networks

    arXiv:2501.02043v1 Announce Type: new Abstract: Graph convolutional neural networks (GCNs) have shown tremendous promise in addressing data-intensive challenges in recent years. In particular, some attempts have been made to improve predictions of Susceptible-Infected-Recovered (SIR) models by incorporating human mobility…

  • Majorization-Minimization Dual Stagewise Algorithm for Generalized Lasso

    arXiv:2501.02197v1 Announce Type: new Abstract: The generalized lasso is a natural generalization of the celebrated lasso approach to handle structural regularization problems. Many important methods and applications fall into this framework, including fused lasso, clustered lasso, and constrained lasso. To elevate its effectiveness in large-scale problems, extensive research…

  • Beyond Log-Concavity and Score Regularity: Improved Convergence Bounds for Score-Based Generative Models in W2-distance

    arXiv:2501.02298v1 Announce Type: new Abstract: Score-based Generative Models (SGMs) aim to sample from a target distribution by learning score functions using samples perturbed by Gaussian noise. Existing convergence bounds for SGMs in the $\mathcal{W}_2$-distance rely on stringent assumptions about the data…

  • Robust Multi-Dimensional Scaling via Accelerated Alternating Projections

    arXiv:2501.02208v1 Announce Type: new Abstract: We consider the robust multi-dimensional scaling (RMDS) problem in this paper. The goal is to localize point locations from pairwise distances that may be corrupted by outliers. Inspired by classic MDS theories, and nonconvex works for the robust principal component analysis (RPCA) problem,…

  • Who Wrote This? Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities

    Who Wrote This? Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities arXiv:2501.02406v1 Announce Type: new Abstract: Verifying the provenance of content is crucial to the function of many organizations, e.g., educational institutions, social media platforms, firms, etc. This problem is becoming increasingly difficult as text generated by Large Language Models (LLMs)…

  • Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent

    Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent arXiv:2501.01696v1 Announce Type: new Abstract: Tensors, which give a faithful and effective representation to deliver the intrinsic structure of multi-dimensional data, play a crucial role in an increasing number of signal processing and machine learning problems. However, tensor data are often accompanied by arbitrary signal corruptions,…

  • Signal Recovery Using a Spiked Mixture Model

    Signal Recovery Using a Spiked Mixture Model arXiv:2501.01840v1 Announce Type: new Abstract: We introduce the spiked mixture model (SMM) to address the problem of estimating a set of signals from many randomly scaled and noisy observations. Subsequently, we design a novel expectation-maximization (EM) algorithm to recover all parameters of the SMM. Numerical experiments show that…

  • Unified Native Spaces in Kernel Methods

    Unified Native Spaces in Kernel Methods arXiv:2501.01825v1 Announce Type: new Abstract: There exists a plethora of parametric models for positive definite kernels, and their use is ubiquitous in disciplines as diverse as statistics, machine learning, numerical analysis, and approximation theory. Usually, the kernel parameters index certain features of an associated process. Amongst those features, smoothness…

  • Transfer Neyman-Pearson Algorithm for Outlier Detection

    Transfer Neyman-Pearson Algorithm for Outlier Detection arXiv:2501.01525v1 Announce Type: cross Abstract: We consider the problem of transfer learning in outlier detection where target abnormal data is rare. While transfer learning has been considered extensively in traditional balanced classification, the problem of transfer in outlier detection and more generally in imbalanced classification settings has received less…

  • Many of Your DPOs are Secretly One: Attempting Unification Through Mutual Information

    Many of Your DPOs are Secretly One: Attempting Unification Through Mutual Information arXiv:2501.01544v1 Announce Type: cross Abstract: Post-alignment of large language models (LLMs) is critical in improving their utility, safety, and alignment with human intentions. Direct preference optimisation (DPO) has become one of the most widely used algorithms for achieving this alignment, given its ability…
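    For reference, the standard DPO objective that the many variants start from fits in a few lines. This is a sketch of the commonly used form only, not the paper's mutual-information formulation; the argument names, `beta`, and the example log-probabilities are hypothetical.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss averaged over a batch of preference pairs.

    Inputs are sequence log-probabilities log pi(y|x) under the trained policy and
    the frozen reference model, for the preferred ("chosen") and dispreferred
    ("rejected") responses.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) == log(1 + exp(-m)), computed stably with logaddexp
    return float(np.mean(np.logaddexp(0.0, -margin)))

# Hypothetical per-sequence log-probabilities for a batch of two preference pairs.
policy_w = np.array([-12.0, -8.0])
policy_l = np.array([-15.0, -9.0])
ref_w = np.array([-13.0, -8.5])
ref_l = np.array([-14.0, -8.5])
loss = dpo_loss(policy_w, policy_l, ref_w, ref_l)
```

    When the policy matches the reference model on both responses the margin is zero and the loss equals log 2; it falls below log 2 as the policy shifts probability toward the chosen response.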

  • Post Launch Evaluation of Policies in a High-Dimensional Setting

    Post Launch Evaluation of Policies in a High-Dimensional Setting arXiv:2501.00119v1 Announce Type: new Abstract: A/B tests, also known as randomized controlled trials (RCTs), are the gold standard for evaluating the impact of new policies, products, or decisions. However, these tests can be costly in terms of time and resources, potentially exposing users, customers, or other…

  • Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems

    Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems arXiv:2501.00277v1 Announce Type: new Abstract: Modern AI algorithms require labeled data. In the real world, the majority of data are unlabeled, and labeling the data is costly. This is particularly true for some areas requiring special skills, such as reading radiology images by physicians. To…

  • Different thresholding methods on Nearest Shrunken Centroid algorithm

    Different thresholding methods on Nearest Shrunken Centroid algorithm arXiv:2501.00632v1 Announce Type: new Abstract: This article considers the impact of different thresholding methods on the Nearest Shrunken Centroid algorithm, which is popularly referred to as the Prediction Analysis of Microarrays (PAM) for high-dimensional classification. PAM uses soft thresholding to achieve high computational efficiency and high classification accuracy…
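    In PAM, each class centroid's standardized deviation d from the overall centroid is shrunk toward zero, and a feature whose deviation is zeroed for every class drops out of the classifier. A minimal sketch of the soft-thresholding rule, with hard thresholding shown as one possible alternative; the abstract does not list which alternatives the paper studies, and the function names and threshold `delta` are illustrative.

```python
import numpy as np

def soft_threshold(d, delta):
    """Soft thresholding (as in PAM): shrink every value toward zero by delta."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def hard_threshold(d, delta):
    """Hard thresholding: keep values with |d| > delta unchanged, zero the rest."""
    return np.where(np.abs(d) > delta, d, 0.0)

# Hypothetical standardized centroid deviations for four features of one class.
d = np.array([-2.0, -0.5, 0.3, 1.5])
d_soft = soft_threshold(d, 1.0)   # shrinks to -1, 0, 0, 0.5
d_hard = hard_threshold(d, 1.0)   # keeps -2, 0, 0, 1.5
```

    Both rules select the same features for a given delta; they differ in whether the surviving deviations are also shrunk, which changes the resulting centroids and hence the classifier.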

  • A Distributional Evaluation of Generative Image Models

    A Distributional Evaluation of Generative Image Models arXiv:2501.00744v1 Announce Type: new Abstract: Generative models are ubiquitous in modern artificial intelligence (AI) applications. Recent advances have led to a variety of generative modeling approaches that are capable of synthesizing highly realistic samples. Despite these developments, evaluating the distributional match between the synthetic samples and the target…

  • Ensuring superior learning outcomes and data security for authorized learner

    Ensuring superior learning outcomes and data security for authorized learner arXiv:2501.00754v1 Announce Type: new Abstract: The learner’s ability to generate a hypothesis that closely approximates the target function is crucial in machine learning. Achieving this requires sufficient data; however, unauthorized access by an eavesdropping learner can lead to security risks. Thus, it is important to…

  • Surrogate Modeling for Explainable Predictive Time Series Corrections

    Surrogate Modeling for Explainable Predictive Time Series Corrections arXiv:2412.19897v1 Announce Type: new Abstract: We introduce a local surrogate approach for explainable time-series forecasting. An initially non-interpretable predictive model is used to improve the forecast of a classical time-series ‘base model’. ‘Explainability’ of the correction is provided by fitting the base model again to the data…

  • Confidence Interval Construction and Conditional Variance Estimation with Dense ReLU Networks

    Confidence Interval Construction and Conditional Variance Estimation with Dense ReLU Networks arXiv:2412.20355v1 Announce Type: new Abstract: This paper addresses the problems of conditional variance estimation and confidence interval construction in nonparametric regression using dense networks with the Rectified Linear Unit (ReLU) activation function. We present a residual-based framework for conditional variance estimation, deriving nonasymptotic bounds…

  • Deep Generalized Schrödinger Bridges: From Image Generation to Solving Mean-Field Games

    Deep Generalized Schrödinger Bridges: From Image Generation to Solving Mean-Field Games arXiv:2412.20279v1 Announce Type: new Abstract: Generalized Schrödinger Bridges (GSBs) are a fundamental mathematical framework used to analyze the most likely particle evolution based on the principle of least action including kinetic and potential energy. In parallel to their well-established presence in the theoretical realms…

  • Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces

    Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces arXiv:2412.20556v1 Announce Type: new Abstract: We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous, leading to significant computational challenges due to the infinite-dimensional nature of the optimization problem. Recent research has explored learning the worst-case distribution using…

  • Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models

    Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models arXiv:2412.20586v1 Announce Type: new Abstract: Contaminant observations and outliers often cause problems when estimating the parameters of cognitive models, which are statistical models representing cognitive processes. In this study, we test and improve the robustness of parameter estimation using amortized Bayesian inference (ABI)…

  • Neural Networks Perform Sufficient Dimension Reduction

    Neural Networks Perform Sufficient Dimension Reduction arXiv:2412.19033v1 Announce Type: new Abstract: This paper investigates the connection between neural networks and sufficient dimension reduction (SDR), demonstrating that neural networks inherently perform SDR in regression tasks under appropriate rank regularizations. Specifically, the weights in the first layer span the central mean subspace. We establish the statistical consistency…

  • Adaptive Conformal Inference by Betting

    Adaptive Conformal Inference by Betting arXiv:2412.19318v1 Announce Type: new Abstract: Conformal prediction is a valuable tool for quantifying predictive uncertainty of machine learning models. However, its applicability relies on the assumption of data exchangeability, a condition which is often not met in real-world scenarios. In this paper, we consider the problem of adaptive conformal inference…
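    The paper's method is based on betting; as a point of comparison, the sketch below shows the original adaptive conformal inference (ACI) update of Gibbs and Candès (2021), which adjusts the working level alpha_t online after each hit or miss. The absolute-residual score, the step size `gamma`, the warm-up length, and the simulated shift are assumptions chosen for illustration.

```python
import numpy as np

def adaptive_conformal(y, y_hat, alpha=0.1, gamma=0.01, warmup=50):
    """Online conformal intervals with the ACI level update.

    Scores are absolute residuals |y - y_hat|; the interval at time t is
    y_hat[t] +/- the (1 - alpha_t) empirical quantile of past scores.
    Returns the miscoverage indicators for t >= warmup.
    """
    alpha_t = alpha
    scores, errs = [], []
    for t in range(len(y)):
        s = abs(y[t] - y_hat[t])
        if t >= warmup:
            if alpha_t <= 0:
                err = 0.0                  # alpha_t <= 0: interval is the whole real line
            elif alpha_t >= 1:
                err = 1.0                  # alpha_t >= 1: interval is empty
            else:
                q = np.quantile(scores, 1.0 - alpha_t)
                err = float(s > q)         # 1 if y[t] falls outside the interval
            alpha_t += gamma * (alpha - err)   # widen after a miss, narrow after a hit
            errs.append(err)
        scores.append(s)
    return np.array(errs)

rng = np.random.default_rng(1)
n = 2000
scale = np.where(np.arange(n) < 1000, 1.0, 3.0)   # noise level triples halfway: a distribution shift
y = rng.normal(scale=scale)
y_hat = np.zeros(n)                                # a fixed predictor that ignores the shift
errs = adaptive_conformal(y, y_hat, alpha=0.1, gamma=0.01, warmup=50)
```

    Because alpha_t stays within [-gamma, 1 + gamma], the long-run miscoverage rate is guaranteed to lie within (max(alpha, 1 - alpha) + gamma) / (gamma T) of the target alpha for any data sequence, exchangeable or not.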

  • Localized exploration in contextual dynamic pricing achieves dimension-free regret

    Localized exploration in contextual dynamic pricing achieves dimension-free regret arXiv:2412.19252v1 Announce Type: new Abstract: We study the problem of contextual dynamic pricing with a linear demand model. We propose a novel localized exploration-then-commit (LetC) algorithm which starts with a pure exploration stage, followed by a refinement stage that explores near the learned optimal pricing policy,…

  • Asymptotically Optimal Search for a Change Point Anomaly under a Composite Hypothesis Model

    Asymptotically Optimal Search for a Change Point Anomaly under a Composite Hypothesis Model arXiv:2412.19392v1 Announce Type: new Abstract: We address the problem of searching for a change point in an anomalous process among a finite set of M processes. Specifically, we address a composite hypothesis model in which each process generates measurements following a common…

  • Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback

    Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback arXiv:2412.19436v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) has become a cornerstone for aligning large language models with human preferences. However, the heterogeneity of human feedback, driven by diverse individual contexts and preferences, poses significant challenges for reward learning. To address this, we propose…

  • Data-Driven Priors in the Maximum Entropy on the Mean Method for Linear Inverse Problems

    Data-Driven Priors in the Maximum Entropy on the Mean Method for Linear Inverse Problems arXiv:2412.17916v1 Announce Type: new Abstract: We establish the theoretical framework for implementing the maximum entropy on the mean (MEM) method for linear inverse problems in the setting of approximate (data-driven) priors. We prove a.s. convergence for empirical means and further develop…

  • An information theoretic limit to data amplification

    An information theoretic limit to data amplification arXiv:2412.18041v1 Announce Type: new Abstract: In recent years generative artificial intelligence has been used to create data to support science analysis. For example, Generative Adversarial Networks (GANs) have been trained using Monte Carlo simulated input and then used to generate data for the same problem. This has the…

  • Fréchet regression for multi-label feature selection with implicit regularization

    Fréchet regression for multi-label feature selection with implicit regularization arXiv:2412.18247v1 Announce Type: new Abstract: Fréchet regression extends linear regression to model complex responses in metric spaces, making it particularly relevant for multi-label regression, where each instance can have multiple associated labels. However, variable selection within this framework remains underexplored. In this paper, we propose…

  • Heterogeneous transfer learning for high dimensional regression with feature mismatch

    Heterogeneous transfer learning for high dimensional regression with feature mismatch arXiv:2412.18081v1 Announce Type: new Abstract: We consider the problem of transferring knowledge from a source, or proxy, domain to a new target domain for learning a high-dimensional regression model with possibly different features. Recently, the statistical properties of homogeneous transfer learning have been investigated. However,…

  • A Statistical Framework for Ranking LLM-Based Chatbots

    A Statistical Framework for Ranking LLM-Based Chatbots arXiv:2412.18407v1 Announce Type: new Abstract: Large language models (LLMs) have transformed natural language processing, with frameworks like Chatbot Arena providing pioneering platforms for evaluating these models. By facilitating millions of pairwise comparisons based on human judgments, Chatbot Arena has become a cornerstone in LLM evaluation, offering rich datasets…
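    Arena-style data are pairwise preferences between models, and a standard baseline for turning them into a ranking is the Bradley-Terry model, fitted here with the classical MM iteration analysed by Hunter (2004). This is a sketch of that baseline only, not the paper's framework; ties are ignored, every model is assumed to have at least one win, and the win counts are hypothetical.

```python
import numpy as np

def bradley_terry(wins, n_iter=500):
    """Fit Bradley-Terry strengths by MM updates.

    wins[i, j] = number of comparisons in which model i was preferred over model j.
    P(i beats j) is modelled as p[i] / (p[i] + p[j]); returns p normalized to sum to 1.
    """
    n_games = wins + wins.T                 # total comparisons for each pair (zero diagonal)
    total_wins = wins.sum(axis=1)
    p = np.full(wins.shape[0], 1.0 / wins.shape[0])
    for _ in range(n_iter):
        denom = (n_games / (p[:, None] + p[None, :])).sum(axis=1)
        p = total_wins / denom
        p /= p.sum()                        # strengths are identified only up to scale
    return p

# Hypothetical win counts for three chatbots (rows beat columns).
wins = np.array([[0, 30, 35],
                 [20, 0, 28],
                 [15, 22, 0]], dtype=float)
p = bradley_terry(wins)
ranking = np.argsort(-p)                    # indices from strongest to weakest
```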

  • Robust random graph matching in dense graphs via vector approximate message passing

    Robust random graph matching in dense graphs via vector approximate message passing arXiv:2412.16457v1 Announce Type: new Abstract: In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation…

  • Fast Multi-Group Gaussian Process Factor Models

    Fast Multi-Group Gaussian Process Factor Models arXiv:2412.16773v1 Announce Type: new Abstract: Gaussian processes are now commonly used in dimensionality reduction approaches tailored to neuroscience, especially to describe changes in high-dimensional neural activity over time. As recording capabilities expand to include neuronal populations across multiple brain areas, cortical layers, and cell types, interest in extending Gaussian…

  • Integrating Random Effects in Variational Autoencoders for Dimensionality Reduction of Correlated Data

    Integrating Random Effects in Variational Autoencoders for Dimensionality Reduction of Correlated Data arXiv:2412.16899v1 Announce Type: new Abstract: Variational Autoencoders (VAEs) are widely used for dimensionality reduction of large-scale tabular and image datasets, under the assumption of independence between data observations. In practice, however, datasets are often correlated, with typical sources of correlation including spatial, temporal…

  • Gradient-Based Non-Linear Inverse Learning

    Gradient-Based Non-Linear Inverse Learning arXiv:2412.16794v1 Announce Type: new Abstract: We study statistical inverse learning in the context of nonlinear inverse problems under random design. Specifically, we address a class of nonlinear problems by employing gradient descent (GD) and stochastic gradient descent (SGD) with mini-batching, both using constant step sizes. Our analysis derives convergence rates for…

  • Learning from Summarized Data: Gaussian Process Regression with Sample Quasi-Likelihood

    Learning from Summarized Data: Gaussian Process Regression with Sample Quasi-Likelihood arXiv:2412.17455v1 Announce Type: new Abstract: Gaussian process regression is a powerful Bayesian nonlinear regression method. Recent research has enabled the capture of many types of observations using non-Gaussian likelihoods. To deal with various tasks in spatial modeling, we benefit from this development. Difficulties still arise…

  • Enhancing Masked Time-Series Modeling via Dropping Patches

    Enhancing Masked Time-Series Modeling via Dropping Patches arXiv:2412.15315v1 Announce Type: new Abstract: This paper explores how to enhance existing masked time-series modeling by randomly dropping sub-sequence level patches of time series. On this basis, a simple yet effective method named DropPatch is proposed, which has two remarkable advantages: 1) It improves the pre-training efficiency by…

  • Deep learning joint extremes of metocean variables using the SPAR model

    Deep learning joint extremes of metocean variables using the SPAR model arXiv:2412.15808v1 Announce Type: new Abstract: This paper presents a novel deep learning framework for estimating multivariate joint extremes of metocean variables, based on the Semi-Parametric Angular-Radial (SPAR) model. When considered in polar coordinates, the problem of modelling multivariate extremes is transformed to one of…

  • Using matrix-product states for time-series machine learning

    Using matrix-product states for time-series machine learning arXiv:2412.15826v1 Announce Type: new Abstract: Matrix-product states (MPS) have proven to be a versatile ansatz for modeling quantum many-body physics. For many applications, and particularly in one dimension, they capture relevant quantum correlations in many-body wavefunctions while remaining tractable to store and manipulate on a classical computer. This has…

  • On Robust Cross Domain Alignment

    On Robust Cross Domain Alignment arXiv:2412.15861v1 Announce Type: new Abstract: The Gromov-Wasserstein (GW) distance is an effective measure of alignment between distributions supported on distinct ambient spaces. Calculating essentially the mutual departure from isometry, it has found vast usage in domain translation and network analysis. It has long been shown to be vulnerable to contamination…
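    Because GW compares distributions through their intra-domain distance matrices, a coupling scores zero exactly when it matches the two geometries isometrically. The sketch below only evaluates the squared-loss GW objective for a given coupling; it does not optimize the coupling or implement the robust variant studied in the paper, and the helper names are illustrative.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix of the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def gw_objective(C1, C2, T):
    """sum_{i,j,k,l} (C1[i,k] - C2[j,l])**2 * T[i,j] * T[k,l] for a coupling T."""
    # L[i, j, k, l] = (C1[i, k] - C2[j, l])**2 via broadcasting: O(n^2 m^2) memory
    L = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2
    return float(np.einsum("ijkl,ij,kl->", L, T, T))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
Y = X @ R.T                               # a rotated copy: an isometry of X
C1, C2 = pairwise_dist(X), pairwise_dist(Y)
T_id = np.eye(5) / 5                      # couples each point with its own image
T_unif = np.full((5, 5), 1 / 25)          # independent (product) coupling
```

    The two point clouds live in the same plane here only for convenience; the objective uses C1 and C2 alone, so X and Y could just as well live in spaces of different dimension.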

  • Learning sparsity-promoting regularizers for linear inverse problems

    Learning sparsity-promoting regularizers for linear inverse problems arXiv:2412.16031v1 Announce Type: new Abstract: This paper introduces a novel approach to learning sparsity-promoting regularizers for solving linear inverse problems. We develop a bilevel optimization framework to select an optimal synthesis operator, denoted as $B$, which regularizes the inverse problem while promoting sparsity in the solution. The method…
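    Once a synthesis operator B is fixed, the lower-level problem takes the familiar form of finding sparse coefficients z with x = Bz. A minimal sketch that solves it with ISTA (proximal gradient with soft thresholding), assuming a least-squares data term with forward operator A and an l1 penalty weight `lam`; the paper's bilevel learning of B is not shown, and B = I is only a placeholder.

```python
import numpy as np

def ista_synthesis(A, B, y, lam=0.1, n_iter=500):
    """Minimize 0.5*||A B z - y||^2 + lam*||z||_1 by ISTA; return (x = B z, z)."""
    M = A @ B
    step = 1.0 / np.linalg.norm(M, 2) ** 2      # 1/L, L = Lipschitz constant of the gradient
    z = np.zeros(B.shape[1])
    for _ in range(n_iter):
        v = z - step * (M.T @ (M @ z - y))       # gradient step on the data term
        z = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)  # prox of the l1 term
    return B @ z, z

def objective(A, B, y, z, lam):
    r = A @ B @ z - y
    return 0.5 * r @ r + lam * np.abs(z).sum()

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 50)) / np.sqrt(30)     # hypothetical under-determined forward operator
B = np.eye(50)                                  # placeholder synthesis operator
z_true = np.zeros(50)
z_true[[3, 17, 41]] = [1.0, -2.0, 1.5]
y = A @ B @ z_true + 0.01 * rng.normal(size=30)
x_hat, z_hat = ista_synthesis(A, B, y, lam=0.05)
```

    With a step size of 1/L, ISTA never increases the objective, so any learned B can be evaluated by plugging it into the same solver.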