Category: cs.LG
-
rmlnomogram: An R package to construct an explainable nomogram for any machine learning algorithms
rmlnomogram: An R package to construct an explainable nomogram for any machine learning algorithms arXiv:2501.05772v1 Announce Type: cross Abstract: Background: Current nomograms can only be created for regression algorithms. Providing nomograms for any machine learning (ML) algorithm may accelerate model deployment in clinical settings or improve model availability. We developed an R package and web…
-
Deep Transfer $Q$-Learning for Offline Non-Stationary Reinforcement Learning
Deep Transfer $Q$-Learning for Offline Non-Stationary Reinforcement Learning arXiv:2501.04870v1 Announce Type: new Abstract: In dynamic decision-making scenarios across business and healthcare, leveraging sample trajectories from diverse populations can significantly enhance reinforcement learning (RL) performance for specific target populations, especially when sample sizes are limited. While existing transfer learning methods primarily focus on linear regression settings,…
-
RieszBoost: Gradient Boosting for Riesz Regression
RieszBoost: Gradient Boosting for Riesz Regression arXiv:2501.04871v1 Announce Type: new Abstract: Answering causal questions often involves estimating linear functionals of conditional expectations, such as the average treatment effect or the effect of a longitudinal modified treatment policy. By the Riesz representation theorem, these functionals can be expressed as the expected product of the conditional expectation…
-
Towards understanding the bias in decision trees
Towards understanding the bias in decision trees arXiv:2501.04903v1 Announce Type: new Abstract: There is a widespread and longstanding belief that machine learning models are biased towards the majority (or negative) class when learning from imbalanced data, leading them to neglect or ignore the minority (or positive) class. In this study, we show that this belief…
-
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression arXiv:2501.04898v1 Announce Type: new Abstract: We provide a convergence analysis of deep feature instrumental variable (DFIV) regression (Xu et al., 2021), a nonparametric approach to IV regression using data-adaptive features learned by deep neural networks in two stages. We prove that the DFIV algorithm…
-
Non-asymptotic analysis of the performance of the penalized least trimmed squares in sparse models
Non-asymptotic analysis of the performance of the penalized least trimmed squares in sparse models arXiv:2501.04946v1 Announce Type: new Abstract: The least trimmed squares (LTS) estimator is a renowned robust alternative to the classic least squares estimator and is popular in location, regression, machine learning, and AI literature. Many studies exist on LTS, including its robustness,…
-
Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity
Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity arXiv:2501.04134v1 Announce Type: new Abstract: We study the mixing time of the projected Langevin algorithm (LA) and the privacy curve of noisy Stochastic Gradient Descent (SGD), beyond nonexpansive iterations. Specifically, we derive new mixing time bounds for the projected LA…
-
Generation from Noisy Examples
Generation from Noisy Examples arXiv:2501.04179v1 Announce Type: new Abstract: We continue to study the learning-theoretic foundations of generation by extending the results from Kleinberg and Mullainathan [2024] and Li et al. [2024] to account for noisy example streams. In the noiseless setting of Kleinberg and Mullainathan [2024] and Li et al. [2024], an adversary picks…
-
Statistical Uncertainty Quantification for Aggregate Performance Metrics in Machine Learning Benchmarks
Statistical Uncertainty Quantification for Aggregate Performance Metrics in Machine Learning Benchmarks arXiv:2501.04234v1 Announce Type: new Abstract: Modern artificial intelligence is supported by machine learning models (e.g., foundation models) that are pretrained on a massive data corpus and then adapted to solve a variety of downstream tasks. To summarize performance across multiple tasks, evaluation metrics are…
-
Circuit Complexity Bounds for Visual Autoregressive Model
Circuit Complexity Bounds for Visual Autoregressive Model arXiv:2501.04299v1 Announce Type: new Abstract: Understanding the expressive ability of a specific model is essential for grasping its capacity limitations. Recently, several studies have established circuit complexity bounds for Transformer architecture. Besides, the Visual AutoRegressive (VAR) model has risen to be a prominent method in the field of…
-
On weight and variance uncertainty in neural networks for regression tasks
On weight and variance uncertainty in neural networks for regression tasks arXiv:2501.04272v1 Announce Type: new Abstract: We consider the problem of weight uncertainty proposed by [Blundell et al. (2015). Weight uncertainty in neural network. In International conference on machine learning, 1613-1622, PMLR.] in neural networks (NNs) specialized for regression tasks. We further investigate the effect…
-
Class-Balance Bias in Regularized Regression
Class-Balance Bias in Regularized Regression arXiv:2501.03821v1 Announce Type: new Abstract: Regularized models are often sensitive to the scales of the features in the data and it has therefore become standard practice to normalize (center and scale) the features before fitting the model. But there are many different ways to normalize the features and the choice…
-
Coupled Hierarchical Structure Learning using Tree-Wasserstein Distance
Coupled Hierarchical Structure Learning using Tree-Wasserstein Distance arXiv:2501.03627v1 Announce Type: cross Abstract: In many applications, both data samples and features have underlying hierarchical structures. However, existing methods for learning these latent structures typically focus on either samples or features, ignoring possible coupling between them. In this paper, we introduce a coupled hierarchical structure learning method…
-
Deep Networks are Reproducing Kernel Chains
Deep Networks are Reproducing Kernel Chains arXiv:2501.03697v1 Announce Type: cross Abstract: Identifying an appropriate function space for deep neural networks remains a key open question. While shallow neural networks are naturally associated with Reproducing Kernel Banach Spaces (RKBS), deep networks present unique challenges. In this work, we extend RKBS to chain RKBS (cRKBS), a new…
-
Symmetry and Generalisation in Machine Learning
Symmetry and Generalisation in Machine Learning arXiv:2501.03858v1 Announce Type: cross Abstract: This work is about understanding the impact of invariance and equivariance on generalisation in supervised learning. We use the perspective afforded by an averaging operator to show that for any predictor that is not equivariant, there is an equivariant predictor with strictly lower test…
-
Modeling COVID-19 spread in the USA using metapopulation SIR models coupled with graph convolutional neural networks
Modeling COVID-19 spread in the USA using metapopulation SIR models coupled with graph convolutional neural networks arXiv:2501.02043v1 Announce Type: new Abstract: Graph convolutional neural networks (GCNs) have shown tremendous promise in addressing data-intensive challenges in recent years. In particular, some attempts have been made to improve predictions of Susceptible-Infected-Recovered (SIR) models by incorporating human mobility…
-
Majorization-Minimization Dual Stagewise Algorithm for Generalized Lasso
Majorization-Minimization Dual Stagewise Algorithm for Generalized Lasso arXiv:2501.02197v1 Announce Type: new Abstract: The generalized lasso is a natural generalization of the celebrated lasso approach to handle structural regularization problems. Many important methods and applications fall into this framework, including fused lasso, clustered lasso, and constrained lasso. To elevate its effectiveness in large-scale problems, extensive research…
-
Beyond Log-Concavity and Score Regularity: Improved Convergence Bounds for Score-Based Generative Models in W2-distance
Beyond Log-Concavity and Score Regularity: Improved Convergence Bounds for Score-Based Generative Models in W2-distance arXiv:2501.02298v1 Announce Type: new Abstract: Score-based Generative Models (SGMs) aim to sample from a target distribution by learning score functions using samples perturbed by Gaussian noise. Existing convergence bounds for SGMs in the $\mathcal{W}_2$-distance rely on stringent assumptions about the data…
-
Robust Multi-Dimensional Scaling via Accelerated Alternating Projections
Robust Multi-Dimensional Scaling via Accelerated Alternating Projections arXiv:2501.02208v1 Announce Type: new Abstract: We consider the robust multi-dimensional scaling (RMDS) problem in this paper. The goal is to localize point locations from pairwise distances that may be corrupted by outliers. Inspired by classic MDS theories, and nonconvex works for the robust principal component analysis (RPCA) problem,…
-
Who Wrote This? Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities
Who Wrote This? Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities arXiv:2501.02406v1 Announce Type: new Abstract: Verifying the provenance of content is crucial to the function of many organizations, e.g., educational institutions, social media platforms, firms, etc. This problem is becoming increasingly difficult as text generated by Large Language Models (LLMs)…
-
Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent
Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent arXiv:2501.01696v1 Announce Type: new Abstract: Tensors, which give a faithful and effective representation to deliver the intrinsic structure of multi-dimensional data, play a crucial role in an increasing number of signal processing and machine learning problems. However, tensor data are often accompanied by arbitrary signal corruptions,…
-
Signal Recovery Using a Spiked Mixture Model
Signal Recovery Using a Spiked Mixture Model arXiv:2501.01840v1 Announce Type: new Abstract: We introduce the spiked mixture model (SMM) to address the problem of estimating a set of signals from many randomly scaled and noisy observations. Subsequently, we design a novel expectation-maximization (EM) algorithm to recover all parameters of the SMM. Numerical experiments show that…
-
Unified Native Spaces in Kernel Methods
Unified Native Spaces in Kernel Methods arXiv:2501.01825v1 Announce Type: new Abstract: There exists a plethora of parametric models for positive definite kernels, and their use is ubiquitous in disciplines as diverse as statistics, machine learning, numerical analysis, and approximation theory. Usually, the kernel parameters index certain features of an associated process. Amongst those features, smoothness…
-
Transfer Neyman-Pearson Algorithm for Outlier Detection
Transfer Neyman-Pearson Algorithm for Outlier Detection arXiv:2501.01525v1 Announce Type: cross Abstract: We consider the problem of transfer learning in outlier detection where target abnormal data is rare. While transfer learning has been considered extensively in traditional balanced classification, the problem of transfer in outlier detection and more generally in imbalanced classification settings has received less…
-
Many of Your DPOs are Secretly One: Attempting Unification Through Mutual Information
Many of Your DPOs are Secretly One: Attempting Unification Through Mutual Information arXiv:2501.01544v1 Announce Type: cross Abstract: Post-alignment of large language models (LLMs) is critical in improving their utility, safety, and alignment with human intentions. Direct preference optimisation (DPO) has become one of the most widely used algorithms for achieving this alignment, given its ability…
-
Post Launch Evaluation of Policies in a High-Dimensional Setting
Post Launch Evaluation of Policies in a High-Dimensional Setting arXiv:2501.00119v1 Announce Type: new Abstract: A/B tests, also known as randomized controlled experiments (RCTs), are the gold standard for evaluating the impact of new policies, products, or decisions. However, these tests can be costly in terms of time and resources, potentially exposing users, customers, or other…
-
Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems
Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems arXiv:2501.00277v1 Announce Type: new Abstract: Modern AI algorithms require labeled data. In the real world, the majority of data are unlabeled. Labeling the data is costly. This is particularly true for some areas requiring special skills, such as reading radiology images by physicians. To…
-
Different thresholding methods on Nearest Shrunken Centroid algorithm
Different thresholding methods on Nearest Shrunken Centroid algorithm arXiv:2501.00632v1 Announce Type: new Abstract: This article considers the impact of different thresholding methods on the Nearest Shrunken Centroid algorithm, which is popularly referred to as the Prediction Analysis of Microarrays (PAM) for high-dimensional classification. PAM uses soft thresholding to achieve high computational efficiency and high classification accuracy…
-
A Distributional Evaluation of Generative Image Models
A Distributional Evaluation of Generative Image Models arXiv:2501.00744v1 Announce Type: new Abstract: Generative models are ubiquitous in modern artificial intelligence (AI) applications. Recent advances have led to a variety of generative modeling approaches that are capable of synthesizing highly realistic samples. Despite these developments, evaluating the distributional match between the synthetic samples and the target…
-
Surrogate Modeling for Explainable Predictive Time Series Corrections
Surrogate Modeling for Explainable Predictive Time Series Corrections arXiv:2412.19897v1 Announce Type: new Abstract: We introduce a local surrogate approach for explainable time-series forecasting. An initially non-interpretable predictive model to improve the forecast of a classical time-series ‘base model’ is used. ‘Explainability’ of the correction is provided by fitting the base model again to the data…
-
Confidence Interval Construction and Conditional Variance Estimation with Dense ReLU Networks
Confidence Interval Construction and Conditional Variance Estimation with Dense ReLU Networks arXiv:2412.20355v1 Announce Type: new Abstract: This paper addresses the problems of conditional variance estimation and confidence interval construction in nonparametric regression using dense networks with the Rectified Linear Unit (ReLU) activation function. We present a residual-based framework for conditional variance estimation, deriving nonasymptotic bounds…
-
Deep Generalized Schrödinger Bridges: From Image Generation to Solving Mean-Field Games
Deep Generalized Schrödinger Bridges: From Image Generation to Solving Mean-Field Games arXiv:2412.20279v1 Announce Type: new Abstract: Generalized Schrödinger Bridges (GSBs) are a fundamental mathematical framework used to analyze the most likely particle evolution based on the principle of least action including kinetic and potential energy. In parallel to their well-established presence in the theoretical realms…
-
Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces
Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces arXiv:2412.20556v1 Announce Type: new Abstract: We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous, leading to significant computational challenges due to the infinite-dimensional nature of the optimization problem. Recent research has explored learning the worst-case distribution using…
-
Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models
Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models arXiv:2412.20586v1 Announce Type: new Abstract: Contaminant observations and outliers often cause problems when estimating the parameters of cognitive models, which are statistical models representing cognitive processes. In this study, we test and improve the robustness of parameter estimation using amortized Bayesian inference (ABI)…
-
Neural Networks Perform Sufficient Dimension Reduction
Neural Networks Perform Sufficient Dimension Reduction arXiv:2412.19033v1 Announce Type: new Abstract: This paper investigates the connection between neural networks and sufficient dimension reduction (SDR), demonstrating that neural networks inherently perform SDR in regression tasks under appropriate rank regularizations. Specifically, the weights in the first layer span the central mean subspace. We establish the statistical consistency…
-
Adaptive Conformal Inference by Betting
Adaptive Conformal Inference by Betting arXiv:2412.19318v1 Announce Type: new Abstract: Conformal prediction is a valuable tool for quantifying predictive uncertainty of machine learning models. However, its applicability relies on the assumption of data exchangeability, a condition which is often not met in real-world scenarios. In this paper, we consider the problem of adaptive conformal inference…
-
Localized exploration in contextual dynamic pricing achieves dimension-free regret
Localized exploration in contextual dynamic pricing achieves dimension-free regret arXiv:2412.19252v1 Announce Type: new Abstract: We study the problem of contextual dynamic pricing with a linear demand model. We propose a novel localized exploration-then-commit (LetC) algorithm which starts with a pure exploration stage, followed by a refinement stage that explores near the learned optimal pricing policy,…
-
Asymptotically Optimal Search for a Change Point Anomaly under a Composite Hypothesis Model
Asymptotically Optimal Search for a Change Point Anomaly under a Composite Hypothesis Model arXiv:2412.19392v1 Announce Type: new Abstract: We address the problem of searching for a change point in an anomalous process among a finite set of M processes. Specifically, we address a composite hypothesis model in which each process generates measurements following a common…
-
Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback
Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback arXiv:2412.19436v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) has become a cornerstone for aligning large language models with human preferences. However, the heterogeneity of human feedback, driven by diverse individual contexts and preferences, poses significant challenges for reward learning. To address this, we propose…
-
Data-Driven Priors in the Maximum Entropy on the Mean Method for Linear Inverse Problems
Data-Driven Priors in the Maximum Entropy on the Mean Method for Linear Inverse Problems arXiv:2412.17916v1 Announce Type: new Abstract: We establish the theoretical framework for implementing the maximum entropy on the mean (MEM) method for linear inverse problems in the setting of approximate (data-driven) priors. We prove a.s. convergence for empirical means and further develop…
-
An information theoretic limit to data amplification
An information theoretic limit to data amplification arXiv:2412.18041v1 Announce Type: new Abstract: In recent years generative artificial intelligence has been used to create data to support science analysis. For example, Generative Adversarial Networks (GANs) have been trained using Monte Carlo simulated input and then used to generate data for the same problem. This has the…
-
Fréchet regression for multi-label feature selection with implicit regularization
Fréchet regression for multi-label feature selection with implicit regularization arXiv:2412.18247v1 Announce Type: new Abstract: Fréchet regression extends linear regression to model complex responses in metric spaces, making it particularly relevant for multi-label regression, where each instance can have multiple associated labels. However, variable selection within this framework remains underexplored. In this paper, we propose…
-
Heterogeneous transfer learning for high dimensional regression with feature mismatch
Heterogeneous transfer learning for high dimensional regression with feature mismatch arXiv:2412.18081v1 Announce Type: new Abstract: We consider the problem of transferring knowledge from a source, or proxy, domain to a new target domain for learning a high-dimensional regression model with possibly different features. Recently, the statistical properties of homogeneous transfer learning have been investigated. However,…
-
A Statistical Framework for Ranking LLM-Based Chatbots
A Statistical Framework for Ranking LLM-Based Chatbots arXiv:2412.18407v1 Announce Type: new Abstract: Large language models (LLMs) have transformed natural language processing, with frameworks like Chatbot Arena providing pioneering platforms for evaluating these models. By facilitating millions of pairwise comparisons based on human judgments, Chatbot Arena has become a cornerstone in LLM evaluation, offering rich datasets…
-
Robust random graph matching in dense graphs via vector approximate message passing
Robust random graph matching in dense graphs via vector approximate message passing arXiv:2412.16457v1 Announce Type: new Abstract: In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation…
-
Fast Multi-Group Gaussian Process Factor Models
Fast Multi-Group Gaussian Process Factor Models arXiv:2412.16773v1 Announce Type: new Abstract: Gaussian processes are now commonly used in dimensionality reduction approaches tailored to neuroscience, especially to describe changes in high-dimensional neural activity over time. As recording capabilities expand to include neuronal populations across multiple brain areas, cortical layers, and cell types, interest in extending Gaussian…
-
Gradient-Based Non-Linear Inverse Learning
Gradient-Based Non-Linear Inverse Learning arXiv:2412.16794v1 Announce Type: new Abstract: We study statistical inverse learning in the context of nonlinear inverse problems under random design. Specifically, we address a class of nonlinear problems by employing gradient descent (GD) and stochastic gradient descent (SGD) with mini-batching, both using constant step sizes. Our analysis derives convergence rates for…
-
Learning from Summarized Data: Gaussian Process Regression with Sample Quasi-Likelihood
Learning from Summarized Data: Gaussian Process Regression with Sample Quasi-Likelihood arXiv:2412.17455v1 Announce Type: new Abstract: Gaussian process regression is a powerful Bayesian nonlinear regression method. Recent research has enabled the capture of many types of observations using non-Gaussian likelihoods. To deal with various tasks in spatial modeling, we benefit from this development. Difficulties still arise…
-
Enhancing Masked Time-Series Modeling via Dropping Patches
Enhancing Masked Time-Series Modeling via Dropping Patches arXiv:2412.15315v1 Announce Type: new Abstract: This paper explores how to enhance existing masked time-series modeling by randomly dropping sub-sequence level patches of time series. On this basis, a simple yet effective method named DropPatch is proposed, which has two remarkable advantages: 1) It improves the pre-training efficiency by…
-
Deep learning joint extremes of metocean variables using the SPAR model
Deep learning joint extremes of metocean variables using the SPAR model arXiv:2412.15808v1 Announce Type: new Abstract: This paper presents a novel deep learning framework for estimating multivariate joint extremes of metocean variables, based on the Semi-Parametric Angular-Radial (SPAR) model. When considered in polar coordinates, the problem of modelling multivariate extremes is transformed to one of…
-
Using matrix-product states for time-series machine learning
Using matrix-product states for time-series machine learning arXiv:2412.15826v1 Announce Type: new Abstract: Matrix-product states (MPS) have proven to be a versatile ansatz for modeling quantum many-body physics. For many applications, and particularly in one-dimension, they capture relevant quantum correlations in many-body wavefunctions while remaining tractable to store and manipulate on a classical computer. This has…
-
On Robust Cross Domain Alignment
On Robust Cross Domain Alignment arXiv:2412.15861v1 Announce Type: new Abstract: The Gromov-Wasserstein (GW) distance is an effective measure of alignment between distributions supported on distinct ambient spaces. Calculating essentially the mutual departure from isometry, it has found vast usage in domain translation and network analysis. It has long been shown to be vulnerable to contamination…
-
Learning sparsity-promoting regularizers for linear inverse problems
Learning sparsity-promoting regularizers for linear inverse problems arXiv:2412.16031v1 Announce Type: new Abstract: This paper introduces a novel approach to learning sparsity-promoting regularizers for solving linear inverse problems. We develop a bilevel optimization framework to select an optimal synthesis operator, denoted as $B$, which regularizes the inverse problem while promoting sparsity in the solution. The method…
-
Statistical Undersampling with Mutual Information and Support Points
Statistical Undersampling with Mutual Information and Support Points arXiv:2412.14527v1 Announce Type: new Abstract: Class imbalance and distributional differences in large datasets present significant challenges for classification tasks in machine learning, often leading to biased models and poor predictive performance for minority classes. This work introduces two novel undersampling approaches: mutual information-based stratified simple random sampling and…
-
On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models
On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models arXiv:2412.14315v1 Announce Type: new Abstract: In a graph bisection problem, we are given a graph $G$ with two equally-sized unlabeled communities, and the goal is to recover the vertices in these communities. A popular heuristic, known as spectral clustering, is to output an estimated…
-
From Point to probabilistic gradient boosting for claim frequency and severity prediction
From Point to probabilistic gradient boosting for claim frequency and severity prediction arXiv:2412.14916v1 Announce Type: new Abstract: Gradient boosting algorithms for decision trees are increasingly used in actuarial applications as they show superior predictive performance over traditional generalized linear models. Many improvements and sophistications to the first gradient boosting machine algorithm exist. We present in…
-
FedSTaS: Client Stratification and Client Level Sampling for Efficient Federated Learning
FedSTaS: Client Stratification and Client Level Sampling for Efficient Federated Learning arXiv:2412.14226v1 Announce Type: cross Abstract: Federated learning (FL) is a machine learning methodology that involves the collaborative training of a global model across multiple decentralized clients in a privacy-preserving way. Several FL methods are introduced to tackle communication inefficiencies but do not address how…
-
Projected gradient methods for nonconvex and stochastic optimization: new complexities and auto-conditioned stepsizes
Projected gradient methods for nonconvex and stochastic optimization: new complexities and auto-conditioned stepsizes arXiv:2412.14291v1 Announce Type: cross Abstract: We present a novel class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex function over a convex compact set. We first provide a novel analysis of the “vanilla” PG method, achieving the…
-
Time-Reversible Bridges of Data with Machine Learning
Time-Reversible Bridges of Data with Machine Learning arXiv:2412.13665v1 Announce Type: new Abstract: The analysis of dynamical systems is a fundamental tool in the natural sciences and engineering. It is used to understand the evolution of systems as large as entire galaxies and as small as individual molecules. With predefined conditions on the evolution of dynamical…
-
jinns: a JAX Library for Physics-Informed Neural Networks
jinns: a JAX Library for Physics-Informed Neural Networks arXiv:2412.14132v1 Announce Type: new Abstract: jinns is an open-source Python library for physics-informed neural networks, built to tackle both forward and inverse problems, as well as meta-model learning. Rooted in the JAX ecosystem, it provides a versatile framework for efficiently prototyping real-problems, while easily allowing extensions to…
-
Preconditioned Subspace Langevin Monte Carlo
Preconditioned Subspace Langevin Monte Carlo arXiv:2412.13928v1 Announce Type: new Abstract: We develop a new efficient method for high-dimensional sampling called Subspace Langevin Monte Carlo. The primary application of these methods is to efficiently implement Preconditioned Langevin Monte Carlo. To demonstrate the usefulness of this new method, we extend ideas from subspace descent methods in Euclidean…
-
Deep Learning for Hydroelectric Optimization: Generating Long-Term River Discharge Scenarios with Ensemble Forecasts from Global Circulation Models
Deep Learning for Hydroelectric Optimization: Generating Long-Term River Discharge Scenarios with Ensemble Forecasts from Global Circulation Models arXiv:2412.12234v1 Announce Type: cross Abstract: Hydroelectric power generation is a critical component of the global energy matrix, particularly in countries like Brazil, where it represents the majority of the energy supply. However, its strong dependence on river discharges,…
-
How to Choose a Threshold for an Evaluation Metric for Large Language Models
How to Choose a Threshold for an Evaluation Metric for Large Language Models arXiv:2412.12148v1 Announce Type: new Abstract: To ensure and monitor large language models (LLMs) reliably, various evaluation metrics have been proposed in the literature. However, there is little research on prescribing a methodology to identify a robust threshold on these metrics even though…
-
Adversarially robust generalization theory via Jacobian regularization for deep neural networks
Adversarially robust generalization theory via Jacobian regularization for deep neural networks arXiv:2412.12449v1 Announce Type: new Abstract: Powerful deep neural networks are vulnerable to adversarial attacks. To obtain adversarially robust models, researchers have separately developed adversarial training and Jacobian regularization techniques. There are abundant theoretical and empirical studies for adversarial training, but theoretical foundations for Jacobian…
-
BOIDS: High-dimensional Bayesian Optimization via Incumbent-guided Direction Lines and Subspace Embeddings
BOIDS: High-dimensional Bayesian Optimization via Incumbent-guided Direction Lines and Subspace Embeddings arXiv:2412.12918v1 Announce Type: new Abstract: When it comes to expensive black-box optimization problems, Bayesian Optimization (BO) is a well-known and powerful solution. Many real-world applications involve a large number of dimensions, hence scaling BO to high dimension is of much interest. However, state-of-the-art high-dimensional…
-
Sequential Harmful Shift Detection Without Labels
Sequential Harmful Shift Detection Without Labels arXiv:2412.12910v1 Announce Type: new Abstract: We introduce a novel approach for detecting distribution shifts that negatively impact the performance of machine learning models in continuous production environments, which requires no access to ground truth data labels. It builds upon the work of Podkopaev and Ramdas [2022], who address scenarios…
-
On Model Extrapolation in Marginal Shapley Values
On Model Extrapolation in Marginal Shapley Values arXiv:2412.13158v1 Announce Type: new Abstract: As the use of complex machine learning models continues to grow, so does the need for reliable explainability methods. One of the most popular methods for model explainability is based on Shapley values. There are two most commonly used approaches to calculating Shapley…
-
Generative Modeling with Diffusion
Generative Modeling with Diffusion arXiv:2412.10948v1 Announce Type: new Abstract: We introduce the diffusion model as a method to generate new samples. Generative models have been recently adopted for tasks such as art generation (Stable Diffusion, Dall-E) and text generation (ChatGPT). Diffusion models in particular apply noise to sample data and then “reverse” this noising process…
-
Representation learning of dynamic networks
Representation learning of dynamic networks arXiv:2412.11065v1 Announce Type: new Abstract: This study presents a novel representation learning model tailored for dynamic networks, which describes the continuously evolving relationships among individuals within a population. The problem is encapsulated in the dimension reduction topic of functional data analysis. With dynamic networks represented as matrix-valued functions, our objective…
-
Deep Learning-based Approaches for State Space Models: A Selective Review
Deep Learning-based Approaches for State Space Models: A Selective Review arXiv:2412.11211v1 Announce Type: new Abstract: State-space models (SSMs) offer a powerful framework for dynamical system analysis, wherein the temporal dynamics of the system are assumed to be captured through the evolution of the latent states, which govern the values of the observations. This paper provides…
-
datadriftR: An R Package for Concept Drift Detection in Predictive Models
datadriftR: An R Package for Concept Drift Detection in Predictive Models arXiv:2412.11308v1 Announce Type: new Abstract: Predictive models often face performance degradation due to evolving data distributions, a phenomenon known as data drift. Among its forms, concept drift, where the relationship between explanatory variables and the response variable changes, is particularly challenging to detect and…
-
Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate
Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate arXiv:2412.11257v1 Announce Type: new Abstract: Despite being an essential tool across engineering and finance, Monte Carlo simulation can be computationally intensive, especially in large-scale, path-dependent problems that hinder straightforward parallelization. A natural alternative is to replace simulation with machine learning or surrogate prediction, though this…
-
Langevin Monte Carlo Beyond Lipschitz Gradient Continuity
Langevin Monte Carlo Beyond Lipschitz Gradient Continuity arXiv:2412.09698v1 Announce Type: new Abstract: We present a significant advancement in the field of Langevin Monte Carlo (LMC) methods by introducing the Inexact Proximal Langevin Algorithm (IPLA). This novel algorithm broadens the scope of problems that LMC can effectively address while maintaining controlled computational costs. IPLA extends LMC’s…
-
Investigating the Impact of Balancing, Filtering, and Complexity on Predictive Multiplicity: A Data-Centric Perspective
Investigating the Impact of Balancing, Filtering, and Complexity on Predictive Multiplicity: A Data-Centric Perspective arXiv:2412.09712v1 Announce Type: new Abstract: The Rashomon effect presents a significant challenge in model selection. It occurs when multiple models achieve similar performance on a dataset but produce different predictions, resulting in predictive multiplicity. This is especially problematic in high-stakes environments,…
-
A Statistical Analysis for Supervised Deep Learning with Exponential Families for Intrinsically Low-dimensional Data
A Statistical Analysis for Supervised Deep Learning with Exponential Families for Intrinsically Low-dimensional Data arXiv:2412.09779v1 Announce Type: new Abstract: Recent advances have revealed that the rate of convergence of the expected test error in deep supervised learning decays as a function of the intrinsic dimension and not the dimension $d$ of the input space. Existing…
-
DQA: An Efficient Method for Deep Quantization of Deep Neural Network Activations
DQA: An Efficient Method for Deep Quantization of Deep Neural Network Activations arXiv:2412.09687v1 Announce Type: cross Abstract: Quantization of Deep Neural Network (DNN) activations is a commonly used technique to reduce compute and memory demands during DNN inference, which can be particularly beneficial on resource-constrained devices. To achieve high accuracy, existing methods for quantizing activations…
-
Matrix Completion via Residual Spectral Matching
Matrix Completion via Residual Spectral Matching arXiv:2412.10005v1 Announce Type: new Abstract: Noisy matrix completion has attracted significant attention due to its applications in recommendation systems, signal processing and image restoration. Most existing works rely on (weighted) least squares methods under various low-rank constraints. However, minimizing the sum of squared residuals is not always efficient, as…
-
GeoConformal prediction: a model-agnostic framework of measuring the uncertainty of spatial prediction
GeoConformal prediction: a model-agnostic framework of measuring the uncertainty of spatial prediction arXiv:2412.08661v1 Announce Type: new Abstract: Spatial prediction is a fundamental task in geography. In recent years, with advances in geospatial artificial intelligence (GeoAI), numerous models have been developed to improve the accuracy of geographic variable predictions. Beyond achieving higher accuracy, it is equally…
-
On the Precise Asymptotics and Refined Regret of the Variance-Aware UCB Algorithm
On the Precise Asymptotics and Refined Regret of the Variance-Aware UCB Algorithm arXiv:2412.08843v1 Announce Type: new Abstract: In this paper, we study the behavior of the Upper Confidence Bound-Variance (UCB-V) algorithm for Multi-Armed Bandit (MAB) problems, a variant of the canonical Upper Confidence Bound (UCB) algorithm that incorporates variance estimates into its decision-making process. More…
-
$(\epsilon, \delta)$-Differentially Private Partial Least Squares Regression
$(\epsilon, \delta)$-Differentially Private Partial Least Squares Regression arXiv:2412.09164v1 Announce Type: new Abstract: As data-privacy requirements are becoming increasingly stringent and statistical models based on sensitive data are being deployed and used more routinely, protecting data-privacy becomes pivotal. Partial Least Squares (PLS) regression is the premier tool for building such models in analytical chemistry, yet it…
-
Belted and Ensembled Neural Network for Linear and Nonlinear Sufficient Dimension Reduction
Belted and Ensembled Neural Network for Linear and Nonlinear Sufficient Dimension Reduction arXiv:2412.08961v1 Announce Type: new Abstract: We introduce a unified, flexible, and easy-to-implement framework of sufficient dimension reduction that can accommodate both linear and nonlinear dimension reduction, and both the conditional distribution and the conditional mean as the targets of estimation. This unified framework…
-
Distribution free uncertainty quantification in neuroscience-inspired deep operators
Distribution free uncertainty quantification in neuroscience-inspired deep operators arXiv:2412.09369v1 Announce Type: new Abstract: Energy-efficient deep learning algorithms are essential for a sustainable future and feasible edge computing setups. Spiking neural networks (SNNs), inspired from neuroscience, are a positive step in the direction of achieving the required energy efficiency. However, in a bid to lower the…
-
Score-Optimal Diffusion Schedules
Score-Optimal Diffusion Schedules arXiv:2412.07877v1 Announce Type: new Abstract: Denoising diffusion models (DDMs) offer a flexible framework for sampling from high dimensional data distributions. DDMs generate a path of probability distributions interpolating between a reference Gaussian distribution and a data distribution by incrementally injecting noise into the data. To numerically simulate the sampling process, a discretisation…
-
Low-Rank Correction for Quantized LLMs
Low-Rank Correction for Quantized LLMs arXiv:2412.07902v1 Announce Type: new Abstract: We consider the problem of model compression for Large Language Models (LLMs) at post-training time, where the task is to compress a well-trained model using only a small set of calibration input data. In this work, we introduce a new low-rank approach to correct for…
-
An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints
An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints arXiv:2412.08060v1 Announce Type: new Abstract: We study Online Convex Optimization (OCO) with adversarial constraints, where an online algorithm must make repeated decisions to minimize both convex loss functions and cumulative constraint violations. We focus on a setting where the algorithm has access to predictions of…
-
Phase-aware Training Schedule Simplifies Learning in Flow-Based Generative Models
Phase-aware Training Schedule Simplifies Learning in Flow-Based Generative Models arXiv:2412.07972v1 Announce Type: cross Abstract: We analyze the training of a two-layer autoencoder used to parameterize a flow-based generative model for sampling from a high-dimensional Gaussian mixture. Previous work shows that the phase where the relative probability between the modes is learned disappears as the dimension…
-
Generalized Least Squares Kernelized Tensor Factorization
Generalized Least Squares Kernelized Tensor Factorization arXiv:2412.07041v1 Announce Type: new Abstract: Real-world datasets often contain missing or corrupted values. Completing multidimensional tensor-structured data with missing entries is essential for numerous applications. Smoothness-constrained low-rank factorization models have shown superior performance with reduced computational costs. While effective at capturing global and long-range correlations, these models struggle to…
-
Sequential Controlled Langevin Diffusions
Sequential Controlled Langevin Diffusions arXiv:2412.07081v1 Announce Type: new Abstract: An effective approach for sampling from unnormalized densities is based on the idea of gradually transporting samples from an easy prior to the complicated target distribution. Two popular methods are (1) Sequential Monte Carlo (SMC), where the transport is performed through successive annealed densities via prescribed…
-
A Note on Sample Complexity of Interactive Imitation Learning with Log Loss
A Note on Sample Complexity of Interactive Imitation Learning with Log Loss arXiv:2412.07057v1 Announce Type: new Abstract: Imitation learning (IL) is a general paradigm for learning from experts in sequential decision-making problems. Recent advancements in IL have shown that offline imitation learning, specifically Behavior Cloning (BC) with log loss, is minimax optimal. Meanwhile, its interactive…
-
Optimization Can Learn Johnson Lindenstrauss Embeddings
Optimization Can Learn Johnson Lindenstrauss Embeddings arXiv:2412.07242v1 Announce Type: new Abstract: Embeddings play a pivotal role across various disciplines, offering compact representations of complex data structures. Randomized methods like Johnson-Lindenstrauss (JL) provide state-of-the-art and essentially unimprovable theoretical guarantees for achieving such representations. These guarantees are worst-case and in particular, neither the analysis, nor the algorithm,…
-
Modeling High-Resolution Spatio-Temporal Wind with Deep Echo State Networks and Stochastic Partial Differential Equations
Modeling High-Resolution Spatio-Temporal Wind with Deep Echo State Networks and Stochastic Partial Differential Equations arXiv:2412.07265v1 Announce Type: new Abstract: In the past decades, clean and renewable energy has gained increasing attention due to a global effort on carbon footprint reduction. In particular, Saudi Arabia is gradually shifting its energy portfolio from an exclusive use of…
-
Ranking of Large Language Model with Nonparametric Prompts
Ranking of Large Language Model with Nonparametric Prompts arXiv:2412.05506v1 Announce Type: new Abstract: We consider the inference for the ranking of large language models (LLMs). Alignment arises as a big challenge to mitigate hallucinations in the use of LLMs. Ranking LLMs has been shown as a well-performing tool to improve alignment based on the best-of-$N$…
-
Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
Training-Free Bayesianization for Low-Rank Adapters of Large Language Models arXiv:2412.05723v1 Announce Type: new Abstract: Estimating the uncertainty of responses of Large Language Models (LLMs) remains a critical challenge. While recent Bayesian methods have demonstrated effectiveness in quantifying uncertainty through low-rank weight updates, they typically require complex fine-tuning or post-training procedures. In this paper, we propose Training-Free…
-
Proximal Iteration for Nonlinear Adaptive Lasso
Proximal Iteration for Nonlinear Adaptive Lasso arXiv:2412.05726v1 Announce Type: new Abstract: Augmenting a smooth cost function with an $\ell_1$ penalty allows analysts to efficiently conduct estimation and variable selection simultaneously in sophisticated models and can be efficiently implemented using proximal gradient methods. However, one drawback of the $\ell_1$ penalty is bias: nonzero parameters are underestimated…
-
Leveraging Black-box Models to Assess Feature Importance in Unconditional Distribution
Leveraging Black-box Models to Assess Feature Importance in Unconditional Distribution arXiv:2412.05759v1 Announce Type: new Abstract: Understanding how changes in explanatory features affect the unconditional distribution of the outcome is important in many applications. However, existing black-box predictive models are not readily suited for analyzing such questions. In this work, we develop an approximation method to…
-
Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application
Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application arXiv:2412.05906v1 Announce Type: new Abstract: We study the discrete-time linear-quadratic (LQ) control model using reinforcement learning (RL). Using entropy to measure the cost of exploration, we prove that the optimal feedback policy for the problem must be Gaussian type. Then, we apply the results…
-
The Polynomial Stein Discrepancy for Assessing Moment Convergence
The Polynomial Stein Discrepancy for Assessing Moment Convergence arXiv:2412.05135v1 Announce Type: new Abstract: We propose a novel method for measuring the discrepancy between a set of samples and a desired posterior distribution for Bayesian inference. Classical methods for assessing sample quality like the effective sample size are not appropriate for scalable Bayesian sampling algorithms, such…
-
Disentangled Representation Learning for Causal Inference with Instruments
Disentangled Representation Learning for Causal Inference with Instruments arXiv:2412.04641v1 Announce Type: cross Abstract: Latent confounders are a fundamental challenge for inferring causal effects from observational data. The instrumental variable (IV) approach is a practical way to address this challenge. Existing IV based estimators need a known IV or other strong assumptions, such as the existence…