Tag: high

Topological Exploration of High-Dimensional Empirical Risk Landscapes: general approach, and applications to phase retrieval

arXiv:2602.17779v1 Announce Type: new Abstract: We consider the landscape of empirical risk minimization for high-dimensional Gaussian single-index models (generalized linear models). The objective is to recover an unknown signal $\boldsymbol{\theta}^\star \in \mathbb{R}^d$ (where $d \gg 1$) from a loss function $\hat{R}(\boldsymbol{\theta})$…

February 23, 2026
High-Dimensional Limit of Stochastic Gradient Flow via Dynamical Mean-Field Theory

arXiv:2602.06320v1 Announce Type: new Abstract: Modern machine learning models are typically trained via multi-pass stochastic gradient descent (SGD) with small batch sizes, and understanding their dynamics in high dimensions is of great interest. However, an analytical framework for describing the high-dimensional asymptotic behavior of multi-pass…

February 9, 2026
High-Dimensional Partial Least Squares: Spectral Analysis and Fundamental Limitations

arXiv:2512.15684v1 Announce Type: new Abstract: Partial Least Squares (PLS) is a widely used method for data integration, designed to extract latent components shared across paired high-dimensional datasets. Despite decades of practical success, a precise theoretical understanding of its behavior in high-dimensional regimes remains limited. In this…

December 18, 2025
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes

arXiv:2511.03952v1 Announce Type: new Abstract: We develop a high-dimensional scaling limit for Stochastic Gradient Descent with Polyak Momentum (SGD-M) and adaptive step-sizes. This provides a framework to rigorously compare online SGD with some of its popular variants. We show that the scaling limits of SGD-M coincide…

November 7, 2025
Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks

arXiv:2511.02258v1 Announce Type: new Abstract: This paper studies the high-dimensional scaling limits of online stochastic gradient descent (SGD) for single-layer networks. Building on the seminal work of Saad and Solla, which analyzed the deterministic (ballistic) scaling limits of SGD corresponding to the gradient flow of…

November 5, 2025
High-Dimensional BWDM: A Robust Nonparametric Clustering Validation Index for Large-Scale Data

arXiv:2510.14145v1 Announce Type: new Abstract: Determining the appropriate number of clusters in unsupervised learning is a central problem in statistics and data science. Traditional validity indices such as Calinski-Harabasz, Silhouette, and Davies-Bouldin depend on centroid-based distances and therefore degrade in high-dimensional or contaminated data. This…

October 17, 2025
Gaussian Certified Unlearning in High Dimensions: A Hypothesis Testing Approach

arXiv:2510.13094v1 Announce Type: new Abstract: Machine unlearning seeks to efficiently remove the influence of selected data while preserving generalization. Significant progress has been made in low dimensions $(p \ll n)$, but high dimensions pose serious theoretical challenges as standard optimization assumptions of $\Omega(1)$ strong convexity…

October 16, 2025
Spectral Thresholds for Identifiability and Stability: Finite-Sample Phase Transitions in High-Dimensional Learning

arXiv:2510.03809v1 Announce Type: new Abstract: In high-dimensional learning, models remain stable until they collapse abruptly once the sample size falls below a critical level. This instability is not algorithm-specific but a geometric mechanism: when the weakest Fisher eigendirection falls beneath sample-level fluctuations, identifiability fails.…

October 7, 2025
Learning Rate Should Scale Inversely with High-Order Data Moments in High-Dimensional Online Independent Component Analysis

arXiv:2509.15127v1 Announce Type: new Abstract: We investigate the impact of high-order moments on the learning dynamics of an online Independent Component Analysis (ICA) algorithm under a high-dimensional data model composed of a weighted sum of two non-Gaussian random variables. This…

September 19, 2025
High-Order Error Bounds for Markovian LSA with Richardson-Romberg Extrapolation

arXiv:2508.05570v1 Announce Type: new Abstract: In this paper, we study the bias and high-order error bounds of the Linear Stochastic Approximation (LSA) algorithm with Polyak-Ruppert (PR) averaging under Markovian noise. We focus on the version of the algorithm with constant step size $\alpha$ and propose a…

August 8, 2025
AdapDISCOM: An Adaptive Sparse Regression Method for High-Dimensional Multimodal Data With Block-Wise Missingness and Measurement Errors

arXiv:2508.00120v1 Announce Type: cross Abstract: Multimodal high-dimensional data are increasingly prevalent in biomedical research, yet they are often compromised by block-wise missingness and measurement errors, posing significant challenges for statistical inference and prediction. We propose AdapDISCOM, a novel adaptive…

August 4, 2025
Newfluence: Boosting Model interpretability and Understanding in High Dimensions

arXiv:2507.11895v1 Announce Type: new Abstract: The increasing complexity of machine learning (ML) and artificial intelligence (AI) models has created a pressing need for tools that help scientists, engineers, and policymakers interpret and refine model decisions and predictions. Influence functions, originating from robust statistics, have emerged as…

July 17, 2025
An Observation on Lloyd’s k-Means Algorithm in High Dimensions

arXiv:2506.14952v1 Announce Type: new Abstract: Clustering and estimating cluster means are core problems in statistics and machine learning, with k-means and Expectation Maximization (EM) being two widely used algorithms. In this work, we provide a theoretical explanation for the failure of k-means in high-dimensional settings with…

June 19, 2025
On the performance of multi-fidelity and reduced-dimensional neural emulators for inference of physiologic boundary conditions

arXiv:2506.11683v1 Announce Type: new Abstract: Solving inverse problems in cardiovascular modeling is particularly challenging due to the high computational cost of running high-fidelity simulations. In this work, we focus on Bayesian parameter estimation and explore different methods to reduce the…

June 16, 2025
High-Dimensional Importance-Weighted Information Criteria: Theory and Optimality

arXiv:2505.06531v1 Announce Type: new Abstract: Imori and Ing (2025) proposed the importance-weighted orthogonal greedy algorithm (IWOGA) for model selection in high-dimensional misspecified regression models under covariate shift. To determine the number of IWOGA iterations, they introduced the high-dimensional importance-weighted information criterion (HDIWIC). They argued that the combined use…

May 13, 2025
Sparse Additive Contextual Bandits: A Nonparametric Approach for Online Decision-making with High-dimensional Covariates

arXiv:2503.16941v1 Announce Type: new Abstract: Personalized services are central to today’s digital landscape, where online decision-making is commonly formulated as contextual bandit problems. Two key challenges emerge in modern applications: high-dimensional covariates and the need for nonparametric models to capture complex reward-covariate…

March 24, 2025
Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula

arXiv:2502.20003v1 Announce Type: new Abstract: The analytic characterization of the high-dimensional behavior of optimization for Generalized Linear Models (GLMs) with Gaussian data has been a central focus in statistics and probability in recent years. While convex cases, such as the LASSO,…

February 28, 2025
BOIDS: High-dimensional Bayesian Optimization via Incumbent-guided Direction Lines and Subspace Embeddings

arXiv:2412.12918v1 Announce Type: new Abstract: When it comes to expensive black-box optimization problems, Bayesian Optimization (BO) is a well-known and powerful solution. Many real-world applications involve a large number of dimensions, hence scaling BO to high dimension is of much interest. However, state-of-the-art high-dimensional…

December 18, 2024
Modeling High-Dimensional Dependent Data in the Presence of Many Explanatory Variables and Weak Signals

arXiv:2412.04736v1 Announce Type: cross Abstract: This article considers a novel and widely applicable approach to modeling high-dimensional dependent data when a large number of explanatory variables are available and the signal-to-noise ratio is low. We postulate that a $p$-dimensional response series…

December 9, 2024
Contrastive representations of high-dimensional, structured treatments

arXiv:2411.19245v1 Announce Type: new Abstract: Estimating causal effects is vital for decision making. In standard causal effect estimation, treatments are usually binary- or continuous-valued. However, in many important real-world settings, treatments can be structured, high-dimensional objects, such as text, video, or audio. This provides a challenge to traditional causal…

December 2, 2024