Category: cs.DS

Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions

Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions arXiv:2603.04635v1 Announce Type: new Abstract: Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $epsilon$-far from all product distributions in total variation distance.…

March 6, 2026
Low-Degree Method Fails to Predict Robust Subspace Recovery

Low-Degree Method Fails to Predict Robust Subspace Recovery arXiv:2603.02594v1 Announce Type: new Abstract: The low-degree polynomial framework has been highly successful in predicting computational versus statistical gaps for high-dimensional problems in average-case analysis and machine learning. This success has led to the low-degree conjecture, which posits that this method captures the power and limitations of…

March 4, 2026
Combinatorial Sparse PCA Beyond the Spiked Identity Model

Combinatorial Sparse PCA Beyond the Spiked Identity Model arXiv:2603.02607v1 Announce Type: new Abstract: Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $Sigma$, whose top eigenvector $v in R^d$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into…

March 4, 2026
Linear Regression with Unknown Truncation Beyond Gaussian Features

Linear Regression with Unknown Truncation Beyond Gaussian Features arXiv:2602.12534v1 Announce Type: new Abstract: In truncated linear regression, samples $(x,y)$ are shown only when the outcome $y$ falls inside a certain survival set $S^star$ and the goal is to estimate the unknown $d$-dimensional regressor $w^star$. This problem has a long history of study in Statistics and…

February 16, 2026
SCaLE: Switching Cost aware Learning and Exploration

SCaLE: Switching Cost aware Learning and Exploration arXiv:2601.09042v1 Announce Type: cross Abstract: This work addresses the fundamental problem of unbounded metric movement costs in bandit online convex optimization, by considering high-dimensional dynamic quadratic hitting costs and $ell_2$-norm switching costs in a noisy bandit feedback model. For a general class of stochastic environments, we provide the…

January 15, 2026
Learning Multinomial Logits in $O(n log n)$ time

Learning Multinomial Logits in $O(n log n)$ time arXiv:2601.04423v1 Announce Type: cross Abstract: A Multinomial Logit (MNL) model is composed of a finite universe of items $[n]={1,…, n}$, each assigned a positive weight. A query specifies an admissible subset — called a slate — and the model chooses one item from that slate with probability…

January 9, 2026
Online Learning with Limited Information in the Sliding Window Model

Online Learning with Limited Information in the Sliding Window Model arXiv:2601.03533v1 Announce Type: new Abstract: Motivated by recent work on the experts problem in the streaming model, we consider the experts problem in the sliding window model. The sliding window model is a well-studied model that captures applications such as traffic monitoring, epidemic tracking, and…

January 8, 2026
Rate-optimal community detection near the KS threshold via node-robust algorithms

Rate-optimal community detection near the KS threshold via node-robust algorithms arXiv:2511.16613v1 Announce Type: new Abstract: We study community detection in the emph{symmetric $k$-stochastic block model}, where $n$ nodes are evenly partitioned into $k$ clusters with intra- and inter-cluster connection probabilities $p$ and $q$, respectively. Our main result is a polynomial-time algorithm that achieves the minimax-optimal…

November 21, 2025
QCircuitBench: A Large-Scale Dataset for Benchmarking Quantum Algorithm Design

QCircuitBench: A Large-Scale Dataset for Benchmarking Quantum Algorithm Design arXiv:2410.07961v2 Announce Type: cross Abstract: Quantum computing is an emerging field recognized for the significant speedup it offers over classical computing through quantum algorithms. However, designing and implementing quantum algorithms pose challenges due to the complex nature of quantum mechanics and the necessity for precise control…

November 10, 2025
High-Probability Bounds For Heterogeneous Local Differential Privacy

High-Probability Bounds For Heterogeneous Local Differential Privacy arXiv:2510.11895v1 Announce Type: new Abstract: We study statistical estimation under local differential privacy (LDP) when users may hold heterogeneous privacy levels and accuracy must be guaranteed with high probability. Departing from the common in-expectation analyses, and for one-dimensional and multi-dimensional mean estimation problems, we develop finite sample upper…

October 15, 2025
Adaptive randomized pivoting and volume sampling

Adaptive randomized pivoting and volume sampling arXiv:2510.02513v1 Announce Type: new Abstract: Adaptive randomized pivoting (ARP) is a recently proposed and highly effective algorithm for column subset selection. This paper reinterprets the ARP algorithm by drawing connections to the volume sampling distribution and active learning algorithms for linear regression. As consequences, this paper presents new analysis…

October 6, 2025
Private Learning of Littlestone Classes, Revisited

Private Learning of Littlestone Classes, Revisited arXiv:2510.00076v1 Announce Type: new Abstract: We consider online and PAC learning of Littlestone classes subject to the constraint of approximate differential privacy. Our main result is a private learner to online-learn a Littlestone class with a mistake bound of $tilde{O}(d^{9.5}cdot log(T))$ in the realizable case, where $d$ denotes the…

October 2, 2025
Instance-Optimal Matrix Multiplicative Weight Update and Its Quantum Applications

Instance-Optimal Matrix Multiplicative Weight Update and Its Quantum Applications arXiv:2509.08911v1 Announce Type: cross Abstract: The Matrix Multiplicative Weight Update (MMWU) is a seminal online learning algorithm with numerous applications. Applied to the matrix version of the Learning from Expert Advice (LEA) problem on the $d$-dimensional spectraplex, it is well known that MMWU achieves the minimax-optimal…

September 12, 2025
Optimal Variance and Covariance Estimation under Differential Privacy in the Add-Remove Model and Beyond

Optimal Variance and Covariance Estimation under Differential Privacy in the Add-Remove Model and Beyond arXiv:2509.04919v1 Announce Type: new Abstract: In this paper, we study the problem of estimating the variance and covariance of datasets under differential privacy in the add-remove model. While estimation in the swap model has been extensively studied in the literature, the…

September 8, 2025
Optimal Subspace Embeddings: Resolving Nelson-Nguyen Conjecture Up to Sub-Polylogarithmic Factors

Optimal Subspace Embeddings: Resolving Nelson-Nguyen Conjecture Up to Sub-Polylogarithmic Factors arXiv:2508.14234v1 Announce Type: cross Abstract: We give a proof of the conjecture of Nelson and Nguyen [FOCS 2013] on the optimal dimension and sparsity of oblivious subspace embeddings, up to sub-polylogarithmic factors: For any $ngeq d$ and $epsilongeq d^{-O(1)}$, there is a random $tilde O(d/epsilon^2)times…

August 21, 2025
On computing and the complexity of computing higher-order $U$-statistics, exactly

On computing and the complexity of computing higher-order $U$-statistics, exactly arXiv:2508.12627v1 Announce Type: new Abstract: Higher-order $U$-statistics abound in fields such as statistics, machine learning, and computer science, but are known to be highly time-consuming to compute in practice. Despite their widespread appearance, a comprehensive study of their computational complexity is surprisingly lacking. This paper…

August 19, 2025
Mallows Model with Learned Distance Metrics: Sampling and Maximum Likelihood Estimation

Mallows Model with Learned Distance Metrics: Sampling and Maximum Likelihood Estimation arXiv:2507.08108v1 Announce Type: new Abstract: textit{Mallows model} is a widely-used probabilistic framework for learning from ranking data, with applications ranging from recommendation systems and voting to aligning language models with human preferences~cite{chen2024mallows, kleinberg2021algorithmic, rafailov2024direct}. Under this model, observed rankings are noisy perturbations of a…

July 14, 2025
Lower Bounds for Greedy Teaching Set Constructions

Lower Bounds for Greedy Teaching Set Constructions arXiv:2505.03223v1 Announce Type: new Abstract: A fundamental open problem in learning theory is to characterize the best-case teaching dimension $operatorname{TS}_{min}$ of a concept class $mathcal{C}$ with finite VC dimension $d$. Resolving this problem will, in particular, settle the conjectured upper bound on Recursive Teaching Dimension posed by [Simon…

May 7, 2025
Can SGD Select Good Fishermen? Local Convergence under Self-Selection Biases and Beyond

Can SGD Select Good Fishermen? Local Convergence under Self-Selection Biases and Beyond arXiv:2504.07133v1 Announce Type: new Abstract: We revisit the problem of estimating $k$ linear regressors with self-selection bias in $d$ dimensions with the maximum selection criterion, as introduced by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [CDIZ23, STOC’23]. Our main result is a $operatorname{poly}(d,k,1/varepsilon) + {k}^{O(k)}$…

April 11, 2025
Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs

Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs arXiv:2502.09832v1 Announce Type: new Abstract: In this paper, assuming a natural strengthening of the low-degree conjecture, we provide evidence of computational hardness for two problems: (1) the (partial) matching recovery problem in the sparse correlated ErdH{o}s-R’enyi graphs $mathcal G(n,q;rho)$ when the edge-density $q=n^{-1+o(1)}$ and…

February 17, 2025
Algorithms with Calibrated Machine Learning Predictions

Algorithms with Calibrated Machine Learning Predictions arXiv:2502.02861v1 Announce Type: new Abstract: The field of algorithms with predictions incorporates machine learning advice in the design of online algorithms to improve real-world performance. While this theoretical framework often assumes uniform reliability across all predictions, modern machine learning models can now provide instance-level uncertainty estimates. In this paper,…

February 6, 2025
Near-Optimal Algorithms for Omniprediction

Near-Optimal Algorithms for Omniprediction arXiv:2501.17205v1 Announce Type: new Abstract: Omnipredictors are simple prediction functions that encode loss-minimizing predictions with respect to a hypothesis class $H$, simultaneously for every loss function within a class of losses $L$. In this work, we give near-optimal learning algorithms for omniprediction, in both the online and offline settings. To begin,…

January 30, 2025
Robust random graph matching in dense graphs via vector approximate message passing

Robust random graph matching in dense graphs via vector approximate message passing arXiv:2412.16457v1 Announce Type: new Abstract: In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation…

December 24, 2024
On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models

On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models arXiv:2412.14315v1 Announce Type: new Abstract: In a graph bisection problem, we are given a graph $G$ with two equally-sized unlabeled communities, and the goal is to recover the vertices in these communities. A popular heuristic, known as spectral clustering, is to output an estimated…

December 20, 2024
The Broader Landscape of Robustness in Algorithmic Statistics

The Broader Landscape of Robustness in Algorithmic Statistics arXiv:2412.02670v1 Announce Type: new Abstract: The last decade has seen a number of advances in computationally efficient algorithms for statistical methods subject to robustness constraints. An estimator may be robust in a number of different ways: to contamination of the dataset, to heavy-tailed data, or in the…

December 4, 2024