Category: cs.IT

  • Dictionary Based Pattern Entropy for Causal Direction Discovery

    Dictionary Based Pattern Entropy for Causal Direction Discovery arXiv:2603.04473v1 Announce Type: new Abstract: Discovering causal direction from temporal observational data is particularly challenging for symbolic sequences, where functional models and noise assumptions are often unavailable. We propose a novel \emph{Dictionary Based Pattern Entropy ($DPE$)} framework that infers both the direction of causation and the specific…

  • Standard Transformers Achieve the Minimax Rate in Nonparametric Regression with $C^{s,\lambda}$ Targets

    Standard Transformers Achieve the Minimax Rate in Nonparametric Regression with $C^{s,\lambda}$ Targets arXiv:2602.20555v1 Announce Type: new Abstract: The tremendous success of Transformer models in fields such as large language models and computer vision necessitates a rigorous theoretical investigation. To the best of our knowledge, this paper is the first work proving that standard Transformers can…

  • Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs

    Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs arXiv:2602.15091v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) architectures decompose prediction tasks into specialized expert sub-networks selected by a gating mechanism. This letter adopts a communication-theoretic view of MoE gating, modeling the gate as a stochastic channel operating under a finite information rate. Within an information-theoretic learning framework, we specialize…

  • Locally Private Parametric Methods for Change-Point Detection

    Locally Private Parametric Methods for Change-Point Detection arXiv:2602.13619v1 Announce Type: new Abstract: We study parametric change-point detection, where the goal is to identify distributional changes in time series, under local differential privacy. In the non-private setting, we derive improved finite-sample accuracy guarantees for a change-point detection algorithm based on the generalized log-likelihood ratio test, via…

  • Persistent Entropy as a Detector of Phase Transitions

    Persistent Entropy as a Detector of Phase Transitions arXiv:2602.09058v1 Announce Type: new Abstract: Persistent entropy (PE) is an information-theoretic summary statistic of persistence barcodes that has been widely used to detect regime changes in complex systems. Despite its empirical success, a general theoretical understanding of when and why persistent entropy reliably detects phase transitions has…

  • The Critical Horizon: Inspection Design Principles for Multi-Stage Operations and Deep Reasoning

    The Critical Horizon: Inspection Design Principles for Multi-Stage Operations and Deep Reasoning arXiv:2602.09394v1 Announce Type: new Abstract: Manufacturing lines, service journeys, supply chains, and AI reasoning chains share a common challenge: attributing a terminal outcome to the intermediate stage that caused it. We establish an information-theoretic barrier to this credit assignment problem: the signal connecting…

  • Relaxed Triangle Inequality for Kullback-Leibler Divergence Between Multivariate Gaussian Distributions

    Relaxed Triangle Inequality for Kullback-Leibler Divergence Between Multivariate Gaussian Distributions arXiv:2602.02577v1 Announce Type: new Abstract: The Kullback-Leibler (KL) divergence is not a proper distance metric and does not satisfy the triangle inequality, posing theoretical challenges in certain practical applications. Existing work has demonstrated that KL divergence between multivariate Gaussian distributions follows a relaxed triangle inequality.…

  • Collaborative Compressors in Distributed Mean Estimation with Limited Communication Budget

    Collaborative Compressors in Distributed Mean Estimation with Limited Communication Budget arXiv:2601.18950v1 Announce Type: new Abstract: Distributed high dimensional mean estimation is a common aggregation routine used often in distributed optimization methods. Most of these applications call for a communication-constrained setting where vectors, whose mean is to be estimated, have to be compressed before sharing. One…

  • Parametric RDT approach to computational gap of symmetric binary perceptron

    Parametric RDT approach to computational gap of symmetric binary perceptron arXiv:2601.10628v1 Announce Type: new Abstract: We study potential presence of statistical-computational gaps (SCG) in symmetric binary perceptrons (SBP) via a parametric utilization of \emph{fully lifted random duality theory} (fl-RDT) [96]. A structural change from decreasingly to arbitrarily ordered $c$-sequence (a key fl-RDT parametric component) is…

  • Towards A Unified PAC-Bayesian Framework for Norm-based Generalization Bounds

    Towards A Unified PAC-Bayesian Framework for Norm-based Generalization Bounds arXiv:2601.08100v1 Announce Type: new Abstract: Understanding the generalization behavior of deep neural networks remains a fundamental challenge in modern statistical learning theory. Among existing approaches, PAC-Bayesian norm-based bounds have demonstrated particular promise due to their data-dependent nature and their ability to capture algorithmic and geometric properties…

  • Learning Causality for Longitudinal Data

    Learning Causality for Longitudinal Data arXiv:2512.04980v1 Announce Type: new Abstract: This thesis develops methods for causal inference and causal representation learning (CRL) in high-dimensional, time-varying data. The first contribution introduces the Causal Dynamic Variational Autoencoder (CDVAE), a model for estimating Individual Treatment Effects (ITEs) by capturing unobserved heterogeneity in treatment response driven by latent risk…

  • Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit

    Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit arXiv:2511.15120v1 Announce Type: new Abstract: In deep learning, a central issue is to understand how neural networks efficiently learn high-dimensional features. To this end, we explore the gradient descent learning of a general Gaussian Multi-index model $f(\boldsymbol{x})=g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U}\in \mathbb{R}^{r\times d}$, which is the…

  • Unifying Information-Theoretic and Pair-Counting Clustering Similarity

    Unifying Information-Theoretic and Pair-Counting Clustering Similarity arXiv:2511.03000v1 Announce Type: new Abstract: Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic, reflecting whether they quantify agreement through element pairs or aggregate…

  • Graphical model for tensor factorization by sparse sampling

    Graphical model for tensor factorization by sparse sampling arXiv:2510.17886v1 Announce Type: new Abstract: We consider tensor factorizations based on sparse measurements of the tensor components. The measurements are designed in a way that the underlying graph of interactions is a random graph. The setup will be useful in cases where a substantial amount of data…

  • A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws

    A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws arXiv:2510.00504v1 Announce Type: new Abstract: When training large-scale models, the performance typically scales with the number of parameters and the dataset size according to a slow power law. A fundamental theoretical and practical question is whether comparable performance can be achieved with significantly smaller…

  • Fundamental bounds on efficiency-confidence trade-off for transductive conformal prediction

    Fundamental bounds on efficiency-confidence trade-off for transductive conformal prediction arXiv:2509.04631v1 Announce Type: cross Abstract: Transductive conformal prediction addresses the simultaneous prediction for multiple data points. Given a desired confidence level, the objective is to construct a prediction set that includes the true outcomes with the prescribed confidence. We demonstrate a fundamental trade-off between confidence and…

  • Rao Differential Privacy

    Rao Differential Privacy arXiv:2508.17135v1 Announce Type: new Abstract: Differential privacy (DP) has recently emerged as a definition of privacy to release private estimates. DP calibrates noise to be on the order of an individual's contribution. Due to this calibration, a private estimate obscures any individual while preserving the utility of the estimate. Since the…

  • An Iterative Algorithm for Differentially Private $k$-PCA with Adaptive Noise

    An Iterative Algorithm for Differentially Private $k$-PCA with Adaptive Noise arXiv:2508.10879v1 Announce Type: new Abstract: Given $n$ i.i.d. random matrices $A_i \in \mathbb{R}^{d \times d}$ that share a common expectation $\Sigma$, the objective of Differentially Private Stochastic PCA is to identify a subspace of dimension $k$ that captures the largest variance directions of $\Sigma$, while…

  • Fundamental limits of distributed covariance matrix estimation via a conditional strong data processing inequality

    Fundamental limits of distributed covariance matrix estimation via a conditional strong data processing inequality arXiv:2507.16953v1 Announce Type: new Abstract: Estimating high-dimensional covariance matrices is a key task across many fields. This paper explores the theoretical limits of distributed covariance estimation in a feature-split setting, where communication between agents is constrained. Specifically, we study a scenario…

  • On the Hardness of Unsupervised Domain Adaptation: Optimal Learners and Information-Theoretic Perspective

    On the Hardness of Unsupervised Domain Adaptation: Optimal Learners and Information-Theoretic Perspective arXiv:2507.06552v1 Announce Type: new Abstract: This paper studies the hardness of unsupervised domain adaptation (UDA) under covariate shift. We model the uncertainty that the learner faces by a distribution $\pi$ in the ground-truth triples $(p, q, f)$, which we call a UDA…

  • POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes

    POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes arXiv:2506.20406v1 Announce Type: new Abstract: Dynamic treatment regimes (DTRs) provide a principled framework for optimizing sequential decision-making in domains where decisions must adapt over time in response to individual trajectories, such as healthcare, education, and digital interventions. However, existing statistical methods often rely on…

  • Rare dense solutions clusters in asymmetric binary perceptrons -- local entropy via fully lifted RDT

    Rare dense solutions clusters in asymmetric binary perceptrons -- local entropy via fully lifted RDT arXiv:2506.19276v1 Announce Type: new Abstract: We study classical asymmetric binary perceptron (ABP) and associated \emph{local entropy} (LE) as potential source of its algorithmic hardness. Isolation of \emph{typical} ABP solutions in SAT phase seemingly suggests a universal algorithmic hardness. Paradoxically, efficient…

  • Coupled Entropy: A Goldilocks Generalization?

    Coupled Entropy: A Goldilocks Generalization? arXiv:2506.17229v1 Announce Type: new Abstract: Nonextensive Statistical Mechanics (NSM) has developed into a powerful toolset for modeling and analyzing complex systems. Despite its many successes, a puzzle arose early in its development. The constraints on the Tsallis entropy are in the form of an escort distribution with elements proportional to…

  • Overfitting has a limitation: a model-independent generalization error bound based on Rényi entropy

    Overfitting has a limitation: a model-independent generalization error bound based on Rényi entropy arXiv:2506.00182v1 Announce Type: new Abstract: Will further scaling up of machine learning models continue to bring success? A significant challenge in answering this question lies in understanding generalization error, which is the impact of overfitting. Understanding generalization error behavior of increasingly large-scale…

  • Optimal Regret of Bernoulli Bandits under Global Differential Privacy

    Optimal Regret of Bernoulli Bandits under Global Differential Privacy arXiv:2505.05613v1 Announce Type: new Abstract: As sequential learning algorithms are increasingly applied to real life, ensuring data privacy while maintaining their utilities emerges as a timely question. In this context, regret minimisation in stochastic bandits under $\epsilon$-global Differential Privacy (DP) has been widely studied. Unlike bandits…

  • An Efficient Transport-Based Dissimilarity Measure for Time Series Classification under Warping Distortions

    An Efficient Transport-Based Dissimilarity Measure for Time Series Classification under Warping Distortions arXiv:2505.05676v1 Announce Type: cross Abstract: Time Series Classification (TSC) is an important problem with numerous applications in science and technology. Dissimilarity-based approaches, such as Dynamic Time Warping (DTW), are classical methods for distinguishing time series when time deformations are confounding information. In this…

  • Generalization Guarantees for Multi-View Representation Learning and Application to Regularization via Gaussian Product Mixture Prior

    Generalization Guarantees for Multi-View Representation Learning and Application to Regularization via Gaussian Product Mixture Prior arXiv:2504.18455v1 Announce Type: new Abstract: We study the problem of distributed multi-view representation learning. In this problem, $K$ agents each observe one distinct, possibly statistically correlated, view and independently extract from it a suitable representation in a manner that a…

  • Denoising guarantees for optimized sampling schemes in compressed sensing

    Denoising guarantees for optimized sampling schemes in compressed sensing arXiv:2504.01046v1 Announce Type: new Abstract: Compressed sensing with subsampled unitary matrices benefits from \emph{optimized} sampling schemes, which feature improved theoretical guarantees and empirical performance relative to uniform subsampling. We provide, in a first of its kind in compressed sensing, theoretical guarantees showing that the error caused…

  • Cost-Aware Optimal Pairwise Pure Exploration

    Cost-Aware Optimal Pairwise Pure Exploration arXiv:2503.07877v1 Announce Type: new Abstract: Pure exploration is one of the fundamental problems in multi-armed bandits (MAB). However, existing works mostly focus on specific pure exploration tasks, without a holistic view of the general pure exploration problem. This work fills this gap by introducing a versatile framework to study pure…

  • On Statistical Estimation of Edge-Reinforced Random Walks

    On Statistical Estimation of Edge-Reinforced Random Walks arXiv:2503.06115v1 Announce Type: new Abstract: Reinforced random walks (RRWs), including vertex-reinforced random walks (VRRWs) and edge-reinforced random walks (ERRWs), model random walks where the transition probabilities evolve based on prior visitation history~\cite{mgr, fmk, tarres, volkov}. These models have found applications in various areas, such as network representation learning~\cite{xzzs},…

  • Generalization in Federated Learning: A Conditional Mutual Information Framework

    Generalization in Federated Learning: A Conditional Mutual Information Framework arXiv:2503.04091v1 Announce Type: new Abstract: Federated Learning (FL) is a widely adopted privacy-preserving distributed learning framework, yet its generalization performance remains less explored compared to centralized learning. In FL, the generalization error consists of two components: the out-of-sample gap, which measures the gap between the empirical…

  • Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits

    Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits arXiv:2503.00273v1 Announce Type: new Abstract: We study the evolution of information in interactive decision making through the lens of a stochastic multi-armed bandit problem. Focusing on a fundamental example where a unique optimal arm outperforms the rest by a fixed margin, we…

  • Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements

    Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements arXiv:2502.16008v1 Announce Type: new Abstract: We consider the problem of exact recovery of a $k$-sparse binary vector from generalized linear measurements (such as logistic regression). We analyze the linear estimation algorithm (Plan, Vershynin, Yudovina, 2017), and also show information theoretic lower bounds on the number…

  • Generative Models with ELBOs Converging to Entropy Sums

    Generative Models with ELBOs Converging to Entropy Sums arXiv:2501.09022v1 Announce Type: new Abstract: The evidence lower bound (ELBO) is one of the most central objectives for probabilistic unsupervised learning. For the ELBOs of several generative models and model classes, we here prove convergence to entropy sums. As one result, we provide a list of generative…

  • Who Wrote This? Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities

    Who Wrote This? Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities arXiv:2501.02406v1 Announce Type: new Abstract: Verifying the provenance of content is crucial to the function of many organizations, e.g., educational institutions, social media platforms, firms, etc. This problem is becoming increasingly difficult as text generated by Large Language Models (LLMs)…

  • Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent

    Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent arXiv:2501.01696v1 Announce Type: new Abstract: Tensors, which give a faithful and effective representation to deliver the intrinsic structure of multi-dimensional data, play a crucial role in an increasing number of signal processing and machine learning problems. However, tensor data are often accompanied by arbitrary signal corruptions,…

  • The Broader Landscape of Robustness in Algorithmic Statistics

    The Broader Landscape of Robustness in Algorithmic Statistics arXiv:2412.02670v1 Announce Type: new Abstract: The last decade has seen a number of advances in computationally efficient algorithms for statistical methods subject to robustness constraints. An estimator may be robust in a number of different ways: to contamination of the dataset, to heavy-tailed data, or in the…