Tag: sample

Maximum Mean Discrepancy with Unequal Sample Sizes via Generalized U-Statistics

arXiv:2512.13997v1. Abstract: Existing two-sample testing techniques, particularly those based on choosing a kernel for the Maximum Mean Discrepancy (MMD), often assume equal sample sizes from the two distributions. Applying these methods in practice can require discarding valuable data, unnecessarily reducing test power.…

December 17, 2025
Minimax-Optimal Two-Sample Test with Sliced Wasserstein

arXiv:2510.27498v1. Abstract: We study the problem of nonparametric two-sample testing using the sliced Wasserstein (SW) distance. While prior theoretical and empirical work indicates that the SW distance offers a promising balance between strong statistical guarantees and computational efficiency, its theoretical foundations for hypothesis testing remain limited.…

November 3, 2025
Spectral Thresholds for Identifiability and Stability: Finite-Sample Phase Transitions in High-Dimensional Learning

arXiv:2510.03809v1. Abstract: In high-dimensional learning, models remain stable until they collapse abruptly once the sample size falls below a critical level. This instability is not algorithm-specific but a geometric mechanism: when the weakest Fisher eigendirection falls beneath sample-level fluctuations, identifiability fails.…

October 7, 2025
Decorrelated feature importance from local sample weighting

arXiv:2508.06337v1. Abstract: Feature importance (FI) statistics provide a prominent and valuable method of insight into the decision process of machine learning (ML) models, but their effectiveness has well-known limitations when correlation is present among the features in the training data. In this case, the FI…

August 11, 2025
Measuring Sample Quality with Copula Discrepancies

arXiv:2507.21434v1. Abstract: The scalable Markov chain Monte Carlo (MCMC) algorithms that underpin modern Bayesian machine learning, such as Stochastic Gradient Langevin Dynamics (SGLD), sacrifice asymptotic exactness for computational speed, creating a critical diagnostic gap: traditional sample quality measures fail catastrophically when applied to biased samplers. While…

July 30, 2025
Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations

arXiv:2506.06613v1. Abstract: Learning distribution families over $\mathbb{R}^d$ is a fundamental problem in unsupervised learning and statistics. A central question in this setting is whether a given family of distributions possesses sufficient structure to be (at least) information-theoretically learnable and, if so, to…

June 10, 2025
A characterization of sample adaptivity in UCB data

arXiv:2503.04855v1. Abstract: We characterize a joint CLT of the number of pulls and the sample mean reward of the arms in a stochastic two-armed bandit environment under UCB algorithms. Several implications of this result are in place: (1) a nonstandard CLT of the number…

March 10, 2025
An Efficient Permutation-Based Kernel Two-Sample Test

arXiv:2502.13570v1. Abstract: Two-sample hypothesis testing, determining whether two sets of data are drawn from the same distribution, is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing, maximum mean discrepancy (MMD) has gained popularity as a test statistic due…

February 20, 2025
Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training

arXiv:2502.10793v1. Abstract: Existing methods for measuring training sample influence on models only provide static, overall measurements, overlooking how sample influence changes during training. We propose Dynamic Influence Tracker (DIT), which captures the time-varying sample influence across arbitrary time windows during training. DIT offers…

February 18, 2025
Synthetic Control Sample for Before and After A/B Test

Learn a simple way to use linear regression to create a synthetic control sample for your A/B test. Gustavo R Santos, Towards Data Science.

December 20, 2024
A Note on Sample Complexity of Interactive Imitation Learning with Log Loss

arXiv:2412.07057v1. Abstract: Imitation learning (IL) is a general paradigm for learning from experts in sequential decision-making problems. Recent advancements in IL have shown that offline imitation learning, specifically Behavior Cloning (BC) with log loss, is minimax optimal. Meanwhile, its interactive…

December 11, 2024