Tag: test

  • Roast my AB test analysis [A]

    I have just finished up a sample analysis on an AB test dummy dataset, and would love feedback. The dataset is from Udacity’s AB Testing course. It tracks data on two landing page variations, treatment and control, with mean conversion rate as the defining metric. In my analysis, I…

  • Rethinking Test-Time Training: Tilting The Latent Distribution For Few-Shot Source-Free Adaptation

    arXiv:2602.02633v1. Abstract: Often, constraints arise in deployment settings where even lightweight parameter updates, e.g. parameter-efficient fine-tuning, could induce model shift or tuning instability. We study test-time adaptation of foundation models for few-shot classification under a completely frozen-model regime, where additionally, no…

  • Semiparametric KSD test: unifying score and distance-based approaches for goodness-of-fit testing

    arXiv:2512.20007v1. Abstract: Goodness-of-fit (GoF) tests are fundamental for assessing model adequacy. Score-based tests are appealing because they require fitting the model only once under the null. However, extending them to powerful nonparametric alternatives is difficult due to the lack of suitable…

  • Minimax-Optimal Two-Sample Test with Sliced Wasserstein

    arXiv:2510.27498v1. Abstract: We study the problem of nonparametric two-sample testing using the sliced Wasserstein (SW) distance. While prior theoretical and empirical work indicates that the SW distance offers a promising balance between strong statistical guarantees and computational efficiency, its theoretical foundations for hypothesis testing remain limited.…

  • Why Your A/B Test Winner Might Just Be Random Noise

    What a coach’s warm-up trial can teach us about running better experiments. Appeared first on Towards Data Science. Pol Marin

  • A Two-Sample Test of Text Generation Similarity

    arXiv:2505.05269v1. Abstract: The surge in digitized text data requires reliable inferential methods on observed textual patterns. This article proposes a novel two-sample text test for comparing similarity between two groups of documents. The hypothesis is whether the probabilistic mapping generating the textual data is identical…

  • Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs

    arXiv:2505.03814v1. Abstract: As foundation models continue to scale, the size of trained models grows exponentially, presenting significant challenges for their evaluation. Current evaluation practices involve curating increasingly large datasets to assess the performance of large language models (LLMs). However, there is a lack of…

  • Data Science: From School to Work, Part IV

    Introduction: Let’s start with a simple example that will appeal to most of us. If you want to check if the blinkers of your car are working properly, you sit in the car, turn on the ignition and test a turn signal to see if the front…

  • An Efficient Permutation-Based Kernel Two-Sample Test

    arXiv:2502.13570v1. Abstract: Two-sample hypothesis testing, determining whether two sets of data are drawn from the same distribution, is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing, maximum mean discrepancy (MMD) has gained popularity as a test statistic due…

  • Statistical Verification of Linear Classifiers

    arXiv:2501.14430v1. Abstract: We propose a homogeneity test closely related to the concept of linear separability between two samples. Using the test one can answer the question whether a linear classifier is merely “random” or effectively captures differences between two classes. We focus on establishing upper bounds for…

  • Four Ways to Improve Statistical Power in A/B Testing (Without Increasing Test Duration, Duh)

    In A/B testing, you often have to balance statistical power and how long the test takes. Learn how Allocation, Effect Size, CUPED & Binarization can help you.

  • Chi-Squared Test: Comparing Variations Through Soccer

    Understanding Different Types of Chi-Squared Tests: A/B Testing for Data Science Series (8). Published on Towards Data Science. Sunghyun Ahn