Tag: testing
-
Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions
arXiv:2603.04635v1. Abstract: Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $\epsilon$-far from all product distributions in total variation distance.…
-
How We Are Testing Our Agents in Dev
Testing that your AI agent is performing as expected is not easy. Here are a few strategies we learned the hard way. First published on Towards Data Science, by Michael Segner.
-
Beyond Normality: Reliable A/B Testing with Non-Gaussian Data
arXiv:2510.23666v1. Abstract: A/B testing has become the cornerstone of decision-making in online markets, guiding how platforms launch new features, optimize pricing strategies, and improve user experience. In practice, we typically employ the pairwise $t$-test to compare outcomes between the treatment and control groups, thereby…
-
Strategic A/B testing via Maximum Probability-driven Two-armed Bandit
arXiv:2506.22536v1. Abstract: Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because of their…
-
From Two Sample Testing to Singular Gaussian Discrimination
arXiv:2505.04613v1. Abstract: We establish that testing for the equality of two probability measures on a general separable and compact metric space is equivalent to testing for the singularity between two corresponding Gaussian measures on a suitable Reproducing Kernel Hilbert Space. The corresponding Gaussians are…
-
Load-Testing LLMs Using LLMPerf
Deploying your Large Language Model (LLM) is not necessarily the final step in productionizing your Generative AI application. An often forgotten yet crucial part of the MLOps lifecycle is properly load testing your LLM and ensuring it is ready to withstand your expected production traffic. Load testing at a high level…
-
Mastering Prompt Engineering with Functional Testing: A Systematic Guide to Reliable LLM Outputs
Creating efficient prompts for large language models often starts as a simple task… but it doesn’t always stay that way. Initially, following basic best practices seems sufficient: adopt the persona of a specialist, write clear instructions, require a specific response format, and…
-
Amortized Conditional Independence Testing
arXiv:2502.20925v1. Abstract: Testing for the conditional independence structure in data is a fundamental and critical task in statistics and machine learning, which finds natural applications in causal discovery, a highly relevant problem to many scientific disciplines. Existing methods seek to design explicit test statistics that quantify the…
-
Testing Conditional Mean Independence Using Generative Neural Networks
arXiv:2501.17345v1. Abstract: Conditional mean independence (CMI) testing is crucial for statistical tasks including model determination and variable importance evaluation. In this work, we introduce a novel population CMI measure and a bootstrap-based testing procedure that utilizes deep generative neural networks to estimate the conditional…
-
Bayesian A/B Testing Falls Short
Why Bayesian A/B testing can lead to misunderstandings, inflate false positive rates, introduce bias, and complicate results. Over the past decade, I’ve engaged in countless discussions about Bayesian A/B testing versus Frequentist A/B testing. In nearly every conversation, I’ve maintained the same viewpoint:…