Tag: testing

Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions

arXiv:2603.04635v1. Abstract: Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $\epsilon$-far from all product distributions in total variation distance.…

March 6, 2026
How We Are Testing Our Agents in Dev

Testing that your AI agent is performing as expected is not easy. Here are a few strategies we learned the hard way. First published on Towards Data Science. Michael Segner

December 7, 2025
Beyond Normality: Reliable A/B Testing with Non-Gaussian Data

arXiv:2510.23666v1. Abstract: A/B testing has become the cornerstone of decision-making in online markets, guiding how platforms launch new features, optimize pricing strategies, and improve user experience. In practice, we typically employ the pairwise $t$-test to compare outcomes between the treatment and control groups, thereby…

October 29, 2025
Strategic A/B testing via Maximum Probability-driven Two-armed Bandit

arXiv:2506.22536v1. Abstract: Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because of their…

July 1, 2025
From Two Sample Testing to Singular Gaussian Discrimination

arXiv:2505.04613v1. Abstract: We establish that testing for the equality of two probability measures on a general separable and compact metric space is equivalent to testing for the singularity between two corresponding Gaussian measures on a suitable Reproducing Kernel Hilbert Space. The corresponding Gaussians are…

May 8, 2025
Load-Testing LLMs Using LLMPerf

Deploying your Large Language Model (LLM) is not necessarily the final step in productionizing your Generative AI application. An often forgotten yet crucial part of the MLOps lifecycle is properly load testing your LLM and ensuring it is ready to withstand your expected production traffic. Load testing at a high level…

April 19, 2025
Mastering Prompt Engineering with Functional Testing: A Systematic Guide to Reliable LLM Outputs

Creating efficient prompts for large language models often starts as a simple task… but it doesn’t always stay that way. Initially, following basic best practices seems sufficient: adopt the persona of a specialist, write clear instructions, require a specific response format, and…

March 15, 2025
Amortized Conditional Independence Testing

arXiv:2502.20925v1. Abstract: Testing for the conditional independence structure in data is a fundamental and critical task in statistics and machine learning, which finds natural applications in causal discovery – a highly relevant problem to many scientific disciplines. Existing methods seek to design explicit test statistics that quantify the…

March 3, 2025
Testing Conditional Mean Independence Using Generative Neural Networks

arXiv:2501.17345v1. Abstract: Conditional mean independence (CMI) testing is crucial for statistical tasks including model determination and variable importance evaluation. In this work, we introduce a novel population CMI measure and a bootstrap-based testing procedure that utilizes deep generative neural networks to estimate the conditional…

January 30, 2025
Bayesian A/B Testing Falls Short

Why Bayesian A/B testing can lead to misunderstandings, inflate false positive rates, introduce bias, and complicate results. Over the past decade, I’ve engaged in countless discussions about Bayesian A/B testing versus Frequentist A/B testing. In nearly every conversation, I’ve maintained the same viewpoint:…

January 9, 2025