Category: statistics

How to Define the Modeling Scope of an Internal Credit Risk Model

Dataset construction for Internal Ratings-Based (IRB) Probability of Default (PD) models. Originally published on Towards Data Science. By JUNIOR JUMBONG.

February 26, 2026
Understanding the Chi-Square Test Beyond the Formula

How categorical data becomes statistical evidence. Originally published on Towards Data Science. By Nikhil Dasari.

February 20, 2026
Modeling Urban Walking Risk Using Spatial-Temporal Machine Learning

Estimating neighborhood-level pedestrian risk from real-world incident data. Originally published on Towards Data Science. By Aneesh Patil.

January 29, 2026
Causal ML for the Aspiring Data Scientist

An accessible introduction to causal inference and ML. Originally published on Towards Data Science. By Ross Lauterbach.

January 27, 2026
From Transactions to Trends: Predict When a Customer Is About to Stop Buying

Customer churn is usually a gradual process, not a sudden event. In this post, we analyze monthly transaction trends and convert regression slopes into degrees to clearly identify declining purchase behavior. A small negative slope today can prevent a big revenue loss…
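The teaser above mentions converting regression slopes into degrees. A minimal sketch of one way to do that follows, assuming an ordinary least-squares slope over equally spaced months and an arctangent conversion; the function name `trend_angle_degrees` and the spend figures are hypothetical and not taken from the article.

```python
import math


def trend_angle_degrees(values):
    """Least-squares slope of values against 0, 1, 2, ... and its angle in degrees."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Covariance and variance terms of the simple linear regression slope.
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, math.degrees(math.atan(slope))


# Hypothetical monthly spend for one customer (illustrative numbers only).
monthly_spend = [120, 115, 118, 104, 98, 90]
slope, angle = trend_angle_degrees(monthly_spend)
print(f"slope={slope:.2f} per month, angle={angle:.1f} degrees")
```

The angle depends on the units of the series, so in practice the values would probably need to be scaled (for example to a share of the customer's average spend) before angles are compared across customers.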

January 24, 2026
Google Trends is Misleading You: How to Do Machine Learning with Google Trends Data

Google Trends is one of the most widely used tools for analysing human behaviour at scale. Journalists use it. Data scientists use it. Entire papers are built on it. But there is a fundamental property of Google Trends data that makes…

January 22, 2026
A Case for the T-statistic

And how it compares to the run-of-the-mill z-score. Originally published on Towards Data Science. By Aniruddha Karajgi.

January 22, 2026
The Machine Learning “Advent Calendar” Bonus 1: AUC in Excel

AUC measures how well a model ranks positives above negatives, independent of any chosen threshold. Originally published on Towards Data Science. By angela shi.
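The ranking interpretation of AUC in the teaser above can be sketched directly: AUC equals the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. This is an illustrative Python version, not the article's Excel workbook; the function name `auc_by_ranking` and the toy data are assumptions.

```python
def auc_by_ranking(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. No classification threshold is used anywhere.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 3 positives, 2 negatives, one mis-ranked pair.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]
auc = auc_by_ranking(labels, scores)
print(round(auc, 4))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; a rank-based formula would be the usual choice for larger data.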

December 31, 2025
Keeping Probabilities Honest: The Jacobian Adjustment

An intuitive explanation of transforming random variables correctly. Originally published on Towards Data Science. By Aniruddha Karajgi.

December 26, 2025
Bonferroni vs. Benjamini-Hochberg: Choosing Your P-Value Correction

Multiple hypothesis testing, P-values, and Monte Carlo. Originally published on Towards Data Science. By Marco Hening Tallarico.

December 25, 2025
The Machine Learning “Advent Calendar” Day 20: Gradient Boosted Linear Regression in Excel

From Random Ensembles to Optimization: Gradient Boosting Explained. Originally published on Towards Data Science. By angela shi.

December 23, 2025
The Machine Learning “Advent Calendar” Day 19: Bagging in Excel

Understanding ensemble learning from first principles in Excel. Originally published on Towards Data Science. By angela shi.

December 20, 2025
Geospatial exploratory data analysis with GeoPandas and DuckDB

In this article, I’ll show you how to use two popular Python libraries to carry out some geospatial analysis of traffic accident data within the UK. I was a relatively early adopter of DuckDB, the fast OLAP database, after it became available, but only recently realised that, through…

December 16, 2025
The Machine Learning “Advent Calendar” Day 14: Softmax Regression in Excel

Softmax Regression is simply Logistic Regression extended to multiple classes. By computing one linear score per class and normalizing them with Softmax, we obtain multiclass probabilities without changing the core logic. The loss, the gradients, and the optimization remain the same. Only the number…
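The "one linear score per class, then Softmax" step described above can be sketched in a few lines. This is an illustrative Python version rather than the article's Excel layout; the weights, biases, and input below are hypothetical values chosen only for the example.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(x, weights, biases):
    """One linear score per class (w . x + b), normalized with softmax."""
    scores = [
        sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical parameters: 3 classes, 2 features.
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.1]]
biases = [0.0, 0.1, -0.1]
probs = class_probabilities([1.0, 2.0], weights, biases)
print([round(p, 3) for p in probs])
```

With two classes and one score fixed at zero, `softmax([z, 0.0])[0]` reduces to the logistic function `1 / (1 + exp(-z))`, which is the sense in which softmax regression extends logistic regression.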

December 15, 2025
The Machine Learning “Advent Calendar” Day 12: Logistic Regression in Excel

In this article, we rebuild Logistic Regression step by step directly in Excel. Starting from a binary dataset, we explore why linear regression struggles as a classifier, how the logistic function fixes these issues, and how log-loss naturally appears from the likelihood. With a…

December 13, 2025
The Machine Learning “Advent Calendar” Day 8: Isolation Forest in Excel

Isolation Forest may look technical, but its idea is simple: isolate points using random splits. If a point is isolated quickly, it is an anomaly; if it takes many splits, it is normal. Using the tiny dataset 1, 2, 3, 9, we can see…
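The idea of isolating points with random splits can be simulated on the tiny dataset 1, 2, 3, 9 mentioned above. The sketch below is a simplified one-dimensional illustration, assuming uniformly random split points and distinct values; it is not the full Isolation Forest algorithm (no subsampling, no anomaly-score normalization), and the helper name `isolation_depth` is hypothetical.

```python
import random


def isolation_depth(points, target, rng):
    """Number of random splits drawn until `target` is alone in its partition."""
    current = list(points)
    depth = 0
    while len(current) > 1:
        lo, hi = min(current), max(current)
        if lo == hi:  # identical values cannot be separated; stop
            break
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that still contains the target.
        if target < split:
            current = [p for p in current if p < split]
        else:
            current = [p for p in current if p >= split]
        depth += 1
    return depth


data = [1, 2, 3, 9]
rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 2000
avg_depth = {
    p: sum(isolation_depth(data, p, rng) for _ in range(trials)) / trials
    for p in data
}
print(avg_depth)
```

Averaged over many random trees, the point 9 should need the fewest splits, because most split points drawn between 1 and 9 fall in the wide gap between 3 and 9.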

December 9, 2025
The Greedy Boruta Algorithm: Faster Feature Selection Without Sacrificing Recall

A modification to the Boruta algorithm that dramatically reduces computation while maintaining high sensitivity. Originally published on Towards Data Science. By Nicolas Vana.

December 1, 2025
Metric Deception: When Your Best KPIs Hide Your Worst Failures

The most dangerous KPIs aren’t broken; they’re the ones trusted long after they’ve lost their meaning. Originally published on Towards Data Science. By Shafeeq Ur Rahaman.

November 30, 2025
The Absolute Beginner’s Guide to Pandas DataFrames

Learn how to initialize dataframes from dictionaries, lists, and NumPy arrays. Originally published on Towards Data Science. By Ibrahim Salami.

November 18, 2025
Spearman Correlation Coefficient for When Pearson Isn’t Enough

Not all relationships are linear, and that is where Spearman comes in. Originally published on Towards Data Science. By Nikhil Dasari.

November 14, 2025
Power Analysis in Marketing: A Hands-On Introduction

Part 1: What is statistical power and how do we compute it? Originally published on Towards Data Science. By Sam Arrington.

November 9, 2025
Evaluating Synthetic Data — The Million Dollar Question

Learn how to evaluate synthetic data quality using the Maximum Similarity Test, a simple, quantitative approach for assessing fidelity, utility, and privacy in synthetic datasets. Originally published on Towards Data Science. By Andrew Skabar.

November 8, 2025
Expected Value Analysis in AI Product Management

An introduction to key concepts and practical applications. Originally published on Towards Data Science. By Chinmay Kakatkar.

November 7, 2025
What to Do When Your Credit Risk Model Works Today, but Breaks Six Months Later

Here’s why it happens, and how to fix it. Originally published on Towards Data Science. By Javier Marin.

November 5, 2025
The Pearson Correlation Coefficient, Explained Simply

A simple explanation of the Pearson correlation coefficient with examples. Originally published on Towards Data Science. By Nikhil Dasari.

November 2, 2025
Using NumPy to Analyze My Daily Habits (Sleep, Screen Time & Mood)

Can I use NumPy to figure out how my habits affect my mood and productivity? Originally published on Towards Data Science. By Ibrahim Salami.

October 29, 2025
Building a Monitoring System That Actually Works

A step-by-step guide to catching real anomalies without drowning in false alerts. Originally published on Towards Data Science. By Mariya Mansurova.

October 28, 2025
The Power of Framework Dimensions: What Data Scientists Should Know

Practical guidance and a case study. Originally published on Towards Data Science. By Chinmay Kakatkar.

October 27, 2025
Hidden Gems in NumPy: 7 Functions Every Data Scientist Should Know

I’ve been learning data analytics for a year now. So far, I can consider myself confident in SQL and Power BI. The transition to Python has been quite exciting. I’ve been exposed to some neat and smarter approaches to data analysis. After brushing up…

October 22, 2025
Statistical Method mcRigor Enhances the Rigor of Metacell Partitioning in Single-Cell Data Analysis

mcRigor detects dubious metacells within each metacell partition and selects the optimal metacell partitioning method and hyperparameter for a given dataset. Originally published on Towards Data Science.

October 18, 2025
What Makes a Language Look Like Itself?

How simple statistics reveal the visual fingerprints of 20 languages. Originally published on Towards Data Science. By Kenneth McCarthy.

October 3, 2025
Beyond ROC-AUC and KS: The Gini Coefficient, Explained Simply

Understanding Gini and Lorenz curves for smarter model evaluation. Originally published on Towards Data Science. By Nikhil Dasari.

October 1, 2025
The Kolmogorov–Smirnov Statistic, Explained: Measuring Model Power in Credit Risk Modeling

Understanding how banks use the KS statistic in loan approvals. Originally published on Towards Data Science. By Nikhil Dasari.

September 23, 2025
The Theory of Universal Computation: Bayesian Optimality, Solomonoff Induction & AIXI

Is it possible to build a perfect induction machine? Originally published on Towards Data Science. By Angjelin Hila.

September 23, 2025
Why Your A/B Test Winner Might Just Be Random Noise

What a coach’s warm-up trial can teach us about running better experiments. Originally published on Towards Data Science. By Pol Marin.

September 17, 2025
How to Become a Machine Learning Engineer (Step-by-Step)

Your one-stop guide to becoming a machine learning engineer. Originally published on Towards Data Science. By Egor Howell.

September 16, 2025
Is Your Training Data Representative? A Guide to Checking with PSI in Python

Comparing Variable Distributions Between Two Datasets Using Population Stability Index (PSI) and Cramér’s V. Originally published on Towards Data Science. By JUNIOR JUMBONG.

September 11, 2025
When A Difference Actually Makes A Difference

Bite-Sized Analytics for Business Decision-Makers (1). Originally published on Towards Data Science. By Mena Wang.

September 11, 2025
Hands On Time Series Modeling of Rare Events, with Python

How to model rare event occurrences in a time series in a few lines of code. Originally published on Towards Data Science. By Piero Paialunga.

September 4, 2025
Stochastic Differential Equations and Temperature — NASA Climate Data pt. 2

The Ornstein-Uhlenbeck process in Python. Originally published on Towards Data Science. By Marco Hening Tallarico.

September 4, 2025
How to Benchmark Classical Machine Learning Workloads on Google Cloud

Harnessing CPUs for Practical, Cost-Effective Machine Learning. Originally published on Towards Data Science. By Ehssan Khan.

August 26, 2025
Cracking the Density Code: Why MAF Flows Where KDE Stalls

Learn why autoregressive flows are the superior density estimation tool for high-dimensional data. Originally published on Towards Data Science. By Zackary Nay.

August 23, 2025
Help Your Model Learn the True Signal

An algorithm-agnostic approach inspired by Cook’s distance. Originally published on Towards Data Science. By Mena Wang.

August 20, 2025
LangGraph + SciPy: Building an AI That Reads Documentation and Makes Decisions

Stop guessing your statistical test. Let this AI do it for you. Originally published on Towards Data Science. By Gustavo Santos.

August 12, 2025
Mastering NLP with spaCy – Part 2

POS tagging, dependency parser and named entity recognition. Originally published on Towards Data Science. By Marcello Politi.

August 2, 2025
A Well-Designed Experiment Can Teach You More Than a Time Machine!

How experimentation is more powerful than knowing counterfactuals. Originally published on Towards Data Science. By Jarom Hulet.

July 23, 2025
The Hidden Trap of Fixed and Random Effects

My lesson in how blindly over-controlling for noise can erase the effects you are measuring. Originally published on Towards Data Science. By Ngoc Doan.

July 19, 2025
Estimating Disease Rates Without Diagnosis

Immune genes as predictors of disease. Originally published on Towards Data Science. By David Wells.

July 18, 2025
Prescriptive Modeling Makes Causal Bets – Whether You Know it or Not!

An explanation of the causal assumption implicit in prescriptive modeling and how to satisfy it. Originally published on Towards Data Science. By Jarom Hulet.

July 1, 2025
A Practical Starters’ Guide to Causal Structure Learning with Bayesian Methods in Python

Learn Causal Structures and make inferences with Bayesian Methods: Python Tutorial. Originally published on Towards Data Science. By Erdogan Taskesen.

June 17, 2025
Exploring the Proportional Odds Model for Ordinal Logistic Regression

Understanding and Implementing Brant’s Tests in Ordinal Logistic Regression with Python. Originally published on Towards Data Science. By JUNIOR JUMBONG.

June 12, 2025
10,000x Faster Bayesian Inference: Multi-GPU SVI vs. Traditional MCMC

Using GPU acceleration to speed up Bayesian Inference from months to minutes… Originally published on Towards Data Science. By Derek Tran.

June 11, 2025
Applications of Density Estimation to Legal Theory

A brief analysis using density estimation to compare the two-verdict and three-verdict systems. Originally published on Towards Data Science. By Jimin Kang.

June 11, 2025
The Role of Luck in Sports: Can We Measure It?

From last-minute goals to coin tosses: How much does randomness influence the outcomes of games? Originally published on Towards Data Science. By Pol Marin.

June 7, 2025
Data Drift Is Not the Actual Problem: Your Monitoring Strategy Is

Monitoring is easy; what to monitor is not. In the field of machine learning, data drift is just noise until you know what it means. Originally published on Towards Data Science.

June 4, 2025
Estimating Product-Level Price Elasticities Using Hierarchical Bayesian

Using one model to personalize ML results. Originally published on Towards Data Science. By Derek Tran.

May 24, 2025
What Statistics Can Tell Us About NBA Coaches

Using Python to determine where NBA coaches come from and what makes them successful. Originally published on Towards Data Science. By Brayden Gerrard.

May 23, 2025
How to Learn the Math Needed for Machine Learning

Maths can be a scary topic for people. Many of you want to work in machine learning, but the maths skills needed may seem overwhelming. I am here to tell you that it’s nowhere near as intimidating as you may think and to give you a roadmap, resources,…

May 16, 2025
🚪🚪🐐 Lessons in Decision Making from the Monty Hall Problem

The Monty Hall Problem is a well-known brain teaser from which we can learn important lessons in Decision Making that are useful in general and in particular for data scientists. If you are not familiar with this problem, prepare to be perplexed. If you…

May 16, 2025
Strength in Numbers: Ensembling Models with Bagging and Boosting

Bagging and boosting are two powerful ensemble techniques in machine learning – they are must-knows for data scientists! After reading this article, you are going to have a solid understanding of how bagging and boosting work and when to use them. We’ll cover the following topics,…

May 15, 2025
Survival Analysis When No One Dies: A Value-Based Approach

Survival Analysis is a statistical approach used to answer the question: “How long will something last?” That “something” could range from a patient’s lifespan to the durability of a machine component or the duration of a user’s subscription. One of the most widely used tools in…

May 14, 2025
Non-Parametric Density Estimation: Theory and Applications

In this article, we’ll talk about what Density Estimation is and the role it plays in statistical analysis. We’ll analyze two popular density estimation methods, histograms and kernel density estimators, and analyze their theoretical properties as well as how they perform in practice. Finally, we’ll look at how density…

May 14, 2025
Log Link vs Log Transformation in R — The Difference that Misleads Your Entire Data Analysis

Although normal distributions are the most commonly used, a lot of real-world data unfortunately is not normal. When faced with extremely skewed data, it’s tempting for us to utilize log transformations to normalize the distribution and stabilize the variance. I…

May 10, 2025
The Dangers of Deceptive Data Part 2–Base Proportions and Bad Statistics

This is a follow-up to my earlier article: The Dangers of Deceptive Data–Confusing Charts and Misleading Headlines. My first article focused on how visualizations can be used to mislead, diving into a form of data presentation widely used in public matters. In this article,…

May 9, 2025
Beyond Glorified Curve Fitting: Exploring the Probabilistic Foundations of Machine Learning

You see a math formula you don’t immediately understand. Your instinct? Stop reading. Don’t. That’s exactly what I told myself when I started reading Probabilistic Machine Learning – An Introduction by Kevin P. Murphy. And it was absolutely worth it. It changed how I…

May 1, 2025
When Predictors Collide: Mastering VIF in Multicollinear Regression

In regression models, the independent variables should be independent of each other, or at most slightly dependent, i.e. they should not be strongly correlated. However, if such a dependency exists, this is referred to as Multicollinearity and leads to unstable models and results that are difficult to interpret. The…

April 17, 2025
Are You Sure Your Posterior Makes Sense?

This article is co-authored by Felipe Bandeira, Giselle Fretta, Thu Than, and Elbion Redenica. We also thank Prof. Carl Scheffler for his support. Introduction: Parameter estimation has been for decades one of the most important topics in statistics. While frequentist approaches, such as Maximum Likelihood Estimations, used to…

April 12, 2025
How to Measure Real Model Accuracy When Labels Are Noisy

Ground truth is never perfect. From scientific measurements to human annotations used to train deep learning models, ground truth always has some amount of errors. ImageNet, arguably the most well-curated image dataset, has 0.3% errors in human annotations. Then, how can we evaluate predictive models…

April 11, 2025
Unlock the Power of ROC Curves: Intuitive Insights for Better Model Evaluation

We’ve all been in that moment, right? Staring at a chart as if it’s some ancient script, wondering how we’re supposed to make sense of it all. That’s exactly how I felt when I was asked to explain the AUC for the ROC…

April 9, 2025
Linear Programming: Managing Multiple Targets with Goal Programming

This is the sixth (and likely last) part of a Linear Programming series I’ve been writing. With the core concepts covered by the prior articles, this article focuses on goal programming which is a less frequent linear programming (LP) use case. Goal programming is a specific linear…

April 4, 2025
Uncertainty Quantification in Machine Learning with an Easy Python Interface

Uncertainty quantification (UQ) in a Machine Learning (ML) model allows one to estimate the precision of its predictions. This is extremely important for utilizing its predictions in real-world tasks. For instance, if a machine learning model is trained to predict a property of a material,…

March 27, 2025
When You Just Can’t Decide on a Single Action

In Game Theory, the players typically have to make assumptions about the other players’ actions. What will the other player do? Will he use rock, paper or scissors? You never know, but in some cases, you might have an idea of the probability of some actions…

March 8, 2025
How to Spot and Prevent Model Drift Before it Impacts Your Business

Despite the AI hype, many tech companies still rely heavily on machine learning to power critical applications, from personalized recommendations to fraud detection. I’ve seen firsthand how undetected drifts can result in significant costs: missed fraud detection, lost revenue, and suboptimal business…

March 7, 2025
One-Tailed Vs. Two-Tailed Tests

Introduction: If you’ve ever analyzed data using built-in t-test functions, such as those in R or SciPy, here’s a question for you: have you ever adjusted the default setting for the alternative hypothesis? If your answer is no, or if you’re not even sure what this means, then this blog post is for…

March 6, 2025
I Won’t Change Unless You Do

In Game Theory, how can players ever come to an end if there still might be a better option to decide for? Maybe one player still wants to change their decision. But if they do, maybe the other player wants to change too. How can they ever hope to…

March 1, 2025
The Dangers of Deceptive Data–Confusing Charts and Misleading Headlines

“You don’t have to be an expert to deceive someone, though you might need some expertise to reliably recognize when you are being deceived.” When my co-instructor and I start our quarterly lesson on deceptive visualizations for the data visualization course we teach at the University…

February 27, 2025
Do European M&Ms Actually Taste Better than American M&Ms?

(Oh, I am the only one who’s been asking this question…? Hm. Well, if you have a minute, please enjoy this exploratory Data Analysis, featuring experimental design, statistics, and interactive visualization, applied a bit too earnestly to resolve an international debate.) 1. Introduction 1.1…

February 22, 2025
Talking about Games

Game theory is a field of research that is quite prominent in Economics but rather unpopular in other scientific disciplines. However, the concepts used in game theory can be of interest to a wider audience, including data scientists, statisticians, computer scientists or psychologists, to name just a few. This article is the…

February 22, 2025
Unraveling Spatially Variable Genes: A Statistical Perspective on Spatial Transcriptomics

The article was written by Guanao Yan, Ph.D. student of Statistics and Data Science at UCLA. Guanao is the first author of the Nature Communications review article [1]. Spatially resolved transcriptomics (SRT) is revolutionizing Genomics by enabling the high-throughput measurement of gene expression while…

February 21, 2025
Honestly Uncertain

Ethical issues aside, should you be honest when asked how certain you are about some belief? Of course, it depends. In this blog post, you’ll learn on what. Different ways of evaluating probabilistic predictions come with dramatically different degrees of “optimal honesty”. Perhaps surprisingly, the linear function that assigns +1 to true and fully…

February 19, 2025
Method of Moments Estimation with Python Code

Let’s say you are in a customer care center, and you would like to know the probability distribution of the number of calls per minute, or in other words, you want to answer the question: what is the probability of receiving zero, one, two, … etc., calls per…

February 13, 2025
The Gamma Hurdle Distribution

Which Outcome Matters? Here is a common scenario: An A/B test was conducted, where a random sample of units (e.g. customers) were selected for a campaign and they received Treatment A. Another sample was selected to receive Treatment B. “A” could be a communication or offer and “B” could be…

February 8, 2025
The Method of Moments Estimator for Gaussian Mixture Models

Audio Processing is one of the most important application domains of digital signal processing (DSP) and machine learning. Modeling acoustic environments is an essential step in developing digital audio processing systems such as: speech recognition, speech enhancement, acoustic echo cancellation, etc. Acoustic environments are filled with background…

February 8, 2025
How Likely Is a Six Nations Grand Slam in 2025?

Quantifying uncertainty in sports fixtures. For rugby fans the long wait is nearly over: like Christmas, the Six Nations comes once a year to lift our spirits in the cold winter months. If you’re not very familiar with rugby, the…

February 1, 2025
Navigating Data Science Content: Recognizing Common Pitfalls, Part 1

Uncovering and correcting misconceptions in online data science content to help you learn more effectively. By Geremie Yeo.

January 31, 2025
NLP Illustrated, Part 3: Word2Vec

An exhaustive and illustrated guide to Word2Vec with code! By Shreya Rao.

January 30, 2025
Basics of Probability Notations

Union, Intersection, Independence, Disjoint, Complement: Advanced Probability for Data Science Series (1). By Sunghyun Ahn.

January 29, 2025
Who is Right? The Dean or the Students?

A cautionary tale on two perspectives on averaging. By Paolo Molignini, PhD.

January 29, 2025
Water Cooler Small Talk, Ep 7: Anscombe’s Quartet and the Datasaurus

Why descriptive statistics aren’t enough and plotting your data is always essential. By Maria Mouschoutzi, PhD.

January 28, 2025
Does It Matter That Online Experiments Interact?

What interactions do, why they are just like any other change in the environment post-experiment, and some reassurance. Experiments do not run one at a time. At any moment, hundreds to thousands of experiments run on a mature website. The question comes up:…

January 25, 2025
Basics of GANs & SMOTE for Data Augmentation

GANs and SMOTE Explained with Bartending: Data Science for Machine Learning Series (1). By Sunghyun Ahn.

January 16, 2025
Water Cooler Small Talk: Benford’s Law

A look into the strange first digit distribution of naturally occurring datasets. By Maria Mouschoutzi, PhD.

January 16, 2025
Scale Experiment Decision-Making with Programmatic Decision Rules

Decide what to do with experiment results in code. The experiment lifecycle is like the human lifecycle. First, a person or idea is born, then it develops, then it is tested, then its test ends, and then the Gods (or Product Managers) decide its worth.…

January 15, 2025
Four Ways to Improve Statistical Power in A/B Testing (Without Increasing Test Duration, Duh)

In A/B testing, you often have to balance statistical power and how long the test takes. Learn how Allocation, Effect Size, CUPED & Binarization can help you.

January 14, 2025
Bayesian A/B Testing Falls Short

Why Bayesian A/B testing can lead to misunderstandings, inflate false positive rates, introduce bias and complicate results. Over the past decade, I’ve engaged in countless discussions about Bayesian A/B testing versus Frequentist A/B testing. In nearly every conversation, I’ve maintained the same viewpoint:…

January 9, 2025
Method of Moments Estimation with Python Code

How to understand and implement the estimator from scratch. Let’s say you are in a customer care center, and you would like to know the probability distribution of the number of calls per minute, or in other words, you want to answer the question:…

January 9, 2025
In Defense of Statistical Significance

We have to draw the line somewhere. It’s become something of a meme that statistical significance is a bad standard. Several recent blogs have made the rounds, making the case that statistical significance is a “cult” or “arbitrary.” If you’d like a classic polemic (and…

January 7, 2025
How to Tell Among Two Regression Models with Statistical Significance

Diving into the F-test for nested models with algorithms, examples and code. By LucianoSphere (Luciano Abriata, PhD).

January 4, 2025
Chi-Squared Test: Comparing Variations Through Soccer

Understanding Different Types of Chi-Squared Tests: A/B Testing for Data Science Series (8). By Sunghyun Ahn.

January 4, 2025
How to Stand Out in The Data Science Job Market

How to have the edge in your data science application. By Egor Howell.

January 3, 2025