Category: statistics

  • How to Define the Modeling Scope of an Internal Credit Risk Model

    Dataset construction for Internal Ratings-Based (IRB) Probability of Default (PD) models. The post How to Define the Modeling Scope of an Internal Credit Risk Model appeared first on Towards Data Science. By JUNIOR JUMBONG.

  • Understanding the Chi-Square Test Beyond the Formula

    How categorical data becomes statistical evidence. By Nikhil Dasari.

  • Modeling Urban Walking Risk Using Spatial-Temporal Machine Learning

    Estimating neighborhood-level pedestrian risk from real-world incident data. By Aneesh Patil.

  • Causal ML for the Aspiring Data Scientist

    An accessible introduction to causal inference and ML. By Ross Lauterbach.

  • From Transactions to Trends: Predict When a Customer Is About to Stop Buying

    Customer churn is usually a gradual process, not a sudden event. In this post, we analyze monthly transaction trends and convert regression slopes into degrees to clearly identify declining purchase behavior. A small negative slope today can prevent a big revenue loss…
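    The slope-to-degrees idea can be sketched as follows: fit an ordinary least-squares line to a customer's monthly totals and pass the slope through atan. This is a minimal sketch, not the author's code; the function name `slope_and_angle` and the spend figures are hypothetical, and the article's exact scaling is not given here.

```python
import math

def slope_and_angle(monthly_totals):
    """Fit an ordinary least-squares line to monthly totals indexed 0, 1, 2, ...
    and return (slope, angle in degrees)."""
    n = len(monthly_totals)
    if n < 2:
        raise ValueError("need at least two months of data")
    x_mean = (n - 1) / 2
    y_mean = sum(monthly_totals) / n
    # OLS slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(monthly_totals))
    var = sum((x - x_mean) ** 2 for x in range(n))
    slope = cov / var
    # atan turns the slope into an angle; the result depends on the units of y
    angle = math.degrees(math.atan(slope))
    return slope, angle

# Hypothetical monthly spend of one customer, oldest month first
spend = [120.0, 118.0, 110.0, 104.0, 97.0, 90.0]
slope, angle = slope_and_angle(spend)
print(f"slope = {slope:.2f} per month, angle = {angle:.1f} degrees")
```

    Because the angle depends on the units of the y-axis, comparing customers with very different spend levels would likely require rescaling first (for example, dividing by each customer's average spend); that normalization is an assumption, not something stated in the summary.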

  • Google Trends is Misleading You: How to Do Machine Learning with Google Trends Data

    Google Trends is one of the most widely used tools for analysing human behaviour at scale. Journalists use it. Data scientists use it. Entire papers are built on it. But there is a fundamental property of Google Trends data that makes…

  • A Case for the T-statistic

    And how it compares to the run-of-the-mill z-score. By Aniruddha Karajgi.

  • The Machine Learning “Advent Calendar” Bonus 1: AUC in Excel

    AUC measures how well a model ranks positives above negatives, independent of any chosen threshold. By angela shi.
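    The ranking interpretation of AUC can be checked directly: it is the share of (positive, negative) pairs in which the positive gets the higher score, with ties counted as one half. A minimal sketch; `pairwise_auc` and the toy scores are hypothetical, and the O(n²) loop is for illustration rather than large datasets.

```python
def pairwise_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # correctly ranked pair
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(pairwise_auc(scores, labels))  # 8 of 9 pairs ranked correctly
```

    No threshold appears anywhere in the computation, which is why AUC is threshold-independent.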

  • Keeping Probabilities Honest: The Jacobian Adjustment

    An intuitive explanation of transforming random variables correctly. By Aniruddha Karajgi.
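    To see why the Jacobian matters, take Y = exp(X) with X standard normal. The change-of-variables rule gives f_Y(y) = f_X(ln y) · |d(ln y)/dy| = f_X(ln y) / y. A minimal numerical check, assuming this particular transform as the example: with the 1/y factor the density integrates to about 1; without it, the result is about exp(0.5), so it is not a valid density.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def density_of_exp(y, with_jacobian=True):
    """Density of Y = exp(X) with X ~ N(0, 1), via change of variables.

    The inverse map is x = ln(y), and its derivative |dx/dy| = 1 / y is the
    Jacobian factor that rescales the density."""
    fx = STD_NORMAL.pdf(math.log(y))
    return fx / y if with_jacobian else fx

def midpoint_integral(f, lo, hi, n=100_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Integrate over (0, 200]; the lognormal tail beyond 200 is negligible here.
with_jac = midpoint_integral(lambda y: density_of_exp(y, True), 0.0, 200.0)
without_jac = midpoint_integral(lambda y: density_of_exp(y, False), 0.0, 200.0)
print(f"with Jacobian:    {with_jac:.4f}")    # close to 1
print(f"without Jacobian: {without_jac:.4f}")  # close to exp(0.5), about 1.6487
```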

  • Bonferroni vs. Benjamini-Hochberg: Choosing Your P-Value Correction

    Multiple hypothesis testing, p-values, and Monte Carlo. By Marco Hening Tallarico.
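    The two corrections differ in what they control: Bonferroni bounds the family-wise error rate, while Benjamini-Hochberg bounds the false discovery rate (under independence or positive dependence of the tests). A minimal sketch of both textbook procedures; the p-values are hypothetical and the code is not taken from the article.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up procedure controlling the false discovery rate.

    Sort the p-values, find the largest rank k with p_(k) <= k * alpha / m,
    and reject the k hypotheses with the smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]  # hypothetical
print("Bonferroni:", bonferroni(pvals))
print("BH:        ", benjamini_hochberg(pvals))
```

    On these eight p-values, Bonferroni rejects only the smallest one, while Benjamini-Hochberg also rejects the second, illustrating its extra power.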

  • The Machine Learning “Advent Calendar” Day 20: Gradient Boosted Linear Regression in Excel

    From Random Ensembles to Optimization: Gradient Boosting Explained. By angela shi.

  • The Machine Learning “Advent Calendar” Day 19: Bagging in Excel

    Understanding ensemble learning from first principles in Excel. By angela shi.

  • Geospatial exploratory data analysis with GeoPandas and DuckDB

    In this article, I’ll show you how to use two popular Python libraries to carry out some geospatial analysis of traffic accident data within the UK. I was a relatively early adopter of DuckDB, the fast OLAP database, after it became available, but only recently realised that, through…

  • The Machine Learning “Advent Calendar” Day 14: Softmax Regression in Excel

    Softmax Regression is simply Logistic Regression extended to multiple classes. By computing one linear score per class and normalizing them with Softmax, we obtain multiclass probabilities without changing the core logic. The loss, the gradients, and the optimization remain the same. Only the number…
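    The "one linear score per class, then Softmax" step can be sketched in a few lines. This is a minimal illustration, not the article's spreadsheet; the weights, biases, and function names are hypothetical.

```python
import math

def class_scores(x, weights, biases):
    """One linear score per class: z_k = w_k . x + b_k."""
    return [sum(w_j * x_j for w_j, x_j in zip(w_k, x)) + b_k
            for w_k, b_k in zip(weights, biases)]

def softmax(scores):
    """Turn class scores into probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-class model with 2 features
weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
biases = [0.0, 0.1, -0.1]
probs = softmax(class_scores([1.0, 2.0], weights, biases))
print([round(p, 3) for p in probs])
```

    With two classes and the second score fixed at 0, `softmax([z, 0.0])[0]` equals the logistic sigmoid 1 / (1 + exp(-z)), which is the sense in which Softmax Regression extends Logistic Regression.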

  • The Machine Learning “Advent Calendar” Day 12: Logistic Regression in Excel

    In this article, we rebuild Logistic Regression step by step directly in Excel. Starting from a binary dataset, we explore why linear regression struggles as a classifier, how the logistic function fixes these issues, and how log-loss naturally appears from the likelihood. With a…

  • The Machine Learning “Advent Calendar” Day 8: Isolation Forest in Excel

    Isolation Forest may look technical, but its idea is simple: isolate points using random splits. If a point is isolated quickly, it is an anomaly; if it takes many splits, it is normal. Using the tiny dataset 1, 2, 3, 9, we can see…
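    The random-split intuition can be simulated on the same tiny dataset 1, 2, 3, 9. This is a simplified sketch, not a full Isolation Forest: there is no subsampling and no path-length normalization, and the helper names are hypothetical. Each path draws a uniform split between the current minimum and maximum and keeps the side that contains the target point.

```python
import random

def isolation_depth(data, target, rng, max_depth=50):
    """Number of random splits needed to isolate `target` along one random path."""
    current = list(data)
    depth = 0
    while len(current) > 1 and depth < max_depth:
        lo, hi = min(current), max(current)
        if lo == hi:  # duplicates cannot be separated
            break
        split = rng.uniform(lo, hi)
        # keep only the side of the split that still contains the target
        if target < split:
            current = [v for v in current if v < split]
        else:
            current = [v for v in current if v >= split]
        depth += 1
    return depth

def average_depth(data, target, trials=2000, seed=0):
    """Average isolation depth over many random split paths."""
    rng = random.Random(seed)
    return sum(isolation_depth(data, target, rng) for _ in range(trials)) / trials

data = [1, 2, 3, 9]
for point in data:
    print(point, round(average_depth(data, point), 2))
```

    The point 9 is usually isolated by the very first split (any split between 3 and 9 does it), while 2 can never be isolated in fewer than two splits, so 9 gets the shortest average path.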

  • The Greedy Boruta Algorithm: Faster Feature Selection Without Sacrificing Recall

    A modification to the Boruta algorithm that dramatically reduces computation while maintaining high sensitivity. By Nicolas Vana.

  • Metric Deception: When Your Best KPIs Hide Your Worst Failures

    The most dangerous KPIs aren’t broken; they’re the ones trusted long after they’ve lost their meaning. By Shafeeq Ur Rahaman.

  • The Absolute Beginner’s Guide to Pandas DataFrames

    Learn how to initialize dataframes from dictionaries, lists, and NumPy arrays. By Ibrahim Salami.

  • Spearman Correlation Coefficient for When Pearson Isn’t Enough

    Not all relationships are linear, and that is where Spearman comes in. By Nikhil Dasari.

  • Power Analysis in Marketing: A Hands-On Introduction

    Part 1: What is statistical power and how do we compute it? By Sam Arrington.

  • Evaluating Synthetic Data — The Million Dollar Question

    Learn how to evaluate synthetic data quality using the Maximum Similarity Test, a simple, quantitative approach for assessing fidelity, utility, and privacy in synthetic datasets. By Andrew Skabar.

  • Expected Value Analysis in AI Product Management

    An introduction to key concepts and practical applications. By Chinmay Kakatkar.

  • What to Do When Your Credit Risk Model Works Today, but Breaks Six Months Later

    Here’s why it happens, and how to fix it. By Javier Marin.

  • The Pearson Correlation Coefficient, Explained Simply

    A simple explanation of the Pearson correlation coefficient with examples. By Nikhil Dasari.

  • Using NumPy to Analyze My Daily Habits (Sleep, Screen Time & Mood)

    Can I use NumPy to figure out how my habits affect my mood and productivity? By Ibrahim Salami.

  • Building a Monitoring System That Actually Works

    A step-by-step guide to catching real anomalies without drowning in false alerts. By Mariya Mansurova.

  • The Power of Framework Dimensions: What Data Scientists Should Know

    Practical guidance and a case study. By Chinmay Kakatkar.

  • Hidden Gems in NumPy: 7 Functions Every Data Scientist Should Know

    I’ve been learning data analytics for a year now. So far, I consider myself confident in SQL and Power BI. The transition to Python has been quite exciting. I’ve been exposed to some neat, smarter approaches to data analysis. After brushing up…

  • Statistical Method mcRigor Enhances the Rigor of Metacell Partitioning in Single-Cell Data Analysis

    mcRigor detects dubious metacells within each metacell partition and selects the optimal metacell partitioning method and hyperparameter for a given dataset.

  • What Makes a Language Look Like Itself?

    How simple statistics reveal the visual fingerprints of 20 languages. By Kenneth McCarthy.

  • Beyond ROC-AUC and KS: The Gini Coefficient, Explained Simply

    Understanding Gini and Lorenz curves for smarter model evaluation. By Nikhil Dasari.

  • The Kolmogorov–Smirnov Statistic, Explained: Measuring Model Power in Credit Risk Modeling

    Understanding how banks use the KS statistic in loan approvals. By Nikhil Dasari.
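    In credit scoring, KS is commonly computed as the largest gap between the cumulative score distributions of defaulters ("bads") and non-defaulters ("goods"): the larger the gap, the better the score separates them. A minimal sketch under that framing; the function name and scores are hypothetical.

```python
def ks_statistic(scores_bad, scores_good):
    """Max vertical distance between the empirical CDFs of two score samples."""
    thresholds = sorted(set(scores_bad) | set(scores_good))
    n_bad, n_good = len(scores_bad), len(scores_good)
    best = 0.0
    for t in thresholds:
        # share of each group scoring at or below the threshold t
        cdf_bad = sum(s <= t for s in scores_bad) / n_bad
        cdf_good = sum(s <= t for s in scores_good) / n_good
        best = max(best, abs(cdf_bad - cdf_good))
    return best

bads = [300, 320, 350, 400]   # hypothetical scores of defaulted loans
goods = [380, 420, 450, 500]  # hypothetical scores of repaid loans
print(ks_statistic(bads, goods))
```

    KS ranges from 0 (identical distributions) to 1 (perfect separation at some cut-off); the threshold where the maximum occurs is often inspected as a candidate approval cut-off.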

  • The Theory of Universal Computation: Bayesian Optimality, Solomonoff Induction & AIXI

    Is it possible to build a perfect induction machine? By Angjelin Hila.

  • Why Your A/B Test Winner Might Just Be Random Noise

    What a coach’s warm-up trial can teach us about running better experiments. By Pol Marin.

  • How to Become a Machine Learning Engineer (Step-by-Step)

    Your one-stop guide to becoming a machine learning engineer. By Egor Howell.

  • Is Your Training Data Representative? A Guide to Checking with PSI in Python

    Comparing Variable Distributions Between Two Datasets Using Population Stability Index (PSI) and Cramér’s V. By JUNIOR JUMBONG.
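    PSI compares the share of observations per bin between a reference sample and a comparison sample. A minimal sketch, assuming the bins have already been defined and counted (binning is not shown) and flooring empty bins at a small `eps` so the logarithm stays finite; the names and counts are hypothetical. Cramér’s V is not covered here.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index over pre-defined bins.

    PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are
    the shares of the reference and comparison samples falling in bin i."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same bins")
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # floor empty bins at eps so the logarithm stays finite
        e_share = max(e / e_total, eps)
        a_share = max(a / a_total, eps)
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total

train_bins = [120, 300, 380, 200]  # hypothetical counts per bin, training data
recent_bins = [90, 260, 400, 250]  # same bins, recent data
print(round(psi(train_bins, recent_bins), 4))
```

    A frequently quoted rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift; these cut-offs are conventions rather than statistical tests.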

  • When A Difference Actually Makes A Difference

    Bite-Sized Analytics for Business Decision-Makers (1). By Mena Wang.

  • Hands On Time Series Modeling of Rare Events, with Python

    This is how to model rare-event occurrences in a time series in a few lines of code. By Piero Paialunga.

  • Stochastic Differential Equations and Temperature — NASA Climate Data pt. 2

    The Ornstein-Uhlenbeck process in Python. By Marco Hening Tallarico.

  • How to Benchmark Classical Machine Learning Workloads on Google Cloud

    Harnessing CPUs for Practical, Cost-Effective Machine Learning. By Ehssan Khan.

  • Cracking the Density Code: Why MAF Flows Where KDE Stalls

    Learn why autoregressive flows are the superior density estimation tool for high-dimensional data. By Zackary Nay.

  • Help Your Model Learn the True Signal

    An algorithm-agnostic approach inspired by Cook’s distance. By Mena Wang.

  • LangGraph + SciPy: Building an AI That Reads Documentation and Makes Decisions

    Stop guessing your statistical test. Let this AI do it for you. By Gustavo Santos.

  • Mastering NLP with spaCy – Part 2

    POS tagging, dependency parsing, and named entity recognition. By Marcello Politi.

  • A Well-Designed Experiment Can Teach You More Than a Time Machine!

    How experimentation is more powerful than knowing counterfactuals. By Jarom Hulet.

  • The Hidden Trap of Fixed and Random Effects

    My lesson on how blindly over-controlling for noise can erase the effects you are measuring. By Ngoc Doan.

  • Estimating Disease Rates Without Diagnosis

    Immune genes as predictors of disease. By David Wells.

  • Prescriptive Modeling Makes Causal Bets – Whether You Know it or Not!

    An explanation of the causal assumption implicit in prescriptive modeling and how to satisfy it. By Jarom Hulet.

  • A Practical Starters’ Guide to Causal Structure Learning with Bayesian Methods in Python

    Learn Causal Structures and make inferences with Bayesian Methods: Python Tutorial. By Erdogan Taskesen.

  • Exploring the Proportional Odds Model for Ordinal Logistic Regression

    Understanding and Implementing Brant’s Tests in Ordinal Logistic Regression with Python. By JUNIOR JUMBONG.

  • 10,000x Faster Bayesian Inference: Multi-GPU SVI vs. Traditional MCMC

    Using GPU acceleration to speed up Bayesian Inference from months to minutes… By Derek Tran.

  • Applications of Density Estimation to Legal Theory

    A brief analysis using density estimation to compare the two-verdict and three-verdict systems. By Jimin Kang.

  • The Role of Luck in Sports: Can We Measure It?

    From last-minute goals to coin tosses: How much does randomness influence the outcomes of games? By Pol Marin.

  • Data Drift Is Not the Actual Problem: Your Monitoring Strategy Is

    Monitoring is easy; what to monitor is not. In the field of machine learning, data drift is just noise until you know what it means.

  • Estimating Product-Level Price Elasticities Using Hierarchical Bayesian

    Using one model to personalize ML results. By Derek Tran.

  • What Statistics Can Tell Us About NBA Coaches

    Using Python to determine where NBA coaches come from and what makes them successful. By Brayden Gerrard.

  • How to Learn the Math Needed for Machine Learning

    Maths can be a scary topic for people. Many of you want to work in machine learning, but the maths skills needed may seem overwhelming. I am here to tell you that it’s nowhere near as intimidating as you may think and to give you a roadmap, resources,…

  • 🚪🚪🐐 Lessons in Decision Making from the Monty Hall Problem

    The Monty Hall Problem is a well-known brain teaser from which we can learn important lessons in Decision Making that are useful in general and in particular for data scientists. If you are not familiar with this problem, prepare to be perplexed. If you…
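    A quick simulation makes the counter-intuitive result concrete, assuming the standard rules: the host always opens a door that hides a goat and is not the contestant's pick. This is a minimal sketch with hypothetical function names, not the article's code.

```python
import random

def play(switch, rng):
    """One round of Monty Hall; returns True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # host opens a door that is neither the contestant's pick nor the car
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # move to the only remaining closed door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=20_000, seed=42):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(f"stay:   {win_rate(False):.3f}")  # about 1/3
print(f"switch: {win_rate(True):.3f}")   # about 2/3
```

    Staying wins only when the first pick was right (probability 1/3); switching wins in every other case, because the host's choice is constrained by where the car is.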

  • Strength in Numbers: Ensembling Models with Bagging and Boosting

    Bagging and boosting are two powerful ensemble techniques in machine learning – they are must-knows for data scientists! After reading this article, you are going to have a solid understanding of how bagging and boosting work and when to use them. We’ll cover the following topics,…

  • Survival Analysis When No One Dies: A Value-Based Approach

    Survival Analysis is a statistical approach used to answer the question: “How long will something last?” That “something” could range from a patient’s lifespan to the durability of a machine component or the duration of a user’s subscription. One of the most widely used tools in…

  • Non-Parametric Density Estimation: Theory and Applications

    In this article, we’ll talk about what Density Estimation is and the role it plays in statistical analysis. We’ll analyze two popular density estimation methods, histograms and kernel density estimators, and examine their theoretical properties as well as how they perform in practice. Finally, we’ll look at how density…

  • Log Link vs Log Transformation in R — The Difference that Misleads Your Entire Data Analysis

    Although normal distributions are the most commonly used, much real-world data is, unfortunately, not normal. When faced with extremely skewed data, it’s tempting to use log transformations to normalize the distribution and stabilize the variance. I…

  • The Dangers of Deceptive Data Part 2–Base Proportions and Bad Statistics

    This is a follow-up to my earlier article: The Dangers of Deceptive Data–Confusing Charts and Misleading Headlines. My first article focused on how visualizations can be used to mislead, diving into a form of data presentation widely used in public matters. In this article,…

  • Beyond Glorified Curve Fitting: Exploring the Probabilistic Foundations of Machine Learning

    You see a math formula you don’t immediately understand. Your instinct? Stop reading. Don’t. That’s exactly what I told myself when I started reading Probabilistic Machine Learning – An Introduction by Kevin P. Murphy. And it was absolutely worth it. It changed how I…

  • When Predictors Collide: Mastering VIF in Multicollinear Regression

    In regression models, the independent variables should not depend on each other, or only weakly, i.e., they should not be strongly correlated. If such a dependency exists, it is referred to as multicollinearity, which leads to unstable models and results that are difficult to interpret. The…

  • Are You Sure Your Posterior Makes Sense?

    This article is co-authored by Felipe Bandeira, Giselle Fretta, Thu Than, and Elbion Redenica. We also thank Prof. Carl Scheffler for his support. For decades, parameter estimation has been one of the most important topics in statistics. While frequentist approaches, such as Maximum Likelihood Estimation, used to…

  • How to Measure Real Model Accuracy When Labels Are Noisy

    Ground truth is never perfect. From scientific measurements to human annotations used to train deep learning models, ground truth always contains some errors. ImageNet, arguably the most well-curated image dataset, has 0.3% errors in its human annotations. Then, how can we evaluate predictive models…

  • Unlock the Power of ROC Curves: Intuitive Insights for Better Model Evaluation

    We’ve all been in that moment, right? Staring at a chart as if it’s some ancient script, wondering how we’re supposed to make sense of it all. That’s exactly how I felt when I was asked to explain the AUC for the ROC…

  • Linear Programming: Managing Multiple Targets with Goal Programming

    This is the sixth (and likely last) part of a Linear Programming series I’ve been writing. With the core concepts covered by the prior articles, this article focuses on goal programming, which is a less frequent linear programming (LP) use case. Goal programming is a specific linear…

  • Uncertainty Quantification in Machine Learning with an Easy Python Interface

    Uncertainty quantification (UQ) in a Machine Learning (ML) model allows one to estimate the precision of its predictions. This is extremely important for utilizing its predictions in real-world tasks. For instance, if a machine learning model is trained to predict a property of a material,…

  • When You Just Can’t Decide on a Single Action

    In Game Theory, the players typically have to make assumptions about the other players’ actions. What will the other player do? Will he use rock, paper or scissors? You never know, but in some cases, you might have an idea of the probability of some actions…

  • How to Spot and Prevent Model Drift Before it Impacts Your Business

    Despite the AI hype, many tech companies still rely heavily on machine learning to power critical applications, from personalized recommendations to fraud detection. I’ve seen firsthand how undetected drift can result in significant costs: missed fraud detection, lost revenue, and suboptimal business…

  • One-Tailed Vs. Two-Tailed Tests

    If you’ve ever analyzed data using built-in t-test functions, such as those in R or SciPy, here’s a question for you: have you ever adjusted the default setting for the alternative hypothesis? If your answer is no, or if you’re not even sure what this means, then this blog post is for…
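    The difference between the alternatives is easiest to see with a test statistic that is standard normal under the null hypothesis. The sketch below uses a z-statistic as a large-sample stand-in for the t-test (an assumption made to stay within the standard library); in SciPy, the same choice is made through the `alternative` argument of functions such as `scipy.stats.ttest_ind`, which accepts `'two-sided'` (the default), `'less'`, or `'greater'`.

```python
from statistics import NormalDist

def z_test_pvalues(z):
    """p-values for a standard-normal test statistic z under three alternatives."""
    cdf = NormalDist().cdf
    two_sided = 2 * (1 - cdf(abs(z)))  # H1: effect != 0
    greater = 1 - cdf(z)               # H1: effect > 0
    less = cdf(z)                      # H1: effect < 0
    return two_sided, greater, less

two_sided, greater, less = z_test_pvalues(1.8)
print(f"two-sided p = {two_sided:.4f}, one-sided (greater) p = {greater:.4f}")
```

    For z = 1.8 the two-sided p-value is above 0.05 while the one-sided "greater" p-value is below it, which is why the alternative should be fixed before looking at the data.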

  • I Won’t Change Unless You Do

    In Game Theory, how can players ever reach an outcome if there might still be a better option to choose? Maybe one player still wants to change their decision. But if they do, maybe the other player wants to change too. How can they ever hope to…

  • The Dangers of Deceptive Data–Confusing Charts and Misleading Headlines

    “You don’t have to be an expert to deceive someone, though you might need some expertise to reliably recognize when you are being deceived.” When my co-instructor and I start our quarterly lesson on deceptive visualizations for the data visualization course we teach at the University…

  • Do European M&Ms Actually Taste Better than American M&Ms?

    (Oh, am I the only one who’s been asking this question…? Hm. Well, if you have a minute, please enjoy this exploratory data analysis, featuring experimental design, statistics, and interactive visualization, applied a bit too earnestly to resolve an international debate.)

  • Talking about Games

    Game theory is a field of research that is quite prominent in economics but rather unpopular in other scientific disciplines. However, the concepts used in game theory can be of interest to a wider audience, including data scientists, statisticians, computer scientists or psychologists, to name just a few. This article is the…

  • Unraveling Spatially Variable Genes: A Statistical Perspective on Spatial Transcriptomics

    The article was written by Guanao Yan, Ph.D. student in Statistics and Data Science at UCLA. Guanao is the first author of the Nature Communications review article [1]. Spatially resolved transcriptomics (SRT) is revolutionizing genomics by enabling the high-throughput measurement of gene expression while…

  • Honestly Uncertain

    Ethical issues aside, should you be honest when asked how certain you are about some belief? Of course, it depends. In this blog post, you’ll learn what it depends on. Different ways of evaluating probabilistic predictions come with dramatically different degrees of “optimal honesty”. Perhaps surprisingly, the linear function that assigns +1 to true and fully…

  • Method of Moments Estimation with Python Code

    Let’s say you are in a customer care center, and you would like to know the probability distribution of the number of calls per minute, or in other words, you want to answer the question: what is the probability of receiving zero, one, two, … etc., calls per…
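    The method of moments sets theoretical moments equal to their sample counterparts and solves for the parameters. For calls per minute, a Poisson model is a natural assumption (the summary is cut off before naming the distribution); its only parameter λ is the mean, so the estimate is simply the sample mean. A minimal sketch with hypothetical data and function names.

```python
import math

def poisson_rate_mom(counts):
    """Method of moments for a Poisson model: set the theoretical mean (lambda)
    equal to the sample mean."""
    return sum(counts) / len(counts)

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

calls_per_minute = [2, 0, 3, 1, 2, 4, 1, 2, 3, 2]  # hypothetical observations
lam = poisson_rate_mom(calls_per_minute)
for k in range(4):
    print(k, round(poisson_pmf(k, lam), 3))
```

    Because a Poisson distribution has variance equal to its mean, comparing the sample variance with the sample mean is a quick check of whether a one-parameter fit is reasonable.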

  • The Gamma Hurdle Distribution

    Which Outcome Matters? Here is a common scenario: an A/B test was conducted, where a random sample of units (e.g., customers) was selected for a campaign and received Treatment A. Another sample was selected to receive Treatment B. “A” could be a communication or offer and “B” could be…

  • The Method of Moments Estimator for Gaussian Mixture Models

    Audio Processing is one of the most important application domains of digital signal processing (DSP) and machine learning. Modeling acoustic environments is an essential step in developing digital audio processing systems such as speech recognition, speech enhancement, acoustic echo cancellation, etc. Acoustic environments are filled with background…

  • How Likely Is a Six Nations Grand Slam in 2025?

    Quantifying uncertainty in sports fixtures. For rugby fans, the long wait is nearly over: like Christmas, the Six Nations comes once a year to lift our spirits in the cold winter months. If you’re not very familiar with rugby, the…

  • Navigating Data Science Content: Recognizing Common Pitfalls, Part 1

    Uncovering and correcting misconceptions in online data science content to help you learn more effectively. By Geremie Yeo.

  • NLP Illustrated, Part 3: Word2Vec

    An exhaustive and illustrated guide to Word2Vec with code! By Shreya Rao, on Towards Data Science.

  • Basics of Probability Notations

    Union, Intersection, Independence, Disjoint, Complement: Advanced Probability for Data Science Series (1). By Sunghyun Ahn, on Towards Data Science.

  • Who is Right? The Dean or the Students?

    A cautionary tale of two perspectives on averaging. By Paolo Molignini, PhD, on Towards Data Science.

  • Water Cooler Small Talk, Ep 7: Anscombe’s Quartet and the Datasaurus

    Why descriptive statistics aren’t enough and plotting your data is always essential. By Maria Mouschoutzi, PhD, on Towards Data Science.

  • Does It Matter That Online Experiments Interact?

    What interactions do, why they are just like any other change in the environment after the experiment, and some reassurance. Experiments do not run one at a time: at any moment, hundreds to thousands of experiments run on a mature website. The question comes up:…

  • Basics of GANs & SMOTE for Data Augmentation

    GANs and SMOTE Explained with Bartending: Data Science for Machine Learning Series (1). By Sunghyun Ahn, on Towards Data Science.

  • Water Cooler Small Talk: Benford’s Law

    A look into the strange first-digit distribution of naturally occurring datasets. By Maria Mouschoutzi, PhD, on Towards Data Science.

  • Scale Experiment Decision-Making with Programmatic Decision Rules

    Decide what to do with experiment results in code. The experiment lifecycle is like the human lifecycle. First, a person or idea is born, then it develops, then it is tested, then its test ends, and then the Gods (or Product Managers) decide its worth.…

  • Four Ways to Improve Statistical Power in A/B Testing (Without Increasing Test Duration, Duh)

    In A/B testing, you often have to balance statistical power against how long the test takes. Learn how allocation, effect size, CUPED, and binarization can help you.

  • Bayesian A/B Testing Falls Short

    Why Bayesian A/B testing can lead to misunderstandings and inflated false-positive rates, introduce bias, and complicate results. Over the past decade, I’ve engaged in countless discussions about Bayesian versus frequentist A/B testing. In nearly every conversation, I’ve maintained the same viewpoint:…

  • Method of Moments Estimation with Python Code

    How to understand and implement the estimator from scratch. Let’s say you are in a customer care center and would like to know the probability distribution of the number of calls per minute; in other words, you want to answer the question:…

  • In Defense of Statistical Significance

    We have to draw the line somewhere. It’s become something of a meme that statistical significance is a bad standard. Several recent blog posts have made the rounds, arguing that statistical significance is a “cult” or “arbitrary.” If you’d like a classic polemic (and…

  • How to Tell Among Two Regression Models with Statistical Significance

    Diving into the F-test for nested models, with algorithms, examples, and code. By LucianoSphere (Luciano Abriata, PhD), on Towards Data Science.

  • Chi-Squared Test: Comparing Variations Through Soccer

    Understanding Different Types of Chi-Squared Tests: A/B Testing for Data Science Series (8). By Sunghyun Ahn, on Towards Data Science.

  • How to Stand Out in The Data Science Job Market

    How to have the edge in your data science application. By Egor Howell, on Towards Data Science.