Tag: when

Synthetic Augmentation in Imbalanced Learning: When It Helps, When It Hurts, and How Much to Add

arXiv:2601.16120v1. Abstract: Imbalanced classification, where one class is observed far less frequently than the other, often causes standard training procedures to prioritize the majority class and perform poorly on rare but important cases. A classic and…

January 23, 2026

When Does Adding Fancy RAG Features Work?

Looking at the performance of different pipelines. By Ida Silfverskiöld; first published on Towards Data Science.

January 13, 2026

When (Not) to Use Vector DB

When indexing hurts more than it helps: how we realized our RAG use case needed a key-value store, not a vector database. By Uri Peled; first published on Towards Data Science.

December 17, 2025

What to Do When Your Credit Risk Model Works Today, but Breaks Six Months Later

Here’s why it happens, and how to fix it. By Javier Marin; first published on Towards Data Science.

November 5, 2025

When A Difference Actually Makes A Difference

Bite-Sized Analytics for Business Decision-Makers (1). By Mena Wang; first published on Towards Data Science.

September 11, 2025

The Relative Instability of Model Comparison with Cross-validation

arXiv:2508.04409v1. Abstract: Existing work has shown that cross-validation (CV) can be used to provide an asymptotic confidence interval for the test error of a stable machine learning algorithm, and existing stability results for many popular algorithms can be applied to derive positive instances where…

August 7, 2025

When Models Stop Listening: How Feature Collapse Quietly Erodes Machine Learning Systems

Models don’t just fail with noise; they fail in silence, by narrowing their attention to the point of fragility. By Mahe Jabeen Abdul; first published on Towards Data Science.

August 2, 2025

When 50/50 Isn’t Optimal: Debunking Even Rebalancing

A new theory of class imbalance demonstrates that, in a binary problem, the optimal share of each class in the training set is not 50%. By Marco Baity-Jesi; first published on Towards Data Science.

July 25, 2025

The Automation Trap: Why Low-Code AI Models Fail When You Scale

In the beginning, building machine learning models was a skill only data scientists who knew Python could master. Low-code AI platforms have since made this much easier: anyone can now build a model, connect it to data, and publish it as…

May 17, 2025

The Shadow Side of AutoML: When No-Code Tools Hurt More Than Help

AutoML has become the gateway drug to machine learning for many organizations. It promises exactly what teams under pressure want to hear: you bring the data, and we’ll handle the modeling. There are no pipelines to manage, no hyperparameters to tune, and no…

May 9, 2025

Regression Discontinuity Design: How It Works and When to Use It

You’re an avid data scientist and experimenter. You know that randomisation is the summit of Mount Evidence Credibility, and you also know that when you can’t randomise, you resort to observational data and…

May 7, 2025

When OpenAI Isn’t Always the Answer: Enterprise Risks Behind Wrapper-Based AI Agents

“Wait… are you sending journal entries to OpenAI?” That was the first thing my friend asked when I showed her Feel-Write, an AI-powered journaling app I built during a hackathon in San Francisco. I shrugged. “It was an AI-themed hackathon, I had to…

April 29, 2025

Mastering the Poisson Distribution: Intuition and Foundations

You’ve probably used the normal distribution one or two times too many. We all have; it’s a true workhorse. But sometimes we run into problems: for instance, when predicting or forecasting values, simulating data given a particular data-generating process, or trying to visualise model output…

March 21, 2025

Where to Start when Data is Limited: A Guide

Hey, I’ve put together an article with my thoughts and some research on how to get the most out of small datasets when performance requirements mean conventional analysis isn’t enough. It’s aimed at helping people who are starting new projects and have already started with the…

January 20, 2025

When Averages Lie: Moving Beyond Single-Point Predictions

The Case for Predicting Full Probability Distributions in Decision-Making. Some people like hot coffee, some people like iced coffee, but no one likes lukewarm coffee. Yet a simple model trained on coffee temperatures might predict that the next coffee served should be… lukewarm. This illustrates a fundamental problem…

December 21, 2024

ABROCA Distributions For Algorithmic Bias Assessment: Considerations Around Interpretation

arXiv:2411.19090v1. Abstract: Algorithmic bias continues to be a key concern of learning analytics. We study the statistical properties of the Absolute Between-ROC Area (ABROCA) metric. This fairness measure quantifies group-level differences in classifier performance through the absolute difference in ROC curves. ABROCA is…

December 2, 2024