Category: model-evaluation

Does More Data Always Yield Better Performance?

Exploring and challenging the conventional wisdom of “more data → better performance” by experimenting with the interactions between sample size, attribute set, and model complexity. The post Does More Data Always Yield Better Performance? appeared first on Towards Data Science. By Mohannad Elhamod.

November 11, 2025
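A learning curve is one simple way to probe the interaction between sample size and model complexity that “Does More Data Always Yield Better Performance?” examines. The sketch below is not the article’s experiment: the synthetic task (y = 3x + 1 plus Gaussian noise) and the two toy models are assumptions chosen to show one way the “more data” intuition breaks, namely that a model too simple for the task plateaus no matter how many samples it sees.

```python
import random
import statistics

def make_data(n, seed):
    # Hypothetical synthetic task: y = 3x + 1 plus Gaussian noise (sd 0.5).
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys

def fit_mean(xs, ys):
    # Low-capacity model: ignores x and always predicts the training mean.
    m = statistics.fmean(ys)
    return lambda x: m

def fit_line(xs, ys):
    # Higher-capacity model: ordinary least-squares line y = a*x + b.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, xs, ys):
    # Mean squared error of `model` on a held-out set.
    return statistics.fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def learning_curve(fit, sizes, x_train, y_train, x_test, y_test):
    # Held-out MSE after training on the first n points, for each n in `sizes`.
    return {n: mse(fit(x_train[:n], y_train[:n]), x_test, y_test) for n in sizes}

x_tr, y_tr = make_data(2000, seed=0)
x_te, y_te = make_data(1000, seed=1)
sizes = [10, 100, 1000, 2000]
curves = {name: learning_curve(fit, sizes, x_tr, y_tr, x_te, y_te)
          for name, fit in [("mean", fit_mean), ("line", fit_line)]}
for name, curve in curves.items():
    print(name, {n: round(err, 3) for n, err in curve.items()})
```

The mean-only model’s held-out error should stay roughly flat near the variance of y as n grows, while the line drops toward the noise floor: extra samples only pay off once the model can use them.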
Choosing the Best Model Size and Dataset Size under a Fixed Budget for LLMs

A small-scale exploration using Tiny Transformers. By Shuyang.

October 25, 2025
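Under the widely used approximation that training a dense Transformer costs about C ≈ 6ND FLOPs (N parameters, D training tokens), fixing the budget C turns model size and dataset size into a direct trade-off, which is the question “Choosing the Best Model Size and Dataset Size under a Fixed Budget for LLMs” explores. The sketch below assumes that approximation and a Chinchilla-style rule of thumb of roughly 20 tokens per parameter; neither is taken from the article, and the 1e18 FLOP budget is hypothetical.

```python
import math

def tokens_for_budget(compute_flops: float, n_params: float) -> float:
    # Training tokens affordable for a model with n_params, assuming C ≈ 6 * N * D.
    return compute_flops / (6.0 * n_params)

def rule_of_thumb_split(compute_flops: float, tokens_per_param: float = 20.0):
    # Solve 6 * N * D = C with D = k * N, giving N = sqrt(C / (6k)) and D = k * N.
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    return n_params, tokens_per_param * n_params

budget = 1e18  # hypothetical training budget in FLOPs
for n in (1e6, 1e7, 1e8):
    print(f"N = {n:.0e} params -> D = {tokens_for_budget(budget, n):.2e} tokens")

n_opt, d_opt = rule_of_thumb_split(budget)
print(f"~20 tokens/param split: N = {n_opt:.2e}, D = {d_opt:.2e}")
```

Every row spends the same budget; which one reaches the lowest loss is an empirical question, and the fixed tokens-per-parameter ratio is only a starting heuristic.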
Accuracy Is Dead: Calibration, Discrimination, and Other Metrics You Actually Need

A deep dive into advanced evaluation for data scientists. By Pol Marin.

July 15, 2025
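The gap between accuracy, discrimination, and calibration that “Accuracy Is Dead” highlights can be seen with two toy classifiers that rank examples identically. This is a minimal sketch, not the article’s code: it uses ROC AUC as the discrimination metric and the Brier score (a proper scoring rule that penalizes miscalibration) as the probability-quality metric, and the labels and scores are made up.

```python
from itertools import product

def accuracy(y_true, p_pred, threshold=0.5):
    # Share of examples where thresholding p_pred gives the correct class.
    return sum((p >= threshold) == (y == 1) for y, p in zip(y_true, p_pred)) / len(y_true)

def brier_score(y_true, p_pred):
    # Mean squared gap between predicted probability and the 0/1 outcome (lower is better).
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def roc_auc(y_true, p_pred):
    # Discrimination: chance a random positive outranks a random negative (ties count half).
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(pos, neg))
    return wins / (len(pos) * len(neg))

y = [1, 1, 1, 0, 0, 0, 1, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.4, 0.6]          # moderate confidence
model_b = [0.99, 0.98, 0.97, 0.03, 0.02, 0.01, 0.04, 0.96]  # same ranking, overconfident

for name, p in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(y, p), roc_auc(y, p), round(brier_score(y, p), 4))
```

Both models tie on accuracy and AUC because B is a monotone rescaling of A, yet B has the worse Brier score: its two mistakes are made with near-certainty.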
How to Evaluate LLMs and Algorithms — The Right Way

All the hard work it takes to integrate large language models and powerful algorithms into your workflows can go to waste…

May 24, 2025
Agentic AI 102: Guardrails and Agent Evaluation

In the first post of this series (Agentic AI 101: Starting Your Journey Building AI Agents), we talked about the fundamentals of creating AI Agents and introduced concepts like reasoning, memory, and tools. Of course, that first post touched only the surface of this new area of…

May 17, 2025
How To Build a Benchmark for Your Models

I’ve been working as a data science consultant for the past three years, and I’ve had the opportunity to work on multiple projects across various industries. Yet, I noticed one common denominator among most of the clients I worked with: They rarely have a clear idea of…

May 16, 2025
Attaining LLM Certainty with AI Decision Circuits

The promise of AI agents has taken the world by storm. Agents can interact with the world around them, write articles (not this one though), take actions on your behalf, and generally make the difficult part of automating any task easy and approachable. Agents take aim at the most…

May 3, 2025
Choose the Right One: Evaluating Topic Models for Business Intelligence

Topic models are used in businesses to classify brand-related text datasets (such as product and site reviews, surveys, and social media comments) and to track how customer satisfaction metrics change over time. There is a myriad of recent topic models one can choose from: the…

April 25, 2025
Learnings from a Machine Learning Engineer — Part 3: The Evaluation

In this third part of my series, I will explore the evaluation process, which is a critical piece that will lead to a cleaner data set and elevate your model performance. We will see the difference between evaluation of a trained model (one not yet in…

February 14, 2025
How to Measure the Reliability of a Large Language Model’s Response

The basic principle of Large Language Models (LLMs) is very simple: to predict the next word (or token) in a sequence of words based on statistical patterns in their training data. However, this seemingly simple capability turns out to be incredibly sophisticated when it…

February 13, 2025
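The next-token step described in “How to Measure the Reliability of a Large Language Model’s Response” can be illustrated with a softmax over scores for a handful of candidate tokens. The logits below are made up, and using the entropy of the next-token distribution as an uncertainty signal is one common heuristic, offered here as an assumption rather than the article’s reliability method.

```python
import math

def softmax(logits):
    # Turn raw token scores into probabilities; subtracting the max keeps exp() stable.
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def entropy_bits(probs):
    # Shannon entropy in bits: 0 for a certain choice, log2(k) for k equally likely tokens.
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Hypothetical scores for the token after "The capital of France is"
logits = {"Paris": 6.0, "Lyon": 2.0, "London": 1.5, "banana": -3.0}
probs = softmax(logits)
next_token = max(probs, key=probs.get)  # greedy decoding picks the most likely token
print(next_token, round(probs[next_token], 3), round(entropy_bits(probs), 3))
```

A peaked distribution gives low entropy; a flat one, where the model is torn between candidates, gives high entropy.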
Understanding Model Calibration: A Gentle Introduction & Visual Exploration

How Reliable Are Your Predictions? To be considered reliable, a model must be calibrated so that its confidence in each decision closely reflects its true outcome. In this blog post we’ll take a look at the most commonly used definition for calibration and then dive…

February 12, 2025
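The calibration requirement in “Understanding Model Calibration” (confidence should match how often the model is actually right) is commonly quantified with expected calibration error (ECE): bin predictions by confidence, then average the gap between each bin’s accuracy and its mean confidence, weighted by bin size. A minimal sketch with equal-width bins; the bin count and toy inputs are assumptions, and the article may use a different estimator.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    # ECE = sum over bins b of (|B_b| / n) * |accuracy(B_b) - mean_confidence(B_b)|.
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # equal-width bins; conf == 1.0 goes to the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece

# Toy data: 10 predictions at 80% confidence (8 correct) and 10 at 95% confidence (6 correct).
confidences = [0.8] * 10 + [0.95] * 10
correct = [True] * 8 + [False] * 2 + [True] * 6 + [False] * 4
print(round(expected_calibration_error(confidences, correct), 3))
```

The 80% group is perfectly calibrated and contributes nothing; the 95% group is right only 60% of the time, so half the data carries a 0.35 gap.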
Building a Regression Model: Delivery Duration Prediction

An end-to-end (E2E) walkthrough for approaching a regression modeling task. In this article, we’re going to walk through the process of building a regression model, from dataset cleaning and preparation to model training and evaluation. The specific regression task we will…

January 28, 2025
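The evaluation stage of a regression task like the delivery-duration walkthrough in “Building a Regression Model: Delivery Duration Prediction” usually comes down to error metrics on held-out data, compared against a naive baseline. A minimal sketch with mean absolute error (MAE) and root mean squared error (RMSE); the durations, the model predictions, and the 40-minute baseline (imagined as the training-set mean) are all hypothetical, and the article may report other metrics.

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error, in the target's own units (minutes here).
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error: squaring first weights large misses more heavily.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical held-out delivery durations (minutes) and model predictions
y_true = [30, 45, 25, 60, 40]
model_pred = [32, 41, 27, 55, 42]
baseline = [40.0] * len(y_true)  # hypothetical training-set mean used as a constant prediction

for name, pred in [("model", model_pred), ("baseline", baseline)]:
    print(name, round(mae(y_true, pred), 2), round(rmse(y_true, pred), 2))
```

Reporting both metrics next to a trivial baseline shows whether the model beats the naive strategy and whether a few very late deliveries dominate its errors.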