Category: editors-pick

How Do Grayscale Images Affect Visual Anomaly Detection?

A practical exploration focusing on performance and speed. The post How Do Grayscale Images Affect Visual Anomaly Detection? appeared first on Towards Data Science. Aimira Baitieva

July 25, 2025
How Not to Mislead with Your Data-Driven Story

Data storytelling can enlighten, but it can also deceive. When persuasive narratives meet biased framing, cherry-picked data, or misleading visuals, insights risk becoming illusions. This article explores the hidden biases embedded in data-driven storytelling, from the seduction of beautiful charts to the quiet influence of AI-generated insights, and offers practical…

July 24, 2025
Things I Wish I Had Known Before Starting ML

Part 1: Data, Sales Pitches, Bugs, and Breakthroughs. The post Things I Wish I Had Known Before Starting ML appeared first on Towards Data Science. Pascal Janetzky

July 23, 2025
A Well-Designed Experiment Can Teach You More Than a Time Machine!

How experimentation is more powerful than knowing counterfactuals. The post A Well-Designed Experiment Can Teach You More Than a Time Machine! appeared first on Towards Data Science. Jarom Hulet

July 23, 2025
I Analysed 25,000 Hotel Names and Found Four Surprising Truths

Why are there so many hotels named after cities they are not in? Follow along for a data analysis on hotel names. The post I Analysed 25,000 Hotel Names and Found Four Surprising Truths appeared first on Towards Data Science. Anna Gordun Peiro

July 22, 2025
Exploratory Data Analysis: Gamma Spectroscopy in Python (Part 2)

Let’s observe the matter on the atomic level. The post Exploratory Data Analysis: Gamma Spectroscopy in Python (Part 2) appeared first on Towards Data Science. Dmitrii Eliuseev

July 19, 2025
Your 1M+ Context Window LLM Is Less Powerful Than You Think

Why working memory is a more important bottleneck than raw context window size. The post Your 1M+ Context Window LLM Is Less Powerful Than You Think appeared first on Towards Data Science. Tobias Schnabel

July 18, 2025
Midyear 2025 AI Reflection

Impressions on agentic AI progress and the AI-2027 Jobocalypse scenario. The post Midyear 2025 AI Reflection appeared first on Towards Data Science. Marina Tosic

July 17, 2025
Exploring Prompt Learning: Using English Feedback to Optimize LLM Systems

Prompt learning presents a compelling approach for continuous improvement of AI applications. The post Exploring Prompt Learning: Using English Feedback to Optimize LLM Systems appeared first on Towards Data Science. Aparna Dhinakaran

July 17, 2025
How to Overlay a Heatmap on a Real Map with Python

Visualizing historical tornado trends. The post How to Overlay a Heatmap on a Real Map with Python appeared first on Towards Data Science. Lee Vaughan

July 17, 2025
How Metrics (and LLMs) Can Trick You: A Field Guide to Paradoxes

When numbers lie and your metrics mislead you. The post How Metrics (and LLMs) Can Trick You: A Field Guide to Paradoxes appeared first on Towards Data Science. Subha Ganapathi

July 16, 2025
What Can the History of Data Tell Us About the Future of AI?

A 40-Year Look at Data, Business Models, and the Forces Shaping Intelligent Systems. The post What Can the History of Data Tell Us About the Future of AI? appeared first on Towards Data Science. Steve Hedden

July 15, 2025
Hitchhiker’s Guide to RAG: From Tiny Files to Tolstoy with OpenAI’s API and LangChain

Scaling a simple RAG pipeline from simple notes to full books. The post Hitchhiker’s Guide to RAG: From Tiny Files to Tolstoy with OpenAI’s API and LangChain appeared first on Towards Data Science. Maria Mouschoutzi

July 12, 2025
Reducing Time to Value for Data Science Projects: Part 3

Setting up a robust experimentation process. The post Reducing Time to Value for Data Science Projects: Part 3 appeared first on Towards Data Science. Kristopher McGlinchey

July 11, 2025
How to Perform Effective Data Cleaning for Machine Learning

Learn how you can improve your machine learning models using effective data cleaning. The post How to Perform Effective Data Cleaning for Machine Learning appeared first on Towards Data Science. Eivind Kjosbakken

July 10, 2025
What I Learned in my First 18 Months as a Freelance Data Scientist

The taxes and health insurance edition. The post What I Learned in my First 18 Months as a Freelance Data Scientist appeared first on Towards Data Science. CJ Sullivan

July 9, 2025
Your Personal Analytics Toolbox

Leveraging MCP for automating your daily routine. The post Your Personal Analytics Toolbox appeared first on Towards Data Science. Mariya Mansurova

July 8, 2025
POSET Representations in Python Can Have a Huge Impact on Business

Discover how POSET indicators transform data into coherent scoring systems, enabling meaningful comparisons while preserving the data’s multi-dimensional semantic structure. The post POSET Representations in Python Can Have a Huge Impact on Business appeared first on Towards Data Science. Andrea D’Agostino

July 8, 2025
Rethinking Data Science Interviews in the Age of AI

How AI is transforming data science interviews, and what hiring managers and candidates should do to adapt. The post Rethinking Data Science Interviews in the Age of AI appeared first on Towards Data Science. Yu Dong

July 5, 2025
Fairness Pruning: Precision Surgery to Reduce Bias in LLMs

From unjustified shootings to neutral stories: how to fix toxic narratives with selective pruning. The post Fairness Pruning: Precision Surgery to Reduce Bias in LLMs appeared first on Towards Data Science. Pere Martra

July 4, 2025
Software Engineering in the LLM Era

On growing new software engineers, even when it’s inefficient. The post Software Engineering in the LLM Era appeared first on Towards Data Science. Stephanie Kirmer

July 3, 2025
How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1

How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1 From architectural design to food security. The post How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1 appeared first on Towards Data Science. Marco Hening Tallarico Go to…

July 2, 2025
An Introduction to Remote Model Context Protocol Servers

Writing, testing and using them. The post An Introduction to Remote Model Context Protocol Servers appeared first on Towards Data Science. Thomas Reid

July 2, 2025
Prescriptive Modeling Makes Causal Bets – Whether You Know it or Not!

An explanation of the causal assumption implicit in prescriptive modeling and how to satisfy it. The post Prescriptive Modeling Makes Causal Bets – Whether You Know it or Not! appeared first on Towards Data Science. Jarom Hulet

July 1, 2025
From Pixels to Plots

How I built an AI-powered prototype to turn images into insights. The post From Pixels to Plots appeared first on Towards Data Science. Jens Winkelmann

July 1, 2025
A Developer’s Guide to Building Scalable AI: Workflows vs Agents

A practical guide to choosing between AI agents and workflows for production systems, covering the hidden costs, architectural trade-offs, and decision framework that can save you thousands in deployment mistakes. Includes real-world examples and a scoring system to determine which approach fits your specific use…

June 28, 2025
Stop Chasing “Efficiency AI.” The Real Value Is in “Opportunity AI.”

Companies pursuing incremental productivity gains risk being displaced by AI-native competitors building entirely new business models. The post Stop Chasing “Efficiency AI.” The Real Value Is in “Opportunity AI.” appeared first on Towards Data Science. Shreshth Sharma

June 26, 2025
Why Your Next LLM Might Not Have A Tokenizer

The Tokenizer Has Been a Necessary Evil, but This Radical Approach Shows That It Might Not Be Necessary Anymore. The post Why Your Next LLM Might Not Have A Tokenizer appeared first on Towards Data Science. Moulik Gupta

June 25, 2025
Building AI-Powered Low-Code Workflows with n8n

Three powerful workflows that you can apply to your personal life or business today. The post Building AI-Powered Low-Code Workflows with n8n appeared first on Towards Data Science. ALESSANDRA COSTA

June 24, 2025
What PyTorch Really Means by a Leaf Tensor and Its Grad

The secret life of leaves, gradients, and the mighty requires_grad flag. The post What PyTorch Really Means by a Leaf Tensor and Its Grad appeared first on Towards Data Science. Maciej J. Mikulski

June 20, 2025
Can We Use Chess to Predict Soccer?

An adaptation of Elo ratings for soccer implemented in Python. The post Can We Use Chess to Predict Soccer? appeared first on Towards Data Science. Felipe Bandeira

June 19, 2025
Grad-CAM from Scratch with PyTorch Hooks

A hands-on look at an explainable AI (XAI) technique that helps reveal why a convolutional neural network (CNN) made a particular decision. The post Grad-CAM from Scratch with PyTorch Hooks appeared first on Towards Data Science. Conor O’Sullivan

June 17, 2025
What If I had AI in 2018: Rent the Runway Fulfillment Center Optimization

An LLM in 2018 would not have trivialized a complex project, although it could have enhanced the final solution. The post What If I had AI in 2018: Rent the Runway Fulfillment Center Optimization appeared first on Towards Data Science. Hugo Ducruc…

June 14, 2025
Connecting the Dots for Better Movie Recommendations

Lightweight graph RAG on Rotten Tomatoes movie reviews. The post Connecting the Dots for Better Movie Recommendations appeared first on Towards Data Science. Brian Godsey

June 13, 2025
Design Smarter Prompts and Boost Your LLM Output: Real Tricks from an AI Engineer’s Toolbox

Not just what you ask, but how you ask it. Practical techniques for prompt engineering that deliver. The post Design Smarter Prompts and Boost Your LLM Output: Real Tricks from an AI Engineer’s Toolbox appeared first on Towards Data Science. Ugo Pradère…

June 13, 2025
Exploring the Proportional Odds Model for Ordinal Logistic Regression

Understanding and Implementing Brant’s Tests in Ordinal Logistic Regression with Python. The post Exploring the Proportional Odds Model for Ordinal Logistic Regression appeared first on Towards Data Science. JUNIOR JUMBONG

June 12, 2025
Audio Spectrogram Transformers Beyond the Lab

A recipe for building a portable soundscape monitoring app with AudioMoth, Raspberry Pi, and a decent dose of deep learning. The post Audio Spectrogram Transformers Beyond the Lab appeared first on Towards Data Science. Maciej Adamiak

June 11, 2025
Applications of Density Estimation to Legal Theory

A brief analysis using density estimation to compare the two-verdict and three-verdict systems. The post Applications of Density Estimation to Legal Theory appeared first on Towards Data Science. Jimin Kang

June 11, 2025
Exploratory Data Analysis: Gamma Spectroscopy in Python

Let’s observe the matter on the atomic level. The post Exploratory Data Analysis: Gamma Spectroscopy in Python appeared first on Towards Data Science. Dmitrii Eliuseev

June 10, 2025
Trying to Stay Sane in the Age of AI

A machine learning engineer’s quiet way of not losing her mind. The post Trying to Stay Sane in the Age of AI appeared first on Towards Data Science. Amy Ma

June 10, 2025
Not Everything Needs Automation: 5 Practical AI Agents That Deliver Enterprise Value

What actually works with AI agents inside enterprise organizations? The post Not Everything Needs Automation: 5 Practical AI Agents That Deliver Enterprise Value appeared first on Towards Data Science. Weiwei Hu

June 7, 2025
The Journey from Jupyter to Programmer: A Quick-Start Guide

Explore the real benefits of ditching the notebook. The post The Journey from Jupyter to Programmer: A Quick-Start Guide appeared first on Towards Data Science. Lucy Dickinson

June 5, 2025
Data Drift Is Not the Actual Problem: Your Monitoring Strategy Is

Monitoring is easy; what to monitor is not. In the field of machine learning, data drift is just noise until you know what it means. The post Data Drift Is Not the Actual Problem: Your Monitoring Strategy Is appeared first on Towards Data Science.…

June 4, 2025
Decision Trees Natively Handle Categorical Data

But mean target encoding is their turbocharger. The post Decision Trees Natively Handle Categorical Data appeared first on Towards Data Science. Vadim Arzamasov

June 4, 2025
Evaluating LLMs for Inference, or Lessons from Teaching for Machine Learning

It’s like grading papers, but your student is an LLM. The post Evaluating LLMs for Inference, or Lessons from Teaching for Machine Learning appeared first on Towards Data Science. Stephanie Kirmer

June 3, 2025
Inside Google’s Agent2Agent (A2A) Protocol: Teaching AI Agents to Talk to Each Other

Exploring how Google’s A2A enables plug-and-play communication between LLM-powered agents across frameworks. The post Inside Google’s Agent2Agent (A2A) Protocol: Teaching AI Agents to Talk to Each Other appeared first on Towards Data Science. Hailey Quach

June 3, 2025
The Secret Power of Data Science in Customer Support

Customer support is a data goldmine. Here’s how to unlock its full potential with data science. The post The Secret Power of Data Science in Customer Support appeared first on Towards Data Science. Yu Dong

May 31, 2025
I Transitioned from Data Science to AI Engineering: Here’s Everything You Need to Know

A personal guide to the skills, tools, and mindset behind the title. The post I Transitioned from Data Science to AI Engineering: Here’s Everything You Need to Know appeared first on Towards Data Science. Sara Nobrega

May 30, 2025
Multi-Agent Communication with the A2A Python SDK

The Agent Card helps discover agents, but how does communication between agents actually work in practice? The post Multi-Agent Communication with the A2A Python SDK appeared first on Towards Data Science. Deborah Mesquita

May 29, 2025
JAX: Is This Google’s NumPy killer?

Auto differentiation and JIT compilation make a compelling case. The post JAX: Is This Google’s NumPy killer? appeared first on Towards Data Science. Thomas Reid

May 29, 2025
Why Regularization Isn’t Enough: A Better Way to Train Neural Networks with Two Objectives

Why splitting your objectives and your model might be the key to better performance and clearer trade-offs in deep learning. The post Why Regularization Isn’t Enough: A Better Way to Train Neural Networks with Two Objectives appeared first on Towards Data…

May 28, 2025
Code Agents: The Future of Agentic AI

HuggingFace smolagents framework in action. The post Code Agents: The Future of Agentic AI appeared first on Towards Data Science. Mariya Mansurova

May 27, 2025
Understanding Matrices | Part 1: Matrix-Vector Multiplication

The physical meaning of multiplying a matrix by a vector, and how it works on several special matrices. The post Understanding Matrices | Part 1: Matrix-Vector Multiplication appeared first on Towards Data Science. Tigran Hayrapetyan

May 27, 2025
New to LLMs? Start Here

A guide to Agents, LLMs, RAG, Fine-tuning, LangChain with practical examples to start building. The post New to LLMs? Start Here appeared first on Towards Data Science. ALESSANDRA COSTA

May 24, 2025
Inheritance: A Software Engineering Concept Data Scientists Must Know To Succeed

Coding concepts that distinguish an amateur from a professional data scientist. The post Inheritance: A Software Engineering Concept Data Scientists Must Know To Succeed appeared first on Towards Data Science. Benjamin Lee

May 23, 2025
Use PyTorch to Easily Access Your GPU

Let’s say you are lucky enough to have access to a system with an Nvidia Graphics Processing Unit (GPU). Did you know there is an absurdly easy method to use your GPU’s capabilities using a Python library intended and predominantly used for machine learning (ML) applications? Don’t worry…

May 22, 2025
What the Most Detailed Peer-Reviewed Study on AI in the Classroom Taught Us

The rapid proliferation and superb capabilities of widely available LLMs have ignited intense debate within the educational sector. On one side they offer students a 24/7 tutor who is always available to help; but then of course students can use LLMs to…

May 21, 2025
The Automation Trap: Why Low-Code AI Models Fail When You Scale

In the beginning, building Machine Learning models was a skill only data scientists with knowledge of Python could master. However, low-code AI platforms have made things much easier now. Anyone can now directly make a model, link it to data, and publish it as…

May 17, 2025
Boost 2-Bit LLM Accuracy with EoRA

Quantization is one of the key techniques for reducing the memory footprint of large language models (LLMs). It works by converting the data type of model parameters from higher-precision formats such as 32-bit floating point (FP32) or 16-bit floating point (FP16/BF16) to lower-precision integer formats, typically INT8 or INT4.…

May 15, 2025
Strength in Numbers: Ensembling Models with Bagging and Boosting

Bagging and boosting are two powerful ensemble techniques in machine learning – they are must-knows for data scientists! After reading this article, you are going to have a solid understanding of how bagging and boosting work and when to use them. We’ll cover the following topics,…

May 15, 2025
Survival Analysis When No One Dies: A Value-Based Approach

Survival Analysis is a statistical approach used to answer the question: “How long will something last?” That “something” could range from a patient’s lifespan to the durability of a machine component or the duration of a user’s subscription. One of the most widely used tools in…

May 14, 2025
Non-Parametric Density Estimation: Theory and Applications

In this article, we’ll talk about what Density Estimation is and the role it plays in statistical analysis. We’ll analyze two popular density estimation methods, histograms and kernel density estimators, and analyze their theoretical properties as well as how they perform in practice. Finally, we’ll look at how density…

May 14, 2025
How I Finally Understood MCP — and Got It Working in Real Life

Table of Content Introduction: Why I Wrote This The Evolution of Tool Integration with LLMs What Is Model Context Protocol (MCP), Really? Wait, MCP sounds like RAG… but is it? In an MCP-based setup In a traditional RAG system Traditional RAG Implementation MCP Implementation…

May 13, 2025
The Westworld Blunder

We’re entering an interesting moment in AI development. AI systems are getting memory, reasoning chains, self-critiques, and long-context recall. These capabilities are exactly some of the things that I’ve previously written would be prerequisites for an AI system to be conscious. Just to be clear, I don’t believe today’s AI systems are self-aware, but…

May 13, 2025
What My GPT Stylist Taught Me About Prompting Better

When I built a GPT-powered fashion assistant, I expected runway looks, not memory loss, hallucinations, or semantic déjà vu. But what unfolded became a lesson in how prompting really works, and why LLMs are more like wild animals than tools. This article builds on my previous article on…

May 10, 2025
A Review of AccentFold: One of the Most Important Papers on African ASR

I really enjoyed reading this paper, not because I’ve met some of the authors before, but because it felt necessary. Most of the papers I’ve written about so far have made waves in the broader ML community, which is great. This one, though,…

May 10, 2025
The Dangers of Deceptive Data Part 2–Base Proportions and Bad Statistics

This is a follow-up to my earlier article: The Dangers of Deceptive Data–Confusing Charts and Misleading Headlines. My first article focused on how visualizations can be used to mislead, diving into a form of data presentation widely used in public matters. In this article,…

May 9, 2025
Uh-Uh, Not Guilty

When the six merry murderesses of the Cook County Jail climbed the stage in the Chicago musical, they were aligned on the message: They had it coming, they had it coming all along. I didn’t do it. But if I’d done it, how could you tell me that I was wrong? And the part of…

May 8, 2025
Regression Discontinuity Design: How It Works and When to Use It

You’re an avid data scientist and experimenter. You know that randomisation is the summit of Mount Evidence Credibility, and you also know that when you can’t randomise, you resort to observational data and…

May 7, 2025
Think. Know. Act. How AI’s Core Capabilities Will Shape the Future of Work

“It is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change.” – Charles Darwin, Originator of Evolutionary Theory Not long ago, I came across an article about a CEO, who was visibly…

May 6, 2025
Making Sense of KPI Changes

As analysts, we are usually monitoring metrics. Quite often, metrics change. And when they do, it’s our job to figure out what’s going on: why did the conversion rate suddenly drop, or what is driving consistent revenue growth? I started my journey in data analytics as a KPI analyst. For almost…

May 6, 2025
Build and Query Knowledge Graphs with LLMs

Knowledge Graphs are relevant. A Knowledge Graph could be defined as a structured representation of information that connects concepts, entities, and their relationships in a way that mimics human understanding. It is often used to organise and integrate data from various sources, enabling machines to reason, infer, and retrieve relevant…

May 3, 2025
Rust for Python Developers: Why You Should Take a Look at the Rust Programming Language

The programming language Rust is now appearing in many feeds because it offers a performant and secure way to write programs. If you come from the Python world of Pandas, Jupyter or Flask, you might think that…

May 2, 2025
A Farewell to APMs — The Future of Observability is MCP tools

The past years have been an absolute rollercoaster (or joyride) of rapidly evolving generative AI technologies. In the twenty-five years I’ve counted myself a software developer, I cannot recall a tectonic shift of a similar magnitude, one that is already fundamentally changing…

May 2, 2025
Beyond Glorified Curve Fitting: Exploring the Probabilistic Foundations of Machine Learning

You see a math formula you don’t immediately understand. Your instinct? Stop reading. Don’t. That’s exactly what I told myself when I started reading Probabilistic Machine Learning – An Introduction by Kevin P. Murphy. And it was absolutely worth it. It changed how I…

May 1, 2025
From FOMO to Opportunity: Analytical AI in the Era of LLM Agents

Are you feeling “fear of missing out” (FOMO) when it comes to LLM agents? Well, that was the case for me for quite a while. In recent months, it feels like my online feeds have been completely bombarded by “LLM Agents”: every other…

April 30, 2025
Building a Scalable and Accurate Audio Interview Transcription Pipeline with Google Gemini

This article is co-authored by Ugo Pradère and David Haüet. How hard can it be to transcribe an interview? You feed the audio to an AI model, wait a few minutes, and boom: perfect transcript, right? Well… not quite. When it comes to…

April 30, 2025
How to Ensure Your AI Solution Does What You Expect It to Do

Generative AI (GenAI) is evolving fast, and it’s no longer just about fun chatbots or impressive image generation. 2025 is the year where the focus is on turning the AI hype into real value. Companies everywhere are looking into ways to…

April 29, 2025
When OpenAI Isn’t Always the Answer: Enterprise Risks Behind Wrapper-Based AI Agents

“Wait… are you sending journal entries to OpenAI?” That was the first thing my friend asked when I showed her Feel-Write, an AI-powered journaling app I built during a hackathon in San Francisco. I shrugged. “It was an AI-themed hackathon, I had to…

April 29, 2025
Choose the Right One: Evaluating Topic Models for Business Intelligence

Topic models are used in businesses to classify brand-related text datasets (such as product and site reviews, surveys, and social media comments) and to track how customer satisfaction metrics change over time. There is a myriad of recent topic models one can choose from: the…

April 25, 2025
How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals

The recent launch of the DeepSeek-R1 model sent ripples across the global AI community. It delivered breakthroughs on par with the reasoning models from Meta and OpenAI, achieving this in a fraction of the time and at a significantly lower cost. Beyond…

April 24, 2025
Enterprise AI: From Build-or-Buy to Partner-and-Grow

Not long ago, a cooperation partner casually approached me with an AI use case at their organization. They wanted to make their onboarding process for new staff more efficient by using AI to answer the repetitive questions of newcomers. I suggested a practical chat approach that would integrate their…

April 23, 2025
AI Agents Processing Time Series and Large Dataframes

Intro: Agents are AI systems, powered by LLMs, that can reason about their objectives and take actions to achieve a final goal. They are designed not just to respond to queries, but to orchestrate a sequence of operations, including processing data (i.e. dataframes and time series). This…

April 23, 2025
Retrieval Augmented Generation (RAG) — An Introduction

The model hallucinated! It was giving me OK answers and then it just started hallucinating. We’ve all heard or experienced it. Natural Language Generation models can sometimes hallucinate, i.e., they start generating text that is not quite accurate for the prompt provided. In layman’s terms, they start making…

April 22, 2025
Load-Testing LLMs Using LLMPerf

Deploying your Large Language Model (LLM) is not necessarily the final step in productionizing your Generative AI application. An often forgotten, yet crucial part of the MLOps lifecycle is properly load testing your LLM and ensuring it is ready to withstand your expected production traffic. Load testing at a high level…

April 19, 2025
The Good-Enough Truth

The Good-Enough Truth Could Shopify be right in requiring teams to demonstrate why AI can’t do a job before approving new human hires? Will companies that prioritize AI solutions eventually evolve into AI entities with significantly fewer employees? These are open-ended questions that have puzzled me about where such transformations might leave us in our quest for…

April 18, 2025
When Predictors Collide: Mastering VIF in Multicollinear Regression

When Predictors Collide: Mastering VIF in Multicollinear Regression In regression models, the independent variables should be uncorrelated, or at most weakly correlated, with each other. If a stronger dependency exists, it is referred to as multicollinearity, and it leads to unstable models and results that are difficult to interpret. The…

April 17, 2025
Layers of the AI Stack, Explained Simply

Layers of the AI Stack, Explained Simply This is the first in a multi-part series on creating web applications with Generative AI integration. Table of Contents Introduction The Virtues of the Application Layer Thick Wrappers The Return of Clippy Getting Stuff Done While You Sleep Introduction The AI space is a vast and complicated landscape. Matt…

April 15, 2025
Are You Sure Your Posterior Makes Sense?

Are You Sure Your Posterior Makes Sense? This article is co-authored by Felipe Bandeira, Giselle Fretta, Thu Than, and Elbion Redenica. We also thank Prof. Carl Scheffler for his support. Introduction Parameter estimation has for decades been one of the most important topics in statistics. While frequentist approaches, such as Maximum Likelihood Estimation, used to…

April 12, 2025
Ivory Tower Notes: The Problem

Ivory Tower Notes: The Problem Did you ever spend months on a Machine Learning project, only to discover you never defined the “correct” problem at the start? If so, or even if not, and you are only starting with the data science or AI field, welcome to my first Ivory Tower Note, where I will address…

April 11, 2025
Mining Rules from Data

Mining Rules from Data Working with products, we might face a need to introduce some “rules”. Let me explain what I mean by “rules” in practical examples: Imagine that we’re seeing a massive wave of fraud in our product, and we want to restrict onboarding for a particular segment of customers to lower this risk. For…

April 10, 2025
Avoiding Costly Mistakes with Uncertainty Quantification for Algorithmic Home Valuations

Avoiding Costly Mistakes with Uncertainty Quantification for Algorithmic Home Valuations When you’re about to buy a home, whether you’re an everyday buyer looking for your dream house or a seasoned property investor, there’s a good chance you’ve encountered automated valuation models, or AVMs. These clever tools use massive datasets filled with past property transactions to…

April 8, 2025
Linear Programming: Managing Multiple Targets with Goal Programming

Linear Programming: Managing Multiple Targets with Goal Programming This is the sixth (and likely last) part of a Linear Programming series I’ve been writing. With the core concepts covered by the prior articles, this article focuses on goal programming which is a less frequent linear programming (LP) use case. Goal programming is a specific linear…

April 4, 2025
The Art of Noise

The Art of Noise Introduction In my last several articles I talked about generative deep learning algorithms, which are mostly related to text generation tasks. So, I think it would be interesting to switch to generative algorithms for image generation now. We know that nowadays there are plenty of deep learning models specialized for…

April 3, 2025
AI in Social Research and Polling

AI in Social Research and Polling This month, I’m going to be discussing a really interesting topic that I came across in a recent draft paper by a professor at the University of Maryland named M. R. Sauter. In the paper, they discuss (among other things) the phenomenon of social scientists and pollsters trying to employ…

April 2, 2025
Graph Neural Networks Part 3: How GraphSAGE Handles Changing Graph Structure

Graph Neural Networks Part 3: How GraphSAGE Handles Changing Graph Structure In the previous parts of this series, we looked at Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs). Both architectures work fine, but they also have some limitations! A big one is that for large graphs, calculating the node representations with GCNs and…

April 1, 2025
My Learning to Be Hired Again After a Year… Part 2

My Learning to Be Hired Again After a Year… Part 2 This is the second part of “My learning to being hired again after a year… Part I”. Hard to believe, but it’s been a full year since I published the first part on TDS. And in that time, something beautiful happened. Every so often,…

April 1, 2025
A Little More Conversation, A Little Less Action — A Case Against Premature Data Integration

A Little More Conversation, A Little Less Action — A Case Against Premature Data Integration When I talk to [large] organisations that have not yet properly started with Data Science (DS) and Machine Learning (ML), they often tell me that they have to run a data integration project first, because “…all the data is scattered…

March 29, 2025
Master the 3D Reconstruction Process: A Step-by-Step Guide

Master the 3D Reconstruction Process: A Step-by-Step Guide The 3D reconstruction journey from 2D photographs to 3D models follows a structured path. This path consists of distinct steps that build upon each other to transform flat images into spatial information. Understanding this pipeline is crucial for anyone looking to create high-quality 3D reconstructions. Let me…

March 29, 2025
Japanese-Chinese Translation with GenAI: What Works and What Doesn’t

Japanese-Chinese Translation with GenAI: What Works and What Doesn’t Authors Alex (Qian) Wan: Alex (Qian) is a designer specializing in AI for B2B products. She is currently working at Microsoft, focusing on machine learning and Copilot for data analysis. Previously, she was the Gen AI design lead at VMware. Eli Ruoyong Hong: Eli is a…

March 28, 2025