Category: editors-pick

  • How Do Grayscale Images Affect Visual Anomaly Detection?

    How Do Grayscale Images Affect Visual Anomaly Detection? A practical exploration focusing on performance and speed The post How Do Grayscale Images Affect Visual Anomaly Detection? appeared first on Towards Data Science. Aimira Baitieva Go to original source

  • How Not to Mislead with Your Data-Driven Story

    How Not to Mislead with Your Data-Driven Story Data storytelling can enlighten—but it can also deceive. When persuasive narratives meet biased framing, cherry-picked data, or misleading visuals, insights risk becoming illusions. This article explores the hidden biases embedded in data-driven storytelling—from the seduction of beautiful charts to the quiet influence of AI-generated insights—and offers practical…

  • Things I Wish I Had Known Before Starting ML

    Things I Wish I Had Known Before Starting ML Part 1: Data, Sales Pitches, Bugs, and Breakthroughs The post Things I Wish I Had Known Before Starting ML appeared first on Towards Data Science. Pascal Janetzky Go to original source

  • A Well-Designed Experiment Can Teach You More Than a Time Machine!

    A Well-Designed Experiment Can Teach You More Than a Time Machine! How experimentation is more powerful than knowing counterfactuals The post A Well-Designed Experiment Can Teach You More Than a Time Machine! appeared first on Towards Data Science. Jarom Hulet Go to original source

  • I Analysed 25,000 Hotel Names and Found Four Surprising Truths

    I Analysed 25,000 Hotel Names and Found Four Surprising Truths Why are there so many hotels named after cities they are not in? Follow along for a data analysis on hotel names. The post I Analysed 25,000 Hotel Names and Found Four Surprising Truths appeared first on Towards Data Science. Anna Gordun Peiro Go to…

  • Exploratory Data Analysis: Gamma Spectroscopy in Python (Part 2)

    Exploratory Data Analysis: Gamma Spectroscopy in Python (Part 2) Let’s observe the matter on the atomic level The post Exploratory Data Analysis: Gamma Spectroscopy in Python (Part 2) appeared first on Towards Data Science. Dmitrii Eliuseev Go to original source

  • Your 1M+ Context Window LLM Is Less Powerful Than You Think

    Your 1M+ Context Window LLM Is Less Powerful Than You Think Why working memory is a more important bottleneck than raw context window size The post Your 1M+ Context Window LLM Is Less Powerful Than You Think appeared first on Towards Data Science. Tobias Schnabel Go to original source

  • Midyear 2025 AI Reflection

    Midyear 2025 AI Reflection Impressions on agentic AI progress and the AI-2027 Jobocalypse scenario The post Midyear 2025 AI Reflection appeared first on Towards Data Science. Marina Tosic Go to original source

  • Exploring Prompt Learning: Using English Feedback to Optimize LLM Systems

    Exploring Prompt Learning: Using English Feedback to Optimize LLM Systems Prompt learning presents a compelling approach for continuous improvement of AI applications The post Exploring Prompt Learning: Using English Feedback to Optimize LLM Systems appeared first on Towards Data Science. Aparna Dhinakaran Go to original source

  • How to Overlay a Heatmap on a Real Map with Python

    How to Overlay a Heatmap on a Real Map with Python Visualizing historical tornado trends The post How to Overlay a Heatmap on a Real Map with Python appeared first on Towards Data Science. Lee Vaughan Go to original source

  • How Metrics (and LLMs) Can Trick You: A Field Guide to Paradoxes

    How Metrics (and LLMs) Can Trick You: A Field Guide to Paradoxes When numbers lie — and your metrics mislead you The post How Metrics (and LLMs) Can Trick You: A Field Guide to Paradoxes appeared first on Towards Data Science. Subha Ganapathi Go to original source

  • What Can the History of Data Tell Us About the Future of AI?

    What Can the History of Data Tell Us About the Future of AI? A 40-Year Look at Data, Business Models, and the Forces Shaping Intelligent Systems The post What Can the History of Data Tell Us About the Future of AI? appeared first on Towards Data Science. Steve Hedden Go to original source

  • Hitchhiker’s Guide to RAG: From Tiny Files to Tolstoy with OpenAI’s API and LangChain

    Hitchhiker’s Guide to RAG: From Tiny Files to Tolstoy with OpenAI’s API and LangChain Scaling a simple RAG pipeline from simple notes to full books The post Hitchhiker’s Guide to RAG: From Tiny Files to Tolstoy with OpenAI’s API and LangChain appeared first on Towards Data Science. Maria Mouschoutzi Go to original source

  • Reducing Time to Value for Data Science Projects: Part 3

    Reducing Time to Value for Data Science Projects: Part 3 Setting up a robust experimentation process The post Reducing Time to Value for Data Science Projects: Part 3 appeared first on Towards Data Science. Kristopher McGlinchey Go to original source

  • How to Perform Effective Data Cleaning for Machine Learning

    How to Perform Effective Data Cleaning for Machine Learning Learn how you can improve your machine learning models using effective data cleaning The post How to Perform Effective Data Cleaning for Machine Learning appeared first on Towards Data Science. Eivind Kjosbakken Go to original source

  • What I Learned in my First 18 Months as a Freelance Data Scientist

    What I Learned in my First 18 Months as a Freelance Data Scientist The taxes and health insurance edition The post What I Learned in my First 18 Months as a Freelance Data Scientist appeared first on Towards Data Science. CJ Sullivan Go to original source

  • Your Personal Analytics Toolbox

    Your Personal Analytics Toolbox Leveraging MCP for automating your daily routine The post Your Personal Analytics Toolbox appeared first on Towards Data Science. Mariya Mansurova Go to original source

  • POSET Representations in Python Can Have a Huge Impact on Business

    POSET Representations in Python Can Have a Huge Impact on Business Discover how POSET indicators transform data into coherent scoring systems, enabling meaningful comparisons while preserving the data’s multi-dimensional semantic structure. The post POSET Representations in Python Can Have a Huge Impact on Business appeared first on Towards Data Science. Andrea D’Agostino Go to original…

  • Rethinking Data Science Interviews in the Age of AI

    Rethinking Data Science Interviews in the Age of AI How AI is transforming data science interviews—and what hiring managers and candidates should do to adapt The post Rethinking Data Science Interviews in the Age of AI appeared first on Towards Data Science. Yu Dong Go to original source

  • Fairness Pruning: Precision Surgery to Reduce Bias in LLMs

    Fairness Pruning: Precision Surgery to Reduce Bias in LLMs From unjustified shootings to neutral stories: how to fix toxic narratives with selective pruning The post Fairness Pruning: Precision Surgery to Reduce Bias in LLMs appeared first on Towards Data Science. Pere Martra Go to original source

  • Software Engineering in the LLM Era

    Software Engineering in the LLM Era On growing new software engineers, even when it’s inefficient The post Software Engineering in the LLM Era appeared first on Towards Data Science. Stephanie Kirmer Go to original source

  • How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1

    How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1 From architectural design to food security. The post How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1 appeared first on Towards Data Science. Marco Hening Tallarico Go to…

  • An Introduction to Remote Model Context Protocol Servers

    An Introduction to Remote Model Context Protocol Servers Writing, testing and using them. The post An Introduction to Remote Model Context Protocol Servers appeared first on Towards Data Science. Thomas Reid Go to original source

  • Prescriptive Modeling Makes Causal Bets – Whether You Know it or Not!

    Prescriptive Modeling Makes Causal Bets – Whether You Know it or Not! An explanation of the causal assumption implicit in prescriptive modeling and how to satisfy it. The post Prescriptive Modeling Makes Causal Bets – Whether You Know it or Not! appeared first on Towards Data Science. Jarom Hulet Go to original source

  • From Pixels to Plots

    From Pixels to Plots How I built an AI-powered prototype to turn images into insights The post From Pixels to Plots appeared first on Towards Data Science. Jens Winkelmann Go to original source

  • A Developer’s Guide to Building Scalable AI: Workflows vs Agents

    A Developer’s Guide to Building Scalable AI: Workflows vs Agents A practical guide to choosing between AI agents and workflows for production systems, covering the hidden costs, architectural trade-offs, and decision framework that can save you thousands in deployment mistakes. Includes real-world examples and a scoring system to determine which approach fits your specific use…

  • Stop Chasing “Efficiency AI.” The Real Value Is in “Opportunity AI.”

    Stop Chasing “Efficiency AI.” The Real Value Is in “Opportunity AI.” Companies pursuing incremental productivity gains risk being displaced by AI-native competitors building entirely new business models The post Stop Chasing “Efficiency AI.” The Real Value Is in “Opportunity AI.” appeared first on Towards Data Science. Shreshth Sharma Go to original source

  • Why Your Next LLM Might Not Have A Tokenizer

    Why Your Next LLM Might Not Have A Tokenizer The Tokenizer Has Been a Necessary Evil, but This Radical Approach Shows That It Might Not Be Necessary Anymore. The post Why Your Next LLM Might Not Have A Tokenizer appeared first on Towards Data Science. Moulik Gupta Go to original source

  • Building AI-Powered Low-Code Workflows with n8n

    Building AI-Powered Low-Code Workflows with n8n Three powerful workflows that you can apply to your personal life or business today The post Building AI-Powered Low-Code Workflows with n8n appeared first on Towards Data Science. ALESSANDRA COSTA Go to original source

  • What PyTorch Really Means by a Leaf Tensor and Its Grad

    What PyTorch Really Means by a Leaf Tensor and Its Grad The secret life of leaves, gradients, and the mighty requires_grad flag The post What PyTorch Really Means by a Leaf Tensor and Its Grad appeared first on Towards Data Science. Maciej J. Mikulski Go to original source

  • Can We Use Chess to Predict Soccer?

    Can We Use Chess to Predict Soccer? An adaptation of Elo ratings for soccer implemented in Python The post Can We Use Chess to Predict Soccer? appeared first on Towards Data Science. Felipe Bandeira Go to original source

  • Grad-CAM from Scratch with PyTorch Hooks

    Grad-CAM from Scratch with PyTorch Hooks A hands-on look at an explainable AI (XAI) technique that helps reveal why a convolutional neural network (CNN) made a particular decision The post Grad-CAM from Scratch with PyTorch Hooks appeared first on Towards Data Science. Conor O’Sullivan Go to original source

  • What If I had AI in 2018: Rent the Runway Fulfillment Center Optimization

    What If I had AI in 2018: Rent the Runway Fulfillment Center Optimization An LLM in 2018 would not have trivialized a complex project, although it could have enhanced the final solution The post What If I had AI in 2018: Rent the Runway Fulfillment Center Optimization appeared first on Towards Data Science. Hugo Ducruc…

  • Connecting the Dots for Better Movie Recommendations

    Connecting the Dots for Better Movie Recommendations Lightweight graph RAG on Rotten Tomatoes movie reviews The post Connecting the Dots for Better Movie Recommendations appeared first on Towards Data Science. Brian Godsey Go to original source

  • Design Smarter Prompts and Boost Your LLM Output: Real Tricks from an AI Engineer’s Toolbox

    Design Smarter Prompts and Boost Your LLM Output: Real Tricks from an AI Engineer’s Toolbox Not just what you ask, but how you ask it. Practical techniques for prompt engineering that deliver The post Design Smarter Prompts and Boost Your LLM Output: Real Tricks from an AI Engineer’s Toolbox appeared first on Towards Data Science. Ugo Pradère…

  • Exploring the Proportional Odds Model for Ordinal Logistic Regression

    Exploring the Proportional Odds Model for Ordinal Logistic Regression Understanding and Implementing Brant’s Tests in Ordinal Logistic Regression with Python The post Exploring the Proportional Odds Model for Ordinal Logistic Regression appeared first on Towards Data Science. JUNIOR JUMBONG Go to original source

  • Audio Spectrogram Transformers Beyond the Lab

    Audio Spectrogram Transformers Beyond the Lab A recipe for building a portable soundscape monitoring app with AudioMoth, Raspberry Pi, and a decent dose of deep learning. The post Audio Spectrogram Transformers Beyond the Lab appeared first on Towards Data Science. Maciej Adamiak Go to original source

  • Applications of Density Estimation to Legal Theory

    Applications of Density Estimation to Legal Theory A brief analysis using density estimation to compare the two-verdict and three-verdict systems. The post Applications of Density Estimation to Legal Theory appeared first on Towards Data Science. Jimin Kang Go to original source

  • Exploratory Data Analysis: Gamma Spectroscopy in Python

    Exploratory Data Analysis: Gamma Spectroscopy in Python Let’s observe the matter on the atomic level The post Exploratory Data Analysis: Gamma Spectroscopy in Python appeared first on Towards Data Science. Dmitrii Eliuseev Go to original source

  • Trying to Stay Sane in the Age of AI

    Trying to Stay Sane in the Age of AI A machine learning engineer’s quiet way of not losing her mind The post Trying to Stay Sane in the Age of AI appeared first on Towards Data Science. Amy Ma Go to original source

  • Not Everything Needs Automation: 5 Practical AI Agents That Deliver Enterprise Value

    Not Everything Needs Automation: 5 Practical AI Agents That Deliver Enterprise Value What actually works with AI agents inside enterprise organizations? The post Not Everything Needs Automation: 5 Practical AI Agents That Deliver Enterprise Value appeared first on Towards Data Science. Weiwei Hu Go to original source

  • The Journey from Jupyter to Programmer: A Quick-Start Guide

    The Journey from Jupyter to Programmer: A Quick-Start Guide Explore the real benefits of ditching the notebook The post The Journey from Jupyter to Programmer: A Quick-Start Guide appeared first on Towards Data Science. Lucy Dickinson Go to original source

  • Data Drift Is Not the Actual Problem: Your Monitoring Strategy Is

    Data Drift Is Not the Actual Problem: Your Monitoring Strategy Is Monitoring is easy; what to monitor is not. In the field of machine learning, data drift is just noise until you know what it means. The post Data Drift Is Not the Actual Problem: Your Monitoring Strategy Is appeared first on Towards Data Science.…

  • Decision Trees Natively Handle Categorical Data

    Decision Trees Natively Handle Categorical Data But mean target encoding is their turbocharger The post Decision Trees Natively Handle Categorical Data appeared first on Towards Data Science. Vadim Arzamasov Go to original source

  • Evaluating LLMs for Inference, or Lessons from Teaching for Machine Learning

    Evaluating LLMs for Inference, or Lessons from Teaching for Machine Learning It’s like grading papers, but your student is an LLM The post Evaluating LLMs for Inference, or Lessons from Teaching for Machine Learning appeared first on Towards Data Science. Stephanie Kirmer Go to original source

  • Inside Google’s Agent2Agent (A2A) Protocol: Teaching AI Agents to Talk to Each Other

    Inside Google’s Agent2Agent (A2A) Protocol: Teaching AI Agents to Talk to Each Other Exploring how Google’s A2A enables plug-and-play communication between LLM-powered agents across frameworks The post Inside Google’s Agent2Agent (A2A) Protocol: Teaching AI Agents to Talk to Each Other appeared first on Towards Data Science. Hailey Quach Go to original source

  • The Secret Power of Data Science in Customer Support

    The Secret Power of Data Science in Customer Support Customer support is a data goldmine. Here’s how to unlock its full potential with data science. The post The Secret Power of Data Science in Customer Support appeared first on Towards Data Science. Yu Dong Go to original source

  • I Transitioned from Data Science to AI Engineering: Here’s Everything You Need to Know

    I Transitioned from Data Science to AI Engineering: Here’s Everything You Need to Know A personal guide to the skills, tools, and mindset behind the title The post I Transitioned from Data Science to AI Engineering: Here’s Everything You Need to Know appeared first on Towards Data Science. Sara Nobrega Go to original source

  • Multi-Agent Communication with the A2A Python SDK

    Multi-Agent Communication with the A2A Python SDK The Agent Card helps discover agents, but how does communication between agents actually work in practice? The post Multi-Agent Communication with the A2A Python SDK appeared first on Towards Data Science. Deborah Mesquita Go to original source

  • JAX: Is This Google’s NumPy killer?

    JAX: Is This Google’s NumPy killer? Auto differentiation and JIT compilation make a compelling case. The post JAX: Is This Google’s NumPy killer? appeared first on Towards Data Science. Thomas Reid Go to original source

  • Why Regularization Isn’t Enough: A Better Way to Train Neural Networks with Two Objectives

    Why Regularization Isn’t Enough: A Better Way to Train Neural Networks with Two Objectives Why splitting your objectives and your model might be the key to better performance and clearer trade-offs in deep learning. The post Why Regularization Isn’t Enough: A Better Way to Train Neural Networks with Two Objectives appeared first on Towards Data…

  • Code Agents: The Future of Agentic AI

    Code Agents: The Future of Agentic AI HuggingFace smolagents framework in action The post Code Agents: The Future of Agentic AI appeared first on Towards Data Science. Mariya Mansurova Go to original source

  • Understanding Matrices | Part 1: Matrix-Vector Multiplication

    Understanding Matrices | Part 1: Matrix-Vector Multiplication The physical meaning of multiplying a matrix by a vector, and how it works on several special matrices. The post Understanding Matrices | Part 1: Matrix-Vector Multiplication appeared first on Towards Data Science. Tigran Hayrapetyan Go to original source

  • New to LLMs? Start Here 

    New to LLMs? Start Here  A guide to Agents, LLMs, RAG, Fine-tuning, LangChain with practical examples to start building The post New to LLMs? Start Here  appeared first on Towards Data Science. ALESSANDRA COSTA Go to original source

  • Inheritance: A Software Engineering Concept Data Scientists Must Know To Succeed

    Inheritance: A Software Engineering Concept Data Scientists Must Know To Succeed Coding concepts that distinguish an amateur from a professional data scientist The post Inheritance: A Software Engineering Concept Data Scientists Must Know To Succeed appeared first on Towards Data Science. Benjamin Lee Go to original source

  • Use PyTorch to Easily Access Your GPU

    Use PyTorch to Easily Access Your GPU Let’s say you are lucky enough to have access to a system with an Nvidia Graphics Processing Unit (GPU). Did you know there is an absurdly easy way to tap your GPU’s capabilities with a Python library intended for, and predominantly used in, machine learning (ML) applications? Don’t worry…

  • What the Most Detailed Peer-Reviewed Study on AI in the Classroom Taught Us

    What the Most Detailed Peer-Reviewed Study on AI in the Classroom Taught Us The rapid proliferation and superb capabilities of widely available LLMs have ignited intense debate within the educational sector. On one side they offer students a 24/7 tutor who is always available to help; but then of course students can use LLMs to…

  • The Automation Trap: Why Low-Code AI Models Fail When You Scale

    The Automation Trap: Why Low-Code AI Models Fail When You Scale In the beginning, building Machine Learning models was a skill only data scientists with knowledge of Python could master. However, low-code AI platforms have made things much easier now. Anyone can now directly make a model, link it to data, and publish it as…

  • Boost 2-Bit LLM Accuracy with EoRA

    Boost 2-Bit LLM Accuracy with EoRA Quantization is one of the key techniques for reducing the memory footprint of large language models (LLMs). It works by converting the data type of model parameters from higher-precision formats such as 32-bit floating point (FP32) or 16-bit floating point (FP16/BF16) to lower-precision integer formats, typically INT8 or INT4.…

  • Strength in Numbers: Ensembling Models with Bagging and Boosting

    Strength in Numbers: Ensembling Models with Bagging and Boosting Bagging and boosting are two powerful ensemble techniques in machine learning – they are must-knows for data scientists! After reading this article, you are going to have a solid understanding of how bagging and boosting work and when to use them. We’ll cover the following topics,…

  • Survival Analysis When No One Dies: A Value-Based Approach

    Survival Analysis When No One Dies: A Value-Based Approach Survival Analysis is a statistical approach used to answer the question: “How long will something last?” That “something” could range from a patient’s lifespan to the durability of a machine component or the duration of a user’s subscription. One of the most widely used tools in…

  • Non-Parametric Density Estimation: Theory and Applications

    Non-Parametric Density Estimation: Theory and Applications In this article, we’ll talk about what Density Estimation is and the role it plays in statistical analysis. We’ll analyze two popular density estimation methods, histograms and kernel density estimators, and analyze their theoretical properties as well as how they perform in practice. Finally, we’ll look at how density…

  • How I Finally Understood MCP — and Got It Working in Real Life

    How I Finally Understood MCP — and Got It Working in Real Life Table of Contents: Introduction: Why I Wrote This; The Evolution of Tool Integration with LLMs; What Is Model Context Protocol (MCP), Really?; Wait, MCP sounds like RAG… but is it?; In an MCP-based setup; In a traditional RAG system; Traditional RAG Implementation; MCP Implementation…

  • The Westworld Blunder

    The Westworld Blunder We’re entering an interesting moment in AI development. AI systems are getting memory, reasoning chains, self-critiques, and long-context recall. These capabilities are exactly some of the things that I’ve previously written would be prerequisites for an AI system to be conscious. Just to be clear, I don’t believe today’s AI systems are self-aware, but…

  • What My GPT Stylist Taught Me About Prompting Better

    What My GPT Stylist Taught Me About Prompting Better When I built a GPT-powered fashion assistant, I expected runway looks—not memory loss, hallucinations, or semantic déjà vu. But what unfolded became a lesson in how prompting really works—and why LLMs are more like wild animals than tools. This article builds on my previous article on…

  • A Review of AccentFold: One of the Most Important Papers on African ASR

    A Review of AccentFold: One of the Most Important Papers on African ASR I really enjoyed reading this paper, not because I’ve met some of the authors before, but because it felt necessary. Most of the papers I’ve written about so far have made waves in the broader ML community, which is great. This one, though,…

  • The Dangers of Deceptive Data Part 2–Base Proportions and Bad Statistics

    The Dangers of Deceptive Data Part 2–Base Proportions and Bad Statistics This is a follow-up to my earlier article: The Dangers of Deceptive Data–Confusing Charts and Misleading Headlines. My first article focused on how visualizations can be used to mislead, diving into a form of data presentation widely used in public matters. In this article,…

  • Uh-Uh, Not Guilty

    Uh-Uh, Not Guilty When the six merry murderesses of the Cook County Jail climbed the stage in the Chicago musical, they were aligned on the message:  They had it coming, they had it coming all along. I didn’t do it. But if I’d done it, how could you tell me that I was wrong? And the part of…

  • Regression Discontinuity Design: How It Works and When to Use It

    Regression Discontinuity Design: How It Works and When to Use It You’re an avid data scientist and experimenter. You know that randomisation is the summit of Mount Evidence Credibility, and you also know that when you can’t randomise, you resort to observational data and…

  • Think. Know. Act. How AI’s Core Capabilities Will Shape the Future of Work

    Think. Know. Act. How AI’s Core Capabilities Will Shape the Future of Work “It is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change.” – Charles Darwin, Originator of Evolutionary Theory Not long ago, I came across an article about a CEO, who was visibly…

  • Making Sense of KPI Changes

    Making Sense of KPI Changes As analysts, we are usually monitoring metrics. Quite often, metrics change. And when they do, it’s our job to figure out what’s going on: why did the conversion rate suddenly drop, or what is driving consistent revenue growth? I started my journey in data analytics as a KPI analyst. For almost…

  • Build and Query Knowledge Graphs with LLMs

    Build and Query Knowledge Graphs with LLMs Knowledge Graphs are relevant. A Knowledge Graph could be defined as a structured representation of information that connects concepts, entities, and their relationships in a way that mimics human understanding. It is often used to organise and integrate data from various sources, enabling machines to reason, infer, and retrieve relevant…

  • Rust for Python Developers: Why You Should Take a Look at the Rust Programming Language

    Rust for Python Developers: Why You Should Take a Look at the Rust Programming Language The programming language Rust is now appearing in many feeds because it offers a fast and secure way to write programs. If you come from the Python world of Pandas, Jupyter or Flask, you might think that…

  • A Farewell to APMs — The Future of Observability is MCP tools

    A Farewell to APMs — The Future of Observability is MCP tools The past years have been an absolute rollercoaster (or joyride) of rapidly evolving generative AI technologies. In the twenty-five years I’ve counted myself a software developer, I cannot recall a tectonic shift of a similar magnitude, one that is already fundamentally changing…

  • Beyond Glorified Curve Fitting: Exploring the Probabilistic Foundations of Machine Learning

    Beyond Glorified Curve Fitting: Exploring the Probabilistic Foundations of Machine Learning You see a math formula you don’t immediately understand. Your instinct? Stop reading. Don’t. That’s exactly what I told myself when I started reading Probabilistic Machine Learning – An Introduction by Kevin P. Murphy. And it was absolutely worth it. It changed how I…

  • From FOMO to Opportunity: Analytical AI in the Era of LLM Agents

    From FOMO to Opportunity: Analytical AI in the Era of LLM Agents Are you feeling “fear of missing out” (FOMO) when it comes to LLM agents? Well, that was the case for me for quite a while. In recent months, it feels like my online feeds have been completely bombarded by “LLM Agents”: every other…

  • Building a Scalable and Accurate Audio Interview Transcription Pipeline with Google Gemini

    Building a Scalable and Accurate Audio Interview Transcription Pipeline with Google Gemini This article is co-authored by Ugo Pradère and David Haüet How hard can it be to transcribe an interview? You feed the audio to an AI model, wait a few minutes, and boom: perfect transcript, right? Well… not quite. When it comes to…

  • How to Ensure Your AI Solution Does What You Expect It to Do

    How to Ensure Your AI Solution Does What You Expect It to Do Generative AI (GenAI) is evolving fast, and it’s no longer just about fun chatbots or impressive image generation. 2025 is the year where the focus is on turning the AI hype into real value. Companies everywhere are looking into ways to…

  • When OpenAI Isn’t Always the Answer: Enterprise Risks Behind Wrapper-Based AI Agents

    When OpenAI Isn’t Always the Answer: Enterprise Risks Behind Wrapper-Based AI Agents “Wait… are you sending journal entries to OpenAI?” That was the first thing my friend asked when I showed her Feel-Write, an AI-powered journaling app I built during a hackathon in San Francisco. I shrugged. “It was an AI-themed hackathon, I had to…

  • Choose the Right One: Evaluating Topic Models for Business Intelligence

    Choose the Right One: Evaluating Topic Models for Business Intelligence Topic models are used in businesses to classify brand-related text datasets (such as product and site reviews, surveys, and social media comments) and to track how customer satisfaction metrics change over time. There is a myriad of recent topic models one can choose from: the…

  • How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals

    How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals The recent launch of the DeepSeek-R1 model sent ripples across the global AI community. It delivered breakthroughs on par with the reasoning models from Meta and OpenAI, achieving this in a fraction of the time and at a significantly lower cost. Beyond…

  • Enterprise AI: From Build-or-Buy to Partner-and-Grow

    Enterprise AI: From Build-or-Buy to Partner-and-Grow Not long ago, a cooperation partner casually approached me with an AI use case at their organization. They wanted to make their onboarding process for new staff more efficient by using AI to answer the repetitive questions of newcomers. I suggested a practical chat approach that would integrate their…

  • AI Agents Processing Time Series and Large Dataframes

    AI Agents Processing Time Series and Large Dataframes Intro: Agents are AI systems, powered by LLMs, that can reason about their objectives and take actions to achieve a final goal. They are designed not just to respond to queries, but to orchestrate a sequence of operations, including processing data (e.g. dataframes and time series). This…

  • Retrieval Augmented Generation (RAG) — An Introduction

    Retrieval Augmented Generation (RAG) — An Introduction The model hallucinated! It was giving me OK answers and then it just started hallucinating. We’ve all heard or experienced it. Natural Language Generation models can sometimes hallucinate, i.e., they start generating text that is not quite accurate for the prompt provided. In layman’s terms, they start making…

  • Load-Testing LLMs Using LLMPerf

    Load-Testing LLMs Using LLMPerf Deploying your Large Language Model (LLM) is not necessarily the final step in productionizing your Generative AI application. An often forgotten, yet crucial part of the MLOps lifecycle is properly load testing your LLM and ensuring it is ready to withstand your expected production traffic. Load testing at a high level…

  • The Good-Enough Truth

    The Good-Enough Truth Could Shopify be right in requiring teams to demonstrate why AI can’t do a job before approving new human hires? Will companies that prioritize AI solutions eventually evolve into AI entities with significantly fewer employees? These are open-ended questions that have puzzled me about where such transformations might leave us in our quest for…

  • When Predictors Collide: Mastering VIF in Multicollinear Regression

    When Predictors Collide: Mastering VIF in Multicollinear Regression In regression models, the independent variables should depend on each other only slightly or not at all, i.e. they should not be correlated. If such a dependency does exist, it is referred to as multicollinearity, and it leads to unstable models and results that are difficult to interpret. The…

  • Layers of the AI Stack, Explained Simply

    Layers of the AI Stack, Explained Simply This is the first in a multi-part series on creating web applications with Generative AI integration. Table of Contents Introduction The Virtues of the Application Layer Thick Wrappers The Return of Clippy Getting Stuff Done While You Sleep Introduction The AI space is a vast and complicated landscape. Matt…

  • Are You Sure Your Posterior Makes Sense?

    Are You Sure Your Posterior Makes Sense? This article is co-authored by Felipe Bandeira, Giselle Fretta, Thu Than, and Elbion Redenica. We also thank Prof. Carl Scheffler for his support. Introduction Parameter estimation has for decades been one of the most important topics in statistics. While frequentist approaches, such as Maximum Likelihood Estimation, used to…

  • Ivory Tower Notes: The Problem

    Ivory Tower Notes: The Problem Did you ever spend months on a Machine Learning project, only to discover you never defined the “correct” problem at the start? If so, or even if not, and you are only starting with the data science or AI field, welcome to my first Ivory Tower Note, where I will address…

  • Mining Rules from Data

    Mining Rules from Data Working with products, we might face a need to introduce some “rules”. Let me explain what I mean by “rules” in practical examples:  Imagine that we’re seeing a massive wave of fraud in our product, and we want to restrict onboarding for a particular segment of customers to lower this risk. For…

  • Avoiding Costly Mistakes with Uncertainty Quantification for Algorithmic Home Valuations

    Avoiding Costly Mistakes with Uncertainty Quantification for Algorithmic Home Valuations When you’re about to buy a home, whether you’re an everyday buyer looking for your dream house or a seasoned property investor, there’s a good chance you’ve encountered automated valuation models, or AVMs. These clever tools use massive datasets filled with past property transactions to…

  • Linear Programming: Managing Multiple Targets with Goal Programming

    Linear Programming: Managing Multiple Targets with Goal Programming This is the sixth (and likely last) part of a Linear Programming series I’ve been writing. With the core concepts covered by the prior articles, this article focuses on goal programming, which is a less common linear programming (LP) use case. Goal programming is a specific linear…

  • The Art of Noise

    The Art of Noise Introduction In my last several articles I talked about generative deep learning algorithms, which are mostly related to text generation tasks. So, I think it would be interesting to switch to generative algorithms for image generation now. We know that nowadays there are plenty of deep learning models specialized for…

  • AI in Social Research and Polling

    AI in Social Research and Polling This month, I’m going to be discussing a really interesting topic that I came across in a recent draft paper by a professor at the University of Maryland named M. R. Sauter. In the paper, they discuss (among other things) the phenomenon of social scientists and pollsters trying to employ…

  • Graph Neural Networks Part 3: How GraphSAGE Handles Changing Graph Structure

    Graph Neural Networks Part 3: How GraphSAGE Handles Changing Graph Structure In the previous parts of this series, we looked at Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs). Both architectures work fine, but they also have some limitations! A big one is that for large graphs, calculating the node representations with GCNs and…

  • My Learning to Be Hired Again After a Year… Part 2

    My Learning to Be Hired Again After a Year… Part 2 This is the second part of “My learning to being hired again after a year… Part I”. Hard to believe, but it’s been a full year since I published the first part on TDS. And in that time, something beautiful happened. Every so often,…

  • A Little More Conversation, A Little Less Action — A Case Against Premature Data Integration

    A Little More Conversation, A Little Less Action — A Case Against Premature Data Integration When I talk to [large] organisations that have not yet properly started with Data Science (DS) and Machine Learning (ML), they often tell me that they have to run a data integration project first, because “…all the data is scattered…

  • Master the 3D Reconstruction Process: A Step-by-Step Guide

    Master the 3D Reconstruction Process: A Step-by-Step Guide The 3D reconstruction journey from 2D photographs to 3D models follows a structured path. This path consists of distinct steps that build upon each other to transform flat images into spatial information. Understanding this pipeline is crucial for anyone looking to create high-quality 3D reconstructions. Let me…

  • Japanese-Chinese Translation with GenAI: What Works and What Doesn’t

    Japanese-Chinese Translation with GenAI: What Works and What Doesn’t Authors Alex (Qian) Wan: Alex (Qian) is a designer specializing in AI for B2B products. She is currently working at Microsoft, focusing on machine learning and Copilot for data analysis. Previously, she was the Gen AI design lead at VMware. Eli Ruoyong Hong: Eli is a…