Tag: how

How to Correctly Apply Limits on the Result in DAX (and SQL)

How to Correctly Apply Limits on the Result in DAX (and SQL) What if the output of a measure mustn’t be above a specific limit? How can we ensure that the total is calculated correctly? This piece is about correctly calculating and summarizing such output. The post How to Correctly Apply Limits on the Result…

August 19, 2025
How to Create Powerful LLM Applications with Context Engineering

How to Create Powerful LLM Applications with Context Engineering Improve your LLM by optimizing its context The post How to Create Powerful LLM Applications with Context Engineering appeared first on Towards Data Science. Eivind Kjosbakken Go to original source

August 19, 2025
How different is “Senior Data Analyst” from “Data Scientist”?

How different is “Senior Data Analyst” from “Data Scientist”? I often see Senior DA roles that seem focused on using R/Python for analysis (vs. Excel and Power BI), but don’t have any insight into the day-to-day of these roles. At the senior level, how different is Data Analyst from Data Scientist? submitted by /u/empirical-sadboy [link]…

August 18, 2025
How to Use LLMs for Powerful Automatic Evaluations

How to Use LLMs for Powerful Automatic Evaluations A beginner-friendly introduction to LLM-as-a-Judge The post How to Use LLMs for Powerful Automatic Evaluations appeared first on Towards Data Science. Eivind Kjosbakken Go to original source

August 14, 2025
How to Design Machine Learning Experiments — the Right Way

How to Design Machine Learning Experiments — the Right Way The key to successful ML projects isn’t always more resources The post How to Design Machine Learning Experiments — the Right Way appeared first on Towards Data Science. TDS Editors Go to original source

August 9, 2025
How to Write Insightful Technical Articles

How to Write Insightful Technical Articles Learn how to write informative technical articles The post How to Write Insightful Technical Articles appeared first on Towards Data Science. Eivind Kjosbakken Go to original source

August 9, 2025
How I Won the “Mostly AI” Synthetic Data Challenge

How I Won the “Mostly AI” Synthetic Data Challenge A deep dive into how post-processing can supercharge synthetic data generation The post How I Won the “Mostly AI” Synthetic Data Challenge appeared first on Towards Data Science. Daniel Gärber Go to original source

August 7, 2025
How a Research Lab Made Entirely of LLM Agents Developed Molecules That Can Block a Virus

How a Research Lab Made Entirely of LLM Agents Developed Molecules That Can Block a Virus Welcome to the 21st century by the hand of large language models and reasoning AI agents The post How a Research Lab Made Entirely of LLM Agents Developed Molecules That Can Block a Virus appeared first on Towards Data…

August 6, 2025
How Computers “See” Molecules

How Computers “See” Molecules Generative Molecular Design (Part 1): common molecular representations in data science. The post How Computers “See” Molecules appeared first on Towards Data Science. Tianyuan Zheng Go to original source

August 2, 2025
How to Benchmark LLMs – ARC AGI 3

How to Benchmark LLMs – ARC AGI 3 Learn how LLMs are benchmarked, and try out the newly released ARC AGI 3 The post How to Benchmark LLMs – ARC AGI 3 appeared first on Towards Data Science. Eivind Kjosbakken Go to original source

August 1, 2025
How Your Prompts Lead AI Astray

How Your Prompts Lead AI Astray Practical tips to recognise and avoid prompt bias. The post How Your Prompts Lead AI Astray appeared first on Towards Data Science. Daphne de Klerk Go to original source

July 30, 2025
How to Evaluate Graph Retrieval in MCP Agentic Systems

How to Evaluate Graph Retrieval in MCP Agentic Systems A framework for measuring retrieval quality in Model Context Protocol agents. The post How to Evaluate Graph Retrieval in MCP Agentic Systems appeared first on Towards Data Science. Tomaz Bratanic Go to original source

July 30, 2025
How I Fine-Tuned Granite-Vision 2B to Beat a 90B Model — Insights and Lessons Learned

How I Fine-Tuned Granite-Vision 2B to Beat a 90B Model — Insights and Lessons Learned A hands-on journey exploring fine-tuning techniques that unlock the power of small vision models. The post How I Fine-Tuned Granite-Vision 2B to Beat a 90B Model — Insights and Lessons Learned appeared first on Towards Data Science. Julio Sanchez Go…

July 26, 2025
How Do Grayscale Images Affect Visual Anomaly Detection?

How Do Grayscale Images Affect Visual Anomaly Detection? A practical exploration focusing on performance and speed The post How Do Grayscale Images Affect Visual Anomaly Detection? appeared first on Towards Data Science. Aimira Baitieva Go to original source

July 25, 2025
How Not to Mislead with Your Data-Driven Story

How Not to Mislead with Your Data-Driven Story Data storytelling can enlighten—but it can also deceive. When persuasive narratives meet biased framing, cherry-picked data, or misleading visuals, insights risk becoming illusions. This article explores the hidden biases embedded in data-driven storytelling—from the seduction of beautiful charts to the quiet influence of AI-generated insights—and offers practical…

July 24, 2025
From Rules to Relationships: How Machines Are Learning to Understand Each Other

From Rules to Relationships: How Machines Are Learning to Understand Each Other Using knowledge graphs to handle the unexpected in semantic communication The post From Rules to Relationships: How Machines Are Learning to Understand Each Other appeared first on Towards Data Science. Shireesh Kumar Singh Go to original source

July 23, 2025
How would you structure a project (data frame) to scrape and track listing changes over time?

How would you structure a project (data frame) to scrape and track listing changes over time? I’m working on a project where I want to scrape data daily (e.g., real estate listings from a site like RentFaster or Zillow) and track how each listing changes over time. I want to be able to answer questions…

July 21, 2025
How to Overlay a Heatmap on a Real Map with Python

How to Overlay a Heatmap on a Real Map with Python Visualizing historical tornado trends The post How to Overlay a Heatmap on a Real Map with Python appeared first on Towards Data Science. Lee Vaughan Go to original source

July 17, 2025
How to Ensure Reliability in LLM Applications

How to Ensure Reliability in LLM Applications Learn how to make your LLM applications more robust The post How to Ensure Reliability in LLM Applications appeared first on Towards Data Science. Eivind Kjosbakken Go to original source

July 16, 2025
How Metrics (and LLMs) Can Trick You: A Field Guide to Paradoxes

How Metrics (and LLMs) Can Trick You: A Field Guide to Paradoxes When numbers lie — and your metrics mislead you The post How Metrics (and LLMs) Can Trick You: A Field Guide to Paradoxes appeared first on Towards Data Science. Subha Ganapathi Go to original source

July 16, 2025
How much DSA for FAANG+ ?

How much DSA for FAANG+ ? Hello all, I am going to be graduating in 6 months and have been practicing Leetcode as I believe this to be my weakest point. I have solved 250 LC with 130 Easy and 120 Hard, covering concepts like arrays, hashing, binary trees, SQL, linked list, two pointers, stack,…

July 14, 2025
How do you efficiently traverse hundreds of features in the dataset?

How do you efficiently traverse hundreds of features in the dataset? Currently, working on a fintech classification algorithm, with close to a thousand features which is very tiresome. I’m not a domain expert, so creating sensible hypotheses is difficult. How do you tackle EDA and forming reasonable hypotheses in these cases? Even with proper documentation…

July 14, 2025
How to Perform Effective Data Cleaning for Machine Learning

How to Perform Effective Data Cleaning for Machine Learning Learn how you can improve your machine learning models using effective data cleaning The post How to Perform Effective Data Cleaning for Machine Learning appeared first on Towards Data Science. Eivind Kjosbakken Go to original source

July 10, 2025
How to Fine-Tune Small Language Models to Think with Reinforcement Learning

How to Fine-Tune Small Language Models to Think with Reinforcement Learning A visual tour and from-scratch guide to train GRPO reasoning models in PyTorch The post How to Fine-Tune Small Language Models to Think with Reinforcement Learning appeared first on Towards Data Science. Avishek Biswas Go to original source

July 9, 2025
How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1

How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1 From architectural design to food security. The post How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1 appeared first on Towards Data Science. Marco Hening Tallarico Go to…

July 2, 2025
From Pixels to Plots

From Pixels to Plots How I built an AI-powered prototype to turn images into insights The post From Pixels to Plots appeared first on Towards Data Science. Jens Winkelmann Go to original source

July 1, 2025
How to Train a Chatbot Using RAG and Custom Data

How to Train a Chatbot Using RAG and Custom Data Retrieval-Augmented Generation made easy with Llama The post How to Train a Chatbot Using RAG and Custom Data appeared first on Towards Data Science. Haden Pelletier Go to original source

June 26, 2025
How AI Agents “Talk” to Each Other

How AI Agents “Talk” to Each Other Minimize chaos and maintain inter-agent harmony in your projects The post How AI Agents “Talk” to Each Other appeared first on Towards Data Science. TDS Editors Go to original source

June 14, 2025
How to Transition From Data Analyst to Data Scientist

How to Transition From Data Analyst to Data Scientist Playbook on how data analysts can become data scientists The post How to Transition From Data Analyst to Data Scientist appeared first on Towards Data Science. Egor Howell Go to original source

June 10, 2025
How I Automated My Machine Learning Workflow with Just 10 Lines of Python

How I Automated My Machine Learning Workflow with Just 10 Lines of Python Use LazyPredict and PyCaret to skip the grunt work and jump straight to performance. The post How I Automated My Machine Learning Workflow with Just 10 Lines of Python appeared first on Towards Data Science. Himanshu Sharma Go to original source

June 7, 2025
LLMs + Pandas: How I Use Generative AI to Generate Pandas DataFrame Summaries

LLMs + Pandas: How I Use Generative AI to Generate Pandas DataFrame Summaries Local Large Language Models can convert massive DataFrames to presentable Markdown reports — here’s how. The post LLMs + Pandas: How I Use Generative AI to Generate Pandas DataFrame Summaries appeared first on Towards Data Science. Dario Radečić Go to original source

June 3, 2025
How Microsoft Power BI Elevated My Data Analysis and Visualization Workflow

How Microsoft Power BI Elevated My Data Analysis and Visualization Workflow Explaining useful features every data analyst needs The post How Microsoft Power BI Elevated My Data Analysis and Visualization Workflow appeared first on Towards Data Science. Benjamin Nweke Go to original source

May 28, 2025
How to Generate Synthetic Data: A Comprehensive Guide Using Bayesian Sampling and Univariate Distributions

How to Generate Synthetic Data: A Comprehensive Guide Using Bayesian Sampling and Univariate Distributions Data makes the engine run in many organisations. But what if the number of observations is too low or there is only expert knowledge? I will demonstrate how to generate synthetic data with applications in predictive maintenance. The post How to…

May 27, 2025
How to Evaluate LLMs and Algorithms — The Right Way

How to Evaluate LLMs and Algorithms — The Right Way All the hard work it takes to integrate large language models and powerful algorithms into your workflows can go to waste…

May 24, 2025
Survival Analysis When No One Dies: A Value-Based Approach

Survival Analysis When No One Dies: A Value-Based Approach Survival Analysis is a statistical approach used to answer the question: “How long will something last?” That “something” could range from a patient’s lifespan to the durability of a machine component or the duration of a user’s subscription. One of the most widely used tools in…

May 14, 2025
How I Built Business-Automating Workflows with AI Agents

How I Built Business-Automating Workflows with AI Agents AI agents and automation are no longer just a trend — they are transforming how companies operate. In a previous article, I shared several case studies of AI Agents supporting the sustainability roadmaps of small, medium and large companies. AI Agents for Sustainability — (Image by Samir Saci) This is part of a…

May 7, 2025
Why Most Cyber Risk Models Fail Before They Begin

Why Most Cyber Risk Models Fail Before They Begin Cybersecurity leaders are being asked impossible questions. “What’s the likelihood of a breach this year?” “How much would it cost?” And “how much should we spend to stop it?” Yet most risk models used today are still built on guesswork, gut instinct, and colorful heatmaps, not…

April 24, 2025
Are We Watching More Ads Than Content? Analyzing YouTube Sponsor Data

Are We Watching More Ads Than Content? Analyzing YouTube Sponsor Data I’m definitely not the only person who feels that YouTube sponsor segments have become longer and more frequent recently. Sometimes, I watch videos that seem to be trying to sell me something every couple of seconds. On one hand, it’s great that both small and…

April 4, 2025
From Fuzzy to Precise: How a Morphological Feature Extractor Enhances AI’s Recognition Capabilities

From Fuzzy to Precise: How a Morphological Feature Extractor Enhances AI’s Recognition Capabilities Introduction: Can AI really distinguish dog breeds like human experts? One day while taking a walk, I saw a fluffy white puppy and wondered, Is that a Bichon Frise or a Maltese? No matter how closely I looked, they seemed almost identical.…

March 25, 2025
What Germany Currently Is Up To, Debt-Wise

What Germany Currently Is Up To, Debt-Wise €1,600 per second. That’s how much interest Germany has to pay for its debts. In total, the German state has debts ranging into the trillions — more than a thousand billion Euros. And the government is planning to make even more, up to one trillion additional debt is…

March 22, 2025
Learning Pareto manifolds in high dimensions: How can regularization help?

Learning Pareto manifolds in high dimensions: How can regularization help? arXiv:2503.08849v1 Announce Type: new Abstract: Simultaneously addressing multiple objectives is becoming increasingly important in modern machine learning. At the same time, data is often high-dimensional and costly to label. For a single objective such as prediction risk, conventional regularization techniques are known to improve generalization…

March 13, 2025
How to Develop Complex DAX Expressions

How to Develop Complex DAX Expressions At some point or another, any Power BI developer must write complex DAX expressions to analyze data. But nobody tells you how to do it. What’s the process for doing it? What is the best way to do it, and how supportive can a development process be? These are the questions…

March 12, 2025
Experiments Illustrated: How Random Assignment Saved Us $1M in Marketing Spend

Experiments Illustrated: How Random Assignment Saved Us $1M in Marketing Spend Running cool experiments is easily one of my favorite parts of working in data science. Most experiments don’t deliver big wins, so the winners make for fun stories. We’ve had a few of these at IntelyCare, and I’m sharing each story in a way…

March 11, 2025
Experiments Illustrated: How We Optimized Premium Listings on Our Nursing Job Board

Experiments Illustrated: How We Optimized Premium Listings on Our Nursing Job Board Running experiments is a task that often falls to data scientists. If that’s you, congrats! It can be a rewarding and high-impact area of work, but also requires tools found outside the typical ML-heavy data science curriculum. Even with the best tools, only…

March 11, 2025
Generative AI Is Declarative

Generative AI Is Declarative ChatGPT launched in 2022 and kicked off the generative AI boom. In the two years since, academics, technologists, and armchair experts have written libraries’ worth of articles on the technical underpinnings of generative AI and about the potential capabilities of both current and future generative AI models. Surprisingly little has been…

March 6, 2025
How to Train LLMs to “Think” (o1 & DeepSeek-R1)

How to Train LLMs to “Think” (o1 & DeepSeek-R1) In September 2024, OpenAI released its o1 model, trained on large-scale reinforcement learning, giving it “advanced reasoning” capabilities. Unfortunately, the details of how they pulled this off were never shared publicly. Today, however, DeepSeek (an AI research lab) has replicated this reasoning behavior and published the…

March 4, 2025
Learnings from a Machine Learning Engineer — Part 4: The Model

Learnings from a Machine Learning Engineer — Part 4: The Model In this latest part of my series, I will share what I have learned on selecting a model for Image Classification and how to fine tune that model. I will also show how you can leverage the model to accelerate your labelling process, and…

February 14, 2025
How to Measure the Reliability of a Large Language Model’s Response

How to Measure the Reliability of a Large Language Model’s Response The basic principle of Large Language Models (LLMs) is very simple: to predict the next word (or token) in a sequence of words based on statistical patterns in their training data. However, this seemingly simple capability turns out to be incredibly sophisticated when it…

February 13, 2025
How Likely Is a Six Nations Grand Slam in 2025?

How Likely Is a Six Nations Grand Slam in 2025? Quantifying uncertainty in sports fixtures. For rugby fans the long wait is nearly over: like Christmas, the Six Nations comes once a year to lift our spirits in the cold winter months. If you’re not very familiar with rugby, the…

February 1, 2025
How to do Date calculations in DAX

How to do Date calculations in DAX Moving back and forth in time is a common task for Time Intelligence in DAX. Let’s take a deeper look at how DATEADD() works. Continue reading on Towards Data Science » Salvatore Cagliari Go to original source

January 29, 2025
How Cheap Mortgages Transformed Poland’s Real Estate Market

How Cheap Mortgages Transformed Poland’s Real Estate Market Insights from a synthetic control group Continue reading on Towards Data Science » Lukasz Szubelak Go to original source

January 26, 2025
How to Utilize ModernBERT and Synthetic Data for Robust Text Classification

How to Utilize ModernBERT and Synthetic Data for Robust Text Classification Learn how to fine-tune ModernBERT and create augmentations of text samples Continue reading on Towards Data Science » Eivind Kjosbakken Go to original source

January 23, 2025
Understanding the Evolution of ChatGPT: Part 3— Insights from Codex and InstructGPT

Understanding the Evolution of ChatGPT: Part 3— Insights from Codex and InstructGPT Mastering the art of fine-tuning: Learnings for training your own LLMs. (Image from Unsplash) This is the third article in our GPT series, and also the most practical one: finally, we will talk about how to effectively fine-tune LLMs. It is practical in the…

January 22, 2025
How to Use Pre-Trained Language Models for Regression

How to Use Pre-Trained Language Models for Regression Why and how to convert mT5 into a regression metric for numerical prediction Continue reading on Towards Data Science » Aden Haussmann Go to original source

January 19, 2025
How To: Forecast Time Series Using Lags

How To: Forecast Time Series Using Lags Lag columns can significantly boost your model’s performance Continue reading on Towards Data Science » Haden Pelletier Go to original source

January 15, 2025
How we matured Fisher, our A/B testing library

How we matured Fisher, our A/B testing library submitted by /u/chomoloc0 Go to original source

January 13, 2025
How to Run Jupyter Notebooks and Generate HTML Reports with Python Scripts

How to Run Jupyter Notebooks and Generate HTML Reports with Python Scripts A step-by-step guide to automating Jupyter Notebook execution and report generation using Python Continue reading on Towards Data Science » Amanda Iglesias Moreno Go to original source

January 10, 2025
How Recurrent Neural Networks (RNNs) Are Revolutionizing Decision-Making Research

How Recurrent Neural Networks (RNNs) Are Revolutionizing Decision-Making Research A deep dive into the world of computational modeling and its applications Continue reading on Towards Data Science » Kaushik Rajan Go to original source

January 8, 2025
How to Tell Among Two Regression Models with Statistical Significance

How to Tell Among Two Regression Models with Statistical Significance Diving into the F-test for nested models with algorithms, examples and code Continue reading on Towards Data Science » LucianoSphere (Luciano Abriata, PhD) Go to original source

January 4, 2025
How to Stand Out in The Data Science Job Market

How to Stand Out in The Data Science Job Market How to have the edge in your data science application Continue reading on Towards Data Science » Egor Howell Go to original source

January 3, 2025
Transforming Data into Solutions: Building a Smart App with Python and AI

Transforming Data into Solutions: Building a Smart App with Python and AI Some financial analysts worry that artificial intelligence may not justify the massive investments being made in the field. While I understand their concerns, I see things differently. I’m neither an AI Boomer nor an AI Doomer — I believe AI has the potential to drive…

January 2, 2025
Partial Dependence Plots: How to Discover Variables Influencing a Model

Partial Dependence Plots: How to Discover Variables Influencing a Model Have you ever wondered how machine learning models are constructed? ‘Explainability of machine learning models’ and ‘machine learning… Continue reading on Towards Data Science » Mythili Krishnan Go to original source

January 1, 2025
How to Ensure the Stability of a Model Using Jackknife Estimation

How to Ensure the Stability of a Model Using Jackknife Estimation How to ensure the robustness of a model and detect influential data observations Continue reading on Towards Data Science » Paula LC Go to original source

December 31, 2024
How To Start A Data Science Blog on Medium

How To Start A Data Science Blog on Medium Tips on how to get started, write your first article, and get noticed Continue reading on Towards Data Science » Haden Pelletier Go to original source

December 28, 2024
How to Clean Your Data for Your Real-Life Data Science Projects

How to Clean Your Data for Your Real-Life Data Science Projects How I treat missing values—with a quick Python Guide Continue reading on Towards Data Science » Mythili Krishnan Go to original source

December 24, 2024
How (and Where) ML Beginners Can Find Papers

How (and Where) ML Beginners Can Find Papers From conferences to surveys Continue reading on Towards Data Science » Pascal Janetzky Go to original source

December 23, 2024
How to Stand Out as a Junior Data Scientist

How to Stand Out as a Junior Data Scientist 7 things you can do to show your skills even if you have no experience at all Continue reading on Towards Data Science » Idit Cohen Go to original source

December 20, 2024
How Have Data Science Interviews Changed Over 4 Years?

How Have Data Science Interviews Changed Over 4 Years? An aggregated look at the differences between then & now: 2020 vs 2024, with some big frustrations and positive learnings. Continue reading on Towards Data Science » Matt Przybyla Go to original source

December 15, 2024
How to Apply the Central Limit Theorem to Constrained Data

How to Apply the Central Limit Theorem to Constrained Data What can we say about the mean of data distributed in an interval [a, b]? Continue reading on Towards Data Science » Ryan Burn Go to original source

December 11, 2024
How to Evaluate Multilingual LLMs With Global-MMLU

How to Evaluate Multilingual LLMs With Global-MMLU Evaluation of language-specific LLM accuracy on the global Massive Multitask Language Understanding benchmark in Python Continue reading on Towards Data Science » Dr. Leon Eversberg Go to original source

December 10, 2024
How to find freelance opportunities – what is the most typical type of project you do as a freelancer

How to find freelance opportunities – what is the most typical type of project you do as a freelancer Hi all, I have 5+ years of experience. I’m based in Europe. Lately I’m thinking of switching from full-time employee to contractor, doing freelancing and working for different companies at the same time. I think that freelancing…

December 9, 2024
Modeling DAU with Markov Chain

Modeling DAU with Markov Chain How to predict DAU using Duolingo’s growth model and control the prediction 1. Introduction Doubtlessly, DAU, WAU, and MAU — daily, weekly, and monthly active users — are critical business metrics. An article “How Duolingo reignited user growth” by Jorge Mazal, former CPO of Duolingo, is #1 in the Growth section of Lenny’s Newsletter…

December 7, 2024
How to Integrate AI and Data Science into Your Business Strategy

How to Integrate AI and Data Science into Your Business Strategy DATA SCIENCE CONSULTING Insider consulting guide to conducting a successful 2-day executive workshop Image by author using Canva “Our industry does not respect tradition — it only respects innovation.” — Satya Nadella, CEO Microsoft, Letter to employees in 2014 While not all industries are as competitive and cutthroat as the…

December 7, 2024
How to Solve a Simple Problem With Machine Learning

How to Solve a Simple Problem With Machine Learning A technical walkthrough of lesson one Continue reading on Towards Data Science » Oscar Leo Go to original source

December 2, 2024
How to Prune LLaMA 3.2 and Similar Large Language Models

How to Prune LLaMA 3.2 and Similar Large Language Models This article explores a structured pruning technique for state-of-the-art models that use a GLU architecture, enabling the creation of… Continue reading on Towards Data Science » Pere Martra Go to original source

November 28, 2024
How to Develop an Effective AI-Powered Legal Assistant

How to Develop an Effective AI-Powered Legal Assistant Create a machine-learning-based search into legal decisions Continue reading on Towards Data Science » Eivind Kjosbakken Go to original source

November 28, 2024
how does btrfs do it?

https://github.com/markfasheh/duperemove https://www.jdupes.com/ https://despairlabs.com/blog/posts/2024-10-27-openzfs-dedup-is-good-dont-use-it/

October 31, 2024