Category: machine-learning

  • An Unbiased Review of Snowflake’s Document AI

    As data professionals, we’re comfortable with tabular data… [Image: tabular data, by the author] We can also handle words, JSON, XML feeds, and pictures of cats. But what about a cardboard box full of things like this? [Image by Annie Spratt, Unsplash] The info on this receipt wants so…

  • Plotly’s AI Tools Are Redefining Data Science Workflows 

    Is there anything more frustrating than building a powerful data model but then struggling to turn it into a tool stakeholders can use to achieve their desired outcome? Data science has never been short on potential, but it has never been short on complexity either. You can refine algorithms…

  • Sesame Speech Model: How This Viral AI Model Generates Human-Like Speech

    Recently, Sesame AI published a demo of their latest Speech-to-Speech model: a conversational AI agent that is really good at speaking. It provides relevant answers, speaks with expression, and, honestly, is just very fun and interactive to play with. Note that a…

  • Learnings from a Machine Learning Engineer — Part 6: The Human Side

    In my previous articles, I have spent a lot of time talking about the technical aspects of an Image Classification problem: data collection, model evaluation, performance optimization, and a detailed look at model training. These elements require a certain degree of in-depth expertise, and they (usually) have well-defined…

  • The Basis of Cognitive Complexity: Teaching CNNs to See Connections

    “Liberating education consists in acts of cognition, not transferrals of information.” (Paulo Freire) One of the most heated discussions around artificial intelligence concerns which aspects of human learning it is capable of capturing. Many authors suggest that artificial intelligence models do not possess the same…

  • The Invisible Revolution: How Vectors Are (Re)defining Business Success

    In an increasingly data-driven world, business leaders must understand vector thinking. At first, vectors may appear as complicated as algebra was in school, but they serve as a fundamental building block. Vectors are as essential as algebra for tasks like sharing a bill…

  • How to Measure Real Model Accuracy When Labels Are Noisy

    Ground truth is never perfect. From scientific measurements to human annotations used to train deep learning models, ground truth always contains some errors. ImageNet, arguably the best-curated image dataset, has 0.3% errors in its human annotations. So how can we evaluate predictive models…

  • Ivory Tower Notes: The Problem

    Have you ever spent months on a machine learning project, only to discover you never defined the “correct” problem at the start? If so, or even if not, and you are only starting out in the data science or AI field, welcome to my first Ivory Tower Note, where I will address…

  • Why CatBoost Works So Well: The Engineering Behind the Magic

    Gradient boosting is a cornerstone technique for modeling tabular data due to its speed and simplicity. It delivers great results without any fuss. Look around and you’ll see multiple options, such as LightGBM and XGBoost. CatBoost is one such variant. In this post, we will…

  • Mining Rules from Data

    When working on products, we might need to introduce some “rules”. Let me explain what I mean by “rules” with practical examples: Imagine that we’re seeing a massive wave of fraud in our product, and we want to restrict onboarding for a particular segment of customers to lower this risk. For…

  • A Data Scientist’s Guide to Docker Containers

    For an ML model to be useful, it needs to run somewhere. This somewhere is most likely not your local machine. A not-so-good model that runs in a production environment is better than a perfect model that never leaves your local machine. However, the production machine is usually…

  • Circuit Tracing: A Step Closer to Understanding Large Language Models

    Over the years, Transformer-based large language models (LLMs) have made substantial progress across a wide range of tasks, evolving from simple information retrieval systems to sophisticated agents capable of coding, writing, conducting research, and much more. But despite their capabilities, these models are still largely…

  • How I Would Learn To Code (If I Could Start Over)

    According to various sources, the average salary for coding jobs is ~£47.5k in the UK, which is ~35% higher than the median salary of about £35k. So, coding is a very valuable skill that will earn you more money, not to mention it’s really fun.…

  • Kernel Case Study: Flash Attention

    The attention mechanism is at the core of modern-day transformers. But scaling the context window of these transformers was a major challenge, and it still is, even though we are in the era of million-plus-token context windows (Qwen 2.5 [1]). There are both considerable compute and memory…

  • The Art of Noise

    In my last several articles, I talked about generative deep learning algorithms, mostly related to text generation tasks. So I think it would be interesting to switch to generative algorithms for image generation now. We know that nowadays there are plenty of deep learning models specialized for…

  • The Case for Centralized AI Model Inference Serving

    As AI models continue to increase in scope and accuracy, even tasks once dominated by traditional algorithms are gradually being taken over by Deep Learning models. Algorithmic pipelines (workflows that take an input, process it through a series of algorithms, and produce an output) increasingly rely…

  • A Simple Implementation of the Attention Mechanism from Scratch

    The attention mechanism is often associated with the transformer architecture, but it was already used in RNNs. In machine translation (MT) tasks (e.g., English-Italian), when you want to predict the next Italian word, you need your model to focus, or pay attention, on the…

  • My Learning to Be Hired Again After a Year… Part 2

    This is the second part of “My learning to being hired again after a year… Part I”. Hard to believe, but it’s been a full year since I published the first part on TDS. And in that time, something beautiful happened. Every so often,…

  • The Art of Hybrid Architectures

    In my previous article, I discussed how morphological feature extractors mimic the way biological experts visually assess images. This time, I want to go a step further and explore a new question: Can different architectures complement each other to build an AI that “sees” like an expert? Introduction: Rethinking Model Architecture…

  • A Little More Conversation, A Little Less Action — A Case Against Premature Data Integration

    When I talk to [large] organisations that have not yet properly started with Data Science (DS) and Machine Learning (ML), they often tell me that they have to run a data integration project first, because “…all the data is scattered…

  • From Physics to Probability: Hamiltonian Mechanics for Generative Modeling and MCMC

    [Image: phase space of a nonlinear pendulum, by the author] Hamiltonian mechanics is a way to describe how physical systems, like planets or pendulums, move over time, focusing on energy rather than just forces. By reframing complex dynamics through the lens of energy, this 19th-century physics…

  • Uncertainty Quantification in Machine Learning with an Easy Python Interface

    Uncertainty quantification (UQ) in a Machine Learning (ML) model allows one to estimate the precision of its predictions. This is extremely important for using its predictions in real-world tasks. For instance, if a machine learning model is trained to predict a property of a material,…

  • Attractors in Neural Network Circuits: Beauty and Chaos

    [Image: the state space of the first two neuron activations over time follows an attractor] What do memories, oscillating chemical reactions, and double pendulums have in common? All these systems have a basin of attraction for possible states, like a magnet that draws the system towards certain…

  • The Ultimate AI/ML Roadmap For Beginners

    AI is transforming the way businesses operate, and nearly every company is exploring how to leverage this technology. As a result, the demand for AI and machine learning skills has skyrocketed in recent years. With nearly four years of experience in AI/ML, I’ve decided to create the ultimate guide…

  • Testing the Power of Multimodal AI Systems in Reading and Interpreting Photographs, Maps, Charts and More

    It’s hardly news that artificial intelligence has made huge strides in recent years, particularly with the advent of multimodal models that can process and create both text and images, and some very new ones that also process and produce…

  • A Clear Intro to MCP (Model Context Protocol) with Code Examples

    As the race to move AI agents from prototype to production heats up, the need for a standardized way for agents to call tools across different providers is pressing. This transition to a standardized approach to agent tool calling is similar to what we…

  • Evolving Product Operating Models in the Age of AI

    In a previous article on organizing for AI, we looked at how the interplay between three key dimensions (ownership of outcomes, outsourcing of staff, and the geographical proximity of team members) can yield a variety of organizational archetypes for implementing strategic AI initiatives,…

  • Google’s Data Science Agent: Can It Really Do Your Job?

    On March 3rd, Google officially rolled out its Data Science Agent to most Colab users for free. This is not brand new: it was first announced in December last year, but it is now integrated into Colab and made widely accessible. Google says…

  • R.E.D.: Scaling Text Classification with Expert Delegation

    With the new age of problem-solving augmented by Large Language Models (LLMs), only a handful of problems remain that have subpar solutions. Most classification problems (at a PoC level) can be solved by leveraging LLMs at 70–90% Precision/F1 with just good prompt engineering techniques, as well as adaptive…

  • Algorithm Protection in the Context of Federated Learning 

    At the biotech company where I work, we aim to advance ML & AI algorithms to enable, for example, brain lesion segmentation to be executed at the hospital/clinic where patient data resides, so that it is processed securely. This, in essence, is guaranteed by federated…

  • Mastering Prompt Engineering with Functional Testing: A Systematic Guide to Reliable LLM Outputs 

    Creating efficient prompts for large language models often starts as a simple task… but it doesn’t always stay that way. Initially, following basic best practices seems sufficient: adopt the persona of a specialist, write clear instructions, require a specific response format, and…

  • Essential Review Papers on Physics-Informed Neural Networks: A Curated Guide for Practitioners

    Staying on top of a fast-growing research field is never easy. I face this challenge firsthand as a practitioner in Physics-Informed Neural Networks (PINNs). New papers, be they algorithmic advancements or cutting-edge applications, are published at an accelerating pace by both academia and…

  • Custom Training Pipeline for Object Detection Models

    What if you want to write the whole object detection training pipeline from scratch, so you can understand each step and be able to customize it? That’s what I set out to do. I examined several well-known object detection pipelines and designed one that best suits my needs…

  • Image Captioning, Transformer Mode On

    In my previous article, I discussed one of the earliest Deep Learning approaches for image captioning. If you’re interested in reading it, you can find the link to that article at the end of this one. Today, I would like to talk about Image Captioning again, but this time…

  • How to Spot and Prevent Model Drift Before it Impacts Your Business

    Despite the AI hype, many tech companies still rely heavily on machine learning to power critical applications, from personalized recommendations to fraud detection. I’ve seen firsthand how undetected drift can result in significant costs: missed fraud detection, lost revenue, and suboptimal business…

  • Overcome Failing Document Ingestion & RAG Strategies with Agentic Knowledge Distillation

    Many generative AI use cases still revolve around Retrieval Augmented Generation (RAG), yet these systems consistently fall short of user expectations. Despite the growing body of research on RAG improvements and even adding Agents into the process, many solutions still fail to return exhaustive results,…

  • Generative AI Is Declarative

    ChatGPT launched in 2022 and kicked off the generative AI boom. In the two years since, academics, technologists, and armchair experts have written libraries’ worth of articles on the technical underpinnings of generative AI and about the potential capabilities of both current and future generative AI models. Surprisingly little has been…

  • How to Train LLMs to “Think” (o1 & DeepSeek-R1)

    In September 2024, OpenAI released its o1 model, trained with large-scale reinforcement learning, giving it “advanced reasoning” capabilities. Unfortunately, the details of how they pulled this off were never shared publicly. Today, however, DeepSeek (an AI research lab) has replicated this reasoning behavior and published the…

  • LLM + RAG: Creating an AI-Powered File Reader Assistant

    AI is everywhere. It is hard not to interact at least once a day with a Large Language Model (LLM). The chatbots are here to stay. They’re in your apps, they help you write better, they compose emails, they read emails… well, they do a lot.…

  • Avoidable and Unavoidable Randomness in GPT-4o

    Of course there is randomness in GPT-4o’s outputs. After all, the model samples from a probability distribution when choosing each token. But what I didn’t understand was that those very probabilities themselves are not deterministic. Even with consistent prompts, fixed seeds, and temperature set to zero, GPT-4o still introduces…

  • Vision Transformers (ViT) Explained: Are They Better Than CNNs?

    Ever since the introduction of the self-attention mechanism, Transformers have been the top choice when it comes to Natural Language Processing (NLP) tasks. Self-attention-based models are highly parallelizable and require substantially fewer parameters, making them much more computationally efficient, less prone to overfitting, and…

  • Unraveling Large Language Model Hallucinations

    In a YouTube video titled Deep Dive into LLMs like ChatGPT, Andrej Karpathy, former Senior Director of AI at Tesla, discusses the psychology of Large Language Models (LLMs) as emergent cognitive effects of the training pipeline. This article is inspired by his explanation of LLM hallucinations and the information presented in the…

  • Debugging the Dreaded NaN

    You are training your latest AI model, anxiously watching as the loss steadily decreases, when suddenly, boom! Your logs are flooded with NaNs (Not a Number), your model is irreparably corrupted, and you’re left staring at your screen in despair. To make matters worse, the NaNs don’t appear consistently.…

  • How LLMs Work: Reinforcement Learning, RLHF, DeepSeek R1, OpenAI o1, AlphaGo

    Welcome to part 2 of my LLM deep dive. If you haven’t read Part 1, I highly encourage you to check it out first. Previously, we covered the first two major stages of training an LLM: pre-training, which learns from massive datasets to form a base…

  • LLaDA: The Diffusion Model That Could Redefine Language Generation

    What if we could make language models think more like humans? Instead of writing one word at a time, what if they could sketch out their thoughts first and gradually refine them? This is exactly what Large Language Diffusion Models (LLaDA) introduce: a different approach to…

  • Enhancing RAG: Beyond Vanilla Approaches

    Retrieval-Augmented Generation (RAG) is a powerful technique that enhances language models by incorporating external information retrieval mechanisms. While standard RAG implementations improve response relevance, they often struggle in complex retrieval scenarios. This article explores the limitations of a vanilla RAG setup and introduces advanced techniques to enhance its accuracy and…

  • 6 Common LLM Customization Strategies Briefly Explained

    Why Customize LLMs? Large Language Models (LLMs) are deep learning models pre-trained with self-supervised learning; they require vast resources in training data and training time, and they hold a large number of parameters. LLMs have revolutionized natural language processing, especially in the last two years, demonstrating remarkable…

  • Reinforcement Learning with PDEs

    Previously, we discussed applying reinforcement learning to Ordinary Differential Equations (ODEs) by integrating ODEs within gymnasium. ODEs are a powerful tool that can describe a wide range of systems, but they are limited to a single independent variable. Partial Differential Equations (PDEs) are differential equations involving derivatives with respect to multiple variables that can cover…

  • Multimodal Search Engine Agents Powered by BLIP-2 and Gemini

    This post was co-authored with Rafael Guedes. Traditional models can only process a single type of data, such as text, images, or tabular data. Multimodality is a trending concept in the AI research community, referring to a model’s ability to learn from multiple types of…

  • Formulation of Feature Circuits with Sparse Autoencoders in LLM

    Large language models (LLMs) have made impressive progress, and these models can perform a variety of tasks, from generating human-like text to answering questions. However, understanding how these models work remains challenging, especially due to a phenomenon called superposition, where features are mixed into one…

  • How to Fine-Tune DistilBERT for Emotion Classification

    At every company I’ve worked at, the customer support team was drowning in an overwhelming volume of customer inquiries. Have you had similar experiences? What if I told you that you could use AI to automatically identify, categorize, and even resolve the most common issues? By fine-tuning a…

  • Learning How to Play Atari Games Through Deep Neural Networks

    In July 1959, Arthur Samuel developed one of the first agents to play the game of checkers. What constitutes an agent that plays checkers can be best described in Samuel’s own words, “…a computer [that] can be programmed so that it will learn to play…

  • How LLMs Work: Pre-Training to Post-Training, Neural Networks, Hallucinations, and Inference

    With the recent explosion of interest in large language models (LLMs), they often seem almost magical. But let’s demystify them. I wanted to step back and unpack the fundamentals, breaking down how LLMs are built, trained, and fine-tuned to become the AI systems we interact…

  • On-Device Machine Learning in Spatial Computing

    The landscape of computing is undergoing a profound transformation with the emergence of spatial computing platforms (VR and AR). As we step into this new era, the intersection of virtual reality, augmented reality, and on-device machine learning presents unprecedented opportunities for developers to create experiences that seamlessly blend digital content…

  • How I Became A Machine Learning Engineer (No CS Degree, No Bootcamp)

    Machine learning and AI are among the most popular topics nowadays, especially within the tech space. I am fortunate enough to work and develop with these technologies every day as a machine learning engineer! In this article, I will walk you through my…

  • Roadmap to Becoming a Data Scientist, Part 4: Advanced Machine Learning

    Data science is undoubtedly one of the most fascinating fields today. Following significant breakthroughs in machine learning about a decade ago, data science has surged in popularity within the tech community. Each year, we witness increasingly powerful tools that once seemed unimaginable. Innovations such as the Transformer…

  • Learnings from a Machine Learning Engineer — Part 5: The Training

    In this fifth part of my series, I will outline the steps for creating a Docker container for training your image classification model, evaluating performance, and preparing for deployment. AI/ML engineers would prefer to focus on model training and data engineering, but the reality…

  • Learnings from a Machine Learning Engineer — Part 3: The Evaluation

    In this third part of my series, I will explore the evaluation process, a critical piece that will lead to a cleaner dataset and elevate your model performance. We will see the difference between evaluation of a trained model (one not yet in…

  • Learnings from a Machine Learning Engineer — Part 1: The Data

    It is said that in order for a machine learning model to be successful, you need to have good data. While this is true (and pretty much obvious), it is extremely difficult to define, build, and sustain good data. Let me share with you…

  • Learnings from a Machine Learning Engineer — Part 4: The Model

    In this latest part of my series, I will share what I have learned on selecting a model for Image Classification and how to fine-tune that model. I will also show how you can leverage the model to accelerate your labelling process, and…

  • Should Data Scientists Care About Quantum Computing?

    I am sure the quantum hype has reached every person in tech (and, most probably, outside it), along with some over-the-top claims, like “some company has proved quantum supremacy,” “the quantum revolution is here,” or my favorite, “quantum computers are here, and it will make classical computers obsolete.” I…

  • How to Measure the Reliability of a Large Language Model’s Response

    The basic principle of Large Language Models (LLMs) is very simple: to predict the next word (or token) in a sequence of words based on statistical patterns in their training data. However, this seemingly simple capability turns out to be incredibly sophisticated when it…

  • Build a Decision Tree in Polars from Scratch

    Decision Tree algorithms have always fascinated me. They are easy to implement and achieve good results on various classification and regression tasks. Combined with boosting, decision trees are still state-of-the-art in many applications. Frameworks such as sklearn, LightGBM, XGBoost, and CatBoost have done a very good job…

  • Understanding Model Calibration: A Gentle Introduction & Visual Exploration

    How Reliable Are Your Predictions? To be considered reliable, a model must be calibrated so that its confidence in each decision closely reflects its true outcome. In this blog post we’ll take a look at the most commonly used definition for calibration and then dive…

  • Six Ways to Control Style and Content in Diffusion Models

    Stable Diffusion 1.5/2.0/2.1/XL 1.0, DALL-E, Imagen… In recent years, Diffusion Models have showcased stunning quality in image generation. However, while they produce great quality on generic concepts, they struggle to generate high quality for more specialised queries, for example generating images in a specific style,…

  • I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms

    Recently, DeepSeek announced their latest model, R1, and article after article came out praising its performance relative to cost, and how the release of such open-source models could genuinely change the course of LLMs forever. That is really exciting! And also, too…

  • Synthetic Data Generation with LLMs

    Over the past two years, while working with financial firms, I’ve observed firsthand how they identify and prioritize Generative AI use cases, balancing complexity with potential value. Retrieval-Augmented Generation (RAG) often stands out as a foundational capability across many LLM-driven solutions, striking a balance between ease of implementation…

  • The Method of Moments Estimator for Gaussian Mixture Models

    Audio processing is one of the most important application domains of digital signal processing (DSP) and machine learning. Modeling acoustic environments is an essential step in developing digital audio processing systems such as speech recognition, speech enhancement, and acoustic echo cancellation. Acoustic environments are filled with background…

  • Efficient Metric Collection in PyTorch: Avoiding the Performance Pitfalls of TorchMetrics

    Metric collection is an essential part of every machine learning project, enabling us to track model performance and monitor training progress. Ideally, metrics should be collected and computed without introducing any additional overhead to the training process. However, just like other components of the…

  • A Visual Guide to How Diffusion Models Work

    This article is aimed at those who want to understand exactly how Diffusion Models work, with no prior knowledge expected. I’ve tried to use illustrations wherever possible to provide visual intuitions on each part of these models. I’ve kept mathematical notation and equations to a minimum, and where…

  • Training Large Language Models: From TRPO to GRPO

    DeepSeek has recently created quite a buzz in the AI community, thanks to its impressive performance at relatively low costs. I think this is a perfect opportunity to dive deeper into how Large Language Models (LLMs) are trained. In this article, we will focus on the Reinforcement Learning…

  • ML Feature Management: A Practical Evolution Guide

    In the world of machine learning, we obsess over model architectures, training pipelines, and hyper-parameter tuning, yet often overlook a fundamental aspect: how our features live and breathe throughout their lifecycle. From in-memory calculations that vanish after each prediction to the challenge of reproducing exact feature values months…

  • How to Get Promoted as a Data Scientist

    [Image generated using Grok 2] I have been working as a Data Scientist since 2017, and during that time I have been promoted from junior/mid-level to senior, and most recently to Lead Data Scientist. There is a lot of content online regarding…

  • 5 Essential Tips Learned from My Data Science Journey

    Personal reflections on my 10-year data odyssey (Federico Rucci, Towards Data Science)

  • How to Make a Data Science Portfolio That Stands Out

    Create a data science portfolio with Cloudflare and Hugo (Egor Howell, Towards Data Science)

  • Are Data Scientists at Risk in 2025?

    The impact of AI on data science jobs (Natassha Selvaraj, Towards Data Science)

  • Fine-tuning Multimodal Embedding Models

    Adapting CLIP to YouTube Data (with Python Code). This is the 4th article in a larger series on multimodal AI. In the previous post, we discussed multimodal RAG systems, which can retrieve and synthesize information from different data modalities (e.g. text, images, audio). There, we saw how we could implement such a…

  • How Likely Is a Six Nations Grand Slam in 2025?

    Quantifying uncertainty in sports fixtures. Photo by Thomas Serer on Unsplash. For rugby fans, the long wait is nearly over: like Christmas, the Six Nations comes once a year to lift our spirits in the cold winter months. If you’re not very familiar with rugby, the…

  • Can Machines Dream? On the Creativity of Large Language Models

    Exploring the Role of Hallucinations, Dependencies, and Imagination in AI Creativity. Published on Towards Data Science by Salvatore Raieli.

  • 2-Bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy

    Very accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU. Published on Towards Data Science by Benjamin Marie.

  • Data Pruning MNIST: How I Hit 99% Accuracy Using Half the Data

    How much data does AI really need? TL;DR: Data-centric AI can create more efficient and accurate models. I experimented with data pruning on MNIST¹ to classify handwritten digits. Best runs for “furthest-from-centroid” selection compared to the full dataset. Image by author. What if I told you…

  • Actually, Being a Data Scientist is Awesome

    Don’t let the doom and gloom get to you. Published on Towards Data Science by Marina Wyss – Gratitude Driven.

  • Great Books for AI Engineering

    10 books with valuable insights about AI science and engineering. Great books for AI Engineering, plus ‘Brave New Words’ (image is the author’s own work). A few years ago I recommended 21 books in Great Books for Data Science and Great Books for Data Science 2. Since then, a lot has changed. While…

  • AI Ethics for the Everyday User — Why Should You Care?

    A beginner’s guide to understanding the importance of ethics in artificial intelligence. Published on Towards Data Science by Murtaza Ali.

  • NLP Illustrated, Part 3: Word2Vec

    An exhaustive and illustrated guide to Word2Vec with code! Published on Towards Data Science by Shreya Rao.

  • The Challenges and Realities of Being a Data Scientist

    Some harsh truths behind the field of data science. Published on Towards Data Science by Egor Howell.

  • Machine Learning Incidents in AdTech

    (Image source: https://unsplash.com/photos/a-couple-of-signs-that-are-on-a-fence-xXbQIrWH2_A) Challenges with deep learning in production. One of the biggest challenges I encountered in my career as a data scientist was migrating the core algorithms in a mobile AdTech platform from classic machine learning models to deep learning. I worked on a Demand Side Platform (DSP) for user…

  • Basics of Probability Notations

    Union, Intersection, Independence, Disjoint, Complement: Advanced Probability for Data Science Series (1). Published on Towards Data Science by Sunghyun Ahn.

  • Building a Regression Model: Delivery Duration Prediction

    Building a Regression Model to Predict Delivery Durations: A Practical Guide. An end-to-end (E2E) walkthrough for approaching a regression modeling task. In this article, we’re going to walk through the process of building a regression model, from dataset cleaning and preparation to model training and evaluation. The specific regression task we will…

  • Beyond Causal Language Modeling

    A deep dive into “Not All Tokens Are What You Need for Pretraining”. A few days ago, I had the chance to present at a local reading group focused on some of the most exciting and insightful papers from NeurIPS 2024. As a presenter, I selected a paper titled…

  • Your Neural Network Can’t Explain This. TMLE to the Rescue!

    Targeted Maximum Likelihood Estimation (TMLE) helps you explain patterns where other techniques fall short. Published on Towards Data Science by Ari Joury, PhD.

  • How Cheap Mortgages Transformed Poland’s Real Estate Market

    Insights from a synthetic control group. Published on Towards Data Science by Lukasz Szubelak.

  • Choosing Classification Model Evaluation Criteria

    Is Recall/Precision better than Sensitivity/Specificity? Published on Towards Data Science by Viyaleta Apgar.

  • Deep Learning for Click Prediction in Mobile AdTech

    (Image source: https://pixabay.com/illustrations/rays-stars-light-explosion-galaxy-9350519/) Machine Learning for Real-Time Bidding. The past few years have been a revolution for the mobile advertising and gaming industries, with the broad adoption of neural networks for advertising tasks, including click prediction. This migration occurred before the success of Large Language Models (LLMs) and…

  • Multi-Headed Cross Attention — By Hand

    Hand computing a fundamental component of multimodal models. Published on Towards Data Science by Daniel Warfield.

  • Avoid These Easily Missed Mistakes in Machine Learning Workflows — Part 2

    Using Unavailable Data at Prediction Time and Mixing Magic Numbers with Real Numbers. Published on Towards Data Science by Thomas A Dorfer.

  • A Derivation and Application of Restricted Boltzmann Machines (2024 Nobel Prize)

    Investigating Geoffrey Hinton’s Nobel Prize-winning work and building it from scratch using PyTorch. One recipient of the 2024 Nobel Prize in Physics was Geoffrey Hinton, for his contributions to the field of AI and machine learning. A lot of people know he worked on neural…

  • Topic Modelling in Business Intelligence: FASTopic and BERTopic in Code

    A comparison of two cutting-edge dynamic topic models applied to a consumer complaints classification exercise. Published on Towards Data Science by Petr Korab.

  • How to Utilize ModernBERT and Synthetic Data for Robust Text Classification

    Learn how to fine-tune ModernBERT and create augmentations of text samples. Published on Towards Data Science by Eivind Kjosbakken.

  • Large Language Models: A Short Introduction

    And why you should care about LLMs. Image by author. There’s an acronym you’ve probably heard non-stop for the past few years: LLM, which stands for Large Language Model. In this article, we’re going to take a brief look at what LLMs are, why they’re an extremely exciting piece of technology, why…