Category: deep-dives

  • ➡️ Start Asking Your Data ‘Why?’ — A Gentle Intro To Causality

    Correlation does not imply causation. It turns out, however, that with some simple ingenious tricks one can, potentially, unveil causal relationships within standard observational data, without having to resort to expensive randomised control trials. This post is targeted towards anyone making data driven…

  • The Gamma Hurdle Distribution

    Which Outcome Matters? Here is a common scenario: an A/B test was conducted in which a random sample of units (e.g. customers) was selected for a campaign and received Treatment A. Another sample was selected to receive Treatment B. “A” could be a communication or offer and “B” could be…

  • A Visual Guide to How Diffusion Models Work

    This article is aimed at those who want to understand exactly how Diffusion Models work, with no prior knowledge expected. I’ve tried to use illustrations wherever possible to provide visual intuitions on each part of these models. I’ve kept mathematical notation and equations to a minimum, and where…

  • Injecting domain expertise into your AI system

    How to connect the dots between AI technology and real life. When starting their AI initiatives, many companies are trapped in silos and treat AI as a purely technical enterprise, sidelining domain experts or involving them too late. They end up with generic AI applications that miss…

  • Inequality in Practice: E-commerce Portfolio Analysis

    From Mathematical Theory to Actionable Insights: A 6-Year Shopify Case Study. Are your top-selling products making or breaking your business? It’s terrifying to think your entire revenue might collapse if one or two products fall out…

  • Analyze Tornado Data with Python and GeoPandas

    Insights from NOAA’s public domain database. Lee Vaughan, Towards Data Science.

  • Why Generative-AI Apps’ Quality Often Sucks and What to Do About It

    How to get from PoCs to tested high-quality applications in production. The generative AI hype has rolled through the business world in the past two years. This technology can make business process executions more efficient,…

  • Where to Start When Data is Limited

    A launch pad for projects with small datasets. Machine Learning (ML) has driven remarkable breakthroughs in computer vision, natural language processing, and speech recognition, largely due to the abundance of data in these fields. However, many challenges, especially those tied to specific product features or…

  • The AI (R)Evolution, Looking From 2024 Into the Immediate Future

    Witnessing rapid innovation, fierce competition, and transformative tools for life, work, and human development. LucianoSphere (Luciano Abriata, PhD), Towards Data Science.

  • A Visual Understanding of Neural Networks

    The math behind neural networks visually explained. Reza Bagheri, Towards Data Science.

  • GDD: Generative Driven Design

    Reflective generative AI software components as a development paradigm. Nowhere has the proliferation of generative AI tooling been more aggressive than in the world of software development. It began with GitHub Copilot’s supercharged autocomplete, then exploded into direct code-along integrated tools like Aider and Cursor that allow software engineers to dictate…

  • Top 12 Skills Data Scientists Need to Succeed in 2025

    It’s (not) all about LLMs and AI tools. Benjamin Bodner, Towards Data Science.

  • A Bird’s-Eye View of Linear Algebra: Orthonormal Matrices

    Orthonormal matrices: the most elegant matrices in all of linear algebra. Rohit Pandey, Towards Data Science.

  • 100 Years of (eXplainable) AI

    Reflecting on advances and challenges in deep learning and explainability in the ever-evolving era of LLMs and AI governance. Background: Imagine you are navigating a self-driving car, relying entirely on its onboard computer to make split-second decisions. It detects objects, identifies pedestrians, and can even anticipate the behavior of…

  • The Anatomy of an Autonomous Agent

    A blueprint for autonomous agents in an Agentic Mesh ecosystem. Eric Broda, Towards Data Science.

  • Transformers Key-Value (KV) Caching Explained

    Speed up your LLM inference. Michał Oleszak, Towards Data Science.

  • Missing Data in Time-Series: Machine Learning Techniques

    Part 1: Leverage linear regression and decision trees to impute time-series gaps. Sara Nóbrega, Towards Data Science.

  • Bridging the Data Literacy Gap

    The Advent, Evolution, and Current State of “Data Translators”. Introduction: With data being constantly glorified as the most valuable asset organizations can own, leaders and decision-makers are always looking for effective ways to put their data insights to use. Every time customers interact with digital products, millions of data points…

  • Introducing Univariate Exemplar Recommenders: how to profile Customer Behavior in a single vector

    Customer Profiling: surveying and improving the current methodologies for customer profiling. To understand this article, knowledge of embeddings, clustering, and recommendation systems is required. The implementation of this algorithm has been released on GitHub and is fully open-source. I am open to…

  • The Name That Broke ChatGPT: Who is David Mayer?

    AI, privacy, human bias, prompting, the future of content, and how to hack a chatbot. Cassie Kozyrkov, Towards Data Science.

  • The Intuition behind Concordance Index — Survival Analysis

    Ranking accuracy versus absolute accuracy. How long would you keep your gym membership before you decide to cancel it? Or Netflix, if you are a series…

  • Mistral 7B Explained: Towards More Efficient Language Models

    RMS Norm, RoPE, GQA, SWA, KV Cache, and more! Part 5 in the “LLMs from Scratch” series, a complete guide to understanding and building Large Language Models. If you are interested in learning more about how these models work I encourage you to read: Part 1: Tokenization — A Complete Guide Part 2:…