Category: deep-learning

  • The Total Derivative: Correcting the Misconception of Backpropagation’s Chain Rule

    This article uses concepts from this brilliant paper. For a deeper understanding of the mathematics, please refer to the paper. Here we try to present the math in a more intuitive and explicit way, with some important nuances highlighted. 1. Introduction. Discussions about Backpropagation often…

  • The CNN That Challenges ViT

    Introduction. The invention of ViT (Vision Transformer) led many of us to think that CNNs are obsolete. But is this really true? It is widely believed that the impressive performance of ViT comes primarily from its transformer-based architecture. However, researchers from Meta argued that this is not entirely true. If we take a closer…

  • Why Are Convolutional Neural Networks Great For Images?

    The Universal Approximation Theorem states that a neural network with a single hidden layer and a nonlinear activation function can approximate any continuous function on a compact domain to arbitrary accuracy. Practical issues aside, such as the number of neurons in this hidden layer growing enormously large, we do not need other network architectures. A simple…

  • Reinforcement Learning from One Example?

    Prompt engineering alone won’t get us to production. Fine-tuning is expensive. And reinforcement learning? That’s been reserved for well-funded labs with massive datasets until now. New research from Microsoft and academic collaborators has overturned that assumption. Using Reinforcement Learning with Verifiable Rewards (RLVR) and just a single training example, researchers…

  • When Physics Meets Finance: Using AI to Solve Black-Scholes

    DISCLAIMER: This is not financial advice. I’m a PhD in Aerospace Engineering with a strong focus on Machine Learning: I’m not a financial advisor. This article is intended solely to demonstrate the power of Physics-Informed Neural Networks (PINNs) in a financial context. When I was 16,…

  • Google’s New AI System Outperforms Physicians in Complex Diagnoses

    Imagine going to the doctor with a baffling set of symptoms. Getting the right diagnosis quickly is crucial, but sometimes even experienced physicians face challenges piecing together the puzzle. Sometimes it might not be anything serious at all; at other times, a deep investigation might be required. No…

  • Sesame Speech Model: How This Viral AI Model Generates Human-Like Speech

    Recently, Sesame AI published a demo of their latest Speech-to-Speech model. It is a conversational AI agent that is really good at speaking: it provides relevant answers, speaks with expression, and is, honestly, just very fun and interactive to play with. Note that a…

  • The Basis of Cognitive Complexity: Teaching CNNs to See Connections

    “Liberating education consists in acts of cognition, not transferrals of information.” (Paulo Freire) One of the most heated discussions around artificial intelligence is: what aspects of human learning is it capable of capturing? Many authors suggest that artificial intelligence models do not possess the same…

  • Deb8flow: Orchestrating Autonomous AI Debates with LangGraph and GPT-4o

    Introduction. I’ve always been fascinated by debates: the strategic framing, the sharp retorts, and the carefully timed comebacks. Debates aren’t just entertaining; they’re structured battles of ideas, driven by logic and evidence. Recently, I started wondering: could we replicate that dynamic using AI agents, having them debate each…

  • The Art of Noise

    Introduction. In my last several articles, I talked about generative deep learning algorithms, mostly related to text generation tasks. So I think it would be interesting to switch to generative algorithms for image generation now. We know that nowadays there are plenty of deep learning models specialized for…

  • The Case for Centralized AI Model Inference Serving

    As AI models continue to increase in scope and accuracy, even tasks once dominated by traditional algorithms are gradually being taken over by Deep Learning models. Algorithmic pipelines (workflows that take an input, process it through a series of algorithms, and produce an output) increasingly rely…

  • A Simple Implementation of the Attention Mechanism from Scratch

    Introduction. The Attention Mechanism is often associated with the transformer architecture, but it was already used in RNNs. In Machine Translation or MT (e.g., English-Italian) tasks, when you want to predict the next Italian word, you need your model to focus on, or pay attention to, the…

  • Understanding the Tech Stack Behind Generative AI

    When ChatGPT reached the one million user mark within five days and took off faster than any other technology in history, the world began to pay attention to artificial intelligence and AI applications. And so it continued apace. Since then, many…

  • The Art of Hybrid Architectures

    In my previous article, I discussed how morphological feature extractors mimic the way biological experts visually assess images. This time, I want to go a step further and explore a new question: Can different architectures complement each other to build an AI that “sees” like an expert? Introduction: Rethinking Model Architecture…

  • The Ultimate AI/ML Roadmap For Beginners

    AI is transforming the way businesses operate, and nearly every company is exploring how to leverage this technology. As a result, the demand for AI and machine learning skills has skyrocketed in recent years. With nearly four years of experience in AI/ML, I’ve decided to create the ultimate guide…

  • Image Captioning, Transformer Mode On

    Introduction. In my previous article, I discussed one of the earliest Deep Learning approaches for image captioning. If you’re interested in reading it, you can find the link to that article at the end of this one. Today, I would like to talk about Image Captioning again, but this time…

  • Deep Research by OpenAI: A Practical Test of AI-Powered Literature Review

    “Conduct a comprehensive literature review on the state-of-the-art in Machine Learning and energy consumption. […]” With this prompt, I tested the new Deep Research function, which has been integrated into the OpenAI o3 reasoning model since the end of February, and conducted a state-of-the-art literature…

  • Debugging the Dreaded NaN

    You are training your latest AI model, anxiously watching as the loss steadily decreases when suddenly, boom! Your logs are flooded with NaNs (Not a Number): your model is irreparably corrupted and you’re left staring at your screen in despair. To make matters worse, the NaNs don’t appear consistently.…

  • Breaking the Bottleneck: GPU-Optimised Video Processing for Deep Learning

    Deep Learning (DL) applications often require processing video data for tasks such as object detection, classification, and segmentation. However, conventional video processing pipelines are typically inefficient for deep learning inference, leading to performance bottlenecks. In this post, we will leverage PyTorch and FFmpeg with NVIDIA hardware acceleration…

  • Show and Tell

    Introduction. Natural Language Processing and Computer Vision used to be two completely different fields. Well, at least back when I started to learn machine learning and deep learning, I felt like there were multiple paths to follow, and each of them, including NLP and Computer Vision,…

  • Machine Learning Incidents in AdTech

    Challenges with deep learning in production. One of the biggest challenges I encountered in my career as a data scientist was migrating the core algorithms in a mobile AdTech platform from classic machine learning models to deep learning. I worked on a Demand Side Platform (DSP) for user…

  • Understanding the Evolution of ChatGPT: Part 3— Insights from Codex and InstructGPT

    Mastering the art of fine-tuning: Learnings for training your own LLMs. This is the third article in our GPT series, and also the most practical one: finally, we will talk about how to effectively fine-tune LLMs. It is practical in the…

  • Satellite Image Classification with Deep Learning — Complete Project

    A Comprehensive Guide Using PyTorch and CNNs. By Leo Anello, Towards Data Science.

  • What Would a Stoic Do? — An AI-Based Decision-Making Model

    Using AI to build Marcus Aurelius’ reincarnation. By Pol Marin, Towards Data Science.

  • A Visual Understanding of Neural Networks

    The math behind neural networks visually explained. By Reza Bagheri, Towards Data Science.

  • Sentiment Analysis with Transformers: A Complete Deep Learning Project — PT. I

    Master Fine-Tuning Transformers, Comparing Deep Learning Architectures, and Deploying Sentiment Analysis Models. By Leo Anello, Towards Data Science.

  • How Recurrent Neural Networks (RNNs) Are Revolutionizing Decision-Making Research

    A deep dive into the world of computational modeling and its applications. By Kaushik Rajan, Towards Data Science.

  • Understanding the Evolution of ChatGPT: Part 1—An In-Depth Look at GPT-1 and What Inspired It

    Tracing the roots of ChatGPT: GPT-1, the foundation of OpenAI’s LLMs. The GPT (Generative Pre-Training) model family, first introduced by OpenAI in 2018, is another important application of the Transformer architecture. It has since evolved through versions like…

  • Mastering Sensor Fusion: Color Image Obstacle Detection with KITTI Data — Part 2

    How to use color image data for object detection in the context of obstacle detection. Sensor fusion is a decision-making mechanism that can be applied to different problems and using different…

  • Mastering Model Uncertainty: Thresholding Techniques in Deep Learning

    A few words on thresholding, the softmax activation function, introducing an extra label, and considerations regarding output activation functions. In many real-world applications, machine learning models are not designed to make decisions in an all-or-nothing manner. Instead, there are situations where it is more…

  • Conditional Variational Autoencoders for Text to Image Generation

    Investigating an early generative architecture and applying it to image generation from text input. Recently I was tasked with text-to-image synthesis using a conditional variational autoencoder (CVAE). Being one of the earlier generative structures, it has its limitations but is easily implementable. This article will cover CVAEs at…

  • 100 Years of (eXplainable) AI

    Reflecting on advances and challenges in deep learning and explainability in the ever-evolving era of LLMs and AI governance. Background. Imagine you are navigating a self-driving car, relying entirely on its onboard computer to make split-second decisions. It detects objects, identifies pedestrians, and can even anticipate the behavior of…

  • How to Use Structured Generation for LLM-as-a-Judge Evaluations

    Structured generation is fundamental to building complex, multi-step reasoning agents in LLM evaluations, especially for open source models. Disclosure: I am a maintainer of Opik, one of the open source projects used later in this article. For the past few months, I’ve been working on LLM-based…

  • Reinforcement Learning: Self-Driving Cars to Self-Driving Labs

    Understanding AI applications in bio for machine learning engineers. Anyone who has tried teaching a dog new tricks knows the basics of reinforcement learning. We can modify the dog’s behavior by repeatedly offering rewards for obedience and punishments for misbehavior. In reinforcement learning…

  • Neuromorphic Computing — an Edgier, Greener AI

    Why computer hardware and AI algorithms are being reinvented using inspiration from the brain. Neuromorphic Computing might not just help bring AI to the edge, but also reduce carbon emissions at data centers. There are periodic proclamations of the coming neuromorphic computing…

  • NLP Illustrated, Part 2: Word Embeddings

    An illustrated and intuitive guide to word embeddings. By Shreya Rao, Towards Data Science.