Category: Reinforcement Learning

How to Evaluate LLMs and Algorithms — The Right Way

All the hard work it takes to integrate large language models and powerful algorithms into your workflows can go to waste…

May 24, 2025
Reinforcement Learning from One Example?

Prompt engineering alone won’t get us to production. Fine-tuning is expensive. And reinforcement learning? That’s been reserved for well-funded labs with massive datasets until now. New research from Microsoft and academic collaborators has overturned that assumption. Using Reinforcement Learning with Verifiable Rewards (RLVR) and just a single training example, researchers…

May 1, 2025
How to Train LLMs to “Think” (o1 & DeepSeek-R1)

In September 2024, OpenAI released its o1 model, trained on large-scale reinforcement learning, giving it “advanced reasoning” capabilities. Unfortunately, the details of how they pulled this off were never shared publicly. Today, however, DeepSeek (an AI research lab) has replicated this reasoning behavior and published the…

March 4, 2025
How LLMs Work: Reinforcement Learning, RLHF, DeepSeek R1, OpenAI o1, AlphaGo

Welcome to Part 2 of my LLM deep dive. If you haven’t read Part 1, I highly encourage you to check it out first. Previously, we covered the first two major stages of training an LLM: Pre-training: learning from massive datasets to form a base…

February 28, 2025
Reinforcement Learning with PDEs

Previously we discussed applying reinforcement learning to Ordinary Differential Equations (ODEs) by integrating ODEs within gymnasium. ODEs are a powerful tool that can describe a wide range of systems but are limited to a single independent variable. Partial Differential Equations (PDEs) are differential equations involving derivatives with respect to multiple variables that can cover…

February 21, 2025
Learning How to Play Atari Games Through Deep Neural Networks

In July 1959, Arthur Samuel developed one of the first agents to play the game of checkers. What constitutes an agent that plays checkers can be best described in Samuel’s own words, “…a computer [that] can be programmed so that it will learn to play…

February 19, 2025