Tag: llm

Zero-Waste Agentic RAG: Designing Caching Architectures to Minimize Latency and LLM Costs at Scale

Reducing LLM costs by 30% with validation-aware, multi-tier caching. By Partha Sarkar. First published on Towards Data Science.

March 2, 2026
Mechanistic Interpretability: Peeking Inside an LLM

Are the human-like cognitive abilities of LLMs real or fake? How does information travel through the neural network? Is there hidden knowledge inside an LLM? By Julian Mendel. First published on Towards Data Science.

February 6, 2026
Cutting LLM Memory by 84%: A Deep Dive into Fused Kernels

Why your final LLM layer is OOMing and how to fix it with a custom Triton kernel. By Ryan Pégoud. First published on Towards Data Science.

January 17, 2026
How to Scale Your LLM Usage

Learn how to scale up your LLM usage to boost productivity. By Eivind Kjosbakken. First published on Towards Data Science.

November 30, 2025
LLM-as-a-Judge: What It Is, Why It Works, and How to Use It to Evaluate AI Models

A step-by-step guide to building AI quality control using large language models. By Piero Paialunga. First published on Towards Data Science.

November 25, 2025
Bayesian Evaluation of Large Language Model Behavior

arXiv:2511.10661v1. Abstract: It is increasingly important to evaluate how text generation systems based on large language models (LLMs) behave, such as their tendency to produce harmful output or their sensitivity to adversarial inputs. Such evaluations often rely on a curated benchmark set of input prompts…

November 17, 2025
LLM-Powered Time-Series Analysis

Part 2: Prompts for Advanced Model Development. By Sara Nobrega. First published on Towards Data Science.

November 10, 2025
4 Techniques to Optimize Your LLM Prompts for Cost, Latency and Performance

Learn how to greatly improve the performance of your LLM application. By Eivind Kjosbakken. First published on Towards Data Science.

October 30, 2025
Notes on LLM Evaluation

A practical, step-by-step guide to building an evaluation pipeline for a real-world AI application. By Felipe Adachi. First published on Towards Data Science.

September 26, 2025
Building LLM Apps That Can See, Think, and Integrate: Using o3 with Multimodal Input and Structured Output

A hands-on example of building a time-series anomaly detection system entirely through visualization and prompting. First published on Towards…

September 21, 2025
How to Enrich LLM Context to Significantly Enhance Capabilities

Learn how to empower your LLMs by leveraging additional metadata. By Eivind Kjosbakken. First published on Towards Data Science.

September 17, 2025
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

arXiv:2508.21184v1. Abstract: We propose a general-purpose approach for improving the ability of Large Language Models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to…

September 1, 2025
LLM Monitoring and Observability: Hands-on with Langfuse

Learn the fundamentals of LLM monitoring and observability, from tracing to evaluation and setting up a dashboard using Langfuse. By Ahmad Talal Riaz. First published on Towards Data Science.

August 26, 2025
Systematic LLM Prompt Engineering Using DSPy Optimization

This article is a journey into the fascinating and rapidly evolving science of LLM prompt iteration, which is a fundamental part of Large Language Model Operations (LLMOps). We’ll use the example of generating customer service responses with a real-world dataset to show how both generator and LLM-judge prompts…

August 26, 2025
How We Reduced LLM Costs by 90% with 5 Lines of Code

When clean code hides inefficiencies: what we learned from fixing a few lines of code and saving 90% in LLM cost. By Uri Peled. First published on Towards Data Science.

August 22, 2025
How to Create Powerful LLM Applications with Context Engineering

Improve your LLM by optimizing its context. By Eivind Kjosbakken. First published on Towards Data Science.

August 19, 2025
How to Ensure Reliability in LLM Applications

Learn how to make your LLM applications more robust. By Eivind Kjosbakken. First published on Towards Data Science.

July 16, 2025
From Equal Weights to Smart Weights: OTPO’s Approach to Better LLM Alignment

Using optimal transport to weight what matters most in LLM-generated responses. By Sudheer Singh. First published on Towards Data Science.

July 16, 2025
Recap of all types of LLM Agents

Regular, ReAct, Chain-of-Thought, Reflexion, ToT, GoT, PoT. By Mauro Di Pietro. First published on Towards Data Science.

July 10, 2025
Software Engineering in the LLM Era

On growing new software engineers, even when it’s inefficient. By Stephanie Kirmer. First published on Towards Data Science.

July 3, 2025
LLM-as-a-Judge: A Practical Guide

How to Scale LLM Evaluations Beyond Manual Review. By Shuai Guo. First published on Towards Data Science.

June 20, 2025
LLM-Powered CPI Prediction Inference with Online Text Time Series

arXiv:2506.09516v1. Abstract: Forecasting the Consumer Price Index (CPI) is an important yet challenging task in economics, where most existing approaches rely on low-frequency, survey-based data. With the recent advances of large language models (LLMs), there is growing potential to leverage high-frequency online text…

June 12, 2025
LLM Optimization: LoRA and QLoRA

Scalable fine-tuning techniques for large language models. By Vyacheslav Efimov. First published on Towards Data Science.

May 31, 2025
GAIA: The LLM Agent Benchmark Everyone’s Talking About

What practitioners need to know about this LLM agent benchmark. By Shuai Guo. First published on Towards Data Science.

May 30, 2025
Attaining LLM Certainty with AI Decision Circuits

The promise of AI agents has taken the world by storm. Agents can interact with the world around them, write articles (not this one though), take actions on your behalf, and generally make the difficult part of automating any task easy and approachable. Agents take aim at the most…

May 3, 2025
From FOMO to Opportunity: Analytical AI in the Era of LLM Agents

Are you feeling “fear of missing out” (FOMO) when it comes to LLM agents? Well, that was the case for me for quite a while. In recent months, it feels like my online feeds have been completely bombarded by “LLM Agents”: every other…

April 30, 2025
LLM Evaluations: from Prototype to Production

Evaluation is the cornerstone of any machine learning product. Investing in quality measurement delivers significant returns. Let’s explore the potential business benefits. As management consultant and writer Peter Drucker once said, “If you can’t measure it, you can’t improve it.” Building a robust evaluation system helps you identify areas…

April 26, 2025
Load-Testing LLMs Using LLMPerf

Deploying your Large Language Model (LLM) is not necessarily the final step in productionizing your Generative AI application. An often forgotten, yet crucial part of the MLOps lifecycle is properly load testing your LLM and ensuring it is ready to withstand your expected production traffic. Load testing at a high level…

April 19, 2025
Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents

arXiv:2504.07347v1. Abstract: As demand for Large Language Models (LLMs) and AI agents rapidly grows, optimizing systems for efficient LLM inference becomes critical. While significant efforts have targeted system-level engineering, little is explored through a mathematical modeling and queuing perspective. In this paper, we…

April 11, 2025
StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization

arXiv:2504.05804v1. Abstract: The integration of large language models (LLMs) into information retrieval systems introduces new attack surfaces, particularly for adversarial ranking manipulations. We present StealthRank, a novel adversarial ranking attack that manipulates LLM-driven product recommendation systems while maintaining textual fluency and stealth. Unlike existing…

April 10, 2025
How LLMs Work: Reinforcement Learning, RLHF, DeepSeek R1, OpenAI o1, AlphaGo

Welcome to part 2 of my LLM deep dive. If you’ve not read Part 1, I highly encourage you to check it out first. Previously, we covered the first two major stages of training an LLM: Pre-training, i.e., learning from massive datasets to form a base…

February 28, 2025
6 Common LLM Customization Strategies Briefly Explained

Why Customize LLMs? Large Language Models (LLMs) are deep learning models pre-trained with self-supervised learning; they require vast amounts of training data and training time, and hold a large number of parameters. LLMs have revolutionized natural language processing, especially in the last two years, demonstrating remarkable…

February 25, 2025
How to Measure the Reliability of a Large Language Model’s Response

The basic principle of Large Language Models (LLMs) is very simple: to predict the next word (or token) in a sequence of words based on statistical patterns in their training data. However, this seemingly simple capability turns out to be incredibly sophisticated when it…

February 13, 2025
I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms

Recently, DeepSeek announced their latest model, R1, and article after article came out praising its performance relative to cost, and how the release of such open-source models could genuinely change the course of LLMs forever. That is really exciting! And also, too…

February 8, 2025
Large Language Models: A Short Introduction

And why you should care about LLMs. There’s an acronym you’ve probably heard non-stop for the past few years: LLM, which stands for Large Language Model. In this article we’re going to take a brief look at what LLMs are, why they’re an extremely exciting piece of technology, why…

January 22, 2025
An Agentic Approach to Reducing LLM Hallucinations

Simple techniques to alleviate LLM hallucinations using LangGraph. If you’ve worked with LLMs, you know they can sometimes hallucinate. This means they generate text that’s either nonsensical or contradicts the input data. It’s a common issue that can hurt the reliability of LLM-powered…

December 23, 2024
From Prototype to Production: Enhancing LLM Accuracy

Implementing evaluation frameworks to optimize accuracy in real-world applications. Building a prototype for an LLM application is surprisingly straightforward. You can often create a functional first version within just a few hours. This initial prototype will likely provide results that look legitimate and be…

December 20, 2024
Classifier-Free Guidance in LLMs Safety — NeurIPS 2024 Challenge Experience

This article briefly describes a NeurIPS 2024 LLM-PC submission that won second prize: an approach to effective LLM unlearning without any retaining dataset. This is achieved through the formulation of the unlearning task as an alignment problem with the…

December 19, 2024
Structured LLM Output Using Ollama

Control your model responses effectively. By Thomas Reid. Published on Towards Data Science.

December 17, 2024
How to Use Structured Generation for LLM-as-a-Judge Evaluations

Structured generation is fundamental to building complex, multi-step reasoning agents in LLM evaluations, especially for open source models. Disclosure: I am a maintainer of Opik, one of the open source projects used later in this article. For the past few months, I’ve been working on LLM-based…

December 11, 2024
How to Build a General-Purpose LLM Agent

A Step-by-Step Guide. Why build a general-purpose agent? Because it’s an excellent tool to prototype your use cases and lays the groundwork for designing your own custom agentic architecture. Before we dive in, let’s quickly introduce LLM agents. Feel free…

December 5, 2024
Building an LLM fine-tuning Dataset

By sentdex.

November 27, 2024