Tag: llm

  • Zero-Waste Agentic RAG: Designing Caching Architectures to Minimize Latency and LLM Costs at Scale

    Reducing LLM costs by 30% with validation-aware, multi-tier caching. Published on Towards Data Science by Partha Sarkar.

  • Mechanistic Interpretability: Peeking Inside an LLM

    Are the human-like cognitive abilities of LLMs real or fake? How does information travel through the neural network? Is there hidden knowledge inside an LLM? Published on Towards Data Science by Julian Mendel.

  • Cutting LLM Memory by 84%: A Deep Dive into Fused Kernels

    Why your final LLM layer is OOMing and how to fix it with a custom Triton kernel. Published on Towards Data Science by Ryan Pégoud.

  • How to Scale Your LLM Usage

    Learn how to scale up your LLM usage to boost productivity. Published on Towards Data Science by Eivind Kjosbakken.

  • LLM-as-a-Judge: What It Is, Why It Works, and How to Use It to Evaluate AI Models

    A step-by-step guide to building AI quality control using large language models. Published on Towards Data Science by Piero Paialunga.

  • Bayesian Evaluation of Large Language Model Behavior

    arXiv:2511.10661v1. Abstract: It is increasingly important to evaluate how text generation systems based on large language models (LLMs) behave, such as their tendency to produce harmful output or their sensitivity to adversarial inputs. Such evaluations often rely on a curated benchmark set of input prompts…

  • LLM-Powered Time-Series Analysis

    Part 2: Prompts for Advanced Model Development. Published on Towards Data Science by Sara Nobrega.

  • 4 Techniques to Optimize Your LLM Prompts for Cost, Latency and Performance

    Learn how to greatly improve the performance of your LLM application. Published on Towards Data Science by Eivind Kjosbakken.

  • Notes on LLM Evaluation

    A practical, step-by-step guide to building an evaluation pipeline for a real-world AI application. Published on Towards Data Science by Felipe Adachi.

  • Building LLM Apps That Can See, Think, and Integrate: Using o3 with Multimodal Input and Structured Output

    A hands-on example of building a time-series anomaly detection system entirely through visualization and prompting. Published on Towards…

  • BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

    arXiv:2508.21184v1. Abstract: We propose a general-purpose approach for improving the ability of Large Language Models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to…

  • LLM Monitoring and Observability: Hands-on with Langfuse

    Learn the fundamentals of LLM monitoring and observability, from tracing to evaluation and setting up a dashboard using Langfuse. Published on Towards Data Science by Ahmad Talal Riaz.

  • Systematic LLM Prompt Engineering Using DSPy Optimization

    This article is a journey into the fascinating and rapidly evolving science of LLM prompt iteration, which is a fundamental part of Large Language Model Operations (LLMOps). We’ll use the example of generating customer service responses with a real-world dataset to show how both generator and LLM-judge prompts…

  • How We Reduced LLM Costs by 90% with 5 Lines of Code

    When clean code hides inefficiencies: what we learned from fixing a few lines of code and cutting LLM costs by 90%. Published on Towards Data Science by Uri Peled.

  • How to Create Powerful LLM Applications with Context Engineering

    Improve your LLM by optimizing its context. Published on Towards Data Science by Eivind Kjosbakken.

  • How to Ensure Reliability in LLM Applications

    Learn how to make your LLM applications more robust. Published on Towards Data Science by Eivind Kjosbakken.

  • From Equal Weights to Smart Weights: OTPO’s Approach to Better LLM Alignment

    Using optimal transport to weight what matters most in LLM-generated responses. Published on Towards Data Science by Sudheer Singh.

  • Recap of all types of LLM Agents

    Regular, ReAct, Chain-of-Thought, Reflexion, ToT, GoT, PoT. Published on Towards Data Science by Mauro Di Pietro.

  • Software Engineering in the LLM Era

    On growing new software engineers, even when it’s inefficient. Published on Towards Data Science by Stephanie Kirmer.

  • LLM-as-a-Judge: A Practical Guide

    How to Scale LLM Evaluations Beyond Manual Review. Published on Towards Data Science by Shuai Guo.

  • LLM-Powered CPI Prediction Inference with Online Text Time Series

    arXiv:2506.09516v1. Abstract: Forecasting the Consumer Price Index (CPI) is an important yet challenging task in economics, where most existing approaches rely on low-frequency, survey-based data. With the recent advances of large language models (LLMs), there is growing potential to leverage high-frequency online text…

  • LLM Optimization: LoRA and QLoRA

    Scalable fine-tuning techniques for large language models. Published on Towards Data Science by Vyacheslav Efimov.

  • GAIA: The LLM Agent Benchmark Everyone’s Talking About

    What practitioners need to know about this LLM agent benchmark. Published on Towards Data Science by Shuai Guo.

  • Attaining LLM Certainty with AI Decision Circuits

    The promise of AI agents has taken the world by storm. Agents can interact with the world around them, write articles (not this one though), take actions on your behalf, and generally make the difficult part of automating any task easy and approachable. Agents take aim at the most…

  • From FOMO to Opportunity: Analytical AI in the Era of LLM Agents

    Are you feeling “fear of missing out” (FOMO) when it comes to LLM agents? Well, that was the case for me for quite a while. In recent months, it feels like my online feeds have been completely bombarded by “LLM Agents”: every other…

  • LLM Evaluations: from Prototype to Production

    Evaluation is the cornerstone of any machine learning product. Investing in quality measurement delivers significant returns. Let’s explore the potential business benefits. As management consultant and writer Peter Drucker once said, “If you can’t measure it, you can’t improve it.” Building a robust evaluation system helps you identify areas…

  • Load-Testing LLMs Using LLMPerf

    Deploying your Large Language Model (LLM) is not necessarily the final step in productionizing your Generative AI application. An often-forgotten yet crucial part of the MLOps lifecycle is properly load testing your LLM and ensuring it is ready to withstand your expected production traffic. Load testing at a high level…

  • Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents

    arXiv:2504.07347v1. Abstract: As demand for Large Language Models (LLMs) and AI agents rapidly grows, optimizing systems for efficient LLM inference becomes critical. While significant efforts have targeted system-level engineering, little has been explored from a mathematical modeling and queuing perspective. In this paper, we…

  • StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization

    arXiv:2504.05804v1. Abstract: The integration of large language models (LLMs) into information retrieval systems introduces new attack surfaces, particularly for adversarial ranking manipulations. We present StealthRank, a novel adversarial ranking attack that manipulates LLM-driven product recommendation systems while maintaining textual fluency and stealth. Unlike existing…

  • How LLMs Work: Reinforcement Learning, RLHF, DeepSeek R1, OpenAI o1, AlphaGo

    Welcome to Part 2 of my LLM deep dive. If you’ve not read Part 1, I highly encourage you to check it out first. Previously, we covered the first two major stages of training an LLM. Pre-training: learning from massive datasets to form a base…

  • 6 Common LLM Customization Strategies Briefly Explained

    Why Customize LLMs? Large Language Models (LLMs) are deep learning models pre-trained with self-supervised learning; they require vast amounts of training data and training time and hold a large number of parameters. LLMs have revolutionized natural language processing, especially in the last 2 years, demonstrating remarkable…

  • How to Measure the Reliability of a Large Language Model’s Response

    The basic principle of Large Language Models (LLMs) is very simple: to predict the next word (or token) in a sequence of words based on statistical patterns in their training data. However, this seemingly simple capability turns out to be incredibly sophisticated when it…

  • I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms

    Recently, DeepSeek announced their latest model, R1, and article after article came out praising its performance relative to cost, and how the release of such open-source models could genuinely change the course of LLMs forever. That is really exciting! And also, too…

  • Large Language Models: A Short Introduction

    And why you should care about LLMs. There’s an acronym you’ve probably heard non-stop for the past few years: LLM, which stands for Large Language Model. In this article we’re going to take a brief look at what LLMs are, why they’re an extremely exciting piece of technology, why…

  • An Agentic Approach to Reducing LLM Hallucinations

    Simple techniques to alleviate LLM hallucinations using LangGraph. If you’ve worked with LLMs, you know they can sometimes hallucinate. This means they generate text that’s either nonsensical or contradicts the input data. It’s a common issue that can hurt the reliability of LLM-powered…

  • From Prototype to Production: Enhancing LLM Accuracy

    Implementing evaluation frameworks to optimize accuracy in real-world applications. Building a prototype for an LLM application is surprisingly straightforward. You can often create a functional first version within just a few hours. This initial prototype will likely provide results that look legitimate and be…

  • Classifier-Free Guidance in LLMs Safety — NeurIPS 2024 Challenge Experience

    This article briefly describes the NeurIPS 2024 LLM-PC submission that was awarded second prize: an approach to effective LLM unlearning without any retain dataset. This is achieved through the formulation of the unlearning task as an alignment problem with the…

  • Structured LLM Output Using Ollama

    Control your model responses effectively. Published on Towards Data Science by Thomas Reid.

  • How to Use Structured Generation for LLM-as-a-Judge Evaluations

    Structured generation is fundamental to building complex, multi-step reasoning agents in LLM evaluations, especially for open-source models. Disclosure: I am a maintainer of Opik, one of the open source projects used later in this article. For the past few months, I’ve been working on LLM-based…

  • How to Build a General-Purpose LLM Agent

    A Step-by-Step Guide. Why build a general-purpose agent? Because it’s an excellent tool to prototype your use cases and lays the groundwork for designing your own custom agentic architecture. Before we dive in, let’s quickly introduce LLM agents. Feel free…

  • Building an LLM fine-tuning Dataset

    By sentdex.