Category: large-language-models

  • LyRec: A Song Recommender That Reads Between the Lyrics

    This is how I built an emotionally intelligent LLM-powered song recommendation system. Do you remember the last time you found yourself obsessing over a song? Maybe it was the raw emotion that resonated with you, or perhaps it was the lyrics…

  • Understanding the Evolution of ChatGPT: Part 3— Insights from Codex and InstructGPT

    Mastering the art of fine-tuning: Learnings for training your own LLMs. This is the third article in our GPT series, and also the most practical one: finally, we will talk about how to effectively fine-tune LLMs. It is practical in the…

  • Advancing AI Reasoning: Meta-CoT and System 2 Thinking

    How Meta-CoT enhances System 2 reasoning for complex AI challenges. By Kaushik Rajan, on Towards Data Science.

  • Why LLMs Suck at ASCII Art

    How being bad at art can be so dangerous. Large Language Models have been doing a pretty good job of knocking down challenge after challenge in areas both expected and not. From writing poetry to generating entire websites from questionably… drawn images, these models seem almost unstoppable (and dire…

  • Deep Dive into KV-Caching In Mistral

    Ever wondered why the time to first token in LLMs is high but subsequent tokens are super fast? In this post, I dive into the details of KV-Caching used in Mistral, a topic I initially found quite daunting. However, as I delved deeper, it became a fascinating subject, especially when…

  • llama.cpp: Writing A Simple C++ Inference Program for GGUF LLM Models

    Exploring llama.cpp internals and a basic chat program flow. llama.cpp has revolutionized LLM inference through its wide adoption and simplicity. It has enabled enterprises and individual developers to deploy LLMs on devices ranging from SBCs…

  • Linearizing Llama

    Speeding up Llama: A hybrid approach to attention mechanisms. In this article, we will see how to replace softmax self-attention in Llama-3.2-1B with hybrid attention combining softmax sliding window and linear attention. This implementation will help us better understand the growing interest in linear attention…

  • AI Agents Hype, Explained — What You Really Need to Know to Get Started

    I’ll set the record straight: AI Agents are not new but advanced. Learn how they’ve evolved and where to get started. By Marc Nehme, on Towards Data Science.

  • AI-Powered Information Extraction and Matchmaking

    Developing an application for extracting key profile information from CVs and recommending jobs aligned with the profile. By Umair Ali Khan, on Towards Data Science.

  • Unlocking the Untapped Potential of Retrieval-Augmented Generation (RAG) Pipelines

    Essential metrics and methods to enhance performance across retrieval, generation, and end-to-end pipelines. By Saleh Alkhalifa, on Towards Data Science.

  • Linearizing Attention

    Breaking the quadratic barrier: modern alternatives to softmax attention. Large Language Models are great, but they have a drawback: they use softmax attention, which can be computationally intensive. In this article we will explore whether we can replace the softmax to achieve linear time complexity.

  • Semantically Compress Text to Save On LLM Costs

    LLMs are great… if they can fit all of your data. Originally published at https://blog.developer.bazaarvoice.com on October 28, 2024. Large language models are fantastic tools for unstructured text, but what if your text doesn’t fit in the context window? Bazaarvoice faced exactly this…

  • Why Retrieval-Augmented Generation Is Still Relevant in the Era of Long-Context Language Models

    In this article we will explore why models with context windows of 128K tokens and more can’t fully replace RAG. By Jérôme DIAZ, on Towards Data Science.

  • Transformers Key-Value (KV) Caching Explained

    Speed up your LLM inference. By Michał Oleszak, on Towards Data Science.

  • Translating a Memoir: A Technical Journey

    Leveraging GPT-3.5 and Unstructured APIs for translations. This blog post details how I utilised GPT to translate the personal memoir of a family friend, making it accessible to a broader audience. Specifically, I employed GPT-3.5 for translation and Unstructured’s APIs for efficient content extraction and formatting. The memoir, a…

  • How to Use Structured Generation for LLM-as-a-Judge Evaluations

    Structured generation is fundamental to building complex, multi-step reasoning agents in LLM evaluations, especially for open source models. Disclosure: I am a maintainer of Opik, one of the open source projects used later in this article. For the past few months, I’ve been working on LLM-based…

  • Scientists Go Serious About Large Language Models Mirroring Human Thinking

    A discussion of the latest research suggesting that LLMs do work like the human brain, with some substantial differences. By LucianoSphere (Luciano Abriata, PhD), on Towards Data Science.

  • Making News Recommendations Explainable with Large Language Models

    A prompt-based experiment to improve both accuracy and transparent reasoning in content personalization. At DER SPIEGEL, we are continually exploring ways to improve how we recommend news articles to our readers. In our latest (offline) experiment,…

  • How to Prune LLaMA 3.2 and Similar Large Language Models

    This article explores a structured pruning technique for state-of-the-art models that use a GLU architecture, enabling the creation of… By Pere Martra, on Towards Data Science.

  • Mistral 7B Explained: Towards More Efficient Language Models

    RMS Norm, RoPE, GQA, SWA, KV Cache, and more! Part 5 in the “LLMs from Scratch” series, a complete guide to understanding and building Large Language Models. If you are interested in learning more about how these models work, I encourage you to read: Part 1: Tokenization — A Complete Guide; Part 2:…