Category: large-language-models

LyRec: A Song Recommender That Reads Between the Lyrics

This is how I built an emotionally intelligent, LLM-powered song recommendation system. Do you remember the last time you found yourself obsessing over a song? Maybe it was the raw emotion that resonated with you, or perhaps it was the lyrics…

January 22, 2025
Understanding the Evolution of ChatGPT: Part 3— Insights from Codex and InstructGPT

Mastering the art of fine-tuning: learnings for training your own LLMs. This is the third article in our GPT series, and also the most practical one: finally, we will talk about how to fine-tune LLMs effectively. It is practical in the…

January 22, 2025
Advancing AI Reasoning: Meta-CoT and System 2 Thinking

How Meta-CoT enhances System 2 reasoning for complex AI challenges. By Kaushik Rajan, on Towards Data Science.

January 21, 2025
Why LLMs Suck at ASCII Art

How being bad at art can be so dangerous. Large language models have been doing a pretty good job of knocking down challenge after challenge, in areas both expected and not. From writing poetry to generating entire websites from questionably… drawn images, these models seem almost unstoppable (and dire…

January 21, 2025
Deep Dive into KV-Caching In Mistral

Ever wondered why the time to first token in LLMs is high but subsequent tokens are super fast? In this post, I dive into the details of KV-Caching used in Mistral, a topic I initially found quite daunting. However, as I delved deeper, it became a fascinating subject, especially when…

January 15, 2025
llama.cpp: Writing A Simple C++ Inference Program for GGUF LLM Models

Exploring llama.cpp internals and a basic chat program flow. llama.cpp has revolutionized the space of LLM inference through its wide adoption and simplicity. It has enabled enterprises and individual developers to deploy LLMs on devices ranging from SBCs…

January 14, 2025
Linearizing Llama

Speeding up Llama: a hybrid approach to attention mechanisms. In this article, we will see how to replace softmax self-attention in Llama-3.2-1B with a hybrid attention that combines softmax sliding-window attention and linear attention. This implementation will help us better understand the growing interest in linear attention…

January 11, 2025
AI Agents Hype, Explained — What You Really Need to Know to Get Started

I’ll set the record straight: AI agents are not new, but they have advanced. Learn how they’ve evolved and where to get started. By Marc Nehme, on Towards Data Science.

January 7, 2025
AI-Powered Information Extraction and Matchmaking

Developing an application for extracting key profile information from CVs and recommending jobs aligned with the profile. By Umair Ali Khan, on Towards Data Science.

January 2, 2025
Unlocking the Untapped Potential of Retrieval-Augmented Generation (RAG) Pipelines

Essential Metrics and Methods to Enhance Performance Across Retrieval, Generation, and End-to-End Pipelines. By Saleh Alkhalifa, on Towards Data Science.

December 28, 2024
Linearizing Attention

Breaking the quadratic barrier: modern alternatives to softmax attention. Large language models are great, but they have one drawback: they use softmax attention, which is computationally intensive because its cost grows quadratically with sequence length. In this article, we will explore whether the softmax can be replaced to achieve linear time complexity.

December 27, 2024
Semantically Compress Text to Save On LLM Costs

LLMs are great… if they can fit all of your data. Originally published at https://blog.developer.bazaarvoice.com on October 28, 2024. Large language models are fantastic tools for unstructured text, but what if your text doesn’t fit in the context window? Bazaarvoice faced exactly this…

December 21, 2024
Why Retrieval-Augmented Generation Is Still Relevant in the Era of Long-Context Language Models

In this article, we will explore why models with context windows of 128K tokens or more can’t fully replace RAG. By Jérôme DIAZ, on Towards Data Science.

December 13, 2024
Transformers Key-Value (KV) Caching Explained

Speed up your LLM inference. By Michał Oleszak, on Towards Data Science.

December 13, 2024
Translating a Memoir: A Technical Journey

Leveraging GPT-3.5 and Unstructured’s APIs for translations. This blog post details how I used GPT to translate the personal memoir of a family friend, making it accessible to a broader audience. Specifically, I employed GPT-3.5 for translation and Unstructured’s APIs for efficient content extraction and formatting. The memoir, a…

December 12, 2024
How to Use Structured Generation for LLM-as-a-Judge Evaluations

Structured generation is fundamental to building complex, multi-step reasoning agents in LLM evaluations, especially for open-source models. Disclosure: I am a maintainer of Opik, one of the open-source projects used later in this article. For the past few months, I’ve been working on LLM-based…

December 11, 2024
Scientists Go Serious About Large Language Models Mirroring Human Thinking

A discussion of the latest research suggesting that LLMs do work like the human brain, with some substantial differences. By LucianoSphere (Luciano Abriata, PhD), on Towards Data Science.

December 9, 2024
Making News Recommendations Explainable with Large Language Models

A prompt-based experiment to improve both accuracy and transparent reasoning in content personalization. At DER SPIEGEL, we are continually exploring ways to improve how we recommend news articles to our readers. In our latest (offline) experiment,…

December 1, 2024
How to Prune LLaMA 3.2 and Similar Large Language Models

This article explores a structured pruning technique for state-of-the-art models that use a GLU architecture, enabling the creation of… By Pere Martra, on Towards Data Science.

November 28, 2024
Mistral 7B Explained: Towards More Efficient Language Models

RMS Norm, RoPE, GQA, SWA, KV Cache, and more! Part 5 in the “LLMs from Scratch” series, a complete guide to understanding and building Large Language Models. If you are interested in learning more about how these models work, I encourage you to read: Part 1: Tokenization — A Complete Guide; Part 2:…

November 27, 2024