Category: transformers

RoPE, Clearly Explained

Going beyond the math to build intuition. By Lorenzo Cesconetto. Originally published on Towards Data Science.

January 30, 2026
Hugging Face Transformers in Action: Learning How To Leverage AI for NLP

A practical guide to Hugging Face Transformers and to how you can analyze the sentiment of your résumé in seconds with AI. By Gustavo Santos. Originally published on Towards Data Science.

December 29, 2025
Scaling Recommender Transformers to a Billion Parameters

How to implement a new generation of transformer recommenders. By Kirill Khrylchenko. Originally published on Towards Data Science.

October 22, 2025
Your 1M+ Context Window LLM Is Less Powerful Than You Think

Why working memory is a more important bottleneck than raw context window size. By Tobias Schnabel. Originally published on Towards Data Science.

July 18, 2025
Hands-On Attention Mechanism for Time Series Classification, with Python

How to use the attention mechanism in a time series classification framework. By Piero Paialunga. Originally published on Towards Data Science.

May 31, 2025
Behind the Magic: How Tensors Drive Transformers

Transformers have changed the way artificial intelligence works, especially in understanding language and learning from data. At the core of these models are tensors (a generalization of matrices that helps process information). As data moves through the different parts of a Transformer, these tensors…

April 26, 2025
Fine-tuning Multimodal Embedding Models

Adapting CLIP to YouTube Data (with Python Code). This is the fourth article in a larger series on multimodal AI. In the previous post, we discussed multimodal RAG systems, which can retrieve and synthesize information from different data modalities (e.g., text, images, audio). There, we saw how we could implement such a…

February 1, 2025
Static and Dynamic Attention: Implications for Graph Neural Networks

Examining the expressive capacity of Graph Attention Networks. In graph representation learning, neighborhood aggregation is one of the most well-studied areas, and attention-based methods largely remain state of the art within it. Leveraging learnable attention scores for weighted aggregations, graph attention networks exhibit higher expressivity…

January 15, 2025
Deep Dive into KV-Caching In Mistral

Ever wondered why the time to first token in LLMs is high but subsequent tokens are super fast? In this post, I dive into the details of the KV-caching used in Mistral, a topic I initially found quite daunting. However, as I delved deeper, it became a fascinating subject, especially when…

January 15, 2025
Contextual Topic Modelling in Chinese Corpora with KeyNMF

A comprehensive guide on getting the most out of your Chinese topic models, from preprocessing to interpretation. With our recent paper on discourse dynamics in European Chinese diaspora media, our team has tapped into an almost unanimous frustration with the quality of topic modelling approaches when applied…

January 14, 2025