Category: thoughts-and-theory

  • Honestly Uncertain

    Ethical issues aside, should you be honest when asked how certain you are about some belief? Of course, it depends. In this blog post, you’ll learn what it depends on. Different ways of evaluating probabilistic predictions come with dramatically different degrees of “optimal honesty”. Perhaps surprisingly, the linear function that assigns +1 to true and fully…

  • Understanding Model Calibration: A Gentle Introduction & Visual Exploration

    How Reliable Are Your Predictions? To be considered reliable, a model must be calibrated so that its confidence in each decision closely reflects its true outcome. In this blog post, we’ll take a look at the most commonly used definition of calibration and then dive…

  • I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms

    Recently, DeepSeek announced their latest model, R1, and article after article came out praising its performance relative to cost, and how the release of such open-source models could genuinely change the course of LLMs forever. That is really exciting! And also, too…

  • Can Machines Dream? On the Creativity of Large Language Models

    Exploring the Role of Hallucinations, Dependencies, and Imagination in AI Creativity. Published on Towards Data Science by Salvatore Raieli.

  • Apollo and Design Choices of Video Large Multimodal Models (LMMs)

    Let’s Explore Major Design Choices from Meta’s Apollo Paper. Published on Towards Data Science by Matthew Gunton.

  • How to Evaluate LLM Summarization

    A practical and effective guide for evaluating AI summaries. Summarization is one of the most practical and convenient tasks enabled by LLMs. However, compared to other LLM tasks like question answering or classification, evaluating LLMs on summarization is far more challenging. And so I myself have neglected evals for…

  • Understanding the Evolution of ChatGPT: Part 3— Insights from Codex and InstructGPT

    Mastering the art of fine-tuning: Learnings for training your own LLMs. This is the third article in our GPT series, and also the most practical one: finally, we will talk about how to effectively fine-tune LLMs. It is practical in the…

  • Advancing AI Reasoning: Meta-CoT and System 2 Thinking

    How Meta-CoT enhances System 2 reasoning for complex AI challenges. Published on Towards Data Science by Kaushik Rajan.

  • Why LLMs Suck at ASCII Art

    How being bad at art can be so dangerous. Large language models have been doing a pretty good job of knocking down challenge after challenge in areas both expected and not. From writing poetry to generating entire websites from questionably… drawn images, these models seem almost unstoppable (and dire…

  • Zero-Shot Player Tracking in Tennis with Kalman Filtering

    Automated tennis tracking without labels: GroundingDINO, Kalman filtering, and court homography. With the recent surge in sports tracking projects, many inspired by Skalski’s popular soccer tracking project, there’s been a notable shift towards using automated player tracking for sports hobbyists. Most of these approaches follow a…

  • Contextual Topic Modelling in Chinese Corpora with KeyNMF

    A comprehensive guide to getting the most out of your Chinese topic models, from preprocessing to interpretation. With our recent paper on discourse dynamics in European Chinese diaspora media, our team has tapped into an almost unanimous frustration with the quality of topic modelling approaches when applied…

  • Bayesian A/B Testing Falls Short

    Why Bayesian A/B testing can lead to misunderstandings and inflated false positive rates, introduce bias, and complicate results. Over the past decade, I’ve engaged in countless discussions about Bayesian A/B testing versus Frequentist A/B testing. In nearly every conversation, I’ve maintained the same viewpoint:…

  • Statistical Learnability of Strategic Linear Classifiers: A Proof Walkthrough

    With the help of an intricate geometric construction, we can prove that instance-wise cost functions quickly drive SVC to infinity. In the previous article in this series, we examined the concept of strategic VC dimension (SVC) and its connection to the Fundamental Theorem of Strategic Learning.…

  • A New Approach to AI Safety: Layer Enhanced Classification (LEC)

    LEC surpasses best-in-class models, like GPT-4o, by combining the efficiency of an ML classifier with the language understanding of an LLM. Imagine sitting in a boardroom, discussing the most transformative technology of our time, artificial intelligence, and realizing we’re riding a rocket with no reliable safety…

  • The Good, the Bad, An Ugly Memory for a Neural Network

    Memory can play tricks; to learn best, it is not always good to memorize. Published on Towards Data Science by Salvatore Raieli.

  • Why “AI Can’t Reason” Is a Bias

    We humans are proud creatures. Published on Towards Data Science by Rafe Brena, Ph.D.

  • How to Apply the Central Limit Theorem to Constrained Data

    What can we say about the mean of data distributed in an interval [a, b]? Published on Towards Data Science by Ryan Burn.

  • Scientists Go Serious About Large Language Models Mirroring Human Thinking

    A discussion of the latest research suggesting that LLMs do work like the human brain, with some substantial differences. Published on Towards Data Science by LucianoSphere (Luciano Abriata, PhD).

  • Neuromorphic Computing — an Edgier, Greener AI

    Why computer hardware and AI algorithms are being reinvented using inspiration from the brain. Neuromorphic computing might not just help bring AI to the edge, but also reduce carbon emissions at data centers. There are periodic proclamations of the coming neuromorphic computing…