Category: thoughts-and-theory

Honestly Uncertain

Ethical issues aside, should you be honest when asked how certain you are about some belief? Of course, it depends. In this blog post, you’ll learn on what. Different ways of evaluating probabilistic predictions come with dramatically different degrees of “optimal honesty”. Perhaps surprisingly, the linear function that assigns +1 to true and fully…

February 19, 2025
Understanding Model Calibration: A Gentle Introduction & Visual Exploration

How Reliable Are Your Predictions? To be considered reliable, a model must be calibrated so that its confidence in each decision closely reflects its true outcome. In this blog post we’ll take a look at the most commonly used definition for calibration and then dive…

February 12, 2025
I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms

Recently, DeepSeek announced their latest model, R1, and article after article came out praising its performance relative to cost, and how the release of such open-source models could genuinely change the course of LLMs forever. That is really exciting! And also, too…

February 8, 2025
Can Machines Dream? On the Creativity of Large Language Models

Exploring the Role of Hallucinations, Dependencies, and Imagination in AI Creativity. By Salvatore Raieli.

February 1, 2025
Apollo and Design Choices of Video Large Multimodal Models (LMMs)

Let’s Explore Major Design Choices from Meta’s Apollo Paper. By Matthew Gunton.

January 24, 2025
How to Evaluate LLM Summarization

A practical and effective guide for evaluating AI summaries. Summarization is one of the most practical and convenient tasks enabled by LLMs. However, compared to other LLM tasks like question-asking or classification, evaluating LLMs on summarization is far more challenging. And so I myself have neglected evals for…

January 23, 2025
Understanding the Evolution of ChatGPT: Part 3— Insights from Codex and InstructGPT

Mastering the art of fine-tuning: Learnings for training your own LLMs. This is the third article in our GPT series, and also the most practical one: finally, we will talk about how to effectively fine-tune LLMs. It is practical in the…

January 22, 2025
Advancing AI Reasoning: Meta-CoT and System 2 Thinking

How Meta-CoT enhances system 2 reasoning for complex AI challenges. By Kaushik Rajan.

January 21, 2025
Why LLMs Suck at ASCII Art

How being bad at art can be so dangerous. Large Language Models have been doing a pretty good job of knocking down challenge after challenge in areas both expected and not. From writing poetry to generating entire websites from questionably… drawn images, these models seem almost unstoppable (and dire…

January 21, 2025
Zero-Shot Player Tracking in Tennis with Kalman Filtering

Automated tennis tracking without labels: GroundingDINO, Kalman filtering, and court homography. With the recent surge in sports tracking projects, many inspired by Skalski’s popular soccer tracking project, there’s been a notable shift towards using automated player tracking for sport hobbyists. Most of these approaches follow a…

January 20, 2025
Contextual Topic Modelling in Chinese Corpora with KeyNMF

A comprehensive guide on getting the most out of your Chinese topic models, from preprocessing to interpretation. With our recent paper on discourse dynamics in European Chinese diaspora media, our team has tapped into an almost unanimous frustration with the quality of topic modelling approaches when applied…

January 14, 2025
Bayesian A/B Testing Falls Short

Why Bayesian A/B testing can lead to misunderstandings and inflated false positive rates, introduce bias, and complicate results. Over the past decade, I’ve engaged in countless discussions about Bayesian A/B testing versus Frequentist A/B testing. In nearly every conversation, I’ve maintained the same viewpoint:…

January 9, 2025
Statistical Learnability of Strategic Linear Classifiers: A Proof Walkthrough

With the help of an intricate geometric construction, we can prove that instance-wise cost functions quickly drive SVC to infinity. In the previous article in this series, we examined the concept of strategic VC dimension (SVC) and its connection to the Fundamental Theorem of Strategic Learning.…

January 9, 2025
A New Approach to AI Safety: Layer Enhanced Classification (LEC)

LEC surpasses best-in-class models, like GPT-4o, by combining the efficiency of an ML classifier with the language understanding of an LLM. Imagine sitting in a boardroom, discussing the most transformative technology of our time, artificial intelligence, and realizing we’re riding a rocket with no reliable safety…

December 21, 2024
The Good, the Bad, An Ugly Memory for a Neural Network

Memory can play tricks; to learn best, it is not always good to memorize. By Salvatore Raieli.

December 17, 2024
Why “AI Can’t Reason” Is a Bias

We humans are proud creatures. By Rafe Brena, Ph.D.

December 13, 2024
How to Apply the Central Limit Theorem to Constrained Data

What can we say about the mean of data distributed in an interval [a, b]? By Ryan Burn.

December 11, 2024
Scientists Go Serious About Large Language Models Mirroring Human Thinking

A discussion of the latest research suggesting that LLMs do work like the human brain, with some substantial differences. By LucianoSphere (Luciano Abriata, PhD).

December 9, 2024
Neuromorphic Computing — an Edgier, Greener AI

Why computer hardware and AI algorithms are being reinvented using inspiration from the brain. Neuromorphic Computing might not just help bring AI to the edge, but also reduce carbon emissions at data centers. There are periodic proclamations of the coming neuromorphic computing…

November 27, 2024