Tag: dataset

Fine-Tuning vLLMs for Document Understanding

In this article, I discuss how you can fine-tune VLMs (visual large language models, often called vLLMs) like Qwen 2.5 VL 7B. I will introduce you to a dataset of handwritten digits, which the base version of Qwen 2.5 VL struggles with. We will then inspect the dataset, annotate it,…

May 6, 2025
Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation

arXiv:2502.12607v1. Abstract: We propose Duality Gap KIP (DGKIP), an extension of the Kernel Inducing Points (KIP) method for dataset distillation. While existing dataset distillation methods often rely on bi-level optimization, DGKIP eliminates the need for such optimization by leveraging duality theory in…

February 19, 2025
You Get a Dataset and Need to Find a “Good” Model Quickly (in Hours or Days), what’s your strategy?

Typical Scenario: Your friend gives you a dataset and challenges you to beat their model’s performance. They don’t tell you what they did, but they provide a single CSV file and the performance metric to optimize.…

December 23, 2024
Transcriptions dataset

We teamed up with Miska Knapek to transcribe our 170 episodes into full written text, resulting in 1,539,957 spoken words overall, including 61 mentions of weather, 923 mentions of maps and 48 mentions of AI. Check out a little tour of the data, browse and search episodes using our brand new archive page, or, for the technically inclined, check out data and code…

December 13, 2024