Category: model-training

Three Essential Hyperparameter Tuning Techniques for Better Machine Learning Models

Learn how to optimize your ML models for better results. By Rukshan Pramoditha. Originally published on Towards Data Science.

August 23, 2025
Rethinking the Environmental Costs of Training AI — Why We Should Look Beyond Hardware

Summary of this study: Hardware choices, specifically hardware type and quantity, along with training time, have a significant positive impact on energy, water, and carbon footprints during AI model training, whereas architecture-related factors do not. The interaction between…

May 14, 2025
Custom Training Pipeline for Object Detection Models

What if you want to write the whole object detection training pipeline from scratch, so you can understand each step and be able to customize it? That’s what I set out to do. I examined several well-known object detection pipelines and designed one that best suits my needs…

March 8, 2025
Debugging the Dreaded NaN

You are training your latest AI model, anxiously watching as the loss steadily decreases, when suddenly: boom! Your logs are flooded with NaNs (Not a Number), your model is irreparably corrupted, and you’re left staring at your screen in despair. To make matters worse, the NaNs don’t appear consistently.…

February 28, 2025
Learnings from a Machine Learning Engineer — Part 4: The Model

In this latest part of my series, I will share what I have learned about selecting a model for image classification and how to fine-tune that model. I will also show how you can leverage the model to accelerate your labelling process, and…

February 14, 2025
Training Large Language Models: From TRPO to GRPO

DeepSeek has recently created quite a buzz in the AI community, thanks to its impressive performance at relatively low cost. I think this is a perfect opportunity to dive deeper into how Large Language Models (LLMs) are trained. In this article, we will focus on the Reinforcement Learning…

February 6, 2025
Building a Regression Model: Delivery Duration Prediction

Building a Regression Model to Predict Delivery Durations: A Practical Guide. An end-to-end walkthrough for approaching a regression modeling task. In this article, we’re going to walk through the process of building a regression model, from dataset cleaning and preparation to model training and evaluation. The specific regression task we will…

January 28, 2025
Beyond Causal Language Modeling

A deep dive into “Not All Tokens Are What You Need for Pretraining.” A few days ago, I had the chance to present at a local reading group that focused on some of the most exciting and insightful papers from NeurIPS 2024. As a presenter, I selected a paper titled…

January 28, 2025