Category: model-training
-
Three Essential Hyperparameter Tuning Techniques for Better Machine Learning Models
Learn how to optimize your ML models for better results. By Rukshan Pramoditha; originally published on Towards Data Science.
-
Rethinking the Environmental Costs of Training AI — Why We Should Look Beyond Hardware
Summary of this study: Hardware choices (specifically, the type and quantity of hardware), along with training time, have a significant positive effect on the energy, water, and carbon footprints of AI model training, whereas architecture-related factors do not. The interaction between…
-
Custom Training Pipeline for Object Detection Models
What if you want to write the whole object detection training pipeline from scratch, so you can understand each step and be able to customize it? That’s what I set out to do. I examined several well-known object detection pipelines and designed one that best suits my needs…
-
Debugging the Dreaded NaN
You are training your latest AI model, anxiously watching as the loss steadily decreases, when suddenly, boom! Your logs are flooded with NaNs (Not a Number), your model is irreparably corrupted, and you’re left staring at your screen in despair. To make matters worse, the NaNs don’t appear consistently.…
-
Learnings from a Machine Learning Engineer — Part 4: The Model
In this latest part of my series, I will share what I have learned about selecting a model for image classification and how to fine-tune that model. I will also show how you can leverage the model to accelerate your labelling process, and…
-
Training Large Language Models: From TRPO to GRPO
DeepSeek has recently made quite a buzz in the AI community, thanks to its impressive performance at relatively low cost. I think this is a perfect opportunity to dive deeper into how Large Language Models (LLMs) are trained. In this article, we will focus on the Reinforcement Learning…
-
Building a Regression Model: Delivery Duration Prediction
Building a Regression Model to Predict Delivery Durations: A Practical Guide. An end-to-end (E2E) walkthrough for approaching a regression modeling task. In this article, we’re going to walk through the process of building a regression model, from dataset cleaning and preparation to model training and evaluation. The specific regression task we will…
-
Beyond Causal Language Modeling
A deep dive into “Not All Tokens Are What You Need for Pretraining”. A few days ago, I had the chance to present at a local reading group that focused on some of the most exciting and insightful papers from NeurIPS 2024. As a presenter, I selected a paper titled…