Tag: training

TFTF: Training-Free Targeted Flow for Conditional Sampling

arXiv:2602.12932v1 Announce Type: new Abstract: We propose a training-free conditional sampling method for flow matching models based on importance sampling. Because a naïve application of importance sampling suffers from weight degeneracy in high-dimensional settings, we modify and incorporate a resampling technique in sequential Monte Carlo (SMC) during intermediate…

February 16, 2026
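The weight degeneracy mentioned in the TFTF abstract is usually diagnosed with the effective sample size (ESS) of the normalized importance weights, and standard SMC resamples when the ESS falls too low. The sketch below is a generic illustration under stated assumptions, not the paper's modified scheme: it uses plain systematic resampling, and the 0.5 × N threshold and all helper names are hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple


def normalize(weights: Sequence[float]) -> List[float]:
    """Scale non-negative importance weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def effective_sample_size(weights: Sequence[float]) -> float:
    """ESS = 1 / sum(w_i^2) for normalized weights; ranges from 1 to N."""
    w = normalize(weights)
    return 1.0 / sum(x * x for x in w)


def systematic_resample(weights: Sequence[float], rng: random.Random) -> List[int]:
    """Return N ancestor indices using one shared uniform offset (systematic resampling)."""
    w = normalize(weights)
    n = len(w)
    u0 = rng.random() / n  # single offset in [0, 1/N), reused for all N strata
    indices, cumulative, i = [], w[0], 0
    for k in range(n):
        u = u0 + k / n
        # Advance to the first particle whose cumulative weight reaches u.
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += w[i]
        indices.append(i)
    return indices


def maybe_resample(particles, weights, rng, threshold=0.5) -> Tuple[list, List[float]]:
    """Resample only when ESS < threshold * N (the 0.5 default is an assumption)."""
    n = len(particles)
    if effective_sample_size(weights) >= threshold * n:
        return list(particles), normalize(weights)
    idx = systematic_resample(weights, rng)
    # After resampling, every surviving particle carries equal weight.
    return [particles[j] for j in idx], [1.0 / n] * n


# Usage: one dominant weight collapses the ESS, so resampling is triggered.
rng = random.Random(0)
particles = ["a", "b", "c", "d"]
new_particles, new_weights = maybe_resample(particles, [0.97, 0.01, 0.01, 0.01], rng)
print(new_particles, new_weights)
```

Systematic resampling is used here because it draws all N ancestors from a single uniform offset, which typically adds less Monte Carlo noise than independent multinomial draws.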
Training-Free Self-Correction for Multimodal Masked Diffusion Models

arXiv:2602.02927v1 Announce Type: new Abstract: Masked diffusion models have emerged as a powerful framework for text and multimodal generation. However, their sampling procedure updates multiple tokens simultaneously and treats generated tokens as immutable, which may lead to error accumulation when early mistakes cannot be revised. In this work,…

February 4, 2026
Precise asymptotic analysis of Sobolev training for random feature models

arXiv:2511.03050v1 Announce Type: new Abstract: Gradient information is widely useful and available in applications, and is therefore natural to include in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training: regression with both function and gradient data…

November 6, 2025
The Coverage Principle: How Pre-training Enables Post-Training

arXiv:2510.15020v1 Announce Type: new Abstract: Language models demonstrate remarkable abilities when pre-trained on large text corpora and fine-tuned for specific tasks, but how and why pre-training shapes the success of the final model remains poorly understood. Notably, although pre-training success is often quantified by cross-entropy loss, cross-entropy…

October 20, 2025
Gradient-Guided Furthest Point Sampling for Robust Training Set Selection

arXiv:2510.08906v1 Announce Type: new Abstract: Smart training set selection procedures enable the reduction of data needs and improve predictive robustness in machine learning problems relevant to chemistry. We introduce Gradient Guided Furthest Point Sampling (GGFPS), a simple extension of Furthest Point Sampling (FPS) that leverages molecular…

October 13, 2025
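GGFPS is described as an extension of Furthest Point Sampling. As a point of reference, here is a minimal sketch of plain FPS only, without the gradient guidance the paper adds; Euclidean distance over feature vectors, a fixed starting index of 0, and the function name are assumptions.

```python
import math
from typing import List, Sequence


def furthest_point_sampling(points: Sequence[Sequence[float]], k: int, start: int = 0) -> List[int]:
    """Greedily pick k indices, each maximizing its distance to the already-selected set.

    Assumes the data contain at least k distinct points.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be in 1..len(points)")
    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(n), key=lambda i: min_dist[i])
        selected.append(nxt)
        # Only the newly selected point can shrink each running minimum.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
print(furthest_point_sampling(points, 3))  # [0, 2, 3]
```

Keeping a running nearest-selected distance per point means selecting k of n points costs O(nk) distance evaluations instead of recomputing all pairwise distances at every step.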
How to Improve the Efficiency of Your PyTorch Training Loop

Learn how to diagnose and resolve bottlenecks in PyTorch using the DataLoader's num_workers and pin_memory parameters and the PyTorch profiler to maximize training performance. The post appeared first on Towards Data Science. Andrea D’Agostino

October 2, 2025
Adaptive generative moment matching networks for improved learning of dependence structures

arXiv:2508.21531v1 Announce Type: new Abstract: An adaptive bandwidth selection procedure for the mixture kernel in the maximum mean discrepancy (MMD) for fitting generative moment matching networks (GMMNs) is introduced, and its ability to improve the learning of copula random number generators is demonstrated. Based…

September 1, 2025
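The GMMN abstract refers to MMD computed with a mixture kernel. A minimal sketch of the biased (V-statistic) estimate of squared MMD with a sum of Gaussian kernels follows; it does not reproduce the paper's adaptive bandwidth selection, and the fixed bandwidths (0.1, 1, 10) are arbitrary assumptions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def mixture_kernel(a: Vector, b: Vector, bandwidths: Sequence[float]) -> float:
    """Sum of Gaussian kernels exp(-||a - b||^2 / (2 h^2)) over the bandwidths h."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return sum(math.exp(-sq / (2.0 * h * h)) for h in bandwidths)


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector], bandwidths=(0.1, 1.0, 10.0)) -> float:
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""

    def mean_k(us, vs):
        total = sum(mixture_kernel(u, v, bandwidths) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


xs = [(0.0,), (0.5,), (1.0,)]
print(mmd_squared(xs, [(0.1,), (0.6,), (1.1,)]))  # small shift, small MMD
print(mmd_squared(xs, [(3.0,), (3.5,), (4.0,)]))  # larger shift, larger MMD
```

Summing kernels at several bandwidths keeps the statistic sensitive to differences at more than one length scale, which is why choosing those bandwidths matters for training a GMMN.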
A Refined Training Recipe for Fine-Grained Visual Classification

How FGVC aims to recognize images belonging to multiple subordinate categories of a super-category. The post appeared first on Towards Data Science. Ahmed Belgacem

August 13, 2025
On Reconstructing Training Data From Bayesian Posteriors and Trained Models

arXiv:2507.18372v1 Announce Type: new Abstract: Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three…

July 25, 2025
CoVAE: Consistency Training of Variational Autoencoders

arXiv:2507.09103v1 Announce Type: new Abstract: Current state-of-the-art generative approaches frequently rely on a two-stage training procedure, where an autoencoder (often a VAE) first performs dimensionality reduction, followed by training a generative model on the learned latent space. While effective, this introduces computational overhead and increased sampling times. We challenge…

July 15, 2025
Is ML/AI engineering increasingly becoming less focused on model training and more focused on integrating LLMs to build web apps?

One thing I’ve noticed recently is that increasingly, a lot of AI/ML roles seem to be focused on ways to integrate LLMs to build web apps that automate some kind of task, e.g. chatbot with…

June 30, 2025
When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets

arXiv:2506.19031v1 Announce Type: new Abstract: While diffusion models generate high-quality images via probability flow, the theoretical understanding of this process remains incomplete. A key question is when probability flow converges to training samples or more general points on the data manifold.…

June 25, 2025
Automate Models Training: An MLOps Pipeline with Tekton and Buildpacks

A step-by-step guide to containerizing and orchestrating an ML training workflow without the Dockerfile headache, using a lightweight GPT-2 example. The post appeared first on Towards Data Science. Sylvain Kalache

June 11, 2025
Rethinking the Environmental Costs of Training AI — Why We Should Look Beyond Hardware

Summary of this study: Hardware choices (specifically hardware type and its quantity), along with training time, have a significant positive impact on energy, water, and carbon footprints during AI model training, whereas architecture-related factors do not. The interaction between…

May 14, 2025
Log Link vs Log Transformation in R — The Difference that Misleads Your Entire Data Analysis

Although normal distributions are the most commonly used, a lot of real-world data unfortunately is not normal. When faced with extremely skewed data, it’s tempting for us to utilize log transformations to normalize the distribution and stabilize the variance. I…

May 10, 2025
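The distinction in the title shows up even without regression software: a log transformation models the mean of log(y), so back-transforming returns the geometric mean, while a log link models the log of the mean of y. The sketch below uses an intercept-only model as a simplifying assumption (for an intercept-only GLM the fitted mean is the sample mean regardless of link), is written in Python rather than R, and is not taken from the article; the function names are hypothetical.

```python
import math
from statistics import fmean


def log_transform_mean(y):
    """Fit a mean to log(y), then back-transform: exp(mean(log y)), the geometric mean."""
    return math.exp(fmean(math.log(v) for v in y))


def log_link_mean(y):
    """Intercept-only GLM with a log link: log(mu) = b0, and the fitted mu is the sample mean."""
    return fmean(y)


y = [1.0, 10.0, 100.0]  # strongly right-skewed toy data
print(log_transform_mean(y))  # about 10.0
print(log_link_mean(y))       # 37.0
```

By Jensen's inequality, exp(mean(log y)) never exceeds mean(y), so the back-transformed estimate understates the arithmetic mean whenever the values are not all equal.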
Backdoor Detection through Replicated Execution of Outsourced Training

arXiv:2504.00170v1 Announce Type: cross Abstract: It is common practice to outsource the training of machine learning models to cloud providers. Clients who do so gain from the cloud’s economies of scale, but implicitly assume trust: the server should not deviate from the client’s training procedure. A malicious…

April 2, 2025
Optimizing ML Training with Metagradient Descent

arXiv:2503.13751v1 Announce Type: new Abstract: A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an…

March 19, 2025
Unraveling Large Language Model Hallucinations

In a YouTube video titled Deep Dive into LLMs like ChatGPT, Andrej Karpathy, former Senior Director of AI at Tesla, discusses the psychology of Large Language Models (LLMs) as emergent cognitive effects of the training pipeline. This article is inspired by his explanation of LLM hallucinations and the information presented in the…

March 1, 2025
Debugging the Dreaded NaN

You are training your latest AI model, anxiously watching as the loss steadily decreases when suddenly, boom! Your logs are flooded with NaNs (Not a Number): your model is irreparably corrupted and you’re left staring at your screen in despair. To make matters worse, the NaNs don’t appear consistently.…

February 28, 2025
How LLMs Work: Pre-Training to Post-Training, Neural Networks, Hallucinations, and Inference

With the recent explosion of interest in large language models (LLMs), they often seem almost magical. But let’s demystify them. I wanted to step back and unpack the fundamentals, breaking down how LLMs are built, trained, and fine-tuned to become the AI systems we interact…

February 19, 2025
Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training

arXiv:2502.10793v1 Announce Type: new Abstract: Existing methods for measuring training sample influence on models only provide static, overall measurements, overlooking how sample influence changes during training. We propose Dynamic Influence Tracker (DIT), which captures the time-varying sample influence across arbitrary time windows during training. DIT offers…

February 18, 2025
Learnings from a Machine Learning Engineer — Part 5: The Training

In this fifth part of my series, I will outline the steps for creating a Docker container for training your image classification model, evaluating performance, and preparing for deployment. AI/ML engineers would prefer to focus on model training and data engineering, but the reality…

February 14, 2025
Training Large Language Models: From TRPO to GRPO

DeepSeek has recently made quite a buzz in the AI community, thanks to its impressive performance at relatively low cost. I think this is a perfect opportunity to dive deeper into how Large Language Models (LLMs) are trained. In this article, we will focus on the Reinforcement Learning…

February 6, 2025
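GRPO, the endpoint named in the title, drops the learned value network (critic) used by PPO-style training and instead scores each sampled completion against the other completions for the same prompt. A minimal sketch of that group-relative advantage follows, assuming the population standard deviation and a small eps for numerical safety (implementations differ on both); it deliberately omits the clipped policy-ratio objective and the KL penalty, and the function name is hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Advantage of each completion = (reward - group mean) / (group std + eps)."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)  # population std: an assumption; some code uses the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # close to [1, -1, -1, 1]
print(group_relative_advantages([2.0, 2.0, 2.0]))       # [0.0, 0.0, 0.0]
```

When every completion in a group receives the same reward, all advantages are zero, so that prompt contributes no policy-gradient signal in this step.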
Distributionally Robust Coreset Selection under Covariate Shift

arXiv:2501.14253v1 Announce Type: new Abstract: Coreset selection, which involves selecting a small subset from an existing training dataset, is one approach to reducing training data, and various methods have been proposed for it. In practical situations where these methods are employed, it is often the case that…

January 27, 2025
Understanding the Evolution of ChatGPT: Part 1—An In-Depth Look at GPT-1 and What Inspired It

Tracing the roots of ChatGPT: GPT-1, the foundation of OpenAI’s LLMs. The GPT (Generative Pre-Training) model family, first introduced by OpenAI in 2018, is another important application of the Transformer architecture. It has since evolved through versions like…

January 8, 2025
Data Valuation — A Concise Overview

Understanding the Value of Your Data: Challenges, Methods, and Applications. ChatGPT and similar LLMs were trained on insane amounts of data. OpenAI and Co. scraped the internet, collecting books, articles, and social media posts to train their models. It’s easy to imagine that some of the texts (like scientific or news…

December 16, 2024
Phase-aware Training Schedule Simplifies Learning in Flow-Based Generative Models

arXiv:2412.07972v1 Announce Type: cross Abstract: We analyze the training of a two-layer autoencoder used to parameterize a flow-based generative model for sampling from a high-dimensional Gaussian mixture. Previous work shows that the phase where the relative probability between the modes is learned disappears as the dimension…

December 12, 2024
Training-Free Bayesianization for Low-Rank Adapters of Large Language Models

arXiv:2412.05723v1 Announce Type: new Abstract: Estimating the uncertainty of responses of Large Language Models (LLMs) remains a critical challenge. While recent Bayesian methods have demonstrated effectiveness in quantifying uncertainty through low-rank weight updates, they typically require complex fine-tuning or post-training procedures. In this paper, we propose Training-Free…

December 10, 2024
why not do training?

https://www.abc.net.au/news/2024-11-17/solar-flooded-australia-told-its-okay-to-waste-some/104606640 during periods of “excess” power, train gpt x, alphafold y, and so on… batteries not required

December 3, 2024
Smaller is smarter

Concerns about the environmental impacts of Large Language Models (LLMs) are growing. Although detailed information about the actual costs of LLMs can be difficult to find, let’s attempt to gather some facts to understand the scale. Since comprehensive data on ChatGPT-4 is not readily available, we can consider Llama 3.1…

December 2, 2024