Tag: training

  • TFTF: Training-Free Targeted Flow for Conditional Sampling

    arXiv:2602.12932v1 Announce Type: new Abstract: We propose a training-free conditional sampling method for flow matching models based on importance sampling. Because a naïve application of importance sampling suffers from weight degeneracy in high-dimensional settings, we modify and incorporate a resampling technique in sequential Monte Carlo (SMC) during intermediate…

  • Training-Free Self-Correction for Multimodal Masked Diffusion Models

    arXiv:2602.02927v1 Announce Type: new Abstract: Masked diffusion models have emerged as a powerful framework for text and multimodal generation. However, their sampling procedure updates multiple tokens simultaneously and treats generated tokens as immutable, which may lead to error accumulation when early mistakes cannot be revised. In this work,…

  • Precise asymptotic analysis of Sobolev training for random feature models

    arXiv:2511.03050v1 Announce Type: new Abstract: Gradient information is widely useful and available in applications, and is therefore natural to include in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training — regression with both function and gradient data…

  • The Coverage Principle: How Pre-training Enables Post-Training

    arXiv:2510.15020v1 Announce Type: new Abstract: Language models demonstrate remarkable abilities when pre-trained on large text corpora and fine-tuned for specific tasks, but how and why pre-training shapes the success of the final model remains poorly understood. Notably, although pre-training success is often quantified by cross-entropy loss, cross-entropy…

  • Gradient-Guided Furthest Point Sampling for Robust Training Set Selection

    arXiv:2510.08906v1 Announce Type: new Abstract: Smart training set selection procedures enable the reduction of data needs and improve predictive robustness in machine learning problems relevant to chemistry. We introduce Gradient Guided Furthest Point Sampling (GGFPS), a simple extension of Furthest Point Sampling (FPS) that leverages molecular…

  • How to Improve the Efficiency of Your PyTorch Training Loop

    Learn how to diagnose and resolve bottlenecks in PyTorch using the num_workers and pin_memory parameters and the profiler to maximize training performance. Appeared first on Towards Data Science. Andrea D’Agostino

  • Adaptive generative moment matching networks for improved learning of dependence structures

    arXiv:2508.21531v1 Announce Type: new Abstract: An adaptive bandwidth selection procedure for the mixture kernel in the maximum mean discrepancy (MMD) for fitting generative moment matching networks (GMMNs) is introduced, and its ability to improve the learning of copula random number generators is demonstrated. Based…

  • A Refined Training Recipe for Fine-Grained Visual Classification

    How FGVC aims to recognize images belonging to multiple subordinate categories of a super-category. Appeared first on Towards Data Science. Ahmed Belgacem

  • On Reconstructing Training Data From Bayesian Posteriors and Trained Models

    arXiv:2507.18372v1 Announce Type: new Abstract: Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three…

  • CoVAE: Consistency Training of Variational Autoencoders

    arXiv:2507.09103v1 Announce Type: new Abstract: Current state-of-the-art generative approaches frequently rely on a two-stage training procedure, where an autoencoder (often a VAE) first performs dimensionality reduction, followed by training a generative model on the learned latent space. While effective, this introduces computational overhead and increased sampling times. We challenge…

  • Is ML/AI engineering increasingly becoming less focused on model training and more focused on integrating LLMs to build web apps?

    One thing I’ve noticed recently is that a lot of AI/ML roles increasingly seem to be focused on ways to integrate LLMs to build web apps that automate some kind of task, e.g. chatbot with…

  • When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets

    arXiv:2506.19031v1 Announce Type: new Abstract: While diffusion models generate high-quality images via probability flow, the theoretical understanding of this process remains incomplete. A key question is when probability flow converges to training samples or more general points on the data manifold.…

  • Automate Models Training: An MLOps Pipeline with Tekton and Buildpacks

    A step-by-step guide to containerizing and orchestrating an ML training workflow without the Dockerfile headache, using a lightweight GPT-2 example. Appeared first on Towards Data Science. Sylvain Kalache

  • Rethinking the Environmental Costs of Training AI — Why We Should Look Beyond Hardware

    Summary of This Study: Hardware choices – specifically hardware type and its quantity – along with training time, have a significant positive impact on energy, water, and carbon footprints during AI model training, whereas architecture-related factors do not. The interaction between…

  • Log Link vs Log Transformation in R — The Difference that Misleads Your Entire Data Analysis

    Although normal distributions are the most commonly used, a lot of real-world data unfortunately is not normal. When faced with extremely skewed data, it’s tempting to use log transformations to normalize the distribution and stabilize the variance. I…

  • Backdoor Detection through Replicated Execution of Outsourced Training

    arXiv:2504.00170v1 Announce Type: cross Abstract: It is common practice to outsource the training of machine learning models to cloud providers. Clients who do so gain from the cloud’s economies of scale, but implicitly assume trust: the server should not deviate from the client’s training procedure. A malicious…

  • Optimizing ML Training with Metagradient Descent

    arXiv:2503.13751v1 Announce Type: new Abstract: A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an…

  • Unraveling Large Language Model Hallucinations

    Introduction: In a YouTube video titled “Deep Dive into LLMs like ChatGPT”, Andrej Karpathy, former Senior Director of AI at Tesla, discusses the psychology of Large Language Models (LLMs) as emergent cognitive effects of the training pipeline. This article is inspired by his explanation of LLM hallucinations and the information presented in the…

  • Debugging the Dreaded NaN

    You are training your latest AI model, anxiously watching as the loss steadily decreases when suddenly — boom! Your logs are flooded with NaNs (Not a Number) — your model is irreparably corrupted and you’re left staring at your screen in despair. To make matters worse, the NaNs don’t appear consistently.…

  • How LLMs Work: Pre-Training to Post-Training, Neural Networks, Hallucinations, and Inference

    With the recent explosion of interest in large language models (LLMs), they often seem almost magical. But let’s demystify them. I wanted to step back and unpack the fundamentals — breaking down how LLMs are built, trained, and fine-tuned to become the AI systems we interact…

  • Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training

    arXiv:2502.10793v1 Announce Type: new Abstract: Existing methods for measuring training sample influence on models only provide static, overall measurements, overlooking how sample influence changes during training. We propose Dynamic Influence Tracker (DIT), which captures the time-varying sample influence across arbitrary time windows during training. DIT offers…

  • Learnings from a Machine Learning Engineer — Part 5: The Training

    In this fifth part of my series, I will outline the steps for creating a Docker container for training your image classification model, evaluating performance, and preparing for deployment. AI/ML engineers would prefer to focus on model training and data engineering, but the reality…

  • Training Large Language Models: From TRPO to GRPO

    DeepSeek has recently created quite a buzz in the AI community, thanks to its impressive performance at relatively low cost. I think this is a perfect opportunity to dive deeper into how Large Language Models (LLMs) are trained. In this article, we will focus on the Reinforcement Learning…

  • Distributionally Robust Coreset Selection under Covariate Shift

    arXiv:2501.14253v1 Announce Type: new Abstract: Coreset selection, which involves selecting a small subset from an existing training dataset, is an approach to reducing training data, and various approaches have been proposed for this method. In practical situations where these methods are employed, it is often the case that…

  • Understanding the Evolution of ChatGPT: Part 1—An In-Depth Look at GPT-1 and What Inspired It

    Tracing the roots of ChatGPT: GPT-1, the foundation of OpenAI’s LLMs. The GPT (Generative Pre-Training) model family, first introduced by OpenAI in 2018, is another important application of the Transformer architecture. It has since evolved through versions like…

  • Data Valuation — A Concise Overview

    Understanding the Value of your Data: Challenges, Methods, and Applications. ChatGPT and similar LLMs were trained on insane amounts of data. OpenAI and Co. scraped the internet, collecting books, articles, and social media posts to train their models. It’s easy to imagine that some of the texts (like scientific or news…

  • Phase-aware Training Schedule Simplifies Learning in Flow-Based Generative Models

    arXiv:2412.07972v1 Announce Type: cross Abstract: We analyze the training of a two-layer autoencoder used to parameterize a flow-based generative model for sampling from a high-dimensional Gaussian mixture. Previous work shows that the phase where the relative probability between the modes is learned disappears as the dimension…

  • Training-Free Bayesianization for Low-Rank Adapters of Large Language Models

    arXiv:2412.05723v1 Announce Type: new Abstract: Estimating the uncertainty of responses of Large Language Models (LLMs) remains a critical challenge. While recent Bayesian methods have demonstrated effectiveness in quantifying uncertainty through low-rank weight updates, they typically require complex fine-tuning or post-training procedures. In this paper, we propose Training-Free…

  • why not do training?

    https://www.abc.net.au/news/2024-11-17/solar-flooded-australia-told-its-okay-to-waste-some/104606640 during periods of “excess” power, train GPT x, AlphaFold y, and so on… batteries not required

  • Smaller is smarter

    Concerns about the environmental impacts of Large Language Models (LLMs) are growing. Although detailed information about the actual costs of LLMs can be difficult to find, let’s attempt to gather some facts to understand the scale. Since comprehensive data on ChatGPT-4 is not readily available, we can consider Llama 3.1…