Tag: models

  • Selecting Optimal Variable Order in Autoregressive Ising Models

    Selecting Optimal Variable Order in Autoregressive Ising Models arXiv:2602.20394v1 Announce Type: new Abstract: Autoregressive models enable tractable sampling from learned probability distributions, but their performance critically depends on the variable ordering used in the factorization via complexities of the resulting conditional distributions. We propose to learn the Markov random field describing the underlying data, and…

  • Optimizing Deep Learning Models with SAM

    Optimizing Deep Learning Models with SAM A deep dive into the Sharpness-Aware Minimization (SAM) algorithm and how it improves the generalizability of modern deep learning models. The post Optimizing Deep Learning Models with SAM appeared first on Towards Data Science. Anindya Dey

  • Best technique for training models on a sample of data?

    Best technique for training models on a sample of data? Due to memory limits on my work computer I’m unable to train machine learning models on our entire analysis dataset. Given my data is highly imbalanced I’m under-sampling from the majority class of the binary outcome. What is the proper method to train ML models…

  • Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models

    Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models arXiv:2601.22336v1 Announce Type: new Abstract: Large-scale AI evaluation increasingly relies on aggregating binary judgments from $K$ annotators, including LLMs used as judges. Most classical methods, e.g., Dawid-Skene or (weighted) majority voting, assume annotators are conditionally independent given the true label $Y \in \{0,1\}$, an assumption often violated by LLM…

  • A Diffusive Classification Loss for Learning Energy-based Generative Models

    A Diffusive Classification Loss for Learning Energy-based Generative Models arXiv:2601.21025v1 Announce Type: new Abstract: Score-based generative models have recently achieved remarkable success. While they are usually parameterized by the score, an alternative way is to use a series of time-dependent energy-based models (EBMs), where the score is obtained from the negative input-gradient of the energy.…

  • Towards Latent Diffusion Suitable For Text

    Towards Latent Diffusion Suitable For Text arXiv:2601.16220v1 Announce Type: cross Abstract: Language diffusion models aim to improve sampling speed and coherence over autoregressive LLMs. We introduce Neural Flow Diffusion Models for language generation, an extension of NFDM that enables the straightforward application of continuous diffusion models to discrete state spaces. NFDM learns a multivariate forward…

  • SAM 3 vs. Specialist Models — A Performance Benchmark

    SAM 3 vs. Specialist Models — A Performance Benchmark Why specialized models still hold the 30x speed advantage in production environments. The post SAM 3 vs. Specialist Models — A Performance Benchmark appeared first on Towards Data Science. Pushpak Bhoge

  • Coarsening Causal DAG Models

    Coarsening Causal DAG Models arXiv:2601.10531v1 Announce Type: new Abstract: Directed acyclic graphical (DAG) models are a powerful tool for representing causal relationships among jointly distributed random variables, especially concerning data from across different experimental settings. However, it is not always practical or desirable to estimate a causal model at the granularity of given features in…

  • Evidence Slopes and Effective Dimension in Singular Linear Models

    Evidence Slopes and Effective Dimension in Singular Linear Models arXiv:2601.01238v1 Announce Type: new Abstract: Bayesian model selection commonly relies on Laplace approximation or the Bayesian Information Criterion (BIC), which assume that the effective model dimension equals the number of parameters. Singular learning theory replaces this assumption with the real log canonical threshold (RLCT), an effective…

  • A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue

    A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue arXiv:2512.22282v1 Announce Type: new Abstract: Across fields such as machine learning, social science, and geography, considerable attention has been given to models that factorize a nonnegative matrix into the product of two or three matrices, subject to nonnegative or row-sum-to-1…

  • Thermodynamic Characterizations of Singular Bayesian Models: Specific Heat, Susceptibility, and Entropy Flow in Posterior Geometry

    Thermodynamic Characterizations of Singular Bayesian Models: Specific Heat, Susceptibility, and Entropy Flow in Posterior Geometry arXiv:2512.21411v1 Announce Type: cross Abstract: Singular learning theory (SLT) (Watanabe, 2009, 2018) provides a rigorous asymptotic framework for Bayesian models with non-identifiable parameterizations, yet the statistical meaning of its second-order invariant, the singular fluctuation, has remained unclear. In this work, we show…

  • Diffusion Models in Simulation-Based Inference: A Tutorial Review

    Diffusion Models in Simulation-Based Inference: A Tutorial Review arXiv:2512.20685v1 Announce Type: new Abstract: Diffusion models have recently emerged as powerful learners for simulation-based inference (SBI), enabling fast and accurate estimation of latent parameters from simulated and real data. Their score-based formulation offers a flexible way to learn conditional or joint distributions over parameters and observations,…

  • Enhancing diffusion models with Gaussianization preprocessing

    Enhancing diffusion models with Gaussianization preprocessing arXiv:2512.21020v1 Announce Type: new Abstract: Diffusion models are a class of generative models that have demonstrated remarkable success in tasks such as image generation. However, one of the bottlenecks of these models is slow sampling due to the delay before the onset of trajectory bifurcation, at which point substantial…

  • Cluster-Based Generalized Additive Models Informed by Random Fourier Features

    Cluster-Based Generalized Additive Models Informed by Random Fourier Features arXiv:2512.19373v1 Announce Type: new Abstract: Explainable machine learning aims to strike a balance between prediction accuracy and model transparency, particularly in settings where black-box predictive models, such as deep neural networks or kernel-based methods, achieve strong empirical performance but remain difficult to interpret. This work introduces…

  • Has anyone tried training models on raw discussions instead of curated datasets?

    Has anyone tried training models on raw discussions instead of curated datasets? I’ve always followed the usual advice when training models, like clean the data, normalize everything, remove noise, structure it nicely. Recently I tried something different. Instead of polished datasets, I fed models long, messy discussion threads, real conversations, people arguing, correcting themselves, misunderstanding…

  • On the Challenge of Converting TensorFlow Models to PyTorch

    On the Challenge of Converting TensorFlow Models to PyTorch How to upgrade and optimize legacy AI/ML models. The post On the Challenge of Converting TensorFlow Models to PyTorch appeared first on Towards Data Science. Chaim Rand

  • A note on the impossibility of conditional PAC-efficient reasoning in large language models

    A note on the impossibility of conditional PAC-efficient reasoning in large language models arXiv:2512.03057v1 Announce Type: new Abstract: We prove an impossibility result for conditional Probably Approximately Correct (PAC)-efficient reasoning in large language models. While recent work has established marginal PAC efficiency guarantees for composite models that switch between expensive expert models and cheaper fast…

  • LLM-as-a-Judge: What It Is, Why It Works, and How to Use It to Evaluate AI Models

    LLM-as-a-Judge: What It Is, Why It Works, and How to Use It to Evaluate AI Models A step-by-step guide to building AI quality control using large language models. The post LLM-as-a-Judge: What It Is, Why It Works, and How to Use It to Evaluate AI Models appeared first on Towards Data Science. Piero Paialunga Go…

  • Gradient flow for deep equilibrium single-index models

    Gradient flow for deep equilibrium single-index models arXiv:2511.16976v1 Announce Type: cross Abstract: Deep equilibrium models (DEQs) have recently emerged as a powerful paradigm for training infinitely deep weight-tied neural networks that achieve state of the art performance across many modern machine learning tasks. Despite their practical success, theoretically understanding the gradient descent dynamics for training…

  • How Relevance Models Foreshadowed Transformers for NLP

    How Relevance Models Foreshadowed Transformers for NLP Tracing the history of LLM attention: standing on the shoulders of giants. The post How Relevance Models Foreshadowed Transformers for NLP appeared first on Towards Data Science. Sean Moran

  • Splat Regression Models

    Splat Regression Models arXiv:2511.14042v1 Announce Type: new Abstract: We introduce a highly expressive class of function approximators called Splat Regression Models. Model outputs are mixtures of heterogeneous and anisotropic bump functions, termed splats, each weighted by an output vector. The power of splat modeling lies in its ability to locally adjust the scale and direction…

  • Decomposing Direct and Indirect Biases in Linear Models under Demographic Parity Constraint

    Decomposing Direct and Indirect Biases in Linear Models under Demographic Parity Constraint arXiv:2511.11294v1 Announce Type: new Abstract: Linear models are widely used in high-stakes decision-making due to their simplicity and interpretability. Yet when fairness constraints such as demographic parity are introduced, their effects on model coefficients, and thus on how predictive bias is distributed across…

  • Fast Riemannian-manifold Hamiltonian Monte Carlo for hierarchical Gaussian-process models

    Fast Riemannian-manifold Hamiltonian Monte Carlo for hierarchical Gaussian-process models arXiv:2511.06407v1 Announce Type: new Abstract: Hierarchical Bayesian models based on Gaussian processes are considered useful for describing complex nonlinear statistical dependencies among variables in real-world data. However, effective Monte Carlo algorithms for inference with these models have not yet been established, except for several simple cases.…

  • Precise asymptotic analysis of Sobolev training for random feature models

    Precise asymptotic analysis of Sobolev training for random feature models arXiv:2511.03050v1 Announce Type: new Abstract: Gradient information is widely useful and available in applications, and is therefore natural to include in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training — regression with both function and gradient data…

  • Provable Separations between Memorization and Generalization in Diffusion Models

    Provable Separations between Memorization and Generalization in Diffusion Models arXiv:2511.03202v1 Announce Type: new Abstract: Diffusion models have achieved remarkable success across diverse domains, but they remain vulnerable to memorization — reproducing training data rather than generating novel outputs. This not only limits their creative potential but also raises concerns about privacy and safety. While empirical…

  • Why Nonparametric Models Deserve a Second Look

    Why Nonparametric Models Deserve a Second Look Discover how nonparametric conditional distributions unify regression, classification, and synthetic data generation—without assuming functional forms. The post Why Nonparametric Models Deserve a Second Look appeared first on Towards Data Science. Andrew Skabar

  • Gradient Boosted Mixed Models: Flexible Joint Estimation of Mean and Variance Components for Clustered Data

    Gradient Boosted Mixed Models: Flexible Joint Estimation of Mean and Variance Components for Clustered Data arXiv:2511.00217v1 Announce Type: new Abstract: Linear mixed models are widely used for clustered data, but their reliance on parametric forms limits flexibility in complex and high-dimensional settings. In contrast, gradient boosting methods achieve high predictive accuracy through nonparametric estimation, but…

  • Generative Bayesian Optimization: Generative Models as Acquisition Functions

    Generative Bayesian Optimization: Generative Models as Acquisition Functions arXiv:2510.25240v1 Announce Type: new Abstract: We present a general strategy for turning generative models into candidate solution samplers for batch Bayesian optimization (BO). The use of generative models for BO enables large batch scaling as generative sampling, optimization of non-continuous design spaces, and high-dimensional and combinatorial design.…

  • How to Apply Powerful AI Audio Models to Real-World Applications

    How to Apply Powerful AI Audio Models to Real-World Applications Learn about different types of AI audio models and the application areas they can be used in. The post How to Apply Powerful AI Audio Models to Real-World Applications appeared first on Towards Data Science. Eivind Kjosbakken

  • Simplicial Gaussian Models: Representation and Inference

    Simplicial Gaussian Models: Representation and Inference arXiv:2510.12983v1 Announce Type: new Abstract: Probabilistic graphical models (PGMs) are powerful tools for representing statistical dependencies through graphs in high-dimensional systems. However, they are limited to pairwise interactions. In this work, we propose the simplicial Gaussian model (SGM), which extends Gaussian PGM to simplicial complexes. SGM jointly models random…

  • Calibrating Generative Models

    Calibrating Generative Models arXiv:2510.10020v1 Announce Type: new Abstract: Generative models frequently suffer miscalibration, wherein class probabilities and other statistics of the sampling distribution deviate from desired values. We frame calibration as a constrained optimization problem and seek the closest model in Kullback-Leibler divergence satisfying calibration constraints. To address the intractability of imposing these constraints exactly,…

  • What could be my next career progression?

    What could be my next career progression? Hello, I’m 26 years old and have been working as a junior data scientist in marketing for the past two years, and I’m a bit bored/have no idea how to progress further in my career. Currently I do end to end modeling, from gathering data up to production (not…

  • Are Foundation Models Ready for Your Production Tabular Data?

    Are Foundation Models Ready for Your Production Tabular Data? A complete review of architectures to make zero-shot predictions in the most common types of datasets. The post Are Foundation Models Ready for Your Production Tabular Data? appeared first on Towards Data Science. Carmen Adriana Martínez Barbosa

  • Using Vision Language Models to Process Millions of Documents

    Using Vision Language Models to Process Millions of Documents Learn how to effectively apply vision language models to problem solving. The post Using Vision Language Models to Process Millions of Documents appeared first on Towards Data Science. Eivind Kjosbakken

  • Physics-Informed Regression: Parameter Estimation in Parameter-Linear Nonlinear Dynamic Models

    Physics-Informed Regression: Parameter Estimation in Parameter-Linear Nonlinear Dynamic Models arXiv:2508.19249v1 Announce Type: cross Abstract: We present a new efficient hybrid parameter estimation method based on the idea that if nonlinear dynamic models are stated in terms of a system of equations that is linear in terms of the parameters, then regularized ordinary least squares can…

  • Three Essential Hyperparameter Tuning Techniques for Better Machine Learning Models

    Three Essential Hyperparameter Tuning Techniques for Better Machine Learning Models Learn how to optimize your ML models for better results. The post Three Essential Hyperparameter Tuning Techniques for Better Machine Learning Models appeared first on Towards Data Science. Rukshan Pramoditha

  • ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization

    ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization arXiv:2508.11551v1 Announce Type: new Abstract: Determining the optimal data mixture for large language model training remains a challenging problem with an outsized impact on performance. In practice, language model developers continue to rely on heuristic exploration since no learning-based approach has emerged as a…

  • When Models Stop Listening: How Feature Collapse Quietly Erodes Machine Learning Systems

    When Models Stop Listening: How Feature Collapse Quietly Erodes Machine Learning Systems Models don’t just fail with noise; they fail in silence, by narrowing their attention to the point of fragility. The post When Models Stop Listening: How Feature Collapse Quietly Erodes Machine Learning Systems appeared first on Towards Data Science. Mahe Jabeen Abdul Go…

  • Diffusion Models for Time Series Forecasting: A Survey

    Diffusion Models for Time Series Forecasting: A Survey arXiv:2507.14507v1 Announce Type: new Abstract: Diffusion models, initially developed for image synthesis, demonstrate remarkable generative capabilities. Recently, their application has expanded to time series forecasting (TSF), yielding promising results. In this survey, we firstly introduce the standard diffusion models and their prevalent variants, explaining their adaptation to…

  • When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values

    When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values arXiv:2507.13024v1 Announce Type: new Abstract: Predicting a response with partially missing inputs remains a challenging task even in parametric models, since parameter estimation in itself is not sufficient to predict on partially observed inputs. Several works study prediction in linear models. In…

  • Fast Gaussian Processes under Monotonicity Constraints

    Fast Gaussian Processes under Monotonicity Constraints arXiv:2507.06677v1 Announce Type: new Abstract: Gaussian processes (GPs) are widely used as surrogate models for complicated functions in scientific and engineering applications. In many cases, prior knowledge about the function to be approximated, such as monotonicity, is available and can be leveraged to improve model fidelity. Incorporating such constraints…

  • A Malliavin calculus approach to score functions in diffusion generative models

    A Malliavin calculus approach to score functions in diffusion generative models arXiv:2507.05550v1 Announce Type: new Abstract: Score-based diffusion generative models have recently emerged as a powerful tool for modelling complex data distributions. These models aim at learning the score function, which defines a map from a known probability distribution to the target data distribution via…

  • How to Fine-Tune Small Language Models to Think with Reinforcement Learning

    How to Fine-Tune Small Language Models to Think with Reinforcement Learning A visual tour and from-scratch guide to train GRPO reasoning models in PyTorch. The post How to Fine-Tune Small Language Models to Think with Reinforcement Learning appeared first on Towards Data Science. Avishek Biswas

  • Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis

    Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis arXiv:2507.03756v1 Announce Type: new Abstract: The success of denoising diffusion models raises important questions regarding their generalisation behaviour, particularly in high-dimensional settings. Notably, it has been shown that when training and sampling are performed perfectly, these models memorise training data — implying that some form of…

  • Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles

    Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles arXiv:2507.01542v1 Announce Type: new Abstract: Gaussian mixture models (GMMs) are ubiquitous in statistical learning, particularly for unsupervised problems. While full GMMs suffer from the overparameterization of their covariance matrices in high-dimensional spaces, spherical GMMs (with isotropic covariance matrices) certainly lack flexibility to fit certain anisotropic distributions. Connecting…

  • TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics

    TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics arXiv:2506.21757v1 Announce Type: new Abstract: Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images but typically suffer from inefficient sampling. Many solver designs and noise scheduling strategies have been proposed to dramatically improve sampling speeds. In this paper, we introduce a new sampling method that is…

  • Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

    Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives arXiv:2506.20114v1 Announce Type: new Abstract: Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an…

  • Know What You Don’t Know: Uncertainty Calibration of Process Reward Models

    Know What You Don’t Know: Uncertainty Calibration of Process Reward Models arXiv:2506.09338v1 Announce Type: new Abstract: Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poorly calibrated and often overestimate success probabilities. To address this, we present…

  • Automate Models Training: An MLOps Pipeline with Tekton and Buildpacks

    Automate Models Training: An MLOps Pipeline with Tekton and Buildpacks A step-by-step guide to containerizing and orchestrating an ML training workflow without the Dockerfile headache, using a lightweight GPT-2 example. The post Automate Models Training: An MLOps Pipeline with Tekton and Buildpacks appeared first on Towards Data Science. Sylvain Kalache

  • Continuous Semi-Implicit Models

    Continuous Semi-Implicit Models arXiv:2506.06778v1 Announce Type: new Abstract: Semi-implicit distributions have shown great promise in variational inference and generative modeling. Hierarchical semi-implicit models, which stack multiple semi-implicit layers, enhance the expressiveness of semi-implicit distributions and can be used to accelerate diffusion models given pretrained score networks. However, their sequential training often suffers from slow convergence.…

  • Bayesian Data Sketching for Varying Coefficient Regression Models

    Bayesian Data Sketching for Varying Coefficient Regression Models arXiv:2506.00270v1 Announce Type: new Abstract: Varying coefficient models are popular for estimating nonlinear regression functions in functional data models. Their Bayesian variants have received limited attention in large data applications, primarily due to prohibitively slow posterior computations using Markov chain Monte Carlo (MCMC) algorithms. We introduce Bayesian…

  • How To Build a Benchmark for Your Models

    How To Build a Benchmark for Your Models I’ve been working as a data science consultant for the past three years, and I’ve had the opportunity to work on multiple projects across various industries. Yet, I noticed one common denominator among most of the clients I worked with: They rarely have a clear idea of…

  • Strength in Numbers: Ensembling Models with Bagging and Boosting

    Strength in Numbers: Ensembling Models with Bagging and Boosting Bagging and boosting are two powerful ensemble techniques in machine learning – they are must-knows for data scientists! After reading this article, you are going to have a solid understanding of how bagging and boosting work and when to use them. We’ll cover the following topics,…

  • Time Series Forecasting Made Simple (Part 2): Customizing Baseline Models

    Time Series Forecasting Made Simple (Part 2): Customizing Baseline Models Thank you for the kind response to Part 1, it’s been encouraging to see so many readers interested in time series forecasting. In Part 1 of this series, we broke down time series data into trend, seasonality, and noise, discussed when to use additive versus…

  • Boosting Statistic Learning with Synthetic Data from Pretrained Large Models

    Boosting Statistic Learning with Synthetic Data from Pretrained Large Models arXiv:2505.04992v1 Announce Type: new Abstract: The rapid advancement of generative models, such as Stable Diffusion, raises a key question: how can synthetic data from these models enhance predictive modeling? While they can generate vast amounts of datasets, only a subset meaningfully improves performance. We propose…

  • Modeling Spatial Extremes using Non-Gaussian Spatial Autoregressive Models via Convolutional Neural Networks

    Modeling Spatial Extremes using Non-Gaussian Spatial Autoregressive Models via Convolutional Neural Networks arXiv:2505.03034v1 Announce Type: new Abstract: Data derived from remote sensing or numerical simulations often have a regular gridded structure and are large in volume, making it challenging to find accurate spatial models that can fill in missing grid cells or simulate the process…

  • Diffusion Models, Explained Simply

    Diffusion Models, Explained Simply Introduction: Generative AI is one of the most popular terms we hear today. Recently, there has been a surge in generative AI applications involving text, image, audio, and video generation. When it comes to image creation, Diffusion models have emerged as a state-of-the-art technique for content generation. Although they were first introduced…

  • Provable Efficiency of Guidance in Diffusion Models for General Data Distribution

    Provable Efficiency of Guidance in Diffusion Models for General Data Distribution arXiv:2505.01382v1 Announce Type: new Abstract: Diffusion models have emerged as a powerful framework for generative modeling, with guidance techniques playing a crucial role in enhancing sample quality. Despite their empirical success, a comprehensive theoretical understanding of the guidance effect remains limited. Existing studies only…

  • Decoding Latent Spaces: Assessing the Interpretability of Time Series Foundation Models for Visual Analytics

    Decoding Latent Spaces: Assessing the Interpretability of Time Series Foundation Models for Visual Analytics arXiv:2504.20099v1 Announce Type: cross Abstract: The present study explores the interpretability of latent spaces produced by time series foundation models, focusing on their potential for visual analysis tasks. Specifically, we evaluate the MOMENT family of models, a set of transformer-based, pre-trained…

  • Choose the Right One: Evaluating Topic Models for Business Intelligence

    Choose the Right One: Evaluating Topic Models for Business Intelligence Topic models are used in businesses to classify brand-related text datasets (such as product and site reviews, surveys, and social media comments) and to track how customer satisfaction metrics change over time. There is a myriad of recent topic models one can choose from: the…

  • Towards Accurate Forecasting of Renewable Energy : Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France

    Towards Accurate Forecasting of Renewable Energy : Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France arXiv:2504.16100v1 Announce Type: cross Abstract: Accurate prediction of non-dispatchable renewable energy sources is essential for grid stability and price prediction. Regional power supply forecasts are usually indirect through a bottom-up approach of plant-level forecasts,…

  • How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals

    How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals The recent launch of the DeepSeek-R1 model sent ripples across the global AI community. It delivered breakthroughs on par with the reasoning models from Meta and OpenAI, achieving this in a fraction of the time and at a significantly lower cost. Beyond…

  • Transfer Learning for High-dimensional Reduced Rank Time Series Models

    Transfer Learning for High-dimensional Reduced Rank Time Series Models arXiv:2504.15691v1 Announce Type: new Abstract: The objective of transfer learning is to enhance estimation and inference in a target data by leveraging knowledge gained from additional sources. Recent studies have explored transfer learning for independent observations in complex, high-dimensional models assuming sparsity, yet research on time…

  • Retrieval Augmented Generation (RAG) — An Introduction

    Retrieval Augmented Generation (RAG) — An Introduction The model hallucinated! It was giving me OK answers and then it just started hallucinating. We’ve all heard or experienced it. Natural Language Generation models can sometimes hallucinate, i.e., they start generating text that is not quite accurate for the prompt provided. In layman’s terms, they start making…

  • Towards Interpretable Deep Generative Models via Causal Representation Learning

    Towards Interpretable Deep Generative Models via Causal Representation Learning arXiv:2504.11609v1 Announce Type: new Abstract: Recent developments in generative artificial intelligence (AI) rely on machine learning techniques such as deep learning and generative modeling to achieve state-of-the-art performance across wide-ranging domains. These methods’ surprising performance is due in part to their ability to learn implicit “representations”…

  • AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse

    AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse arXiv:2504.10540v1 Announce Type: new Abstract: Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference, limiting their practicality. While existing acceleration methods exploit the well-known U-shaped similarity pattern between adjacent steps through caching mechanisms, they lack…

  • Circuit Tracing: A Step Closer to Understanding Large Language Models

    Circuit Tracing: A Step Closer to Understanding Large Language Models Context: Over the years, Transformer-based large language models (LLMs) have made substantial progress across a wide range of tasks, evolving from simple information retrieval systems to sophisticated agents capable of coding, writing, conducting research, and much more. But despite their capabilities, these models are still largely…

  • The Case for Centralized AI Model Inference Serving

    The Case for Centralized AI Model Inference Serving As AI models continue to increase in scope and accuracy, even tasks once dominated by traditional algorithms are gradually being replaced by Deep Learning models. Algorithmic pipelines — workflows that take an input, process it through a series of algorithms, and produce an output — increasingly rely…

  • Understanding the Tech Stack Behind Generative AI

    Understanding the Tech Stack Behind Generative AI When ChatGPT reached the one million user mark within five days and took off faster than any other technology in history, the world began to pay attention to artificial intelligence and AI applications. And so it continued apace. Since then, many…

  • Debiasing Kernel-Based Generative Models

    Debiasing Kernel-Based Generative Models arXiv:2503.20825v1 Announce Type: new Abstract: We propose a novel two-stage framework of generative models named Debiasing Kernel-Based Generative Models (DKGM) with the insights from kernel density estimation (KDE) and stochastic approximation. In the first stage of DKGM, we employ KDE to bypass the obstacles in estimating the density of data without…

  • Talk to Videos

    Talk to Videos Large language models (LLMs) are improving in efficiency and are now able to understand different data formats, offering possibilities for myriads of applications in different domains. Initially, LLMs were inherently able to process only text. The image understanding feature was integrated by coupling an LLM with another image encoding model. However, gpt-4o…

  • Testing the Power of Multimodal AI Systems in Reading and Interpreting Photographs, Maps, Charts and More

    Testing the Power of Multimodal AI Systems in Reading and Interpreting Photographs, Maps, Charts and More Introduction It’s no news that artificial intelligence has made huge strides in recent years, particularly with the advent of multimodal models that can process and create both text and images, and some very new ones that also process and produce…

  • Interpretable Neural Causal Models with TRAM-DAGs

    Interpretable Neural Causal Models with TRAM-DAGs arXiv:2503.16206v1 Announce Type: new Abstract: The ultimate goal of most scientific studies is to understand the underlying causal mechanism between the involved variables. Structural causal models (SCMs) are widely used to represent such causal mechanisms. Given an SCM, causal queries on all three levels of Pearl’s causal hierarchy can…

  • Six Organizational Models for Data Science

    Six Organizational Models for Data Science Introduction Data science teams can operate in myriad ways within a company. These organizational models influence the type of work that the team does, but also the team’s culture, goals, impact, and overall value to the company. Adopting the wrong organizational model can limit impact, cause delays, and compromise…

  • 6 Common LLM Customization Strategies Briefly Explained

    6 Common LLM Customization Strategies Briefly Explained Why Customize LLMs? Large Language Models (LLMs) are deep learning models pre-trained with self-supervised learning. They require vast amounts of training data and training time, and they hold a large number of parameters. LLMs have revolutionized natural language processing, especially in the last 2 years, demonstrating remarkable…

  • The Next AI Revolution: A Tutorial Using VAEs to Generate High-Quality Synthetic Data

    The Next AI Revolution: A Tutorial Using VAEs to Generate High-Quality Synthetic Data What is synthetic data? Data created by a computer intended to replicate or augment existing data. Why is it useful? We have all experienced the success of ChatGPT, Llama, and more recently, DeepSeek. These language models are being used ubiquitously across society…

  • Towards a perturbation-based explanation for medical AI as differentiable programs

    Towards a perturbation-based explanation for medical AI as differentiable programs arXiv:2502.14001v1 Announce Type: new Abstract: Recent advancement in machine learning algorithms reaches a point where medical devices can be equipped with artificial intelligence (AI) models for diagnostic support and routine automation in clinical settings. In medicine and healthcare, there is a particular demand for sufficient…

  • Green LIME: Improving AI Explainability through Design of Experiments

    Green LIME: Improving AI Explainability through Design of Experiments arXiv:2502.12753v1 Announce Type: new Abstract: In artificial intelligence (AI), the complexity of many models and processes often surpasses human interpretability, making it challenging to understand why a specific prediction is made. This lack of transparency is particularly problematic in critical fields like healthcare, where trust in…

  • Six Ways to Control Style and Content in Diffusion Models

    Six Ways to Control Style and Content in Diffusion Models Stable Diffusion 1.5/2.0/2.1/XL 1.0, DALL-E, Imagen… In the past years, Diffusion Models have showcased stunning quality in image generation. However, while producing great quality on generic concepts, these struggle to generate high quality for more specialised queries, for example generating images in a specific style,…

  • Two in context learning tasks with complex functions

    Two in context learning tasks with complex functions arXiv:2502.03503v1 Announce Type: new Abstract: We examine two in context learning (ICL) tasks with mathematical functions in several train and test settings for transformer models. Our study generalizes work on linear functions by showing that small transformers, even models with attention layers only, can approximate arbitrary polynomial…

  • A Visual Guide to How Diffusion Models Work

    A Visual Guide to How Diffusion Models Work This article is aimed at those who want to understand exactly how Diffusion Models work, with no prior knowledge expected. I’ve tried to use illustrations wherever possible to provide visual intuitions on each part of these models. I’ve kept mathematical notation and equations to a minimum, and where…

  • Poisson Hierarchical Indian Buffet Processes for Within and Across Group Sharing of Latent Features-With Indications for Microbiome Species Sampling Models

    Poisson Hierarchical Indian Buffet Processes for Within and Across Group Sharing of Latent Features-With Indications for Microbiome Species Sampling Models arXiv:2502.01919v1 Announce Type: new Abstract: In this work, we present a comprehensive Bayesian posterior analysis of what we term Poisson Hierarchical Indian Buffet Processes, designed for complex random sparse count species sampling models that allow…

  • Trustworthy Evaluation of Generative AI Models

    Trustworthy Evaluation of Generative AI Models arXiv:2501.18897v1 Announce Type: new Abstract: Generative AI (GenAI) models have recently achieved remarkable empirical performance in various applications, however, their evaluations yet lack uncertainty quantification. In this paper, we propose a method to compare two generative models based on an unbiased estimator of their relative performance gap. Statistically, our…

  • U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms

    U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms arXiv:2501.18084v1 Announce Type: new Abstract: Across various domains, the growing advocacy for open science and open-source machine learning has made an increasing number of models publicly available. These models allow practitioners to integrate them into their own contexts, reducing the need for extensive data labeling, training, and calibration.…

  • Machine Learning Incidents in AdTech

    Machine Learning Incidents in AdTech Challenges with deep learning in production One of the biggest challenges I encountered in my career as a data scientist was migrating the core algorithms in a mobile AdTech platform from classic machine learning models to deep learning. I worked on a Demand Side Platform (DSP) for user…

  • Singular leaning coefficients and efficiency in learning theory

    Singular leaning coefficients and efficiency in learning theory arXiv:2501.12747v1 Announce Type: new Abstract: Singular learning models with non-positive Fisher information matrices include neural networks, reduced-rank regression, Boltzmann machines, normal mixture models, and others. These models have been widely used in the development of learning machines. However, theoretical analysis is still in its early stages. In…

  • Generative Models with ELBOs Converging to Entropy Sums

    Generative Models with ELBOs Converging to Entropy Sums arXiv:2501.09022v1 Announce Type: new Abstract: The evidence lower bound (ELBO) is one of the most central objectives for probabilistic unsupervised learning. For the ELBOs of several generative models and model classes, we here prove convergence to entropy sums. As one result, we provide a list of generative…

  • On the Statistical Capacity of Deep Generative Models

    On the Statistical Capacity of Deep Generative Models arXiv:2501.07763v1 Announce Type: new Abstract: Deep generative models are routinely used in generating samples from complex, high-dimensional distributions. Despite their apparent successes, their statistical properties are not well understood. A common assumption is that with enough training data and sufficiently large neural networks, deep generative model samples…

  • llama.cpp: Writing A Simple C++ Inference Program for GGUF LLM Models

    llama.cpp: Writing A Simple C++ Inference Program for GGUF LLM Models Exploring llama.cpp internals and a basic chat program flow llama.cpp has revolutionized the space of LLM inference by the means of wide adoption and simplicity. It has enabled enterprises and individual developers to deploy LLMs on devices ranging from SBCs…

  • How to Tell Among Two Regression Models with Statistical Significance

    How to Tell Among Two Regression Models with Statistical Significance Diving into the F-test for nested models with algorithms, examples and code Continue reading on Towards Data Science » LucianoSphere (Luciano Abriata, PhD) Go to original source

  • A Distributional Evaluation of Generative Image Models

    A Distributional Evaluation of Generative Image Models arXiv:2501.00744v1 Announce Type: new Abstract: Generative models are ubiquitous in modern artificial intelligence (AI) applications. Recent advances have led to a variety of generative modeling approaches that are capable of synthesizing highly realistic samples. Despite these developments, evaluating the distributional match between the synthetic samples and the target…

  • Multi-Agentic RAG with Hugging Face Code Agents

    Multi-Agentic RAG with Hugging Face Code Agents Using Qwen2.5–7B-Instruct powered code agents to create a local, open source, multi-agentic RAG system Large Language Models have shown impressive capabilities and they are still undergoing steady improvements with each new generation of models released. Applications such as chatbots and summarisation can directly exploit…

  • Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models

    Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models arXiv:2412.20586v1 Announce Type: new Abstract: Contaminant observations and outliers often cause problems when estimating the parameters of cognitive models, which are statistical models representing cognitive processes. In this study, we test and improve the robustness of parameter estimation using amortized Bayesian inference (ABI)…

  • Linearizing Attention

    Linearizing Attention Breaking the quadratic barrier: modern alternatives to softmax attention Large Language Models are great, but they have a slight drawback: they use softmax attention, which can be computationally intensive. In this article we will explore whether there is a way to replace the softmax and achieve linear time complexity. Image…

  • A Statistical Framework for Ranking LLM-Based Chatbots

    A Statistical Framework for Ranking LLM-Based Chatbots arXiv:2412.18407v1 Announce Type: new Abstract: Large language models (LLMs) have transformed natural language processing, with frameworks like Chatbot Arena providing pioneering platforms for evaluating these models. By facilitating millions of pairwise comparisons based on human judgments, Chatbot Arena has become a cornerstone in LLM evaluation, offering rich datasets…

  • Adaptive Nonparametric Perturbations of Parametric Bayesian Models

    Adaptive Nonparametric Perturbations of Parametric Bayesian Models arXiv:2412.10683v2 Announce Type: cross Abstract: Parametric Bayesian modeling offers a powerful and flexible toolbox for scientific data analysis. Yet the model, however detailed, may still be wrong, and this can make inferences untrustworthy. In this paper we study nonparametrically perturbed parametric (NPP) Bayesian models, in which a parametric…

  • Deep Learning for Hydroelectric Optimization: Generating Long-Term River Discharge Scenarios with Ensemble Forecasts from Global Circulation Models

    Deep Learning for Hydroelectric Optimization: Generating Long-Term River Discharge Scenarios with Ensemble Forecasts from Global Circulation Models arXiv:2412.12234v1 Announce Type: cross Abstract: Hydroelectric power generation is a critical component of the global energy matrix, particularly in countries like Brazil, where it represents the majority of the energy supply. However, its strong dependence on river discharges,…

  • Master Machine Learning: 4 Classification Models Made Simple

    Master Machine Learning: 4 Classification Models Made Simple A Beginner’s Guide to Building Models in 15 Practical Steps Continue reading on Towards Data Science » Leo Anello Go to original source

  • Why Retrieval-Augmented Generation Is Still Relevant in the Era of Long-Context Language Models

    Why Retrieval-Augmented Generation Is Still Relevant in the Era of Long-Context Language Models In this article we will explore why models with context windows of 128K tokens or more can’t fully replace RAG. Continue reading on Towards Data Science » Jérôme DIAZ Go to original source

  • How to Prune LLaMA 3.2 and Similar Large Language Models

    How to Prune LLaMA 3.2 and Similar Large Language Models This article explores a structured pruning technique for state-of-the-art models that use a GLU architecture, enabling the creation of… Continue reading on Towards Data Science » Pere Martra Go to original source