Category: aimlds

Learning Multi-Index Models with Hyper-Kernel Ridge Regression

arXiv:2510.02532v1 Announce Type: new Abstract: Deep neural networks excel in high-dimensional problems, outperforming models such as kernel methods, which suffer from the curse of dimensionality. However, the theoretical foundations of this success remain poorly understood. We follow the idea that the compositional structure of the learning task is…

October 6, 2025
Weekly Entering & Transitioning – Thread 06 Oct, 2025 – 13 Oct, 2025

Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos); Traditional education (e.g. schools, degrees, electives); Alternative education (e.g.…

October 6, 2025
Why am I not getting responses?

As mentioned before, I can’t use the weekly transition because it doesn’t allow pictures. I appreciate your help last time when I asked. I’ve implemented your recommendations but I’m still not getting responses. I’ve added a completely new ML-based project, fixed mistakes, revamped the layout and I’m still not…

October 6, 2025
Do you know interesting datasets for kriging?

Hi guys, I need to do a project using many linear models and I’m looking for a dataset. Ideally something interesting with lots of numerical variables, especially one where kriging could be applied. If you have any dataset suggestions or interesting research questions I could build the project…

October 6, 2025
What could be my next career progression?

Hello, I’m 26 years old and have been working as a junior data scientist in marketing for the past two years. I’m a bit bored and have no idea how to progress further in my career. Currently I do end-to-end modeling, from gathering data up to production (not…

October 6, 2025
Are LLMs necessary to get a job?

For someone laid off in 2023 before the LLM/Agent craze went mainstream, do you think I need to learn LLM architecture? Are certs or GitHub projects worth anything as far as getting through the filters and/or landing a job? I have 10 YOE. I specialized in machine learning…

October 6, 2025
Classical Computer Vision and Perspective Transformation for Sudoku Extraction

Why you shouldn’t overcomplicate solutions to simple problems. The post Classical Computer Vision and Perspective Transformation for Sudoku Extraction appeared first on Towards Data Science. Florian Trautweiler

October 6, 2025
Building a Command-Line Quiz Application in R

Practice control flow, input handling, and functions in R by creating an interactive quiz game. The post Building a Command-Line Quiz Application in R appeared first on Towards Data Science. Benjamin Nweke

October 6, 2025
Real-Time Intelligence in Microsoft Fabric: The Ultimate Guide

Once upon a time, handling streaming data was considered an avant-garde approach. Since the introduction of relational database management systems in the 1970s and traditional data warehousing systems in the late 1980s, all data workloads began and ended with the so-called batch processing. Batch processing relies on the concept of…

October 5, 2025
How to Build a Powerful Deep Research System

Learn how to access vast amounts of information with your own deep research system. The post How to Build a Powerful Deep Research System appeared first on Towards Data Science. Eivind Kjosbakken

October 5, 2025
Build a Data Dashboard Using HTML, CSS, and JavaScript

A framework-free guide for Python programmers. The post Build a Data Dashboard Using HTML, CSS, and JavaScript appeared first on Towards Data Science. Thomas Reid

October 4, 2025
MobileNetV2 Paper Walkthrough: The Smarter Tiny Giant

Understanding and implementing MobileNetV2 with PyTorch, the next generation of MobileNetV1. The post MobileNetV2 Paper Walkthrough: The Smarter Tiny Giant appeared first on Towards Data Science. Muhammad Ardi

October 4, 2025
Private Realizable-to-Agnostic Transformation with Near-Optimal Sample Complexity

arXiv:2510.01291v1 Announce Type: new Abstract: The realizable-to-agnostic transformation (Beimel et al., 2015; Alon et al., 2020) provides a general mechanism to convert a private learner in the realizable setting (where the examples are labeled by some function in the concept class) to a private learner in the agnostic…

October 3, 2025
Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling

arXiv:2510.01329v1 Announce Type: new Abstract: Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token. This creates an ‘information void’ where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We introduce Continuously Augmented…

October 3, 2025
Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting

arXiv:2510.01414v1 Announce Type: new Abstract: This paper analyzes the generalization error of minimum-norm interpolating solutions in linear regression using spiked covariance data models. The paper characterizes how varying spike strengths and target-spike alignments can affect risk, especially in overparameterized settings. The study presents…

October 3, 2025
A reproducible comparative study of categorical kernels for Gaussian process regression, with new clustering-based nested kernels

arXiv:2510.01840v1 Announce Type: new Abstract: Designing categorical kernels is a major challenge for Gaussian process regression with continuous and categorical inputs. Despite previous studies, it is difficult to identify a preferred method, either because the evaluation metrics, the optimization…

October 3, 2025
AI Foundation Model for Time Series with Innovations Representation

arXiv:2510.01560v1 Announce Type: new Abstract: This paper introduces an Artificial Intelligence (AI) foundation model for time series in engineering applications, where causal operations are required for real-time monitoring and control. Since engineering time series are governed by physical, rather than linguistic, laws, large-language-model-based AI foundation models…

October 3, 2025
Prediction vs. Search Models: What Data Scientists Are Missing

How do platform firms set prices and make money? The post Prediction vs. Search Models: What Data Scientists Are Missing appeared first on Towards Data Science. Derek Tran

October 3, 2025
AI Engineering and Evals as New Layers of Software Work

How to maintain reliability in inherently stochastic systems. The post AI Engineering and Evals as New Layers of Software Work appeared first on Towards Data Science. Clara Chong

October 3, 2025
What Makes a Language Look Like Itself?

How simple statistics reveal the visual fingerprints of 20 languages. The post What Makes a Language Look Like Itself? appeared first on Towards Data Science. Kenneth McCarthy

October 3, 2025
Smarter, Not Harder: How AI’s Self-Doubt Unlocks Peak Performance

“Deep Think with Confidence,” a smarter way to scale reasoning tasks without wasting a massive amount of computation. The post Smarter, Not Harder: How AI’s Self-Doubt Unlocks Peak Performance appeared first on Towards Data Science. Ankit Singh Chauhan

October 3, 2025
Identifying All $\epsilon$-Best Arms in (Misspecified) Linear Bandits

arXiv:2510.00073v1 Announce Type: new Abstract: Motivated by the need to efficiently identify multiple candidates in high trial-and-error cost tasks such as drug discovery, we propose a near-optimal algorithm to identify all $\epsilon$-best arms (i.e., those at most $\epsilon$ worse than the optimum). Specifically, we introduce LinFACT, an…

October 2, 2025
Private Learning of Littlestone Classes, Revisited

arXiv:2510.00076v1 Announce Type: new Abstract: We consider online and PAC learning of Littlestone classes subject to the constraint of approximate differential privacy. Our main result is a private learner to online-learn a Littlestone class with a mistake bound of $\tilde{O}(d^{9.5}\cdot \log(T))$ in the realizable case, where $d$ denotes the…

October 2, 2025
CINDES: Classification induced neural density estimator and simulator

arXiv:2510.00367v1 Announce Type: new Abstract: Neural network-based methods for (un)conditional density estimation have recently gained substantial attention, as various neural density estimators have outperformed classical approaches in real-data experiments. Despite these empirical successes, implementation can be challenging due to the need to ensure non-negativity and unit-mass constraints,…

October 2, 2025
On the Adversarial Robustness of Learning-based Conformal Novelty Detection

arXiv:2510.00463v1 Announce Type: new Abstract: This paper studies the adversarial robustness of conformal novelty detection. In particular, we focus on AdaDetect, a powerful learning-based framework for novelty detection with finite-sample false discovery rate (FDR) control. While AdaDetect provides rigorous statistical guarantees under benign conditions, its behavior…

October 2, 2025
A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws

arXiv:2510.00504v1 Announce Type: new Abstract: When training large-scale models, the performance typically scales with the number of parameters and the dataset size according to a slow power law. A fundamental theoretical and practical question is whether comparable performance can be achieved with significantly smaller…

October 2, 2025
Temporal-Difference Learning and the Importance of Exploration: An Illustrated Guide

Comparing model-free and model-based RL methods on a dynamic grid world. The post Temporal-Difference Learning and the Importance of Exploration: An Illustrated Guide appeared first on Towards Data Science. Ryan Pégoud

October 2, 2025
Are Foundation Models Ready for Your Production Tabular Data?

A complete review of architectures to make zero-shot predictions in the most common types of datasets. The post Are Foundation Models Ready for Your Production Tabular Data? appeared first on Towards Data Science. Carmen Adriana Martínez Barbosa

October 2, 2025
How to Improve the Efficiency of Your PyTorch Training Loop

Learn how to diagnose and resolve bottlenecks in PyTorch using the num_workers, pin_memory, and profiler parameters to maximize training performance. The post How to Improve the Efficiency of Your PyTorch Training Loop appeared first on Towards Data Science. Andrea D’Agostino

October 2, 2025
Data Visualization Explained (Part 2): An Introduction to Visual Variables

A non-technical and accessible guide to the underlying concept behind visual design: visual encoding channels. The post Data Visualization Explained (Part 2): An Introduction to Visual Variables appeared first on Towards Data Science. Murtaza Ali

October 2, 2025
Visual Pollen Classification Using CNNs and Vision Transformers

Filling the data gap: A machine learning approach to pollen identification in ecology and biotechnology. The post Visual Pollen Classification Using CNNs and Vision Transformers appeared first on Towards Data Science. Karol Struniawski

October 2, 2025
Neural Optimal Transport Meets Multivariate Conformal Prediction

arXiv:2509.25444v1 Announce Type: new Abstract: We propose a framework for conditional vector quantile regression (CVQR) that combines neural optimal transport with amortized optimization, and apply it to multivariate conformal prediction. Classical quantile regression does not extend naturally to multivariate responses, while existing approaches often ignore the geometry of…

October 1, 2025
Fair Classification by Direct Intervention on Operating Characteristics

arXiv:2509.25481v1 Announce Type: new Abstract: We develop new classifiers under group fairness in the attribute-aware setting for binary classification with multiple group fairness constraints (e.g., demographic parity (DP), equalized odds (EO), and predictive parity (PP)). We propose a novel approach, applicable to linear fractional constraints, based on…

October 1, 2025
Conservative Decisions with Risk Scores

arXiv:2509.25588v1 Announce Type: new Abstract: In binary classification applications, conservative decision-making that allows for abstention can be advantageous. To this end, we introduce a novel approach that determines the optimal cutoff interval for risk scores, which can be directly available or derived from fitted models. Within this interval, the algorithm…

October 1, 2025
One-shot Conditional Sampling: MMD meets Nearest Neighbors

arXiv:2509.25507v1 Announce Type: new Abstract: How can we generate samples from a conditional distribution that we never fully observe? This question arises across a broad range of applications in both modern machine learning and classical statistics, including image post-processing in computer vision, approximate posterior sampling in simulation-based inference,…

October 1, 2025
Coupling Generative Modeling and an Autoencoder with the Causal Bridge

arXiv:2509.25599v1 Announce Type: new Abstract: We consider inferring the causal effect of a treatment (intervention) on an outcome of interest in situations where there is potentially an unobserved confounder influencing both the treatment and the outcome. This is achievable by assuming access to two separate…

October 1, 2025
Beyond ROC-AUC and KS: The Gini Coefficient, Explained Simply

Understanding Gini and Lorenz curves for smarter model evaluation. The post Beyond ROC-AUC and KS: The Gini Coefficient, Explained Simply appeared first on Towards Data Science. Nikhil Dasari

October 1, 2025
Actual Intelligence in the Age of AI

Jarom Hulet on mastering fundamentals, hiring well, and deciding what to write about next. The post Actual Intelligence in the Age of AI appeared first on Towards Data Science. TDS Editors

October 1, 2025
How to Build Effective Agentic Systems with LangGraph

Create AI workflows with agentic frameworks. The post How to Build Effective Agentic Systems with LangGraph appeared first on Towards Data Science. Eivind Kjosbakken

October 1, 2025
The Machine Learning Lessons I’ve Learned This Month

September 2025: library or self-made, Ditto and Launchbar, reading widely and deeply. The post The Machine Learning Lessons I’ve Learned This Month appeared first on Towards Data Science. Pascal Janetzky

October 1, 2025
Variance-Bounded Evaluation without Ground Truth: VB-Score

arXiv:2509.22751v1 Announce Type: new Abstract: Reliable evaluation is a central challenge in machine learning when tasks lack ground truth labels or involve ambiguity and noise. Conventional frameworks, rooted in the Cranfield paradigm and label-based metrics, fail in such cases because they cannot assess how robustly a system performs under…

September 30, 2025
Concept activation vectors: a unifying view and adversarial attacks

arXiv:2509.22755v1 Announce Type: new Abstract: Concept Activation Vectors (CAVs) are a tool from explainable AI, offering a promising approach for understanding how human-understandable concepts are encoded in a model’s latent spaces. They are computed from hidden-layer activations of inputs belonging either to a concept class or…

September 30, 2025
Identifying Memory Effects in Epidemics via a Fractional SEIRD Model and Physics-Informed Neural Networks

arXiv:2509.22760v1 Announce Type: new Abstract: We develop a physics-informed neural network (PINN) framework for parameter estimation in fractional-order SEIRD epidemic models. By embedding the Caputo fractional derivative into the network residuals via the L1 discretization scheme, our method simultaneously reconstructs epidemic…

September 30, 2025
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression

arXiv:2509.22794v1 Announce Type: new Abstract: We study instrumental variable regression (IVaR) under differential privacy constraints. Classical IVaR methods (like two-stage least squares regression) rely on solving moment equations that directly use sensitive covariates and instruments, creating significant risks of privacy leakage and posing challenges in designing…

September 30, 2025
A theoretical guarantee for SyncRank

arXiv:2509.22766v1 Announce Type: new Abstract: We present a theoretical and empirical analysis of the SyncRank algorithm for recovering a global ranking from noisy pairwise comparisons. By adopting a complex-valued data model where the true ranking is encoded in the phases of a unit-modulus vector, we establish a sharp non-asymptotic recovery…

September 30, 2025
Preparing Video Data for Deep Learning: Introducing Vid Prepper

A guide to fast video data preprocessing for machine learning. The post Preparing Video Data for Deep Learning: Introducing Vid Prepper appeared first on Towards Data Science. Jamie Petherbridge-Conroy

September 30, 2025
I Made My AI Model 84% Smaller and It Got Better, Not Worse

The counterintuitive approach to AI optimization that’s changing how we deploy models. The post I Made My AI Model 84% Smaller and It Got Better, Not Worse appeared first on Towards Data Science. Arjun Kaarat

September 30, 2025
MCP in Practice

Mapping power, concentration, and usage in the emerging AI developer ecosystem. The post MCP in Practice appeared first on Towards Data Science. Sruly Rosenblat

September 30, 2025
Near-Optimal Experiment Design in Linear non-Gaussian Cyclic Models

arXiv:2509.21423v1 Announce Type: new Abstract: We study the problem of causal structure learning from a combination of observational and interventional data generated by a linear non-Gaussian structural equation model that might contain cycles. Recent results show that using mere observational data identifies the causal graph only up…

September 29, 2025
General Pruning Criteria for Fast SBL

arXiv:2509.21572v1 Announce Type: new Abstract: Sparse Bayesian learning (SBL) associates to each weight in the underlying linear model a hyperparameter by assuming that each weight is Gaussian distributed with zero mean and precision (inverse variance) equal to its associated hyperparameter. The method estimates the hyperparameters by marginalizing out the…

September 29, 2025
IndiSeek learns information-guided disentangled representations

arXiv:2509.21584v1 Announce Type: new Abstract: Learning disentangled representations is a fundamental task in multi-modal learning. In modern applications such as single-cell multi-omics, both shared and modality-specific features are critical for characterizing cell states and supporting downstream analyses. Ideally, modality-specific features should be independent of shared ones while also capturing all…

September 29, 2025
Effective continuous equations for adaptive SGD: a stochastic analysis view

arXiv:2509.21614v1 Announce Type: new Abstract: We present a theoretical analysis of some popular adaptive Stochastic Gradient Descent (SGD) methods in the small learning rate regime. Using the stochastic modified equations framework introduced by Li et al., we derive effective continuous stochastic dynamics for these methods.…

September 29, 2025
SADA: Safe and Adaptive Inference with Multiple Black-Box Predictions

arXiv:2509.21707v1 Announce Type: new Abstract: Real-world applications often face scarce labeled data due to the high cost and time requirements of gold-standard experiments, whereas unlabeled data are typically abundant. With the growing adoption of machine learning techniques, it has become increasingly feasible to generate multiple predicted…

September 29, 2025
Weekly Entering & Transitioning – Thread 29 Sep, 2025 – 06 Oct, 2025

Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos); Traditional education (e.g. schools, degrees, electives); Alternative education (e.g.…

September 29, 2025
What interesting projects are you working on that are not related to AI?

Share links if possible. Submitted by /u/yaymayhun

September 29, 2025
What a Drunk Man Can Teach Us About Time Series Forecasting

Autocorrelation & The Random Walk explained with a drunk man 🍺 Let me illustrate this statistical concept with an example we can all visualize. Imagine a drunk man wandering a city. His steps are completely random and unpredictable. Here’s the intuition: – His current…

September 29, 2025
Relationship between ROC AUC and Gain curve?

Heya, I’ve been studying the gains curve, and I’ve noticed there’s a relationship between the gains curve and the ROC curve: the smaller the base rate, the closer the gains curve is to the ROC curve. Anyway, on to the point: is it fair to assume that for two models if…

September 29, 2025
Oscillatory Coordination in Cognitive Architectures: Old Dog, New Math

Been working in AI since before it was cool (think 80s expert systems, not ChatGPT hype). Lately I’ve been developing this cognitive architecture called OGI that uses Top-K gating between specialized modules. Works well, proved the stability, got the complexity down to O(k²). But something’s been…

September 29, 2025
Eulerian Melodies: Graph Algorithms for Music Composition

Conceptual overview and an end-to-end Python implementation. The post Eulerian Melodies: Graph Algorithms for Music Composition appeared first on Towards Data Science. Chinmay Kakatkar

September 29, 2025
Learning Triton One Kernel At a Time: Vector Addition

The basics of GPU programming, optimisation, and your first Triton kernel. The post Learning Triton One Kernel At a Time: Vector Addition appeared first on Towards Data Science. Ryan Pégoud

September 28, 2025
What Clients Really Ask for in AI Projects

Managing AI projects is no walk in the park, but you have the power to make it easier for everyone. The post What Clients Really Ask for in AI Projects appeared first on Towards Data Science. Ivo Bernardo

September 28, 2025
Building Fact-Checking Systems: Catching Repeating False Claims Before They Spread

How retrieval and ensemble methods make fact-checking faster, scalable, and more reliable in a digital world. The post Building Fact-Checking Systems: Catching Repeating False Claims Before They Spread appeared first on Towards Data Science. Iva Pezo

September 27, 2025
Why MissForest Fails in Prediction Tasks: A Key Limitation You Need to Keep in Mind

Why the original MissForest algorithm cannot be directly applied for predictive modeling, and how MissForestPredict solves this problem. The post Why MissForest Fails in Prediction Tasks: A Key Limitation You Need to Keep in Mind appeared first on Towards Data…

September 27, 2025
Using Vision Language Models to Process Millions of Documents

Learn how to effectively apply vision language models to problem solving. The post Using Vision Language Models to Process Millions of Documents appeared first on Towards Data Science. Eivind Kjosbakken

September 27, 2025
Sample completion, structured correlation, and Netflix problems

arXiv:2509.20404v1 Announce Type: new Abstract: We develop a new high-dimensional statistical learning model which can take advantage of structured correlation in data even in the presence of randomness. We completely characterize learnability in this model in terms of VCN${}_{k,k}$-dimension (essentially $k$-dependence from Shelah’s classification theory). This model suggests…

September 26, 2025
Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances

arXiv:2509.20508v1 Announce Type: new Abstract: We address the problem of efficiently computing Wasserstein distances for multiple pairs of distributions drawn from a meta-distribution. To this end, we propose a fast estimation method based on regressing Wasserstein distance on sliced Wasserstein (SW) distances. Specifically, we…

September 26, 2025
Unsupervised Domain Adaptation with an Unobservable Source Subpopulation

arXiv:2509.20587v1 Announce Type: new Abstract: We study an unsupervised domain adaptation problem where the source domain consists of subpopulations defined by the binary label $Y$ and a binary background (or environment) $A$. We focus on a challenging setting in which one such subpopulation in the source domain…

September 26, 2025
A Hierarchical Variational Graph Fused Lasso for Recovering Relative Rates in Spatial Compositional Data

arXiv:2509.20636v1 Announce Type: new Abstract: The analysis of spatial data from biological imaging technology, such as imaging mass spectrometry (IMS) or imaging mass cytometry (IMC), is challenging because of a competitive sampling process which convolves signals from molecules in a single…

September 26, 2025
A Gapped Scale-Sensitive Dimension and Lower Bounds for Offset Rademacher Complexity

arXiv:2509.20618v1 Announce Type: new Abstract: We study gapped scale-sensitive dimensions of a function class in both sequential and non-sequential settings. We demonstrate that covering numbers for any uniformly bounded class are controlled above by these gapped dimensions, generalizing the results of \cite{anthony2000function,alon1997scale}. Moreover, we…

September 26, 2025
TDS Newsletter: To Better Understand AI, Look Under the Hood

AI-powered tools tend to generate extreme reactions: on one side we have the “It’s magic!” and “best thing ever!” crowd. On the other, we find the “we’re doomed!” camp. These aren’t static or monolithic groups, of course. You might even find yourself on both ends of…

September 26, 2025
Notes on LLM Evaluation

A practical, step-by-step guide to building an evaluation pipeline for a real-world AI application. The post Notes on LLM Evaluation appeared first on Towards Data Science. Felipe Adachi

September 26, 2025
Building a Video Game Recommender System with FastAPI, PostgreSQL, and Render: Part 2

Deploying a FastAPI + PostgreSQL recommender system as a web application on Render. The post Building a Video Game Recommender System with FastAPI, PostgreSQL, and Render: Part 2 appeared first on Towards Data Science. Lucas See

September 26, 2025
Building Video Game Recommender Systems with FastAPI, PostgreSQL, and Render: Part 1

Designing a video game recommendations service with Steam’s API. The post Building Video Game Recommender Systems with FastAPI, PostgreSQL, and Render: Part 1 appeared first on Towards Data Science. Lucas See

September 26, 2025
Stochastic Path Planning in Correlated Obstacle Fields

arXiv:2509.19559v1 Announce Type: new Abstract: We introduce the Stochastic Correlated Obstacle Scene (SCOS) problem, a navigation setting with spatially correlated obstacles of uncertain blockage status, realistically constrained sensors that provide noisy readings and costly disambiguation. Modeling the spatial correlation with Gaussian Random Field (GRF), we develop Bayesian belief…

September 25, 2025
Anchored Langevin Algorithms

arXiv:2509.19455v1 Announce Type: new Abstract: Standard first-order Langevin algorithms such as the unadjusted Langevin algorithm (ULA) are obtained by discretizing the Langevin diffusion and are widely used for sampling in machine learning because they scale to high dimensions and large datasets. However, they face two key limitations: (i) they require differentiable log-densities,…

September 25, 2025
MAGIC: Multi-task Gaussian process for joint imputation and classification in healthcare time series

arXiv:2509.19577v1 Announce Type: new Abstract: Time series analysis has emerged as an important tool for improving patient diagnosis and management in healthcare applications. However, these applications commonly face two critical challenges: time misalignment and data sparsity. Traditional approaches address these issues through…

September 25, 2025
Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies

arXiv:2509.19707v1 Announce Type: new Abstract: Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is hindered by restrictive assumptions and poor scaling. In this…

September 25, 2025
Convex Regression with a Penalty

arXiv:2509.19788v1 Announce Type: new Abstract: A common way to estimate an unknown convex regression function $f_0: \Omega \subset \mathbb{R}^d \rightarrow \mathbb{R}$ from a set of $n$ noisy observations is to fit a convex function that minimizes the sum of squared errors. However, this estimator is known for its tendency to…

September 25, 2025
Decoding Nonlinear Signals In Large Observational Datasets

Rain, snow, or something in between? The post Decoding Nonlinear Signals In Large Observational Datasets appeared first on Towards Data Science. Fraser King

September 25, 2025
RAG Explained: Reranking for Better Answers

How reranking improves retrieval-augmented generation by surfacing the most relevant results. The post RAG Explained: Reranking for Better Answers appeared first on Towards Data Science. Maria Mouschoutzi

September 25, 2025
PyTorch Explained: From Automatic Differentiation to Training Custom Neural Networks

Deep learning is shaping our world as we speak. In fact, it has been slowly revolutionizing software since the early 2010s. In 2025, PyTorch is at the forefront of this revolution, emerging as one of the most important libraries to train neural networks. Whether you…

September 25, 2025
Introducing the AI-3P Assessment Framework: Score AI Projects Before Committing Resources

A question-driven scorecard to prioritize and de-risk AI initiatives before implementation. The post Introducing the AI-3P Assessment Framework: Score AI Projects Before Committing Resources appeared first on Towards Data Science. Marina Tosic

September 25, 2025
Generating Consistent Imagery with Gemini

A practical guide to building a prompt-based generation pipeline for your image library. The post Generating Consistent Imagery with Gemini appeared first on Towards Data Science. Laurent Picard

September 24, 2025
The Art of Asking Good Questions

As a data scientist, are you driving product decisions? Or just supporting them? The right questions can turn AI from a threat into your career’s best ally. Here’s how to start asking them. The post The Art of Asking Good Questions appeared first on Towards Data Science. Greg Rafferty…

September 24, 2025
Generative AI Myths, Busted: An Engineers’s Quick Guide

A super simple and quick guide to how generative AI works, the myths around it, and why it won’t replace engineers anytime soon. The post Generative AI Myths, Busted: An Engineers’s Quick Guide appeared first on Towards Data Science. Amy Ma

September 24, 2025
Why Are Marketers Turning To Quasi Geo-Lift Experiments? (And How to Plan Them)

Are “quasi” geo-lift experiments the missing piece for your marketing science function? The post Why Are Marketers Turning To Quasi Geo-Lift Experiments? (And How to Plan Them) appeared first on Towards Data Science. Tomas Jancovic

September 24, 2025
5 Techniques to Prevent Hallucinations in Your RAG Question Answering

Learn how to reduce the number of hallucinations, and the impact they have. The post 5 Techniques to Prevent Hallucinations in Your RAG Question Answering appeared first on Towards Data Science. Eivind Kjosbakken

September 24, 2025
Low-Rank Adaptation of Evolutionary Deep Neural Networks for Efficient Learning of Time-Dependent PDEs

arXiv:2509.16395v1 Announce Type: new Abstract: We study the Evolutionary Deep Neural Network (EDNN) framework for accelerating numerical solvers of time-dependent partial differential equations (PDEs). We introduce a Low-Rank Evolutionary Deep Neural Network (LR-EDNN), which constrains parameter evolution to a low-rank subspace, thereby…

September 23, 2025
Conditional Multidimensional Scaling with Incomplete Conditioning Data

arXiv:2509.16627v1 Announce Type: new Abstract: Conditional multidimensional scaling seeks for a low-dimensional configuration from pairwise dissimilarities, in the presence of other known features. By taking advantage of available data of the known features, conditional multidimensional scaling improves the estimation quality of the low-dimensional configuration and simplifies knowledge discovery…

September 23, 2025
System-Level Uncertainty Quantification with Multiple Machine Learning Models: A Theoretical Framework

arXiv:2509.16663v1 Announce Type: new Abstract: ML models have errors when used for predictions. The errors are unknown but can be quantified by model uncertainty. When multiple ML models are trained using the same training points, their model uncertainties may be statistically dependent. In reality,…

September 23, 2025
DoubleGen: Debiased Generative Modeling of Counterfactuals

arXiv:2509.16842v1 Announce Type: new Abstract: Generative models for counterfactual outcomes face two key sources of bias. Confounding bias arises when approaches fail to account for systematic differences between those who receive the intervention and those who do not. Misspecification bias arises when methods attempt to address confounding through estimation…

September 23, 2025
Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization

arXiv:2509.17251v1 Announce Type: new Abstract: Existing theory suggests that for linear regression problems categorized by capacity and source conditions, gradient descent (GD) is always minimax optimal, while both ridge regression and online stochastic gradient descent (SGD) are polynomially suboptimal for certain categories of such problems.…

September 23, 2025
How to Connect an MCP Server for an AI-Powered, Supply-Chain Network Optimization Agent

From prompt to strategic decision-making: MCP-powered agents for cost-efficient, reliable and sustainable supply chain network design. The post How to Connect an MCP Server for an AI-Powered, Supply-Chain Network Optimization Agent appeared first on Towards Data Science. Samir Saci

September 23, 2025
The Kolmogorov–Smirnov Statistic, Explained: Measuring Model Power in Credit Risk Modeling

Understanding how banks use the KS statistic in loan approvals. The post The Kolmogorov–Smirnov Statistic, Explained: Measuring Model Power in Credit Risk Modeling appeared first on Towards Data Science. Nikhil Dasari

September 23, 2025
Creating and Deploying an MCP Server from Scratch

A step-by-step guide for putting an MCP server online in minutes. The post Creating and Deploying an MCP Server from Scratch appeared first on Towards Data Science. Vyacheslav Efimov

September 23, 2025
Integrating DataHub into Jira: A Practical Guide Using DataHub Actions

A walkthrough of how to integrate metadata changes in DataHub into Jira workflows using the DataHub Actions Framework. The post Integrating DataHub into Jira: A Practical Guide Using DataHub Actions appeared first on Towards Data Science. Jimin Kang

September 23, 2025
The Theory of Universal Computation: Bayesian Optimality, Solomonoff Induction & AIXI

Is it possible to build a perfect induction machine? The post The Theory of Universal Computation: Bayesian Optimality, Solomonoff Induction & AIXI appeared first on Towards Data Science. Angjelin Hila

September 23, 2025
SETrLUSI: Stochastic Ensemble Multi-Source Transfer Learning Using Statistical Invariant

arXiv:2509.15593v1 Announce Type: new Abstract: In transfer learning, a source domain often carries diverse knowledge, and different domains usually emphasize different types of knowledge. Different from handling only a single type of knowledge from all domains in traditional transfer learning methods, we introduce an ensemble learning…

September 22, 2025
Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities

arXiv:2509.15822v1 Announce Type: new Abstract: Predictions from statistical physics postulate that recovery of the communities in Stochastic Block Model (SBM) is possible in polynomial time above, and only above, the Kesten-Stigum (KS) threshold. This conjecture has given rise to a rich literature, proving that…

September 22, 2025
Interpretable Network-assisted Random Forest+

arXiv:2509.15611v1 Announce Type: new Abstract: Machine learning algorithms often assume that training samples are independent. When data points are connected by a network, the induced dependency between samples is both a challenge, reducing effective sample size, and an opportunity to improve prediction by leveraging information from network neighbors. Multiple methods taking…

September 22, 2025