Category: aimlds
-
The Data Team’s Survival Guide for the Next Era of Data
6 pillars to declutter your stack, escape the service trap, and build the missing foundations for the new primary data consumer: the AI agent. The post The Data Team’s Survival Guide for the Next Era of Data appeared first on Towards Data Science. Mahdi…
-
What Makes Quantum Machine Learning “Quantum”?
And where is it today? The post What Makes Quantum Machine Learning “Quantum”? appeared first on Towards Data Science. Sara A. Metwalli
-
How to Create Production-Ready Code with Claude Code
Learn how to write robust code with coding agents. The post How to Create Production-Ready Code with Claude Code appeared first on Towards Data Science. Eivind Kjosbakken
-
The Black Box Problem: Why AI-Generated Code Stops Being Maintainable
Same notification system, two architectures. Unstructured generation couples everything into a single module. Structured generation decomposes into independent components with explicit, one-directional dependencies. Image by the author. The post The Black Box Problem: Why AI-Generated Code Stops Being Maintainable appeared first on Towards Data Science.…
-
The Volterra signature
arXiv:2603.04525v1 Announce Type: new Abstract: Modern approaches for learning from non-Markovian time series, such as recurrent neural networks, neural controlled differential equations or transformers, typically rely on implicit memory mechanisms that can be difficult to interpret or to train over long horizons. We propose the Volterra signature $\mathrm{VSig}(x;K)$ as a principled, explicit…
-
Bayesian Modeling of Collatz Stopping Times: A Probabilistic Machine Learning Perspective
arXiv:2603.04479v1 Announce Type: new Abstract: We study the Collatz total stopping time $\tau(n)$ over $n \le 10^7$ from a probabilistic machine learning viewpoint. Empirically, $\tau(n)$ is a skewed and heavily overdispersed count with pronounced arithmetic heterogeneity. We develop two complementary models. First, a Bayesian hierarchical…
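A minimal sketch of the quantity being modeled, assuming the standard Collatz map (halve even n, send odd n to 3n + 1) and counting every step until 1 is reached; the paper may use a different convention, such as the shortcut map. The helper names are hypothetical, and the summary only checks overdispersion (variance larger than the mean) on a small range, not the paper's $n \le 10^7$.

```python
from __future__ import annotations

from statistics import mean, pvariance


def total_stopping_time(n: int) -> int:
    """Count Collatz steps (n -> n/2 if even, 3n + 1 if odd) until n reaches 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def dispersion_summary(limit: int) -> tuple[float, float]:
    """Return (mean, population variance) of tau(n) for 1 <= n <= limit."""
    taus = [total_stopping_time(n) for n in range(1, limit + 1)]
    return mean(taus), pvariance(taus)


if __name__ == "__main__":
    print(total_stopping_time(27))  # 111
    m, v = dispersion_summary(10_000)
    print(f"mean={m:.1f}, variance={v:.1f}, variance/mean={v / m:.1f}")
```

A variance-to-mean ratio far above 1 is what rules out a plain Poisson model for these counts and motivates overdispersed alternatives.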
-
Dictionary Based Pattern Entropy for Causal Direction Discovery
arXiv:2603.04473v1 Announce Type: new Abstract: Discovering causal direction from temporal observational data is particularly challenging for symbolic sequences, where functional models and noise assumptions are often unavailable. We propose a novel Dictionary Based Pattern Entropy (DPE) framework that infers both the direction of causation and the specific…
-
The Inductive Bias of Convolutional Neural Networks: Locality and Weight Sharing Reshape Implicit Regularization
arXiv:2603.04807v1 Announce Type: new Abstract: We study how architectural inductive bias reshapes the implicit regularization induced by the edge-of-stability phenomenon in gradient descent. Prior work has established that for fully connected networks, the strength of this regularization is governed solely by…
-
Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions
arXiv:2603.04635v1 Announce Type: new Abstract: Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $\epsilon$-far from all product distributions in total variation distance.…
-
AI in Multiple GPUs: ZeRO & FSDP
Learn how Zero Redundancy Optimizer works, how to implement it from scratch, and how to use it in PyTorch. The post AI in Multiple GPUs: ZeRO & FSDP appeared first on Towards Data Science. Lorenzo Cesconetto
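The memory savings behind ZeRO can be sketched with back-of-the-envelope arithmetic. This assumes the accounting used in the original ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and K = 12 bytes of fp32 optimizer state (master weights, momentum, variance). Activations, buffers, and fragmentation are ignored, and the function name is hypothetical, not a PyTorch API.

```python
from __future__ import annotations


def zero_memory_per_gpu_gb(num_params: float, num_gpus: int, k: int = 12) -> dict[str, float]:
    """Approximate per-GPU model-state memory (GB) under each ZeRO stage.

    Assumes mixed-precision Adam: 2 bytes fp16 params + 2 bytes fp16 grads
    + k bytes of fp32 optimizer state per parameter.
    """
    psi, nd, gb = num_params, num_gpus, 1e9
    return {
        # Plain data parallelism: every GPU holds a full replica of everything.
        "baseline": (2 + 2 + k) * psi / gb,
        # Stage 1: shard optimizer states across the nd GPUs.
        "stage1_optimizer": (2 + 2 + k / nd) * psi / gb,
        # Stage 2: additionally shard gradients.
        "stage2_optimizer_grads": (2 + (2 + k) / nd) * psi / gb,
        # Stage 3: additionally shard parameters (roughly FSDP's FULL_SHARD strategy).
        "stage3_all": (2 + 2 + k) * psi / nd / gb,
    }


if __name__ == "__main__":
    for stage, mem in zero_memory_per_gpu_gb(7.5e9, 64).items():
        print(f"{stage:>24}: {mem:7.2f} GB")
```

Each stage trades memory for communication: sharding more state means it has to be gathered or reduced across GPUs during the step.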
-
How Human Work Will Remain Valuable in an AI World
The Road to Reality: Episode 1. The post How Human Work Will Remain Valuable in an AI World appeared first on Towards Data Science. Favio Vázquez
-
The Theory behind UMAP?
arXiv:2603.03375v1 Announce Type: new Abstract: In 2018, McInnes et al. introduced a dimensionality reduction algorithm called UMAP, which enjoys wide popularity among data scientists. Their work introduces a finite variant of a functor called the metric realization, based on an unpublished draft by Spivak. This draft contains many errors, most of…
-
Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents
arXiv:2603.03401v1 Announce Type: new Abstract: This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify iteration increments in KGD, deriving an adaptive parameter selection strategy…
-
Learning Order Forest for Qualitative-Attribute Data Clustering
arXiv:2603.03387v1 Announce Type: new Abstract: Clustering is a fundamental approach to understanding data patterns, wherein the intuitive Euclidean distance space is commonly adopted. However, this is not the case for implicit cluster distributions reflected by qualitative attribute values, e.g., the nominal values of attributes like symptoms, marital status,…
-
Surprisal-Rényi Free Energy
arXiv:2603.03405v1 Announce Type: new Abstract: The forward and reverse Kullback-Leibler (KL) divergences arise as limiting objectives in learning and inference yet induce markedly different inductive biases that cannot be explained at the level of expectations alone. In this work, we introduce the Surprisal-Rényi Free Energy (SRFE), a log-moment-based functional of the likelihood…
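The abstract's starting point, that forward and reverse KL divergences are different objectives, can be seen on a toy discrete example. This sketch only computes the two KL directions; it does not implement the SRFE functional, and the two distributions are made up for illustration.

```python
from __future__ import annotations

import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in nats for discrete distributions over the same outcomes."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken to be 0
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


if __name__ == "__main__":
    p = [0.49, 0.49, 0.02]  # spread over two roughly equal outcomes
    q = [0.90, 0.05, 0.05]  # concentrated on the first outcome
    print("forward KL(p || q):", round(kl_divergence(p, q), 3))
    print("reverse KL(q || p):", round(kl_divergence(q, p), 3))
```

The two numbers differ because each direction weights the log-ratio by a different distribution; the infinite case shows why KL(p || q) punishes q for assigning zero mass where p has support.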
-
Scalable Contrastive Causal Discovery under Unknown Soft Interventions
arXiv:2603.03411v1 Announce Type: new Abstract: Observational causal discovery is only identifiable up to the Markov equivalence class. While interventions can reduce this ambiguity, in practice interventions are often soft with multiple unknown targets. In many realistic scenarios, only a single intervention regime is observed. We propose a…
-
5 Ways to Implement Variable Discretization
An overview of powerful methods for transforming continuous variables into discrete ones. The post 5 Ways to Implement Variable Discretization appeared first on Towards Data Science. Rukshan Pramoditha
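Two common discretization schemes are equal-width and equal-frequency (quantile) binning; the sketch below assumes these are representative, since the article's five methods are not listed here. It uses only the standard library, the function names are hypothetical, and a value sitting exactly on a cut point is assigned to the upper bin.

```python
from __future__ import annotations

from bisect import bisect_right


def equal_width_bins(values: list[float], k: int) -> list[int]:
    """Split [min, max] into k intervals of equal width; return a bin index per value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant input: everything lands in one bin
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(1, k)]  # k - 1 interior cut points
    return [bisect_right(edges, v) for v in values]


def equal_frequency_bins(values: list[float], k: int) -> list[int]:
    """Place cut points at empirical quantiles so bins hold roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[(i * n) // k] for i in range(1, k)]
    return [bisect_right(edges, v) for v in values]


if __name__ == "__main__":
    skewed = [1, 1, 1, 2, 2, 3, 50, 100]
    print(equal_width_bins(skewed, 2))      # [0, 0, 0, 0, 0, 0, 0, 1]
    print(equal_frequency_bins(skewed, 2))  # [0, 0, 0, 1, 1, 1, 1, 1]
```

On the skewed sample, equal-width binning puts seven of eight values in one bin because the outlier stretches the range, while quantile cut points split the bulk of the data; ties at a cut point (the two 2s here) can still leave quantile bins uneven.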
-
Escaping the Prototype Mirage: Why Enterprise AI Stalls
Too many prototypes, too few products. The post Escaping the Prototype Mirage: Why Enterprise AI Stalls appeared first on Towards Data Science. Reya Vir
-
Stop Tuning Hyperparameters. Start Tuning Your Problem.
80% of ML projects fail from bad problem framing, not bad models. A 5-step protocol to define the right problem before you write training code. The post Stop Tuning Hyperparameters. Start Tuning Your Problem. appeared first on Towards Data Science. Kaushik Rajan
-
RAG with Hybrid Search: How Does Keyword Search Work?
Understanding keyword search, TF-IDF, and BM25. The post RAG with Hybrid Search: How Does Keyword Search Work? appeared first on Towards Data Science. Maria Mouschoutzi
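A minimal BM25 scorer, assuming whitespace tokenization, lowercase matching, typical parameter values k1 = 1.5 and b = 0.75, and the Lucene-style IDF variant log(1 + (N - df + 0.5) / (df + 0.5)), which never goes negative. The article's exact formulation may differ, and the function name is hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears at least once.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Rarer terms get a larger IDF weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 makes repeated occurrences saturate; b normalizes by document length.
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "dogs chase cats", "the cat and the hat"]
    for doc, s in zip(docs, bm25_scores("cat hat", docs)):
        print(f"{s:.3f}  {doc}")
```

The second document scores zero because "cats" is not the token "cat". Literal term matching like this is one reason keyword search is usually combined with stemming or, in hybrid RAG, with embedding-based retrieval.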
-
Fisher-Geometric Diffusion in Stochastic Gradient Descent: Optimal Rates, Oracle Complexity, and Information-Theoretic Limits
arXiv:2603.02417v1 Announce Type: new Abstract: We develop a Fisher-geometric theory of stochastic gradient descent (SGD) in which mini-batch noise is an intrinsic, loss-induced matrix, not an exogenous scalar variance. Under exchangeable sampling, the mini-batch gradient covariance is pinned down (to leading…
-
Low-Degree Method Fails to Predict Robust Subspace Recovery
arXiv:2603.02594v1 Announce Type: new Abstract: The low-degree polynomial framework has been highly successful in predicting computational versus statistical gaps for high-dimensional problems in average-case analysis and machine learning. This success has led to the low-degree conjecture, which posits that this method captures the power and limitations of…
-
Geometric structures and deviations on James’ symmetric positive-definite matrix bicone domain
arXiv:2603.02483v1 Announce Type: new Abstract: Symmetric positive-definite (SPD) matrix datasets play a central role across numerous scientific disciplines, including signal processing, statistics, finance, computer vision, information theory, and machine learning among others. The set of SPD matrices forms a cone which can be viewed…
-
Conformal Graph Prediction with Z-Gromov Wasserstein Distances
arXiv:2603.02460v1 Announce Type: new Abstract: Supervised graph prediction addresses regression problems where the outputs are structured graphs. Although several approaches exist for graph-valued prediction, principled uncertainty quantification remains limited. We propose a conformal prediction framework for graph-valued outputs, providing distribution-free coverage guarantees in structured output spaces. Our method…
-
Combinatorial Sparse PCA Beyond the Spiked Identity Model
arXiv:2603.02607v1 Announce Type: new Abstract: Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $\Sigma$, whose top eigenvector $v \in \mathbb{R}^d$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into…
-
Graph Coloring You Can See
Visual intuition with Python. The post Graph Coloring You Can See appeared first on Towards Data Science. Rhyd Lewis
-
Why You Should Stop Writing Loops in Pandas
How to think in columns, write faster code, and finally use Pandas like a professional. The post Why You Should Stop Writing Loops in Pandas appeared first on Towards Data Science. Ibrahim Salami
-
I Quit My $130,000 ML Engineer Job After Learning 4 Lessons
What they don’t tell you about “dream tech jobs”. The post I Quit My $130,000 ML Engineer Job After Learning 4 Lessons appeared first on Towards Data Science. Egor Howell
-
Agentic RAG vs Classic RAG: From a Pipeline to a Control Loop
A practical guide to choosing between single-pass pipelines and adaptive retrieval loops based on your use case’s complexity, cost, and reliability requirements. The post Agentic RAG vs Classic RAG: From a Pipeline to a Control Loop appeared first on Towards Data Science. Mostafa…
-
Initialization-Aware Score-Based Diffusion Sampling
arXiv:2603.00772v1 Announce Type: new Abstract: Score-based generative models (SGMs) aim at generating samples from a target distribution by approximating the reverse-time dynamics of a stochastic differential equation. Despite their strong empirical performance, classical samplers initialized from a Gaussian distribution require a long time horizon noising typically inducing a large number of…
-
The Partition Principle Revisited: Non-Equal Volume Designs Achieve Minimal Expected Star Discrepancy
arXiv:2603.00202v1 Announce Type: new Abstract: We study the expected star discrepancy under a newly designed class of non-equal volume partitions. The main contributions are twofold. First, we establish a strong partition principle for the star discrepancy, showing that our newly designed non-equal volume…
-
Time-Aware Latent Space Bayesian Optimization
arXiv:2603.00935v1 Announce Type: new Abstract: Latent-space Bayesian optimization (LSBO) extends Bayesian optimization to structured domains, such as molecular design, by searching in the continuous latent space of a generative model. However, most LSBO methods assume a fixed objective, whereas real design campaigns often face temporal drift (e.g., evolving preferences or…
-
Random Features for Operator-Valued Kernels: Bridging Kernel Methods and Neural Operators
arXiv:2603.00971v1 Announce Type: new Abstract: In this work, we investigate the generalization properties of random feature methods. Our analysis extends prior results for Tikhonov regularization to a broad class of spectral regularization techniques and further generalizes the setting to operator-valued kernels. This unified framework…
-
Learning with the Nash-Sutcliffe loss
arXiv:2603.00968v1 Announce Type: new Abstract: The Nash-Sutcliffe efficiency ($\text{NSE}$) is a widely used, positively oriented relative measure for evaluating forecasts across multiple time series. However, it lacks a decision-theoretic foundation for this purpose. To address this, we examine its negatively oriented counterpart, which we refer to as Nash-Sutcliffe loss, defined…
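For reference, the textbook Nash-Sutcliffe efficiency of a single series is $\text{NSE} = 1 - \sum_t (o_t - s_t)^2 / \sum_t (o_t - \bar{o})^2$, where $o_t$ are observations, $s_t$ forecasts, and $\bar{o}$ the observed mean: 1 is a perfect forecast, 0 matches always predicting the observed mean, and negative values are worse than that. The sketch below assumes this standard definition; the paper's Nash-Sutcliffe loss is defined in the truncated part of the abstract and is not reproduced here.

```python
from __future__ import annotations


def nash_sutcliffe_efficiency(observed: list[float], simulated: list[float]) -> float:
    """NSE = 1 - SSE / (sum of squared deviations of the observations from their mean)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("need two non-empty series of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - sse / sst


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    print(nash_sutcliffe_efficiency(obs, obs))                   # 1.0
    print(nash_sutcliffe_efficiency(obs, [2.5] * 4))             # 0.0
    print(nash_sutcliffe_efficiency(obs, [4.0, 3.0, 2.0, 1.0]))  # -3.0
```

Because each series is normalized by its own variance, NSE is scale-free, which is what makes it usable as a relative measure across many series with different magnitudes.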
-
YOLOv3 Paper Walkthrough: Even Better, But Not That Much
A PyTorch implementation of the YOLOv3 architecture from scratch. The post YOLOv3 Paper Walkthrough: Even Better, But Not That Much appeared first on Towards Data Science. Muhammad Ardi
-
The Machine Learning Lessons I’ve Learned This Month
February 2026: exchange with others, documentation, and MLOps. The post The Machine Learning Lessons I’ve Learned This Month appeared first on Towards Data Science. Pascal Janetzky
-
Code Less, Ship Faster: Building APIs with FastAPI
Master path operations, Pydantic models, dependency injection, and automatic documentation. The post Code Less, Ship Faster: Building APIs with FastAPI appeared first on Towards Data Science. Thomas Reid
-
Uncovering Physical Drivers of Dark Matter Halo Structures with Auxiliary-Variable-Guided Generative Models
arXiv:2602.23518v1 Announce Type: new Abstract: Deep generative models (DGMs) compress high-dimensional data but often entangle distinct physical factors in their latent spaces. We present an auxiliary-variable-guided framework for disentangling representations of thermal Sunyaev-Zel’dovich (tSZ) maps of dark matter halos. We introduce halo mass…
-
Partition Function Estimation under Bounded f-Divergence
arXiv:2602.23535v1 Announce Type: new Abstract: We study the statistical complexity of estimating partition functions given sample access to a proposal distribution and an unnormalized density ratio for a target distribution. While partition function estimation is a classical problem, existing guarantees typically rely on structural assumptions about the domain or…
-
Multivariate Spatio-Temporal Neural Hawkes Processes
arXiv:2602.23629v1 Announce Type: new Abstract: We propose a Multivariate Spatio-Temporal Neural Hawkes Process for modeling complex multivariate event data with spatio-temporal dynamics. The proposed model extends continuous-time neural Hawkes processes by integrating spatial information into latent state evolution through learned temporal and spatial decay dynamics, enabling flexible modeling of excitation…
-
Fairness under Graph Uncertainty: Achieving Interventional Fairness with Partially Known Causal Graphs over Clusters of Variables
arXiv:2602.23611v1 Announce Type: new Abstract: Algorithmic decisions about individuals require predictions that are not only accurate but also fair with respect to sensitive attributes such as gender and race. Causal notions of fairness align with legal requirements, yet many…
-
Moment Matters: Mean and Variance Causal Graph Discovery from Heteroscedastic Observational Data
arXiv:2602.23602v1 Announce Type: new Abstract: Heteroscedasticity (where the variance of a variable changes with other variables) is pervasive in real data, and elucidating why it arises from the perspective of statistical moments is crucial in scientific knowledge discovery and decision-making. However,…
-
Weekly Entering & Transitioning – Thread 02 Mar, 2026 – 09 Mar, 2026
Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: learning resources (e.g., books, tutorials, videos); traditional education (e.g., schools, degrees, electives); alternative education (e.g.…
-
So what do y’all think of the Block layoffs?
My upcoming interview with Block got canceled, and I am a bit relieved, but at the same time it made me question where the industry in general is headed. Block’s CEO is attributing the layoffs to AI. As an active job seeker and…
-
Time Series Themed Children’s Book
For the parents out there looking to share the joys of data collection, cleaning, time series modeling, and forecasting error with their little ones. Written completely in rhyme and all about using data to solve problems. Alternatively, Harry’s Lemonade Solution could be used to teach your parents a little bit…
-
The top 5 most common product analytics case interview questions asked in big tech interviews
Hey folks, You might remember me from my previous posts about my progression into big tech or my guide to passing A/B Test interview questions. Well, I’m back with what will hopefully be more helpful interview tips. These are tips…
-
My experience after final round interviews at 3 tech companies
Hey folks, this is an update from my previous post (here). You might also remember me for my previous posts about how to pass product analytics interviews in tech, and how to pass AB testing/Experimentation interviews. For context, I was laid off last year, took…
-
Zero-Waste Agentic RAG: Designing Caching Architectures to Minimize Latency and LLM Costs at Scale
Reducing LLM costs by 30% with validation-aware, multi-tier caching. The post Zero-Waste Agentic RAG: Designing Caching Architectures to Minimize Latency and LLM Costs at Scale appeared first on Towards Data Science. Partha Sarkar
-
Context Engineering as Your Competitive Edge
If you have both unique domain expertise and know how to make it usable to your AI systems, you’ll be hard to beat. The post Context Engineering as Your Competitive Edge appeared first on Towards Data Science. Dr. Janna Lipenkova
-
Claude Skills and Subagents: Escaping the Prompt Engineering Hamster Wheel
How reusable, lazy-loaded instructions solve the context bloat problem in AI-assisted development. The post Claude Skills and Subagents: Escaping the Prompt Engineering Hamster Wheel appeared first on Towards Data Science. Ruben Broekx
-
Scaling ML Inference on Databricks: Liquid or Partitioned? Salted or Not?
A case study on techniques to maximize your clusters. The post Scaling ML Inference on Databricks: Liquid or Partitioned? Salted or Not? appeared first on Towards Data Science. Hector Mejia
-
Coding the Pong Game from Scratch in Python
Implementing the classic Pong game in Python using OOP and Turtle. The post Coding the Pong Game from Scratch in Python appeared first on Towards Data Science. Mahnoor Javed
-
Stop Asking if a Model Is Interpretable
Start asking what question the explanation should answer. The post Stop Asking if a Model Is Interpretable appeared first on Towards Data Science. Manuel Franco de la Peña
-
Generative AI, Discriminative Human
How to think critically about AI in an ocean of hype. The post Generative AI, Discriminative Human appeared first on Towards Data Science. Jason Tamara Widjaja
-
The Gap Between Junior and Senior Data Scientists Isn’t Code
Why my obsession with complex algorithms was actually holding my career back. The post The Gap Between Junior and Senior Data Scientists Isn’t Code appeared first on Towards Data Science. Benjamin Nweke
-
LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees
arXiv:2602.22432v1 Announce Type: new Abstract: Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify uncertainty. Conformal prediction provides distribution-free marginal coverage, yet split conformal uses a single global residual quantile and can be poorly adaptive under…
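For contrast with LoBoost, the split conformal baseline the abstract describes fits in a few lines: take absolute residuals on a held-out calibration set, pick one global quantile, and pad every point prediction by it. This is the generic method, not LoBoost; the function name is hypothetical, and the model producing the predictions is assumed to have been trained on data separate from the calibration set.

```python
from __future__ import annotations

import math


def split_conformal_interval(
    cal_targets: list[float],
    cal_preds: list[float],
    new_pred: float,
    alpha: float = 0.1,
) -> tuple[float, float]:
    """Prediction interval with >= (1 - alpha) marginal coverage under exchangeability."""
    residuals = sorted(abs(y - p) for y, p in zip(cal_targets, cal_preds))
    n = len(residuals)
    # Finite-sample correction: use the ceil((n + 1)(1 - alpha))-th smallest residual.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    q = residuals[rank - 1]  # one global quantile, shared by every test point
    return (new_pred - q, new_pred + q)


if __name__ == "__main__":
    cal_y = [float(v) for v in range(1, 11)]
    cal_p = [0.0] * 10
    print(split_conformal_interval(cal_y, cal_p, new_pred=5.0))  # (-5.0, 15.0)
```

Because q is the same for every input, the interval width never adapts to regions where the model is more or less accurate, which is the limitation the LoBoost abstract points to.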
-
Flow Matching is Adaptive to Manifold Structures
arXiv:2602.22486v1 Announce Type: new Abstract: Flow matching has emerged as a simulation-free alternative to diffusion-based generative modeling, producing samples by solving an ODE whose time-dependent velocity field is learned along an interpolation between a simple source distribution (e.g., a standard normal) and a target data distribution. Flow-based methods…
-
From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference
arXiv:2602.22492v1 Announce Type: new Abstract: In this work, we study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs), with an emphasis on statistical modeling, identifiability, and scalable inference. We first establish a general convergence…
-
Unsupervised Continual Learning for Amortized Bayesian Inference
arXiv:2602.22884v1 Announce Type: new Abstract: Amortized Bayesian Inference (ABI) enables efficient posterior estimation using generative neural networks trained on simulated data, but often suffers from performance degradation under model misspecification. While self-consistency (SC) training on unlabeled empirical data can enhance network robustness, current approaches are limited to static,…
-
Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks
arXiv:2602.22925v1 Announce Type: new Abstract: We study wide Bayesian neural networks focusing on the rare but statistically dominant fluctuations that govern posterior concentration, beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, providing an emerging notion of complexity and feature learning directly…
-
A Generalizable MARL-LP Approach for Scheduling in Logistics
Part 1. Hybrid Solution for Dynamic Vehicle Routing: Context and Architecture. The post A Generalizable MARL-LP Approach for Scheduling in Logistics appeared first on Towards Data Science. Alexander Levin
-
Detecting and Editing Visual Objects with Gemini
A practical guide to identifying, restoring, and transforming elements within your images. The post Detecting and Editing Visual Objects with Gemini appeared first on Towards Data Science. Laurent Picard
-
Take a Deep Dive into Filtering in DAX
Have you ever wondered what happens when you apply a filter in a DAX expression? Well, today I will take you on a deep dive into this fascinating topic, with examples to help you learn something new and surprising. The post Take a Deep Dive into Filtering…
-
Counterdiabatic Hamiltonian Monte Carlo
arXiv:2602.21272v1 Announce Type: new Abstract: Hamiltonian Monte Carlo (HMC) is a state-of-the-art method for sampling from distributions with differentiable densities, but can converge slowly when applied to challenging multimodal problems. Running HMC with a time-varying Hamiltonian, in order to interpolate from an initial tractable distribution to the…
-
Efficient Uncoupled Learning Dynamics with $\tilde{O}\!\left(T^{-1/4}\right)$ Last-Iterate Convergence in Bilinear Saddle-Point Problems over Convex Sets under Bandit Feedback
arXiv:2602.21436v1 Announce Type: new Abstract: In this paper, we study last-iterate convergence of learning algorithms in bilinear saddle-point problems, a preferable notion of convergence that captures the day-to-day behavior of learning dynamics. We focus on the challenging…
-
Conditional neural control variates for variance reduction in Bayesian inverse problems
arXiv:2602.21357v1 Announce Type: new Abstract: Bayesian inference for inverse problems involves computing expectations under posterior distributions (e.g., posterior means, variances, or predictive quantities), typically via Monte Carlo (MC) estimation. When the quantity of interest varies significantly under the posterior, accurate estimates demand…
-
ConformalHDC: Uncertainty-Aware Hyperdimensional Computing with Application to Neural Decoding
arXiv:2602.21446v1 Announce Type: new Abstract: Hyperdimensional Computing (HDC) offers a computationally efficient paradigm for neuromorphic learning. Yet, it lacks rigorous uncertainty quantification, leading to open decision boundaries and, consequently, vulnerability to outliers, adversarial perturbations, and out-of-distribution inputs. To address these limitations, we introduce ConformalHDC, a unified…
-
Efficient Inference after Directionally Stable Adaptive Experiments
arXiv:2602.21478v1 Announce Type: new Abstract: We study inference on scalar-valued pathwise differentiable targets after adaptive data collection, such as a bandit algorithm. We introduce a novel target-specific condition, directional stability, which is strictly weaker than previously imposed target-agnostic stability conditions. Under directional stability, we show that estimators that…
-
Scaling Feature Engineering Pipelines with Feast and Ray
Utilizing feature stores like Feast and distributed compute frameworks like Ray in production machine learning systems. The post Scaling Feature Engineering Pipelines with Feast and Ray appeared first on Towards Data Science. Kenneth Leung
-
Breaking the Host Memory Bottleneck: How Peer Direct Transformed Gaudi’s Cloud Performance
Engineering RDMA-like performance over cloud host NICs using libfabric, DMA-BUF, and HCCL to restore distributed training scalability. The post Breaking the Host Memory Bottleneck: How Peer Direct Transformed Gaudi’s Cloud Performance appeared first on Towards Data Science. Maria Piterberg
-
Aliasing in Audio, Easily Explained: From Wagon Wheels to Waveforms
Understanding the foundational distortion of digital audio from first principles, with worked examples and visual intuition. The post Aliasing in Audio, Easily Explained: From Wagon Wheels to Waveforms appeared first on Towards Data Science. Aman Agrawal
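The folding rule behind aliasing fits in one small function: a pure tone at frequency f sampled at rate fs is indistinguishable from any tone at f plus a multiple of fs, so it shows up at the distance from f to the nearest multiple of fs, always between 0 and the Nyquist frequency fs / 2. This sketch assumes ideal sampling of a single sinusoid with no anti-aliasing filter, and the function name is hypothetical. It returns only a magnitude, so it cannot show the apparent reversal of direction in the wagon-wheel effect.

```python
def alias_frequency(f_signal: float, f_sample: float) -> float:
    """Apparent frequency of a pure tone at f_signal after sampling at f_sample."""
    if f_sample <= 0:
        raise ValueError("sample rate must be positive")
    # Sampling cannot tell f apart from f + m * f_sample for any integer m.
    folded = abs(f_signal) % f_sample
    # Anything above Nyquist (f_sample / 2) mirrors back into [0, f_sample / 2].
    return folded if folded <= f_sample / 2 else f_sample - folded


if __name__ == "__main__":
    print(alias_frequency(3_000, 44_100))   # 3000: below Nyquist, unchanged
    print(alias_frequency(30_000, 44_100))  # 14100: folds back below 22050
    print(alias_frequency(23, 24))          # 1: a 23 Hz rotation filmed at 24 frames/s
```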
-
How to Define the Modeling Scope of an Internal Credit Risk Model
Dataset construction for Internal Ratings-Based (IRB) Probability of Default (PD) models. The post How to Define the Modeling Scope of an Internal Credit Risk Model appeared first on Towards Data Science. JUNIOR JUMBONG
-
Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation
arXiv:2602.20297v1 Announce Type: new Abstract: We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing analyses do not apply to algorithms that achieve the nearly…
-
Standard Transformers Achieve the Minimax Rate in Nonparametric Regression with $C^{s,\lambda}$ Targets
arXiv:2602.20555v1 Announce Type: new Abstract: The tremendous success of Transformer models in fields such as large language models and computer vision necessitates a rigorous theoretical investigation. To the best of our knowledge, this paper is the first work proving that standard Transformers can…
-
Selecting Optimal Variable Order in Autoregressive Ising Models
arXiv:2602.20394v1 Announce Type: new Abstract: Autoregressive models enable tractable sampling from learned probability distributions, but their performance critically depends on the variable ordering used in the factorization via complexities of the resulting conditional distributions. We propose to learn the Markov random field describing the underlying data, and…
-
Characterizing Online and Private Learnability under Distributional Constraints via Generalized Smoothness
arXiv:2602.20585v1 Announce Type: new Abstract: Understanding minimal assumptions that enable learning and generalization is perhaps the central question of learning theory. Several celebrated results in statistical learning theory, such as the VC theorem and Littlestone’s characterization of online learnability, establish conditions on the hypothesis…
-
Amortized Bayesian inference for actigraph time sheet data from mobile devices
arXiv:2602.20611v1 Announce Type: new Abstract: Mobile data technologies use “actigraphs” to furnish information on health variables as a function of a subject’s movement. The advent of wearable devices and related technologies has propelled the creation of health databases consisting of human movement data to…
-
Optimizing Token Generation in PyTorch Decoder Models
Hiding host-device synchronization via CUDA stream interleaving. The post Optimizing Token Generation in PyTorch Decoder Models appeared first on Towards Data Science. Chaim Rand
-
Decisioning at the Edge: Policy Matching at Scale
Policy-to-Agency Optimization with PuLP. The post Decisioning at the Edge: Policy Matching at Scale appeared first on Towards Data Science. Erika Gomes-Gonçalves
-
Optimizing Deep Learning Models with SAM
A deep dive into the Sharpness-Aware Minimization (SAM) algorithm and how it improves the generalizability of modern deep learning models. The post Optimizing Deep Learning Models with SAM appeared first on Towards Data Science. Anindya Dey
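As a rough illustration of the two-step update usually associated with SAM (first perturb the weights toward higher loss within an L2 ball of radius `rho`, then descend using the gradient measured at that perturbed point), here is a plain-Python sketch on a hypothetical two-parameter quadratic. The names `sam_step` and `toy_quadratic_grad` are illustrative; real implementations wrap an existing optimizer and operate on tensors, which this sketch deliberately omits.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(w: Vector, grad_fn: Callable[[Vector], Vector],
             lr: float = 0.1, rho: float = 0.05) -> Vector:
    """Return the parameters after one SAM update of w."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Zero gradient: no ascent direction, so leave w unchanged.
        return list(w)
    # 1) Ascent: first-order worst-case perturbation inside an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # 2) Descent: gradient taken at the perturbed point, applied to the original w.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_quadratic_grad(w: Vector) -> Vector:
    """Gradient of the hypothetical loss L(w) = w0**2 + 10 * w1**2."""
    curvature = [1.0, 10.0]
    return [2.0 * c * wi for c, wi in zip(curvature, w)]


if __name__ == "__main__":
    w = [1.0, 1.0]
    for _ in range(100):
        w = sam_step(w, toy_quadratic_grad, lr=0.04, rho=0.01)
    print(w)  # both coordinates end up close to the minimum at (0, 0)
```

With `rho=0` the update reduces to ordinary gradient descent, which makes a quick sanity check for any SAM implementation. Because the perturbation has a fixed radius, the iterates settle in a small neighborhood of the minimum rather than landing exactly on it.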
-
AI Bots Formed a Cartel. No One Told Them To.
Inside the research that shows algorithmic price-fixing isn’t a bug in the code. It’s a feature of the math. The post AI Bots Formed a Cartel. No One Told Them To. appeared first on Towards Data Science. Kaushik Rajan
-
Multiclass Calibration Assessment and Recalibration of Probability Predictions via the Linear Log Odds Calibration Function
arXiv:2602.18573v1 Announce Type: new Abstract: Machine-generated probability predictions are essential in modern classification tasks such as image classification. A model is well calibrated when its predicted probabilities correspond to observed event frequencies. Despite the need for multicategory recalibration methods, existing…
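For reference, the binary form of the linear log odds (LLO) function, as it is commonly written, maps a probability p to δp^γ / (δp^γ + (1 − p)^γ), which is an affine map in log-odds space: logit(c) = γ·logit(p) + log δ. The sketch below covers only this two-class case. The multiclass extension proposed in the paper is not reproduced here, and the parameter values in the demo are hypothetical; in practice δ and γ would be estimated on held-out data, for example by maximum likelihood.

```python
import math


def llo(p: float, delta: float, gamma: float) -> float:
    """Linear log odds (LLO) recalibration of one probability p in (0, 1).

    Equivalent to logit(c) = gamma * logit(p) + log(delta):
    gamma > 1 sharpens predictions, gamma < 1 softens them,
    and delta shifts the log odds up or down.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    num = delta * p ** gamma
    return num / (num + (1.0 - p) ** gamma)


def logit(p: float) -> float:
    """Log odds of p."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    preds = [0.05, 0.3, 0.5, 0.7, 0.95]
    # Hypothetical parameters chosen for illustration, not fitted values.
    print([round(llo(p, delta=1.2, gamma=0.8), 3) for p in preds])
```

With δ = 1 and γ = 1 the map is the identity, so a perfectly calibrated binary model would be left unchanged.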
-
Stochastic Gradient Variational Inference with Price’s Gradient Estimator from Bures-Wasserstein to Parameter Space
arXiv:2602.18718v1 Announce Type: new Abstract: For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) perform gradient descent in measure space (Bures-Wasserstein space)…
-
Bounds and Identification of Joint Probabilities of Potential Outcomes and Observed Variables under Monotonicity Assumptions
arXiv:2602.18762v1 Announce Type: new Abstract: Evaluating joint probabilities of potential outcomes and observed variables, and their linear combinations, is a fundamental challenge in causal inference. This paper addresses the bounding and identification of these probabilities in settings with discrete treatment…
-
Implicit Bias and Convergence of Matrix Stochastic Mirror Descent
arXiv:2602.18997v1 Announce Type: new Abstract: We investigate Stochastic Mirror Descent (SMD) with matrix parameters and vector-valued predictions, a framework relevant to multi-class classification and matrix completion problems. Focusing on the overparameterized regime, where the total number of parameters exceeds the number of training samples, we prove…
-
Federated Measurement of Demographic Disparities from Quantile Sketches
arXiv:2602.18870v1 Announce Type: new Abstract: Many fairness goals are defined at a population level that misaligns with siloed data collection, which remains unsharable due to privacy regulations. Horizontal federated learning (FL) enables collaborative modeling across clients with aligned features without sharing raw data. We study federated auditing…
-
Is the AI and Data Job Market Dead?
What you should be doing in the current job market. The post Is the AI and Data Job Market Dead? appeared first on Towards Data Science. Egor Howell
-
PySpark for Pandas Users
Common Pandas operations and their equivalents in PySpark. The post PySpark for Pandas Users appeared first on Towards Data Science. Thomas Reid
-
AI in Multiple GPUs: Gradient Accumulation & Data Parallelism
Learn and implement gradient accumulation and data parallelism from scratch in PyTorch. The post AI in Multiple GPUs: Gradient Accumulation & Data Parallelism appeared first on Towards Data Science. Lorenzo Cesconetto
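A framework-free sketch of why gradient accumulation works. It assumes a hypothetical one-parameter linear model with mean-squared-error loss and equal-sized micro-batches; the helper names are illustrative. Scaling each micro-batch gradient by 1 / num_steps and summing before a single update reproduces the gradient of the full effective batch. The article works in PyTorch; this plain-Python version only illustrates the arithmetic.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y)


def mse_grad(w: float, batch: Sequence[Example]) -> float:
    """Gradient with respect to w of mean((w * x - y) ** 2) over the batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: List[Example], micro_batch_size: int) -> float:
    """Accumulate gradients over equal-sized micro-batches before one update.

    Each micro-batch gradient is scaled by 1 / num_steps (the usual
    "divide the loss by the accumulation steps" trick), so the sum equals
    the gradient of the mean loss over the whole effective batch.
    """
    if micro_batch_size <= 0 or len(data) % micro_batch_size != 0:
        raise ValueError("len(data) must be a positive multiple of micro_batch_size")
    num_steps = len(data) // micro_batch_size
    total = 0.0
    for i in range(num_steps):
        micro = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        total += mse_grad(w, micro) / num_steps  # accumulate; no parameter update yet
    return total


if __name__ == "__main__":
    # Hypothetical data from y = 3x + 0.5; effective batch of 8 split into 4 micro-batches.
    data = [(float(x), 3.0 * x + 0.5) for x in range(8)]
    w, lr = 0.0, 0.01
    for _ in range(200):
        w -= lr * accumulated_grad(w, data, micro_batch_size=2)  # one step per effective batch
    print(round(w, 3))
```

Data parallelism relies on the same averaging: each worker computes the gradient on its own shard, and an all-reduce averages those gradients so that every replica applies an identical update.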
-
Build Effective Internal Tooling with Claude Code
Use Claude Code to quickly build completely personalized applications. The post Build Effective Internal Tooling with Claude Code appeared first on Towards Data Science. Eivind Kjosbakken
-
Topological Exploration of High-Dimensional Empirical Risk Landscapes: general approach, and applications to phase retrieval
arXiv:2602.17779v1 Announce Type: new Abstract: We consider the landscape of empirical risk minimization for high-dimensional Gaussian single-index models (generalized linear models). The objective is to recover an unknown signal $\boldsymbol{\theta}^\star \in \mathbb{R}^d$ (where $d \gg 1$) from a loss function $\hat{R}(\boldsymbol{\theta})$…
-
Interactive Learning of Single-Index Models via Stochastic Gradient Descent
arXiv:2602.17876v1 Announce Type: new Abstract: Stochastic gradient descent (SGD) is a cornerstone algorithm for high-dimensional optimization, renowned for its empirical successes. Recent theoretical advances have provided a deep understanding of how SGD enables feature learning in high-dimensional nonlinear models, most notably the single-index model with i.i.d.…
-
Drift Estimation for Stochastic Differential Equations with Denoising Diffusion Models
arXiv:2602.17830v1 Announce Type: new Abstract: We study the estimation of time-homogeneous drift functions in multivariate stochastic differential equations with known diffusion coefficient, from multiple trajectories observed at high frequency over a fixed time horizon. We formulate drift estimation as a denoising problem conditional on previous…
-
Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget
arXiv:2602.17894v1 Announce Type: new Abstract: Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as medical…
-
On the Generalization and Robustness in Conditional Value-at-Risk
arXiv:2602.18053v1 Announce Type: new Abstract: Conditional Value-at-Risk (CVaR) is a widely used risk-sensitive objective for learning under rare but high-impact losses, yet its statistical behavior under heavy-tailed data remains poorly understood. Unlike expectation-based risk, CVaR depends on an endogenous, data-dependent quantile, which couples tail averaging with threshold…
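To make the "endogenous quantile" point concrete, here is a minimal empirical CVaR estimator. It assumes losses where larger is worse and uses the Rockafellar-Uryasev representation CVaR_α = t + E[(L − t)₊] / (1 − α), evaluated at t equal to the empirical α-quantile (VaR). The function name and demo data are illustrative; this is not the estimator or analysis from the paper.

```python
import math
import random
from typing import Sequence


def empirical_cvar(losses: Sequence[float], alpha: float) -> float:
    """Empirical CVaR_alpha: expected loss in the worst (1 - alpha) tail.

    Uses the Rockafellar-Uryasev form
        CVaR_alpha = t + mean((L - t)_+) / (1 - alpha)
    evaluated at t = empirical VaR_alpha (the alpha-quantile of the losses).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if not losses:
        raise ValueError("need at least one loss")
    xs = sorted(losses)
    n = len(xs)
    # Empirical alpha-quantile: smallest sample x with F_n(x) >= alpha.
    k = max(math.ceil(alpha * n), 1)
    var = xs[k - 1]
    # The threshold is estimated from the same sample it is applied to.
    excess = sum(max(x - var, 0.0) for x in xs) / n
    return var + excess / (1.0 - alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical heavy-tailed losses (Pareto with shape 1.5).
    losses = [rng.paretovariate(1.5) for _ in range(10_000)]
    for a in (0.0, 0.9, 0.99):
        print(a, round(empirical_cvar(losses, a), 3))
```

At α = 0 the estimator reduces to the sample mean, and as α grows it averages an ever smaller tail. Because the threshold t is itself estimated from the sample, a few extreme losses shift both t and the tail average, which is one way to see the coupling the abstract describes.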
-
Weekly Entering & Transitioning – Thread 23 Feb, 2026 – 02 Mar, 2026
Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: learning resources (e.g. books, tutorials, videos); traditional education (e.g. schools, degrees, electives); alternative education (e.g.…
-
How to not get discouraged while searching for a job?
The market has not been forgiving, especially when it comes to interviews. I am not sure if anyone else has noticed, but companies seem to expect flawless interviews and coding rounds. I have faced a few rejections over the past couple of months, and it…
-
Data Catalog Tool – Sanity Check
submitted by /u/FirCoat