Category: aimlds
-
Radon–Wasserstein Gradient Flows for Interacting-Particle Sampling in High Dimensions
arXiv:2602.05227v1 Announce Type: new Abstract: Gradient flows of the Kullback–Leibler (KL) divergence, such as the Fokker–Planck equation and Stein Variational Gradient Descent, evolve a distribution toward a target density known only up to a normalizing constant. We introduce new gradient flows of the KL divergence with…
-
Decision-Focused Sequential Experimental Design: A Directional Uncertainty-Guided Approach
arXiv:2602.05340v1 Announce Type: new Abstract: We consider the sequential experimental design problem in the predict-then-optimize paradigm. In this paradigm, the outputs of the prediction model are used as coefficient vectors in a downstream linear optimization problem. Traditional sequential experimental design aims to control the input variables (features)…
-
Mechanistic Interpretability: Peeking Inside an LLM
Are the human-like cognitive abilities of LLMs real or fake? How does information travel through the neural network? Is there hidden knowledge inside an LLM? The post Mechanistic Interpretability: Peeking Inside an LLM appeared first on Towards Data Science. Julian Mendel
-
Why Is My Code So Slow? A Guide to Py-Spy Python Profiling
Stop guessing and start diagnosing performance issues using Py-Spy. The post Why Is My Code So Slow? A Guide to Py-Spy Python Profiling appeared first on Towards Data Science. Kenneth McCarthy
-
The Rule Everyone Misses: How to Stop Confusing loc and iloc in Pandas
A simple mental model to remember when each one works (with examples that finally click). The post The Rule Everyone Misses: How to Stop Confusing loc and iloc in Pandas appeared first on Towards Data Science. Ibrahim Salami
-
A Hitchhiker’s Guide to Poisson Gradient Estimation
arXiv:2602.03896v1 Announce Type: new Abstract: Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: Exponential Arrival Time (EAT) simulation and Gumbel-SoftMax (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical…
-
Transcendental Regularization of Finite Mixtures:Theoretical Guarantees and Practical Limitations
arXiv:2602.03889v1 Announce Type: new Abstract: Finite mixture models are widely used for unsupervised learning, but maximum likelihood estimation via EM suffers from degeneracy as components collapse. We introduce transcendental regularization, a penalized likelihood framework with analytic barrier functions that prevent degeneracy while maintaining asymptotic efficiency. The…
-
Byzantine Machine Learning: MultiKrum and an optimal notion of robustness
arXiv:2602.03899v1 Announce Type: new Abstract: Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule…
-
Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks
arXiv:2602.03948v1 Announce Type: new Abstract: In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the available data are summarized solely through the degrees of the nodes in the network.…
-
Learning Multi-type heterogeneous interacting particle systems
arXiv:2602.03954v1 Announce Type: new Abstract: We propose a framework for the joint inference of network topology, multi-type interaction kernels, and latent type assignments in heterogeneous interacting particle systems from multi-trajectory data. This learning task is a challenging non-convex mixed-integer optimization problem, which we address through a novel three-stage approach.…
-
How to Work Effectively with Frontend and Backend Code
Learn how to be an effective full-stack engineer with Claude Code. The post How to Work Effectively with Frontend and Backend Code appeared first on Towards Data Science. Eivind Kjosbakken
-
AWS vs. Azure: A Deep Dive into Model Training – Part 2
This article covers how Azure ML’s persistent, workspace-centric compute resources differ from AWS SageMaker’s on-demand, job-specific approach. It also explores environment customization options, from Azure’s curated environments and custom environments to SageMaker’s three levels of customization. The post AWS vs. Azure: A Deep…
-
How to Build Your Own Custom LLM Memory Layer from Scratch
Step-by-step guide to building autonomous memory retrieval systems. The post How to Build Your Own Custom LLM Memory Layer from Scratch appeared first on Towards Data Science. Avishek Biswas
-
Plan–Code–Execute: Designing Agents That Create Their Own Tools
The case against pre-built tools in Agentic Architectures. The post Plan–Code–Execute: Designing Agents That Create Their Own Tools appeared first on Towards Data Science. Partha Sarkar
-
Rethinking Test-Time Training: Tilting The Latent Distribution For Few-Shot Source-Free Adaptation
arXiv:2602.02633v1 Announce Type: new Abstract: Often, constraints arise in deployment settings where even lightweight parameter updates (e.g., parameter-efficient fine-tuning) could induce model shift or tuning instability. We study test-time adaptation of foundation models for few-shot classification under a completely frozen-model regime, where additionally, no…
-
Relaxed Triangle Inequality for Kullback-Leibler Divergence Between Multivariate Gaussian Distributions
arXiv:2602.02577v1 Announce Type: new Abstract: The Kullback-Leibler (KL) divergence is not a proper distance metric and does not satisfy the triangle inequality, posing theoretical challenges in certain practical applications. Existing work has demonstrated that KL divergence between multivariate Gaussian distributions follows a relaxed triangle inequality.…
-
Near-Universal Multiplicative Updates for Nonnegative Einsum Factorization
arXiv:2602.02759v1 Announce Type: new Abstract: Despite the ubiquity of multiway data across scientific domains, there are few user-friendly tools that fit tailored nonnegative tensor factorizations. Researchers may use gradient-based automatic differentiation (which often struggles in nonnegative settings), choose between a limited set of methods with mature implementations, or…
-
Training-Free Self-Correction for Multimodal Masked Diffusion Models
arXiv:2602.02927v1 Announce Type: new Abstract: Masked diffusion models have emerged as a powerful framework for text and multimodal generation. However, their sampling procedure updates multiple tokens simultaneously and treats generated tokens as immutable, which may lead to error accumulation when early mistakes cannot be revised. In this work,…
-
Plug-In Classification of Drift Functions in Diffusion Processes Using Neural Networks
arXiv:2602.02791v1 Announce Type: new Abstract: We study a supervised multiclass classification problem for diffusion processes, where each class is characterized by a distinct drift function and trajectories are observed at discrete times. Extending the one-dimensional multiclass framework of Denis et al. (2024) to multidimensional…
-
Routing in a Sparse Graph: a Distributed Q-Learning Approach
Distributed agents need only decide one move ahead. The post Routing in a Sparse Graph: a Distributed Q-Learning Approach appeared first on Towards Data Science. Sébastien Gilbert
-
YOLOv2 & YOLO9000 Paper Walkthrough: Better, Faster, Stronger
From YOLOv1 to YOLOv2: prior box, k-means, Darknet-19, passthrough layer, and more. The post YOLOv2 & YOLO9000 Paper Walkthrough: Better, Faster, Stronger appeared first on Towards Data Science. Muhammad Ardi
-
Creating a Data Pipeline to Monitor Local Crime Trends
A walkthrough of creating an ETL pipeline to extract local crime data and visualize it in Metabase. The post Creating a Data Pipeline to Monitor Local Crime Trends appeared first on Towards Data Science. Jimin Kang
-
The Proximity of the Inception Score as an Evaluation Criterion
The neighborhood of synthetic data. The post The Proximity of the Inception Score as an Evaluation Criterion appeared first on Towards Data Science. Giuseppe Pio Cannata
-
Neuron Block Dynamics for XOR Classification with Zero-Margin
arXiv:2602.00172v1 Announce Type: new Abstract: The ability of neural networks to learn useful features through stochastic gradient descent (SGD) is a cornerstone of their success. Most theoretical analyses focus on regression or on classification tasks with a positive margin, where worst-case gradient bounds suffice. In contrast, we…
-
Uncertainty-Aware Multimodal Learning via Conformal Shapley Intervals
arXiv:2602.00171v1 Announce Type: new Abstract: Multimodal learning combines information from multiple data modalities to improve predictive performance. However, modalities often contribute unequally and in a data dependent way, making it unclear which data modalities are genuinely informative and to what extent their contributions can be trusted. Quantifying modality…
-
Singular Bayesian Neural Networks
arXiv:2602.00387v1 Announce Type: new Abstract: Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We argue this cost is often unnecessary, particularly when weight matrices exhibit fast singular value decay. By parameterizing weights as $W = AB^{\top}$ with $A \in \mathbb{R}^{m \times r}$, $B \in…
-
Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation
arXiv:2602.00413v1 Announce Type: new Abstract: Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generative models to maximize a given reward function, these approaches require extensive computational resources and may not generalize…
-
Reinforcement Learning for Control Systems with Time Delays: A Comprehensive Survey
arXiv:2602.00399v1 Announce Type: new Abstract: In the last decade, Reinforcement Learning (RL) has achieved remarkable success in the control and decision-making of complex dynamical systems. However, most RL algorithms rely on the Markov Decision Process assumption, which is violated in practical cyber-physical systems affected…
-
Silicon Darwinism: Why Scarcity Is the Source of True Intelligence
We are confusing “size” with “smart.” The next leap in artificial intelligence will not come from a larger data center, but from a more constrained environment. The post Silicon Darwinism: Why Scarcity Is the Source of True Intelligence appeared first on Towards Data Science. Aakash…
-
Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation
arXiv:2601.22367v1 Announce Type: new Abstract: Generalized Bayesian Inference (GBI) tempers a loss with a temperature $\beta>0$ to mitigate overconfidence and improve robustness under model misspecification, but existing GBI methods typically rely on costly MCMC or SDE-based samplers and must be re-run for each new dataset…
-
Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models
arXiv:2601.22336v1 Announce Type: new Abstract: Large-scale AI evaluation increasingly relies on aggregating binary judgments from $K$ annotators, including LLMs used as judges. Most classical methods, e.g., Dawid-Skene or (weighted) majority voting, assume annotators are conditionally independent given the true label $Y\in\{0,1\}$, an assumption often violated by LLM…
-
It’s all the (Exponential) Family: An Equivalence between Maximum Likelihood Estimation and Control Variates for Sketching Algorithms
arXiv:2601.22378v1 Announce Type: new Abstract: Maximum likelihood estimators (MLE) and control variate estimators (CVE) have been used in conjunction with known information across sketching algorithms and applications in machine learning. We prove that under certain conditions in an…
-
Simulation-based Bayesian inference with ameliorative learned summary statistics — Part I
arXiv:2601.22441v1 Announce Type: new Abstract: This paper, which is Part 1 of a two-part paper series, considers a simulation-based inference with learned summary statistics, in which such a learned summary statistic serves as an empirical-likelihood with ameliorative effects in the Bayesian setting, when the…
-
Corrected Samplers for Discrete Flow Models
arXiv:2601.22519v1 Announce Type: new Abstract: Discrete flow models (DFMs) have been proposed to learn the data distribution on a finite state space, offering a flexible framework as an alternative to discrete diffusion models. A line of recent work has studied samplers for discrete diffusion models, such as tau-leaping and…
-
Weekly Entering & Transitioning – Thread 02 Feb, 2026 – 09 Feb, 2026
Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos); Traditional education (e.g. schools, degrees, electives); Alternative education (e.g.…
-
Am I drifting away from Data Science, or building useful foundations? (2 YOE working in a startup, no coding)
I’m looking for some career perspective and would really appreciate advice from people working in or around data science. I’m currently not sure where exactly my career is heading and want to start a business eventually…
-
Brainstorming around the visualization of customer segment data
submitted by /u/SingerEast1469
-
What separates data scientists who earn a good living (100k-200k) from those who earn 300k+ at FAANG?
Is it just stock options and vesting? Or is it just that FAANG is a lot of work? Why do some data scientists deserve that much? I work at a Fortune 500 and the ceiling for IC data scientists…
-
Building “Auto-Analyst” — A data analytics AI agentic system
submitted by /u/phicreative1997
-
Distributed Reinforcement Learning for Scalable High-Performance Policy Optimization
Leveraging massive parallelism, asynchronous updates, and multi-machine training to match and exceed human-level performance. The post Distributed Reinforcement Learning for Scalable High-Performance Policy Optimization appeared first on Towards Data Science. Sam Black
-
How to Apply Agentic Coding to Solve Problems
Learn how to efficiently solve problems with coding agents. The post How to Apply Agentic Coding to Solve Problems appeared first on Towards Data Science. Eivind Kjosbakken
-
How to Run Claude Code for Free with Local and Cloud Models from Ollama
Ollama now offers Anthropic API compatibility. The post How to Run Claude Code for Free with Local and Cloud Models from Ollama appeared first on Towards Data Science. Thomas Reid
-
Creating an Etch A Sketch App Using Python and Turtle
A beginner-friendly Python tutorial. The post Creating an Etch A Sketch App Using Python and Turtle appeared first on Towards Data Science. Mahnoor Javed
-
Why Your Multi-Agent System is Failing: Escaping the 17x Error Trap of the “Bag of Agents”
Hard-won lessons on how to scale agentic systems without scaling the chaos, including a taxonomy of core agent types. The post Why Your Multi-Agent System is Failing: Escaping the 17x Error Trap of the “Bag of Agents” appeared first…
-
On the Possibility of Small Networks for Physics-Informed Learning
A new kind of hyperparameter study. The post On the Possibility of Small Networks for Physics-Informed Learning appeared first on Towards Data Science. Conor Rowan
-
Multi-Attribute Decision Matrices, Done Right
How to structure decisions, identify efficient options, and avoid misleading value metrics. The post Multi-Attribute Decision Matrices, Done Right appeared first on Towards Data Science. Josiah DeValois
-
Latent-IMH: Efficient Bayesian Inference for Inverse Problems with Approximate Operators
arXiv:2601.20888v1 Announce Type: new Abstract: We study sampling from posterior distributions in Bayesian linear inverse problems where $A$, the parameters-to-observables operator, is computationally expensive. In many applications, $A$ can be factored in a manner that facilitates the construction of a cost-effective approximation $\tilde{A}$.…
-
Efficient Causal Structure Learning via Modular Subgraph Integration
arXiv:2601.21014v1 Announce Type: new Abstract: Learning causal structures from observational data remains a fundamental yet computationally intensive task, particularly in high-dimensional settings where existing methods face challenges such as the super-exponential growth of the search space and increasing computational demands. To address this, we introduce VISTA (Voting-based…
-
A Diffusive Classification Loss for Learning Energy-based Generative Models
arXiv:2601.21025v1 Announce Type: new Abstract: Score-based generative models have recently achieved remarkable success. While they are usually parameterized by the score, an alternative way is to use a series of time-dependent energy-based models (EBMs), where the score is obtained from the negative input-gradient of the energy.…
-
Diffusion-based Annealed Boltzmann Generators : benefits, pitfalls and hopes
arXiv:2601.21026v1 Announce Type: new Abstract: Sampling configurations at thermodynamic equilibrium is a central challenge in statistical physics. Boltzmann Generators (BGs) tackle it by combining a generative model with a Monte Carlo (MC) correction step to obtain asymptotically unbiased samples from an unnormalized target. Most current BGs…
-
An efficient, accurate, and interpretable machine learning method for computing probability of failure
arXiv:2601.21089v1 Announce Type: new Abstract: We introduce a novel machine learning method called the Penalized Profile Support Vector Machine based on the Gabriel edited set for the computation of the probability of failure for a complex system as determined by a threshold…
-
The Unbearable Lightness of Coding
Confessions of a vibe coder. The post The Unbearable Lightness of Coding appeared first on Towards Data Science. Elena Jolkver
-
Randomization Works in Experiments, Even Without Balance
Randomization usually balances confounders in experiments, but what happens when it doesn’t? The post Randomization Works in Experiments, Even Without Balance appeared first on Towards Data Science. Jarom Hulet
-
Deep Neural Networks as Iterated Function Systems and a Generalization Bound
arXiv:2601.19958v1 Announce Type: new Abstract: Deep neural networks (DNNs) achieve remarkable performance on a wide range of tasks, yet their mathematical analysis remains fragmented: stability and generalization are typically studied in disparate frameworks and on a case-by-case basis. Architecturally, DNNs rely on the recursive…
-
Minimax Rates for Hyperbolic Hierarchical Learning
arXiv:2601.20047v1 Announce Type: new Abstract: We prove an exponential separation in sample complexity between Euclidean and hyperbolic representations for learning on hierarchical data under standard Lipschitz regularization. For depth-$R$ hierarchies with branching factor $m$, we first establish a geometric obstruction for Euclidean space: any bounded-radius embedding forces volumetric collapse,…
-
Efficient Evaluation of LLM Performance with Statistical Guarantees
arXiv:2601.20251v1 Announce Type: new Abstract: Exhaustively evaluating many large language models (LLMs) on a large suite of benchmarks is expensive. We cast benchmarking as finite-population inference and, under a fixed query budget, seek tight confidence intervals (CIs) for model accuracy with valid frequentist coverage. We propose Factorized…
-
Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging
arXiv:2601.20269v1 Announce Type: new Abstract: Machine learning models in high-stakes applications, such as recidivism prediction and automated personnel selection, often exhibit systematic performance disparities across sensitive subpopulations, raising critical concerns regarding algorithmic bias. Fairness auditing addresses these risks through two primary functions: certification, which verifies adherence to…
-
Physics-informed Blind Reconstruction of Dense Fields from Sparse Measurements using Neural Networks with a Differentiable Simulator
arXiv:2601.20496v1 Announce Type: new Abstract: Generating dense physical fields from sparse measurements is a fundamental question in sampling, signal processing, and many other applications. State-of-the-art methods either use spatial statistics or rely on examples of dense fields in the…
-
Federated Learning, Part 2: Implementation with the Flower Framework 🌼
Implementing cross-silo federated learning step by step. The post Federated Learning, Part 2: Implementation with the Flower Framework 🌼 appeared first on Towards Data Science. Parul Pandey
-
Machine Learning in Production? What This Really Means
From notebooks to real-world systems. The post Machine Learning in Production? What This Really Means appeared first on Towards Data Science. Sabrine Bendimerad
-
I Ditched My Mouse: How I Control My Computer With Hand Gestures (In 60 Lines of Python)
A step-by-step guide to building a “Minority Report”-style interface using OpenCV and MediaPipe. The post I Ditched My Mouse: How I Control My Computer With Hand Gestures (In 60 Lines of Python) appeared first on Towards Data Science.…
-
Modeling Urban Walking Risk Using Spatial-Temporal Machine Learning
Estimating neighborhood-level pedestrian risk from real-world incident data. The post Modeling Urban Walking Risk Using Spatial-Temporal Machine Learning appeared first on Towards Data Science. Aneesh Patil
-
Statistical Inference for Explainable Boosting Machines
arXiv:2601.18857v1 Announce Type: new Abstract: Explainable boosting machines (EBMs) are popular “glass-box” models that learn a set of univariate functions using boosting trees. These achieve explainability through visualizations of each feature’s effect. However, unlike linear model coefficients, uncertainty quantification for the learned univariate functions requires computationally intensive bootstrapping, making…
-
Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration
arXiv:2601.18907v1 Announce Type: new Abstract: Q-learning and SARSA are foundational reinforcement learning algorithms whose practical success depends critically on step-size calibration. Step-sizes that are too large can cause numerical instability, while step-sizes that are too small can lead to slow progress. We propose implicit variants…
-
Collaborative Compressors in Distributed Mean Estimation with Limited Communication Budget
arXiv:2601.18950v1 Announce Type: new Abstract: Distributed high dimensional mean estimation is a common aggregation routine used often in distributed optimization methods. Most of these applications call for a communication-constrained setting where vectors, whose mean is to be estimated, have to be compressed before sharing. One…
-
Convergence of Muon with Newton-Schulz
arXiv:2601.19156v1 Announce Type: new Abstract: We analyze Muon as originally proposed and used in practice, that is, using momentum orthogonalization with a few Newton-Schulz steps. The prior theoretical results replace this key step in Muon with an exact SVD-based polar factor. We prove that Muon with Newton-Schulz converges to a…
-
Double Fairness Policy Learning: Integrating Action Fairness and Outcome Fairness in Decision-making
arXiv:2601.19186v1 Announce Type: new Abstract: Fairness is a central pillar of trustworthy machine learning, especially in domains where accuracy- or profit-driven optimization is insufficient. While most fairness research focuses on supervised learning, fairness in policy learning remains less explored. Because policy learning is…
-
Going Beyond the Context Window: Recursive Language Models in Action
Explore a practical approach to analysing massive datasets with LLMs. The post Going Beyond the Context Window: Recursive Language Models in Action appeared first on Towards Data Science. Mariya Mansurova
-
Data Science as Engineering: Foundations, Education, and Professional Identity
Recognize data science as an engineering practice and structure education accordingly. The post Data Science as Engineering: Foundations, Education, and Professional Identity appeared first on Towards Data Science. Tom Narock
-
From Connections to Meaning: Why Heterogeneous Graph Transformers (HGT) Change Demand Forecasting
How relationship-aware graphs turn connected forecasts into operational insight. The post From Connections to Meaning: Why Heterogeneous Graph Transformers (HGT) Change Demand Forecasting appeared first on Towards Data Science. Partha Sarkar
-
Layered Architecture for Building Readable, Robust, and Extensible Apps
If adding a feature feels like open-heart surgery on your codebase, the problem isn’t bugs, it’s structure. This article shows how better architecture reduces risk, speeds up change, and keeps teams moving. The post Layered Architecture for Building Readable, Robust, and Extensible Apps appeared first on…
-
Data-Driven Information-Theoretic Causal Bounds under Unmeasured Confounding
arXiv:2601.17160v1 Announce Type: new Abstract: We develop a data-driven information-theoretic framework for sharp partial identification of causal effects under unmeasured confounding. Existing approaches often rely on restrictive assumptions, such as bounded or discrete outcomes; require external inputs (for example, instrumental variables, proxies, or user-specified sensitivity parameters); necessitate full…
-
Error Analysis of Bayesian Inverse Problems with Generative Priors
arXiv:2601.17374v1 Announce Type: new Abstract: Data-driven methods for the solution of inverse problems have become widely popular in recent years thanks to the rise of machine learning techniques. A popular approach concerns the training of a generative model on additional data to learn a bespoke prior…
-
“Rebuilding” Statistics in the Age of AI: A Town Hall Discussion on Culture, Infrastructure, and Training
arXiv:2601.17510v1 Announce Type: new Abstract: This article presents the full, original record of the 2024 Joint Statistical Meetings (JSM) town hall, “Statistics in the Age of AI,” which convened leading statisticians to discuss how the field is evolving in…
-
Boosting methods for interval-censored data with regression and classification
arXiv:2601.17973v1 Announce Type: new Abstract: Boosting has garnered significant interest across both machine learning and statistical communities. Traditional boosting algorithms, designed for fully observed random samples, often struggle with real-world problems, particularly with interval-censored data. This type of data is common in survival analysis and time-to-event…
-
A Cherry-Picking Approach to Large Load Shaping for More Effective Carbon Reduction
arXiv:2601.17990v1 Announce Type: new Abstract: Shaping multi-megawatt loads, such as data centers, impacts generator dispatch on the electric grid, which in turn affects system CO2 emissions and energy cost. Substantiating the effectiveness of prevalent load shaping strategies, such as those based on grid-level…
-
How Cursor Actually Indexes Your Codebase
Exploring the RAG pipeline in Cursor that powers code indexing and retrieval for coding agents. The post How Cursor Actually Indexes Your Codebase appeared first on Towards Data Science. Kenneth Leung
-
Ray: Distributed Computing For All, Part 2
Deploying and running Python code on cloud-based clusters. The post Ray: Distributed Computing For All, Part 2 appeared first on Towards Data Science. Thomas Reid
-
How Convolutional Neural Networks Learn Musical Similarity
Learning audio embeddings with contrastive learning and deploying them in a real music recommendation app. The post How Convolutional Neural Networks Learn Musical Similarity appeared first on Towards Data Science. Luke Stuckey
-
Causal ML for the Aspiring Data Scientist
An accessible introduction to causal inference and ML. The post Causal ML for the Aspiring Data Scientist appeared first on Towards Data Science. Ross Lauterbach
-
Distributional Computational Graphs: Error Bounds
arXiv:2601.16250v1 Announce Type: new Abstract: We study a general framework of distributional computational graphs: computational graphs whose inputs are probability distributions rather than point values. We analyze the discretization error that arises when these graphs are evaluated using finite approximations of continuous probability distributions. Such an approximation might be the…
-
Perfect Clustering for Sparse Directed Stochastic Block Models
arXiv:2601.16427v1 Announce Type: new Abstract: Exact recovery in stochastic block models (SBMs) is well understood in undirected settings, but remains considerably less developed for directed and sparse networks, particularly when the number of communities diverges. Spectral methods for directed SBMs often lack stability in asymmetric, low-degree regimes,…
-
Efficient Learning of Stationary Diffusions with Stein-type Discrepancies
arXiv:2601.16597v1 Announce Type: new Abstract: Learning a stationary diffusion amounts to estimating the parameters of a stochastic differential equation whose stationary distribution matches a target distribution. We build on the recently introduced kernel deviation from stationarity (KDS), which enforces stationarity by evaluating expectations of the diffusion’s generator…
-
Towards Latent Diffusion Suitable For Text
arXiv:2601.16220v1 Announce Type: cross Abstract: Language diffusion models aim to improve sampling speed and coherence over autoregressive LLMs. We introduce Neural Flow Diffusion Models for language generation, an extension of NFDM that enables the straightforward application of continuous diffusion models to discrete state spaces. NFDM learns a multivariate forward…
-
Long-Term Probabilistic Forecast of Vegetation Conditions Using Climate Attributes in the Four Corners Region
arXiv:2601.16347v1 Announce Type: cross Abstract: Weather conditions can drastically alter the state of crops and rangelands, and in turn, impact the incomes and food security of individuals worldwide. Satellite-based remote sensing offers an effective way to monitor vegetation and climate variables…
-
Weekly Entering & Transitioning – Thread 26 Jan, 2026 – 02 Feb, 2026
Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…
-
SAM 3 vs. Specialist Models — A Performance Benchmark
Why specialized models still hold the 30x speed advantage in production environments. The post SAM 3 vs. Specialist Models — A Performance Benchmark appeared first on Towards Data Science. Pushpak Bhoge
-
Azure ML vs. AWS SageMaker: A Deep Dive into Model Training — Part 1
Compare Azure ML and AWS SageMaker for scalable model training, focusing on project setup, permission management, and data storage patterns, to align platform choices with your existing cloud ecosystem and preferred MLOps workflows. The post Azure ML vs. AWS SageMaker: A Deep…
-
How to Build a Neural Machine Translation System for a Low-Resource Language
An introduction to neural machine translation. The post How to Build a Neural Machine Translation System for a Low-Resource Language appeared first on Towards Data Science. Kaixuan Chen
-
Air for Tomorrow: Mapping the Digital Air-Quality Landscape, from Repositories and Data Types to Starter Code
Understand air quality: access the available data, interpret data types, and execute starter code. The post Air for Tomorrow: Mapping the Digital Air-Quality Landscape, from Repositories and Data Types to Starter Code appeared first on Towards Data Science. Prithviraj…
-
Optimizing Data Transfer in Distributed AI/ML Training Workloads
A deep dive on data transfer bottlenecks, their identification, and their resolution with the help of NVIDIA Nsight™ Systems – part 3. The post Optimizing Data Transfer in Distributed AI/ML Training Workloads appeared first on Towards Data Science. Chaim Rand
-
Achieving 5x Agentic Coding Performance with Few-Shot Prompting
Learn to leverage few-shot prompting to increase your LLM’s performance. The post Achieving 5x Agentic Coding Performance with Few-Shot Prompting appeared first on Towards Data Science. Eivind Kjosbakken
-
Why the Sophistication of Your Prompt Correlates Almost Perfectly with the Sophistication of the Response, as Research by Anthropic Found
How prompt engineering has evolved, examined scientifically, and the implications for the future of conversational AI tools. The post Why the Sophistication of Your Prompt Correlates Almost Perfectly with the Sophistication of the Response, as Research…
-
From Transactions to Trends: Predict When a Customer Is About to Stop Buying
Customer churn is usually a gradual process, not a sudden event. In this post, we analyze monthly transaction trends and convert regression slopes into degrees to clearly identify declining purchase behavior. A small negative slope today can prevent a big revenue loss…
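The slope-to-degrees idea mentioned in this entry can be sketched roughly as follows. This is a minimal illustration and not the article's own code: the function name `trend_angle_degrees` and the sample figures are hypothetical, and it assumes an ordinary least-squares line fitted with NumPy, with the slope converted to an angle through the arctangent.

```python
import math
import numpy as np


def trend_angle_degrees(monthly_totals):
    """Fit a straight line to monthly totals and return its slope as an angle in degrees.

    Hypothetical helper for illustration only; the article may compute this differently.
    """
    y = np.asarray(monthly_totals, dtype=float)
    x = np.arange(len(y))  # month index 0, 1, 2, ...
    slope, _intercept = np.polyfit(x, y, deg=1)  # least-squares line fit
    return math.degrees(math.atan(slope))


# Made-up sample data: one customer whose spending falls each month, one who is steady.
declining = [100, 90, 80, 70, 60, 50]
steady = [100, 100, 100, 100]

print(trend_angle_degrees(declining))  # negative angle: purchases are trending down
print(trend_angle_degrees(steady))     # close to zero: no trend
```

The angle depends on the units of both axes, so comparing customers this way probably requires scaling their totals first (for example, dividing by each customer's mean spend); that normalization step is also an assumption here.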
-
Robust X-Learner: Breaking the Curse of Imbalance and Heavy Tails via Robust Cross-Imputation
arXiv:2601.15360v1 Announce Type: new Abstract: Estimating Heterogeneous Treatment Effects (HTE) in industrial applications such as AdTech and healthcare presents a dual challenge: extreme class imbalance and heavy-tailed outcome distributions. While the X-Learner framework effectively addresses imbalance through cross-imputation, we demonstrate that it…