Category: aimlds

GRAND: Graph Release with Assured Node Differential Privacy

GRAND: Graph Release with Assured Node Differential Privacy arXiv:2507.00402v1 Announce Type: new Abstract: Differential privacy is a well-established framework for safeguarding sensitive information in data. While extensively applied across various domains, its application to network data — particularly at the node level — remains underexplored. Existing methods for node-level privacy either focus exclusively on query-based…

July 2, 2025
Forward Reverse Kernel Regression for the Schrödinger bridge problem

Forward Reverse Kernel Regression for the Schrödinger bridge problem arXiv:2507.00640v1 Announce Type: new Abstract: In this paper, we study the Schrödinger Bridge Problem (SBP), which is central to entropic optimal transport. For general reference processes and begin–endpoint distributions, we propose a forward-reverse iterative Monte Carlo procedure to approximate the Schrödinger potentials in a nonparametric way.…

July 2, 2025
An in depth look at the Procrustes-Wasserstein distance: properties and barycenters

An in depth look at the Procrustes-Wasserstein distance: properties and barycenters arXiv:2507.00894v1 Announce Type: new Abstract: Due to its invariance to rigid transformations such as rotations and reflections, Procrustes-Wasserstein (PW) was introduced in the literature as an optimal transport (OT) distance, alternative to Wasserstein and more suited to tasks such as the alignment and comparison…

July 2, 2025
How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1

How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1 From architectural design to food security. The post How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1 appeared first on Towards Data Science. Marco Hening Tallarico Go to…

July 2, 2025
STOP Building Useless ML Projects – What Actually Works

STOP Building Useless ML Projects – What Actually Works How to find machine learning projects that will get you hired. The post STOP Building Useless ML Projects – What Actually Works appeared first on Towards Data Science. Egor Howell Go to original source

July 2, 2025
An Introduction to Remote Model Context Protocol Servers

An Introduction to Remote Model Context Protocol Servers Writing, testing and using them. The post An Introduction to Remote Model Context Protocol Servers appeared first on Towards Data Science. Thomas Reid Go to original source

July 2, 2025
Implementing IBCS rules in Power BI

Implementing IBCS rules in Power BI Is there a way to use the out-of-the-box features of Power BI to be IBCS compliant? The post Implementing IBCS rules in Power BI appeared first on Towards Data Science. Salvatore Cagliari Go to original source

July 2, 2025
Revisiting Benchmarking of Tabular Reinforcement Learning Methods

Revisiting Benchmarking of Tabular Reinforcement Learning Methods Introducing a modular framework and improving model performance. The post Revisiting Benchmarking of Tabular Reinforcement Learning Methods appeared first on Towards Data Science. Oliver S Go to original source

July 2, 2025
Strategic A/B testing via Maximum Probability-driven Two-armed Bandit

Strategic A/B testing via Maximum Probability-driven Two-armed Bandit arXiv:2506.22536v1 Announce Type: new Abstract: Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because of their…

July 1, 2025
Adjoint Schrödinger Bridge Sampler

Adjoint Schrödinger Bridge Sampler arXiv:2506.22565v1 Announce Type: new Abstract: Computational methods for learning to sample from the Boltzmann distribution — where the target distribution is known only up to an unnormalized energy function — have advanced significantly recently. Due to the lack of explicit target samples, however, prior diffusion-based methods, known as diffusion samplers, often…

July 1, 2025
Bayesian Invariance Modeling of Multi-Environment Data

Bayesian Invariance Modeling of Multi-Environment Data arXiv:2506.22675v1 Announce Type: new Abstract: Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features – those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this…

July 1, 2025
CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation

CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation arXiv:2506.22963v1 Announce Type: new Abstract: Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on…

July 1, 2025
AICO: Feature Significance Tests for Supervised Learning

AICO: Feature Significance Tests for Supervised Learning arXiv:2506.23396v1 Announce Type: new Abstract: The opacity of many supervised learning algorithms remains a key challenge, hindering scientific discovery and limiting broader deployment — particularly in high-stakes domains. This paper develops model- and distribution-agnostic significance tests to assess the influence of input features in any regression or classification…

July 1, 2025
Prescriptive Modeling Makes Causal Bets – Whether You Know it or Not!

Prescriptive Modeling Makes Causal Bets – Whether You Know it or Not! An explanation of the causal assumption implicit in prescriptive modeling and how to satisfy it. The post Prescriptive Modeling Makes Causal Bets – Whether You Know it or Not! appeared first on Towards Data Science. Jarom Hulet Go to original source

July 1, 2025
A Gentle Introduction to Backtracking

A Gentle Introduction to Backtracking Conceptual overview and hands-on examples The post A Gentle Introduction to Backtracking appeared first on Towards Data Science. Chinmay Kakatkar Go to original source

July 1, 2025
Lessons Learned After 6.5 Years Of Machine Learning

Lessons Learned After 6.5 Years Of Machine Learning Deep work, trends, data, and research The post Lessons Learned After 6.5 Years Of Machine Learning appeared first on Towards Data Science. Pascal Janetzky Go to original source

July 1, 2025
From Pixels to Plots

From Pixels to Plots How I built an AI-powered prototype to turn images into insights The post From Pixels to Plots appeared first on Towards Data Science. Jens Winkelmann Go to original source

July 1, 2025
Become a Better Data Scientist with These Prompt Engineering Tips and Tricks

Become a Better Data Scientist with These Prompt Engineering Tips and Tricks Part 1: prompt engineering for planning, cleaning, and EDA The post Become a Better Data Scientist with These Prompt Engineering Tips and Tricks appeared first on Towards Data Science. Sara Nobrega Go to original source

July 1, 2025
Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19

Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19 arXiv:2506.21739v1 Announce Type: new Abstract: Authors Yi-Cheng Chen, Ping-En Lu, Cheng-Shang Chang, and Tzu-Hsuan Liu use the Finite Impulse Response (FIR) linear system filtering method to track and predict the number of people infected and recovered from COVID-19, in a…

June 30, 2025
Critically-Damped Higher-Order Langevin Dynamics

Critically-Damped Higher-Order Langevin Dynamics arXiv:2506.21741v1 Announce Type: new Abstract: Denoising Diffusion Probabilistic Models represent an entirely new class of generative AI methods that have yet to be fully explored. Critical damping has been successfully introduced in Critically-Damped Langevin Dynamics (CLD) and Critically-Damped Third-Order Langevin Dynamics (TOLD++), but has not yet been applied to dynamics of…

June 30, 2025
TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics

TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics arXiv:2506.21757v1 Announce Type: new Abstract: Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images but typically suffer from inefficient sampling. Many solver designs and noise scheduling strategies have been proposed to dramatically improve sampling speeds. In this paper, we introduce a new sampling method that is…

June 30, 2025
Thompson Sampling in Function Spaces via Neural Operators

Thompson Sampling in Function Spaces via Neural Operators arXiv:2506.21894v1 Announce Type: new Abstract: We propose an extension of Thompson sampling to optimization problems over function spaces where the objective is a known functional of an unknown operator’s output. We assume that functional evaluations are inexpensive, while queries to the operator (such as running a high-fidelity…

June 30, 2025
Classification with Reject Option: Distribution-free Error Guarantees via Conformal Prediction

Classification with Reject Option: Distribution-free Error Guarantees via Conformal Prediction arXiv:2506.21802v1 Announce Type: new Abstract: Machine learning (ML) models always make a prediction, even when they are likely to be wrong. This causes problems in practical applications, as we do not know if we should trust a prediction. ML with reject option addresses this issue…

June 30, 2025
Weekly Entering & Transitioning – Thread 30 Jun, 2025 – 07 Jul, 2025

Weekly Entering & Transitioning – Thread 30 Jun, 2025 – 07 Jul, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

June 30, 2025
ICs who pivoted: did you go engineering or management?

ICs who pivoted: did you go engineering or management? Hitting that point where I feel like I need to pick a lane. Curious what others did. Did you double down on technical stuff (data engineering/MLE/SWE), switched to the product side, or did you move into people management? submitted by /u/ergodym [link] [comments] /u/ergodym Go to…

June 30, 2025
Unpopular Opinion: These are the most useless posters on LinkedIn

Unpopular Opinion: These are the most useless posters on LinkedIn LinkedIn influencers love to treat the two roles as different species. In most enterprises, especially in mid to small orgs, these roles are largely overlapping. submitted by /u/OverratedDataScience [link] [comments] /u/OverratedDataScience Go to original source

June 30, 2025
How’s the job market for Bayesian statistics?

How’s the job market for Bayesian statistics? I’m a data scientist with 1 YOE. mostly worked on credit scoring models, sql, and Power BI. Lately, I’ve been thinking of going deeper into bayesian statistics and I’m currently going through the statistical rethinking book. But I’m wondering. is it worth focusing heavily on bayesian stats? Or…

June 30, 2025
Is ML/AI engineering increasingly becoming less focused on model training and more focused on integrating LLMs to build web apps?

Is ML/AI engineering increasingly becoming less focused on model training and more focused on integrating LLMs to build web apps? One thing I’ve noticed recently is that increasingly, a lot of AI/ML roles seem to be focused on ways to integrate LLMs to build web apps that automate some kind of task, e.g. chatbot with…

June 30, 2025
A Developer’s Guide to Building Scalable AI: Workflows vs Agents

A Developer’s Guide to Building Scalable AI: Workflows vs Agents A practical guide to choosing between AI agents and workflows for production systems, covering the hidden costs, architectural trade-offs, and decision framework that can save you thousands in deployment mistakes. Includes real-world examples and a scoring system to determine which approach fits your specific use…

June 28, 2025
The final solution of the Hitchhiker’s problem #5

The final solution of the Hitchhiker’s problem #5 arXiv:2506.20672v1 Announce Type: new Abstract: A recent survey, nicknamed “Hitchhiker’s Guide”, J.J. Arias-García, R. Mesiar, and B. De Baets, A hitchhiker’s guide to quasi-copulas, Fuzzy Sets and Systems 393 (2020) 1-28, has raised the rating of quasi-copula problems in the dependence modeling community in spite of the…

June 27, 2025
Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon

Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon arXiv:2506.20779v1 Announce Type: new Abstract: We study the implicit bias of flatness / low (loss) curvature and its effects on generalization in two-layer overparameterized ReLU networks with multivariate inputs — a problem well motivated by the minima stability and…

June 27, 2025
Active Learning for Manifold Gaussian Process Regression

Active Learning for Manifold Gaussian Process Regression arXiv:2506.20928v1 Announce Type: new Abstract: This paper introduces an active learning framework for manifold Gaussian Process (GP) regression, combining manifold learning with strategic data selection to improve accuracy in high-dimensional spaces. Our method jointly optimizes a neural network for dimensionality reduction and a Gaussian process regressor in the…

June 27, 2025
Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics

Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics arXiv:2506.20935v1 Announce Type: new Abstract: Forecasting geopolitical conflict from data sources like the Global Database of Events, Language, and Tone (GDELT) is a critical challenge for national security. The inherent sparsity, burstiness,…

June 27, 2025
Lower Bounds on the Size of Markov Equivalence Classes

Lower Bounds on the Size of Markov Equivalence Classes arXiv:2506.20933v1 Announce Type: new Abstract: Causal discovery algorithms typically recover causal graphs only up to their Markov equivalence classes unless additional parametric assumptions are made. The sizes of these equivalence classes reflect the limits of what can be learned about the underlying causal graph from purely…

June 27, 2025
A Caching Strategy for Identifying Bottlenecks on the Data Input Pipeline

A Caching Strategy for Identifying Bottlenecks on the Data Input Pipeline PyTorch model performance analysis and optimization — Part 8 The post A Caching Strategy for Identifying Bottlenecks on the Data Input Pipeline appeared first on Towards Data Science. Chaim Rand Go to original source

June 27, 2025
Pipelining AI/ML Training Workloads with CUDA Streams

Pipelining AI/ML Training Workloads with CUDA Streams PyTorch Model Performance Analysis and Optimization — Part 9 The post Pipelining AI/ML Training Workloads with CUDA Streams appeared first on Towards Data Science. Chaim Rand Go to original source

June 27, 2025
Hitchhiker’s Guide to RAG with ChatGPT API and LangChain

Hitchhiker’s Guide to RAG with ChatGPT API and LangChain Build a simple Python RAG pipeline using your local files as context The post Hitchhiker’s Guide to RAG with ChatGPT API and LangChain appeared first on Towards Data Science. Maria Mouschoutzi Go to original source

June 27, 2025
Data Science: From School to Work, Part V

Data Science: From School to Work, Part V How to profile your Python project The post Data Science: From School to Work, Part V appeared first on Towards Data Science. Vincent Margot Go to original source

June 27, 2025
The Mythical Pivot Point from Buy to Build for Data Platforms

The Mythical Pivot Point from Buy to Build for Data Platforms For companies with data-intensive architectures, there often comes a pivotal point where building in-house data platforms makes more sense than buying off-the-shelf solutions The post The Mythical Pivot Point from Buy to Build for Data Platforms appeared first on Towards Data Science. Ming Gao…

June 27, 2025
Data-Driven Dynamic Factor Modeling via Manifold Learning

Data-Driven Dynamic Factor Modeling via Manifold Learning arXiv:2506.19945v1 Announce Type: new Abstract: We propose a data-driven dynamic factor framework where a response variable depends on a high-dimensional set of covariates, without imposing any parametric model on the joint dynamics. Leveraging Anisotropic Diffusion Maps, a nonlinear manifold learning technique introduced by Singer and Coifman, our framework…

June 26, 2025
A Principled Path to Fitted Distributional Evaluation

A Principled Path to Fitted Distributional Evaluation arXiv:2506.20048v1 Announce Type: new Abstract: In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used fitted-Q evaluation — developed for expectation-based reinforcement learning — to…

June 26, 2025
Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives arXiv:2506.20114v1 Announce Type: new Abstract: Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an…

June 26, 2025
Valid Selection among Conformal Sets

Valid Selection among Conformal Sets arXiv:2506.20173v1 Announce Type: new Abstract: Conformal prediction offers a distribution-free framework for constructing prediction sets with coverage guarantees. In practice, multiple valid conformal prediction sets may be available, arising from different models or methodologies. However, selecting the most desirable set, such as the smallest, can invalidate the coverage guarantees. To…

June 26, 2025
POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes

POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes arXiv:2506.20406v1 Announce Type: new Abstract: Dynamic treatment regimes (DTRs) provide a principled framework for optimizing sequential decision-making in domains where decisions must adapt over time in response to individual trajectories, such as healthcare, education, and digital interventions. However, existing statistical methods often rely on…

June 26, 2025
Use OpenAI Whisper for Automated Transcriptions

Use OpenAI Whisper for Automated Transcriptions Streamline your computer interactions using OpenAI’s Whisper model The post Use OpenAI Whisper for Automated Transcriptions appeared first on Towards Data Science. Eivind Kjosbakken Go to original source

June 26, 2025
Economic Cycle Synchronization with Dynamic Time Warping

Economic Cycle Synchronization with Dynamic Time Warping The case of the Eurozone The post Economic Cycle Synchronization with Dynamic Time Warping appeared first on Towards Data Science. Moritz Pfeifer Go to original source

June 26, 2025
How to Train a Chatbot Using RAG and Custom Data

How to Train a Chatbot Using RAG and Custom Data Retrieval-Augmented Generation made easy with Llama The post How to Train a Chatbot Using RAG and Custom Data appeared first on Towards Data Science. Haden Pelletier Go to original source

June 26, 2025
Stop Chasing “Efficiency AI.” The Real Value Is in “Opportunity AI.”

Stop Chasing “Efficiency AI.” The Real Value Is in “Opportunity AI.” Companies pursuing incremental productivity gains risk being displaced by AI-native competitors building entirely new business models The post Stop Chasing “Efficiency AI.” The Real Value Is in “Opportunity AI.” appeared first on Towards Data Science. Shreshth Sharma Go to original source

June 26, 2025
Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions

Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions arXiv:2506.19010v1 Announce Type: new Abstract: Causal decomposition analysis aims to assess the effect of modifying risk factors on reducing social disparities in outcomes. Recently, this analysis has incorporated individual characteristics when modifying risk factors by utilizing optimal treatment regimes (OTRs). Since the…

June 25, 2025
When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets

When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets arXiv:2506.19031v1 Announce Type: new Abstract: While diffusion models generate high-quality images via probability flow, the theoretical understanding of this process remains incomplete. A key question is when probability flow converges to training samples or more general points on the data manifold.…

June 25, 2025
Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality

Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality arXiv:2506.19144v1 Announce Type: new Abstract: This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures reflect the intrinsic dimensionality of the underlying function, thereby mitigating the curse of dimensionality. Our…

June 25, 2025
Rare dense solutions clusters in asymmetric binary perceptrons — local entropy via fully lifted RDT

Rare dense solutions clusters in asymmetric binary perceptrons — local entropy via fully lifted RDT arXiv:2506.19276v1 Announce Type: new Abstract: We study classical asymmetric binary perceptron (ABP) and associated local entropy (LE) as potential source of its algorithmic hardness. Isolation of typical ABP solutions in SAT phase seemingly suggests a universal algorithmic hardness. Paradoxically, efficient…

June 25, 2025
Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks

Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks arXiv:2506.19695v1 Announce Type: new Abstract: This paper studies the $\ell^p$-Lipschitz constants of ReLU neural networks $\Phi: \mathbb{R}^d \to \mathbb{R}$ with random parameters for $p \in [1,\infty]$. The distribution of the weights follows a variant of the He initialization and the biases are drawn…

June 25, 2025
Data Has No Moat!

Data Has No Moat! Only if you ignore data quality The post Data Has No Moat! appeared first on Towards Data Science. Fabiana Clemente Go to original source

June 25, 2025
Agentic AI: Implementing Long-Term Memory

Agentic AI: Implementing Long-Term Memory The problem and current solutions The post Agentic AI: Implementing Long-Term Memory appeared first on Towards Data Science. Ida Silfverskiöld Go to original source

June 25, 2025
Why Your Next LLM Might Not Have A Tokenizer

Why Your Next LLM Might Not Have A Tokenizer The Tokenizer Has Been a Necessary Evil, but This Radical Approach Shows That It Might Not Be Necessary Anymore. The post Why Your Next LLM Might Not Have A Tokenizer appeared first on Towards Data Science. Moulik Gupta Go to original source

June 25, 2025
Build Multi-Agent Apps with OpenAI’s Agent SDK

Build Multi-Agent Apps with OpenAI’s Agent SDK Creating multi-agent apps is simple with this open-source SDK, and it can be used with any OpenAI-compatible LLM The post Build Multi-Agent Apps with OpenAI’s Agent SDK appeared first on Towards Data Science. Alan Jones Go to original source

June 25, 2025
Coupled Entropy: A Goldilocks Generalization?

Coupled Entropy: A Goldilocks Generalization? arXiv:2506.17229v1 Announce Type: new Abstract: Nonextensive Statistical Mechanics (NSM) has developed into a powerful toolset for modeling and analyzing complex systems. Despite its many successes, a puzzle arose early in its development. The constraints on the Tsallis entropy are in the form of an escort distribution with elements proportional to…

June 24, 2025
Differentiable neural network representation of multi-well, locally-convex potentials

Differentiable neural network representation of multi-well, locally-convex potentials arXiv:2506.17242v1 Announce Type: new Abstract: Multi-well potentials are ubiquitous in science, modeling phenomena such as phase transitions, dynamic instabilities, and multimodal behavior across physics, chemistry, and biology. In contrast to non-smooth minimum-of-mixture representations, we propose a differentiable and convex formulation based on a log-sum-exponential (LSE) mixture of…

June 24, 2025
Gaussian Processes and Reproducing Kernels: Connections and Equivalences

Gaussian Processes and Reproducing Kernels: Connections and Equivalences arXiv:2506.17366v1 Announce Type: new Abstract: This monograph studies the relations between two approaches using positive definite kernels: probabilistic methods using Gaussian processes, and non-probabilistic methods using reproducing kernel Hilbert spaces (RKHS). They are widely studied and used in machine learning, statistics, and numerical analysis. Connections and equivalences…

June 24, 2025
Scalable Machine Learning Algorithms using Path Signatures

Scalable Machine Learning Algorithms using Path Signatures arXiv:2506.17634v1 Announce Type: new Abstract: The interface between stochastic analysis and machine learning is a rapidly evolving field, with path signatures – iterated integrals that provide faithful, hierarchical representations of paths – offering a principled and universal feature map for sequential and structured data. Rooted in rough path…

June 24, 2025
Derandomizing Simultaneous Confidence Regions for Band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes

Derandomizing Simultaneous Confidence Regions for Band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes arXiv:2506.17764v1 Announce Type: new Abstract: Band-limited functions are fundamental objects that are widely used in systems theory and signal processing. In this paper we refine a recent nonparametric, nonasymptotic method for constructing simultaneous confidence regions for band-limited functions from noisy input-output…

June 24, 2025
Reinforcement Learning from Human Feedback, Explained Simply

Reinforcement Learning from Human Feedback, Explained Simply The one technique that made ChatGPT so smart The post Reinforcement Learning from Human Feedback, Explained Simply appeared first on Towards Data Science. Vyacheslav Efimov Go to original source

June 24, 2025
Programming, Not Prompting: A Hands-On Guide to DSPy

Programming, Not Prompting: A Hands-On Guide to DSPy A practical deep dive into declarative AI programming The post Programming, Not Prompting: A Hands-On Guide to DSPy appeared first on Towards Data Science. Mariya Mansurova Go to original source

June 24, 2025
Building A Modern Dashboard with Python and Taipy

Building A Modern Dashboard with Python and Taipy A guide to building a front-end data application. The post Building A Modern Dashboard with Python and Taipy appeared first on Towards Data Science. Thomas Reid Go to original source

June 24, 2025
Building AI-Powered Low-Code Workflows with n8n

Building AI-Powered Low-Code Workflows with n8n Three powerful workflows that you can apply to your personal life or business today The post Building AI-Powered Low-Code Workflows with n8n appeared first on Towards Data Science. ALESSANDRA COSTA Go to original source

June 24, 2025
From Local Interactions to Global Operators: Scalable Gaussian Process Operator for Physical Systems

From Local Interactions to Global Operators: Scalable Gaussian Process Operator for Physical Systems arXiv:2506.15906v1 Announce Type: new Abstract: Operator learning offers a powerful paradigm for solving parametric partial differential equations (PDEs), but scaling probabilistic neural operators such as the recently proposed Gaussian Processes Operators (GPOs) to high-dimensional, data-intensive regimes remains a significant challenge. In this…

June 23, 2025
Sampling conditioned diffusions via Pathspace Projected Monte Carlo

Sampling conditioned diffusions via Pathspace Projected Monte Carlo arXiv:2506.15743v1 Announce Type: new Abstract: We present an algorithm to sample stochastic differential equations conditioned on rather general constraints, including integral constraints, endpoint constraints, and stochastic integral constraints. The algorithm is a pathspace Metropolis-adjusted manifold sampling scheme, which samples stochastic paths on the submanifold of realizations that…

June 23, 2025
Diffusion-Based Hypothesis Testing and Change-Point Detection

Diffusion-Based Hypothesis Testing and Change-Point Detection arXiv:2506.16089v1 Announce Type: new Abstract: Score-based methods have recently seen increasing popularity in modeling and generation. Methods have been constructed to perform hypothesis testing and change-point detection with score functions, but these methods are in general not as powerful as their likelihood-based peers. Recent works consider generalizing the score-based…

June 23, 2025
CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization

CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization arXiv:2506.16189v1 Announce Type: new Abstract: We study the problem of conformal prediction (CP) under geometric data shifts, where data samples are susceptible to transformations such as rotations or flips. While CP endows prediction models with post-hoc uncertainty quantification and formal coverage guarantees, their practicality breaks under distribution…

June 23, 2025
Random feature approximation for general spectral methods

Random feature approximation for general spectral methods arXiv:2506.16283v1 Announce Type: new Abstract: Random feature approximation is arguably one of the most widely used techniques for kernel methods in large-scale learning algorithms. In this work, we analyze the generalization properties of random feature methods, extending previous results for Tikhonov regularization to a broad class of spectral…

June 23, 2025
Weekly Entering & Transitioning – Thread 23 Jun, 2025 – 30 Jun, 2025

Weekly Entering & Transitioning – Thread 23 Jun, 2025 – 30 Jun, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

June 23, 2025
[Project] I just open-sourced a plugin to stop AI from hallucinating your schemas

[Project] I just open-sourced a plugin to stop AI from hallucinating your schemas Hey r/datascience 👋 Using AI tools like Copilot or Cursor can be a total headache for data science work. You’re trying to join tables, and it confidently suggests customer_id when your table actually uses cust_pk. Or worse, it just invents tables that…

June 23, 2025
I have run DS interviews and wow!

I have run DS interviews and wow! Hey all, I have been responsible for technical interviews for a Data Scientist position and the experience was quite surprising to me. I thought some of you may appreciate some insights. A few disclaimers: I have no previous experience running interviews and have had no training at all…

June 23, 2025
Would you do this job if you were rich enough to retire?

Would you do this job if you were rich enough to retire? Curious about your perspective on this. Many of us got into the field because it was lucrative and ensures a stable living, but it is also intrinsically interesting to study and challenge yourself. The personalities attracted to tech are often fun and make work…

June 23, 2025
ML case study rounds

ML case study rounds I am asking this in the context of interviews. In almost every company these days, there is an ML case study round where the focus is on solving a real world case study. Idk if this is somewhat similar to ML system design or not (I think ML system design rounds are…

June 23, 2025
Why You Should Not Replace Blanks with 0 in Power BI

Why You Should Not Replace Blanks with 0 in Power BI Did someone ask you to replace blank values with 0 in your reports? Maybe you should think twice before you do it! The post Why You Should Not Replace Blanks with 0 in Power BI appeared first on Towards Data Science. Nikola Ilic Go…

June 21, 2025
Understanding Application Performance with Roofline Modeling

Understanding Application Performance with Roofline Modeling A common challenge with calculating an application’s performance is that the real-world performance and theoretical performance can differ. With an ecosystem of products that is growing with high performance needs such as High Performance Computing (HPC), gaming, or in the current landscape – Large Language Models (LLMs), it is…

June 21, 2025
Beyond Model Stacking: The Architecture Principles That Make Multimodal AI Systems Work

Beyond Model Stacking: The Architecture Principles That Make Multimodal AI Systems Work Transforming Independent Models into Collaborative Intelligence The post Beyond Model Stacking: The Architecture Principles That Make Multimodal AI Systems Work appeared first on Towards Data Science. Eric Chung Go to original source

June 20, 2025
Understanding Matrices | Part 2: Matrix-Matrix Multiplication

Understanding Matrices | Part 2: Matrix-Matrix Multiplication The physical meaning of multiplying two matrices and how it works on several special matrices. The post Understanding Matrices | Part 2: Matrix-Matrix Multiplication appeared first on Towards Data Science. Tigran Hayrapetyan Go to original source

June 20, 2025
LLM-as-a-Judge: A Practical Guide

LLM-as-a-Judge: A Practical Guide How to Scale LLM Evaluations Beyond Manual Review The post LLM-as-a-Judge: A Practical Guide appeared first on Towards Data Science. Shuai Guo Go to original source

June 20, 2025
From Configuration to Orchestration: Building an ETL Workflow with AWS Is No Longer a Struggle

From Configuration to Orchestration: Building an ETL Workflow with AWS Is No Longer a Struggle A step-by-step guide to leverage AWS services for efficient data pipeline automation The post From Configuration to Orchestration: Building an ETL Workflow with AWS Is No Longer a Struggle appeared first on Towards Data Science. Jiayan Yin Go to original…

June 20, 2025
What PyTorch Really Means by a Leaf Tensor and Its Grad

What PyTorch Really Means by a Leaf Tensor and Its Grad The secret life of leaves, gradients, and the mighty requires_grad flag The post What PyTorch Really Means by a Leaf Tensor and Its Grad appeared first on Towards Data Science. Maciej J. Mikulski Go to original source

June 20, 2025
Optimal Convergence Rates of Deep Neural Network Classifiers

Optimal Convergence Rates of Deep Neural Network Classifiers arXiv:2506.14899v1 Announce Type: new Abstract: In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s \in [0,\infty]$) and the compositional assumption. This assumption requires the conditional class probability function of the data distribution to be the composition of…

June 19, 2025
Double Machine Learning for Conditional Moment Restrictions: IV regression, Proximal Causal Learning and Beyond

Double Machine Learning for Conditional Moment Restrictions: IV regression, Proximal Causal Learning and Beyond arXiv:2506.14950v1 Announce Type: new Abstract: Solving conditional moment restrictions (CMRs) is a key problem considered in statistics, causal inference, and econometrics, where the aim is to solve for a function of interest that satisfies some conditional moment equalities. Specifically, many techniques…

June 19, 2025
Performative Validity of Recourse Explanations

Performative Validity of Recourse Explanations arXiv:2506.15366v1 Announce Type: new Abstract: When applicants get rejected by an algorithmic decision system, recourse explanations provide actionable suggestions for how to change their input features to get a positive evaluation. A crucial yet overlooked phenomenon is that recourse explanations are performative: When many applicants act according to their recommendations,…

June 19, 2025
An Observation on Lloyd’s k-Means Algorithm in High Dimensions

An Observation on Lloyd’s k-Means Algorithm in High Dimensions arXiv:2506.14952v1 Announce Type: new Abstract: Clustering and estimating cluster means are core problems in statistics and machine learning, with k-means and Expectation Maximization (EM) being two widely used algorithms. In this work, we provide a theoretical explanation for the failure of k-means in high-dimensional settings with…

June 19, 2025
Time-dependent density estimation using binary classifiers

Time-dependent density estimation using binary classifiers arXiv:2506.15505v1 Announce Type: new Abstract: We propose a data-driven method to learn the time-dependent probability density of a multivariate stochastic process from sample paths, assuming that the initial probability density is known and can be evaluated. Our method uses a novel time-dependent binary classifier trained using a contrastive estimation-based…

June 19, 2025
Beyond Code Generation: Continuously Evolve Text with LLMs

Beyond Code Generation: Continuously Evolve Text with LLMs Long-running content evolution and an introduction to result analysis The post Beyond Code Generation: Continuously Evolve Text with LLMs appeared first on Towards Data Science. Julian Mendel Go to original source

June 19, 2025
Animating Linear Transformations with Quiver

Animating Linear Transformations with Quiver A useful tool in your quiver The post Animating Linear Transformations with Quiver appeared first on Towards Data Science. Artemij Lehmann Go to original source

June 19, 2025
A Multi-Agent SQL Assistant You Can Trust with Human-in-Loop Checkpoint & LLM Cost Control

A Multi-Agent SQL Assistant You Can Trust with Human-in-Loop Checkpoint & LLM Cost Control Your very own SQL assistant built with Streamlit, SQLite, & CrewAI The post A Multi-Agent SQL Assistant You Can Trust with Human-in-Loop Checkpoint & LLM Cost Control appeared first on Towards Data Science. Alle Sravani Go to original source

June 19, 2025
Can We Use Chess to Predict Soccer?

Can We Use Chess to Predict Soccer? An adaptation of Elo ratings for soccer implemented in Python The post Can We Use Chess to Predict Soccer? appeared first on Towards Data Science. Felipe Bandeira Go to original source

June 19, 2025
Computer Vision’s Annotation Bottleneck Is Finally Breaking

Computer Vision’s Annotation Bottleneck Is Finally Breaking A Technical Deep Dive into Auto-Labeling The post Computer Vision’s Annotation Bottleneck Is Finally Breaking appeared first on Towards Data Science. TDS Brand Studio Go to original source

June 19, 2025
Beyond Shapley Values: Cooperative Games for the Interpretation of Machine Learning Models

Beyond Shapley Values: Cooperative Games for the Interpretation of Machine Learning Models arXiv:2506.13900v1 Announce Type: new Abstract: Cooperative game theory has become a cornerstone of post-hoc interpretability in machine learning, largely through the use of Shapley values. Yet, despite their widespread adoption, Shapley-based methods often rest on axiomatic justifications whose relevance to feature attribution remains…

June 18, 2025
Rademacher learning rates for iterated random functions

Rademacher learning rates for iterated random functions arXiv:2506.13946v1 Announce Type: new Abstract: Most existing literature on supervised machine learning assumes that the training dataset is drawn from an i.i.d. sample. However, many real-world problems exhibit temporal dependence and strong correlations between the marginal distributions of the data-generating process, suggesting that the i.i.d. assumption is often…

June 18, 2025
Meta Optimality for Demographic Parity Constrained Regression via Post-Processing

Meta Optimality for Demographic Parity Constrained Regression via Post-Processing arXiv:2506.13947v1 Announce Type: new Abstract: We address the regression problem under the constraint of demographic parity, a commonly used fairness definition. Recent studies have revealed fair minimax optimal regression algorithms, the most accurate algorithms that adhere to the fairness constraint. However, these analyses are tightly coupled…

June 18, 2025
Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Theoretically-Grounded and Practical Framework with Synthetic Anomalies

Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Theoretically-Grounded and Practical Framework with Synthetic Anomalies arXiv:2506.13955v1 Announce Type: new Abstract: Anomaly detection (AD) is a critical task across domains such as cybersecurity and healthcare. In the unsupervised setting, an effective and theoretically-grounded principle is to train classifiers to distinguish normal data from (synthetic) anomalies. We extend…

June 18, 2025
Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms

Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms arXiv:2506.13984v1 Announce Type: new Abstract: In this paper, we develop a wide class of Mirror Descent (MD) algorithms, which play a key role in machine learning. For this purpose we formulate the constrained optimization problem, in which we exploit the Bregman divergence with the Tempesta multi-parametric deformation logarithm…

June 18, 2025
Abstract Classes: A Software Engineering Concept Data Scientists Must Know To Succeed

Abstract Classes: A Software Engineering Concept Data Scientists Must Know To Succeed Simple concepts that differentiate a professional from amateurs. The post Abstract Classes: A Software Engineering Concept Data Scientists Must Know To Succeed appeared first on Towards Data Science. Benjamin Lee Go to original source

June 18, 2025
LLaVA on a Budget: Multimodal AI with Limited Resources

LLaVA on a Budget: Multimodal AI with Limited Resources Let’s get started with multimodality The post LLaVA on a Budget: Multimodal AI with Limited Resources appeared first on Towards Data Science. Marcello Politi Go to original source

June 18, 2025