Category: aimlds

Overspecified Mixture Discriminant Analysis: Exponential Convergence, Statistical Guarantees, and Remote Sensing Applications

Overspecified Mixture Discriminant Analysis: Exponential Convergence, Statistical Guarantees, and Remote Sensing Applications arXiv:2510.27056v1 Announce Type: new Abstract: This study explores the classification error of Mixture Discriminant Analysis (MDA) in scenarios where the number of mixture components exceeds those present in the actual data distribution, a condition known as overspecification. We use a two-component Gaussian mixture…

November 3, 2025
Decreasing Entropic Regularization Averaged Gradient for Semi-Discrete Optimal Transport

Decreasing Entropic Regularization Averaged Gradient for Semi-Discrete Optimal Transport arXiv:2510.27340v1 Announce Type: new Abstract: Adding entropic regularization to Optimal Transport (OT) problems has become a standard approach for designing efficient and scalable solvers. However, regularization introduces a bias from the true solution. To mitigate this bias while still benefiting from the acceleration provided by regularization,…

November 3, 2025
On the Equivalence of Optimal Transport Problem and Action Matching with Optimal Vector Fields

On the Equivalence of Optimal Transport Problem and Action Matching with Optimal Vector Fields arXiv:2510.27385v1 Announce Type: new Abstract: Flow Matching (FM) method in generative modeling maps arbitrary probability distributions by constructing an interpolation between them and then learning the vector field that defines ODE for this interpolation. Recently, it was shown that FM can…

November 3, 2025
Minimax-Optimal Two-Sample Test with Sliced Wasserstein

Minimax-Optimal Two-Sample Test with Sliced Wasserstein arXiv:2510.27498v1 Announce Type: new Abstract: We study the problem of nonparametric two-sample testing using the sliced Wasserstein (SW) distance. While prior theoretical and empirical work indicates that the SW distance offers a promising balance between strong statistical guarantees and computational efficiency, its theoretical foundations for hypothesis testing remain limited.…

November 3, 2025
Interpretable Model-Aware Counterfactual Explanations for Random Forest

Interpretable Model-Aware Counterfactual Explanations for Random Forest arXiv:2510.27397v1 Announce Type: new Abstract: Despite their enormous predictive power, machine learning models are often unsuitable for applications in regulated industries such as finance, due to their limited capacity to provide explanations. While model-agnostic frameworks such as Shapley values have proved to be convenient and popular, they rarely…

November 3, 2025
Weekly Entering & Transitioning – Thread 03 Nov, 2025 – 10 Nov, 2025

Weekly Entering & Transitioning – Thread 03 Nov, 2025 – 10 Nov, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

November 3, 2025
How would you turn a working Jupyter pipeline into a small web app?

How would you turn a working Jupyter pipeline into a small web app? I’ve inherited a few data-engineering notebooks that work end-to-end. I want to (1) extract the logic into a testable Python package and (2) put a minimal GUI on top so non-technical teammates can run it with parameters and download outputs. Constraints: Python…

November 3, 2025
Is it too early to accept an internship offer?

Is it too early to accept an internship offer? I’m a junior studying Data Analytics and Data Engineering at a solid state school. I’ve been a Data Analyst at my university’s career services for the past year, and previously interned as a Data & Business Analytics Intern at a regional credit union. I just got…

November 3, 2025
Has anyone’s company successfully implemented what is being described as ACP or an AI Mesh?

Has anyone’s company successfully implemented what is being described as ACP or an AI Mesh? Has anyone’s company implemented what is generally described as ACP or what McKinsey describes as an AI Mesh? The concept is a centralized space for AI Agents to “talk to each other”. The link below is a general infographic comparing…

November 3, 2025
Schwab API usage from AWS

Schwab API usage from AWS Hello everyone, I want to create an app that places stock sales based on triggers from AWS (where all my code resides). I am not sure how I can get authorization tokens from within AWS for the Schwab API. Does anyone have experience with Schwab? submitted by /u/Amazing_Alarm6130…

November 3, 2025
From Classical Models to AI: Forecasting Humidity for Energy and Water Efficiency in Data Centers

From Classical Models to AI: Forecasting Humidity for Energy and Water Efficiency in Data Centers From ARIMA to N-BEATS: Comparing forecasting approaches that balance accuracy, interpretability, and sustainability The post From Classical Models to AI: Forecasting Humidity for Energy and Water Efficiency in Data Centers appeared first on Towards Data Science. Dr. Theophano Mitsa Go…

November 3, 2025
MobileNetV3 Paper Walkthrough: The Tiny Giant Getting Even Smarter

MobileNetV3 Paper Walkthrough: The Tiny Giant Getting Even Smarter MobileNetV3 with PyTorch — now featuring SE blocks and hard activation functions The post MobileNetV3 Paper Walkthrough: The Tiny Giant Getting Even Smarter appeared first on Towards Data Science. Muhammad Ardi Go to original source

November 3, 2025
The Pearson Correlation Coefficient, Explained Simply

The Pearson Correlation Coefficient, Explained Simply A simple explanation of the Pearson correlation coefficient with examples The post The Pearson Correlation Coefficient, Explained Simply appeared first on Towards Data Science. Nikhil Dasari Go to original source

November 2, 2025
Graph RAG vs SQL RAG

Graph RAG vs SQL RAG Evaluating RAGs on graph and SQL databases The post Graph RAG vs SQL RAG appeared first on Towards Data Science. Reinhard Sellmair Go to original source

November 2, 2025
Let Hypothesis Break Your Python Code Before Your Users Do

Let Hypothesis Break Your Python Code Before Your Users Do Property-based tests that find bugs you didn’t know existed. The post Let Hypothesis Break Your Python Code Before Your Users Do appeared first on Towards Data Science. Thomas Reid Go to original source

November 1, 2025
The Machine Learning Projects Employers Want to See

The Machine Learning Projects Employers Want to See What machine learning projects will actually get you interviews and jobs The post The Machine Learning Projects Employers Want to See appeared first on Towards Data Science. Egor Howell Go to original source

November 1, 2025
RF-DETR Under the Hood: The Insights of a Real-Time Transformer Detection

RF-DETR Under the Hood: The Insights of a Real-Time Transformer Detection From rigid grids to adaptive attention, this is the evolutionary path that made detection transformers fast, flexible, and formidable. The post RF-DETR Under the Hood: The Insights of a Real-Time Transformer Detection appeared first on Towards Data Science. David Redó Nieto Go to original…

November 1, 2025
Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms arXiv:2510.25811v1 Announce Type: new Abstract: We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn…

October 31, 2025
$L_1$-norm Regularized Indefinite Kernel Logistic Regression

$L_1$-norm Regularized Indefinite Kernel Logistic Regression arXiv:2510.26043v1 Announce Type: new Abstract: Kernel logistic regression (KLR) is a powerful classification method widely applied across diverse domains. In many real-world scenarios, indefinite kernels capture more domain-specific structural information than positive definite kernels. This paper proposes a novel $L_1$-norm regularized indefinite kernel logistic regression (RIKLR) model, which extends…

October 31, 2025
Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation

Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation arXiv:2510.26026v1 Announce Type: new Abstract: Reliable uncertainty quantification is crucial for reinforcement learning (RL) in high-stakes settings. We propose a unified conformal prediction framework for infinite-horizon policy evaluation that constructs distribution-free prediction intervals for returns in both on-policy and off-policy settings. Our method integrates distributional…

October 31, 2025
Bias-Corrected Data Synthesis for Imbalanced Learning

Bias-Corrected Data Synthesis for Imbalanced Learning arXiv:2510.26046v1 Announce Type: new Abstract: Imbalanced data, where the positive samples represent only a small proportion compared to the negative samples, makes it challenging for classification problems to balance the false positive and false negative rates. A common approach to addressing the challenge involves generating synthetic data for the…

October 31, 2025
Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems

Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems arXiv:2510.26061v1 Announce Type: new Abstract: We propose a data-driven framework for efficiently solving quadratic programming (QP) problems by reducing the number of variables in high-dimensional QPs using instance-specific projection. A graph neural network-based model is designed to generate projections tailored to each QP instance, enabling…

October 31, 2025
Building a Rules Engine from First Principles

Building a Rules Engine from First Principles How recasting propositional logic as sparse algebra leads to an elegant and efficient design The post Building a Rules Engine from First Principles appeared first on Towards Data Science. Dmitry Lesnik Go to original source

October 31, 2025
Build LLM Agents Faster with Datapizza AI

Build LLM Agents Faster with Datapizza AI Intro Organizations are increasingly investing in AI as these new tools are adopted in everyday operations more and more. This continuous wave of innovation is fueling the demand for more efficient and reliable frameworks. Following this trend, Datapizza (the startup behind Italy’s tech community) just released an open-source…

October 31, 2025
“Systems thinking helps me put the big picture front and center”

“Systems thinking helps me put the big picture front and center” Shuai Guo on deep research agents, analytical AI vs LLM-based agents, and systems thinking The post “Systems thinking helps me put the big picture front and center” appeared first on Towards Data Science. TDS Editors Go to original source

October 31, 2025
Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical Guarantees

Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical Guarantees arXiv:2510.24754v1 Announce Type: new Abstract: Uncertain knowledge graph embedding (UnKGE) methods learn vector representations that capture both structural and uncertainty information to predict scores of unseen triples. However, existing methods produce only point estimates, without quantifying predictive uncertainty, limiting their reliability in high-stakes applications where…

October 30, 2025
Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm

Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm arXiv:2510.24815v1 Announce Type: new Abstract: Tree ensembles have demonstrated state-of-the-art predictive performance across a wide range of problems involving tabular data. Nevertheless, the black-box nature of tree ensembles is a strong limitation, especially for applications with critical decisions at stake. The Hoeffding or ANOVA…

October 30, 2025
Generative Bayesian Optimization: Generative Models as Acquisition Functions

Generative Bayesian Optimization: Generative Models as Acquisition Functions arXiv:2510.25240v1 Announce Type: new Abstract: We present a general strategy for turning generative models into candidate solution samplers for batch Bayesian optimization (BO). The use of generative models for BO enables large batch scaling as generative sampling, optimization of non-continuous design spaces, and high-dimensional and combinatorial design.…

October 30, 2025
Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains

Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains arXiv:2510.25514v1 Announce Type: new Abstract: We study the convergence of off-policy TD(0) with linear function approximation when used to approximate the expected discounted reward in a Markov chain. It is well known that the combination of off-policy learning and function approximation can lead…

October 30, 2025
Using latent representations to link disjoint longitudinal data for mixed-effects regression

Using latent representations to link disjoint longitudinal data for mixed-effects regression arXiv:2510.25531v1 Announce Type: new Abstract: Many rare diseases offer limited established treatment options, leading patients to switch therapies when new medications emerge. To analyze the impact of such treatment switches within the low sample size limitations of rare disease trials, it is important to…

October 30, 2025
4 Techniques to Optimize Your LLM Prompts for Cost, Latency and Performance

4 Techniques to Optimize Your LLM Prompts for Cost, Latency and Performance Learn how to greatly improve the performance of your LLM application The post 4 Techniques to Optimize Your LLM Prompts for Cost, Latency and Performance appeared first on Towards Data Science. Eivind Kjosbakken Go to original source

October 30, 2025
Bringing Vision-Language Intelligence to RAG with ColPali

Bringing Vision-Language Intelligence to RAG with ColPali Unlocking the value of non-textual contents in your knowledge base The post Bringing Vision-Language Intelligence to RAG with ColPali appeared first on Towards Data Science. Julian Yip Go to original source

October 30, 2025
Beyond Normality: Reliable A/B Testing with Non-Gaussian Data

Beyond Normality: Reliable A/B Testing with Non-Gaussian Data arXiv:2510.23666v1 Announce Type: new Abstract: A/B testing has become the cornerstone of decision-making in online markets, guiding how platforms launch new features, optimize pricing strategies, and improve user experience. In practice, we typically employ the pairwise $t$-test to compare outcomes between the treatment and control groups, thereby…

October 29, 2025
VIKING: Deep variational inference with stochastic projections

VIKING: Deep variational inference with stochastic projections arXiv:2510.23684v1 Announce Type: new Abstract: Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. Where a Bayesian treatment is usually associated with high-quality predictions and uncertainties, the practical reality has been the opposite, with unstable training, poor predictive power, and subpar calibration. Building upon…

October 29, 2025
Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis

Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis arXiv:2510.23935v1 Announce Type: new Abstract: Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constraints at the prediction level, without addressing underlying biases in data representations. In this work, we…

October 29, 2025
Bayesian neural networks with interpretable priors from Mercer kernels

Bayesian neural networks with interpretable priors from Mercer kernels arXiv:2510.23745v1 Announce Type: new Abstract: Quantifying the uncertainty in the output of a neural network is essential for deployment in scientific or engineering applications where decisions must be made under limited or noisy data. Bayesian neural networks (BNNs) provide a framework for this purpose by constructing…

October 29, 2025
Score-based constrained generative modeling via Langevin diffusions with boundary conditions

Score-based constrained generative modeling via Langevin diffusions with boundary conditions arXiv:2510.23985v1 Announce Type: new Abstract: Score-based generative models based on stochastic differential equations (SDEs) achieve impressive performance in sampling from unknown distributions, but often fail to satisfy underlying constraints. We propose a constrained generative model using kinetic (underdamped) Langevin dynamics with specular reflection of velocity…

October 29, 2025
Using NumPy to Analyze My Daily Habits (Sleep, Screen Time & Mood)

Using NumPy to Analyze My Daily Habits (Sleep, Screen Time & Mood) Can I use NumPy to figure out how my habits affect my mood and productivity? The post Using NumPy to Analyze My Daily Habits (Sleep, Screen Time & Mood) appeared first on Towards Data Science. Ibrahim Salami Go to original source

October 29, 2025
Deep Reinforcement Learning: 0 to 100

Deep Reinforcement Learning: 0 to 100 Using RL to teach robots to fly a drone The post Deep Reinforcement Learning: 0 to 100 appeared first on Towards Data Science. Vedant Jumle Go to original source

October 29, 2025
Using Claude Skills with Neo4j

Using Claude Skills with Neo4j A hands-on exploration of Claude Skills and their potential applications in Neo4j The post Using Claude Skills with Neo4j appeared first on Towards Data Science. Tomaz Bratanic Go to original source

October 29, 2025
Water Cooler Small Talk, Ep. 9: What “Thinking” and “Reasoning” Really Mean in AI and LLMs

Water Cooler Small Talk, Ep. 9: What “Thinking” and “Reasoning” Really Mean in AI and LLMs Understanding how AI models “reason” and why it’s not what humans do when we think The post Water Cooler Small Talk, Ep. 9: What “Thinking” and “Reasoning” Really Mean in AI and LLMs appeared first on Towards Data Science.…

October 29, 2025
Input Adaptive Bayesian Model Averaging

Input Adaptive Bayesian Model Averaging arXiv:2510.22054v1 Announce Type: new Abstract: This paper studies prediction with multiple candidate models, where the goal is to combine their outputs. This task is especially challenging in heterogeneous settings, where different models may be better suited to different inputs. We propose input adaptive Bayesian Model Averaging (IA-BMA), a Bayesian method…

October 28, 2025
Bridging Prediction and Attribution: Identifying Forward and Backward Causal Influence Ranges Using Assimilative Causal Inference

Bridging Prediction and Attribution: Identifying Forward and Backward Causal Influence Ranges Using Assimilative Causal Inference arXiv:2510.21889v1 Announce Type: new Abstract: Causal inference identifies cause-and-effect relationships between variables. While traditional approaches rely on data to reveal causal links, a recently developed method, assimilative causal inference (ACI), integrates observations with dynamical models. It utilizes Bayesian data assimilation…

October 28, 2025
Differentially Private High-dimensional Variable Selection via Integer Programming

Differentially Private High-dimensional Variable Selection via Integer Programming arXiv:2510.22062v1 Announce Type: new Abstract: Sparse variable selection improves interpretability and generalization in high-dimensional learning by selecting a small subset of informative features. Recent advances in Mixed Integer Programming (MIP) have enabled solving large-scale non-private sparse regression – known as Best Subset Selection (BSS) – with millions…

October 28, 2025
Frequentist Validity of Epistemic Uncertainty Estimators

Frequentist Validity of Epistemic Uncertainty Estimators arXiv:2510.22063v1 Announce Type: new Abstract: Decomposing prediction uncertainty into its aleatoric (irreducible) and epistemic (reducible) components is critical for the development and deployment of machine learning systems. A popular, principled measure for epistemic uncertainty is the mutual information between the response variable and model parameters. However, evaluating this measure…

October 28, 2025
MMbeddings: Parameter-Efficient, Low-Overfitting Probabilistic Embeddings Inspired by Nonlinear Mixed Models

MMbeddings: Parameter-Efficient, Low-Overfitting Probabilistic Embeddings Inspired by Nonlinear Mixed Models arXiv:2510.22198v1 Announce Type: new Abstract: We present MMbeddings, a probabilistic embedding approach that reinterprets categorical embeddings through the lens of nonlinear mixed models, effectively bridging classical statistical theory with modern deep learning. By treating embeddings as latent random effects within a variational autoencoder framework, our…

October 28, 2025
A Real-World Example of Using UDF in DAX

A Real-World Example of Using UDF in DAX With the September 2025 release of Power BI, we get the new user-defined function feature. This is an excellent addition to our toolset. Let’s see how to build a real-world example of this new feature. The post A Real-World Example of Using UDF in DAX appeared first…

October 28, 2025
How to Apply Powerful AI Audio Models to Real-World Applications

How to Apply Powerful AI Audio Models to Real-World Applications Learn about different types of AI audio models and the application areas they can be used in. The post How to Apply Powerful AI Audio Models to Real-World Applications appeared first on Towards Data Science. Eivind Kjosbakken Go to original source

October 28, 2025
The Machine Learning Lessons I’ve Learned This Month

The Machine Learning Lessons I’ve Learned This Month October 2025: READMEs, MIGs, and movements The post The Machine Learning Lessons I’ve Learned This Month appeared first on Towards Data Science. Pascal Janetzky Go to original source

October 28, 2025
Building a Monitoring System That Actually Works

Building a Monitoring System That Actually Works A step-by-step guide to catching real anomalies without drowning in false alerts The post Building a Monitoring System That Actually Works appeared first on Towards Data Science. Mariya Mansurova Go to original source

October 28, 2025
Exponential Convergence Guarantees for Iterative Markovian Fitting

Exponential Convergence Guarantees for Iterative Markovian Fitting arXiv:2510.20871v1 Announce Type: new Abstract: The Schrödinger Bridge (SB) problem has become a fundamental tool in computational optimal transport and generative modeling. To address this problem, ideal methods such as Iterative Proportional Fitting and Iterative Markovian Fitting (IMF) have been proposed, alongside practical approximations like Diffusion Schrödinger Bridge and…

October 27, 2025
Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization

Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization arXiv:2510.20883v1 Announce Type: new Abstract: Adversarial training has emerged as a key technique to enhance model robustness against adversarial input perturbations. Many of the existing methods rely on computationally expensive min-max problems that limit their application in practice. We propose a novel formulation of adversarial…

October 27, 2025
A Short Note on Upper Bounds for Graph Neural Operator Convergence Rate

A Short Note on Upper Bounds for Graph Neural Operator Convergence Rate arXiv:2510.20954v1 Announce Type: new Abstract: Graphons, as limits of graph sequences, provide a framework for analyzing the asymptotic behavior of graph neural operators. Spectral convergence of sampled graphs to graphons yields operator-level convergence rates, enabling transferability analyses of GNNs. This note summarizes known…

October 27, 2025
Enforcing Calibration in Multi-Output Probabilistic Regression with Pre-rank Regularization

Enforcing Calibration in Multi-Output Probabilistic Regression with Pre-rank Regularization arXiv:2510.21273v1 Announce Type: new Abstract: Probabilistic models must be well calibrated to support reliable decision-making. While calibration in single-output regression is well studied, defining and achieving multivariate calibration in multi-output regression remains considerably more challenging. The existing literature on multivariate calibration primarily focuses on diagnostic tools…

October 27, 2025
Doubly-Regressing Approach for Subgroup Fairness

Doubly-Regressing Approach for Subgroup Fairness arXiv:2510.21091v1 Announce Type: new Abstract: Algorithmic fairness is a socially crucial topic in real-world applications of AI. Among many notions of fairness, subgroup fairness is widely studied when multiple sensitive attributes (e.g., gender, race, age) are present. However, as the number of sensitive attributes grows, the number of subgroups increases…

October 27, 2025
Weekly Entering & Transitioning – Thread 27 Oct, 2025 – 03 Nov, 2025

Weekly Entering & Transitioning – Thread 27 Oct, 2025 – 03 Nov, 2025 Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…

October 27, 2025
Anyone looking to read the third edition of Deep Learning With Python?

Anyone looking to read the third edition of Deep Learning With Python? The book is now available to read online for free: https://deeplearningwithpython.io/chapters/ submitted by /u/yaymayhun Go to original source

October 27, 2025
The Great Stay — Here’s the New Reality for Tech Workers

The Great Stay — Here’s the New Reality for Tech Workers Do you think you’re part of this new phenomenon called The Great Stay? submitted by /u/KitchenTaste7229 Go to original source

October 27, 2025
What’s next for an 11 YOE data scientist?

What’s next for an 11 YOE data scientist? Hi folks, Hope you’re having a great day wherever you are in the world. Context: I’ve been in the data science industry for the past 11 years. I started my career in telecom, where I worked extensively on time series analysis and data cleaning using R, Java,…

October 27, 2025
Any other free options that are similar to ShotBot?

Any other free options that are similar to ShotBot? submitted by /u/Party_Bus_3809 Go to original source

October 27, 2025
The Power of Framework Dimensions: What Data Scientists Should Know

The Power of Framework Dimensions: What Data Scientists Should Know Practical guidance and a case study The post The Power of Framework Dimensions: What Data Scientists Should Know appeared first on Towards Data Science. Chinmay Kakatkar Go to original source

October 27, 2025
AI Agents: From Assistants for Efficiency to Leaders of Tomorrow?

AI Agents: From Assistants for Efficiency to Leaders of Tomorrow? How artificial intelligence is evolving from “simple” assistants to potential architects of our future, even CEOs and governors The post AI Agents: From Assistants for Efficiency to Leaders of Tomorrow? appeared first on Towards Data Science. Luciano Abriata Go to original source

October 27, 2025
Data Visualization Explained (Part 4): A Review of Python Essentials

Data Visualization Explained (Part 4): A Review of Python Essentials Learn the foundations of Python to take your data visualization game to the next level. The post Data Visualization Explained (Part 4): A Review of Python Essentials appeared first on Towards Data Science. Murtaza Ali Go to original source

October 26, 2025
Building a Geospatial Lakehouse with Open Source and Databricks

Building a Geospatial Lakehouse with Open Source and Databricks An example workflow for vector geospatial data science The post Building a Geospatial Lakehouse with Open Source and Databricks appeared first on Towards Data Science. Robert Constable Go to original source

October 26, 2025
Agentic AI from First Principles: Reflection

Agentic AI from First Principles: Reflection From theory to code: building feedback loops that improve LLM accuracy The post Agentic AI from First Principles: Reflection appeared first on Towards Data Science. Mariya Mansurova Go to original source

October 25, 2025
How to Consistently Extract Metadata from Complex Documents

How to Consistently Extract Metadata from Complex Documents Learn how to extract important pieces of information from your documents The post How to Consistently Extract Metadata from Complex Documents appeared first on Towards Data Science. Eivind Kjosbakken Go to original source

October 25, 2025
Choosing the Best Model Size and Dataset Size under a Fixed Budget for LLMs

Choosing the Best Model Size and Dataset Size under a Fixed Budget for LLMs A small-scale exploration using Tiny Transformers The post Choosing the Best Model Size and Dataset Size under a Fixed Budget for LLMs appeared first on Towards Data Science. Shuyang Go to original source

October 25, 2025
Deploy an OpenAI Agent Builder Chatbot to a Website

Deploy an OpenAI Agent Builder Chatbot to a Website Using OpenAI’s Agent Builder ChatKit The post Deploy an OpenAI Agent Builder Chatbot to a Website appeared first on Towards Data Science. Thomas Reid Go to original source

October 25, 2025
Compositional Generation for Long-Horizon Coupled PDEs

Compositional Generation for Long-Horizon Coupled PDEs arXiv:2510.20141v1 Announce Type: new Abstract: Simulating coupled PDE systems is computationally intensive, and prior efforts have largely focused on training surrogates on the joint (coupled) data, which requires a large amount of data. In the paper, we study compositional diffusion approaches where diffusion models are only trained on the…

October 24, 2025
Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models

Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models arXiv:2510.19999v1 Announce Type: new Abstract: We present a novel enhanced cyclic coordinate descent (ECCD) framework for solving generalized linear models with elastic net constraints that reduces training time in comparison to existing state-of-the-art methods. We redesign the CD method by performing a Taylor expansion…

October 24, 2025
Neural Networks for Censored Expectile Regression Based on Data Augmentation

Neural Networks for Censored Expectile Regression Based on Data Augmentation arXiv:2510.20344v1 Announce Type: new Abstract: Expectile regression neural networks (ERNNs) are powerful tools for capturing heterogeneity and complex nonlinear structures in data. However, most existing research has primarily focused on fully observed data, with limited attention paid to scenarios involving censored observations. In this paper,…

October 24, 2025
Testing Most Influential Sets

Testing Most Influential Sets arXiv:2510.20372v1 Announce Type: new Abstract: Small subsets of data with disproportionate influence on model outcomes can have dramatic impacts on conclusions, with a few data points sometimes overturning key findings. While recent work has developed methods to identify these most influential sets, no formal theory exists to determine when their influence…

October 24, 2025
Learning Decentralized Routing Policies via Graph Attention-based Multi-Agent Reinforcement Learning in Lunar Delay-Tolerant Networks

Learning Decentralized Routing Policies via Graph Attention-based Multi-Agent Reinforcement Learning in Lunar Delay-Tolerant Networks arXiv:2510.20436v1 Announce Type: new Abstract: We present a fully decentralized routing framework for multi-robot exploration missions operating under the constraints of a Lunar Delay-Tolerant Network (LDTN). In this setting, autonomous rovers must relay collected data to a lander under intermittent connectivity…

October 24, 2025
When Transformers Sing: Adapting SpectralKD for Text-Based Knowledge Distillation

When Transformers Sing: Adapting SpectralKD for Text-Based Knowledge Distillation Exploring the frequency fingerprints of Transformers to guide smarter knowledge distillation The post When Transformers Sing: Adapting SpectralKD for Text-Based Knowledge Distillation appeared first on Towards Data Science. Ankit Singh Chauhan

October 24, 2025
How to Keep AI Costs Under Control

How to Keep AI Costs Under Control Lessons from Scaling LLMs The post How to Keep AI Costs Under Control appeared first on Towards Data Science. Asaf Liveanu

October 24, 2025
How to Control a Robot with Python

How to Control a Robot with Python 3D simulations and movement control with PyBullet The post How to Control a Robot with Python appeared first on Towards Data Science. Mauro Di Pietro

October 24, 2025
Multiple Linear Regression Explained Simply (Part 1)

Multiple Linear Regression Explained Simply (Part 1) The math behind fitting a plane instead of a line. The post Multiple Linear Regression Explained Simply (Part 1) appeared first on Towards Data Science. Nikhil Dasari

October 24, 2025
Signature Kernel Scoring Rule as Spatio-Temporal Diagnostic for Probabilistic Forecasting

Signature Kernel Scoring Rule as Spatio-Temporal Diagnostic for Probabilistic Forecasting arXiv:2510.19110v1 Announce Type: new Abstract: Modern weather forecasting has increasingly transitioned from numerical weather prediction (NWP) to data-driven machine learning forecasting techniques. While these new models produce probabilistic forecasts to quantify uncertainty, their training and evaluation may remain hindered by conventional scoring rules, primarily MSE,…

October 23, 2025
Calibrated Principal Component Regression

Calibrated Principal Component Regression arXiv:2510.19020v1 Announce Type: new Abstract: We propose a new method for statistical inference in generalized linear models. In the overparameterized regime, Principal Component Regression (PCR) reduces variance by projecting high-dimensional data to a low-dimensional principal subspace before fitting. However, PCR incurs truncation bias whenever the true regression vector has mass outside…

October 23, 2025
Extreme Event Aware ($\eta$-) Learning

Extreme Event Aware ($\eta$-) Learning arXiv:2510.19161v1 Announce Type: new Abstract: Quantifying and predicting rare and extreme events persists as a crucial yet challenging task in understanding complex dynamical systems. Many practical challenges arise from the infrequency and severity of these events, including the considerable variance of simple sampling methods and the substantial computational cost of…

October 23, 2025
Topology of Currencies: Persistent Homology for FX Co-movements: A Comparative Clustering Study

Topology of Currencies: Persistent Homology for FX Co-movements: A Comparative Clustering Study arXiv:2510.19306v1 Announce Type: new Abstract: This study investigates whether Topological Data Analysis (TDA) can provide additional insights beyond traditional statistical methods in clustering currency behaviours. We focus on the foreign exchange (FX) market, which is a complex system often exhibiting non-linear and high-dimensional…

October 23, 2025
Metadata Extraction Leveraging Large Language Models

Metadata Extraction Leveraging Large Language Models arXiv:2510.19334v1 Announce Type: new Abstract: The advent of Large Language Models has revolutionized tasks across domains, including the automation of legal document analysis, a critical component of modern contract management systems. This paper presents a comprehensive implementation of LLM-enhanced metadata extraction for contract review, focusing on the automatic detection…

October 23, 2025
Why Should We Bother with Quantum Computing in ML?

Why Should We Bother with Quantum Computing in ML? Quantum Machine Learning principles The post Why Should We Bother with Quantum Computing in ML? appeared first on Towards Data Science. Erika G. Gonçalves

October 23, 2025
Federated Learning and Custom Aggregation Schemes

Federated Learning and Custom Aggregation Schemes A practical guide to designing and analyzing robust aggregation strategies The post Federated Learning and Custom Aggregation Schemes appeared first on Towards Data Science. Salman Toor

October 23, 2025
Implementing DRIFT Search with Neo4j and LlamaIndex

Implementing DRIFT Search with Neo4j and LlamaIndex Combining global and local search to get the most accurate response The post Implementing DRIFT Search with Neo4j and LlamaIndex appeared first on Towards Data Science. Tomaz Bratanic

October 23, 2025
Agentic AI in Finance: Opportunities and Challenges for Indonesia

Agentic AI in Finance: Opportunities and Challenges for Indonesia The rise of AI has touched nearly every industry — including finance. In fact, the financial sector has long been an adopter of what we now call “traditional machine learning,” using it for predictive modeling, credit scoring, and risk analytics. But with the current hype around…

October 23, 2025
Graphical model for tensor factorization by sparse sampling

Graphical model for tensor factorization by sparse sampling arXiv:2510.17886v1 Announce Type: new Abstract: We consider tensor factorizations based on sparse measurements of the tensor components. The measurements are designed in a way that the underlying graph of interactions is a random graph. The setup will be useful in cases where a substantial amount of data…

October 22, 2025
Learning Time-Varying Graphs from Incomplete Graph Signals

Learning Time-Varying Graphs from Incomplete Graph Signals arXiv:2510.17903v1 Announce Type: new Abstract: This paper tackles the challenging problem of jointly inferring time-varying network topologies and imputing missing data from partially observed graph signals. We propose a unified non-convex optimization framework to simultaneously recover a sequence of graph Laplacian matrices while reconstructing the unobserved signal entries.…

October 22, 2025
Generalization Below the Edge of Stability: The Role of Data Geometry

Generalization Below the Edge of Stability: The Role of Data Geometry arXiv:2510.18120v1 Announce Type: new Abstract: Understanding generalization in overparameterized neural networks hinges on the interplay between the data geometry, neural architecture, and training dynamics. In this paper, we theoretically explore how data geometry controls this implicit bias. This paper presents theoretical results for overparameterized…

October 22, 2025
Arbitrated Indirect Treatment Comparisons

Arbitrated Indirect Treatment Comparisons arXiv:2510.18071v1 Announce Type: new Abstract: Matching-adjusted indirect comparison (MAIC) has been increasingly employed in health technology assessments (HTA). By reweighting subjects from a trial with individual participant data (IPD) to match the covariate summary statistics of another trial with only aggregate data (AgD), MAIC facilitates the estimation of a treatment effect…

October 22, 2025
Beating the Winner’s Curse via Inference-Aware Policy Optimization

Beating the Winner’s Curse via Inference-Aware Policy Optimization arXiv:2510.18161v1 Announce Type: new Abstract: There has been a surge of recent interest in automatically learning policies to target treatment decisions based on rich individual covariates. A common approach is to train a machine learning model to predict counterfactual outcomes, and then select the policy that optimizes…

October 22, 2025
Scaling Recommender Transformers to a Billion Parameters

Scaling Recommender Transformers to a Billion Parameters How to implement a new generation of transformer recommenders The post Scaling Recommender Transformers to a Billion Parameters appeared first on Towards Data Science. Kirill Khrylchenko

October 22, 2025
Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI

Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI Context engineering, semantic layers, and the evolution of retrieval for agentic AI The post Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI appeared first on Towards Data Science. Steve Hedden

October 22, 2025
Hidden Gems in NumPy: 7 Functions Every Data Scientist Should Know

Hidden Gems in NumPy: 7 Functions Every Data Scientist Should Know I’ve been learning data analytics for a year now. So far, I can consider myself confident in SQL and Power BI. The transition to Python has been quite exciting. I’ve been exposed to some neat and smarter approaches to data analysis. After brushing up…

October 22, 2025
Implementing the Fourier Transform Numerically in Python: A Step-by-Step Guide

Implementing the Fourier Transform Numerically in Python: A Step-by-Step Guide What if the FFT functions in NumPy and SciPy don’t actually compute the Fourier transform you think they do? The post Implementing the Fourier Transform Numerically in Python: A Step-by-Step Guide appeared first on Towards Data Science. JUNIOR JUMBONG

October 22, 2025
Learning density ratios in causal inference using Bregman-Riesz regression

Learning density ratios in causal inference using Bregman-Riesz regression arXiv:2510.16127v1 Announce Type: new Abstract: The ratio of two probability density functions is a fundamental quantity that appears in many areas of statistics and machine learning, including causal inference, reinforcement learning, covariate shift, outlier detection, independence testing, importance sampling, and diffusion modeling. Naively estimating the numerator…

October 21, 2025
A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators

A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators arXiv:2510.16419v1 Announce Type: new Abstract: While significant progress has been made in heterogeneous treatment effect (HTE) estimation, the evaluation of HTE estimators remains underdeveloped. In this article, we propose a robust evaluation framework based on relative error, which quantifies performance differences between two HTE estimators.…

October 21, 2025
Personalized Collaborative Learning with Affinity-Based Variance Reduction

Personalized Collaborative Learning with Affinity-Based Variance Reduction arXiv:2510.16232v1 Announce Type: new Abstract: Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels — gaining collaborative speedup when agents are similar, without performance degradation when…

October 21, 2025
A Bayesian Framework for Symmetry Inference in Chaotic Attractors

A Bayesian Framework for Symmetry Inference in Chaotic Attractors arXiv:2510.16509v1 Announce Type: new Abstract: Detecting symmetry from data is a fundamental problem in signal analysis, providing insight into underlying structure and constraints. When data emerge as trajectories of dynamical systems, symmetries encode structural properties of the dynamics that enable model reduction, principled comparison across conditions,…

October 21, 2025
From Reviews to Actionable Insights: An LLM-Based Approach for Attribute and Feature Extraction

From Reviews to Actionable Insights: An LLM-Based Approach for Attribute and Feature Extraction arXiv:2510.16551v1 Announce Type: new Abstract: This research proposes a systematic, large language model (LLM) approach for extracting product and service attributes, features, and associated sentiments from customer reviews. Grounded in marketing theory, the framework distinguishes perceptual attributes from actionable features, producing interpretable…

October 21, 2025