Category: aimlds
-
The Machine Learning and Deep Learning “Advent Calendar” Series: The Blueprint
Opening the black box of ML models, step by step, directly in Excel. Originally published on Towards Data Science by Angela Shi.
-
The Greedy Boruta Algorithm: Faster Feature Selection Without Sacrificing Recall
A modification to the Boruta algorithm that dramatically reduces computation while maintaining high sensitivity. Originally published on Towards Data Science by Nicolas Vana.
-
Metric Deception: When Your Best KPIs Hide Your Worst Failures
The most dangerous KPIs aren’t broken; they’re the ones trusted long after they’ve lost their meaning. Originally published on Towards Data Science by Shafeeq Ur Rahaman.
-
How to Scale Your LLM Usage
Learn how to increase LLM usage to achieve increased productivity. Originally published on Towards Data Science by Eivind Kjosbakken.
-
Data Science in 2026: Is It Still Worth It?
An honest view from a 10-year AI Engineer. Originally published on Towards Data Science by Sabrine Bendimerad.
-
The Product Health Score: How I Reduced Critical Incidents by 35% with Unified Monitoring and n8n Automation
How product, growth and engineering teams can converge on a single signal for better incident management. Originally published on…
-
Neural Networks Are Blurry, Symbolic Systems Are Fragmented. Sparse Autoencoders Help Us Combine Them.
Neural and symbolic models compress the world in fundamentally different ways, and Sparse Autoencoders (SAEs) offer a bridge to connect them. Originally published on Towards…
-
Water Cooler Small Talk, Ep. 10: So, What About the AI Bubble?
Have we all been tricked into believing in an impossible, extremely expensive future? Originally published on Towards Data Science by Maria Mouschoutzi.
-
Everyday Decisions are Noisier Than You Think — Here’s How AI Can Help Fix That
From insurance premiums to courtrooms: the impact of noise. Originally published on Towards Data Science by Sean Moran.
-
Implementing the Rock Paper Scissors Game in Python
A beginner-friendly Python tutorial using conditionals and the random module. Originally published on Towards Data Science by Mahnoor Javed.
-
When Features Beat Noise: A Feature Selection Technique Through Noise-Based Hypothesis Testing
arXiv:2511.20851v1 Announce Type: new Abstract: Feature selection has remained a daunting challenge in machine learning and artificial intelligence, where increasingly complex, high-dimensional datasets demand principled strategies for isolating the most informative predictors. Despite widespread adoption, many established techniques suffer from notable limitations; some…
-
Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets
arXiv:2511.20888v1 Announce Type: new Abstract: This paper argues that DNNs implement a computational Occam’s razor (finding the ‘simplest’ algorithm that fits the data) and that this could explain their incredible and wide-ranging success over more traditional statistical methods. We start…
-
Geometric Calibration and Neutral Zones for Uncertainty-Aware Multi-Class Classification
arXiv:2511.20960v1 Announce Type: new Abstract: Modern artificial intelligence systems make critical decisions yet often fail silently when uncertain. We develop a geometric framework for post-hoc calibration of neural network probability outputs, treating probability vectors as points on the $(c-1)$-dimensional probability simplex equipped with the Fisher–Rao metric.…
-
Nonconvex Penalized LAD Estimation in Partial Linear Models with DNNs: Asymptotic Analysis and Proximal Algorithms
arXiv:2511.21115v1 Announce Type: new Abstract: This paper investigates the partial linear model by Least Absolute Deviation (LAD) regression. We parameterize the nonparametric term using Deep Neural Networks (DNNs) and formulate a penalized LAD problem for estimation. Specifically, our model exhibits…
-
Maxitive Donsker-Varadhan Formulation for Possibilistic Variational Inference
arXiv:2511.21223v1 Announce Type: new Abstract: Variational inference (VI) is a cornerstone of modern Bayesian learning, enabling approximate inference in complex models that would otherwise be intractable. However, its formulation depends on expectations and divergences defined through high-dimensional integrals, often rendering analytical treatment impossible and necessitating heavy reliance on…
-
I Cleaned a Messy CSV File Using Pandas. Here’s the Exact Process I Follow Every Time.
Stop guessing at data cleaning. Use this repeatable 5-step Python workflow to diagnose and fix the most common data flaws. Originally published on Towards…
-
RISAT’s Silent Promise: Decoding Disasters with Synthetic Aperture Radar
The high-resolution physics turning microwave echoes into real-time flood intelligence. Originally published on Towards Data Science by Aakash Goswami.
-
How I Use AI to Convince Companies to Adopt Sustainability
Discover how Claude can act as a Supply Chain Sustainability Analyst and guide companies toward greener, more efficient inventory management. Originally published on Towards Data Science by Samir Saci.
-
FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection
arXiv:2511.19476v1 Announce Type: new Abstract: Coreset selection compresses large datasets into compact, representative subsets, reducing the energy and computational burden of training deep neural networks. Existing methods are either: (i) DNN-based, which are tied to model-specific parameters and introduce architectural bias; or (ii) DNN-free, which rely on…
-
Optimization and Regularization Under Arbitrary Objectives
arXiv:2511.19628v1 Announce Type: new Abstract: This study investigates the limitations of applying Markov Chain Monte Carlo (MCMC) methods to arbitrary objective functions, focusing on a two-block MCMC framework which alternates between Metropolis-Hastings and Gibbs sampling. While such approaches are often considered advantageous for enabling data-driven regularization, we show that…
-
Clustering Approaches for Mixed-Type Data: A Comparative Study
arXiv:2511.19755v1 Announce Type: new Abstract: Clustering is widely used in unsupervised learning to find homogeneous groups of observations within a dataset. However, clustering mixed-type data remains a challenge, as few existing approaches are suited for this task. This study presents the state-of-the-art of these approaches and compares…
-
A Fully Probabilistic Tensor Network for Regularized Volterra System Identification
arXiv:2511.20457v1 Announce Type: new Abstract: Modeling nonlinear systems with Volterra series is challenging because the number of kernel coefficients grows exponentially with the model order. This work introduces Bayesian Tensor Network Volterra kernel machines (BTN-V), extending the Bayesian Tensor Network framework to Volterra system identification.…
-
Generative Modeling with Manifold Percolation
arXiv:2511.20503v1 Announce Type: new Abstract: Generative modeling is typically framed as learning mapping rules, but from an observer’s perspective without access to these rules, the task manifests as disentangling the geometric support from the probability distribution. We propose that Continuum Percolation is uniquely suited for this support analysis, as the…
-
Why CrewAI’s Manager-Worker Architecture Fails — and How to Fix It
A real-world analysis of why CrewAI’s hierarchical orchestration misfires, and a practical fix you can implement today. Originally published on Towards Data Science by Partha Sarkar.
-
How to Implement Three Use Cases for the New Calendar-Based Time Intelligence
Starting with the September 2025 Release of Power BI, Microsoft introduced the new Calendar-based Time Intelligence feature. Let’s see what can be done by implementing three use cases. The future looks very interesting with this new feature. Originally published on…
-
Ten Lessons of Building LLM Applications for Engineers
Practical field notes on workflows, structure, and evaluation from two years of building with engineering domain experts. Originally published on Towards Data Science by Shuai Guo.
-
How to Create Professional Articles with LaTeX in Cursor
Learn how to rapidly create professional articles and presentations with LaTeX in Cursor. Originally published on Towards Data Science by Eivind Kjosbakken.
-
Quantum Fourier Transform Based Kernel for Solar Irradiance Forecasting
arXiv:2511.17698v1 Announce Type: new Abstract: This study proposes a Quantum Fourier Transform (QFT)-enhanced quantum kernel for short-term time-series forecasting. Each signal is windowed, amplitude-encoded, transformed by a QFT, then passed through a protective rotation layer to avoid the QFT/QFT adjoint cancellation; the resulting kernel is used…
-
Prequential posteriors
arXiv:2511.17721v1 Announce Type: new Abstract: Data assimilation is a fundamental task in updating forecasting models upon observing new data, with applications ranging from weather prediction to online reinforcement learning. Deep generative forecasting models (DGFMs) have shown excellent performance in these areas, but assimilating data into such models is challenging due to their intractable…
-
Variational Estimators for Node Popularity Models
arXiv:2511.17783v1 Announce Type: new Abstract: Node popularity is recognized as a key factor in modeling real-world networks, capturing heterogeneity in connectivity across communities. This concept is equally important in bipartite networks, where nodes in different partitions may exhibit varying popularity patterns, motivating models such as the Two-Way Node Popularity…
-
An operator splitting analysis of Wasserstein–Fisher–Rao gradient flows
arXiv:2511.18060v1 Announce Type: new Abstract: Wasserstein-Fisher-Rao (WFR) gradient flows have been recently proposed as a powerful sampling tool that combines the advantages of pure Wasserstein (W) and pure Fisher-Rao (FR) gradient flows. Existing algorithmic developments implicitly make use of operator splitting techniques to numerically approximate the WFR…
-
Conformal Prediction for Compositional Data
arXiv:2511.18141v1 Announce Type: new Abstract: In this work, we propose a set of conformal prediction procedures tailored to compositional responses, where outcomes are proportions that must be positive and sum to one. Building on Dirichlet regression, we introduce a split conformal approach based on quantile residuals and a highest-density region…
-
How to Implement Randomization with the Python Random Module
Let’s generate randomness in our code’s outputs. Originally published on Towards Data Science by Mahnoor Javed.
-
Struggling with Data Science? 5 Common Beginner Mistakes
Avoid these mistakes to fast track your data science career. Originally published on Towards Data Science by Egor Howell.
-
A Hands-On Guide to Anthropic’s New Structured Output Capabilities
A developer’s guide to perfect JSON and typed outputs from Claude Sonnet 4.5 and Opus 4.1. Originally published on Towards Data Science by Thomas Reid.
-
LLM-as-a-Judge: What It Is, Why It Works, and How to Use It to Evaluate AI Models
A step-by-step guide to building AI quality control using large language models. Originally published on Towards Data Science by Piero Paialunga.
-
BITS for GAPS: Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates
arXiv:2511.16815v1 Announce Type: new Abstract: We introduce the Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates (BITS for GAPS) framework to emulate latent components in hybrid physical systems. BITS for GAPS supports serial hybrid modeling, where known physics governs part of the system and…
-
Efficient Penalty-Based Bilevel Methods: Improved Analysis, Novel Updates, and Flatness Condition
arXiv:2511.16796v1 Announce Type: cross Abstract: Penalty-based methods have become popular for solving bilevel optimization (BLO) problems, thanks to their effective first-order nature. However, they often require inner-loop iterations to solve the lower-level (LL) problem and small outer-loop step sizes to handle the increased smoothness…
-
Diffusion-Inversion-Net (DIN): An End-to-End Direct Probabilistic Framework for Characterizing Hydraulic Conductivities and Quantifying Uncertainty
arXiv:2511.16926v1 Announce Type: cross Abstract: We propose the Diffusion-Inversion-Net (DIN) framework for inverse modeling of groundwater flow and solute transport processes. DIN utilizes an offline-trained Denoising Diffusion Probabilistic Model (DDPM) as a powerful prior learner, which flexibly incorporates sparse, multi-source observational…
-
Gradient flow for deep equilibrium single-index models
arXiv:2511.16976v1 Announce Type: cross Abstract: Deep equilibrium models (DEQs) have recently emerged as a powerful paradigm for training infinitely deep weight-tied neural networks that achieve state of the art performance across many modern machine learning tasks. Despite their practical success, theoretically understanding the gradient descent dynamics for training…
-
DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing
arXiv:2511.17038v1 Announce Type: cross Abstract: From a Bayesian perspective, score-based diffusion solves inverse problems through joint inference, embedding the likelihood with the prior to guide the sampling process. However, this formulation fails to explain its practical behavior: the prior offers limited guidance, while reconstruction is largely…
-
Weekly Entering & Transitioning – Thread 24 Nov, 2025 – 01 Dec, 2025
Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: Learning resources (e.g. books, tutorials, videos) Traditional education (e.g. schools, degrees, electives) Alternative education (e.g.…
-
Are LeetCode heavy Interviews becoming the norm for DS Modeling roles?
I’ve been actively searching for DS Modeling roles again, and wow the landscape has changed a lot since the last time I was on the market. It seems like leetcode style interviews have become way more common. I’ve already failed or barely passed several…
-
How long should I stay at my first job if I dislike the city?
I recently just got my bachelors from Berkeley in data science, and I recently started a new job in Boston. I’m super grateful for this job opportunity because I applied to probably 1k jobs and this was the only good offer…
-
Indeed’s Job Report Shows 13% YoY Drop in Data & Analytics Roles
“Roles like business analyst, data analyst, data scientist, and BI developer are drawing large talent pools that outpace the number of job postings, creating a fiercely competitive market.” do you agree with these findings – are data & analytics roles the hardest-hit in…
-
Will there be a discount for Physical O’Reilly Media books?
Hello. Not sure if this is the best place to post this question so let me know. Does anyone know if there will be some Black Friday discount for Physical O’Reilly Media books somewhere? I…
-
Learning Triton One Kernel at a Time: Softmax
All you need to know about a fast, readable and PyTorch-ready softmax kernel. Originally published on Towards Data Science by Ryan Pégoud.
-
Your Next ‘Large’ Language Model Might Not Be Large After All
A 27M-parameter model just outperformed giants like DeepSeek R1, o3-mini, and Claude 3.7 on reasoning tasks. Originally published on Towards Data Science by Moulik Gupta.
-
Empirical Mode Decomposition: The Most Intuitive Way to Decompose Complex Signals and Time Series
A step-by-step breakdown of empirical mode decomposition to help you extract patterns from time series. Originally published on Towards Data Science by Sabrine Bendimerad.
-
Overfitting vs. Underfitting: Making Sense of the Bias-Variance Trade-Off
The best models live in the sweet spot: generalizing well, learning enough, but not too much. Originally published on Towards Data Science by Frida Karvouni.
-
Modern DataFrames in Python: A Hands-On Tutorial with Polars and DuckDB
How I learned to handle growing datasets without slowing down my entire workflow. Originally published on Towards Data Science by Benjamin Nweke.
-
How To Build a Graph-Based Recommendation Engine Using EDG and Neo4j
Use a shared taxonomy to connect RDF and property graphs, and power smarter recommendations with inferencing. Originally published on Towards Data Science by Steve Hedden.
-
Natural Language Visualization and the Future of Data Analysis and Presentation
Will conversational interaction replace SQL queries, KPI reports, and dashboards? Originally published on Towards Data Science by Michal Szudejko.
-
Generative AI Will Redesign Cars, But Not the Way Automakers Think
Traditional manufacturers are using revolutionary technology for incremental optimization instead of fundamental re-imagination. Originally published on Towards Data Science by Nishant Arora.
-
TDS Newsletter: How to Build Robust Data and AI Systems
Many practitioners like to jump headfirst into the nitty-gritty details of implementing AI-powered tools. We get it: tinkering your way into a solution can sometimes save you time, and it’s often a fun way to go about learning. As the articles we’re highlighting this week show,…
-
Atlas Gaussian processes on restricted domains and point clouds
arXiv:2511.15822v1 Announce Type: new Abstract: In real-world applications, data often reside in restricted domains with unknown boundaries, or as high-dimensional point clouds lying on a lower-dimensional, nontrivial, unknown manifold. Traditional Gaussian Processes (GPs) struggle to capture the underlying geometry in such settings. Some existing methods assume…
-
Angular Graph Fractional Fourier Transform: Theory and Application
arXiv:2511.16111v1 Announce Type: new Abstract: Graph spectral representations are fundamental in graph signal processing, offering a rigorous framework for analyzing and processing graph-structured data. The graph fractional Fourier transform (GFRFT) extends the classical graph Fourier transform (GFT) with a fractional-order parameter, enabling flexible spectral analysis while preserving…
-
Time dependent loss reweighting for flow matching and diffusion models is theoretically justified
arXiv:2511.16599v1 Announce Type: new Abstract: This brief note clarifies that, in Generator Matching (which subsumes a large family of flow matching and diffusion models over continuous, manifold, and discrete spaces), both the Bregman divergence loss and the linear parameterization of the generator…
-
Spectral Identifiability for Interpretable Probe Geometry
arXiv:2511.16288v1 Announce Type: new Abstract: Linear probes are widely used to interpret and evaluate neural representations, yet their reliability remains unclear, as probes may appear accurate in some regimes but collapse unpredictably in others. We uncover a spectral mechanism behind this phenomenon and formalize it as the Spectral Identifiability…
-
Rate-optimal community detection near the KS threshold via node-robust algorithms
arXiv:2511.16613v1 Announce Type: new Abstract: We study community detection in the \emph{symmetric $k$-stochastic block model}, where $n$ nodes are evenly partitioned into $k$ clusters with intra- and inter-cluster connection probabilities $p$ and $q$, respectively. Our main result is a polynomial-time algorithm that achieves the minimax-optimal…
-
How to Use Gemini 3 Pro Efficiently
Learn the pros and cons of Gemini 3 Pro, from testing with both coding and console usage. Originally published on Towards Data Science by Eivind Kjosbakken.
-
Data Visualization Explained (Part 5): Visualizing Time-Series Data in Python (Matplotlib, Plotly, and Altair)
An explanation of time-series visualization, including in-depth code examples in Matplotlib, Plotly, and Altair. Originally published on Towards Data Science by Murtaza Ali.
-
How Relevance Models Foreshadowed Transformers for NLP
Tracing the history of LLM attention: standing on the shoulders of giants. Originally published on Towards Data Science by Sean Moran.
-
Why I’m Making the Switch to marimo Notebooks
A fresh way to think about computational notebooks. Originally published on Towards Data Science by Parul Pandey.
-
Convex Clustering Redefined: Robust Learning with the Median of Means Estimator
arXiv:2511.14784v1 Announce Type: new Abstract: Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k-means and its wide family of variants are still widely used, all of them require the…
-
Implicit Bias of the JKO Scheme
arXiv:2511.14827v1 Announce Type: new Abstract: Wasserstein gradient flow provides a general framework for minimizing an energy functional $J$ over the space of probability measures on a Riemannian manifold $(M,g)$. Its canonical time-discretization, the Jordan-Kinderlehrer-Otto (JKO) scheme, produces for any step size $\eta>0$ a sequence of probability distributions $\rho_k^\eta$ that…
-
Latent space analysis and generalization to out-of-distribution data
arXiv:2511.15010v1 Announce Type: new Abstract: Understanding the relationships between data points in the latent decision space derived by the deep learning system is critical to evaluating and interpreting the performance of the system on real world data. Detecting \textit{out-of-distribution} (OOD) data for deep learning systems continues to…
-
Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit
arXiv:2511.15120v1 Announce Type: new Abstract: In deep learning, a central issue is to understand how neural networks efficiently learn high-dimensional features. To this end, we explore the gradient descent learning of a general Gaussian Multi-index model $f(\boldsymbol{x})=g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U}\in \mathbb{R}^{r\times d}$, which is the…
-
Beyond Uncertainty Sets: Leveraging Optimal Transport to Extend Conformal Predictive Distribution to Multivariate Settings
arXiv:2511.15146v1 Announce Type: new Abstract: Conformal prediction (CP) constructs uncertainty sets for model outputs with finite-sample coverage guarantees. A candidate output is included in the prediction set if its non-conformity score is not considered extreme relative to the scores observed on…
-
How to Perform Agentic Information Retrieval
Learn how to utilize AI agents to find information in your document corpus. Originally published on Towards Data Science by Eivind Kjosbakken.
-
PyTorch Tutorial for Beginners: Build a Multiple Regression Model from Scratch
Hands-on PyTorch: Building a 3-layer neural network for multiple regression. Originally published on Towards Data Science by Gustavo Santos.
-
Making Smarter Bets: Towards a Winning AI Strategy with Probabilistic Thinking
Practical guidance on identifying opportunities, managing product portfolios, and overcoming behavioral biases. Originally published on Towards Data Science by Chinmay Kakatkar.
-
Uncertainty-Calibrated Prediction of Randomly-Timed Biomarker Trajectories with Conformal Bands
arXiv:2511.13911v1 Announce Type: new Abstract: Despite recent progress in predicting biomarker trajectories from real clinical data, uncertainty in the predictions poses high-stakes risks (e.g., misdiagnosis) that limit their clinical deployment. To enable safe and reliable use of such predictions in healthcare, we introduce a conformal method…
-
Knowledge vs. Experience: Asymptotic Limits of Impatience in Edge Tenants
arXiv:2511.13763v1 Announce Type: new Abstract: We study how two information feeds, a closed-form Markov estimator of residual sojourn and an online trained actor-critic, affect reneging and jockeying in a dual M/M/1 system. Analytically, for unequal service rates and total-time patience, we show that total wait…
-
Empirical Likelihood for Random Forests and Ensembles
arXiv:2511.13934v1 Announce Type: new Abstract: We develop an empirical likelihood (EL) framework for random forests and related ensemble methods, providing a likelihood-based approach to quantify their statistical uncertainty. Exploiting the incomplete $U$-statistic structure inherent in ensemble predictions, we construct an EL statistic that is asymptotically chi-squared when subsampling…
-
Splat Regression Models
arXiv:2511.14042v1 Announce Type: new Abstract: We introduce a highly expressive class of function approximators called Splat Regression Models. Model outputs are mixtures of heterogeneous and anisotropic bump functions, termed splats, each weighted by an output vector. The power of splat modeling lies in its ability to locally adjust the scale and direction…
-
SCOPE: Spectral Concentration by Distributionally Robust Joint Covariance-Precision Estimation
arXiv:2511.14146v1 Announce Type: new Abstract: We propose a distributionally robust formulation for simultaneously estimating the covariance matrix and the precision matrix of a random vector. The proposed model minimizes the worst-case weighted sum of the Frobenius loss of the covariance estimator and Stein’s loss of the precision…
-
How to Build an Over-Engineered Retrieval System
Which is actually how some people do it. The post appeared first on Towards Data Science. Ida Silfverskiöld
-
Why LLMs Aren’t a One-Size-Fits-All Solution for Enterprises
LLMs are a seamless way to find value in your unstructured data, but the truth is, there is so much more value hidden within your structured data. This post explores what LLMs are (and aren’t) optimized for and how the industry is approaching AI over structured business…
-
Introducing Google’s File Search Tool
The search giant fires its latest salvo against traditional RAG processing. The post appeared first on Towards Data Science. Thomas Reid
-
Generalized Inequality-based Approach for Probabilistic WCET Estimation
arXiv:2511.11682v1 Announce Type: new Abstract: Estimating the probabilistic Worst-Case Execution Time (pWCET) is essential for ensuring the timing correctness of real-time applications, such as in robot IoT systems and autonomous driving systems. While methods based on Extreme Value Theory (EVT) can provide tight bounds, they suffer from model…
-
FreDN: Spectral Disentanglement for Time Series Forecasting via Learnable Frequency Decomposition
arXiv:2511.11817v1 Announce Type: new Abstract: Time series forecasting is essential in a wide range of real world applications. Recently, frequency-domain methods have attracted increasing interest for their ability to capture global dependencies. However, when applied to non-stationary time series, these methods encounter the $\textit{spectral…
-
PCA recovery thresholds in low-rank matrix inference with sparse noise
arXiv:2511.11927v1 Announce Type: new Abstract: We study the high-dimensional inference of a rank-one signal corrupted by sparse noise. The noise is modelled as the adjacency matrix of a weighted undirected graph with finite average connectivity in the large size limit. Using the replica method from…
-
Bayesian–AI Fusion for Epidemiological Decision Making: Calibrated Risk, Honest Uncertainty, and Hyperparameter Intelligence
arXiv:2511.11983v1 Announce Type: new Abstract: Modern epidemiological analytics increasingly use machine learning models that offer strong prediction but often lack calibrated uncertainty. Bayesian methods provide principled uncertainty quantification, yet are viewed as difficult to integrate with contemporary AI workflows. This paper proposes…
-
PCA++: How Uniformity Induces Robustness to Background Noise in Contrastive Learning
arXiv:2511.12278v1 Announce Type: new Abstract: High-dimensional data often contain low-dimensional signals obscured by structured background noise, which limits the effectiveness of standard PCA. Motivated by contrastive learning, we address the problem of recovering shared signal subspaces from positive pairs, paired observations sharing the same…
-
Understanding Convolutional Neural Networks (CNNs) Through Excel
Deep learning is often seen as a black box. We know that it learns from data, but we rarely stop to ask how it truly learns. What if we could open that box and watch each step happen right before our eyes? With Excel, we can do exactly…
-
Javascript Fatigue: HTMX Is All You Need to Build ChatGPT — Part 2
In part 1, we showed how we could leverage HTMX to add interactivity to our HTML elements. In other words, Javascript without Javascript. To illustrate that, we began building a simple chat that would return a simulated LLM response. In this article,…
-
Introducing ShaTS: A Shapley-Based Method for Time-Series Models
Why you should not explain your time-series data with tabular Shapley methods. The post appeared first on Towards Data Science. Manuel Franco de la Peña
-
The Absolute Beginner’s Guide to Pandas DataFrames
Learn how to initialize dataframes from dictionaries, lists, and NumPy arrays. The post appeared first on Towards Data Science. Ibrahim Salami
-
Javascript Fatigue: HTMX is all you need to build ChatGPT — Part 1
Building a chatbot (almost) without Javascript, only with Python and HTML. The post appeared first on Towards Data Science. Benjamin Etienne
-
Neural Local Wasserstein Regression
arXiv:2511.10824v1 Announce Type: new Abstract: We study the estimation problem of distribution-on-distribution regression, where both predictors and responses are probability measures. Existing approaches typically rely on a global optimal transport map or tangent-space linearization, which can be restrictive in approximation capacity and distort geometry in multivariate underlying domains. In this paper,…
-
Heterogeneous Multisource Transfer Learning via Model Averaging for Positive-Unlabeled Data
arXiv:2511.10919v1 Announce Type: new Abstract: Positive-Unlabeled (PU) learning presents unique challenges due to the lack of explicitly labeled negative samples, particularly in high-stakes domains such as fraud detection and medical diagnosis. To address data scarcity and privacy constraints, we propose a novel transfer learning with…
-
Drift Estimation for Diffusion Processes Using Neural Networks Based on Discretely Observed Independent Paths
arXiv:2511.11161v1 Announce Type: new Abstract: This paper addresses the nonparametric estimation of the drift function over a compact domain for a time-homogeneous diffusion process, based on high-frequency discrete observations from $N$ independent trajectories. We propose a neural network-based estimator and derive…
-
Decomposing Direct and Indirect Biases in Linear Models under Demographic Parity Constraint
arXiv:2511.11294v1 Announce Type: new Abstract: Linear models are widely used in high-stakes decision-making due to their simplicity and interpretability. Yet when fairness constraints such as demographic parity are introduced, their effects on model coefficients, and thus on how predictive bias is distributed across…
-
Bayesian Evaluation of Large Language Model Behavior
arXiv:2511.10661v1 Announce Type: cross Abstract: It is increasingly important to evaluate how text generation systems based on large language models (LLMs) behave, such as their tendency to produce harmful output or their sensitivity to adversarial inputs. Such evaluations often rely on a curated benchmark set of input prompts…
-
Weekly Entering & Transitioning – Thread 17 Nov, 2025 – 24 Nov, 2025
Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: learning resources (e.g. books, tutorials, videos); traditional education (e.g. schools, degrees, electives); alternative education (e.g.…