Category: aimlds

  • Robust variational neural posterior estimation for simulation-based inference

    arXiv:2509.05724v1 Announce Type: new Abstract: Recent advances in neural density estimation have enabled powerful simulation-based inference (SBI) methods that can flexibly approximate Bayesian inference for intractable stochastic models. Although these methods have demonstrated reliable posterior estimation when the simulator accurately represents the underlying data generative process (GDP),…

  • Risk-averse Fair Multi-class Classification

    arXiv:2509.05771v1 Announce Type: new Abstract: We develop a new classification framework based on the theory of coherent risk measures and systemic risk. The proposed approach is suitable for multi-class problems when the data is noisy, scarce (relative to the dimension of the problem), and the labeling might be unreliable. In the…

  • Fisher Random Walk: Automatic Debiasing Contextual Preference Inference for Large Language Model Evaluation

    arXiv:2509.05852v1 Announce Type: new Abstract: Motivated by the need for rigorous and scalable evaluation of large language models, we study contextual preference inference for pairwise comparison functionals of context-dependent preference score functions across domains. Focusing on the contextual Bradley-Terry-Luce model, we develop…

  • Causal Clustering for Conditional Average Treatment Effects Estimation and Subgroup Discovery

    arXiv:2509.05775v1 Announce Type: new Abstract: Estimating heterogeneous treatment effects is critical in domains such as personalized medicine, resource allocation, and policy evaluation. A central challenge lies in identifying subpopulations that respond differently to interventions, thereby enabling more targeted and effective decision-making. While clustering methods…

  • Implementing the Gaussian Challenge in Python

    Beginner-friendly tutorial to understand the range function and Python loops. (Towards Data Science, Mahnoor Javed)

  • Agentic AI and the Future of Python Project Management Tooling

    Introducing a pyramid framework of evolution, accelerating and decelerating factors, and strategic recommendations for incumbents and new entrants. (Towards Data Science, Chinmay Kakatkar)

  • From Tokens to Theorems: Building a Neuro-Symbolic AI Mathematician

    The next Gauss may not be born; they may be spun up in the cloud. (Towards Data Science, Sean Moran)

  • The End-to-End Data Scientist’s Prompt Playbook

    Part 3: Prompts for docs, DevOps, and stakeholder communication. (Towards Data Science, Sara Nobrega)

  • Implementing the Coffee Machine in Python

    A beginner-friendly step-by-step guide to coding a Coffee Maker in Python. (Towards Data Science, Mahnoor Javed)

  • Any-Step Density Ratio Estimation via Interval-Annealed Secant Alignment

    arXiv:2509.04852v1 Announce Type: new Abstract: Estimating density ratios is a fundamental problem in machine learning, but existing methods often trade off accuracy for efficiency. We propose Interval-annealed Secant Alignment Density Ratio Estimation (ISA-DRE), a framework that enables accurate, any-step estimation without numerical integration. Instead of modeling infinitesimal…

  • Optimal Variance and Covariance Estimation under Differential Privacy in the Add-Remove Model and Beyond

    arXiv:2509.04919v1 Announce Type: new Abstract: In this paper, we study the problem of estimating the variance and covariance of datasets under differential privacy in the add-remove model. While estimation in the swap model has been extensively studied in the literature, the…

  • Probabilistic operator learning: generative modeling and uncertainty quantification for foundation models of differential equations

    arXiv:2509.05186v1 Announce Type: new Abstract: In-context operator networks (ICON) are a class of operator learning methods based on the novel architectures of foundation models. Trained on a diverse set of datasets of initial and boundary conditions paired with corresponding solutions to…

  • Spectral Algorithms in Misspecified Regression: Convergence under Covariate Shift

    arXiv:2509.05106v1 Announce Type: new Abstract: This paper investigates the convergence properties of spectral algorithms, a class of regularization methods originating from inverse problems, under covariate shift. In this setting, the marginal distributions of inputs differ between source and target domains, while the conditional distribution…

  • Fundamental bounds on efficiency-confidence trade-off for transductive conformal prediction

    arXiv:2509.04631v1 Announce Type: cross Abstract: Transductive conformal prediction addresses the simultaneous prediction for multiple data points. Given a desired confidence level, the objective is to construct a prediction set that includes the true outcomes with the prescribed confidence. We demonstrate a fundamental trade-off between confidence and…

  • Weekly Entering & Transitioning – Thread 08 Sep, 2025 – 15 Sep, 2025

    Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: learning resources (e.g. books, tutorials, videos); traditional education (e.g. schools, degrees, electives); alternative education (e.g.…

  • 🚀 Perpetual ML Suite: Now Live on the Snowflake Marketplace!

    Submitted by /u/mutlu_simsek

  • Europe Salary Thread 2025 – What’s your role and salary?

    The yearly Europe-centric salary thread. You can find the last one here: https://old.reddit.com/r/datascience/comments/1fxrmzl/europe_salary_thread_2024_whats_your_role_and/ I think it’s worthwhile to learn from one another and see what different flavours of data scientists, analysts and engineers are out there in the wild. In my opinion, this is especially…

  • Help me evaluate a new job offer – Stay or go?

    Hi all, I’m having a really hard time deciding whether or not to take an offer I’ve recently received, and would really appreciate some advice and a sense check. For context, I generally feel my current role is comfortable, but I’m starting to plateau after…

  • How to evaluate data transformations?

    There are several well-established benchmarks for text-to-SQL tasks like BIRD, Spider, and WikiSQL. However, I’m working on a data transformation system that handles per-row transformations with contextual understanding of the input data. The challenge is that most existing benchmarks focus on either: pure SQL generation (BIRD, Spider); simple data cleaning…

  • The Beauty of Space-Filling Curves: Understanding the Hilbert Curve

    A quick journey from theory to implementation and application. (Towards Data Science, Paul Fröhling)

  • Preventing Context Overload: Controlled Neo4j MCP Cypher Responses for LLMs

    How timeouts, truncation, and result sanitization keep Cypher outputs LLM-ready. (Towards Data Science, Tomaz Bratanic)

  • Hands-On with Agents SDK: Safeguarding Input and Output with Guardrails

    A practical exploration of how guardrails safeguard multi-agent systems in Python using OpenAI Agents SDK, Streamlit, and Pydantic. (Towards Data Science, Iqbal Rahmadhan)

  • Extracting Structured Data with LangExtract: A Deep Dive into LLM-Orchestrated Workflows

    A guide to building modular workflows for structured intelligence. (Towards Data Science, Subha Ganapathi)

  • How to Context Engineer to Optimize Question Answering Pipelines

    Learn how to apply context engineering to enhance your question answering systems. (Towards Data Science, Eivind Kjosbakken)

  • Showcasing Your Work on HuggingFace Spaces

    Building an app is exciting – but sharing it is where the real value kicks in. Back when Heroku offered a free tier, deploying demos was effortless. Those days are gone, and finding a simple, free way to showcase machine learning apps has become harder. That’s where Hugging Face…

  • AI Operations Under the Hood: Challenges and Best Practices

    Building robust, reproducible, and reliable GenAI applications requires a framework of continuous improvement, rigorous evaluation, and systematic validation. (Towards Data Science, Erika G. Gonçalves)

  • Zero-Inflated Data: A Comparison of Regression Models

    How to detect it and which model to choose. (Towards Data Science, Arnaud Capitaine)

  • Tool Masking: The Layer MCP Forgot

    Tool masking for AI improves AI agents: shape MCP tool surfaces to cut tokens and errors, boost speed and reliability. Start prompt engineering your tools. (Towards Data Science, Frank Wittkampf)

  • Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling

    arXiv:2509.03726v1 Announce Type: new Abstract: Sampling from unnormalized target distributions, e.g. Boltzmann distributions $\mu_{\text{target}}(x) \propto \exp(-E(x)/T)$, is fundamental to many scientific applications yet computationally challenging due to complex, high-dimensional energy landscapes. Existing approaches applying modern generative models to Boltzmann distributions either require…
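
    To make the target in this abstract concrete, here is a minimal Python sketch of an unnormalized Boltzmann density exp(-E(x)/T), sampled with a plain random-walk Metropolis chain. This is a classical baseline shown for illustration only, not the energy-weighted flow matching method of the paper. The double-well `energy` function, the step size, the seed, and all function names are assumptions.

```python
import math
import random


def energy(x: float) -> float:
    """Hypothetical 1-D double-well energy with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def unnormalized_density(x: float, temperature: float) -> float:
    """Boltzmann weight exp(-E(x)/T); the normalizing constant is never needed."""
    return math.exp(-energy(x) / temperature)


def metropolis(n_steps: int, temperature: float, step_size: float = 0.5, seed: int = 0):
    """Random-walk Metropolis chain targeting the Boltzmann distribution."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Energy difference scaled by temperature; only this ratio matters,
        # which is why the unnormalized density is enough.
        delta = (energy(proposal) - energy(x)) / temperature
        if delta <= 0.0 or rng.random() < math.exp(-delta):
            x = proposal
        samples.append(x)
    return samples


samples = metropolis(n_steps=2000, temperature=0.5)
```

    Lowering `temperature` concentrates the chain around the two wells, which is the kind of landscape that makes such targets hard to sample in higher dimensions.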

  • Testing for correlation between network structure and high-dimensional node covariates

    arXiv:2509.03772v1 Announce Type: new Abstract: In many application domains, networks are observed with node-level features. In such settings, a common problem is to assess whether or not nodal covariates are correlated with the network structure itself. Here, we present four novel methods for addressing this…

  • Diffusion Generative Models Meet Compressed Sensing, with Applications to Image Data and Financial Time Series

    arXiv:2509.03898v1 Announce Type: new Abstract: This paper develops dimension reduction techniques for accelerating diffusion model inference in the context of synthetic data generation. The idea is to integrate compressed sensing into diffusion models: (i) compress the data into a latent…

  • Batched Stochastic Matching Bandits

    arXiv:2509.04194v1 Announce Type: new Abstract: In this study, we introduce a novel bandit framework for stochastic matching based on the Multi-nomial Logit (MNL) choice model. In our setting, $N$ agents on one side are assigned to $K$ arms on the other side, where each arm stochastically selects an agent from its…

  • An invertible generative model for forward and inverse problems

    arXiv:2509.03910v1 Announce Type: new Abstract: We formulate the inverse problem in a Bayesian framework and aim to train a generative model that allows us to simulate (i.e., sample from the likelihood) and do inference (i.e., sample from the posterior). We review the use of triangular normalizing…

  • Should We Use LLMs As If They Were Swiss Knives?

    A logic game performance comparison between popular LLMs and a custom-made algorithm. (Towards Data Science, Nicolas Garcia Aramouni)

  • A Visual Guide to Tuning Random Forest Hyperparameters

    How hyperparameter tuning visually changes random forests. (Towards Data Science, James Gibbins)

  • MobileNetV1 Paper Walkthrough: The Tiny Giant

    Understanding and implementing MobileNetV1 from scratch with PyTorch. (Towards Data Science, Muhammad Ardi)

  • Using LangGraph and MCP Servers to Create My Own Voice Assistant

    Built over 14 days, all locally run, no API keys, cloud services, or subscription fees. (Towards Data Science, Benjamin Lee)

  • Boosting Your Anomaly Detection With LLMs

    The 7 emerging application patterns you should know. (Towards Data Science, Shuai Guo)

  • Fast kernel methods: Sobolev, physics-informed, and additive models

    arXiv:2509.02649v1 Announce Type: new Abstract: Kernel methods are powerful tools in statistical learning, but their cubic complexity in the sample size n limits their use on large-scale datasets. In this work, we introduce a scalable framework for kernel regression with O(n log n) complexity, fully leveraging GPU…

  • Gaussian process surrogate with physical law-corrected prior for multi-coupled PDEs defined on irregular geometry

    arXiv:2509.02617v1 Announce Type: new Abstract: Parametric partial differential equations (PDEs) are fundamental mathematical tools for modeling complex physical systems, yet their numerical evaluation across parameter spaces remains computationally intensive when using conventional high-fidelity solvers. To address this challenge, we propose a…

  • Scale-Adaptive Generative Flows for Multiscale Scientific Data

    arXiv:2509.02971v1 Announce Type: new Abstract: Flow-based generative models can face significant challenges when modeling scientific data with multiscale Fourier spectra, often producing large errors in fine-scale features. We address this problem within the framework of stochastic interpolants, via principled design of noise distributions and interpolation schedules. The key…

  • Bayesian Additive Regression Trees for functional ANOVA model

    arXiv:2509.03317v1 Announce Type: new Abstract: Bayesian Additive Regression Trees (BART) is a powerful statistical model that leverages the strengths of Bayesian inference and regression trees. It has received significant attention for capturing complex non-linear relationships and interactions among predictors. However, the accuracy of BART often comes at…

  • Understanding and Improving the Shampoo Optimizer via Kullback-Leibler Minimization

    arXiv:2509.03378v1 Announce Type: new Abstract: As an adaptive method, Shampoo employs a structured second-moment estimation, and its effectiveness has attracted growing attention. Prior work has primarily analyzed its estimation scheme through the Frobenius norm. Motivated by the natural connection between the second moment and a covariance…

  • Useful Python Libraries You Might Not Have Heard Of: Freezegun

    Bring time to a standstill in your Python tests. (Towards Data Science, Thomas Reid)

  • AI FOMO, Shadow AI, and Other Business Problems

    What’s the state of AI in business these days, and how much does it cost us? (Towards Data Science, Stephanie Kirmer)

  • Hands On Time Series Modeling of Rare Events, with Python

    How to model rare event occurrences in a time series in a few lines of code. (Towards Data Science, Piero Paialunga)

  • Stochastic Differential Equations and Temperature — NASA Climate Data pt. 2

    The Ornstein-Uhlenbeck process in Python. (Towards Data Science, Marco Hening Tallarico)

  • What Being a Data Scientist at a Startup Really Looks Like

    What I learned about growth, visibility, and chaos over the past five years. (Towards Data Science, Yu Dong)

  • Simulation-based inference of yeast centromeres

    arXiv:2509.00200v1 Announce Type: new Abstract: Chromatin folding and the spatial arrangement of chromosomes in the cell play a crucial role in DNA replication and gene expression. Improper chromatin folding could lead to malfunctions and, over time, diseases. For eukaryotes, centromeres are essential for proper chromosome segregation and folding.…

  • Probit Monotone BART

    arXiv:2509.00263v1 Announce Type: new Abstract: Bayesian Additive Regression Trees (BART) of Chipman et al. (2010) has proven to be a powerful tool for nonparametric modeling and prediction. Monotone BART (Chipman et al., 2022) is a recent development that allows BART to be more precise in estimating monotonic functions. We further these developments…

  • Assessing One-Dimensional Cluster Stability by Extreme-Point Trimming

    arXiv:2509.00258v1 Announce Type: new Abstract: We develop a probabilistic method for assessing the tail behavior and geometric stability of n one-dimensional i.i.d. samples by tracking how their span contracts when the most extreme points are trimmed. Central to our approach is the diameter-shrinkage ratio, which quantifies the relative…

  • The Nondecreasing Rank

    arXiv:2509.00265v1 Announce Type: new Abstract: In this article the notion of the nondecreasing (ND) rank of a matrix or tensor is introduced. A tensor has an ND rank of r if it can be represented as a sum of r outer products of vectors, with each vector satisfying a monotonicity constraint. It…
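
    As a small illustration of the definition in this abstract, the Python sketch below builds a matrix as a sum of two outer products of vectors. It assumes the monotonicity constraint means entrywise nondecreasing vectors, which the truncated abstract does not confirm. The helper names and the example vectors are hypothetical.

```python
def outer(u, v):
    """Outer product of two vectors as a nested list (len(u) rows, len(v) columns)."""
    return [[ui * vj for vj in v] for ui in u]


def is_nondecreasing(vec):
    """Assumed form of the monotonicity constraint: each entry >= the previous one."""
    return all(a <= b for a, b in zip(vec, vec[1:]))


def sum_of_outer_products(pairs):
    """Add up the outer products u v^T for every (u, v) pair."""
    rows, cols = len(pairs[0][0]), len(pairs[0][1])
    total = [[0] * cols for _ in range(rows)]
    for u, v in pairs:
        term = outer(u, v)
        for i in range(rows):
            for j in range(cols):
                total[i][j] += term[i][j]
    return total


# Two rank-one terms whose factor vectors are all nondecreasing.
pairs = [([0, 1, 1], [1, 2]), ([1, 1, 2], [0, 3])]
all_monotone = all(is_nondecreasing(u) and is_nondecreasing(v) for u, v in pairs)
matrix = sum_of_outer_products(pairs)
```

    Under that assumption, `matrix` is a sum of two admissible terms, so its ND rank is at most 2. Whether fewer terms would suffice is exactly the kind of question the rank notion addresses.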

  • Partial Functional Dynamic Backdoor Diffusion-based Causal Model

    arXiv:2509.00472v1 Announce Type: new Abstract: We introduce a Partial Functional Dynamic Backdoor Diffusion-based Causal Model (PFD-BDCM), specifically designed for causal inference in the presence of unmeasured confounders with spatial heterogeneity and temporal dependency. The proposed PFD-BDCM framework addresses the restrictions of the existing approaches by uniquely integrating models…

  • A Deep Dive into RabbitMQ & Python’s Celery: How to Optimise Your Queues

    Key lessons I’ve learned running RabbitMQ + Celery in production. (Towards Data Science, Clara Chong)

  • Implementing the Caesar Cipher in Python

    Julius Caesar was a Roman ruler known for his military strategies and excellent leadership. Named after him, the Caesar Cipher is a fascinating cryptographic technique that Julius Caesar employed to send secret signals and messages to his military personnel. The Caesar Cipher is quite basic in its working. It…
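
    In its usual textbook form, the Caesar cipher shifts each letter a fixed number of positions along the alphabet, wrapping around at the end. Below is a minimal Python sketch of that form. It is not taken from the article; the function name `caesar_shift` and the sample message are assumptions.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` positions, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)  # digits, spaces, and punctuation pass through unchanged
    return "".join(result)


encrypted = caesar_shift("Attack at dawn", 3)
decrypted = caesar_shift(encrypted, -3)  # decryption is a shift in the opposite direction
```

    Because decryption is just the negative shift, one function covers both directions. With only 26 possible shifts the cipher is trivially breakable, which is why it serves as a teaching example.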

  • How to Scale Your AI Search to Handle 10M Queries with 5 Powerful Techniques

    Optimize your AI search with RAG, contextual retrieval and evaluations. (Towards Data Science, Eivind Kjosbakken)

  • What is Universality in LLMs? How to Find Universal Neurons

    How independently trained transformers form the same neurons. (Towards Data Science, Shuyang)

  • 3 Greedy Algorithms for Decision Trees, Explained with Examples

    Learn the inner workings of decision trees. (Towards Data Science, Kuriko Iwai)

  • The Generalist: The New All-Around Type of Data Professional?

    Is over-specialization ending and are data generalists on the rise? (Towards Data Science, Loizos Loizou)

  • Quantum-inspired probability metrics define a complete, universal space for statistical learning

    arXiv:2508.21086v1 Announce Type: new Abstract: Comparing probability distributions is a core challenge across the natural, social, and computational sciences. Existing methods, such as Maximum Mean Discrepancy (MMD), struggle in high-dimensional and non-compact domains. Here we introduce quantum probability metrics (QPMs), derived by embedding probability…

  • Weighted Support Points from Random Measures: An Interpretable Alternative for Generative Modeling

    arXiv:2508.21255v1 Announce Type: new Abstract: Support points summarize a large dataset through a smaller set of representative points that can be used for data operations, such as Monte Carlo integration, without requiring access to the full dataset. In this sense, support points offer…

  • Adaptive generative moment matching networks for improved learning of dependence structures

    arXiv:2508.21531v1 Announce Type: new Abstract: An adaptive bandwidth selection procedure for the mixture kernel in the maximum mean discrepancy (MMD) for fitting generative moment matching networks (GMMNs) is introduced, and its ability to improve the learning of copula random number generators is demonstrated. Based…

  • Privacy Auditing Synthetic Data Release through Local Likelihood Attacks

    arXiv:2508.21146v1 Announce Type: cross Abstract: Auditing the privacy leakage of synthetic data is an important but unresolved problem. Most existing privacy auditing frameworks for synthetic data rely on heuristics and unreasonable assumptions to attack the failure modes of generative models, exhibiting limited capability to describe and…

  • BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

    arXiv:2508.21184v1 Announce Type: cross Abstract: We propose a general-purpose approach for improving the ability of Large Language Models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to…

  • Weekly Entering & Transitioning – Thread 01 Sep, 2025 – 08 Sep, 2025

    Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: learning resources (e.g. books, tutorials, videos); traditional education (e.g. schools, degrees, electives); alternative education (e.g.…

  • How do I prepare for my data science job as a new grad?

    I just graduated with my bachelor’s in May. Recently, I’ve been fortunate enough to receive an offer as a data scientist I at a unicorn where most of the people on the DS team have PhDs. My job starts in a month…

  • Let’s Build Something Together

    Hey everyone, after my last post about my struggles in finding a remote job, I was honestly blown away. I got over 50 messages not with job offers, but with stories, frustrations, and suggestions. The common theme? Many of us are stuck. Some are trying to break into the market, others…

  • Advice for DS/AS/MLE interviews

    I am looking for data scientist (ML-heavy), applied scientist, or ML engineer roles in product-based companies. For my interview preparation, I am unsure which books or resources to pick so that I can cover the rigor of ML rounds in these interviews. I have a background in CS and…

  • Career Dilemma

    Submitted by /u/NervousVictory1792

  • How to Develop a Bilingual Voice Assistant

    Exploring ways to make voice assistants more personal. (Towards Data Science, Deepak Krishnamurthy)

  • The Machine Learning Lessons I’ve Learned This Month

    August 2025: logging, lab notebooks, overnight runs. (Towards Data Science, Pascal Janetzky)

  • Understanding Matrices | Part 4: Matrix Inverse

    The physical meaning of matrix inversion, related formulas, and how inversion behaves on several special types of matrices. (Towards Data Science, Tigran Hayrapetyan)

  • Crafting a Custom Voice Assistant with Perplexity

    How to build a fully functional, hands-free voice assistant on a Raspberry Pi. (Towards Data Science, Deepak Krishnamurthy)

  • Marginal Effect of Hyperparameter Tuning with XGBoost

    Demystifying Bayesian hyperparameter optimization and comparing hyperparameter tuning paradigms. (Towards Data Science, Noah Swan)

  • Toward Digital Well-Being: Using Generative AI to Detect and Mitigate Bias in Social Networks

    This research answered the question: How can machine learning and artificial intelligence help us to unlearn bias? (Towards Data Science, Celia Banks…)

  • Unlocking Multimodal Video Transcription with Gemini

    Explore how to transcribe videos with speaker identification in a single prompt. (Towards Data Science, Laurent Picard)

  • How to Import Pre-Annotated Data into Label Studio and Run the Full Stack with Docker

    From VOC to JSON: Importing pre-annotations made simple. (Towards Data Science, Yagmur Gulec)

  • Stochastic Gradients under Nuisances

    arXiv:2508.20326v1 Announce Type: new Abstract: Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose objectives rely on unknown nuisance parameters, and establish non-asymptotic convergence guarantees. Our results show that, while…

  • Towards Trustworthy Amortized Bayesian Model Comparison

    Towards Trustworthy Amortized Bayesian Model Comparison arXiv:2508.20614v1 Announce Type: new Abstract: Amortized Bayesian model comparison (BMC) enables fast probabilistic ranking of models via simulation-based training of neural surrogates. However, the reliability of neural surrogates deteriorates when simulation models are misspecified – the very case where model comparison is most needed. Thus, we supplement simulation-based training…

  • Polynomial Chaos Expansion for Operator Learning

    Polynomial Chaos Expansion for Operator Learning arXiv:2508.20886v1 Announce Type: new Abstract: Operator learning (OL) has emerged as a powerful tool in scientific machine learning (SciML) for approximating mappings between infinite-dimensional functional spaces. One of its main applications is learning the solution operator of partial differential equations (PDEs). While much of the progress in this area…

  • Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation

    Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation arXiv:2508.20942v1 Announce Type: new Abstract: In this paper, we extend the transfer learning classification framework from regression function-based methods to decision rules. We propose a novel methodology for modeling posterior drift through Bayes decision rules. By exploiting the geometric…

  • Discovering equations from data: symbolic regression in dynamical systems

    Discovering equations from data: symbolic regression in dynamical systems arXiv:2508.20257v1 Announce Type: cross Abstract: The process of discovering equations from data lies at the heart of physics and in many other areas of research, including mathematical ecology and epidemiology. Recently, machine learning methods known as symbolic regression have automated this process. As several methods are…

  • Implementing the Hangman Game in Python

    Implementing the Hangman Game in Python A beginner-friendly project to understand variables, loops, and conditions in Python The post Implementing the Hangman Game in Python appeared first on Towards Data Science. Mahnoor Javed

  • Stepwise Selection Made Simple: Improve Your Regression Models in Python

    Stepwise Selection Made Simple: Improve Your Regression Models in Python Dimensionality reduction in linear regression: classical stepwise methods and a Python application on real-world data The post Stepwise Selection Made Simple: Improve Your Regression Models in Python appeared first on Towards Data Science. JUNIOR JUMBONG

  • Graph Coloring for Data Science: A Comprehensive Guide

    Graph Coloring for Data Science: A Comprehensive Guide From theoretical puzzles to practical applications The post Graph Coloring for Data Science: A Comprehensive Guide appeared first on Towards Data Science. Chinmay Kakatkar

  • A Visual Guide to Tuning Decision-Tree Hyperparameters

    A Visual Guide to Tuning Decision-Tree Hyperparameters How hyperparameter tuning visually changes decision trees The post A Visual Guide to Tuning Decision-Tree Hyperparameters appeared first on Towards Data Science. James Gibbins

  • Air for Tomorrow: Why Openness in Air Quality Research and Implementation Matters for Global Equity

    Air for Tomorrow: Why Openness in Air Quality Research and Implementation Matters for Global Equity Understand how open source can help you unravel air quality The post Air for Tomorrow: Why Openness in Air Quality Research and Implementation Matters for Global Equity appeared first on Towards Data Science. Prithviraj Pramanik

  • Fractal Flow: Hierarchical and Interpretable Normalizing Flow via Topic Modeling and Recursive Strategy

    Fractal Flow: Hierarchical and Interpretable Normalizing Flow via Topic Modeling and Recursive Strategy arXiv:2508.19750v1 Announce Type: new Abstract: Normalizing Flows provide a principled framework for high-dimensional density estimation and generative modeling by constructing invertible transformations with tractable Jacobian determinants. We propose Fractal Flow, a novel normalizing flow architecture that enhances both expressiveness and interpretability through…

  • Conditional Normalizing Flow Surrogate for Monte Carlo Prediction of Radiative Properties in Nanoparticle-Embedded Layers

    Conditional Normalizing Flow Surrogate for Monte Carlo Prediction of Radiative Properties in Nanoparticle-Embedded Layers arXiv:2508.19841v1 Announce Type: new Abstract: We present a probabilistic, data-driven surrogate model for predicting the radiative properties of nanoparticle embedded scattering media. The model uses conditional normalizing flows, which learn the conditional distribution of optical outputs, including reflectance, absorbance, and transmittance,…

  • The Information Dynamics of Generative Diffusion

    The Information Dynamics of Generative Diffusion arXiv:2508.19897v1 Announce Type: new Abstract: Generative diffusion models have emerged as a powerful class of models in machine learning, yet a unified theoretical understanding of their operation is still developing. This perspective paper provides an integrated perspective on generative diffusion by connecting their dynamic, information-theoretic, and thermodynamic properties under…

  • Track Component Failure Detection Using Data Analytics over existing STDS Track Circuit data

    Track Component Failure Detection Using Data Analytics over existing STDS Track Circuit data arXiv:2508.11693v1 Announce Type: cross Abstract: Track Circuits (TC) are the main signalling devices used to detect the presence of a train on a rail track. It has been used since the 19th century and nowadays there are many types depending on the…

  • Physics-Informed Regression: Parameter Estimation in Parameter-Linear Nonlinear Dynamic Models

    Physics-Informed Regression: Parameter Estimation in Parameter-Linear Nonlinear Dynamic Models arXiv:2508.19249v1 Announce Type: cross Abstract: We present a new efficient hybrid parameter estimation method based on the idea that if nonlinear dynamic models are stated in terms of a system of equations that is linear in terms of the parameters, then regularized ordinary least squares can…

  • Get AI-Ready: How to Prepare for a World of Agentic AI as Tech Professionals

    Get AI-Ready: How to Prepare for a World of Agentic AI as Tech Professionals Explore how Agentic AI is reshaping tech careers, from data to decision-making, and how professionals can prepare for the future of work The post Get AI-Ready: How to Prepare for a World of Agentic AI as Tech Professionals appeared first…

  • Everything I Studied to Become a Machine Learning Engineer (No CS Background)

    Everything I Studied to Become a Machine Learning Engineer (No CS Background) The books, courses, and resources I used in my journey. The post Everything I Studied to Become a Machine Learning Engineer (No CS Background) appeared first on Towards Data Science. Egor Howell

  • Time Series Forecasting Made Simple (Part 4.1): Understanding Stationarity in a Time Series

    Time Series Forecasting Made Simple (Part 4.1): Understanding Stationarity in a Time Series An intuitive guide to stationarity in a time series The post Time Series Forecasting Made Simple (Part 4.1): Understanding Stationarity in a Time Series appeared first on Towards Data Science. Nikhil Dasari

  • A Brief History of GPT Through Papers

    A Brief History of GPT Through Papers Language models are becoming really good. But where did they come from? The post A Brief History of GPT Through Papers appeared first on Towards Data Science. Rohit Pandey

  • The Math You Need to Pan and Tilt 360° Images

    The Math You Need to Pan and Tilt 360° Images Panning a spherical image is just a horizontal roll, but tilting it vertically is much trickier. Let’s see the math! The post The Math You Need to Pan and Tilt 360° Images appeared first on Towards Data Science. Thomas Rouch

  • Deterministic Coreset Construction via Adaptive Sensitivity Trimming

    Deterministic Coreset Construction via Adaptive Sensitivity Trimming arXiv:2508.18340v1 Announce Type: new Abstract: We develop a rigorous framework for deterministic coreset construction in empirical risk minimization (ERM). Our central contribution is the Adaptive Deterministic Uniform-Weight Trimming (ADUWT) algorithm, which constructs a coreset by excising points with the lowest sensitivity bounds and applying a data-dependent uniform weight…

  • Revisiting Follow-the-Perturbed-Leader with Unbounded Perturbations in Bandit Problems

    Revisiting Follow-the-Perturbed-Leader with Unbounded Perturbations in Bandit Problems arXiv:2508.18604v1 Announce Type: new Abstract: Follow-the-Regularized-Leader (FTRL) policies have achieved Best-of-Both-Worlds (BOBW) results in various settings through hybrid regularizers, whereas analogous results for Follow-the-Perturbed-Leader (FTPL) remain limited due to inherent analytical challenges. To advance the analytical foundations of FTPL, we revisit classical FTRL-FTPL duality for unbounded perturbations…

  • Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits

    Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits arXiv:2508.18768v1 Announce Type: new Abstract: We introduce the first best-of-both-worlds algorithm for contextual combinatorial semi-bandits that simultaneously guarantees $\widetilde{\mathcal{O}}(\sqrt{T})$ regret in the adversarial regime and $\widetilde{\mathcal{O}}(\ln T)$ regret in the corrupted stochastic regime. Our approach builds on the Follow-the-Regularized-Leader (FTRL) framework equipped with a Shannon entropy regularizer, yielding…