Tag: our

How We Are Testing Our Agents in Dev

Testing that your AI agent is performing as expected is not easy. Here are a few strategies we learned the hard way. By Michael Segner; first published on Towards Data Science.

December 7, 2025
Distributionally Robust Online Markov Game with Linear Function Approximation

arXiv:2511.07831v1 Announce Type: new Abstract: The sim-to-real gap, where agents trained in a simulator face significant performance degradation during testing, is a fundamental challenge in reinforcement learning. Extensive work adopts the framework of distributionally robust RL to learn a policy that acts robustly under worst case…

November 12, 2025
Private Learning of Littlestone Classes, Revisited

arXiv:2510.00076v1 Announce Type: new Abstract: We consider online and PAC learning of Littlestone classes subject to the constraint of approximate differential privacy. Our main result is a private learner to online-learn a Littlestone class with a mistake bound of $\tilde{O}(d^{9.5}\cdot \log(T))$ in the realizable case, where $d$ denotes the…

October 2, 2025
Model-free algorithms for fast node clustering in SBM type graphs and application to social role inference in animals

arXiv:2509.15989v1 Announce Type: new Abstract: We propose a novel family of model-free algorithms for node clustering and parameter inference in graphs generated from the Stochastic Block Model (SBM), a fundamental framework in community detection. Drawing inspiration from…

September 22, 2025
If we use AI to do our work – what is our job, then?

Images. Text. Audio. There’s no modality that is not handled by AI. And AI systems reach even further, planning advertisement and marketing campaigns, automating social media postings, … Most of this was unthinkable a mere ten years ago. But then, the…

September 13, 2025
Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation

arXiv:2508.20942v1 Announce Type: new Abstract: In this paper, we extend the transfer learning classification framework from regression function-based methods to decision rules. We propose a novel methodology for modeling posterior drift through Bayes decision rules. By exploiting the geometric…

August 29, 2025
Underdamped Langevin MCMC with third order convergence

arXiv:2508.16485v1 Announce Type: new Abstract: In this paper, we propose a new numerical method for the underdamped Langevin diffusion (ULD) and present a non-asymptotic analysis of its sampling error in the 2-Wasserstein distance when the $d$-dimensional target distribution $p(x)\propto e^{-f(x)}$ is strongly log-concave and has varying degrees of…

August 25, 2025
Distributional Sensitivity Analysis: Enabling Differentiability in Sample-Based Inference

arXiv:2508.09347v1 Announce Type: new Abstract: We present two analytical formulae for estimating the sensitivity (namely, the gradient or Jacobian) at given realizations of an arbitrary-dimensional random vector with respect to its distributional parameters. The first formula interprets this sensitivity as partial derivatives of the inverse…

August 14, 2025
Differentially Private Model-X Knockoffs via Johnson-Lindenstrauss Transform

arXiv:2508.04800v1 Announce Type: new Abstract: We introduce a novel privatization framework for high-dimensional controlled variable selection. Our framework enables rigorous False Discovery Rate (FDR) control under differential privacy constraints. While the Model-X knockoff procedure provides FDR guarantees by constructing provably exchangeable “negative control” features, existing privacy mechanisms like…

August 8, 2025
Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

arXiv:2506.20114v1 Announce Type: new Abstract: Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an…

June 26, 2025
Oh SnapMMD! Forecasting Stochastic Dynamics Beyond the Schrödinger Bridge’s End

arXiv:2505.16082v1 Announce Type: new Abstract: Scientists often want to make predictions beyond the observed time horizon of “snapshot” data following latent stochastic dynamics. For example, in time course single-cell mRNA profiling, scientists have access to cellular transcriptional state measurements (snapshots) from different biological replicates at…

May 23, 2025
Infinite hierarchical contrastive clustering for personal digital envirotyping

arXiv:2505.15022v1 Announce Type: new Abstract: Daily environments have profound influence on our health and behavior. Recent work has shown that digital envirotyping, where computer vision is applied to images of daily environments taken during ecological momentary assessment (EMA), can be used to identify meaningful relationships between environmental…

May 22, 2025
We Need a Fourth Law of Robotics in the Age of AI

Artificial Intelligence has become a mainstay of our daily lives, revolutionizing industries, accelerating scientific discoveries, and reshaping how we communicate. Yet, alongside its undeniable benefits, AI has also ignited a range of ethical and social dilemmas that our existing regulatory frameworks have struggled…

May 7, 2025
On Model Protection in Federated Learning against Eavesdropping Attacks

arXiv:2504.02114v1 Announce Type: cross Abstract: In this study, we investigate the protection offered by federated learning algorithms against eavesdropping adversaries. In our model, the adversary is capable of intercepting model updates transmitted from clients to the server, enabling it to create its own estimate of the…

April 4, 2025
Backdoor Detection through Replicated Execution of Outsourced Training

arXiv:2504.00170v1 Announce Type: cross Abstract: It is common practice to outsource the training of machine learning models to cloud providers. Clients who do so gain from the cloud’s economies of scale, but implicitly assume trust: the server should not deviate from the client’s training procedure. A malicious…

April 2, 2025
SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement

arXiv:2503.12760v1 Announce Type: new Abstract: To design effective digital interventions, experimenters face the challenge of learning decision policies that balance multiple objectives using offline data. Often, they aim to develop policies that maximize goal outcomes, while ensuring there are no undesirable changes in guardrail…

March 18, 2025
Experiments Illustrated: Can $1 Change Behavior More Than $100?

I currently lead a small data team at a small tech company. With everything small, we have a lot of autonomy over what, when, and how we run experiments. In this series, I’m opening the vault from our years of experimenting, each story highlighting a key…

March 12, 2025
Write for Towards Data Science

Quick Links: Submission Guidelines; How To Submit Your Work; How to get your article ready for publication!; Adding and using images; Longform posts, columns, and online books; FAQ. Why become a contributor? We are looking for writers to propose up-to-date content focused on data science, machine learning, artificial intelligence and…

February 28, 2025
Fréchet Cumulative Covariance Net for Deep Nonlinear Sufficient Dimension Reduction with Random Objects

arXiv:2502.15374v1 Announce Type: new Abstract: Nonlinear sufficient dimension reduction \citep{libing_generalSDR}, which constructs nonlinear low-dimensional representations to summarize essential features of high-dimensional data, is an important branch of representation learning. However, most existing methods are not applicable when the response variables are complex non-Euclidean…

February 24, 2025
Building a Data Engineering Center of Excellence

As data continues to grow in importance and become more complex, the need for skilled data engineers has never been greater. But what is data engineering, and why is it so important? In this blog post, we will discuss the essential components of a functioning data engineering practice…

February 14, 2025
A Meta-learner for Heterogeneous Effects in Difference-in-Differences

arXiv:2502.04699v1 Announce Type: new Abstract: We address the problem of estimating heterogeneous treatment effects in panel data, adopting the popular Difference-in-Differences (DiD) framework under the conditional parallel trends assumption. We propose a novel doubly robust meta-learner for the Conditional Average Treatment Effect on the Treated (CATT), reducing the…

February 10, 2025
Poisson Hierarchical Indian Buffet Processes for Within and Across Group Sharing of Latent Features-With Indications for Microbiome Species Sampling Models

arXiv:2502.01919v1 Announce Type: new Abstract: In this work, we present a comprehensive Bayesian posterior analysis of what we term Poisson Hierarchical Indian Buffet Processes, designed for complex random sparse count species sampling models that allow…

February 5, 2025
Towards Data Science is Launching as an Independent Publication

Since founding Towards Data Science in 2016, we’ve built the largest publication on Medium with a dedicated community of readers and contributors focused on data science, machine learning, and AI. Medium built a fantastic platform, and we wouldn’t have been able to reach our audience without…

February 4, 2025
Trustworthy Evaluation of Generative AI Models

arXiv:2501.18897v1 Announce Type: new Abstract: Generative AI (GenAI) models have recently achieved remarkable empirical performance in various applications; however, their evaluations still lack uncertainty quantification. In this paper, we propose a method to compare two generative models based on an unbiased estimator of their relative performance gap. Statistically, our…

February 3, 2025
Semantically Compress Text to Save On LLM Costs

LLMs are great… if they can fit all of your data. Originally published at https://blog.developer.bazaarvoice.com on October 28, 2024. Large language models are fantastic tools for unstructured text, but what if your text doesn’t fit in the context window? Bazaarvoice faced exactly this…

December 21, 2024