Category: aimlds

Contrastive representations of high-dimensional, structured treatments

arXiv:2411.19245v1 Announce Type: new Abstract: Estimating causal effects is vital for decision making. In standard causal effect estimation, treatments are usually binary- or continuous-valued. However, in many important real-world settings, treatments can be structured, high-dimensional objects, such as text, video, or audio. This provides a challenge to traditional causal…

December 2, 2024
Weekly Entering & Transitioning – Thread 02 Dec, 2024 – 09 Dec, 2024

Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: learning resources (e.g. books, tutorials, videos); traditional education (e.g. schools, degrees, electives); alternative education (e.g.…

December 2, 2024
F5-TTS is highly underrated for Audio Cloning!

Submitted by /u/mehul_gupta1997.

December 2, 2024
Daily averaged time series comparison - Linking plankton and aerosol emissions?

Hi everyone, so we have this dataset of daily averaged phytoplankton time series over a full year: coccolithophores, chlorophytes, cyanobacteria, diatoms, dinoflagellates, Phaeocystis, zooplankton. Then we have atmospheric measurements on the same time intervals of a few aerosol species: methanesulphonic acid, carboxylic acids, aliphatics, sulphates,…

December 2, 2024
Need help gathering data

Hello! I’m currently analysing data from politicians across the world and I would like to know if there’s a database with data like years in charge, studies they had, age, gender and some other relevant topics. Please, if you have any links I’ll be glad to check them all. *Need help,…

December 2, 2024
Feature creation out of two features.

I have been working on a project that tries to identify interactions between variables. What is a good way to capture these interactions by creating features? What are good mathematical expressions to capture interaction beyond multiplication and division? Do note I have nulls and I cannot change that.

December 2, 2024
Smaller is smarter

Concerns about the environmental impacts of Large Language Models (LLMs) are growing. Although detailed information about the actual costs of LLMs can be difficult to find, let’s attempt to gather some facts to understand the scale. Since comprehensive data on ChatGPT-4 is not readily available, we can consider Llama 3.1…

December 2, 2024
Why “Statistical Significance” Is Pointless

Here’s a better framework for data-driven decision-making. By Samuele Mazzanti (Towards Data Science).

December 2, 2024
The Lead, Shadow, and Sparring Roles in New Data Settings

From data engineer to domain expert: what it takes to build a new data platform. By Marina Tosic (Towards Data Science).

December 2, 2024
How to Solve a Simple Problem With Machine Learning

A technical walkthrough of lesson one. By Oscar Leo (Towards Data Science).

December 2, 2024
When Not to Use the Streamlit AgGrid Component

Streamlit-AgGrid is amazing. But there are 2 scenarios where its use is not recommended. By Jose Parreño (Towards Data Science).

December 2, 2024
Making News Recommendations Explainable with Large Language Models

A prompt-based experiment to improve both accuracy and transparent reasoning in content personalization. At DER SPIEGEL, we are continually exploring ways to improve how we recommend news articles to our readers. In our latest (offline) experiment,…

December 1, 2024
Grokking Behavioral Interviews

Master the art of behavioral interviews and land your dream job. By Mina Ghashami (Towards Data Science).

December 1, 2024
Model Validation Techniques, Explained: A Visual Guide with Code Examples

Model evaluation & optimization: 12 must-know methods to validate your machine learning. Every day, machines make millions of predictions, from detecting objects in photos to helping doctors find diseases. But before trusting these predictions, we need to know if they’re any good. After all, no one would…

December 1, 2024
Dunder Methods: The Hidden Gems of Python

Real-world examples of how actively using special methods can simplify coding and improve readability. Dunder methods, though possibly a basic topic in Python, are something I have often noticed being understood only superficially, even by people who have been coding for quite some time. Disclaimer: This is a forgivable…

December 1, 2024
How Did Open Food Facts Fix OCR-Extracted Ingredients Using Open-Source LLMs?

Delve into an end-to-end Machine Learning project to improve the quality of the Open Food Facts database. Open Food Facts’ purpose is to create the largest open-source food database in the world. To this day, it has collected over 3 million products…

November 30, 2024
Effortless Data Handling: Find Variables Across Multiple Data Files with R

A practical solution with code and workflow. Lost in a maze of datasets and endless data dictionaries? Say goodbye to tedious variable hunting! Discover how to quickly identify and extract the variables you need from multiple SAS files using two simple R functions. Streamline your…

November 30, 2024
Why Internal Company Chatbots Fail and How to Use Generative AI in Enterprise with Impact

Start with the problem and not with the solution. The most common disillusion that many organizations have is the following: They get excited about generative AI with ChatGPT or Microsoft Copilot, read some…

November 30, 2024
Think you Know Excel? Take Your Analytics Skills to the Next Level with Power Query!

5 practical use cases that prove Power Query is worth exploring. I have a confession to make: I’ve been living under a rock 🪨. Not literally, but how else can I explain not discovering Power Query in Excel until now? Imagine…

November 30, 2024
Water Cooler Small Talk: Simpson’s Paradox

Is your data tricking you? What can you do about it? By Maria Mouschoutzi, PhD (Towards Data Science).

November 30, 2024
The Intuition behind Concordance Index — Survival Analysis

Ranking accuracy versus absolute accuracy. How long would you keep your gym membership before you decide to cancel it? Or Netflix, if you are a series…

November 29, 2024
Complete MLOPS Cycle for a Computer Vision Project

These days, we encounter (and maybe produce on our own) many computer vision projects, where AI is the hottest topic for new technologies… By Yağmur Çiğdem Aktaş (Towards Data Science).

November 29, 2024
A quick guide to Network Science

For those who would like to learn about complex connections, from theory to practice in Python. By Milan Janosov (Towards Data Science).

November 29, 2024
The Most Expensive Data Science Mistake I’ve Witnessed in My Career

Why true success in machine learning goes beyond optimizing a single metric. By Claudia Ng (Towards Data Science).

November 29, 2024
Five Reasons You Cannot Afford Not Knowing Probability Proportional to Size (PPS) Sampling

Simple Random Sampling (SRS) works, but if you do not know Probability Proportional to Size (PPS) sampling, you risk making some critical statistical mistakes. Learn why, when, and how you can use PPS sampling here!

November 29, 2024
On the ERM Principle in Meta-Learning

arXiv:2411.17898v1 Announce Type: new Abstract: Classic supervised learning involves algorithms trained on $n$ labeled examples to produce a hypothesis $h \in \mathcal{H}$ aimed at performing well on unseen examples. Meta-learning extends this by training across $n$ tasks, with $m$ examples per task, producing a hypothesis class $\mathcal{H}$ within some…

November 28, 2024
A Flexible Defense Against the Winner’s Curse

arXiv:2411.18569v1 Announce Type: new Abstract: Across science and policy, decision-makers often need to draw conclusions about the best candidate among competing alternatives. For instance, researchers may seek to infer the effectiveness of the most successful treatment or determine which demographic group benefits most from a specific treatment. Similarly,…

November 28, 2024
Isometry pursuit

arXiv:2411.18502v1 Announce Type: new Abstract: Isometry pursuit is a convex algorithm for identifying orthonormal column-submatrices of wide matrices. It consists of a novel normalization method followed by multitask basis pursuit. Applied to Jacobians of putative coordinate functions, it helps identify isometric embeddings from within interpretable dictionaries. We provide theoretical and experimental results justifying…

November 28, 2024
Functional relevance based on the continuous Shapley value

arXiv:2411.18575v1 Announce Type: new Abstract: The presence of Artificial Intelligence (AI) in our society is increasing, which brings with it the need to understand the behaviour of AI mechanisms, including machine learning predictive algorithms fed with tabular data, text, or images, among other types of data. This…

November 28, 2024
When Is Heterogeneity Actionable for Personalization?

arXiv:2411.16552v1 Announce Type: cross Abstract: Targeting and personalization policies can be used to improve outcomes beyond the uniform policy that assigns the best performing treatment in an A/B test to everyone. Personalization relies on the presence of heterogeneity of treatment effects, yet, as we show in this paper, heterogeneity…

November 28, 2024
A Story of Long Tails: Why Uncertainty in Marketing Mix Modelling is Important

“Details matter. It’s worth waiting to get it right.” (Steve Jobs) By Javier Marin (Towards Data Science).

November 28, 2024
How to Transition from Engineering to Data Science

AI for engineers: experience of an engineering graduate. By Dan Pietrow (Towards Data Science).

November 28, 2024
How to Prune LLaMA 3.2 and Similar Large Language Models

This article explores a structured pruning technique for state-of-the-art models that use a GLU architecture, enabling the creation of… By Pere Martra (Towards Data Science).

November 28, 2024
Level Up Your Coding Skills with Python Threading

Learn how to use queues, daemon threads, and events in a Machine Learning project. By Marcello Politi (Towards Data Science).

November 28, 2024
How to Develop an Effective AI-Powered Legal Assistant

Create a machine-learning-based search into legal decisions. By Eivind Kjosbakken (Towards Data Science).

November 28, 2024
170 | Formalizing Design with Gabrielle Mérite and Alan Wilson

Data design systems and styleguides are currently a huge trend in the data design world. Moritz is joined by Gabrielle Mérite and Alan Wilson and together we exchange experiences in this emerging space, from designing dataviz components as part of Adobe Spectrum, the styleguide for Deloitte’s Insights…

November 27, 2024
169 | Data Conversations with Vidya Setlur

We have Vidya Setlur on the show to talk about the role that language and natural language processing (NLP) play in data visualization and analytics. Vidya is the director of research at Tableau and has a background in natural language processing and visualization. She is one of the main drivers behind…

November 27, 2024
168 | Highlights from IEEE VIS’22 with Tamara Munzner

Finally, this year we managed to record another classic episode from the IEEE VIS Conference (we recorded a total of 10 with this one!) We have Data Stories’ friend Prof. Tamara Munzner with us to talk about the conference and to highlight a few things she picked from…

November 27, 2024
167 | Visualization and Statistics with Andrew Gelman and Jessica Hullman

In this new episode, we talk about the interplay between statistics and data visualization. We do that with Andrew Gelman, Professor of Statistics and Political Science at Columbia University, and Jessica Hullman, Professor of Computer Science at Northwestern University. Andrew started the popular blog “Statistical Modeling,…

November 27, 2024
166 | Catching up with Amanda Makulec

Hey all, we are back! In this episode, we have Amanda Makulec to catch up on what happened during this whole period of time. Amanda is a public health and data visualization expert and she is the Executive Director of the Data Visualization Society. In the episode, we talk about…

November 27, 2024
What’s going on everybody?

By sentdex.

November 27, 2024
Visualizing Neural Network Internals

By sentdex.

November 27, 2024
Building an LLM fine-tuning Dataset

By sentdex.

November 27, 2024
Getting Back on Grid

By sentdex.

November 27, 2024
Open Source AI Inference API w/ Together

By sentdex.

November 27, 2024
Conformalised Conditional Normalising Flows for Joint Prediction Regions in time series

arXiv:2411.17042v1 Announce Type: new Abstract: Conformal Prediction offers a powerful framework for quantifying uncertainty in machine learning models, enabling the construction of prediction sets with finite-sample validity guarantees. While easily adaptable to non-probabilistic models, applying conformal prediction to probabilistic generative models, such as Normalising…

November 27, 2024
Fast, Precise Thompson Sampling for Bayesian Optimization

arXiv:2411.17071v1 Announce Type: new Abstract: Thompson sampling (TS) has optimal regret and excellent empirical performance in multi-armed bandit problems. Yet, in Bayesian optimization, TS underperforms popular acquisition functions (e.g., EI, UCB). TS samples arms according to the probability that they are optimal. A recent algorithm, P-Star Sampler (PSS),…

November 27, 2024
Spatio-Temporal Conformal Prediction for Power Outage Data

arXiv:2411.17099v1 Announce Type: new Abstract: In recent years, increasingly unpredictable and severe global weather patterns have frequently caused long-lasting power outages. Building resilience, the ability to withstand, adapt to, and recover from major disruptions, has become crucial for the power industry. To enable rapid recovery, accurately predicting future…

November 27, 2024
Training a neural network for data reduction and better generalization

arXiv:2411.17180v1 Announce Type: new Abstract: The motivation for sparse learners is to compress the inputs (features) by selecting only the ones needed for good generalization. Linear models with LASSO-type regularization achieve this by setting the weights of irrelevant features to zero, effectively identifying and ignoring…

November 27, 2024
A Generalized Unified Skew-Normal Process with Neural Bayes Inference

arXiv:2411.17400v1 Announce Type: new Abstract: In recent decades, statisticians have been increasingly encountering spatial data that exhibit non-Gaussian behaviors such as asymmetry and heavy-tailedness. As a result, the assumptions of symmetry and fixed tail weight in Gaussian processes have become restrictive and may fail to capture…

November 27, 2024
Weekly Entering & Transitioning – Thread 25 Nov, 2024 – 02 Dec, 2024

Welcome to this week’s entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: learning resources (e.g. books, tutorials, videos); traditional education (e.g. schools, degrees, electives); alternative education (e.g.…

November 27, 2024
Just spent the afternoon chatting with ChatGPT about a work problem. Now I am a convert.

I have to build an optimization algorithm in a domain I have not worked in before (price-sensitivity-based revenue optimization). Well, instead of googling around, I asked ChatGPT, which we do have available at work. And it was…

November 27, 2024
Should I try to become a Data scientist or AI engineer

Background: I’m a 25M with 2.5 years of experience as an analyst (soon enrolling in a master’s program in CS). There are a few career possibilities for me, but I’m confused as to whether I should try to become a general data scientist or AI…

November 27, 2024
Have you ever presented an analysis or shipped a model just because someone demanded it, even when you knew it was wrong, just to save your ass?

This has been quite common in my career. Execs demand a model X, we barely have good data to build it and the model doesn’t turn out good, but telling…

November 27, 2024
I Wrote a Guide to Simulation in Python with SimPy

Hi folks, I wrote a guide on discrete-event simulation with SimPy, designed to help you learn how to build simulations using Python. Kind of like the official documentation but on steroids. I have used SimPy personally in my own career for over a decade, it…

November 27, 2024
Neuromorphic Computing — an Edgier, Greener AI

Why computer hardware and AI algorithms are being reinvented using inspiration from the brain. Neuromorphic computing might not just help bring AI to the edge, but also reduce carbon emissions at data centers. There are periodic proclamations of the coming neuromorphic computing…

November 27, 2024
NLP Illustrated, Part 2: Word Embeddings

An illustrated and intuitive guide to word embeddings. By Shreya Rao (Towards Data Science).

November 27, 2024
Addressing Missing Data

Understand missing data patterns (MCAR, MNAR, MAR) for better model performance with Missingno. By Gizem Kaya (Towards Data Science).

November 27, 2024
Optimizing Transformer Models for Variable-Length Input Sequences

How PyTorch NestedTensors, FlashAttention2, and xFormers can Boost Performance and Reduce AI Costs. As generative AI (genAI) models grow in both popularity and scale, so do the computational demands and costs associated with their training and deployment. Optimizing these models is crucial for enhancing…

November 27, 2024
Mistral 7B Explained: Towards More Efficient Language Models

Mistral 7B Explained: Towards More Efficient Language Models RMS Norm, RoPE, GQA, SWA, KV Cache, and more! Part 5 in the “LLMs from Scratch” series — a complete guide to understanding and building Large Language Models. If you are interested in learning more about how these models work I encourage you to read: Part 1: Tokenization — A Complete Guide Part 2:…

November 27, 2024