Category: data-science
-
Data Science: From School to Work, Part III
Introduction: Writing code is about solving problems, but not every problem is predictable. In the real world, your software will encounter unexpected situations: missing files, invalid user inputs, network timeouts, or even hardware failures. This is why handling errors isn’t just a nice-to-have; it’s a critical part…
-
Automate Supply Chain Analytics Workflows with AI Agents using n8n
Why build things the hard way when you can design them the smart way? As a Supply Chain Data Scientist, I’ve explored various frameworks like LangChain and LangGraph to build AI agents using Python. Leveraging LLMs with LangChain for Supply Chain Analytics — A Control Tower Powered by…
-
Uncertainty Quantification in Machine Learning with an Easy Python Interface
Uncertainty quantification (UQ) in a Machine Learning (ML) model allows one to estimate the precision of its predictions. This is extremely important for utilizing its predictions in real-world tasks. For instance, if a machine learning model is trained to predict a property of a material,…
-
The Ultimate AI/ML Roadmap For Beginners
AI is transforming the way businesses operate, and nearly every company is exploring how to leverage this technology. As a result, the demand for AI and machine learning skills has skyrocketed in recent years. With nearly four years of experience in AI/ML, I’ve decided to create the ultimate guide…
-
Data-Driven March Madness Predictions
March Madness is infamously unpredictable, a perfect storm where favorites tumble and underdogs rise to do the impossible. Every March, 64 men’s and 64 women’s college basketball teams battle for glory, while millions of fans, analysts, and betting markets scramble to predict the outcomes. But the odds of picking a perfect…
-
What Germany Currently Is Up To, Debt-Wise
€1,600 per second. That’s how much interest Germany has to pay for its debts. In total, the German state has debts ranging into the trillions (more than a thousand billion euros). And the government is planning to take on even more: up to one trillion in additional debt is…
-
Google’s Data Science Agent: Can It Really Do Your Job?
On March 3rd, Google officially rolled out its Data Science Agent to most Colab users for free. This is not brand new: it was first announced in December last year, but it is now integrated into Colab and made widely accessible. Google says…
-
Mastering the Poisson Distribution: Intuition and Foundations
You’ve probably used the normal distribution one or two times too many. We all have; it’s a true workhorse. But sometimes, we run into problems. For instance, when predicting or forecasting values, simulating data given a particular data-generating process, or when we try to visualise model output…
-
Six Organizational Models for Data Science
Introduction: Data science teams can operate in myriad ways within a company. These organizational models influence the type of work that the team does, but also the team’s culture, goals, impact, and overall value to the company. Adopting the wrong organizational model can limit impact, cause delays, and compromise…
-
The Impact of GenAI and Its Implications for Data Scientists
GenAI systems affect how we work. This general notion is well known. However, we are still unaware of the exact impact of GenAI. For example, how much do these tools affect our work? Do they have a larger impact on certain tasks? What does this…
-
Mastering Hadoop, Part 3: Hadoop Ecosystem: Get the most out of your cluster
As we have already seen with the basic components (Part 1, Part 2), the Hadoop ecosystem is constantly evolving and being optimized for new applications. As a result, various tools and technologies have developed over time that make Hadoop more powerful and…
-
Forget About Cloud Computing. On-Premises Is All the Rage Again
Ten years ago, everybody was fascinated by the cloud. It was the new thing, and companies that adopted it rapidly saw tremendous growth. Salesforce, for example, positioned itself as a pioneer of this technology and saw great wins. The tides are turning though. As much…
-
Anatomy of a Parquet File
In recent years, Parquet has become a standard format for data storage in Big Data ecosystems. Its column-oriented format offers several advantages: faster query execution when only a subset of columns is being processed; quick calculation of statistics across all data; and reduced storage volume thanks to efficient compression. When combined…
-
Fourier Transform Applications in Literary Analysis
Poetry is often seen as a pure art form, ranging from the rigid structure of a haiku to the fluid, unconstrained nature of free-verse poetry. In analysing these works, though, to what extent can mathematics and data analysis be used to glean meaning from this free-flowing literature? Of course,…
-
Mastering Hadoop, Part 2: Getting Hands-On — Setting Up and Scaling Hadoop
Now that we’ve explored Hadoop’s role and relevance, it’s time to show you how it works under the hood and how you can start working with it. To start, we are breaking down Hadoop’s core components: HDFS for storage, MapReduce for processing,…
-
7 Powerful DBeaver Tips and Tricks to Improve Your SQL Workflow
DBeaver is the most powerful open-source SQL IDE, but there are several features people don’t know about. In this post, I will share with you several features to speed up your workflow, with zero fluff. I’ve learned these as I’m currently digging deeper into…
-
How to Switch from Data Analyst to Data Scientist
Are you a Data Analyst looking to break into data science? If so, this post is for you. Many people start in analytics because it generally has a lower barrier to entry, but as they gain experience, they realize they want to take on more technical…
-
Experiments Illustrated: Can $1 Change Behavior More Than $100?
I currently lead a small data team at a small tech company. With everything small, we have a lot of autonomy over what, when, and how we run experiments. In this series, I’m opening the vault from our years of experimenting, each story highlighting a key…
-
Mastering Hadoop, Part 1: Installation, Configuration, and Modern Big Data Strategies
Nowadays, a large amount of data is collected on the internet, which is why companies are faced with the challenge of being able to store, process, and analyze these volumes efficiently. Hadoop is an open-source framework from the Apache Software Foundation and has become…
-
How to Develop Complex DAX Expressions
At some point or another, any Power BI developer must write complex DAX expressions to analyze data. But nobody tells you how to do it. What’s the process for doing it? What is the best way to do it, and how supportive can a development process be? These are the questions…
-
Platform-Mesh, Hub and Spoke, and Centralised | 3 Types of data team
Introduction: In the “ever rapidly changing landscape of Data and AI” (!), understanding data and AI architecture has never been more critical. However, something many leaders overlook is the importance of data team structure. While many of you reading this probably identify as the data…
-
Linear Regression in Time Series: Sources of Spurious Regression
1. Introduction: It’s pretty clear that most of our work will be automated by AI in the future. This will be possible because many researchers and professionals are working hard to make their work available online. These contributions not only help us understand fundamental concepts but…
-
Experiments Illustrated: How Random Assignment Saved Us $1M in Marketing Spend
Running cool experiments is easily one of my favorite parts of working in data science. Most experiments don’t deliver big wins, so the winners make for fun stories. We’ve had a few of these at IntelyCare, and I’m sharing each story in a way…
-
Experiments Illustrated: How We Optimized Premium Listings on Our Nursing Job Board
Running experiments is a task that often falls to data scientists. If that’s you, congrats! It can be a rewarding and high-impact area of work, but it also requires tools found outside the typical ML-heavy data science curriculum. Even with the best tools, only…
-
When You Just Can’t Decide on a Single Action
In Game Theory, the players typically have to make assumptions about the other players’ actions. What will the other player do? Will he use rock, paper or scissors? You never know, but in some cases, you might have an idea of the probability of some actions…
-
One-Tailed Vs. Two-Tailed Tests
Introduction: If you’ve ever analyzed data using built-in t-test functions, such as those in R or SciPy, here’s a question for you: have you ever adjusted the default setting for the alternative hypothesis? If your answer is no, or if you’re not even sure what this means, then this blog post is for…
-
Kubernetes — Understanding and Utilizing Probes Effectively
Introduction: Let’s talk about Kubernetes probes and why they matter in your deployments. When managing production-facing containerized applications, even small optimizations can have enormous benefits. Reducing deployment times, making your applications react better to scaling events, and managing the health of running pods all require fine-tuning your container…
-
Mastering 1:1s as a Data Scientist: From Status Updates to Career Growth
I have been a data team manager for six months, and my team has grown from three to five. I wrote about my initial manager experiences back in November. In this article, I want to talk about something that is more essential to…
-
Practical SQL Puzzles That Will Level Up Your Skill
There are some SQL patterns that, once you know them, you start seeing everywhere. The solutions to the puzzles that I will show you today are actually very simple SQL queries, but understanding the concept behind them will surely unlock new solutions to the queries…
-
Data Science: From School to Work, Part II
In my previous article, I highlighted the importance of effective project management in Python development. Now, let’s shift our focus to the code itself and explore how to write clean, maintainable code, an essential practice in professional and collaborative environments. Readability & Maintainability: Well-structured code is easier to…
-
I Won’t Change Unless You Do
In Game Theory, how can players ever come to an end if there still might be a better option to decide for? Maybe one player still wants to change their decision. But if they do, maybe the other player wants to change too. How can they ever hope to…
-
Debugging the Dreaded NaN
You are training your latest AI model, anxiously watching as the loss steadily decreases when suddenly, boom! Your logs are flooded with NaNs (Not a Number); your model is irreparably corrupted and you’re left staring at your screen in despair. To make matters worse, the NaNs don’t appear consistently.…
-
The Dangers of Deceptive Data–Confusing Charts and Misleading Headlines
“You don’t have to be an expert to deceive someone, though you might need some expertise to reliably recognize when you are being deceived.” When my co-instructor and I start our quarterly lesson on deceptive visualizations for the data visualization course we teach at the University…
-
Efficient Data Handling in Python with Arrow
1. Introduction: We’re all used to working with CSVs, JSON files… With the traditional libraries and for large datasets, these can be extremely slow to read, write and operate on, leading to performance bottlenecks (been there). It’s precisely with big amounts of data that being efficient handling the…
-
The Next AI Revolution: A Tutorial Using VAEs to Generate High-Quality Synthetic Data
What is synthetic data? Data created by a computer intended to replicate or augment existing data. Why is it useful? We have all experienced the success of ChatGPT, Llama, and more recently, DeepSeek. These language models are being used ubiquitously across society…
-
Do European M&Ms Actually Taste Better than American M&Ms?
(Oh, I am the only one who’s been asking this question…? Hm. Well, if you have a minute, please enjoy this exploratory data analysis, featuring experimental design, statistics, and interactive visualization, applied a bit too earnestly to resolve an international debate.) 1. Introduction 1.1…
-
Talking about Games
Game theory is a field of research that is quite prominent in Economics but rather unpopular in other scientific disciplines. However, the concepts used in game theory can be of interest to a wider audience, including data scientists, statisticians, computer scientists or psychologists, to name just a few. This article is the…
-
Unraveling Spatially Variable Genes: A Statistical Perspective on Spatial Transcriptomics
The article was written by Guanao Yan, Ph.D. student of Statistics and Data Science at UCLA. Guanao is the first author of the Nature Communications review article [1]. Spatially resolved transcriptomics (SRT) is revolutionizing genomics by enabling the high-throughput measurement of gene expression while…
-
Don’t Let Conda Eat Your Hard Drive
If you’re an Anaconda user, you know that conda environments help you manage package dependencies, avoid compatibility conflicts, and share your projects with others. Unfortunately, they can also take over your computer’s hard drive. I write lots of computer tutorials and to keep them organized, each has a dedicated folder…
-
Why Data Scientists Should Care about Containers — and Stand Out with This Knowledge
“I train models, analyze data and create dashboards. Why should I care about containers?” Many people who are new to the world of data science ask themselves this question. But imagine you have trained a model that runs perfectly on…
-
Advanced Time Intelligence in DAX with Performance in Mind
We all know the usual Time Intelligence functions based on years, quarters, months, and days. But sometimes, we need to perform more exotic time intelligence calculations, and we should not forget to consider performance while programming the measures. Introduction: There are many DAX functions in Power BI…
-
Honestly Uncertain
Ethical issues aside, should you be honest when asked how certain you are about some belief? Of course, it depends. In this blog post, you’ll learn on what. Different ways of evaluating probabilistic predictions come with dramatically different degrees of “optimal honesty”. Perhaps surprisingly, the linear function that assigns +1 to true and fully…
-
The Future of Data: How Decision Intelligence is Revolutionizing Data
In the past few years, technology and AI have evolved more than ever. As I read about the new concepts in tech and learn new skills and techniques each day, I feel I am in a state of limbo: there is so much content to consume and yet,…
-
How I Became A Machine Learning Engineer (No CS Degree, No Bootcamp)
Machine learning and AI are among the most popular topics nowadays, especially within the tech space. I am fortunate enough to work and develop with these technologies every day as a machine learning engineer! In this article, I will walk you through my…
-
➡️ Start Asking Your Data ‘Why?’ — A Gentle Intro To Causality
Correlation does not imply causation. It turns out, however, that with some simple, ingenious tricks one can, potentially, unveil causal relationships within standard observational data, without having to resort to expensive randomised controlled trials. This post is targeted towards anyone making data-driven…
-
Roadmap to Becoming a Data Scientist, Part 4: Advanced Machine Learning
Introduction: Data science is undoubtedly one of the most fascinating fields today. Following significant breakthroughs in machine learning about a decade ago, data science has surged in popularity within the tech community. Each year, we witness increasingly powerful tools that once seemed unimaginable. Innovations such as the Transformer…
-
Publish Interactive Data Visualizations for Free with Python and Marimo
Working in Data Science, it can be hard to share insights from complex datasets using only static figures. All the facets that describe the shape and meaning of interesting data are not always captured in a handful of pre-generated figures. While we have powerful technologies…
-
Building a Data Engineering Center of Excellence
As data continues to grow in importance and become more complex, the need for skilled data engineers has never been greater. But what is data engineering, and why is it so important? In this blog post, we will discuss the essential components of a functioning data engineering practice…
-
Learnings from a Machine Learning Engineer — Part 1: The Data
It is said that in order for a machine learning model to be successful, you need to have good data. While this is true (and pretty much obvious), it is extremely difficult to define, build, and sustain good data. Let me share with you…
-
Method of Moments Estimation with Python Code
Let’s say you are in a customer care center, and you would like to know the probability distribution of the number of calls per minute, or in other words, you want to answer the question: what is the probability of receiving zero, one, two, … etc., calls per…
-
Should Data Scientists Care About Quantum Computing?
I am sure the quantum hype has reached every person in tech (and outside it, most probably), with some over-the-top claims like “some company has proved quantum supremacy,” “the quantum revolution is here,” or my favorite, “quantum computers are here, and it will make classical computers obsolete.” I…
-
Pandas Can’t Handle This: How ArcticDB Powers Massive Datasets
Python has grown to dominate data science, and its package Pandas has become the go-to tool for data analysis. It is great for tabular data and supports data files of up to 1 GB if you have plenty of RAM. Within these size limits, it is also…
-
Build a Decision Tree in Polars from Scratch
Decision Tree algorithms have always fascinated me. They are easy to implement and achieve good results on various classification and regression tasks. Combined with boosting, decision trees are still state-of-the-art in many applications. Frameworks such as sklearn, LightGBM, XGBoost, and CatBoost have done a very good job…
-
Virtualization & Containers for Data Science Newbies
Virtualization makes it possible to run multiple virtual machines (VMs) on a single piece of physical hardware. These VMs behave like independent computers, but share the same physical computing power. A computer within a computer, so to speak. Many cloud services rely on virtualization. But other technologies, such…
-
Data vs. Business Strategy
There seems to be a consensus that leveraging data, analytics, and AI to create a data-driven organization requires a clear strategic approach. However, there is less clarity and agreement on exactly what this strategic approach should look like in practice. This article provides a short overview of what strategy work I…
-
The Gamma Hurdle Distribution
Which Outcome Matters? Here is a common scenario: an A/B test was conducted, where a random sample of units (e.g. customers) was selected for a campaign and received Treatment A. Another sample was selected to receive Treatment B. “A” could be a communication or offer and “B” could be…
-
Triangle Forecasting: Why Traditional Impact Estimates Are Inflated (And How to Fix Them)
Accurate impact estimations can make or break your business case. Yet, despite their importance, most teams use oversimplified calculations that can lead to inflated projections. These shot-in-the-dark numbers not only destroy credibility with stakeholders but can also result in misallocation of resources and…
-
Synthetic Data Generation with LLMs
Popularity of RAG: Over the past two years while working with financial firms, I’ve observed firsthand how they identify and prioritize Generative AI use cases, balancing complexity with potential value. Retrieval-Augmented Generation (RAG) often stands out as a foundational capability across many LLM-driven solutions, striking a balance between ease of implementation…
-
The Method of Moments Estimator for Gaussian Mixture Models
Audio processing is one of the most important application domains of digital signal processing (DSP) and machine learning. Modeling acoustic environments is an essential step in developing digital audio processing systems such as speech recognition, speech enhancement, acoustic echo cancellation, etc. Acoustic environments are filled with background…
-
How to Create Network Graph Visualizations in Microsoft PowerBI
Microsoft PowerBI is one of the most popular Business Intelligence (BI) tools, and while it has all the features you need to create dynamic analytic reporting for stakeholders across the business, creating some advanced data visualizations is more challenging. This article will walk through how…
-
Introduction to Minimum Cost Flow Optimization in Python
Minimum cost flow optimization minimizes the cost of moving flow through a network of nodes and edges. Nodes include sources (supply) and sinks (demand), with different costs and capacity limits. The aim is to find the least costly way to move volume from sources to sinks while…
-
Myths vs. Data: Does an Apple a Day Keep the Doctor Away?
Introduction: “Money can’t buy happiness.” “You can’t judge a book by its cover.” “An apple a day keeps the doctor away.” You’ve probably heard these sayings several times, but do they actually hold up when we look at the data? In this article series,…
-
Neural Networks – Intuitively and Exhaustively Explained
An in-depth exploration of the most fundamental architecture in modern AI. Article originally made available on Intuitively and Exhaustively Explained. In this article we’ll form a thorough understanding of the neural network,…
-
How to Get Promoted as a Data Scientist
Introduction: I have been working as a Data Scientist since 2017, and during that time I have been promoted from a junior/mid-level to a senior, and most recently to a Lead Data Scientist. There is a lot of content online regarding…
-
How to Find Seasonality Patterns in Time Series
Using Fourier Transforms to detect seasonal components. In my professional life as a data scientist, I have encountered time series multiple times. Most of my knowledge comes from my academic experience, specifically my courses in Econometrics (I have a degree in Economics), where we studied statistical properties…
-
Awesome Plotly with code series (Part 9): To dot, to slope or to stack?
Simple methods to replace cluttered bar charts with crisp, reader-friendly visuals. By Jose Parreño.
-
5 Essential Tips Learned from My Data Science Journey
Personal reflections on my 10-year data odyssey. By Federico Rucci.
-
How to Make a Data Science Portfolio That Stands Out
Create a data science portfolio with Cloudflare and Hugo. By Egor Howell.
-
Sparse AutoEncoder: from Superposition to interpretable features
Disentangle features in complex neural networks with superposition. Complex neural networks, such as Large Language Models (LLMs), quite often suffer from interpretability challenges. One of the most important reasons for such difficulty is superposition, a phenomenon of the neural network having fewer dimensions than the number of features it…
-
Are Data Scientists at Risk in 2025?
The impact of AI on data science jobs. By Natassha Selvaraj.
-
DeepSeek V3: A New Contender in AI-Powered Data Science
How DeepSeek’s budget-friendly AI model stacks up against ChatGPT, Claude, and Gemini in SQL, EDA, and machine learning. By Yu Dong.
-
How Likely Is a Six Nations Grand Slam in 2025?
Quantifying uncertainty in sports fixtures. Introduction: For rugby fans the long wait is nearly over. Like Christmas, the Six Nations comes once a year to lift our spirits in the cold winter months. If you’re not very familiar with rugby, the…
-
2-Bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy
Very accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU. By Benjamin Marie.
-
Data Pruning MNIST: How I Hit 99% Accuracy Using Half the Data
How much data does AI really need? TLDR: Data-centric AI can create more efficient and accurate models. I experimented with data pruning on MNIST to classify handwritten digits. (Figure: best runs for “furthest-from-centroid” selection compared to the full dataset.) What if I told you…
-
Actually, Being a Data Scientist is Awesome
Don’t let the doom and gloom get to you. By Marina Wyss – Gratitude Driven.
-
Navigating Data Science Content: Recognizing Common Pitfalls, Part 1
Uncovering and correcting misconceptions in online data science content to help you learn more effectively. By Geremie Yeo.
-
Great Books for AI Engineering
10 books with valuable insights about AI science and engineering. A few years ago I recommended 21 books in Great Books for Data Science and Great Books for Data Science 2. Since then a lot has changed. While…
-
NLP Illustrated, Part 3: Word2Vec
An exhaustive and illustrated guide to Word2Vec with code! By Shreya Rao.
-
The Challenges and Realities of Being a Data Scientist
Some harsh truths behind the field of data science. By Egor Howell.
-
Machine Learning Incidents in AdTech
Challenges with deep learning in production. One of the biggest challenges I encountered in my career as a data scientist was migrating the core algorithms in a mobile AdTech platform from classic machine learning models to deep learning. I worked on a Demand Side Platform (DSP) for user…
-
Basics of Probability Notations
Union, Intersection, Independence, Disjoint, Complement: Advanced Probability for Data Science Series (1). By Sunghyun Ahn.
-
How GenAI Tools Have Changed My Work as a Data Scientist
An overview of the 4 use cases and 6 GenAI tools I use. By Jonte Dancker.
-
Who is Right? The Dean or the Students?
A cautionary tale on two perspectives on averaging. By Paolo Molignini, PhD.
-
Build a Decision Tree in Polars from Scratch
Explore decision trees with a Polars backend. Decision tree algorithms have always fascinated me. They are easy to implement and achieve good results on various classification and regression tasks. Combined with boosting, decision trees are still state-of-the-art in many applications. Frameworks such as sklearn,…
-
Water Cooler Small Talk, Ep 7: Anscombe’s Quartet and the Datasaurus
Why descriptive statistics aren’t enough and plotting your data is always essential. By Maria Mouschoutzi, PhD, Towards Data Science.
-
Your Neural Network Can’t Explain This. TMLE to the Rescue!
Targeted Maximum Likelihood Estimation (TMLE) helps you explain patterns where other techniques fall short. By Ari Joury, PhD, Towards Data Science.
-
Optimising Budgets With Marketing Mix Models In Python
Part 3 of a hands-on guide to help you master MMM in pymc. What is this series about? Welcome to part 3 of my series on marketing mix modelling (MMM), a hands-on guide to help you master MMM. Throughout this series, we’ll cover key…
-
How Cheap Mortgages Transformed Poland’s Real Estate Market
Insights from a synthetic control group. By Lukasz Szubelak, Towards Data Science.
-
Deep Learning for Click Prediction in Mobile AdTech
Machine Learning for Real-Time Bidding. The past few years were a revolution for the mobile advertising and gaming industries, with the broad adoption of neural networks for advertising tasks, including click prediction. This migration occurred prior to the success of Large Language Models (LLMs) and…
-
Multi-Headed Cross Attention — By Hand
Hand computing a fundamental component of multimodal models. By Daniel Warfield, Towards Data Science.
-
Does It Matter That Online Experiments Interact?
What interactions do, why they are just like any other change in the environment post-experiment, and some reassurance. Experiments do not run one at a time. At any moment, hundreds to thousands of experiments run on a mature website. The question comes up:…
-
Avoid These Easily Missed Mistakes in Machine Learning Workflows — Part 2
Using Unavailable Data at Prediction Time and Mixing Magic Numbers with Real Numbers. By Thomas A Dorfer, Towards Data Science.
-
A Derivation and Application of Restricted Boltzmann Machines (2024 Nobel Prize)
Investigating Geoffrey Hinton’s Nobel Prize-winning work and building it from scratch using PyTorch. One recipient of the 2024 Nobel Prize in Physics was Geoffrey Hinton for his contributions in the field of AI and machine learning. A lot of people know he worked on neural…
-
On a Time Crunch but Still Want to Learn to Develop Multi-Agent AI?
These 3 starter projects only take a weekend (and a few cups of coffee, maybe). By Thuwarakesh Murallie, Towards Data Science.
-
The Solar Cycle(s): history, data analysis and trend forecasting.
A brief article on the Solar Cycles, the history behind their observation, data analysis and time series forecasting for the incoming solar maximum in 2025–2026 and the next decades. You have probably heard about the 11-year Solar Cycle…
-
Harmonizing and Pooling Datasets for Health Research in R
R code to extract data from unique datasets and combine them in one harmonized dataset ready for seamless analysis. By Rodrigo M Carrillo Larco, MD, PhD, Towards Data Science.