Category: data-science
-
Exploratory Data Analysis: Gamma Spectroscopy in Python
Exploratory Data Analysis: Gamma Spectroscopy in Python Let’s observe the matter on the atomic level The post Exploratory Data Analysis: Gamma Spectroscopy in Python appeared first on Towards Data Science. Dmitrii Eliuseev
-
A Bird’s-Eye View of Linear Algebra: Measure of a Map — Determinants
A Bird’s-Eye View of Linear Algebra: Measure of a Map — Determinants We roll up our sleeves and start to deal with matrices The post A Bird’s-Eye View of Linear Algebra: Measure of a Map — Determinants appeared first on Towards Data Science. Rohit Pandey
-
How to Transition From Data Analyst to Data Scientist
How to Transition From Data Analyst to Data Scientist Playbook on how data analysts can become data scientists The post How to Transition From Data Analyst to Data Scientist appeared first on Towards Data Science. Egor Howell
-
Not Everything Needs Automation: 5 Practical AI Agents That Deliver Enterprise Value
Not Everything Needs Automation: 5 Practical AI Agents That Deliver Enterprise Value What actually works with AI agents inside enterprise organizations? The post Not Everything Needs Automation: 5 Practical AI Agents That Deliver Enterprise Value appeared first on Towards Data Science. Weiwei Hu
-
Prescriptive Modeling Unpacked: A Complete Guide to Intervention With Bayesian Modeling.
Prescriptive Modeling Unpacked: A Complete Guide to Intervention With Bayesian Modeling. Learn how to move beyond prediction and actively intervene through prescriptive modeling. This in-depth guide walks you through Bayesian approaches to system intervention, with practical examples in predictive maintenance. The post Prescriptive Modeling Unpacked: A Complete Guide to Intervention With Bayesian Modeling. appeared…
-
How I Automated My Machine Learning Workflow with Just 10 Lines of Python
How I Automated My Machine Learning Workflow with Just 10 Lines of Python Use LazyPredict and PyCaret to skip the grunt work and jump straight to performance. The post How I Automated My Machine Learning Workflow with Just 10 Lines of Python appeared first on Towards Data Science. Himanshu Sharma
-
The Role of Luck in Sports: Can We Measure It?
The Role of Luck in Sports: Can We Measure It? From last-minute goals to coin tosses: How much does randomness influence the outcomes of games? The post The Role of Luck in Sports: Can We Measure It? appeared first on Towards Data Science. Pol Marin
-
The Journey from Jupyter to Programmer: A Quick-Start Guide
The Journey from Jupyter to Programmer: A Quick-Start Guide Explore the real benefits of ditching the notebook The post The Journey from Jupyter to Programmer: A Quick-Start Guide appeared first on Towards Data Science. Lucy Dickinson
-
Building a Modern Dashboard with Python and Gradio
Building a Modern Dashboard with Python and Gradio Data insights made simple The post Building a Modern Dashboard with Python and Gradio appeared first on Towards Data Science. Thomas Reid
-
Reducing Time to Value for Data Science Projects: Part 2
Reducing Time to Value for Data Science Projects: Part 2 Leveraging automation and parallelism to scale out experiments The post Reducing Time to Value for Data Science Projects: Part 2 appeared first on Towards Data Science. Kristopher McGlinchey
-
Decision Trees Natively Handle Categorical Data
Decision Trees Natively Handle Categorical Data But mean target encoding is their turbocharger The post Decision Trees Natively Handle Categorical Data appeared first on Towards Data Science. Vadim Arzamasov
-
LLMs + Pandas: How I Use Generative AI to Generate Pandas DataFrame Summaries
LLMs + Pandas: How I Use Generative AI to Generate Pandas DataFrame Summaries Local Large Language Models can convert massive DataFrames to presentable Markdown reports — here’s how. The post LLMs + Pandas: How I Use Generative AI to Generate Pandas DataFrame Summaries appeared first on Towards Data Science. Dario Radečić
-
The Secret Power of Data Science in Customer Support
The Secret Power of Data Science in Customer Support Customer support is a data goldmine. Here’s how to unlock its full potential with data science. The post The Secret Power of Data Science in Customer Support appeared first on Towards Data Science. Yu Dong
-
A Practical Introduction to Google Analytics
A Practical Introduction to Google Analytics Learn the key concepts and reports of Google Analytics while practising with the platform The post A Practical Introduction to Google Analytics appeared first on Towards Data Science. Eugenia Anello
-
I Transitioned from Data Science to AI Engineering: Here’s Everything You Need to Know
I Transitioned from Data Science to AI Engineering: Here’s Everything You Need to Know A personal guide to the skills, tools, and mindset behind the title The post I Transitioned from Data Science to AI Engineering: Here’s Everything You Need to Know appeared first on Towards Data Science. Sara Nobrega
-
JAX: Is This Google’s NumPy killer?
JAX: Is This Google’s NumPy killer? Auto differentiation and JIT compilation make a compelling case. The post JAX: Is This Google’s NumPy killer? appeared first on Towards Data Science. Thomas Reid
-
How Microsoft Power BI Elevated My Data Analysis and Visualization Workflow
How Microsoft Power BI Elevated My Data Analysis and Visualization Workflow Explaining useful features every data analyst needs The post How Microsoft Power BI Elevated My Data Analysis and Visualization Workflow appeared first on Towards Data Science. Benjamin Nweke
-
Reinforcement Learning Made Simple: Build a Q-Learning Agent in Python
Reinforcement Learning Made Simple: Build a Q-Learning Agent in Python Inspired by AlphaGo’s Move 37 — learn how agents explore, exploit, and win The post Reinforcement Learning Made Simple: Build a Q-Learning Agent in Python appeared first on Towards Data Science. Sarah Schürch
-
Code Agents: The Future of Agentic AI
Code Agents: The Future of Agentic AI HuggingFace smolagents framework in action The post Code Agents: The Future of Agentic AI appeared first on Towards Data Science. Mariya Mansurova
-
How to Generate Synthetic Data: A Comprehensive Guide Using Bayesian Sampling and Univariate Distributions
How to Generate Synthetic Data: A Comprehensive Guide Using Bayesian Sampling and Univariate Distributions Data makes the engine run in many organisations. But what if the number of observations is too low or there is only expert knowledge? I will demonstrate how to generate synthetic data with applications in predictive maintenance. The post How to…
-
The Best AI Books & Courses for Getting a Job
The Best AI Books & Courses for Getting a Job A comprehensive guide to the books and courses that helped me learn AI The post The Best AI Books & Courses for Getting a Job appeared first on Towards Data Science. Egor Howell
-
Estimating Product-Level Price Elasticities Using Hierarchical Bayesian
Estimating Product-Level Price Elasticities Using Hierarchical Bayesian Using one model to personalize ML results The post Estimating Product-Level Price Elasticities Using Hierarchical Bayesian appeared first on Towards Data Science. Derek Tran
-
Multiple Linear Regression Analysis
Multiple Linear Regression Analysis Implementation of multiple linear regression on real data: Assumption checks, model evaluation, and interpretation of results using Python. The post Multiple Linear Regression Analysis appeared first on Towards Data Science. JUNIOR JUMBONG
-
Inheritance: A Software Engineering Concept Data Scientists Must Know To Succeed
Inheritance: A Software Engineering Concept Data Scientists Must Know To Succeed Coding concepts that distinguish an amateur from a professional data scientist The post Inheritance: A Software Engineering Concept Data Scientists Must Know To Succeed appeared first on Towards Data Science. Benjamin Lee
-
What Statistics Can Tell Us About NBA Coaches
What Statistics Can Tell Us About NBA Coaches Using Python to determine where NBA coaches come from and what makes them successful The post What Statistics Can Tell Us About NBA Coaches appeared first on Towards Data Science. Brayden Gerrard
-
About Calculating Date Ranges in DAX
About Calculating Date Ranges in DAX When performing date calculations, creating date ranges can be helpful. But how can we do this, and which DAX function can help us in which case? Now you can learn more about this topic. The post About Calculating Date Ranges in DAX appeared first on Towards Data Science. Salvatore…
-
Use PyTorch to Easily Access Your GPU
Use PyTorch to Easily Access Your GPU Let’s say you are lucky enough to have access to a system with an Nvidia Graphics Processing Unit (GPU). Did you know there is an absurdly easy method to use your GPU’s capabilities using a Python library intended and predominantly used for machine learning (ML) applications? Don’t worry…
-
Top Machine Learning Jobs and How to Prepare For Them
Top Machine Learning Jobs and How to Prepare For Them These days, job titles like data scientist, machine learning engineer, and AI engineer are everywhere — and if you are anything like me, it can be hard to understand what each of them actually does if you are not working within the field. And then there are titles…
-
I Teach Data Viz with a Bag of Rocks
I Teach Data Viz with a Bag of Rocks Last Thursday, my co-instructor and I showed up to the Data Visualization course we teach at the University of Washington with a bag of rocks. The bag consisted of a fairly diverse collection that I myself put together across a set of treks in various regions…
-
Optimizing Multi-Objective Problems with Desirability Functions
Optimizing Multi-Objective Problems with Desirability Functions When working in Data Science, it is not uncommon to encounter problems with competing objectives. Whether designing products, tuning algorithms or optimizing portfolios, we often need to balance several metrics to get the best possible outcome. Sometimes, maximizing one metric comes at the expense of another, making it hard…
-
Understanding Random Forest using Python (scikit-learn)
Understanding Random Forest using Python (scikit-learn) Decision trees are a popular supervised learning algorithm with benefits that include being able to be used for both regression and classification as well as being easy to interpret. However, decision trees aren’t the most performant algorithm and are prone to overfitting due to small variations in the training…
-
How to Learn the Math Needed for Machine Learning
How to Learn the Math Needed for Machine Learning Maths can be a scary topic for people. Many of you want to work in machine learning, but the maths skills needed may seem overwhelming. I am here to tell you that it’s nowhere as intimidating as you may think and to give you a roadmap, resources,…
-
How To Build a Benchmark for Your Models
How To Build a Benchmark for Your Models I’ve been working as a data science consultant for the past three years, and I’ve had the opportunity to work on multiple projects across various industries. Yet, I noticed one common denominator among most of the clients I worked with: They rarely have a clear idea of…
-
🚪🚪🐐 Lessons in Decision Making from the Monty Hall Problem
🚪🚪🐐 Lessons in Decision Making from the Monty Hall Problem The Monty Hall Problem is a well-known brain teaser from which we can learn important lessons in Decision Making that are useful in general and in particular for data scientists. If you are not familiar with this problem, prepare to be perplexed. If you…
-
The Geospatial Capabilities of Microsoft Fabric and ESRI GeoAnalytics, Demonstrated
The Geospatial Capabilities of Microsoft Fabric and ESRI GeoAnalytics, Demonstrated The saying goes that 80% of data collected, stored and maintained by governments can be associated with geographical locations. Although never empirically proven, it illustrates the importance of location within data. Ever growing data volumes put constraints on systems that handle geospatial data. Common Big…
-
Strength in Numbers: Ensembling Models with Bagging and Boosting
Strength in Numbers: Ensembling Models with Bagging and Boosting Bagging and boosting are two powerful ensemble techniques in machine learning – they are must-knows for data scientists! After reading this article, you are going to have a solid understanding of how bagging and boosting work and when to use them. We’ll cover the following topics,…
-
Efficient Graph Storage for Entity Resolution Using Clique-Based Compression
Efficient Graph Storage for Entity Resolution Using Clique-Based Compression In the world of entity resolution (ER), one of the central challenges is managing and maintaining the complex relationships between records. At its core, Tilores models entities as graphs: each node represents a record, and edges represent rule-based matches between those records. This approach gives us…
-
Parquet File Format – Everything You Need to Know!
Parquet File Format – Everything You Need to Know! With the amount of data growing exponentially in the last few years, one of the biggest challenges has become finding the optimal way to store various data flavors. Unlike in the (not so far) past, when relational databases were considered the only way to go,…
-
Survival Analysis When No One Dies: A Value-Based Approach
Survival Analysis When No One Dies: A Value-Based Approach Survival Analysis is a statistical approach used to answer the question: “How long will something last?” That “something” could range from a patient’s lifespan to the durability of a machine component or the duration of a user’s subscription. One of the most widely used tools in…
-
Rethinking the Environmental Costs of Training AI — Why We Should Look Beyond Hardware
Rethinking the Environmental Costs of Training AI — Why We Should Look Beyond Hardware Summary of This Study Hardware choices – specifically hardware type and its quantity – along with training time, have a significant positive impact on energy, water, and carbon footprints during AI model training, whereas architecture-related factors do not. The interaction between…
-
Pause Your ML Pipelines for Human Review Using AWS Step Functions + Slack
Pause Your ML Pipelines for Human Review Using AWS Step Functions + Slack Have you ever wanted to pause an automated workflow to wait for a human decision? Maybe you need approval before provisioning cloud resources, promoting a machine learning model to production, or charging a customer’s credit card. In many data science and machine learning…
-
Log Link vs Log Transformation in R — The Difference that Misleads Your Entire Data Analysis
Log Link vs Log Transformation in R — The Difference that Misleads Your Entire Data Analysis Although normal distributions are the most commonly used, a lot of real-world data unfortunately is not normal. When faced with extremely skewed data, it’s tempting for us to utilize log transformations to normalize the distribution and stabilize the variance. I…
-
Time Series Forecasting Made Simple (Part 2): Customizing Baseline Models
Time Series Forecasting Made Simple (Part 2): Customizing Baseline Models Thank you for the kind response to Part 1, it’s been encouraging to see so many readers interested in time series forecasting. In Part 1 of this series, we broke down time series data into trend, seasonality, and noise, discussed when to use additive versus…
-
Model Compression: Make Your Machine Learning Models Lighter and Faster
Model Compression: Make Your Machine Learning Models Lighter and Faster Introduction Whether you’re preparing for interviews or building Machine Learning systems at your job, model compression has become a must-have skill. In the era of LLMs, where models are getting larger and larger, the challenges around compressing these models to make them more efficient, smaller,…
-
The Dangers of Deceptive Data Part 2–Base Proportions and Bad Statistics
The Dangers of Deceptive Data Part 2–Base Proportions and Bad Statistics This is a follow-up to my earlier article: The Dangers of Deceptive Data–Confusing Charts and Misleading Headlines. My first article focused on how visualizations can be used to mislead, diving into a form of data presentation widely used in public matters. In this article,…
-
The Shadow Side of AutoML: When No-Code Tools Hurt More Than Help
The Shadow Side of AutoML: When No-Code Tools Hurt More Than Help AutoML has become the gateway drug to machine learning for many organizations. It promises exactly what teams under pressure want to hear: you bring the data, and we’ll handle the modeling. There are no pipelines to manage, no hyperparameters to tune, and no…
-
Generating Data Dictionary for Excel Files Using OpenPyxl and AI Agents
Generating Data Dictionary for Excel Files Using OpenPyxl and AI Agents Introduction Every company I worked for until today, there it was: the resilient MS Excel. Excel was first released in 1985 and has remained strong until today. It has survived the rise of relational databases, the evolution of many programming languages, the Internet with…
-
A Practical Guide to BERTopic for Transformer-Based Topic Modeling
A Practical Guide to BERTopic for Transformer-Based Topic Modeling Topic modeling has a wide range of use cases in the natural language processing (NLP) domain, such as document tagging, survey analysis, and content organization. It falls under the realm of unsupervised learning techniques, making it a very cost-effective technique that reduces the resources required to…
-
Real-Time Interactive Sentiment Analysis in Python
Real-Time Interactive Sentiment Analysis in Python You know what the best part of being an engineer is? You can just build stuff. It’s like a superpower. One rainy afternoon I had this random idea of creating a sentiment visualization of a text input with a smiley face that changes its expression based on how positive…
-
From RGB to HSV — and Back Again
From RGB to HSV — and Back Again Introduction A fundamental concept in Computer Vision is understanding how images are stored and represented. On disk, image files are encoded in various ways, from lossy, compressed JPEG files to lossless PNG files. Once you load an image into a program and decode it from the respective…
-
Regression Discontinuity Design: How It Works and When to Use It
Regression Discontinuity Design: How It Works and When to Use It You’re an avid data scientist and experimenter. You know that randomisation is the summit of Mount Evidence Credibility, and you also know that when you can’t randomise, you resort to observational data and…
-
Retrieval Augmented Classification: Improving Text Classification with External Knowledge
Retrieval Augmented Classification: Improving Text Classification with External Knowledge Text Classification stands as one of the most basic yet most important applications of natural language processing. It has a vital role in many real-world applications, ranging from filtering unwanted emails like spam to detecting product categories and classifying user intent in a chat-bot application. The…
-
How I Built Business-Automating Workflows with AI Agents
How I Built Business-Automating Workflows with AI Agents AI agents and automation are no longer just a trend — they are transforming how companies operate. In a previous article, I shared several case studies of AI Agents supporting the sustainability roadmaps of small, medium and large companies. AI Agents for Sustainability — (Image by Samir Saci) This is part of a…
-
Making Sense of KPI Changes
Making Sense of KPI Changes As analysts, we are usually monitoring metrics. Quite often, metrics change. And when they do, it’s our job to figure out what’s going on: why did the conversion rate suddenly drop, or what is driving consistent revenue growth? I started my journey in data analytics as a KPI analyst. For almost…
-
Fine-Tuning vLLMs for Document Understanding
Fine-Tuning vLLMs for Document Understanding In this article, I discuss how you can fine-tune VLMs (visual large language models, often called vLLMs) like Qwen 2.5 VL 7B. I will introduce you to a dataset of handwritten digits, which the base version of Qwen 2.5 VL struggles with. We will then inspect the dataset, annotate it,…
-
Build and Query Knowledge Graphs with LLMs
Build and Query Knowledge Graphs with LLMs Knowledge Graphs are relevant A Knowledge Graph could be defined as a structured representation of information that connects concepts, entities, and their relationships in a way that mimics human understanding. It is often used to organise and integrate data from various sources, enabling machines to reason, infer, and retrieve relevant…
-
The Shape‑First Tune‑Up Provides Organizations with a Means to Reduce MongoDB Expenses by 79%
The Shape‑First Tune‑Up Provides Organizations with a Means to Reduce MongoDB Expenses by 79% TL;DR A fast‑growing SaaS woke up to a silent auto‑scale from M20 → M60, adding 20% to their cloud bill overnight. In a frantic 48‑hour sprint we: flattened N + 1 waterfalls with $lookup, tamed unbounded cursors with projection,…
-
Agentic AI 101: Starting Your Journey Building AI Agents
Agentic AI 101: Starting Your Journey Building AI Agents Introduction The Artificial Intelligence industry is moving fast. It is impressive and many times overwhelming. I have been studying, learning, and building my foundations in this area of Data Science because I believe that the future of Data Science is strongly correlated with the development of…
-
Rust for Python Developers: Why You Should Take a Look at the Rust Programming Language
Rust for Python Developers: Why You Should Take a Look at the Rust Programming Language The programming language Rust is now appearing in many feeds as it offers a performant and secure way to write programs and places great emphasis on performance. If you come from the Python world of Pandas, Jupyter or Flask, you might think that…
-
How Would I Learn to Code with ChatGPT if I Had to Start Again
How Would I Learn to Code with ChatGPT if I Had to Start Again Coding has been a part of my life since I was 10. From modifying HTML & CSS for my Friendster profile during the simple internet days to exploring SQL injections for the thrill, building a three-legged robot for fun, and lately…
-
Why Are Convolutional Neural Networks Great For Images?
Why Are Convolutional Neural Networks Great For Images? The Universal Approximation Theorem states that a neural network with a single hidden layer and a nonlinear activation function can approximate any continuous function. Practical issues aside, such as the fact that the number of neurons in this hidden layer would grow enormously large, we do not need other network architectures. A simple…
-
Data Analyst or Data Engineer or Analytics Engineer or BI Engineer?
Data Analyst or Data Engineer or Analytics Engineer or BI Engineer? If you’ve followed me for a while, you probably know I started my career as a QA engineer before transitioning into the world of data analytics. I didn’t go to school for it, didn’t have a mentor, and didn’t land in a formal training…
-
AI Agents for a More Sustainable World
AI Agents for a More Sustainable World As political support for sustainability weakens, the need for long-term sustainable practices has never been more critical. How can we use analytics, boosted by agentic AI, to support companies in their green transformation? For years, the focus of my blog was always on using Supply Chain Analytics methodologies…
-
If I Wanted to Become a Machine Learning Engineer, I’d Do This
If I Wanted to Become a Machine Learning Engineer, I’d Do This If I wanted to become a machine learning engineer again, this is the exact process I would follow. Let’s get into it! First become a data scientist or software engineer I’ve said it before, but a machine learning engineer is not exactly an entry-level position.…
-
How to Ensure Your AI Solution Does What You Expect It to Do
How to Ensure Your AI Solution Does What You Expect It to Do Generative AI (GenAI) is evolving fast — and it’s no longer just about fun chatbots or impressive image generation. 2025 is the year where the focus is on turning the AI hype into real value. Companies everywhere are looking into ways to…
-
Struggling to Land a Data Role in 2025? These 5 Tips Will Change That
Struggling to Land a Data Role in 2025? These 5 Tips Will Change That Breaking into the tech world is no longer as easy (or glamorous) as it used to be. Lots of people are finding it difficult to find their way into the current tech market. This can be due to lots of reasons…
-
NumExpr: The “Faster than Numpy” Library Most Data Scientists Have Never Used
NumExpr: The “Faster than Numpy” Library Most Data Scientists Have Never Used Browsing GitHub the other day, I came across a library I’d never heard of before. It was called NumExpr. I was immediately interested because of some claims made about the library. In particular, it stated that for some complex numerical calculations, it was…
-
LLM Evaluations: from Prototype to Production
LLM Evaluations: from Prototype to Production Evaluation is the cornerstone of any machine learning product. Investing in quality measurement delivers significant returns. Let’s explore the potential business benefits. As management consultant and writer Peter Drucker once said, “If you can’t measure it, you can’t improve it.” Building a robust evaluation system helps you identify areas…
-
Government Funding Graph RAG
Government Funding Graph RAG In this article, I present my latest open-source project — Government Funding Graph. The inspiration for this project came from a desire to make better tooling for grant writing, namely to suggest research topics, funding bodies, research institutions, and researchers. I have made Innovate UK grant applications in the past, so I have…
-
Predicting the NBA Champion with Machine Learning
Predicting the NBA Champion with Machine Learning Every NBA season, 30 teams compete for something only one will achieve: the legacy of a championship. From power rankings to trade deadline chaos and injuries, fans and analysts alike speculate endlessly about who will raise the Larry O’Brien Trophy. But what if we could go beyond the hot…
-
Exporting MLflow Experiments from Restricted HPC Systems
Exporting MLflow Experiments from Restricted HPC Systems Many High-Performance Computing (HPC) environments, especially in research and educational institutions, restrict communications to outbound TCP connections. Running a simple command-line ping or curl with the MLflow tracking URL on the HPC bash shell to check packet transfer can be successful. However, communication fails and times out while…
-
Data Science: From School to Work, Part IV
Data Science: From School to Work, Part IV Introduction Let’s start with a simple example that will appeal to most of us. If you want to check if the blinkers of your car are working properly, you sit in the car, turn on the ignition and test a turn signal to see if the front…
-
AI Agents Processing Time Series and Large Dataframes
AI Agents Processing Time Series and Large Dataframes Intro Agents are AI systems, powered by LLMs, that can reason about their objectives and take actions to achieve a final goal. They are designed not just to respond to queries, but to orchestrate a sequence of operations, including processing data (i.e. dataframes and time series). This…
-
Building a Personal API for Your Data Projects with FastAPI
Building a Personal API for Your Data Projects with FastAPI How many times have you had a messy Jupyter Notebook filled with copy-pasted code just to re-use some data wrangling logic? Whether you do it for passion or for work, if you code a lot, then you’ve probably answered something like “way too many”. You’re…
-
How to Write Queries for Tabular Models with DAX
How to Write Queries for Tabular Models with DAX Introduction EVALUATE is the statement to query tabular models. Unfortunately, knowing SQL or any other query language doesn’t help as EVALUATE follows a different concept. EVALUATE has only two “Parameters”: A table to show A sort order (ORDER BY) You can pass a third parameter (START…
-
Google’s New AI System Outperforms Physicians in Complex Diagnoses
Google’s New AI System Outperforms Physicians in Complex Diagnoses Imagine going to the doctor with a baffling set of symptoms. Getting the right diagnosis quickly is crucial, but sometimes even experienced physicians face challenges piecing together the puzzle. Sometimes it might not be something serious at all; other times a deep investigation might be required. No…
-
When Predictors Collide: Mastering VIF in Multicollinear Regression
When Predictors Collide: Mastering VIF in Multicollinear Regression In regression models, the independent variables should be independent of each other, or at most only slightly dependent, i.e. they should not be strongly correlated. However, if such a dependency exists, this is referred to as Multicollinearity and leads to unstable models and results that are difficult to interpret. The…
-
Plotly’s AI Tools Are Redefining Data Science Workflows
Plotly’s AI Tools Are Redefining Data Science Workflows Is there anything more frustrating than building a powerful data model but then struggling to turn it into a tool stakeholders can use to achieve their desired outcome? Data Science has never been short on potential but is also never short on complexity. You can refine algorithms…
-
An LLM-Based Workflow for Automated Tabular Data Validation
An LLM-Based Workflow for Automated Tabular Data Validation This article is part of a series of articles on automating data cleaning for any tabular dataset: Effortless Spreadsheet Normalisation With LLM You can test the feature described in this article on your own dataset using the CleanMyExcel.io service, which is free and requires no registration. What…
-
Are You Sure Your Posterior Makes Sense?
Are You Sure Your Posterior Makes Sense? This article is co-authored by Felipe Bandeira, Giselle Fretta, Thu Than, and Elbion Redenica. We also thank Prof. Carl Scheffler for his support. Introduction Parameter estimation has been for decades one of the most important topics in statistics. While frequentist approaches, such as Maximum Likelihood Estimations, used to…
-
The Basis of Cognitive Complexity: Teaching CNNs to See Connections
The Basis of Cognitive Complexity: Teaching CNNs to See Connections “Liberating education consists in acts of cognition, not transferrals of information.” (Paulo Freire) One of the most heated discussions around artificial intelligence is: What aspects of human learning is it capable of capturing? Many authors suggest that artificial intelligence models do not possess the same…
-
The Invisible Revolution: How Vectors Are (Re)defining Business Success
The Invisible Revolution: How Vectors Are (Re)defining Business Success In a world that focuses more on data, business leaders must understand vector thinking. At first, vectors may appear as complicated as algebra was in school, but they serve as a fundamental building block. Vectors are as essential as algebra for tasks like sharing a bill…
-
How to Measure Real Model Accuracy When Labels Are Noisy
How to Measure Real Model Accuracy When Labels Are Noisy Ground truth is never perfect. From scientific measurements to human annotations used to train deep learning models, ground truth always has some amount of errors. ImageNet, arguably the most well-curated image dataset, has 0.3% errors in human annotations. Then, how can we evaluate predictive models…
-
Ivory Tower Notes: The Problem
Ivory Tower Notes: The Problem Did you ever spend months on a Machine Learning project, only to discover you never defined the “correct” problem at the start? If so, or even if not, and you are only starting with the data science or AI field, welcome to my first Ivory Tower Note, where I will address…
-
Why CatBoost Works So Well: The Engineering Behind the Magic
Why CatBoost Works So Well: The Engineering Behind the Magic Gradient boosting is a cornerstone technique for modeling tabular data due to its speed and simplicity. It delivers great results without any fuss. When you look around, you’ll see multiple options like LightGBM, XGBoost, etc. CatBoost is one such variant. In this post, we will…
-
Time Series Forecasting Made Simple (Part 1): Decomposition and Baseline Models
Time Series Forecasting Made Simple (Part 1): Decomposition and Baseline Models I used to avoid time series analysis. Every time I took an online course, I’d see a module titled “Time Series Analysis” with subtopics like Fourier Transforms, autocorrelation functions and other intimidating terms. I don’t know why, but I always found a reason to avoid…
-
Mining Rules from Data
Mining Rules from Data Working with products, we might face a need to introduce some “rules”. Let me explain what I mean by “rules” in practical examples: Imagine that we’re seeing a massive wave of fraud in our product, and we want to restrict onboarding for a particular segment of customers to lower this risk. For…
-
A Data Scientist’s Guide to Docker Containers
A Data Scientist’s Guide to Docker Containers For an ML model to be useful, it needs to run somewhere. This somewhere is most likely not your local machine. A not-so-good model that runs in a production environment is better than a perfect model that never leaves your local machine. However, the production machine is usually…
-
Unlock the Power of ROC Curves: Intuitive Insights for Better Model Evaluation
Unlock the Power of ROC Curves: Intuitive Insights for Better Model Evaluation We’ve all been in that moment, right? Staring at a chart as if it’s some ancient script, wondering how we’re supposed to make sense of it all. That’s exactly how I felt when I was asked to explain the AUC for the ROC…
-
Let’s Call a Spade a Spade: RDF and LPG — Cousins Who Should Learn to Live Together
Let’s Call a Spade a Spade: RDF and LPG — Cousins Who Should Learn to Live Together In recent years, there has been a proliferation of articles, LinkedIn posts, and marketing materials presenting graph data models from different perspectives. This article will refrain from discussing specific products and instead focus solely on the comparison of…
-
How I Would Learn To Code (If I Could Start Over)
How I Would Learn To Code (If I Could Start Over) According to various sources, the average salary for coding jobs is ~£47.5k in the UK, which is ~35% higher than the median salary of about £35k. So, coding is a very valuable skill that will earn you more money, not to mention it’s really fun.…
-
Creating an AI Agent to Write Blog Posts with CrewAI
Creating an AI Agent to Write Blog Posts with CrewAI Introduction I love writing. You may notice that if you follow me or my blog. For that reason, I am constantly producing new content and talking about Data Science and Artificial Intelligence. I discovered this passion a couple of years ago when I was just…
-
Are We Watching More Ads Than Content? Analyzing YouTube Sponsor Data
Are We Watching More Ads Than Content? Analyzing YouTube Sponsor Data I’m definitely not the only person who feels that YouTube sponsor segments have become longer and more frequent recently. Sometimes, I watch videos that seem to be trying to sell me something every couple of seconds. On one hand, it’s great that both small and…
-
Linear Programming: Managing Multiple Targets with Goal Programming
Linear Programming: Managing Multiple Targets with Goal Programming This is the sixth (and likely last) part of a Linear Programming series I’ve been writing. With the core concepts covered by the prior articles, this article focuses on goal programming, which is a less frequent linear programming (LP) use case. Goal programming is a specific linear…
-
PyScript vs. JavaScript: A Battle of Web Titans
PyScript vs. JavaScript: A Battle of Web Titans We’re delving into frontend web development today, and you might be thinking: what does this have to do with Data Science? Why is Towards Data Science publishing a post related to web dev? Well, because data science isn’t only about building powerful models, engaging in advanced analytics,…
-
4 Levels of GitHub Actions: A Guide to Data Workflow Automation
4 Levels of GitHub Actions: A Guide to Data Workflow Automation Automation has become an indispensable element for ensuring operational efficiency and reliability in modern software development. GitHub Actions, an integrated Continuous Integration and Continuous Deployment (CI/CD) tool within GitHub, has established its position in the software development industry by providing a comprehensive platform for…
-
Create Your Supply Chain Analytics Portfolio to Land Your Dream Job
Create Your Supply Chain Analytics Portfolio to Land Your Dream Job Supply chains are under pressure like never before. From climate-driven disruptions to geopolitical shifts, businesses must adapt to rising costs, new trade barriers and growing sustainability demands. In this new world where supply chains face uncertainty, Supply Chain Analytics is essential to keeping operations resilient. Samir, can…
-
The Art of Hybrid Architectures
The Art of Hybrid Architectures In my previous article, I discussed how morphological feature extractors mimic the way biological experts visually assess images. This time, I want to go a step further and explore a new question: Can different architectures complement each other to build an AI that “sees” like an expert? Introduction: Rethinking Model Architecture…
-
A Little More Conversation, A Little Less Action — A Case Against Premature Data Integration
A Little More Conversation, A Little Less Action — A Case Against Premature Data Integration When I talk to [large] organisations that have not yet properly started with Data Science (DS) and Machine Learning (ML), they often tell me that they have to run a data integration project first, because “…all the data is scattered…
-
Master the 3D Reconstruction Process: A Step-by-Step Guide
Master the 3D Reconstruction Process: A Step-by-Step Guide The 3D reconstruction journey from 2D photographs to 3D models follows a structured path. This path consists of distinct steps that build upon each other to transform flat images into spatial information. Understanding this pipeline is crucial for anyone looking to create high-quality 3D reconstructions. Let me…