Category: getting-started

Implementing the Hangman Game in Python

A beginner-friendly project to understand variables, loops, and conditions in Python. By Mahnoor Javed, Towards Data Science.

August 29, 2025
Mastering Hadoop, Part 1: Installation, Configuration, and Modern Big Data Strategies

Large amounts of data are now collected on the internet, so companies face the challenge of storing, processing, and analyzing these volumes efficiently. Hadoop is an open-source framework from the Apache Software Foundation and has become…

March 12, 2025
Linear Regression in Time Series: Sources of Spurious Regression

It’s pretty clear that most of our work will be automated by AI in the future. This will be possible because many researchers and professionals are working hard to make their work available online. These contributions not only help us understand fundamental concepts but…

March 11, 2025
How to Train LLMs to “Think” (o1 & DeepSeek-R1)

In September 2024, OpenAI released its o1 model, trained on large-scale reinforcement learning, giving it “advanced reasoning” capabilities. Unfortunately, the details of how they pulled this off were never shared publicly. Today, however, DeepSeek (an AI research lab) has replicated this reasoning behavior and published the…

March 4, 2025
How LLMs Work: Reinforcement Learning, RLHF, DeepSeek R1, OpenAI o1, AlphaGo

Welcome to Part 2 of my LLM deep dive. If you haven’t read Part 1, I highly encourage you to check it out first. Previously, we covered the first two major stages of training an LLM: Pre-training: learning from massive datasets to form a base…

February 28, 2025
Why Data Scientists Should Care about Containers — and Stand Out with This Knowledge

“I train models, analyze data and create dashboards. Why should I care about containers?” Many people who are new to the world of data science ask themselves this question. But imagine you have trained a model that runs perfectly on…

February 20, 2025
Virtualization & Containers for Data Science Newbies

Virtualization makes it possible to run multiple virtual machines (VMs) on a single piece of physical hardware. These VMs behave like independent computers, but share the same physical computing power. A computer within a computer, so to speak. Many cloud services rely on virtualization. But other technologies, such…

February 12, 2025
NLP Illustrated, Part 3: Word2Vec

An exhaustive and illustrated guide to Word2Vec with code! By Shreya Rao, Towards Data Science.

January 30, 2025
Water Cooler Small Talk, Ep 7: Anscombe’s Quartet and the Datasaurus

Why descriptive statistics aren’t enough and plotting your data is always essential. By Maria Mouschoutzi, PhD, Towards Data Science.

January 28, 2025
Choosing Classification Model Evaluation Criteria

Is Recall / Precision better than Sensitivity / Specificity? By Viyaleta Apgar, Towards Data Science.

January 26, 2025
Large Language Models: A Short Introduction

And why you should care about LLMs. There’s an acronym you’ve probably heard non-stop for the past few years: LLM, which stands for Large Language Model. In this article we’re going to take a brief look at what LLMs are, why they’re an extremely exciting piece of technology, why…

January 22, 2025
The Concepts Data Professionals Should Know in 2025: Part 1

From Data Lakehouses to Event-Driven Architecture: master 12 data concepts and turn them into simple projects to stay ahead in IT. By Sarah Lea, Towards Data Science.

January 20, 2025
Basics of GANs & SMOTE for Data Augmentation

GANs and SMOTE Explained with Bartending: Data Science for Machine Learning Series (1). By Sunghyun Ahn, Towards Data Science.

January 16, 2025
Qubits Explained: Everything You Need to Know

A deep dive into the building block of quantum computers. By Sara A. Metwalli, Towards Data Science.

January 16, 2025
What is MicroPython? Do I Need to Know it as a Data Scientist?

In this year’s edition of the Stack Overflow survey, MicroPython appears among the Most Popular Technologies with 1.6%. But why? By Sarah Lea, Towards Data Science.

January 13, 2025
Missing Data in Time-Series? Machine Learning Techniques (Part 2)

Using Clustering Algorithms to Handle Missing Time-Series Data. By Sara Nóbrega, Towards Data Science.

January 9, 2025
How To Learn Math for Machine Learning, Fast

Even with zero math background. Do you want to become a data scientist or machine learning engineer, but you feel intimidated by all the math involved? I get it. I’ve been there. I dropped out of high school after 10th grade, so I…

January 8, 2025
The State of Quantum Computing: Where Are We Today?

And what we need to overcome. By Sara A. Metwalli, Towards Data Science.

January 7, 2025
Encapsulation: A Software Engineering Concept Data Scientists Must Know To Succeed

Simple concepts that differentiate professionals from amateurs. By Benjamin Lee, Towards Data Science.

January 7, 2025
Lessons from COVID-19: Why Probability Distributions Matter

Understanding Distributions with Extremes: Probability for Data Science Series (END). By Sunghyun Ahn, Towards Data Science.

December 31, 2024
Superposition: What Makes it Difficult to Explain Neural Network

When there are more features than model dimensions. It would be ideal if the world of neural networks represented a one-to-one relationship: each neuron activates on one and only one feature. In such a world, interpreting the model would be straightforward: this neuron fires for…

December 30, 2024
Propensity-Score Matching Is the Bedrock of Causal Inference

And how to get started with it using Python. By Ari Joury, PhD, Towards Data Science.

December 23, 2024
The Algorithm That Made Google Google

How PageRank transformed the way we search the internet, and why it still plays an important role in LLMs with Graph RAG. By Cristian Leo, Towards Data Science.

December 19, 2024
A Case for Bagging and Boosting as Data Scientists’ Best Friends

Leveraging the wisdom of the crowd in ML models. By Farzad Nobar, Towards Data Science.

December 17, 2024
Master Machine Learning: 4 Classification Models Made Simple

A Beginner’s Guide to Building Models in 15 Practical Steps. By Leo Anello, Towards Data Science.

December 15, 2024
How to Interpret Matrix Expressions — Transformations

Matrix algebra for a data scientist. This article begins a series for anyone who finds matrix algebra overwhelming. My goal is to turn what you’re afraid of into what you’re fascinated by. You’ll find it especially helpful if you want to understand machine learning concepts…

December 5, 2024
Why “Statistical Significance” Is Pointless

Here’s a better framework for data-driven decision-making. By Samuele Mazzanti, Towards Data Science.

December 2, 2024