Category: getting-started
-
Implementing the Hangman Game in Python
A beginner-friendly project to understand variables, loops, and conditions in Python. By Mahnoor Javed (Towards Data Science).
-
Mastering Hadoop, Part 1: Installation, Configuration, and Modern Big Data Strategies
A large amount of data is now collected on the internet, so companies face the challenge of storing, processing, and analyzing these volumes efficiently. Hadoop is an open-source framework from the Apache Software Foundation and has become…
-
Linear Regression in Time Series: Sources of Spurious Regression
It’s pretty clear that most of our work will be automated by AI in the future. This will be possible because many researchers and professionals are working hard to make their work available online. These contributions not only help us understand fundamental concepts but…
-
How to Train LLMs to “Think” (o1 & DeepSeek-R1)
In September 2024, OpenAI released its o1 model, trained on large-scale reinforcement learning, giving it “advanced reasoning” capabilities. Unfortunately, the details of how they pulled this off were never shared publicly. Today, however, DeepSeek (an AI research lab) has replicated this reasoning behavior and published the…
-
How LLMs Work: Reinforcement Learning, RLHF, DeepSeek R1, OpenAI o1, AlphaGo
Welcome to part 2 of my LLM deep dive. If you’ve not read Part 1, I highly encourage you to check it out first. Previously, we covered the first two major stages of training an LLM: Pre-training: learning from massive datasets to form a base…
-
Why Data Scientists Should Care about Containers — and Stand Out with This Knowledge
“I train models, analyze data and create dashboards. Why should I care about containers?” Many people who are new to the world of data science ask themselves this question. But imagine you have trained a model that runs perfectly on…
-
Virtualization & Containers for Data Science Newbies
Virtualization makes it possible to run multiple virtual machines (VMs) on a single piece of physical hardware. These VMs behave like independent computers, but share the same physical computing power. A computer within a computer, so to speak. Many cloud services rely on virtualization. But other technologies, such…
-
NLP Illustrated, Part 3: Word2Vec
An exhaustive and illustrated guide to Word2Vec with code! By Shreya Rao (Towards Data Science).
-
Water Cooler Small Talk, Ep 7: Anscombe’s Quartet and the Datasaurus
Why descriptive statistics aren’t enough and plotting your data is always essential. By Maria Mouschoutzi, PhD (Towards Data Science).
-
Choosing Classification Model Evaluation Criteria
Is Recall / Precision better than Sensitivity / Specificity? By Viyaleta Apgar (Towards Data Science).
-
Large Language Models: A Short Introduction
And why you should care about LLMs. There’s an acronym you’ve probably heard non-stop for the past few years: LLM, which stands for Large Language Model. In this article we’re going to take a brief look at what LLMs are, why they’re an extremely exciting piece of technology, why…
-
The Concepts Data Professionals Should Know in 2025: Part 1
From data lakehouses to event-driven architecture: master 12 data concepts and turn them into simple projects to stay ahead in IT. By Sarah Lea (Towards Data Science).
-
Basics of GANs & SMOTE for Data Augmentation
GANs and SMOTE Explained with Bartending: Data Science for Machine Learning Series (1). By Sunghyun Ahn (Towards Data Science).
-
Qubits Explained: Everything You Need to Know
A deep dive into the building block of quantum computers. By Sara A. Metwalli (Towards Data Science).
-
What is MicroPython? Do I Need to Know it as a Data Scientist?
In this year’s edition of the Stack Overflow survey, MicroPython appears among the Most Popular Technologies with 1.6%. But why? By Sarah Lea (Towards Data Science).
-
Missing Data in Time-Series? Machine Learning Techniques (Part 2)
Using Clustering Algorithms to Handle Missing Time-Series Data. By Sara Nóbrega (Towards Data Science).
-
How To Learn Math for Machine Learning, Fast
Even with zero math background. Do you want to become a data scientist or machine learning engineer, but feel intimidated by all the math involved? I get it. I’ve been there. I dropped out of High School after 10th grade, so I…
-
The State of Quantum Computing: Where Are We Today?
And what we need to overcome. By Sara A. Metwalli (Towards Data Science).
-
Encapsulation: A Software Engineering Concept Data Scientists Must Know To Succeed
Simple concepts that differentiate professionals from amateurs. By Benjamin Lee (Towards Data Science).
-
Lessons from COVID-19: Why Probability Distributions Matter
Understanding Distributions with Extremes: Probability for Data Science Series (END). By Sunghyun Ahn (Towards Data Science).
-
Superposition: What Makes it Difficult to Explain Neural Network
When there are more features than model dimensions. It would be ideal if the world of neural networks represented a one-to-one relationship: each neuron activates on one and only one feature. In such a world, interpreting the model would be straightforward: this neuron fires for…
-
Propensity-Score Matching Is the Bedrock of Causal Inference
And how to get started with it using Python. By Ari Joury, PhD (Towards Data Science).
-
The Algorithm That Made Google Google
How PageRank transformed the way we search the internet, and why it still plays an important role in LLMs with Graph RAG. By Cristian Leo (Towards Data Science).
-
A Case for Bagging and Boosting as Data Scientists’ Best Friends
Leveraging the wisdom of the crowd in ML models. By Farzad Nobar (Towards Data Science).
-
Master Machine Learning: 4 Classification Models Made Simple
A Beginner’s Guide to Building Models in 15 Practical Steps. By Leo Anello (Towards Data Science).
-
How to Interpret Matrix Expressions — Transformations
Matrix algebra for a data scientist. This article begins a series for anyone who finds matrix algebra overwhelming. My goal is to turn what you’re afraid of into what you’re fascinated by. You’ll find it especially helpful if you want to understand machine learning concepts…
-
Why “Statistical Significance” Is Pointless
Here’s a better framework for data-driven decision-making. By Samuele Mazzanti (Towards Data Science).