Category: data

  • From ‘Dataslows’ to Dataflows: The Gen2 Performance Revolution in Microsoft Fabric

    Dataflows were (rightly?) considered “the slowest and least performant option” for ingesting data into Power BI/Microsoft Fabric. However, things are changing rapidly, and the latest Dataflow enhancements change how we play the game.

  • Confusion Matrix Made Simple: Accuracy, Precision, Recall & F1-Score

    How to evaluate classification models and understand which metric matters the most. Nikhil Dasari, Towards Data Science.

  • End-to-End AWS RDS Setup with Bastion Host Using Terraform

    Learn how to automate secure AWS infrastructure using Terraform, including VPC, public/private subnets, a MySQL RDS database, and a Bastion host for secure access. Yagmur Gulec, Towards Data Science.

  • What Can the History of Data Tell Us About the Future of AI?

    A 40-Year Look at Data, Business Models, and the Forces Shaping Intelligent Systems. Steve Hedden, Towards Data Science.

  • How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1

    From architectural design to food security. Marco Hening Tallarico, Towards Data Science.

  • From Pixels to Plots

    How I built an AI-powered prototype to turn images into insights. Jens Winkelmann, Towards Data Science.

  • Parquet File Format – Everything You Need to Know!

    With the amount of data growing exponentially in the last few years, one of the biggest challenges has become finding the best way to store various data flavors. Unlike in the (not so distant) past, when relational databases were considered the only way to go,…

  • Stop Creating Bad DAGs — Optimize Your Airflow Environment By Improving Your Python Code

    Valuable tips to reduce your DAGs’ parse time and save resources. Apache Airflow is one of the most popular orchestration tools in the data field, powering workflows…

  • Basics of Probability Notations

    Union, Intersection, Independence, Disjoint, Complement: Advanced Probability for Data Science Series (1). Sunghyun Ahn, Towards Data Science.

  • In Defense of Statistical Significance

    We have to draw the line somewhere. It’s become something of a meme that statistical significance is a bad standard. Several recent blogs have made the rounds, making the case that statistical significance is a “cult” or “arbitrary.” If you’d like a classic polemic (and…

  • Data behind the Luck, Ambition, and a Billion-Dollar Dream: Lottery

    Using Seattle’s local retail store data to study consumer lottery patterns (SQL, Python). Sunghyun Ahn, Towards Data Science.

  • Chi-Squared Test: Comparing Variations Through Soccer

    Understanding Different Types of Chi-Squared Tests: A/B Testing for Data Science Series (8). Sunghyun Ahn, Towards Data Science.

  • Probability Distributions: Poisson vs. Binomial Distribution

    Using Soccer to Understand the Difference Between Poisson & Binomial: Probability for Data Science Series (3). Sunghyun Ahn, Towards Data Science.

  • Bayes’ Theorem: Understanding business outcomes with evidence

    A practical introduction to Bayes’ Theorem: Probability for Data Science Series (2). Sunghyun Ahn, Towards Data Science.

  • Data Valuation — A Concise Overview

    Understanding the Value of your Data: Challenges, Methods, and Applications. ChatGPT and similar LLMs were trained on insane amounts of data. OpenAI and Co. scraped the internet, collecting books, articles, and social media posts to train their models. It’s easy to imagine that some of the texts (like scientific or news…