Category: data-quality
-
Stop Writing Spaghetti if-else Chains: Parsing JSON with Python’s match-case
If you work in data science, data engineering, or as a frontend/backend developer, you deal with JSON. For professionals, it’s basically only death, taxes, and JSON parsing that are inevitable. The issue is that parsing JSON is often a serious pain. Whether you are…
-
How to Use Simple Data Contracts in Python for Data Scientists
Stop your pipelines from breaking on Friday afternoons using simple, open-source validation with Pandera. The post How to Use Simple Data Contracts in Python for Data Scientists appeared first on Towards Data Science. Eirik Berge
-
Work Data Is the Next Frontier for GenAI
9 reasons why work data is the single most valuable data source for LLM training, uniquely capable of propelling LLM performance to unprecedented heights. The post Work Data Is the Next Frontier for GenAI appeared first on Towards Data Science. Zsombor Varnagy-Toth
-
Change-Aware Data Validation with Column-Level Lineage
Data transformation tools like dbt make constructing SQL data pipelines easy and systematic. But even with the added structure and clearly defined data models, pipelines can still become complex, which makes debugging issues and validating changes to data models difficult. The post Change-Aware Data Validation with Column-Level Lineage appeared…
-
Data Has No Moat!
Only if you ignore data quality. The post Data Has No Moat! appeared first on Towards Data Science. Fabiana Clemente
-
An LLM-Based Workflow for Automated Tabular Data Validation
This article is part of a series of articles on automating data cleaning for any tabular dataset: Effortless Spreadsheet Normalisation With LLM. You can test the feature described in this article on your own dataset using the CleanMyExcel.io service, which is free and requires no registration. What…
-
The Case Against Centralized Medallion Architecture
Why tailored, decentralized data quality trumps the medallion architecture. Continue reading on Towards Data Science » Bernd Wessely