Category: data-processing

  • Real-Time Intelligence in Microsoft Fabric: The Ultimate Guide

    Once upon a time, handling streaming data was considered an avant-garde approach. From the introduction of relational database management systems in the 1970s and traditional data warehousing systems in the late 1980s, all data workloads began and ended with so-called batch processing. Batch processing relies on the concept of…

  • Preparing Video Data for Deep Learning: Introducing Vid Prepper

    A guide to fast video data preprocessing for machine learning. By Jamie Petherbridge-Conroy; the post appeared first on Towards Data Science.

  • AI Agents Processing Time Series and Large Dataframes

    Agents are AI systems, powered by LLMs, that can reason about their objectives and take actions to achieve a final goal. They are designed not just to respond to queries but to orchestrate a sequence of operations, including processing data (e.g. dataframes and time series). This…

  • Beginner’s Guide to Creating a S3 Storage on AWS

    AWS is a well-known cloud provider whose primary goal is to allocate server resources for software engineers to deploy their applications. AWS offers many services, one of which is EC2, which provides virtual machines for running software applications in the cloud. However, for data-intensive applications, storing…

  • Mastering Hadoop, Part 3: Hadoop Ecosystem: Get the most out of your cluster

    As we have already seen with the basic components (Part 1, Part 2), the Hadoop ecosystem is constantly evolving and being optimized for new applications. As a result, various tools and technologies have been developed over time that make Hadoop more powerful and…

  • Mastering Hadoop, Part 2: Getting Hands-On — Setting Up and Scaling Hadoop

    Now that we’ve explored Hadoop’s role and relevance, it’s time to show you how it works under the hood and how you can start working with it. To start, we break down Hadoop’s core components: HDFS for storage, MapReduce for processing,…

  • Mastering Hadoop, Part 1: Installation, Configuration, and Modern Big Data Strategies

    Nowadays, large amounts of data are collected on the internet, so companies face the challenge of storing, processing, and analyzing these volumes efficiently. Hadoop is an open-source framework from the Apache Software Foundation and has become…

  • Pandas Can’t Handle This: How ArcticDB Powers Massive Datasets

    Python has grown to dominate data science, and its package Pandas has become the go-to tool for data analysis. It is great for tabular data and handles data files of up to 1 GB if you have enough RAM. Within these size limits, it is also…

  • Three Important Pandas Functions You Need to Know

    Master these techniques to stand out as a Python developer. By Jiayan Yin; continue reading on Towards Data Science.