Spark Archives - Techfura

Intro to Apache Spark

May 8, 2024 techfura

Apache Spark is a cluster computing framework designed for fast computation. It builds on the MapReduce model popularized by Hadoop and extends it to efficiently support more types of computation, including interactive queries and stream processing. Read More …
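
For readers new to the MapReduce model mentioned in this excerpt, here is a minimal sketch of its three phases (map, shuffle, reduce) as a word count in plain Python. It does not use the Spark or Hadoop APIs; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Group the emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values for each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Illustrative input only; a real cluster job would read distributed data.
    sample = ["spark extends mapreduce", "spark is fast"]
    print(word_count(sample))
```

In a real cluster each phase runs in parallel across machines; this single-process version only shows how the data flows from one phase to the next.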

ETL with PySpark – Intro

May 8, 2024 techfura

Data transformation is an essential step in any data processing pipeline, especially when working with big data tools like PySpark. In this article, we’ll explore the different types of data transformations you can perform using PySpark, complete with easy-to-understand code Read More …