Apache Spark is a lightning-fast cluster computing technology, designed for fast computation. It is based on Hadoop MapReduce and it extends the MapReduce model to efficiently use it for more types of computations, which includes interactive queries and stream processing. Read More …
Tag: Spark
ETL with PySpark – Intro
Data transformation is an essential step in the data processing pipeline, especially when working with big data platforms like PySpark. In this article, we’ll explore the different types of data transformations you can perform using PySpark, complete with easy-to-understand code Read More …