Apache Spark is a fast, general-purpose cluster computing engine. It builds on the Hadoop MapReduce model and extends it to support more types of computation efficiently, including interactive queries and stream processing. Read More …
Category: Spark
Spark SQL – Part I
The world of data is exploding. Businesses are generating massive datasets from various sources – customer transactions, sensor readings, social media feeds, and more. Analyzing this data is crucial for uncovering valuable insights, informing decisions, and gaining a competitive edge. Read More …
ETL with PySpark – Intro
Data transformation is an essential step in the data processing pipeline, especially when working with big data platforms like PySpark. In this article, we’ll explore the different types of data transformations you can perform using PySpark, complete with easy-to-understand code Read More …
Spark DataFrame Cheat Sheet
Core Concepts: a DataFrame is simply a type alias of Dataset[Row].

Quick Reference:

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .master("local")
  .getOrCreate()

// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._

Creation: create a Dataset from a Seq Read More …