Apache Spark 3 Essentials

Gain practical skills to write and optimise Spark 3 applications using the DataFrame API and Spark SQL.

Audience: Developers / Data Engineers / Data Scientists

Duration: 3 days

Format: Lectures and hands‑on labs (50% lecture, 50% lab)

Overview

Apache Spark is a fast, general‑purpose engine for large‑scale data processing. This course introduces Spark 3 fundamentals, the DataFrame API, and Spark SQL in a hands‑on lab environment.

What You Will Learn

  • Spark architecture and execution model
  • Working with DataFrames and the Catalyst optimiser
  • Spark SQL for querying structured data
  • Performance tuning basics

Course Details

Prerequisites: Basic Python or Scala, and familiarity with the Linux command line

Setup: Zero‑install browser labs using Databricks or a standalone Spark cluster

Detailed Outline

  • RDD, DataFrame, Dataset concepts
  • Lazy evaluation and DAG scheduler
  • Catalyst optimiser and Tungsten execution engine
  • Creating DataFrames
  • Transformations and actions
  • Handling missing data
  • Aggregation operations
  • Registering temp views
  • SQL vs. DataFrame operations
  • Optimisation and explain plans
  • Building ETL pipelines
  • Analysing large datasets with Spark SQL
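
To give a flavour of the "lazy evaluation" and "transformations and actions" topics in the outline above, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the LazyDataset class and its methods are hypothetical names invented for illustration. The only idea it shares with Spark is that transformations are recorded as a plan, and nothing runs until an action asks for a result.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark class)."""

    def __init__(self, source, plan=()):
        self._source = list(source)
        self._plan = tuple(plan)  # recorded transformations, in order

    # Transformations: return a new dataset with a longer plan; no data is touched.
    def map(self, fn):
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._plan + (("filter", predicate),))

    def explain(self):
        """Describe the recorded plan without running it."""
        return " -> ".join(["source"] + [kind for kind, _ in self._plan])

    # Action: only here is the recorded plan actually executed.
    def collect(self):
        rows = iter(self._source)
        for kind, fn in self._plan:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []  # records every value the mapped function actually sees


def double(x):
    calls.append(x)
    return x * 2


ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = list(calls)  # still empty: nothing has run yet
result = ds.collect()              # the action triggers execution
```

In this sketch, double is not called until collect() runs, which mirrors in miniature how Spark defers work until an action is invoked. Real Spark additionally optimises the recorded plan before executing it, which this toy does not attempt.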

Ready to Get Started?

Contact us to learn more about this course and schedule your training.