Apache Spark 3 Essentials
Gain practical skills to write and optimise Spark 3 applications using the DataFrame API and Spark SQL.
Audience: Developers / Data Engineers / Data Scientists
Duration: 3 days
Format: Lectures and hands‑on labs (50% lecture, 50% lab)
Overview
Apache Spark is a fast, general‑purpose engine for large‑scale data processing. This course introduces Spark 3 fundamentals, the DataFrame API, and Spark SQL in a hands‑on lab environment.
What You Will Learn
- Spark architecture and execution model
- Working with DataFrames and the Catalyst optimiser
- Spark SQL for querying structured data
- Performance tuning basics
Course Details
Prerequisites: Basic knowledge of Python or Scala, and familiarity with the Linux command line
Setup: Zero‑install, browser‑based labs using Databricks or a standalone Spark cluster
Detailed Outline
- RDD, DataFrame, Dataset concepts
- Lazy evaluation and DAG scheduler
- Catalyst optimiser and Tungsten execution engine
- Creating DataFrames
- Transformations and actions
- Handling missing data
- Aggregation operations
- Registering temp views
- SQL vs. DataFrame operations
- Optimisation and explain plans
- Building ETL pipelines
- Analysing large datasets with Spark SQL