Spark for Developers 1x

Upcoming Classes

Ideal for small teams and individuals

Looking For Private Training?

We offer on-site, customized trainings.

Spark For Developers

Overview:

This course introduces Apache Spark. Students will learn how Spark fits into the Big Data ecosystem and how to use Spark for data analysis.

This course covers Spark version 1.x; Spark v2 is covered in a separate training.

What You Will Learn

  • Spark Shell
  • Spark internals
  • Spark RDDs and DataFrames
  • Spark APIs
  • Spark SQL
  • Spark MLlib
  • Spark GraphX
  • Spark Streaming

Audience:

Developers / Data Analysts

Duration:

3 days

Pre-requisites

  • Familiarity with Java, Scala, or Python (our labs are in Scala and Python)
  • Basic understanding of the Linux development environment (command-line navigation, running commands)

What to Bring:

Detailed Outline:

  1. Scala primer
    • A quick introduction to Scala
    • Labs : Getting to know Scala
  2. Spark Basics
    • Background and history
    • Spark and Hadoop
    • Spark concepts and architecture
    • Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
    • Labs : Installing and running Spark
  3. Spark Shell
    • Running Spark in local mode
    • Spark web UI
    • Spark shell
    • Analyzing dataset – part 1
    • Labs : Spark shell exploration
  4. RDDs
    • RDD concepts
    • Partitions
    • RDD Operations / transformations
    • Key-Value pair RDDs
    • MapReduce on RDD
    • Caching RDDs
    • Labs : Creating & inspecting RDDs; caching RDDs
  5. Spark DataFrames
    • Learning about DataFrames
    • Programming with DataFrames
    • Caching and persistence
    • Evaluating performance
    • Labs : DataFrames
  6. Spark API programming
    • Introduction to the Spark Dataset API
    • Submitting the first program to Spark
    • Debugging / logging
    • Configuration properties
    • Labs : Programming with the Spark API; submitting jobs
  7. Spark SQL
    • Spark SQL overview
    • Defining tables and importing datasets
    • Querying DataFrames using SQL
    • Storage formats : JSON / Parquet
    • Labs : Creating and querying DataFrames; evaluating data formats
  8. Spark ML
    • ML intro
    • ML algorithms
    • Labs : Writing MLlib applications
  9. GraphX
    • GraphX library overview
    • GraphX APIs
    • Labs : Processing graph data using Spark
  10. Spark Streaming
    • Streaming overview
    • Evaluating Streaming platforms
    • Streaming operations
    • Sliding window operations
    • Labs : Writing Spark Streaming applications
  11. Spark and Hadoop
    • Hadoop Intro (HDFS / YARN)
    • Hadoop + Spark architecture
    • Running Spark on Hadoop YARN
    • Processing HDFS files using Spark
    • Spark and Hive
  12. Spark Performance and Tuning
    • Broadcast variables
    • Accumulators
    • Memory management & caching
Upcoming Trainings

  • Please select a session and register.
  • No payment necessary for registration.
  • Payment is due 5 days before the class to secure the spot.