CALL NOW 713-568-9753
Spark for Data Analysts

Upcoming Classes

Ideal for small teams and individuals

Looking For Private Training?

We offer on-site, customized trainings.

Overview:

This course introduces Apache Spark. Students will learn how Spark fits into the Big Data ecosystem and how to use Spark for data analysis.

What You Will Learn

  • Scala primer
  • Spark Shell
  • Spark internals
  • Spark SQL
  • Spark & Hadoop
  • Spark MLlib (day 3)
  • Spark GraphX (day 3)

Audience:

Data Analysts, Business Analysts

Duration:

2-3 days (depending on coverage required)

Pre-requisites

  • Analyst background (familiarity with SQL, scripting, etc.)
  • Basic understanding of a Linux development environment (basic command-line navigation, editing files, running programs)

What to Bring:

 

Detailed Outline:

  1. Scala primer
    • A quick introduction to Scala
    • Labs: Getting to know Scala
  2. Spark Basics
    • Background and history
    • Spark and Hadoop
    • Spark concepts and architecture
    • Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
    • Labs: Installing and running Spark
  3. First Look at Spark
    • Running Spark in local mode
    • Spark web UI
    • Spark shell
    • Analyzing a dataset (part 1)
    • Inspecting RDDs
    • Labs: Spark shell exploration
  4. RDDs
    • RDD concepts
    • Partitions
    • RDD Operations / transformations
    • RDD types
    • Key-Value pair RDDs
    • MapReduce on RDD
    • Caching and persistence
    • Labs: Creating and inspecting RDDs; caching RDDs
  5. Spark SQL
    • SQL support in Spark
    • DataFrames
    • Defining tables and importing datasets
    • Querying DataFrames using SQL
    • Storage formats: JSON / Parquet
    • Labs: Creating and querying DataFrames; evaluating data formats
  6. Spark and Hadoop
    • Hadoop Intro (HDFS / YARN)
    • Hadoop + Spark architecture
    • Running Spark on Hadoop YARN
    • Processing HDFS files using Spark
    • Spark & Hive
  7. MLlib (day 3)
    • MLlib intro
    • MLlib algorithms
    • Labs: Writing MLlib applications
  8. GraphX (day 3)
    • GraphX library overview
    • GraphX APIs
    • Labs: Processing graph data using Spark


Upcoming Trainings

  • Please select a session and register.
  • No payment necessary for registration.
  • Payment is due 5 days before the class to secure the spot.