Building Streaming Systems With Kafka + Spark + Cassandra

Overview

This course teaches students how to build streaming systems using the popular fast data stack: Apache Kafka + Apache Spark + Apache Cassandra.

No previous knowledge of Kafka / Spark / Cassandra is assumed.  The course covers all three technologies and teaches students how to integrate them.

What You Will Learn

  • Kafka (1 day)
  • Cassandra (1.5 days)
  • Spark (1.5 days)
  • Putting it all together (1 day)
  • End to End System
  • Lambda Architecture

Audience 

Developers / Architects

Duration 

5 days

Pre-requisites

  • Familiarity with either Java or Scala (our labs are in Scala and Java; we provide a quick Scala introduction)
  • Basic understanding of the Linux development environment (command-line navigation and running commands)

Lab Environment

We provide a complete lab environment in the cloud.  No need to install any software on your laptop.
See below for what to bring.

What to Bring

  • A reasonably modern laptop that can connect to cloud services.  Laptops with overly restrictive firewalls are not recommended.

Detailed Outline

Kafka (1 day)

    • Kafka design & architecture
    • Getting Kafka up and running
    • Using Kafka utilities
    • Reading from & writing to Kafka using the Java API
    • Labs: all of the above sections

Cassandra (1.5 days)

    • Cassandra design & architecture
    • CQLSH
    • Read / Write path in Cassandra
    • C* eventual consistency
    • Time series data
    • Data modeling on C*
    • Using the C* Java API
    • Labs: all of the above sections

Spark (1.5 days)

    • Scala primer (if required)
    • Spark design and architecture
    • Spark Shell
    • Spark data structures: RDDs, DataFrames, Datasets
    • Batch analytics with Spark
    • Writing Spark applications using Spark APIs
    • Spark streaming
    • Structured streaming
    • Labs: all of the above sections

Putting it all together (1 day)

    • Reading Kafka streams from Spark
    • Saving streaming data from Spark into Cassandra
    • Full end to end application
    • Benchmarking
    • Monitoring
    • Tuning and optimizing the system
    • Labs: all of the above sections

Bring Your Own Use Case – Group Study (time permitting)

    • We encourage students to bring a use case they are working on at their company for discussion with the class
    • We will discuss each use case in class
    • We will discuss design choices, sketch out a few designs, and debate the pros and cons of each
    • We will discuss best practices
    • This will be a group activity, and will be a lot of fun!