Building Streaming Systems With Kafka + Spark + Cassandra


Building Real Time Streaming Systems With Kafka + Spark + Cassandra

Keywords : Real time, Fast Data, Lambda architecture, Kafka, Spark, Cassandra


This course teaches students how to build streaming systems using the popular fast data stack : Apache Kafka + Apache Spark + Apache Cassandra.

No previous knowledge of Kafka / Spark / Cassandra is assumed. The course covers each technology and shows how to integrate them.

What You Will Learn

  • Kafka (1 day)
  • Cassandra (1.5 days)
  • Spark (1.5 days)
  • Putting it all together (1 day)


End to End System


Lambda Architecture

Audience :

Developers / Architects

Duration :

5 days


Prerequisites :

  • Familiarity with either the Java or Scala language (our labs are in Scala and Java; we provide a quick Scala introduction)
  • Basic understanding of the Linux development environment (command line navigation / running commands)

Lab Environment

We provide a complete lab environment in the cloud, so there is no need to install any software on your laptop.
See below for what to bring.

What to Bring:

  • A reasonably modern laptop that can connect to cloud services (laptops with overly restrictive firewalls are not recommended)

Detailed Outline:

  • Kafka (1 day)
    • Kafka design & architecture
    • Getting Kafka up and running
    • Using Kafka utilities
    • Reading from & writing to Kafka using the Java API
    • Labs : all of the above sections
  • Cassandra (1.5 days)
    • Cassandra design & architecture
    • CQLSH
    • Read / Write path in Cassandra
    • C* eventual consistency
    • Time series data
    • Data modeling on C*
    • Using the C* Java API
    • Labs : all of the above sections
  • Spark (1.5 days)
    • Scala primer (if required)
    • Spark design and architecture
    • Spark Shell
    • Spark data structures : RDDs, DataFrames, Datasets
    • Batch analytics with Spark
    • Writing Spark applications using Spark APIs
    • Spark Streaming
    • Structured streaming
    • Labs : all of the above sections
  • Putting it all together (1 day)
    • Reading Kafka streams from Spark
    • Saving streaming data from Spark into Cassandra
    • Full end to end application
    • Benchmarking
    • Monitoring
    • Tuning and Optimizing the system
    • Labs : all of the above sections
  • Bring Your Own Use Case – Group Study (time permitting)
    • We encourage students to bring a use case they are working on at their company for discussion with the class
    • We will discuss the use case in the class
    • Discuss design choices, sketch out a few designs, debate pros/cons of each design
    • Discuss best practices
    • This will be a group activity, and will be lots of fun !