Cassandra For Developers

Introduction to Apache Cassandra

© Elephant Scale

February 10, 2022

Overview

Modern, large-scale applications involve dealing with Big Data, which is often larger than what traditional databases (RDBMS) can handle.

The Cassandra (C*) is a massively scalable NoSQL database that provides high availability and fault tolerance.

This hands-on course will introduce Cassandra, concepts, data modeling, and CQL (Cassandra Query Language). The focus is practical aspects of working with C* effectively. We will also cover “anti-patterns” and best practices, that will lead to optimal C* implementations in high-performance production systems.

What You Will Learn

  • NoSQL concepts
  • Cassandra’s concepts and architecture
  • Setting up and running C*
  • Setting up C* and your IDE
  • CQL (Cassandra Query Language)
  • Data modeling in CQL
  • Using APIs to interact with Cassandra
  • Understand C* internals (read/write path)
  • Deletion and compaction
  • C* administration
  • C* case studies
  • C* data modeling
  • C* workshop (time permitting)

Audience

Developers, Architects, Database admins

Skill Level

Introductory – Intermediate

Duration

3 days

Prerequisites

  • comfortable with Java programming language
  • comfortable in Linux environment (navigating command line, running commands)

Lab environment

A cloud-based lab environment will be provided.

Students will need the following

  • A reasonably modern laptop with unrestricted connection to the Internet. Laptops with overly restrictive VPNs or firewalls may not work properly
  • Chrome browser
  • SSH client for your platform

Detailed Outline

Introduction to Big Data / NoSQL

  • Big Data challenges vs RDBMS
  • NoSQL overview
  • CAP theorem
  • When is NoSQL appropriate
  • Columnar storage
  • NoSQL ecosystem

Cassandra Essentials

  • C* architecture overview
  • C* clusters, rings, nodes
  • Keyspaces, tables, rows and columns
  • Partitioning, replication, tokens
  • Quorum and consistency levels
  • Labs: installing Cassandra, interacting with Cassandra using CQLSH

Data Modeling – part 1

  • introduction to CQL
  • CQL Datatypes
  • Creating keyspaces and tables
  • Choosing columns and types
  • Choosing primary keys
  • Data layout for rows and columns
  • Time to live (TTL)
  • Querying with CQL
  • CQL updates
  • Collections (list, map, and set)
  • Labs: various data modeling exercises using CQL; experimenting with queries and supported data types

Data Modeling – part 2

  • Creating and using secondary indexes
  • Composite keys (partition keys and clustering keys)
  • Time series data
  • Best practices for time series data
  • Counters
  • Lightweight transactions (LWT)
  • Labs: creating and using indexes; modeling time series data

C* Java API

  • Introduction to Java driver
  • CRUD (Create / Read / Update, Delete) operations using Java client
  • Asynchronous queries
  • Labs: using Java API for Cassandra

C* Internals

  • Understand Cassandra design under the hood
  • Partitioners, gossip protocols, snitches
  • sstables, memtables, commit log
  • Read path, write path
  • Deletions, compactions, tombstones
  • Failure handling
  • Caching

C* Admin

  • Hardware selection
  • Software dependencies
  • Cassandra distributions
  • Lab: students install Cassandra, run benchmarks

C* Best Practices

  • C* best practices
  • Performance tuning
  • Troubleshooting tools and tips
  • “Anti-patterns” – how NOT to use C*

C* Case Studies

  • We will look at some C* use cases in the industry. Study their system architecture, best practices, and recommendations. This gives attendees a good sense of how C* is being used in real-world use cases.

C* Data Modeling labs

  • Attendees will work as teams
  • Multiple use cases from various domains are presented
  • Students work in groups to come up with designs and models, discuss various designs, analyze decisions

C* Workshop (Time permitting)

  • In this section, attendees will implement a real-world use case using C*
  • Attendees will work as teams
  • Each team will come up with data models for C* and implement them and test them
  • Also, teams are encouraged to present their solution to the class. We will discuss, provide feedback and learn from each other
  • Possible project ideas:
  • Implement a Slack-like messaging system. Come up with data models for users, messages and group chats
  • Implement a music service like Spotify. Come up with data models for songs, users, ratings
  • Implement a stock quotes tracking system. Come up with models for stock tickets, prices (time series data)