Cassandra For Developers

Introduction to Apache Cassandra

February 10, 2022

Overview

Modern, large-scale applications must deal with Big Data, often at volumes beyond what traditional relational databases (RDBMS) can handle.

Apache Cassandra (C*) is a massively scalable NoSQL database that provides high availability and fault tolerance.

This hands-on course introduces Cassandra concepts, data modeling, and CQL (Cassandra Query Language). The focus is on the practical aspects of working with C* effectively. We will also cover “anti-patterns” and best practices that lead to optimal C* implementations in high-performance production systems.

What You Will Learn

NoSQL concepts
Cassandra’s concepts and architecture
Setting up and running C*
Setting up your IDE for C* development
CQL (Cassandra Query Language)
Data modeling in CQL
Using APIs to interact with Cassandra
Understanding C* internals (read/write path)
Deletion and compaction
C* administration
C* case studies
C* data modeling labs
C* workshop (time permitting)

Audience

Developers, Architects, Database admins

Skill Level

Introductory – Intermediate

Duration

3 days

Prerequisites

Comfortable with the Java programming language
Comfortable in a Linux environment (navigating the command line, running commands)

Lab environment

A cloud-based lab environment will be provided.

Students will need the following:

A reasonably modern laptop with unrestricted connection to the Internet. Laptops with overly restrictive VPNs or firewalls may not work properly
Chrome browser
SSH client for your platform

Detailed Outline

Introduction to Big Data / NoSQL

Big Data challenges vs RDBMS
NoSQL overview
CAP theorem
When is NoSQL appropriate
Columnar storage
NoSQL ecosystem

Cassandra Essentials

C* architecture overview
C* clusters, rings, nodes
Keyspaces, tables, rows and columns
Partitioning, replication, tokens
Quorum and consistency levels
Labs: installing Cassandra, interacting with Cassandra using CQLSH
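
As a small illustration of the "Quorum and consistency levels" topic above, the following Java sketch works through the quorum arithmetic: a quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to overlap the latest acknowledged write when the read and write replica counts sum to more than the replication factor. The class and method names are hypothetical and are not part of any Cassandra driver; this only shows the arithmetic, not how C* enforces it.

```java
public class QuorumSketch {

    // Quorum size for a replication factor (RF): floor(RF / 2) + 1.
    public static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1; // integer division floors for positive values
    }

    // True when the replicas contacted on read and on write must share
    // at least one node, i.e. R + W > RF.
    public static boolean overlaps(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3; // assumed replication factor for the example
        int q = quorum(rf);
        System.out.println("RF=" + rf + " quorum=" + q);
        System.out.println("QUORUM read + QUORUM write overlap: " + overlaps(q, q, rf));
        System.out.println("ONE read + ONE write overlap: " + overlaps(1, 1, rf));
    }
}
```

With a replication factor of 3, quorum reads combined with quorum writes overlap on at least one replica, while reading and writing at consistency level ONE do not.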

Data Modeling – part 1

Introduction to CQL
CQL data types
Creating keyspaces and tables
Choosing columns and types
Choosing primary keys
Data layout for rows and columns
Time to live (TTL)
Querying with CQL
CQL updates
Collections (list, map, and set)
Labs: various data modeling exercises using CQL; experimenting with queries and supported data types

Data Modeling – part 2

Creating and using secondary indexes
Composite keys (partition keys and clustering keys)
Time series data
Best practices for time series data
Counters
Lightweight transactions (LWT)
Labs: creating and using indexes; modeling time series data

**C* Java API**

Introduction to the Java driver
CRUD (Create, Read, Update, Delete) operations using the Java client
Asynchronous queries
Labs: using the Java API for Cassandra

**C* Internals**

Understand Cassandra design under the hood
Partitioners, gossip protocols, snitches
SSTables, memtables, commit log
Read path, write path
Deletions, compactions, tombstones
Failure handling
Caching

**C* Admin**

Hardware selection
Software dependencies
Cassandra distributions
Lab: students install Cassandra, run benchmarks

**C* Best Practices**

C* best practices
Performance tuning
Troubleshooting tools and tips
“Anti-patterns” – how NOT to use C*

**C* Case Studies**

We will look at several C* use cases from industry and study their system architecture, best practices, and recommendations. This gives attendees a good sense of how C* is used in real-world systems.

**C* Data Modeling labs**

Attendees will work as teams
Multiple use cases from various domains are presented
Students work in groups to come up with designs and models, discuss the various designs, and analyze decisions

**C* Workshop (time permitting)**

In this section, attendees will implement a real-world use case using C*
Attendees will work as teams
Each team will come up with data models for C*, then implement and test them
Teams are encouraged to present their solutions to the class. We will discuss, provide feedback, and learn from each other
Possible project ideas:

Implement a Slack-like messaging system. Come up with data models for users, messages and group chats
Implement a music service like Spotify. Come up with data models for songs, users, ratings
Implement a stock quote tracking system. Come up with models for stock tickers, prices (time series data)