Machine Learning with Apache Spark

Use Spark MLlib to build, tune, and deploy scalable Machine-Learning pipelines on large datasets.

Audience: Data Analysts, Software Engineers, Data Scientists

Duration: 4 days

Format: Lectures and hands-on labs (50% lecture, 50% lab)

Overview

Machine Learning (ML) is changing the world. Apache Spark provides a scalable platform for analysing massive datasets. This course introduces the most popular ML algorithms and shows how to implement them with Spark MLlib, emphasising practical, hands-on experience over heavy mathematical and statistical theory.

What You Will Learn

  • Spark ecosystem & data models
  • Spark ML pipelines & utilities
  • Feature engineering & exploratory data analysis (EDA) at scale
  • Regression, Classification, Clustering, PCA, Recommendations in Spark
  • Model evaluation & tuning
  • Industry use-cases across finance, healthcare, customer service, text analytics, travel, and more

Course Details

Prerequisites:
  • Good programming background; Python helpful but not required. No prior Spark or ML knowledge assumed.

Setup: Cloud-based lab • Modern laptop • Chrome browser

Detailed Outline

  • Spark ecosystem
  • Spark data models
  • Spark ML intro
    • ML landscape
    • AI/ML/DL definitions
    • Types of ML
    • Jupyter + Python + Spark
    • Spark ML utilities labs
    • Data prep
    • Statistics primer
    • Data clean-up
    • Feature extraction
    • Visualising data
    • Training/Testing
    • Gradient Descent
    • Over-/Under-fitting
    • Cross-validation
    • Confusion Matrix
    • ROC/AUC
    • Linear & Logistic Regression
    • SVM
    • Decision Trees
    • Random Forests
    • Naïve Bayes
    • K-Means
    • Principal Component Analysis
    • Collaborative Filtering
    • Group workshop on real datasets
