Machine Learning with Apache Spark
Use Spark MLlib to build, tune, and deploy scalable machine learning pipelines on large datasets.
Audience: Data Analysts, Software Engineers, Data Scientists
Duration: 4 days
Format: Lectures and hands-on labs (50% lecture, 50% lab)
Overview
Machine learning (ML) is now applied across industries, and Apache Spark provides a scalable platform for analysing massive datasets. This course introduces the most popular ML algorithms and shows how to implement them using Spark MLlib, with an emphasis on practical, hands-on experience rather than heavy mathematical or statistical theory.
What You Will Learn
- Spark ecosystem & data models
- Spark ML pipelines & utilities
- Feature engineering & exploratory data analysis (EDA) at scale
- Regression, Classification, Clustering, PCA, Recommendations in Spark
- Model evaluation & tuning
- Industry use cases across finance, healthcare, customer service, text analytics, travel, and more
Course Details
Prerequisites: Good programming background; Python is helpful but not required. No prior Spark or ML knowledge is assumed.
Setup: Cloud-based lab • Modern laptop • Chrome browser
Detailed Outline
- Spark ecosystem
- Spark data models
- Spark ML intro
- ML landscape
- AI/ML/DL definitions
- Types of ML
- Jupyter + Python + Spark
- Spark ML utilities labs
- Data prep
- Statistics primer
- Data clean-up
- Feature extraction
- Visualising data
- Training/Testing
- Gradient Descent
- Over-/Under-fitting
- Cross-validation
- Confusion Matrix
- ROC/AUC
- Linear & Logistic Regression
- SVM
- Decision Trees
- Random Forests
- Naïve Bayes
- K-Means
- Principal Component Analysis
- Collaborative Filtering
- Group workshop on real datasets