Machine Learning Essentials

This course introduces popular Machine Learning techniques.

This course is intended for data scientists and software engineers.
We assume no previous knowledge of Machine Learning.
We teach popular Machine Learning algorithms from scratch.

For each machine learning concept, we first discuss its foundations, applicability, and limitations. We then explain its implementation and use, along with specific use cases. The course is roughly 50% lecture and 50% lab work.

Please note that this course does not cover the math and statistics behind Machine Learning in depth.

This course is taught using one of the following environments:

Python
Spark & Python
R

Duration : 3 days

Audience : Data Scientists and Software Engineers

Prerequisites :

Working knowledge of either R, Python or Apache Spark
Programming background
No previous machine learning knowledge is assumed

Objectives :

Learn popular machine learning algorithms, their applicability and limitations
Practice the application of these methods in a machine learning environment
Learn practical use cases and limitations of algorithms

Lab environment:

A lab environment will be provided for students. Students only need an SSH client and a browser.

Zero Install : There is no need to install software on students’ machines.

Course Outline:

Section 1: Machine Learning (ML) Overview

Machine Learning landscape
Machine Learning applications
Understanding ML algorithms & models (supervised and unsupervised)

Section 2: Machine Learning Environment

Introduction to Jupyter notebooks / RStudio
Lab: Getting familiar with ML environment

Section 3: Machine Learning Concepts

Statistics Primer
Covariance, Correlation, Covariance Matrix
Errors, Residuals
Overfitting / Underfitting
Cross validation, bootstrapping
Confusion Matrix
ROC curve, Area Under Curve (AUC)
Lab: Basic stats
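
To give a flavor of the confusion matrix topic listed above, here is a minimal sketch in plain Python. It is not taken from the course labs: the function names `confusion_counts` and `summarize` are hypothetical, and the labels are made up purely for illustration. It counts true/false positives and negatives for a binary classifier and derives accuracy, precision, and recall from those counts.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count the four confusion-matrix cells for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct (tp) or a false alarm (fp)
            counts["tp" if a == positive else "fp"] += 1
        else:
            # Predicted negative: a miss (fn) or correct (tn)
            counts["fn" if a == positive else "tn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, and recall from the four cell counts."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels, for illustration only
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
metrics = summarize(counts)
print(dict(counts))
print(metrics)
```

In practice, a library routine would usually replace the hand-written counting; the sketch only shows what the matrix cells mean.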

Section 4: Feature Engineering (FE)

Preparing data for ML
Extracting features, enhancing data
Data cleanup
Visualizing Data
Lab: data cleanup
Lab: visualizing data

Section 5: Linear regression

Simple Linear Regression
Multiple Linear Regression
Running LR
Evaluating LR model performance
Lab
Use case: House price estimates
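
As an illustration of simple linear regression and its evaluation, the following sketch fits a line by ordinary least squares in plain Python and reports R-squared. It is an assumption-laden example rather than course material: the helper names `fit_simple_linear_regression` and `r_squared` are hypothetical, and the house sizes and prices are invented numbers, not a real dataset.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented data: house size (square meters) vs. price (thousands)
sizes = [50, 80, 100, 120, 150]
prices = [155, 235, 300, 365, 445]

slope, intercept = fit_simple_linear_regression(sizes, prices)
r2 = r_squared(sizes, prices, slope, intercept)
print(f"price = {slope:.3f} * size + {intercept:.3f}, R^2 = {r2:.4f}")
```

Multiple linear regression follows the same idea with more than one input column, and is normally solved with a linear algebra library rather than by hand.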

Section 6: Logistic Regression

Understanding Logistic Regression
Calculating Logistic Regression
Evaluating model performance
Lab
Use case: credit card application, college admissions

Section 7: Classification : SVM (Support Vector Machines)

SVM concepts and theory
SVM with kernel
Lab
Use case: Customer churn data

Section 8: Classification : Decision Trees & Random Forests

Theory behind trees
Classification and Regression Trees (CART)
Random Forest concepts
Labs
Use case: predicting loan defaults, estimating election contributions

Section 9: Classification : Naive Bayes

Theory behind Naive Bayes
Running NB algorithm
Evaluating NB model
Lab
Use case: spam filtering

Section 10: Clustering (K-Means)

Theory behind K-Means
Running K-Means algorithm
Estimating the performance
Lab
Use case: grouping cars data, grouping shopping data
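
The K-Means loop itself is short enough to sketch in plain Python. The version below is a simplified illustration, not the course's lab code: the function names `assign`, `update`, and `k_means` are hypothetical, the points are made up, and the starting centroids are fixed by hand (an assumption made so the run is repeatable; real implementations typically choose them randomly or with a seeding heuristic).

```python
import math


def assign(points, centroids):
    """For each point, return the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        else:
            # Keep an empty cluster's centroid where it was
            new_centroids.append(old)
    return new_centroids


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return centroids, labels


# Made-up 2D points forming two visible groups
points = [(1, 1), (1.5, 2), (1, 0.6), (8, 8), (9, 9.5), (8.5, 9)]
centroids, labels = k_means(points, initial_centroids=[points[0], points[3]])
print(centroids)
print(labels)
```

The result depends on the starting centroids and on the chosen number of clusters, which is one reason estimating clustering quality is a topic in its own right.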

Section 11: Principal Component Analysis (PCA)

Understanding PCA concepts
PCA applications
Running a PCA algorithm
Evaluating results
Lab
Use case: analyzing retail shopping data

Section 12: Recommendation (Collaborative filtering)

Recommender systems overview
Collaborative Filtering concepts
Lab
Use case: movie recommendations, music recommendations

Section 13: Final workshop (time permitting)

Students will analyze a couple of datasets and run ML algorithms.
This is done as a group exercise. Each group will present their findings to the class.