Introduction to Machine Learning With Python

2020 Feb 10

Overview

Machine Learning (ML) is changing the world. To use ML effectively, one needs to understand the algorithms and how to utilize them. This course provides an introduction to the most popular machine learning algorithms.

This course teaches Machine Learning using the popular Scikit-Learn package for the Python language.

This course teaches Machine Learning from a practical perspective. In-depth coverage of Math / Stats is beyond the scope of this course.

What you will learn

  • Python and Scikit-Learn
  • ML Concepts
  • Regression
    • Linear Regression
    • Logistic Regression
  • Classification
    • Naive Bayes
    • SVM
    • Decision Trees
    • Random Forest
  • Clustering algorithms (K-Means)
  • Principal Component Analysis (PCA)
  • Recommendations

Audience

Data Analysts, Software Engineers, Data Scientists

Duration

Three Days

Skill Level

Beginner to Intermediate

Industry Use Cases Covered

We will study and solve some of the most common industry use cases, listed below:

  • Finance
    • Predicting house prices
    • Predicting loan defaults at Prosper
    • Predicting income from census data
  • Health care
    • Predicting diabetes outcome
  • Customer service
    • Predicting customer turnover
  • Text analytics
    • Spam classification
  • Travel
    • Predicting Uber demand
  • Politics
    • Predicting election contributions
  • Recommendations
    • Predicting movie ratings
    • Recommending songs
  • Other
    • Predicting wine quality
    • Predicting college admissions

Prerequisites

  • Good programming background
  • Familiarity with Python is a plus, but not required
  • No machine learning knowledge is assumed

Lab environment

A cloud-based lab environment will be provided to students; there is no need to install anything on the laptop.

Students will need the following:

  • A reasonably modern laptop with unrestricted connection to the Internet. Laptops with overly restrictive VPNs or firewalls may not work properly
  • Chrome browser

Detailed Course Outline

Python Basics

  • Introduction to Python programming environment
  • Introduction to Numpy and Pandas
  • Labs
    • Working with Jupyter notebooks
    • Numpy and Pandas

Machine Learning (ML) Overview

  • Machine Learning landscape
  • Understanding Deep Learning use cases
  • Understanding AI / Machine Learning / Deep Learning
  • Data and AI
  • AI vocabulary
  • Hardware and software ecosystem
  • Understanding types of Machine Learning (Supervised / Unsupervised / Reinforcement)

Python Scikit-Learn Library

  • Scikit-Learn library overview
  • Lab:
    • Scikit-Learn utilities
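
As a rough illustration of the utilities this section refers to, the sketch below strings together a train/test split, a scaling step, and an estimator using the usual fit / score pattern. The data is synthetic and the choice of LogisticRegression is only an assumption made for the example; it is not taken from the course labs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, well-separated two-class data (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=-2.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=2.0, scale=1.0, size=(100, 2)),
])
y = np.array([0] * 100 + [1] * 100)

# Hold out 25% of the rows for testing, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A pipeline applies the scaler and then the classifier in one fit call.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Most Scikit-Learn estimators follow this same fit / predict / score shape, so swapping in a different model usually changes only the line that builds the pipeline.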

Feature Engineering and Exploratory Data Analysis (EDA)

  • Preparing data for ML
  • Statistics Primer
  • Data cleanup
  • Extracting features, enhancing data
  • Visualizing Data
  • Labs:
    • Data cleanup
    • Exploring data
    • Visualizing data
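
A minimal sketch of two common cleanup steps, assuming a single numeric column with missing entries: fill the gaps with the median, then standardize to zero mean and unit variance. The helper names and sample values are hypothetical, and the labs may well use Pandas for the same steps.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical raw column with two missing readings.
raw = [10.0, None, 14.0, 12.0, None, 18.0]
filled = impute_median(raw)
scaled = standardize(filled)

print(filled)
print([round(v, 3) for v in scaled])
```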

Machine Learning Concepts

  • Training and Testing
  • Gradient Descent
  • Overfitting / Underfitting
  • Cross validation, bootstrapping
  • Confusion Matrix
  • ROC curve, Area Under Curve (AUC)
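
To make the confusion matrix item concrete, here is a small sketch that counts the four cells for a binary classifier and derives accuracy, precision, and recall from them. The labels are made up for illustration, and the function name is hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

# Made-up ground truth and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found

print(tp, fp, tn, fn, accuracy, precision, recall)
```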

Linear Regression

  • Linear Regression
  • Errors, Residuals
  • Multiple Linear Regression
  • Evaluating model performance
  • Labs:
    • Use case: House price estimates
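
The following sketch fits a one-feature linear regression with the closed-form least-squares formulas, then computes residuals and RMSE as one way of evaluating the fit. The house sizes and prices are invented numbers, not the lab dataset, and the function name is hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: size in 1000 sq ft, price in $100k.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [2.2, 4.1, 6.2, 7.9, 10.1]

slope, intercept = fit_line(sizes, prices)

# Residual = observed value minus predicted value.
residuals = [y - (slope * x + intercept) for x, y in zip(sizes, prices)]
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))

print(f"slope={slope:.3f} intercept={intercept:.3f} rmse={rmse:.4f}")
```

With an intercept in the model, least-squares residuals sum to zero, which is a quick sanity check on any implementation.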

Logistic Regression

  • Understanding Logistic Regression
  • Calculating Logistic Regression
  • Evaluating model performance
  • Labs:
    • Credit card application
    • College admissions
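
One way to see how logistic regression is calculated is to fit it by gradient descent on the log loss. The sketch below does this for a single feature; the scores and admission labels are hypothetical, and the learning rate and epoch count are arbitrary choices for the example rather than recommended settings.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Batch gradient descent on the mean log loss for p = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # derivative of log loss w.r.t. the score
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical applicant scores and admission outcomes (1 = admitted).
scores = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
admitted = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(scores, admitted)
p_low = sigmoid(w * 1.0 + b)    # predicted admission probability for a score of 1.0
p_high = sigmoid(w * 4.0 + b)   # predicted admission probability for a score of 4.0
print(f"w={w:.2f} b={b:.2f} p(1.0)={p_low:.3f} p(4.0)={p_high:.3f}")
```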

Classification: SVM (Support Vector Machines)

  • SVM concepts and theory
  • SVM with kernel
  • Labs:
    • Customer churn data
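
To illustrate why a kernel matters, the sketch below compares a linear SVM with an RBF-kernel SVM on data that is not linearly separable. It uses Scikit-Learn's make_circles toy dataset as a stand-in; the churn data from the lab is not reproduced here.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate the classes.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```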

Classification: Decision Trees & Random Forests

  • Classification and Regression Trees (CART) introduction
  • Decision Tree concepts
  • Pruning trees
  • Gini index
  • Bias Variance Tradeoff
  • Random Forest concepts
  • Random Forests features and examples
  • Labs:
    • Predicting loan defaults
    • Estimating election contributions
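
As a small worked example of the Gini index, the sketch below computes the impurity of a node and the weighted impurity after a candidate split, which is the quantity a CART-style tree tries to reduce. The loan labels and function names are invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Impurity of a split, weighting each child by its share of the rows."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Invented loan outcomes at a node, and one candidate split of that node.
parent = ["default", "default", "repaid", "repaid", "repaid", "repaid"]
left = ["default", "default", "repaid"]
right = ["repaid", "repaid", "repaid"]

parent_impurity = gini(parent)
child_impurity = split_gini(left, right)
print(f"parent={parent_impurity:.4f} after split={child_impurity:.4f}")
```

A split is worth making when the weighted child impurity is lower than the parent impurity, as it is here.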

Classification: Naive Bayes

  • Naive Bayes theory
  • Running Naive Bayes algorithm
  • Evaluating model performance
  • Lab:
    • Spam filtering
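
A minimal sketch of a multinomial Naive Bayes spam filter with add-one (Laplace) smoothing, written from scratch so the arithmetic is visible. The four training messages are invented, the function names are hypothetical, and in practice a library implementation would normally be used instead.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs is a list of (text, label) pairs. Returns counts needed for prediction."""
    label_counts = Counter()
    word_counts = {}
    for text, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocab

def predict_nb(model, text):
    """Pick the label with the highest log prior plus summed log likelihoods."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(label_counts[label] / n_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training messages.
training = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]
model = train_nb(training)
print(predict_nb(model, "free money"))
print(predict_nb(model, "meeting tomorrow"))
```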

Unsupervised Algorithms

  • Overview of unsupervised algorithms
  • Supervised vs. unsupervised
  • Understanding unsupervised algorithms

Unsupervised: Clustering: K-Means

  • Theory behind K-Means
  • Running K-Means algorithm
  • Estimating the performance
  • Labs:
    • Predicting Uber demand
    • Clustering shopping trips
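
The theory behind K-Means can be sketched as two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The example below starts from fixed centroids so the result is deterministic; real implementations typically choose starting centroids more carefully. The points and function names are made up for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Made-up 2D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```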

Unsupervised: Principal Component Analysis (PCA)

  • Understanding dimensions
  • ‘Curse of dimensionality’
  • Reducing dimensions
  • Overview of Principal Component Analysis (PCA)
  • Eigenvectors and eigenvalues
  • Implementing PCA algorithm
  • Labs:
    • Predicting wine quality
    • Predicting income from census data
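
One common way to implement PCA is through the eigenvectors and eigenvalues of the covariance matrix. The sketch below follows that route with NumPy: center the data, compute the covariance, take the top eigenvectors, and project. The data is a made-up, nearly collinear 2D set, and the function name is hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigen-decomposition of the covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]          # largest variance first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    components = eigenvectors[:, :n_components]
    explained_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    return X_centered @ components, components, explained_ratio

# Made-up data lying close to the line y = 2x.
X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0], [5.0, 10.1]])
projected, components, ratio = pca(X, n_components=1)

print("first component:", components[:, 0])
print("explained variance ratio:", ratio)
```

The sign of an eigenvector is arbitrary, so the first component may point in either direction along the line.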

Recommendations

  • Recommendation use cases
  • Recommender systems
  • Collaborative Filtering (CF)
  • Implementing CF algorithm
  • Lab:
    • Movie ratings recommendation
    • Song ratings recommendation
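
A minimal sketch of user-based collaborative filtering, assuming cosine similarity over co-rated items and a similarity-weighted average for the prediction. This is only one of several CF variants; the users, movies, ratings, and function names are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None

# Invented ratings on a 1 to 5 scale; alice has not rated movie_d.
ratings = {
    "alice": {"movie_a": 5, "movie_b": 4, "movie_c": 1},
    "bob": {"movie_a": 5, "movie_b": 5, "movie_c": 1, "movie_d": 5},
    "carol": {"movie_a": 1, "movie_b": 1, "movie_c": 5, "movie_d": 1},
}

predicted = predict_rating(ratings, "alice", "movie_d")
print(f"predicted rating for alice on movie_d: {predicted:.2f}")
```

Because alice's tastes are much closer to bob's than to carol's, the prediction is pulled toward bob's rating.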

Final workshop (time permitting)

  • This is a group workshop
  • Each group will analyze a couple of real world datasets and run ML algorithms
  • Each group will present their findings to the class