Machine Learning and AI With Python
2022 Feb 10
Overview
Machine Learning (ML) is changing the world. To use ML effectively, one needs to understand the algorithms and how to utilize them. This course provides an introduction to the most popular machine learning algorithms.
This course teaches Machine Learning using the popular Scikit-Learn package for Python.
The course takes a practical perspective. In-depth coverage of math and statistics is beyond its scope.
What you will learn
- Python and Scikit-Learn
- ML Concepts
- Regression
  - Linear Regression
  - Logistic Regression
- Classification
  - Naive Bayes
  - SVM
  - Decision Trees
  - Random Forest
- Clustering algorithms (K-Means)
- Principal Component Analysis (PCA)
- Recommendations
Audience
Data Analysts, Software Engineers, Data Scientists
Duration
Three Days
Skill Level
Beginner to Intermediate
Industry Use Cases Covered
We will study and solve some of the most common industry use cases, listed below:
- Finance
  - Predicting house prices
  - Predicting loan defaults at Prosper
  - Predicting income from census data
- Health care
  - Predicting diabetes outcome
- Customer service
  - Predicting customer turnover
- Text analytics
  - Spam classification
- Travel
  - Predicting Uber demand
- Politics
  - Predicting election contributions
- Recommendations
  - Predicting movie ratings
  - Recommending songs
- Other
  - Predicting wine quality
  - Predicting college admissions
Prerequisites
- Good programming background
- Familiarity with Python is a plus, but not required
- No machine learning knowledge is assumed
Lab environment
A cloud-based lab environment will be provided to students, so there is no need to install anything on the laptop.
Students will need the following:
- A reasonably modern laptop with an unrestricted connection to the Internet. Laptops with overly restrictive VPNs or firewalls may not work properly.
- Chrome browser
Detailed Course Outline
Python Basics
- Introduction to Python programming environment
- Introduction to Numpy and Pandas
- Labs:
  - Working with Jupyter notebooks
  - Numpy and Pandas
Machine Learning (ML) Overview
- Machine Learning landscape
- Understanding Deep Learning use cases
- Understanding AI / Machine Learning / Deep Learning
- Data and AI
- AI vocabulary
- Hardware and software ecosystem
- Understanding types of Machine Learning (Supervised / Unsupervised / Reinforcement)
Python Scikit-Learn Library
- Scikit-Learn library overview
- Lab:
  - Scikit-Learn utilities
Feature Engineering and Exploratory Data Analysis (EDA)
- Preparing data for ML
- Statistics Primer
- Data cleanup
- Extracting features, enhancing data
- Visualizing Data
- Labs:
  - Data cleanup
  - Exploring data
  - Visualizing data
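The cleanup and feature-extraction steps listed above can be sketched in a few lines of Pandas. This is only an illustration: the table, its column names, and the choice of median imputation are made up for the example and are not taken from the course labs.

```python
import numpy as np
import pandas as pd

# Small made-up table with one missing value and one categorical column.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41],
    "city": ["Austin", "Boston", "Austin", "Denver"],
    "income": [48_000, 61_000, 52_000, 75_000],
})

# Data cleanup: fill the missing age with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Feature extraction: one-hot encode the categorical column.
features = pd.get_dummies(df, columns=["city"])

# Exploratory summary of the numeric columns (count, mean, quartiles, ...).
summary = df[["age", "income"]].describe()

print(features.columns.tolist())
print(summary)
```

Median imputation is one of several reasonable choices here. Dropping incomplete rows or using a model-based imputer may fit other datasets better.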
Machine Learning Concepts
- Training and Testing
- Gradient Descent
- Overfitting / Underfitting
- Cross-validation, bootstrapping
- Confusion Matrix
- ROC curve, Area Under Curve (AUC)
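As a rough illustration of the training/testing and cross-validation topics above, the following sketch holds out a test set and runs 5-fold cross-validation on the rest. It assumes Scikit-Learn is installed. The synthetic data, the `LogisticRegression` model, and all parameter values are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary classification data (illustrative only).
X, y = make_classification(
    n_samples=500, n_features=10, n_clusters_per_class=1, random_state=0
)

# Hold out 20% of the rows for a final test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation on the training portion only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit on all training rows, then check accuracy once on the held-out set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print("mean CV accuracy:", cv_scores.mean())
print("test accuracy:", test_accuracy)
```

A training score far above the cross-validated score is a common sign of overfitting.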
Linear Regression
- Linear Regression
- Errors, Residuals
- Multiple Linear Regression
- Evaluating model performance
- Labs:
  - Use case: House price estimates
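A multiple linear regression of the kind this module covers might be sketched as follows. The house-price data here is synthetic, and the feature names (`area`, `bedrooms`) and the coefficients used to generate prices are assumptions made for the example, not the dataset used in the lab.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(42)

# Hypothetical features: living area (sq ft) and number of bedrooms.
area = rng.uniform(500, 3500, size=200)
bedrooms = rng.integers(1, 6, size=200)
X = np.column_stack([area, bedrooms])

# Synthetic prices: a linear signal plus Gaussian noise.
price = 50_000 + 150 * area + 10_000 * bedrooms + rng.normal(0, 20_000, size=200)

model = LinearRegression().fit(X, price)
predicted = model.predict(X)

# Residuals are the differences between observed and predicted values.
residuals = price - predicted

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("R^2:", r2_score(price, predicted))
print("RMSE:", np.sqrt(mean_squared_error(price, predicted)))
```

For brevity the model is scored on the same rows it was fitted on. A held-out test set gives a more honest estimate of performance.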
Logistic Regression
- Understanding Logistic Regression
- Calculating Logistic Regression
- Evaluating model performance
- Labs:
  - Credit card application
  - College admissions
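Evaluating a logistic regression model typically involves the confusion matrix and ROC AUC introduced earlier in the outline. The sketch below shows one way to compute both with Scikit-Learn. The data is synthetic and the split sizes are arbitrary; it does not use the credit card or admissions datasets from the labs.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary outcome with six numeric features (illustrative only).
X, y = make_classification(
    n_samples=400, n_features=6, n_clusters_per_class=1, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba returns one column per class; column 1 is P(y = 1).
probabilities = clf.predict_proba(X_test)[:, 1]
predictions = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, predictions)
auc = roc_auc_score(y_test, probabilities)

print(cm)
print("AUC:", auc)
```

AUC is computed from the predicted probabilities rather than the hard labels, because the ROC curve sweeps over every decision threshold.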
Classification: SVM (Support Vector Machines)
- SVM concepts and theory
- SVM with kernel
- Labs:
  - Customer churn data
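To illustrate why kernels matter, the following sketch fits a linear SVM and an RBF-kernel SVM to data that no straight line separates cleanly. The `make_moons` dataset and all parameter values are stand-ins chosen for the example, not the customer churn data from the lab.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a straight line.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# SVMs are sensitive to feature scale, so standardize inside a pipeline.
linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))

linear_acc = linear_svm.fit(X_train, y_train).score(X_test, y_test)
rbf_acc = rbf_svm.fit(X_train, y_train).score(X_test, y_test)

print("linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:", rbf_acc)
```

On curved class boundaries like this one, the RBF kernel would usually be expected to score higher than the linear kernel, though the exact numbers depend on the noise level and the split.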
Classification: Decision Trees & Random Forests
- Classification and Regression Trees (CART) introduction
- Decision Tree concepts
- Pruning trees
- Gini index
- Bias-Variance Tradeoff
- Random Forest concepts
- Random Forests features and examples
- Labs:
  - Predicting loan defaults
  - Estimating election contributions
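A single decision tree and a random forest can be compared side by side, as in the sketch below. It assumes Scikit-Learn and synthetic data. The depth limit, the number of trees, and the seeds are arbitrary illustrative values, and the loan and election datasets from the labs are not used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=7
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=7
)

# A single CART-style tree split on Gini impurity.
# max_depth limits growth, which acts as a simple form of pre-pruning.
tree = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=7)
tree.fit(X_train, y_train)

# An ensemble of trees, each trained on a bootstrap sample of the rows.
forest = RandomForestClassifier(n_estimators=100, random_state=7)
forest.fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)

# Impurity-based importances, normalized to sum to 1.
importances = forest.feature_importances_

print("tree accuracy:", tree_acc)
print("forest accuracy:", forest_acc)
print("feature importances:", importances.round(3))
```

Averaging many trees tends to reduce variance relative to a single deep tree, which is one practical face of the bias-variance tradeoff.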
Classification: Naive Bayes
- Naive Bayes theory
- Running Naive Bayes algorithm
- Evaluating model performance
- Lab:
  - Spam filtering
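A Naive Bayes spam filter can be sketched with a bag-of-words vectorizer feeding a multinomial model. The six messages below are invented for the example, so the result says nothing about real-world accuracy. The lab's actual dataset is not shown here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, for illustration only.
messages = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "meeting moved to tuesday morning",
    "can you review my report today",
    "lunch with the team on friday",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer turns text into word counts; MultinomialNB models those
# counts per class with Laplace smoothing (alpha=1.0 by default).
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

predicted = spam_filter.predict(["free prize offer", "team meeting on friday"])
print(predicted)
```

The "naive" part is the assumption that words occur independently given the class. That assumption is rarely true, yet it often works well enough for text classification.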
Unsupervised Algorithms
- Overview of unsupervised algorithms
- Supervised vs. unsupervised
- Understanding unsupervised algorithms
Unsupervised: Clustering: K-Means
- Theory behind K-Means
- Running K-Means algorithm
- Estimating the performance
- Labs:
  - Predicting Uber demand
  - Clustering shopping trips
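Running K-Means and estimating its performance might look like the sketch below, which compares several values of k using inertia and the silhouette score. The blob data is synthetic, and the candidate values of k and the seeds are arbitrary choices for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D points drawn around three centers (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=3)

# Compare candidate values of k using inertia and silhouette score.
results = {}
for k in (2, 3, 4, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=3).fit(X)
    results[k] = (km.inertia_, silhouette_score(X, km.labels_))

for k, (inertia, silhouette) in results.items():
    print(k, round(inertia, 1), round(silhouette, 3))
```

Inertia tends to fall as k grows, so it is usually read through an "elbow" plot. The silhouette score rewards clusters that are tight and well separated.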
Unsupervised: Principal Component Analysis (PCA)
- Understanding dimensions
- ‘Curse of dimensionality’
- Reducing dimensions
- Overview of Principal Component Analysis (PCA)
- Eigenvectors and eigenvalues
- Implementing PCA algorithm
- Labs:
  - Predicting wine quality
  - Predicting income from census data
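The link between PCA and eigenvalues can be shown with a short sketch: the variance explained by each principal component matches the leading eigenvalues of the covariance matrix. The five-feature dataset below is synthetic and built so that two underlying dimensions drive all five columns, an assumption made purely for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# 200 samples, 5 features. The last three columns are noisy
# combinations of the first two, so the data is essentially 2-D.
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + 0.1 * rng.normal(size=200),
    base[:, 1] + 0.1 * rng.normal(size=200),
    base[:, 0] - base[:, 1] + 0.1 * rng.normal(size=200),
])

# Standardize so that no feature dominates because of its scale.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Equivalent view: eigenvalues of the covariance matrix, largest first.
eigenvalues = np.linalg.eigvalsh(np.cov(X_scaled, rowvar=False))[::-1]

print("explained variance ratio:", pca.explained_variance_ratio_)
print("top eigenvalues:", eigenvalues[:2])
print("PCA explained variance:", pca.explained_variance_)
```

Reducing five columns to two keeps most of the variance here only because the data was constructed that way. On real data the explained variance ratio is what guides how many components to keep.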
Recommendations
- Recommendation use cases
- Recommender systems
- Collaborative Filtering (CF)
- Implementing CF algorithm
- Labs:
  - Movie rating recommendation
  - Songs rating recommendation
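One simple form of collaborative filtering is user-based: predict a missing rating as a similarity-weighted average of other users' ratings. The sketch below uses plain NumPy. The ratings matrix and the helper names (`cosine_similarity_matrix`, `predict_rating`) are hypothetical, and the course may implement CF differently.

```python
import numpy as np

# Hypothetical ratings matrix: rows are users, columns are movies, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)


def cosine_similarity_matrix(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    normalized = matrix / np.where(norms == 0, 1.0, norms)
    return normalized @ normalized.T


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    similarity = cosine_similarity_matrix(ratings)[user]
    rated = ratings[:, item] > 0
    rated[user] = False  # exclude the target user's own entry
    weights = similarity[rated]
    if weights.sum() == 0:
        return 0.0  # no comparable users rated this item
    return float(weights @ ratings[rated, item] / weights.sum())


# Estimate how user 0 would rate movie 2, which they have not rated.
estimate = predict_rating(ratings, user=0, item=2)
print(round(estimate, 2))
```

User 0 rates most like user 1, who gave movie 2 a low score, so the estimate is pulled toward the low end. Treating unrated entries as zeros when computing similarity is a simplification. Mean-centering or matrix factorization are common refinements.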
Final workshop (time permitting)
- This is a group workshop
- Each group will analyze a couple of real-world datasets and run ML algorithms
- Each group will present their findings to the class