# Machine Learning Essentials

This course introduces popular Machine Learning techniques.

This course is intended for data scientists and software engineers.

We assume no previous knowledge of Machine Learning.

**We teach popular Machine Learning algorithms from scratch**.

For each machine learning concept, we first discuss its foundations, applicability, and limitations. We then cover its implementation and use, along with specific use cases. The course is roughly 50% lecture and 50% lab work.

Please note that this course does not cover the math and statistics behind Machine Learning in depth.

This course is taught using one of the following environments:

- Python
- Spark & Python
- R

**Duration** : 3 days

**Audience** : Data Scientists and Software Engineers

**Prerequisites** :

- Working knowledge of R, Python, or Apache Spark
- Programming background
- No previous machine learning knowledge is assumed

**Objectives** :

- Learn popular machine learning algorithms, their applicability and limitations
- Practice the application of these methods in a machine learning environment
- Learn practical use cases and limitations of algorithms

**Lab environment:**

A lab environment will be provided for students. Students need only an SSH client and a browser.

Zero Install : There is no need to install software on students’ machines.

**Course Outline:**

**Section 1: Machine Learning (ML) Overview**

- Machine Learning landscape
- Machine Learning applications
- Understanding ML algorithms & models (supervised and unsupervised)

**Section 2: Machine Learning Environment**

- Introduction to Jupyter notebooks / RStudio
- Lab: Getting familiar with ML environment

**Section 3: Machine Learning Concepts**

- Statistics Primer
- Covariance, Correlation, Covariance Matrix
- Errors, Residuals
- Overfitting / Underfitting
- Cross validation, bootstrapping
- Confusion Matrix
- ROC curve, Area Under Curve (AUC)
- Lab: Basic stats
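
As a taste of the evaluation topics in this section, here is a minimal sketch of how a binary confusion matrix and the metrics derived from it can be computed in plain Python. The function name `confusion_counts` and the label lists are made up for illustration; they are not taken from the course labs, which may use a library instead.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical labels, for illustration only
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)   # share of all predictions that are correct
precision = tp / (tp + fp)           # share of predicted positives that are correct
recall = tp / (tp + fn)              # share of actual positives that were found

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Precision and recall are undefined when their denominators are zero; this sketch assumes at least one predicted and one actual positive.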

**Section 4: Feature Engineering (FE)**

- Preparing data for ML
- Extracting features, enhancing data
- Data cleanup
- Visualizing Data
- Lab: data cleanup
- Lab: visualizing data

**Section 5: Linear regression**

- Simple Linear Regression
- Multiple Linear Regression
- Running LR
- Evaluating LR model performance
- Lab
- Use case: House price estimates
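
To give a flavor of the simple linear regression topic, the following is a minimal sketch of an ordinary least squares fit with one feature, written in plain Python. The helper name `fit_simple_linear_regression` and the house size and price figures are hypothetical and chosen only so the arithmetic is easy to follow; they are not the dataset used in the lab.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house size (thousands of sq ft) and price (thousands of dollars)
sizes = [1.0, 1.5, 2.0, 2.5, 3.0]
prices = [200.0, 250.0, 300.0, 350.0, 400.0]

slope, intercept = fit_simple_linear_regression(sizes, prices)
estimate = slope * 1.8 + intercept  # estimated price for a 1,800 sq ft house

print(f"slope={slope:.1f} intercept={intercept:.1f} estimate={estimate:.1f}")
```

The sketch assumes the feature values are not all identical, since that would make the denominator `sxx` zero.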

**Section 6: Logistic Regression**

- Understanding Logistic Regression
- Calculating Logistic Regression
- Evaluating model performance
- Lab
- Use case: credit card application, college admissions

**Section 7: Classification : SVM (Support Vector Machines)**

- SVM concepts and theory
- SVM with kernel
- Lab
- Use case: Customer churn data

**Section 8: Classification : Decision Trees & Random Forests**

- Theory behind trees
- Classification and Regression Trees (CART)
- Random Forest concepts
- Labs
- Use case: predicting loan defaults, estimating election contributions

**Section 9: Classification : Naive Bayes**

- Theory behind Naive Bayes
- Running NB algorithm
- Evaluating NB model
- Lab
- Use case: spam filtering

**Section 10: Clustering (K-Means)**

- Theory behind K-Means
- Running K-Means algorithm
- Estimating the performance
- Lab
- Use case: grouping cars data, grouping shopping data

**Section 11: Principal Component Analysis (PCA)**

- Understanding PCA concepts
- PCA applications
- Running a PCA algorithm
- Evaluating results
- Lab
- Use case: analyzing retail shopping data

**Section 12: Recommendation (Collaborative filtering)**

- Recommender systems overview
- Collaborative Filtering concepts
- Lab
- Use case: movie recommendations, music recommendations

**Section 13: Final workshop (time permitting)**

Students will analyze a couple of datasets and run ML algorithms.

This is done as a group exercise. Each group will present their findings to the class.