Machine Learning with SageMaker (AWS)

Overview

Machine Learning (ML) is the killer app for Big Data. Amazon Machine Learning brings the power of ML to the regular programmer by providing ML as a service. However, to use ML effectively, one needs to understand the underlying models and how to apply them on AWS.

This course is intended for data scientists and software engineers, and it balances theory with practice. For each machine learning concept, we first discuss its foundations, applicability, and limitations, then cover its implementation and specific use cases. The format is roughly 50% lecture and 50% lab work.

Amazon SageMaker is a fully managed machine learning service. The course pairs an overview of machine learning concepts with their implementation in SageMaker, and brings in tools outside SageMaker where required.

Audience

Data Scientists and Software Engineers

Duration

3 days

Prerequisites

- Familiarity with programming in at least one language
- Ability to navigate the Linux command line
- Basic knowledge of command-line Linux editors (vi / nano)
- Basic familiarity with AWS (can optionally be covered on the first day of the course)

Lab environment

A training AWS account will be provided. Students need only an SSH client and a browser. Zero install: there is no need to install software on students’ machines.

Objectives

- Attain a thorough understanding of popular machine learning algorithms, their applicability, and limitations
- Practice the application of these methods in the Amazon machine learning environment
- Achieve clarity in the real-world use of machine learning by illustrating each method with practical use cases

Course Outline

Introductions and overviews

- Data ETL
  - Go into one example in detail, implemented on Amazon Redshift
  - Provide a pointer to other examples for self-study
- Machine learning
  - Goals, results, supervised/unsupervised
  - Which parts of ML are implemented in Amazon Machine Learning
  - SageMaker (AWS) Overview

Supervised Learning

- Linear regression
- Logistic regression and multinomial logistic regression
- SVM, decision trees, random forests, neural networks
- Labs for every section above
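
As a taste of the first topic above, here is a minimal sketch of single-feature linear regression fitted by ordinary least squares. It is not taken from the course labs and does not use SageMaker; the function name `fit_line` and the toy data are assumptions made purely for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predicted y for a new input x = 5
print(slope, intercept, prediction)
```

On real data the points will not sit exactly on a line, and the same formulas then give the line that minimizes the sum of squared errors.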

Unsupervised learning

- K-Means
- Other types of unsupervised learning
  - Hierarchical clustering
  - Mixture models
  - DBSCAN
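
To make the K-Means topic above concrete, here is a minimal pure-Python sketch of the standard assign-then-update loop (Lloyd's algorithm). It is illustrative only and is not SageMaker's built-in K-Means; the function name `kmeans`, the fixed starting centroids, and the toy points are assumptions.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
initial_centroids = [(0, 0), (10, 10)]  # assumed starting guesses

centroids, clusters = kmeans(points, initial_centroids)
print(centroids)
print(clusters)
```

Fixed starting centroids keep the sketch deterministic; practical implementations usually pick starting points randomly or with a seeding scheme such as k-means++.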

Data visualization

- Visualization examples for the models above
- Links to other visualizations for self-study

SageMaker

- Intro
- SageMaker Details
  - Using Built-in Algorithms
  - Using Your Own Algorithms
  - Using TensorFlow
  - Using Apache MXNet
  - Using Apache Spark
  - Amazon SageMaker Libraries
  - Authentication and Access Control
  - Monitoring