Machine Learning with Spark

Spark is a new and very popular Big Data processing engine. Spark MLlib is a de facto standard for machine learning on Big Data.

This course is intended for data scientists and software engineers. It balances theory and practice. For each machine learning concept, we first discuss its foundations, applicability, and limitations, then explain its implementation and use through specific use cases. The course is about 50% lecture and 50% lab work.

Duration : 3 days
Audience : Data Scientists and Software Engineers
Prerequisites :
– familiarity with programming in at least one language
– ability to navigate the Linux command line
– basic knowledge of command-line Linux editors (vi / nano)
Objectives :
– attain a thorough understanding of popular machine learning algorithms, their applicability and limitations
– practice the application of these methods in the Spark machine learning environment
– achieve clarity in the real-world use of machine learning by illustrating each method with practical use cases
Lab environment:
A working Spark environment will be provided for students. Students only need an SSH client and a browser.
Zero Install : There is no need to install software on students’ machines.

Course Outline:

Section 1: Introductions and overviews
Machine learning: goals, results, supervised/unsupervised
Spark as a tool for Big Data
Scala as the language of Spark (together with Python, Java and R)
MLlib as a collection of machine learning algorithms

If students do not have the Spark/Scala prerequisites, an in-depth introduction to both is taught in this section.

Section 2: SVM (Support Vector Machines)
Theory
Lab
Use case: anomaly detection

Section 3: Logistic Regression
Theory
Lab
Use case: healthcare prediction

Section 4: Linear regression
Theory
Lab
Use case: financial modelling

Section 5: Naive Bayes
Theory
Lab
Use case: spam filtering

Section 6: Decision Trees
Theory
Lab
Use case: vessel shipment planning

Section 7: Clustering (K-Means)
Theory
Lab
Use case: topic grouping

Section 8: LDA (Latent Dirichlet Allocation)
Theory
Lab
Use case: unsupervised topic discovery

Section 9: Principal Component Analysis (PCA)
Theory
Lab
Use case: stock analysis

Section 10: Recommendation (Collaborative filtering)
Theory
Lab
Use case: dating

Section 11: Graphs – graph operations
Theory
Lab
Use case: finding followers

Section 12: Graphs – optimizations with Pregel
Theory
Lab
Use case: shortest routes, PageRank

Upcoming Courses

Course                                        Price
Machine Learning with Spark - Oct 23, 2017    $2,300.00 (USD)
Machine Learning with Spark - Nov 20, 2017    $2,300.00 (USD)



  • Machine Learning with Spark - Oct 23, 2017
    October 23, 2017 - October 25, 2017
    9:00 am - 5:00 pm
  • Machine Learning with Spark - Nov 20, 2017
    November 20, 2017 - November 22, 2017
    9:00 am - 5:00 pm