Data Analytics With Python

Overview

Python has become one of the most widely used languages and environments for data science. It combines a robust, object-oriented language with a rich set of data science packages, such as NumPy, SciPy, matplotlib, scikit-learn, and pandas. Together, the language and its libraries form a strong platform for data analysis.

What You Will Learn

  • A quick Python primer
  • A quick primer on data science algorithms
  • NumPy
  • SciPy
  • Pandas
  • Scikit-learn

Audience

Data Analysts, Data Scientists, Developers

Duration 

Three days

Format

Lectures and hands-on labs (50% lectures, 50% labs).

Prerequisites

  • Experience and background in software development. Some background in analytics or machine learning is helpful.
  • Some background in Python is highly recommended, though a brief introduction is included.

Lab environment

Zero install: there is no need to install any course software on students’ machines. A lab environment in the cloud will be provided for students.

Students will need the following:

  • an SSH client (Linux and Mac already include one; for Windows, PuTTY is recommended)
  • a browser to access the cluster (Chrome is recommended)

Detailed outline

Python Language Overview

    • Basics of the Python language
    • How to edit, run, and test Python code
    • Introducing the Anaconda distribution of Python
    • IDEs
    • Using Jupyter notebooks

Pandas

    • Series and DataFrames
    • Loading data using Pandas
    • Labs
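
As an illustration of the topics in this module (a minimal sketch, not the actual lab material), the example below builds a Series and loads a DataFrame from CSV text. The city names, the temp_c column, and the inlined CSV are hypothetical and exist only to keep the example self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV data, inlined so no external file is needed.
CSV_TEXT = """city,temp_c
Austin,31.5
Boston,22.0
Denver,18.5
"""

# A Series is a one-dimensional array with a label for each value.
temps = pd.Series([31.5, 22.0, 18.5], index=["Austin", "Boston", "Denver"])

# read_csv accepts a file path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(temps["Boston"])        # look up a Series value by label
print(df["temp_c"].mean())    # aggregate a DataFrame column
```

In a real lab the same read_csv call would typically be pointed at a file path instead of an in-memory string.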

NumPy and SciPy

    • Arrays
    • Matrices
    • Linear Algebra
    • Labs
    • Visualizing data with matplotlib
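
To give a flavor of the array, matrix, and linear algebra topics (a small sketch under assumed values, not the course's lab code), the example below solves a 2x2 linear system with NumPy. The matrix A and vector b are made up for illustration.

```python
import numpy as np

# Hypothetical system of equations:
#   2x +  y = 3
#    x + 3y = 5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Solve A @ x = b directly; this is generally preferred over computing
# the inverse of A explicitly.
x = np.linalg.solve(A, b)

# Verify the solution by multiplying back.
residual = A @ x - b
print(x)
print(np.allclose(residual, 0.0))
```

SciPy offers a comparable routine, scipy.linalg.solve, alongside many additional linear algebra functions.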

Doing Data Science with Scikit-learn

    • Introducing Scikit-Learn
    • Clustering Data
    • Building a Classifier
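
The clustering and classification topics above can be sketched as follows. This is only an illustration on synthetic data: the choice of KMeans and LogisticRegression, the blob centers, and all parameter values are assumptions, not necessarily the algorithms or settings used in the course.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: two well-separated groups of 2-D points (assumed for illustration).
X, y = make_blobs(
    n_samples=200,
    centers=[[-5, -5], [5, 5]],
    cluster_std=1.0,
    random_state=0,
)

# Clustering: group the points without looking at the labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Classification: fit on labeled training data, then score on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = LogisticRegression()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Both estimators follow the same fit/predict pattern, which is the main design idea behind the scikit-learn API.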

Big Data With PySpark

    • Introduction to Spark and PySpark
    • Using the Spark framework for Big Data
    • Using MLlib for Data Science in PySpark