Data Analytics With Python
Overview
Python has become a leading language and environment for data science. It combines a robust, object-oriented language with a rich set of data science libraries, such as NumPy, SciPy, matplotlib, scikit-learn, and pandas. Together, these make Python a strong choice when you need both a general-purpose programming language and solid library support.
What You Will Learn
- Quick Python primer
- A quick primer on data science algorithms
- NumPy
- SciPy
- Pandas
- Scikit-learn
Audience
Data Analysts, Data Scientists, Developers
Duration
Three days
Format
Lectures and hands-on labs (50% lectures, 50% labs)
Prerequisites
- Experience in software development. Some background in analytics or machine learning is helpful.
- Some background in Python is highly recommended, though a brief introduction is included.
Lab environment
Zero Install: There is no need to install Hadoop software on students’ machines! A lab environment in the cloud will be provided for students.
Students will need the following:
- An SSH client (Linux and Mac already have SSH clients; for Windows, PuTTY is recommended)
- A browser to access the cluster. We recommend Chrome.
Detailed outline
Python Language Overview
- Basics of the Python language
- How to edit, run, and test Python code
- Introducing the Anaconda distribution of Python
- IDEs
- Using Jupyter notebooks
Pandas
- Series and DataFrames
- Loading data using Pandas
- Labs
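As a taste of the Pandas topics above, here is a minimal sketch of loading data into a DataFrame and working with a Series. The CSV content and the column names (`city`, `temp_c`) are made up for illustration and are not taken from the course labs.

```python
import io

import pandas as pd

# Hypothetical CSV data, inlined so the example is self-contained.
csv_text = """city,temp_c
Austin,31.0
Boston,22.5
Austin,29.0
"""

# Load the CSV into a DataFrame (read_csv also accepts a file path).
df = pd.read_csv(io.StringIO(csv_text))

# Selecting a single column of a DataFrame gives a Series.
temps = df["temp_c"]

# Group the rows by city and average the temperature in each group.
mean_by_city = df.groupby("city")["temp_c"].mean()
print(mean_by_city)
```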
NumPy and SciPy
- Arrays
- Matrices
- Linear Algebra
- Labs
- Visualizing data with matplotlib
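The array, matrix, and linear algebra topics above can be previewed with a short NumPy sketch. The 2x2 system below is a hypothetical example chosen for illustration, not course material.

```python
import numpy as np

# Hypothetical system of equations:
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # coefficient matrix (2-D array)
b = np.array([5.0, 10.0])    # right-hand side (1-D array)

# Solve A @ x = b for x.
x = np.linalg.solve(A, b)

# Check the solution by substituting it back into the system.
residual = A @ x - b
print(x, residual)
```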
Doing Data Science with Scikit-learn
- Introducing Scikit-learn
- Clustering Data
- Building a Classifier
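To give a sense of the clustering and classification topics above, here is a minimal sketch using scikit-learn. The toy data, the labels, and the choice of k-means and k-nearest neighbors are assumptions made for illustration; the course may use other datasets and estimators.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])

# Clustering: group the points without using any labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Classification: learn from labeled points, then predict a new one.
y = np.array([0, 0, 0, 1, 1, 1])  # hypothetical class labels
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
prediction = clf.predict(np.array([[4.8, 5.0]]))
print(cluster_ids, prediction)
```

Both estimators follow the same pattern: create the object, fit it to data, then ask it for predictions.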
Big Data With PySpark
- Introduction to Spark and PySpark
- Using the Spark framework for Big Data
- Using MLlib for Data Science in PySpark