Python has become a leading language and environment for data science. It combines a robust, object-oriented language with a rich set of data science packages, such as NumPy, SciPy, Matplotlib, scikit-learn, and pandas. Together, these make Python one of the strongest combinations of a general-purpose programming language and library support.
What You Will Learn
- Quick Python primer
- A quick primer on data science algorithms
Data Analysts, Data Scientists, Developers
Lectures and hands-on labs (50% each).
- Experience and background in software development. Some background in analytics or machine learning is helpful.
- Some background in Python is highly recommended, though a brief introduction is included.
Zero Install: There is no need to install Hadoop software on students’ machines! A lab environment in the cloud will be provided for students.
Students will need the following:
- An SSH client (Linux and Mac already have SSH clients; for Windows, PuTTY is recommended)
- A browser to access the cluster. We recommend Chrome.
Python Language Overview
- Basics of Python language
- How to edit, run, and test Python code
- Introducing the Anaconda distribution of Python
- Using Jupyter notebooks
- Series and DataFrames
- Loading data using Pandas
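As a taste of the pandas topics above, here is a minimal sketch of a Series, a DataFrame, and loading CSV data. The city names, column names, and values are made up for illustration, and the CSV is inlined as a string so the example needs no file on disk; in the labs the data would normally come from a file path.

```python
import io

import pandas as pd

# A Series is a one-dimensional labeled array.
temps = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"], name="temp_c")

# Hypothetical CSV content (illustrative values only).
csv_text = """city,population,area_km2
Springfield,120000,80.0
Shelbyville,90000,60.0
"""

# read_csv accepts a file path or any file-like object.
cities = pd.read_csv(io.StringIO(csv_text))

# A DataFrame is a table of labeled columns; derive a new column from two others.
cities["density"] = cities["population"] / cities["area_km2"]

print(temps["tue"])
print(cities)
```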
NumPy and SciPy
- Linear Algebra
- Visualizing data with Matplotlib
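For a sense of what the linear algebra topic involves, the sketch below solves a small system of linear equations with NumPy. The matrix and right-hand side are arbitrary illustrative values, not course material.

```python
import numpy as np

# Coefficient matrix and right-hand side for the system
#   3x + 1y = 9
#   1x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Solve A @ x = b without forming the inverse of A explicitly.
x = np.linalg.solve(A, b)

# Check the solution by substituting it back into the system.
residual = A @ x - b

print(x)
print(residual)
```

Calling `np.linalg.solve` is generally preferred over computing `np.linalg.inv(A) @ b`, since it avoids building the inverse.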
Doing Data Science with scikit-learn
- Introducing scikit-learn
- Clustering Data
- Building a Classifier
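The clustering and classification topics above might look roughly like the following sketch. It assumes synthetic data from `make_blobs` and picks k-means and logistic regression as example algorithms; the course may use different datasets and models.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic 2-D data with two groups (illustrative only).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

# Clustering: group the points without using the labels y.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Classification: learn from labeled data, then score on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = LogisticRegression()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(cluster_labels[:10])
print(accuracy)
```

Both estimators follow the same `fit` / `predict` pattern, which is the convention used across scikit-learn.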
Big Data With PySpark
- Introduction to Spark and PySpark
- Using the Spark framework for Big Data
- Using MLlib for Data Science in PySpark