CALL NOW 713-568-9753
Hadoop for Business Analysts

Upcoming Classes

Ideal for small teams and individuals

Looking For Private Training?

We offer on-site, customized trainings.

Overview

Apache Hadoop is one of the most popular frameworks for processing Big Data. Hadoop provides rich and deep analytics capabilities, and it is making inroads into the traditional BI analytics world. This course introduces analysts to the core components of the Hadoop ecosystem and its analytics.

What You Will Learn:

  • Data storage using HDFS
  • ETL using Pig
  • Data warehousing and querying using Hive

Audience

Business Analysts

Duration

Three days

Format

Lectures and hands-on labs.

Prerequisites

  • Programming background with databases / SQL
  • Basic knowledge of Linux (able to navigate the Linux command line and edit files with vi / nano)

Lab environment

Zero install: there is no need to install Hadoop software on students’ machines. A working Hadoop cluster will be provided for students.

Students will need the following

 

Detailed outline

  • Section 1: Introduction to Hadoop
    • Hadoop history, concepts
    • ecosystem
    • distributions
    • high-level architecture
    • Hadoop myths
    • Hadoop challenges
    • hardware / software
    • labs: first look at Hadoop
  • Section 2: HDFS Overview
    • concepts (horizontal scaling, replication, data locality, rack awareness)
    • architecture (NameNode, Secondary NameNode, DataNode)
    • data integrity
    • future of HDFS: NameNode HA, Federation
    • labs: interacting with HDFS
  • Section 3: MapReduce Overview
    • MapReduce concepts
    • daemons: JobTracker / TaskTracker
    • phases: driver, mapper, shuffle/sort, reducer
    • thinking in MapReduce
    • future of MapReduce (YARN)
    • labs: running a MapReduce program
  • Section 4: Pig
    • Pig vs. Java MapReduce
    • Pig Latin language
    • user-defined functions
    • understanding Pig job flow
    • basic data analysis with Pig
    • complex data analysis with Pig
    • multiple datasets with Pig
    • advanced concepts
    • lab: writing Pig scripts to analyze / transform data
  • Section 5: Hive
    • Hive concepts
    • architecture
    • SQL support in Hive
    • data types
    • table creation and queries
    • Hive data management
    • partitions & joins
    • text analytics
    • labs (multiple): creating Hive tables, running queries and joins, using partitions, using text analytics functions
  • Section 6: BI Tools for Hadoop
    • BI tools and Hadoop
    • Overview of current BI tools landscape
    • Choosing the best tool for the job
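
To give a flavor of the MapReduce phases listed in Section 3 (driver, mapper, shuffle/sort, reducer), here is a minimal single-machine sketch in Python. It is not Hadoop code and is not taken from the course labs; the function names (`mapper`, `shuffle_sort`, `reducer`, `word_count`) and the sample input are illustrative assumptions. On a real cluster, Hadoop distributes the map and reduce steps across nodes and performs the shuffle/sort itself.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_sort(pairs):
    """Shuffle/sort phase: group values by key, then order groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Driver (hypothetical): wire the three phases together in order."""
    mapped = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle_sort(mapped)]


if __name__ == "__main__":
    # Illustrative input only; any iterable of text lines works.
    sample = ["big data big cluster", "data node"]
    for word, count in word_count(sample):
        print(word, count)
```

Each phase is kept as a separate function so that the data handed from one phase to the next (key/value pairs, then grouped values per key) is easy to see.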


Upcoming Trainings

  • Please select a session and register.
  • No payment necessary for registration.
  • Payment is due 5 days before the class to secure your spot.