Hadoop for Business Analysts

Overview

Apache Hadoop is one of the most popular frameworks for processing Big Data. Hadoop provides rich and deep analytics capability, and it is making inroads into the traditional BI analytics world. This course introduces analysts to the core components of the Hadoop ecosystem and its analytics.

What You Will Learn:

  • Understanding Hadoop ecosystem
  • Data storage using HDFS
  • ETL using Pig
  • Data warehousing and querying using Hive

Audience

Business Analysts

Duration

Three days

Format

Lectures and hands-on labs.

Prerequisites

  • programming background with databases / SQL
  • basic knowledge of Linux (able to navigate the Linux command line and edit files with vi / nano)

Lab environment

Zero install: there is no need to install Hadoop software on students’ machines. A working Hadoop cluster will be provided for students.

Students will need the following

 

Detailed outline

  • Section 1: Quick primer on Hadoop / HDFS / MapReduce
    • Hadoop ecosystem
      • distributions
      • high level architecture
      • hardware / software
      • Labs : first look at Hadoop
    • HDFS Overview
      • concepts (horizontal scaling, replication, data locality)
      • architecture (NameNode, DataNode)
      • Demo : Interacting with HDFS
    • MapReduce Overview
      • MapReduce concepts
      • YARN operating system
      • Demo : Running a MapReduce program
  • Section 2: Hive
    • Hive concepts & architecture
    • SQL support in Hive
    • Data warehousing in Hive
    • data types
    • table creation and queries
    • partitions
    • joins
    • text analytics
    • labs (multiple) : creating Hive tables and running queries, joins, using partitions, using text analytics functions
  • Section 3: Pig
    • Pig concepts and architecture
    • Pig Latin language
    • understanding Pig job flow
    • basic data analysis with Pig
    • data cleanup
    • ETL workloads with Pig
    • joins and multiple datasets with Pig
    • user defined functions
    • debugging Pig scripts
    • lab : writing Pig scripts to analyze / transform data
  • Section 4: BI Tools for Hadoop
    • BI tools and Hadoop
    • Overview of current BI tools landscape
    • Choosing the best tool for the job
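
As a rough illustration of the MapReduce concepts listed in Section 1, the sketch below mimics the map, shuffle and reduce phases of a word count in plain Python. It is a conceptual sketch only: it does not use Hadoop, it is not taken from the course labs, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Chain the three phases into one local, single-process word count."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input; on a real cluster the lines would come from files in HDFS.
    sample = ["hive and pig", "pig on hadoop", "hive on hadoop"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel on many nodes and the framework performs the grouping. Here everything runs in one process, to show the data flow only.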