Hadoop for Developers

Upcoming Classes

Ideal for small teams and individuals

Looking For Private Training?

We offer on-site, customized training.

Overview

Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course introduces developers to the Hadoop ecosystem.

What You Will Learn

  • Hadoop & Big Data
  • HDFS
  • MapReduce
  • Pig
  • Hive
  • HBase

Audience:

Developers

Duration:

Four days

Format:

Lectures and hands-on labs (50% lectures, 50% labs).

Prerequisites

  • Comfortable with the Java programming language (most programming exercises are in Java)
  • Comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi or nano)

Lab environment

Zero install: there is no need to install Hadoop software on students' machines. A working Hadoop cluster is provided for the labs.

Students will need the following:

  • An SSH client (Linux and macOS include one; on Windows, PuTTY is recommended)
  • A web browser to access the cluster (Firefox is recommended)

Detailed outline

  • Section 1: Introduction to Hadoop
    • Hadoop history and concepts
    • Ecosystem
    • Distributions
    • High-level architecture
    • Hadoop myths
    • Hadoop challenges
    • Hardware and software
    • Lab: first look at Hadoop
  • Section 2: HDFS
    • Design and architecture
    • Concepts (horizontal scaling, replication, data locality, rack awareness)
    • Daemons: NameNode, Secondary NameNode, DataNode
    • Communications and heartbeats
    • Data integrity
    • Read and write path
    • NameNode High Availability (HA) and Federation
    • Labs: Interacting with HDFS
  • Section 3: MapReduce
    • Concepts and architecture
    • Daemons (Hadoop 1): JobTracker and TaskTracker
    • Phases: Driver, Mapper, Shuffle and Sort, Reducer
    • MapReduce Version 1 and Version 2 (YARN)
    • MapReduce internals
    • Introduction to writing a Java MapReduce program
    • Labs: Running a sample MapReduce program
  • Section 4: Pig
    • Pig vs Java MapReduce
    • Pig job flow
    • Pig Latin language
    • ETL with Pig
    • Transformations and Joins
    • User defined functions (UDF)
    • Labs: Writing Pig scripts to analyze data
  • Section 5: Hive
    • Architecture and design
    • Data types
    • SQL support in Hive
    • Creating Hive tables and querying
    • Partitions
    • Joins
    • Text processing
    • Labs: Various exercises on processing data with Hive
  • Section 6: HBase
    • Concepts and architecture
    • HBase vs RDBMS vs Cassandra
    • HBase Java API
    • Time series data on HBase
    • Schema design
    • Labs: Interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
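The HDFS concepts in Section 2 (splitting files into blocks and replicating each block) reduce to simple arithmetic. The sketch below assumes the Hadoop 2.x+ defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); both are configurable per cluster and per file. The class and method names are hypothetical and are not part of the Hadoop API.

```java
/** Hypothetical helper (not part of Hadoop): estimates HDFS block counts and raw storage. */
public class HdfsBlockMath {
    // Assumed Hadoop 2.x+ default block size (dfs.blocksize); Hadoop 1 defaulted to 64 MB.
    public static final long BLOCK_SIZE = 128L * 1024 * 1024;
    // Assumed default replication factor (dfs.replication).
    public static final int REPLICATION = 3;

    /** Number of blocks a file occupies; the last block may be only partially filled. */
    public static long blockCount(long fileBytes) {
        if (fileBytes == 0) return 0;
        return (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    /** Raw cluster storage consumed: every byte is stored REPLICATION times. */
    public static long rawBytes(long fileBytes) {
        return fileBytes * REPLICATION;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        System.out.println("blocks: " + blockCount(oneGiB));         // prints "blocks: 8"
        System.out.println("raw GiB: " + rawBytes(oneGiB) / oneGiB); // prints "raw GiB: 3"
        // A 300 MB file needs 3 blocks: 128 MB + 128 MB + a 44 MB final block.
        System.out.println("blocks: " + blockCount(300L * 1024 * 1024)); // prints "blocks: 3"
    }
}
```

Because a partially filled final block only uses its actual size on disk, raw storage depends on the file size and the replication factor, not on the number of blocks.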
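To make the Mapper, Shuffle and Sort, and Reducer phases from Section 3 concrete, here is a single-process simulation of the classic word-count job in plain Java. It does not use the Hadoop API; it only mimics the data flow between phases. The partition function copies the formula of Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, but applies it to Java `String` hash codes rather than Hadoop `Text` keys. The class and method names, the lower-casing of words and the two-reducer setup are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory simulation of a word-count MapReduce job (no Hadoop dependency). */
public class WordCountSimulation {

    /** Map phase: emit (word, 1) for every whitespace-separated token (lower-cased here). */
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) out.add(Map.entry(token, 1));
        }
        return out;
    }

    /** Same formula as Hadoop's default HashPartitioner, applied to String hash codes. */
    public static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Reduce phase: sum all counts that arrived for one key. */
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    /** Driver: runs map, shuffle-and-sort, and reduce; returns word -> count. */
    public static Map<String, Integer> run(List<String> lines, int numReduceTasks) {
        // One sorted key -> values map per reducer stands in for the shuffle and sort.
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) partitions.add(new TreeMap<>());

        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                int p = partition(kv.getKey(), numReduceTasks);
                partitions.get(p)
                        .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }

        // Each reducer sees its keys in sorted order; results are merged here for display.
        Map<String, Integer> result = new TreeMap<>();
        for (TreeMap<String, List<Integer>> part : partitions) {
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "The fox");
        System.out.println(run(input, 2)); // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
    }
}
```

The final counts do not depend on the number of reducers; the partitioner only decides which reducer receives each key, which is why all values for one word always meet in the same `reduce` call.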
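For the time-series and schema-design topics in Section 6, a common pattern is to build the row key from an entity identifier plus a reversed timestamp (`Long.MAX_VALUE - timestamp`), so that the newest rows for an entity sort first in HBase's lexicographic row-key order, optionally prefixed with a small salt byte to spread writes across regions. The sketch below builds such keys using only the JDK. The key layout, the bucket count and all names are hypothetical, and the sort uses `Arrays.compareUnsigned` to mirror HBase's unsigned byte ordering; it does not call the HBase client API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical row-key builder for time-series data in HBase (JDK only, no HBase client). */
public class TimeSeriesRowKey {
    // Hypothetical number of salt buckets, often chosen to match pre-split regions.
    public static final int SALT_BUCKETS = 8;

    /**
     * Layout: [1-byte salt][metric id as UTF-8][8-byte reversed timestamp].
     * Assumes fixed-length metric ids in practice, so one id is never a prefix of another.
     */
    public static byte[] rowKey(String metricId, long timestampMillis) {
        byte[] metric = metricId.getBytes(StandardCharsets.UTF_8);
        // Salt derived from the metric id, so all rows for one metric share a bucket.
        byte salt = (byte) ((metricId.hashCode() & Integer.MAX_VALUE) % SALT_BUCKETS);
        long reversed = Long.MAX_VALUE - timestampMillis; // newer timestamps -> smaller values
        return ByteBuffer.allocate(1 + metric.length + Long.BYTES)
                .put(salt)
                .put(metric)
                .putLong(reversed) // ByteBuffer writes big-endian by default
                .array();
    }

    /** Recovers the original timestamp from the last 8 bytes of a key. */
    public static long timestampOf(byte[] key) {
        long reversed = ByteBuffer.wrap(key, key.length - Long.BYTES, Long.BYTES).getLong();
        return Long.MAX_VALUE - reversed;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (long ts : new long[] {1_000L, 3_000L, 2_000L}) {
            keys.add(rowKey("cpu.load", ts));
        }
        // Emulate HBase ordering: row keys compared as unsigned byte strings.
        keys.sort(Arrays::compareUnsigned);
        for (byte[] k : keys) {
            System.out.println(timestampOf(k)); // prints 3000, then 2000, then 1000
        }
    }
}
```

Big-endian encoding of a non-negative `long` preserves numeric order under unsigned byte comparison, which is what makes the reversed timestamp put the most recent reading first in a scan.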