Hadoop for Developers

Upcoming Classes

Ideal for small teams and individuals

Looking For Private Training?

We offer customized, on-site training.

Overview

Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course introduces developers to the Hadoop ecosystem.

What You Will Learn

  • Hadoop & Big Data
  • HDFS
  • MapReduce
  • Pig
  • Hive
  • HBase

Audience :

Developers

Duration :

Four days

Format :

Lectures and hands-on labs (50% / 50%).

Prerequisites

  • comfortable with the Java programming language (most programming exercises are in Java)
  • comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi or nano)

Lab environment

Zero Install : There is no need to install Hadoop software on students’ machines! A working Hadoop cluster will be provided for students.

Students will need the following:

  • an SSH client (Linux and Mac already include one; for Windows, PuTTY is recommended)
  • a browser to access the cluster (we recommend Firefox)

 

Detailed outline

  • Section 1: Introduction to Hadoop
    • Hadoop history, concepts
    • ecosystem
    • distributions
    • high level architecture
    • Hadoop myths
    • Hadoop challenges
    • hardware / software
    • Lab : first look at Hadoop
  • Section 2: HDFS
    • Design and architecture
    • concepts (horizontal scaling, replication, data locality, rack awareness)
    • Daemons : NameNode, Secondary NameNode, DataNode
    • communications / heartbeats
    • data integrity
    • read / write path
    • NameNode High Availability (HA), Federation
    • labs : Interacting with HDFS
  • Section 3: MapReduce
    • concepts and architecture
    • daemons (MRv1) : JobTracker / TaskTracker
    • phases : driver, mapper, shuffle/sort, reducer
    • MapReduce version 1 and version 2 (YARN)
    • Internals of MapReduce
    • Introduction to writing a Java MapReduce program
    • labs : Running a sample MapReduce program
  • Section 4: Pig
    • Pig vs. Java MapReduce
    • Pig job flow
    • Pig Latin language
    • ETL with Pig
    • Transformations & Joins
    • User defined functions (UDF)
    • labs : writing Pig scripts to analyze data
  • Section 5: Hive
    • architecture and design
    • data types
    • SQL support in Hive
    • Creating Hive tables and querying
    • partitions
    • joins
    • text processing
    • labs : various labs on processing data with Hive
  • Section 6: HBase
    • concepts and architecture
    • HBase vs. RDBMS vs. Cassandra
    • HBase Java API
    • Time series data on HBase
    • schema design
    • labs : interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
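
For a feel of the MapReduce phases listed in Section 3 (mapper, shuffle/sort, reducer, driver), here is a minimal sketch of a word count in plain Java. It is not course material and does not use the Hadoop API: the class `WordCountSketch` and its methods are hypothetical names chosen for illustration, and everything runs in a single JVM rather than on a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java (9+), no Hadoop classes involved.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle/sort phase: group all values by key, with keys kept in sorted order.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped values for a single key.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: wires the phases together for a small in-memory input.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {data=2, hadoop=2, processes=1, stores=1}
        System.out.println(wordCount(List.of("hadoop stores data", "hadoop processes data")));
    }
}
```

On a real cluster the same three steps are distributed: mappers run next to the data blocks, the framework performs the shuffle/sort across the network, and reducers write their results back to HDFS.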


Upcoming Trainings

  • Please select a session and register.
  • No payment necessary for registration.
  • Payment is due 5 days before the class to secure your spot.