Hadoop For Developers

Learn the Hadoop ecosystem and its tools for Big Data analytics

Audience: Developers

Duration: Three days

Format: 50% lectures, 50% hands-on labs

Overview

Apache Hadoop is one of the most widely used open-source frameworks for storing and processing Big Data on clusters of servers. This course introduces developers to the Hadoop ecosystem.

Objective

Learn the Hadoop ecosystem and its tools for Big Data analytics

What You Will Learn

  • Hadoop & Big Data
  • HDFS
  • YARN
  • Hive
  • HBase

Course Details

Prerequisites:

  • Developer background
  • Comfortable with SQL and the Java programming language (HBase labs are in Java)
  • Comfortable in a Linux environment

Setup:

  • Zero Install cluster
  • SSH client
  • Browser (Chrome recommended)

Detailed Outline

Hadoop & Big Data

  • Hadoop ecosystem
  • Hadoop distributions
  • High-level architecture
  • Hardware and software
  • Lab: First look at Hadoop

HDFS

  • Design and architecture
  • Concepts (horizontal scaling, replication, data locality, rack awareness)
  • Daemons: NameNode, Secondary NameNode, DataNode
  • Communications and heartbeats
  • Data integrity
  • Read and write paths
  • NameNode High Availability (HA), Federation
  • Labs: Interacting with HDFS

YARN

  • YARN concepts and architecture
  • ResourceManager, NodeManager
  • Writing YARN applications
  • Labs: Running a sample YARN program

Hive

  • Architecture and design
  • Hive data types
  • HQL
  • Creating and querying Hive tables
  • Partitions
  • Joins
  • Text processing
  • Labs: Processing data with Hive

HBase

  • Concepts and architecture
  • HBase vs RDBMS vs Cassandra
  • HBase Java API
  • Time series data on HBase
  • Schema design
  • Labs: Interacting with HBase using the shell; programming with the HBase Java API; schema design exercise