Advanced Hadoop For Developers

Master advanced Hadoop programming techniques.

Audience: Developers

Duration: Three days

Format: Lectures (50%) and hands-on labs (50%)

Overview

Apache Hadoop is one of the most popular frameworks for processing big data on clusters of servers. This course covers advanced programming techniques for experienced Hadoop developers.

Objective

Master advanced Hadoop programming techniques.

What You Will Learn

  • Advanced Pig
  • Advanced Hive
  • Advanced HBase (with SQL access through Phoenix)

Course Details

Prerequisites:

  • Comfortable with the Java programming language (most programming exercises are in Java)
  • Comfortable in a Linux environment
  • Have attended “Hadoop for Developers” or have working knowledge of Hadoop

Setup:

  • Zero Install Hadoop cluster
  • SSH client
  • Firefox browser

Detailed Outline

  • Data Formats, Compression, and Masking
      • Various Data Formats (JSON / Avro / Parquet)
      • Compression Schemes
      • Data Masking
      • Labs: analyzing different data formats; enabling compression
  • Advanced Pig
      • User-defined Functions
      • Introduction to Pig Libraries (ElephantBird / Data-Fu)
      • Loading Complex Structured Data using Pig
      • Pig Tuning
      • Labs: advanced Pig scripting; parsing complex data types
  • Advanced Hive
      • User-defined Functions
      • Compressed Tables
      • Hive Performance Tuning
      • Labs: creating compressed tables; evaluating table formats and configuration
  • Advanced HBase
      • Advanced Schema Modeling
      • Compression
      • Bulk Data Ingest
      • Wide-table / Tall-table comparison
      • HBase and Pig
      • HBase and Hive
      • HBase Performance Tuning
      • Labs: tuning HBase; accessing HBase data from Pig and Hive; using Phoenix for data modeling

Ready to Get Started?

Contact us to learn more about this course and schedule your training.