Hadoop For Developers
Learn the Hadoop ecosystem and its tools for Big Data analytics
Audience: Developers
Duration: Three days
Format: Lectures and hands-on labs (50% / 50%)
Overview
Apache Hadoop is a widely used open-source framework for storing and processing Big Data on clusters of servers. This course introduces developers to the Hadoop ecosystem.
Objective
Learn the Hadoop ecosystem and its tools for Big Data analytics.
What You Will Learn
- Hadoop & Big Data
- HDFS
- YARN
- Hive
- HBase
Course Details
Prerequisites:
- Developer background
- Comfortable with SQL and the Java programming language (HBase labs are in Java)
- Comfortable in a Linux environment
Setup:
- Zero Install cluster
- SSH client
- Browser (Chrome recommended)
Detailed Outline
Hadoop & Big Data
- Hadoop ecosystem
- Hadoop distributions
- High-level architecture
- Hardware and software
- Lab: First look at Hadoop
HDFS
- Design and architecture
- Concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons: NameNode, Secondary NameNode, DataNode
- Communications and heartbeats
- Data integrity
- Read and write paths
- NameNode High Availability (HA), Federation
- Labs: Interacting with HDFS
YARN
- YARN concepts and architecture
- ResourceManager, NodeManager
- Writing YARN applications
- Labs: Running a sample YARN program
Hive
- Architecture and design
- Hive data types
- HiveQL (HQL)
- Creating Hive tables and querying
- Partitions
- Joins
- Text processing
- Labs: Processing data with Hive
HBase
- Concepts and architecture
- HBase vs. RDBMS vs. Cassandra
- HBase Java API
- Time series data on HBase
- Schema design
- Labs: Interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
Ready to Get Started?
Contact us to learn more about this course and schedule your training.