Big Data Analytics With Hadoop

Overview

Apache Hadoop is a popular framework for processing Big Data. Hadoop provides rich and deep analytics capabilities, and it is making inroads into the traditional BI analytics world. This course introduces analysts to the core components of the Hadoop ecosystem and its analytics tools.

What You Will Learn:

  • Understanding the Hadoop ecosystem
  • Data storage using HDFS
  • Data warehousing and querying using Hive

Audience

Business Analysts, Developers

Duration

2 days

Format

Lectures and hands-on labs.

Prerequisites

  • Programming background, including databases and SQL
  • Basic knowledge of Linux

Lab environment

Zero install: there is no need to install Hadoop software on students' machines! A working Hadoop cluster is provided for students.

Students will need the following:

  • an SSH client (Linux and Mac already include one; on Windows, PuTTY is recommended)
  • a browser to access the cluster

Detailed outline

  • Section 1: Hadoop ecosystem
    • Hadoop overview
      • distributions
      • high level architecture
      • hardware / software
      • Lab: first look at Hadoop
    • HDFS Overview
      • concepts (horizontal scaling, replication, data locality)
      • architecture (NameNode, DataNode)
      • Demo: interacting with HDFS
    • YARN Overview
      • YARN as the cluster operating system
      • Demo: running applications on YARN
  • Section 2: Hive
    • Hive concepts & architecture
    • SQL support in Hive
    • Data warehousing in Hive
    • data types
    • table creation and queries
    • partitions
    • joins
    • modern data formats
    • text analytics
    • Hive performance
    • labs (multiple)