Cloudera DataFlow (née NiFi)

© Elephant Scale

June 15, 2021

Overview

  • This three-day hands-on training course provides the fundamental concepts and experience necessary to automate the ingest, flow, transformation, and egress of data using Apache NiFi.
  • Along with gaining a grasp of the key features, concepts, and benefits of NiFi, participants will create and run NiFi dataflows for a variety of scenarios.
  • Students will gain expertise using processors, connections, and process groups, and will use NiFi Expression Language to control the flow of data from various sources to multiple destinations.
  • Participants will monitor dataflows, examine the progress of data through a dataflow, and connect dataflows to external systems such as Kafka and HDFS.
  • After taking this course, participants will have key knowledge and expertise for configuring and managing data ingestion, movement, and transformation scenarios for the enterprise.

Objectives

  • Understand the role of Apache NiFi and MiNiFi in the Cloudera DataFlow platform
  • Describe NiFi’s architecture, including standalone and clustered configurations
  • Use key features, including FlowFiles, processors, process groups, controllers, and connections, to define a NiFi dataflow
  • Navigate the NiFi User Interface, configure dataflows, and use dataflow information
  • Trace the life of data, its origin, transformation, and destination, using data provenance
  • Organize and simplify dataflows
  • Manage dataflow versions using the NiFi Registry
  • Use the NiFi Expression Language to control dataflows
  • Implement dataflow optimization methods and available monitoring and reporting features
  • Connect dataflows with other systems, such as Kafka and HDFS
  • Describe aspects of NiFi security

Duration

  • Three Days

Audience

  • Developers & Administrators

Prerequisites

  • Comfortable with the Java programming language and the Linux command line (able to navigate and to edit files with vi or nano)
  • A Java IDE like Eclipse or IntelliJ

Lab environment

  • A working environment will be provided for students. Students will need an SSH client and a browser to access the cluster.
  • Zero Install: There is no need to install NiFi software on students’ machines!

Course Outline

  • Introduction to Cloudera Flow Management
  • Processors
  • Connections
  • Dataflows
  • Process Groups
  • FlowFile Provenance
  • Dataflow Templates
  • Apache NiFi Registry
  • FlowFile Attributes
  • NiFi Expression Language
  • Dataflow Optimization
  • NiFi Architecture
  • Site-to-Site Dataflows
  • Cloudera Edge Management and MiNiFi
  • How to create a pipeline with Airflow
  • DB and executors
  • How to schedule and monitor workflows
  • Monitoring and Reporting
  • Controller Services
  • Integrating NiFi with the Cloudera Ecosystem
  • NiFi Security