Cloudera DataFlow (née NiFi)
© Elephant Scale
June 15, 2021
Overview
- This three-day hands-on training course provides the fundamental concepts and experience necessary to automate the ingest, flow, transformation, and egress of data using Apache NiFi.
- Along with gaining a grasp of the key features, concepts, and benefits of NiFi, participants will create and run NiFi dataflows for a variety of scenarios.
- Students will gain expertise using processors, connections, and process groups, and will use NiFi Expression Language to control the flow of data from various sources to multiple destinations.
- Participants will monitor dataflows, examine the progress of data through a dataflow, and connect dataflows to external systems such as Kafka and HDFS.
- After taking this course, participants will have key knowledge and expertise for configuring and managing data ingestion, movement, and transformation scenarios for the enterprise.
Objectives
- Understand the role of Apache NiFi and MiNiFi in the Cloudera DataFlow platform
- Describe NiFi’s architecture, including standalone and clustered configurations
- Use key features, including FlowFiles, processors, process groups, controller services, and connections, to define a NiFi dataflow
- Navigate the NiFi User Interface to configure dataflows and use dataflow information
- Trace the life of data (its origin, transformation, and destination) using data provenance
- Organize and simplify dataflows
- Manage dataflow versions using the NiFi Registry
- Use the NiFi Expression Language to control dataflows
- Implement dataflow optimization methods and available monitoring and reporting features
- Connect dataflows with other systems, such as Kafka and HDFS
- Describe aspects of NiFi security
Duration
- Three Days
Audience
- Developers & Administrators
Prerequisites
- Comfortable with the Java programming language and the Linux command line (navigating directories, editing files with vi / nano)
- A Java IDE such as Eclipse or IntelliJ
Lab environment
- A working environment will be provided for students. Students will need an SSH client and a browser to access the cluster.
- Zero Install: There is no need to install NiFi software on students’ machines!
Course Outline
- Introduction to Cloudera Flow Management
- Processors
- Connections
- Dataflows
- Process Groups
- FlowFile Provenance
- Dataflow Templates
- Apache NiFi Registry
- FlowFile Attributes
- NiFi Expression Language
- Dataflow Optimization
- NiFi Architecture
- Site-to-Site Dataflows
- Cloudera Edge Management and MiNiFi
- How to create a pipeline with Airflow
- Airflow database and executors
- How to schedule and monitor workflows
- Monitoring and Reporting
- Controller Services
- Integrating NiFi with the Cloudera Ecosystem
- NiFi Security