Hadoop Ecosystem
This learning path provides an explanation and demonstration of the most popular components in the Hadoop ecosystem.
GK# 7315
Apache Hadoop is open source software for affordable supercomputing; it provides the distributed file system and the parallel processing required to run a massive computing cluster. This learning path provides an explanation and demonstration of the most popular components in the Hadoop ecosystem. It defines and describes theory and architecture, while also providing instruction on installation, configuration, usage, and low-level use cases for the Hadoop ecosystem. This learning path can be used to help prepare for the Cloudera Certified Developer for Hadoop, HDP Certified Developer, Cloudera Certified Administrator for Hadoop, or Hadoop 2.0 Administrator Certification exams.
Ecosystem for Hadoop
A Map for Big Data
Key Terminology for Big Data
Ecosystem for Hadoop
Theory for Hadoop
Data Repository for Hadoop
Data Refinery for Hadoop
Data Analytics
Hadoop Ecosystem Complexities
Installation of Hadoop
Configuration of User Environments
Pre-installation for Hadoop
Setup of Hadoop
Operations for Hadoop
Monitoring for Hadoop
Troubleshooting of Hadoop Installation
Data Repository with HDFS and HBase
Theory of HDFS
Operations for HDFS
Troubleshooting of HDFS
Theory for NoSQL and RDBMS
Overview of HBase and ZooKeeper
Operation for HBase
Data Repository with Flume
The Purpose of Flume
Setup of Flume
Operations for Flume
Sources, Sinks, and Channels
Serializing Data with Avro
Multiplex Agents for Flume
Troubleshooting of Flume
Data Repository with Sqoop
Setup of MySQL
The Purpose of Sqoop
Setup of Sqoop
Operations for Sqoop
Troubleshooting of Sqoop
Data Refinery with YARN and MapReduce
Theory for YARN
Theory for Key-value Pairs
Operations for MapReduce
First Program for MapReduce
Exploring Hadoop Classpath
Writing a MapReduce Job
APIs for MapReduce
Second Program for MapReduce
Streaming for MapReduce
Data Factory with Hive
The Purpose of Hive
Setup of Hive
Details of Hive
Operations for Hive
Joins and Views for Hive
Partitions and Buckets for Hive
User-defined Functions for Hive
Troubleshooting for Hive
Data Factory with Pig
The Purpose of Pig
Setup for Pig
Details of Pig
Operations for Pig
Working with Pig Operators
User-defined Functions for Pig
Troubleshooting for Pig
Data Factory with Oozie and Hue
The Purpose of Hive Daemons
The Purpose of Oozie
Setup for Oozie
Operations for Oozie
Setup for Hue
Operations for Hue
Data Flow for the Hadoop Ecosystem
The World of Data
Flowing Data with Sqoop
Flowing Data with Hive
Administration for the Ecosystem
In the modern world, data is being generated at an exponential rate, and business data generation is increasing at a similarly rapid rate. Only a small percentage of business data is structured data in the rows and columns of databases. This data proliferation requires a rethinking of traditional techniques for capture, storage, and processing. Big Data is a term that describes data sets too large to be managed with traditional database systems. The term also refers to the collection of tools and techniques aimed at solving these problems. This learning path covers the current thinking and state of the art for managing and manipulating large data sets using the techniques and tools of Big Data.
This learning path is intended for technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis.
Apache Hadoop is an open source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. In this learning path you will learn about cluster planning, installation and administration, resource management, and monitoring and logging.
This course is available in the following formats:
Train at your own pace with 24/7 access to courses that help you acquire must-have technology skills.