Global Knowledge

Cloudera Developer Training for Apache Hadoop

Learn to create robust data processing applications using Apache Hadoop.

In this course, you will learn to build powerful data processing applications. You will learn about MapReduce and the Hadoop Distributed File System (HDFS), how to write MapReduce code, and best practices for Hadoop development, debugging, and implementation of workflows.

This course covers concepts addressed on the Cloudera Certified Developer for Apache Hadoop (CCDH) exam, and you will receive one CCDH exam voucher at the end of class.

Global Knowledge Exclusive!

You will receive 30 days of access to an online library where you'll find books and study guides from leading authors on Hadoop, cloud, and big data technologies, including:

  • Ethics of Big Data by Kord Davis and Doug Patterson
  • Hadoop: The Definitive Guide by Cloudera's Tom White
  • Hadoop Operations by Cloudera's Eric Sammer
  • Planning for Big Data by Edd Dumbill

What You'll Learn

  • MapReduce and HDFS
  • Write MapReduce code in Java or other programming languages
  • Issues to consider when developing MapReduce jobs
  • Implement common algorithms in Hadoop
  • Best practices for Hadoop development and debugging
  • Use other projects such as Apache Hive, Apache Pig, Sqoop, and Oozie
  • Advanced Hadoop API topics required for real-world data analysis

Who Needs to Attend

Developers who need to write and maintain Apache Hadoop applications

Prerequisites

  • Some programming experience (preferably Java)
  • Knowledge of Hadoop is not required

Follow-On Courses

  • Cloudera Training for Apache HBase
  • Cloudera Training for Apache Hive and Pig

Certification Programs and Certificate Tracks

This course is part of the following programs or tracks:

  • CCDH: Cloudera Certified Developer for Apache Hadoop (CDH4)

Course Outline

1. Motivation for Hadoop

  • Problems with Traditional Large-Scale Systems
  • Requirements for a New Approach

2. Hadoop: Basic Concepts

  • Hadoop Distributed File System (HDFS)
  • MapReduce
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components

3. Writing a MapReduce Program

  • MapReduce Flow
  • Examining a Sample MapReduce Program
  • Basic MapReduce API Concepts
  • Driver Code
  • Mapper
  • Reducer
  • Streaming API
  • Using Eclipse for Rapid Development
  • New MapReduce API

4. Integrating Hadoop into the Workflow

  • Relational Database Management Systems
  • Storage Systems
  • Importing Data from a Relational Database Management System with Sqoop
  • Importing Real-Time Data with Flume
  • Accessing HDFS Using FuseDFS and Hoop

5. Delving Deeper into the Hadoop API

  • ToolRunner
  • Testing with MRUnit
  • Reducing Intermediate Data with Combiners
  • Configuration and Close Methods for Map/Reduce Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Directly Accessing HDFS
  • Using the Distributed Cache

6. Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Machine Learning with Mahout
  • Term Frequency
  • Inverse Document Frequency
  • Word Co-Occurrence

7. Using Hive and Pig

  • Hive Basics
  • Pig Basics

8. Practical Development Tips and Techniques

  • Debugging MapReduce Code
  • Using LocalJobRunner Mode for Easier Debugging
  • Retrieving Job Information with Counters
  • Logging
  • Splittable File Formats
  • Determining the Optimal Number of Reducers
  • Map-Only MapReduce Jobs

9. Advanced MapReduce Programming

  • Custom Writables and WritableComparables
  • Saving Binary Data Using SequenceFiles and Avro Files
  • Creating InputFormats and OutputFormats

10. Joining Data Sets in MapReduce

  • Map-Side Joins
  • Secondary Sort
  • Reduce-Side Joins

11. Graph Manipulation in Hadoop

  • Graph Techniques
  • Representing Graphs in Hadoop
  • Implementing a Sample Algorithm: Single Source Shortest Path

12. Creating Workflows with Oozie

  • Motivation for Oozie
  • Workflow Definition Format

Labs

Throughout the course, you will write Hadoop code and perform other hands-on exercises to solidify your understanding of the concepts.

Cloudera

Classroom

Course Code: 3902

$2995 USD

4 Day Course


Other Delivery Methods

Virtual Classroom

On-Site

Copyright ©2013 Global Knowledge Training LLC. All rights reserved. 1-800-COURSES (1-800-268-7737)