Cloudera Training for Apache Hive and Pig
Learn to use Apache Hive and Pig for analytics on your Hadoop implementation.
This course is not currently offered by Global Knowledge. Information here is provided for reference only.
In this course, you will learn how Hive and Pig provide an abstraction layer over MapReduce for easier data analysis. You will learn how to manage and manipulate data in a Hadoop cluster using SQL and scripting languages.
What You'll Learn
- How Hive augments MapReduce
- How to create and manipulate tables using Hive
- Basic and advanced Hive data types
- How to partition and bucket data with Hive
- Advanced features of Hive
- How to load and manipulate data using Pig
- Features of the Pig Latin programming language
- How to solve problems with Pig
Who Needs to Attend
Users who understand how Hadoop works and have experience with SQL
Prerequisites
Basic familiarity with SQL and/or a scripting language
Follow-On Courses
There are no follow-ons for this course.
Course Outline
1. Introduction
- What is Hadoop?
- Motivation for Hive
2. Getting Data into Hive
- Hive Architecture
- Creating Hive Tables
- Loading Data into Hive
- Creating Different Databases
3. Manipulating Data with Hive
- Retrieving Data with the SELECT Statement
- Joining Tables
- Storing Query Results in HDFS
- Basic Hive Functions
4. Partitioning and Bucketing Data
- Partitioning Data
- Bucketing Data
5. Advanced Hive Features
- More Advanced HiveQL Tables
- Hive Variables
- Creating User-Defined Functions
- Debugging and Troubleshooting Hive Queries
6. Hive Best Practices
- Configuring a Shared Metastore
- Handling Dates
- Dealing with SerDes
7. Reading and Writing Data with Pig
- Loading Data
- Pig Schemas
- Writing Data
8. Pig Latin In-Depth
- FILTERing Data
- Grouping and Sorting Data
- Pig Expressions and Functions
- Joining Multiple Datasets
- Validating Datasets
- Advanced Topics
9. Debugging Pig Scripts
- Strategies for Debugging Pig Programs
- Handling Bad Data
- Using ILLUSTRATE
10. Best Practices for Pig
- Best Practices
- Achieving Optimal Pig Performance in Production
- Using Hive vs. Using Pig
Throughout the course, you will perform hands-on exercises to solidify your understanding of the concepts.