Introduction to SAS and Hadoop

GK# 2689

Course Overview

In this course, you will learn how to use SAS programming methods to read, write, and manipulate Hadoop data. You will learn about Base SAS methods, including reading and writing raw data with the DATA step, managing the Hadoop file system, and executing MapReduce and Pig code from SAS via the HADOOP procedure. The course also covers the SAS/ACCESS Interface to Hadoop methods that allow LIBNAME access and SQL pass-through techniques to read and write Hadoop Hive or Cloudera Impala table structures. You will receive a brief overview of additional SAS and Hadoop technologies, including DS2, high-performance analytics, SAS LASR Server, and In-Memory Statistics, as well as the computing infrastructure and data access methods that support them.

Outline

Classroom Live Outline

1. Introduction

  • What is Hadoop?
  • How SAS interfaces with Hadoop

2. Accessing HDFS and Invoking Hadoop Applications from SAS

  • Overview of methods available in Base SAS for interacting with Hadoop
  • Reading and writing Hadoop files using Base SAS methods
  • Executing MapReduce code
  • Executing Pig code using PROC HADOOP

3. Using the SQL Pass-Through Facility

  • Understanding the SQL procedure pass-through facility
  • Connecting to a Hadoop Hive database
  • Learning methods to query Hive tables
  • Investigating Hadoop Hive metadata
  • Creating SQL procedure pass-through queries
  • Creating and loading Hive tables with SQL pass-through EXECUTE statements
  • Handling Hive STRING data types

4. Using the SAS/ACCESS LIBNAME Engine

  • Using the LIBNAME statement for Hadoop
  • Using data set options
  • Creating views
  • Combining tables
  • Benefits of the LIBNAME method
  • Using PROC HDMD to access delimited data, XML data, and other non-Hive formats
  • Performance considerations for the SAS/ACCESS LIBNAME statement
  • Copying data from a SAS library to a Hive library

5. Partitioning and Clustering Hive Tables

  • Identifying partitioning, clustering, and indexing methods in Hive
  • How partitioning and clustering can increase query performance
  • Creating and loading partitioned and clustered Hive tables

6. Overview of SAS In-Memory Analytics and the Code Accelerator for Hadoop

  • Using high-performance procedures and the SASHDAT library engine
  • Creating a LASR Analytic server session
  • Using the SASIOLA engine
  • Executing DS2 threads in the Hadoop cluster to summarize data
  • Using PROC HDMD to access HDFS files

Labs

Classroom Live Labs

Exercises or hands-on workshops are included with most SAS courses.

Who Should Attend

SAS programmers who need to access data in Hadoop from within SAS.

Course Delivery

This course is available in the following formats:

Classroom Live

Receive face-to-face instruction at one of our training center locations.

Duration: 2 days
