
Advanced Methods in Data Science and Big Data Analytics

Learn to use several open source tools to address big data challenges.

GK# 4192

Course Overview


This course builds on skills developed in the Data Science and Big Data Analytics course. The main focus areas are Hadoop (including Pig, Hive, and HBase), natural language processing, social network analysis, simulation, random forests, multinomial logistic regression, and data visualization. Taking a technology-neutral approach, the course uses several open-source tools to address big data challenges.




What You'll Learn

  • MapReduce functionality
  • NoSQL databases and Hadoop Ecosystem tools for analyzing large-scale, unstructured data sets
  • Natural language processing, social network analysis, and data visualization concepts
  • Advanced quantitative methods, with one of them applied in a Hadoop environment
  • Apply advanced techniques to real-world datasets in a final lab


Classroom Live Outline

1. MapReduce and Hadoop

  • The MapReduce Framework
  • Apache Hadoop
  • Hadoop Distributed File System
  • YARN

2. Hadoop Ecosystem and NoSQL

  • Hadoop Ecosystem
  • Pig
  • Hive
  • NoSQL (Not only SQL)
  • HBase
  • Spark

3. Natural Language Processing

  • Introduction to NLP
  • Text Preprocessing
  • Beyond Bag of Words
  • Language Modeling
  • POS Tagging and HMM
  • Sentiment Analysis and Topic Modeling

4. Social Network Analysis

  • Introduction to SNA and Graph Theory
  • Most Important Nodes
  • Communities and Small World
  • Network Problems and SNA Tools

5. Data Science Theory and Methods

  • Simulation
  • Random Forests
  • Multinomial Logistic Regression

6. Data Visualization

  • Perception and Visualization
  • Visualization of Multivariate Data


Classroom Live Labs

In addition to lecture and demonstrations, this course includes labs designed to give you practical experience.



Who Should Attend

  • Aspiring data scientists
  • Data analysts who have completed the associate-level Data Science and Big Data Analytics course
  • Computer scientists who want to learn MapReduce and methods for analyzing unstructured data such as text

Course Delivery

This course is available in the following formats:

Classroom Live

Receive face-to-face instruction at one of our training center locations.

Duration: 5 days
