Data Science Essentials

Understand the data science pipeline: data wrangling, analysis, machine learning, and communication and visualization.

GK# 7293

Course Overview


Data science is the applied study of data for statistical analysis and problem solving. This course path covers the data science pipeline that an everyday data scientist needs: data wrangling, analysis, machine learning, and communication and visualization.




What You'll Learn

  • Data Science Overview
  • Data Gathering
  • Data Filtering
  • Data Transformation
  • Data Exploration
  • Data Integration
  • Data Analysis Concepts
  • Data Classification and Machine Learning
  • Data Communication and Visualization


On-Demand Outline

Data Science Overview

Defining Data Science

  • What is Data Science?
  • What is Data Wrangling?
  • What is Big Data?
  • What is Machine Learning?

Implementing Data Science

  • Data Science Terminology
  • Data Communication
  • Data Science Pipeline
  • Data Science Tools

Data Gathering

Data Extraction

  • Basic Data Gathering
  • Gathering Web Data
  • Extracting Spreadsheet Data with in2csv
  • Extracting Spreadsheet Data with Agate
  • Extracting Legacy Data from dBASE Tables
  • Extracting HTML Data


  • Gathering Metadata
  • Working with HTTP Headers
  • Working with Linux Log Files
  • Working with Email Headers

Remote Data

  • Connecting to Remote Data
  • Copying Remote Data
  • Synchronizing Remote Data

Data Filtering

Introduction to Data Filtering

  • Data Filtering Techniques and Tools
  • Processing Date Formats
  • Filtering HTTP Headers
  • Filtering CSV Data
  • Replacing Values with sed
  • Dropping Duplicate Data
  • Working with JPEG Headers
  • Filtering PDF Files
  • Filtering for Invalid Data
  • Parsing robots.txt

Data Transformation

File Format Conversions

  • Converting CSV to JSON
  • Converting XML to JSON
  • Converting CSV to SQL
  • Converting SQL to CSV
  • Changing CSV Delimiters

Data Conversions

  • Converting Dates
  • Converting Numbers
  • Rounding Numbers

Optical Character Recognition

  • OCR JPEG Images
  • Extracting Text from PDF Files

Data Exploration

Introduction to Data Exploration

  • Exploring CSV Data
  • Exploring CSV Statistics
  • Querying CSV Data
  • Plotting from the Command Line
  • Counting Words
  • Exploring Directory Trees
  • Determining Word Frequencies
  • Taking Random Samples
  • Finding the Top Rows
  • Finding Repeated Records
  • Identifying Outliers in Data

Data Integration

Introduction to Data Integration

  • Joining CSV Data
  • Concatenating Log Files
  • Sorting Text Files
  • Merging XML Data
  • Aggregating Data
  • Normalizing Data
  • Denormalizing Data
  • Pivoting Data Tables
  • Homogenizing Rows

Data Analysis Concepts

Data Science Math

  • Basic Data Science Math
  • Linear Algebra Vector Math
  • Linear Algebra Matrix Math
  • Linear Algebra Matrix Decomposition

Data Analysis Concepts

  • Data Formation
  • Introduction to Probability
  • Working with Events
  • Working with Probability
  • Continuous Probability Distributions
  • Discrete Probability Distributions
  • Introduction to Bayes' Theorem

Estimates and Measures

  • Sampling Data
  • Statistical Measures
  • Estimators
  • Sampling Distributions
  • Confidence Intervals
  • Hypothesis Tests
  • Chi-Square

Data Classification and Machine Learning

Machine Learning Introduction

  • Introduction to Supervised Learning
  • Introduction to Unsupervised Learning
  • Understanding Linear Regression
  • Working with Predictors

Regression and Classification

  • Understanding Logistic Regression
  • Understanding Dummy Variables
  • Using Naïve Bayes Classification
  • Working with Decision Trees


  • K-means Clustering
  • Using Cluster Validation
  • Using Principal Component Analysis

Errors and Validation

  • Introduction to Errors
  • Defining Underfitting
  • Defining Overfitting
  • Using K-fold Cross-Validation
  • Using Neural Networks
  • Support Vector Machines (SVM)

Data Communication and Visualization

Introduction to Data Communication

  • Effective Communication and Visualization
  • Correlation Versus Causation
  • Simpson’s Paradox
  • Presenting Data
  • Documenting Data Science
  • Visual Data Exploration


  • Creating Scatter Plots
  • Plotting Line Graphs
  • Creating Bar Charts
  • Creating Histograms
  • Creating Box Plots
  • Creating Network Visualizations
  • Creating a Bubble Plot
  • Creating Interactive Plots

Who Should Attend


Individuals with some programming and math experience who want to apply data science in their everyday work.

Follow-On Courses


R is a free software environment for statistical computing and graphics and has become an important tool in modern data science. In this course, you will learn the fundamental techniques and methods data scientists use in their everyday work.

Course Delivery

This course is available in the following formats:


Train at your own pace with 24/7 access to courses that help you acquire must-have technology skills.