Data Mining Techniques: Theory and Practice

GK# 2826


Course Overview


In this course, you will learn a data mining methodology that is a superset of the SAS SEMMA methodology around which SAS Enterprise Miner is organized. You will also learn about a wide range of data mining algorithms, gaining both theoretical knowledge and practical skills. In class, you will work through all the steps of a data mining project, beginning with problem definition and data selection and continuing through data exploration, data transformation, sampling, partitioning, modeling, and assessment.


Classroom Live Outline

1. Introduction to Data Mining

  • What is data mining?
  • Directed and undirected data mining
  • Models
  • Profiling and prediction

2. Data Mining Methodology

  • Why have a methodology?
  • How data miners can inadvertently learn things that are not true
  • Translating business problems into data mining problems
  • The importance of model stability
  • Finding the right input variables
  • Sampling to create balanced model sets
  • Partitioning to create training, validation, and test sets
  • Data preparation
  • Model assessment

3. Data Exploration

  • Developing intuition about data
  • Data structure
  • Data types
  • Data values
  • Exploring distributions
  • Summary statistics
  • Histograms
  • Using SAS Enterprise Miner for data exploration

4. Regression Models

  • The null hypothesis
  • Statistical significance
  • Confidence bounds
  • Variance and standard deviation
  • Standardized values
  • Correlation
  • Linear regression
  • Logistic regression
  • Using SAS Enterprise Miner to build regression models

5. Decision Trees

  • Decision trees as data exploration and classification tools
  • Decision trees for modeling and scoring
  • Decision trees for variable selection
  • Alternative representations of decision trees
  • Algorithms used to build decision trees
  • Splitting criteria
  • Recognizing instability and overfitting in decision tree models
  • Capturing interactions between variables
  • Using SAS Enterprise Miner to build decision trees

6. Neural Networks

  • Origins of neural networks
  • Neural networks compared with regression
  • Algorithms used to train neural networks
  • Data preparation requirements for neural networks
  • Picking appropriate inputs for neural networks
  • Creating neural network models using SAS Enterprise Miner

7. Memory-Based Reasoning

  • Similarity and distance
  • Distance metrics appropriate for different kinds of data
  • The role of the training set in memory-based reasoning (MBR)
  • Combining the votes of several neighbors
  • Other k-nearest neighbor techniques
  • Collaborative filtering
  • Using the SAS Enterprise Miner MBR node

8. Clustering

  • More on similarity and distance
  • The k-means algorithm
  • Divisive clustering
  • Agglomerative clustering
  • Data preparation for clustering
  • Interpreting clusters
  • Finding clusters with SAS Enterprise Miner

9. Survival Analysis

  • Origins of survival analysis
  • How business data is different from clinical data
  • Hazards and hazard charts
  • Retention curves and survival curves
  • Calculating survival from retention
  • Calculating hazards empirically
  • Parametric hazard models
  • Censoring
  • Competing risks
  • Survival-based forecasting
  • Using SAS code in SAS Enterprise Miner to create survival curves

10. Association Rules

  • Market basket analysis
  • Association rules
  • Sequential pattern analysis
  • Using SAS Enterprise Miner to discover associations in retail data

11. Link Analysis

  • Background on graph theory
  • Sphere of influence
  • Using link analysis to generate derived variables
  • Graph-coloring algorithm
  • Kleinberg's algorithm

12. Genetic Algorithms

  • Optimization techniques and problems (SAS/OR software)
  • Other algorithms
  • Linear programming problems
  • Genetic algorithms


Classroom Live Labs

Exercises or hands-on workshops are included with most SAS courses.

Who Should Attend

  • Business analysts and their managers
  • Statisticians

Course Delivery

This course is available in the following formats:

Classroom Live

Receive face-to-face instruction at one of our training center locations.

Duration: 3 days