In a previous blog entry I discussed the concept of the Slowly Changing Dimension, or SCD. A good example is how to handle the analysis of Sales by Customer Region when a customer moves from one Region to another. The way to solve this situation is with a Type 2 SCD design. Let’s take a look…
The Type 2 SCD design requires that an attribute is identified as a historical attribute such that the system will keep track of the changes of that attribute so that ultimately analysis can be performed accurately across multiple dimensions including the Time dimension. The way this is done is through a “Surrogate Key” being generated in the Dimension table. Consider this design for the DimCustomer dimension table in the Data Warehouse.
CREATE TABLE [dbo].[DimCustomer](
[CustomerSK] [int] IDENTITY(1,1) NOT NULL,
[CustomerKey] [int] NOT NULL,
[FirstName] [nvarchar](50) NULL,
[LastName] [nvarchar](50) NULL,
[GeographyKey] [int] NOT NULL,
[StartDate] [datetime] NULL,
[EndDate] [datetime] NULL,
CONSTRAINT [PK_DimCustomer] PRIMARY KEY CLUSTERED
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
In this case, GeographyKey will be the “Historical Attribute” as we want to track any changes in its value for a particular customer. CustomerKey is the “Business Key” because it’s the Primary Key from the Operational OLTP Database, but is now an ordinary attribute in the dimension table. CustomerSK is the surrogate key which will be generated by the system using the IDENTITY property and will be the Primary Key of the dimension table. The actual value of the surrogate key is of little significance except that it is unique and will be used to form a relationship with the Fact table.
CREATE TABLE [dbo].[FactInternetSales](
[OrderQuantity] [smallint] NOT NULL,
[SalesAmount] [money] NOT NULL,
[DiscountPercentage] [float] NOT NULL,
[DiscountAmount] [float] NOT NULL,
[ProductCost] [money] NOT NULL,
[ProductKey] [int] NOT NULL,
[CustomerSK] [int] NOT NULL,
[OrderDateTimeKey] [int] NOT NULL
) ON [PRIMARY]
ALTER TABLE [dbo].[FactInternetSales] WITH CHECK ADD CONSTRAINT [FK_FactInternetSales_DimCustomer] FOREIGN KEY([CustomerSK])
REFERENCES [dbo].[DimCustomer] ([CustomerSK])
ALTER TABLE [dbo].[FactInternetSales] CHECK CONSTRAINT [FK_FactInternetSales_DimCustomer]
To populate the DimCustomer dimension table we could use the following query to access data from the fully normalized Operational OLTP database:
SELECT Person.Contact.FirstName, Person.Contact.LastName, Sales.Customer.CustomerID, Sales.Customer.TerritoryID
INNER JOIN Sales.Individual
ON Person.Contact.ContactID = Sales.Individual.ContactID
RIGHT OUTER JOIN Sales.Customer
ON Sales.Individual.CustomerID = Sales.Customer.CustomerID
Note that the CustomerSK surrogate key is system generated and increments by one for each new row added to the dimension table. Note also that the StartDate and EndDate attributes are populated using the Slowly Changing Dimension transform or similar logic within the SSIS ETL incremental load process. For each customer move, the EndDate will be updated to reflect the date that the customer ceased to live in the old region (GeographyKey). A new row will be inserted for the same customer with the new region code and the EndDate set to NULL to indicate the current region. The CustomerSK surrogate key guarantees a unique Primary Key even though the CustomerKey has a duplicate value.
When the fact table rows are updated, the fact table load step within the SSIS ETL incremental load process may perform a sub-select to match the CustomerSK surrogate key with the correct customer row for the current location forming the correct Foreign Key value in the Fact table. If the incremental load is performed correctly in this way, the subsequent analysis will be accurate without any differences in end user requests. Here is the inner query of the fact table load query to populate the CustomerSK foreign key in the fact table.
FROM AdvWorksDW1.dbo.DimCustomer AS dc
WHERE (dc.CustomerKey = Sales.SalesOrderHeader.CustomerID) AND (EndDate IS NULL))
This is the “magic” of the Type 2 SCD.
I developed and tested these code samples by adapting the AdventureWorks BI Sample available under http://sqlserversamples.codeplex.com/