Saturday 19 April 2014

55 - How to implement scd-1 using only sql query

What -- To check how to implement how to implement SCD-1 using only SQL Query, we first check What is SCD -1. If you google it, you will be number of results, below is the one from WiKi..
http://en.wikipedia.org/wiki/Slowly_changing_dimension
In simple words, SCD-1 is type of Dimension Modeling wherein we do not maintain History of changes; and based on primary key [or surrogate key] we update the existing records and insert records which are new.

How -- We will take a simple example of Countries table [Dimension table], to implement SCD-1.

We will insert data from Country_Stg to Countries_Dim, both Stg and Dim are relational tables; However our stage table holds incremental data [Truncate and Reload] whereas Dim table holds history / all data.

Structure of our Countries table is as below. Both Stage and Dim table has same structure as below

Country_ID is our primary key. If we go by Informatica Option for SCD-1, we will have to add Target Lookup on Countries table [ as its our target ] and based on the lookup value we will decide if records has to be updated or inserted. To update the record we will again use Update Strategy.

SQL Query Option -- 1

1. You can write a simple SQL query at SQL Over-ride, which will return only new records from Stg table records; by this way you will insert only new records to your Dimension table.

SELECT STG.COUNTRY_ID,
STG.COUNTRY_NAME,
STG.REGION_ID
FROM COUNTRIES_STG STG
 WHERE NOT EXISTS(
 SELECT 1
 FROM COUNTRIES_DIM DIM
 WHERE DIM.COUNTRY_ID = STG.COUNTRY_ID
 )


2. Write a simple update statement to update existing records of Dim table from Stg table at Source Post SQL

 -- In Teradata
 UPDATE DIM
 FROM COUNTRIES_DIM DIM, COUNTRIES_STG STG
 SET COUNTRY_NAME = STG.COUNTRY_NAME
 , REGION_ID = STG.REGION_ID
 WHERE DIM.COUNTRY_ID = STG.COUNTRY_ID

 -- In Oracle
 UPDATE COUNTRIES_DIM DIM
 SET (COUNTRY_NAME, REGION_ID) =
 (SELECT COUNTRY_NAME, REGION_ID
 FROM COUNTRIES_STG STG
 WHERE STG.COUNTRY_ID = DIM.COUNTRY_ID
 )



 SQL Query Option -- 2
 Second option is to use simple merge statement instead of separate Insert and Update as below.


 MERGE INTO COUNTRIES_DIM DIM
   USING (SELECT STG.COUNTRY_ID,
STG.COUNTRY_NAME,
STG.REGION_ID
FROM COUNTRIES_STG ) STG
   ON (DIM.COUNTRY_ID = STG.COUNTRY_ID)
   WHEN MATCHED THEN
   UPDATE  SET COUNTRY_NAME = STG.COUNTRY_NAME
                , REGION_ID = STG.REGION_ID
   WHEN NOT MATCHED THEN
   INSERT (DIM.COUNTRY_ID, DIM.COUNTRY_NAME, DIM.REGION_ID)
     VALUES (STG.COUNTRY_ID, STG.COUNTRY_NAME, STG.REGION_ID)


 Now the above Merge statement will do all you work of Insert and Update. Add this at Source Pre-SQL and as you do not need to process anything from actual Source Qualifier Query, just simply add below statement in Source Qualifier Over-Ride..

SELECT STG.COUNTRY_ID,
STG.COUNTRY_NAME,
STG.REGION_ID
FROM COUNTRIES_STG STG

WHERE 1=2

No comments:

Post a Comment