What -- To check how to implement how to implement SCD-1 using only SQL Query, we first check What is SCD -1. If you google it, you will be number of results, below is the one from WiKi..
http://en.wikipedia.org/wiki/Slowly_changing_dimension
In simple words, SCD-1 is type of Dimension Modeling wherein we do not maintain History of changes; and based on primary key [or surrogate key] we update the existing records and insert records which are new.
How -- We will take a simple example of Countries table [Dimension table], to implement SCD-1.
We will insert data from Country_Stg to Countries_Dim, both Stg and Dim are relational tables; However our stage table holds incremental data [Truncate and Reload] whereas Dim table holds history / all data.
Structure of our Countries table is as below. Both Stage and Dim table has same structure as below
Country_ID is our primary key. If we go by Informatica Option for SCD-1, we will have to add Target Lookup on Countries table [ as its our target ] and based on the lookup value we will decide if records has to be updated or inserted. To update the record we will again use Update Strategy.
SQL Query Option -- 1
1. You can write a simple SQL query at SQL Over-ride, which will return only new records from Stg table records; by this way you will insert only new records to your Dimension table.
SELECT STG.COUNTRY_ID,
STG.COUNTRY_NAME,
STG.REGION_ID
FROM COUNTRIES_STG STG
WHERE NOT EXISTS(
SELECT 1
FROM COUNTRIES_DIM DIM
WHERE DIM.COUNTRY_ID = STG.COUNTRY_ID
)
2. Write a simple update statement to update existing records of Dim table from Stg table at Source Post SQL
-- In Teradata
UPDATE DIM
FROM COUNTRIES_DIM DIM, COUNTRIES_STG STG
SET COUNTRY_NAME = STG.COUNTRY_NAME
, REGION_ID = STG.REGION_ID
WHERE DIM.COUNTRY_ID = STG.COUNTRY_ID
-- In Oracle
UPDATE COUNTRIES_DIM DIM
SET (COUNTRY_NAME, REGION_ID) =
(SELECT COUNTRY_NAME, REGION_ID
FROM COUNTRIES_STG STG
WHERE STG.COUNTRY_ID = DIM.COUNTRY_ID
)
SQL Query Option -- 2
Second option is to use simple merge statement instead of separate Insert and Update as below.
MERGE INTO COUNTRIES_DIM DIM
USING (SELECT STG.COUNTRY_ID,
STG.COUNTRY_NAME,
STG.REGION_ID
FROM COUNTRIES_STG ) STG
ON (DIM.COUNTRY_ID = STG.COUNTRY_ID)
WHEN MATCHED THEN
UPDATE SET COUNTRY_NAME = STG.COUNTRY_NAME
, REGION_ID = STG.REGION_ID
WHEN NOT MATCHED THEN
INSERT (DIM.COUNTRY_ID, DIM.COUNTRY_NAME, DIM.REGION_ID)
VALUES (STG.COUNTRY_ID, STG.COUNTRY_NAME, STG.REGION_ID)
Now the above Merge statement will do all you work of Insert and Update. Add this at Source Pre-SQL and as you do not need to process anything from actual Source Qualifier Query, just simply add below statement in Source Qualifier Over-Ride..
SELECT STG.COUNTRY_ID,
STG.COUNTRY_NAME,
STG.REGION_ID
FROM COUNTRIES_STG STG
WHERE 1=2
http://en.wikipedia.org/wiki/Slowly_changing_dimension
In simple words, SCD-1 is type of Dimension Modeling wherein we do not maintain History of changes; and based on primary key [or surrogate key] we update the existing records and insert records which are new.
How -- We will take a simple example of Countries table [Dimension table], to implement SCD-1.
We will insert data from Country_Stg to Countries_Dim, both Stg and Dim are relational tables; However our stage table holds incremental data [Truncate and Reload] whereas Dim table holds history / all data.
Structure of our Countries table is as below. Both Stage and Dim table has same structure as below
Country_ID is our primary key. If we go by Informatica Option for SCD-1, we will have to add Target Lookup on Countries table [ as its our target ] and based on the lookup value we will decide if records has to be updated or inserted. To update the record we will again use Update Strategy.
SQL Query Option -- 1
1. You can write a simple SQL query at SQL Over-ride, which will return only new records from Stg table records; by this way you will insert only new records to your Dimension table.
SELECT STG.COUNTRY_ID,
STG.COUNTRY_NAME,
STG.REGION_ID
FROM COUNTRIES_STG STG
WHERE NOT EXISTS(
SELECT 1
FROM COUNTRIES_DIM DIM
WHERE DIM.COUNTRY_ID = STG.COUNTRY_ID
)
2. Write a simple update statement to update existing records of Dim table from Stg table at Source Post SQL
-- In Teradata
UPDATE DIM
FROM COUNTRIES_DIM DIM, COUNTRIES_STG STG
SET COUNTRY_NAME = STG.COUNTRY_NAME
, REGION_ID = STG.REGION_ID
WHERE DIM.COUNTRY_ID = STG.COUNTRY_ID
-- In Oracle
UPDATE COUNTRIES_DIM DIM
SET (COUNTRY_NAME, REGION_ID) =
(SELECT COUNTRY_NAME, REGION_ID
FROM COUNTRIES_STG STG
WHERE STG.COUNTRY_ID = DIM.COUNTRY_ID
)
SQL Query Option -- 2
Second option is to use simple merge statement instead of separate Insert and Update as below.
MERGE INTO COUNTRIES_DIM DIM
USING (SELECT STG.COUNTRY_ID,
STG.COUNTRY_NAME,
STG.REGION_ID
FROM COUNTRIES_STG ) STG
ON (DIM.COUNTRY_ID = STG.COUNTRY_ID)
WHEN MATCHED THEN
UPDATE SET COUNTRY_NAME = STG.COUNTRY_NAME
, REGION_ID = STG.REGION_ID
WHEN NOT MATCHED THEN
INSERT (DIM.COUNTRY_ID, DIM.COUNTRY_NAME, DIM.REGION_ID)
VALUES (STG.COUNTRY_ID, STG.COUNTRY_NAME, STG.REGION_ID)
Now the above Merge statement will do all you work of Insert and Update. Add this at Source Pre-SQL and as you do not need to process anything from actual Source Qualifier Query, just simply add below statement in Source Qualifier Over-Ride..
SELECT STG.COUNTRY_ID,
STG.COUNTRY_NAME,
STG.REGION_ID
FROM COUNTRIES_STG STG
WHERE 1=2
No comments:
Post a Comment