Wednesday 16 October 2013

67 How will you re-create Persistent Lookup Files on a particular date?

Few days back one of my friend was giving client interview and one of the question was "How will you re-generate / refresh Lookup Persistent Cache on a particular time period?" ... [ another thing is, I realized that even after cracking the interview for a company, you again have to face client round - even before your actual joining process starts]

Below are the ways which we discussed ..

Before thinking about how we refresh the persistent cache file, we will see how the file gets created.

Lets assume below scenario..


We have 1. Departments [Oracle] as a Source,
2. Departments [as a Lookup]
3. Target as a Flat File.



As shown in above image, you have to check the "Lookup Cache Persistent" box, to enable persistent cache [or create persistent cache file]

To validate if your session is creating persistent cache [file] or not, run the workflow and verify the session log -- which is as below


TT_11183 : Enabled using [1 (auto)] additional concurrent pipelines to build lookup caches. (Session likely will build or refresh [1] lookup caches; [0] on-demand only).

2013-10-17 23:59:58 : INFO : (7924 | MAPPING) : (IS | BodheeTree) : node01_mandar-ad09ce72 : TM_6660 : Total Buffer Pool size is 24000000 bytes and Block size is 65536 bytes.

2013-10-17 23:59:58 : INFO : (7924 | MAPPING) : (IS | BodheeTree) : node01_mandar-ad09ce72 : TT_11162 : INFO: Transformation [lkp_Dep]: Input Group Id=0: transforming up to 288 row(s) at a time.

2013-10-17 23:59:58 : INFO : (7924 | MAPPING) : (IS | BodheeTree) : node01_mandar-ad09ce72 : TT_11163 : INFO: Transformation [lkp_Dep]: Output Group Id=0: transforming up to 288 row(s) [288] at a time.

2013-10-17 23:59:58 : INFO : (7924 | LKPDP_1) : (IS | BodheeTree) : node01_mandar-ad09ce72 : DBG_21097 : Lookup Transformation [lkp_Dep]: Default sql to create lookup cache: SELECT DEPARTMENT_ID FROM DEPARTMENTS ORDER BY DEPARTMENT_ID

2013-10-17 23:59:58 : INFO : (7924 | LKPDP_1) : (IS | BodheeTree) : node01_mandar-ad09ce72 : DBG_21249 : Initializing Transform: lkp_Dep{{BLD}}

2013-10-17 23:59:58 : INFO : (7924 | LKPDP_1) : (IS | BodheeTree) : node01_mandar-ad09ce72 : DBG_21371 : Number of Input Transforms = [1]:

2013-10-17 23:59:58 : INFO : (7924 | LKPDP_1) : (IS | BodheeTree) : node01_mandar-ad09ce72 : DBG_21259 : Input Transform[0]: [lkp_Dep{{DSQ}}]

2013-10-17 23:59:58 : INFO : (7924 | DIRECTOR) : (IS | BodheeTree) : node01_mandar-ad09ce72 : TT_11184 : Starting additional concurrent pipeline to build the lookup cache needed by Lookup transformation [lkp_Dep].

2013-10-17 23:59:58 : INFO : (7924 | LKPDP_1:TRANSF_1_1) : (IS | BodheeTree) : node01_mandar-ad09ce72 : DBG_21294 : DBG_21294 Lookup cache creation completed : (Thu Oct 17 23:59:58 2013)

2013-10-17 23:59:58 : INFO : (7924 | LKPDP_1:TRANSF_1_1) : (IS | BodheeTree) : node01_mandar-ad09ce72 : CMN_1671 : Created new cache files PMLKUP5_5_0W32.[dat/idx] in directory C:\Informatica\9.0.1\server\infa_shared\Cache for Lookup [lkp_Dep].

2013-10-17 23:59:58 : INFO : (7924 | LKPDP_1:TRANSF_1_1) : (IS | BodheeTree) : node01_mandar-ad09ce72 : DBG_21641 : lkp_Dep: Index cache size = [71568000], Data cache size = [143138816]

2013-10-17 23:59:58 : INFO : (7924 | LKPDP_1:TRANSF_1_1) : (IS | BodheeTree) : node01_mandar-ad09ce72 : DBG_21216 : Finished transformations for Source Qualifier [lkp_Dep{{DSQ}}]. Total errors [0

As you can see in the session log, lookup query got created and got fired against the respective database to create the cache file.

However, when you run the same worfklow for second time, session log will say somedifferent story as below --

Severity Timestamp Node Thread Message Code Message

INFO 10/18/2013 12:10:58 AM node01_mandar-ad09ce72 LKPDP_1 TE_7212 Increasing [Index Cache] size for transformation [lkp_Dep] from [71567155] to [71568000].

INFO 10/18/2013 12:10:58 AM node01_mandar-ad09ce72 LKPDP_1 TE_7212 Increasing [Data Cache] size for transformation [lkp_Dep] from [143134310] to [143138816].

INFO 10/18/2013 12:10:58 AM node01_mandar-ad09ce72 LKPDP_1 CMN_1669 Using existing cache files PMLKUP5_5_0W32.[dat/idx] in directory C:\Informatica\9.0.1\server\infa_shared\Cache for Lookup [lkp_Dep].

INFO 10/18/2013 12:10:58 AM node01_mandar-ad09ce72 LKPDP_1 DBG_21641 lkp_Dep: Index cache size = [71568000], Data cache size = [143138816]

INFO 10/18/2013 12:10:58 AM node01_mandar-ad09ce72 READER_1_1_1 DBG_21438 Reader: Source is [ORCL], user [HR]
As log says, second time when workflow ran, it validated if the cache file is present or not. and as it finds the cache file at respective location, it used same file to generate the Lookup Cache

Now the file which got genrated is a static file; this will not automatically gets updated.

When persistent cache will be useful ?
The time when you are sure that the master table on which you are doing the lookup will not get updated / inserted frequently. e.g. Fiscal Year table which you populate once in a year.

How will you update the cache file [which new records] or re-generate it?

1. If you delete /move the file from respective location [ or if], informatica will not get the respective file and it will automatically re-generates it.

So, you can create a command task [which will run at end of every month] to delete / move the persistent cache file from respective location.


 As shown in above image, we will create one more task to delete the persistent cache and at we will run this task only at month end. We will write below condition between task1 and task2.

DATE_DIFF (LAST_DAY(sysdate), sysdate,'DD') = 0

2. Instead of deleting the persistent cache, we also have an option to re-build the cache at session level itself.


As shown in above image, click the check box for "Re-Cache From Lookup Source", which will re-generate the lookup cache files, before actaully running the mappings.

Create one more session as below which will re-generate cache files and before running the actual workflow.




As explained in Point#1, even s_RegenerateLookupCache will only run on month end which will have below condition at link

DATE_DIFF (LAST_DAY(sysdate), sysdate,'DD') = 0


2 comments:

  1. Thanks for the explanation sir.
    In 2nd approach for regenerating persistent cache, i think the sessions should not be parallel but instead sequential.
    Please correct me if i am wrong.

    ReplyDelete
  2. Even its a parallel, it won't harm since we have added a link condition. That condition will always check for Date and then only execute the corresponding task.

    ReplyDelete