Informatica partitioning Part-1

Posted: November 6, 2014 in Informatica

What is partitioning, and what is its advantage?

Let's understand this in general terms first.

Let's say I want to fill a water tank with water. If one person does the job alone, it takes 1 hour. But I want it done in 10 minutes. What should I do?

Solution: I need to add more people to the task (in this case 6 people, since 60 minutes / 10 minutes = 6). But adding 6 people is not enough on its own; each person must also do the same amount of work as the others. Let's say that, out of the 6 people, 3 are assigned work that can be finished in 4 hours (which means they sit idle for the remaining 4 hours) while the other 3 work for 8 hours. Can we accomplish the task in time? The obvious answer is no, because the work is distributed unequally.

“Equal distribution is the Key”.

So what we learn is that just adding resources will not work; utilizing them equally is just as important.
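The water tank idea can be sketched in a few lines of Python. This is only an illustration: the tank size (600 litres), the filling rate (10 litres per minute per person) and the function name `finish_time_minutes` are assumptions picked so that one person alone needs 60 minutes, as in the story above. The point it shows is that the job ends only when the slowest person ends.

```python
TANK_LITRES = 600        # hypothetical tank size
LITRES_PER_MINUTE = 10   # hypothetical rate; one person alone needs 60 minutes


def finish_time_minutes(litres_per_person):
    """Minutes until the tank is full when everyone works in parallel.

    Each entry is the number of litres assigned to one person. The tank is
    full only when the person with the largest share has finished.
    """
    if sum(litres_per_person) != TANK_LITRES:
        raise ValueError("the shares must add up to the whole tank")
    return max(litres / LITRES_PER_MINUTE for litres in litres_per_person)


equal_shares = [100] * 6                 # everyone carries the same amount
unequal_shares = [150] * 3 + [50] * 3    # three people carry three times more

print("equal shares  :", finish_time_minutes(equal_shares), "minutes")
print("unequal shares:", finish_time_minutes(unequal_shares), "minutes")
```

With the same 6 people, the unequal split misses the 10 minute target because three of them are still working after the other three have stopped.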

The Informatica way: Let's relate the above to ETL.

I want to load 100 million records. Initially the session took 6 hours, but our requirement is to load them in less than 1 hour. We can use session partitioning (which is essentially creating multiple threads) to improve performance, just like adding more people. That accomplishes half the task; the other half is to utilize these threads properly.
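As a rough back-of-the-envelope check on these numbers, the sketch below estimates how many partitions the 6 hour session would need to finish in under 1 hour. It assumes ideal linear scaling and perfectly even partitions, which real sessions rarely achieve, and `min_partitions` is a hypothetical helper name, not an Informatica setting.

```python
import math


def min_partitions(baseline_hours, target_hours):
    """Smallest partition count whose ideal run time is strictly below the target.

    Assumes run time = baseline_hours / partitions, i.e. ideal linear scaling
    with evenly loaded partitions. Real sessions usually scale worse than this.
    """
    if baseline_hours <= 0 or target_hours <= 0:
        raise ValueError("hours must be positive")
    return math.floor(baseline_hours / target_hours) + 1


partitions = min_partitions(6, 1)
ideal_hours = 6 / partitions
print("partitions needed:", partitions)
print("ideal run time in hours:", round(ideal_hours, 2))
```

Under these assumptions 6 partitions would take exactly 1 hour, so 7 are needed to get below it. Any imbalance between the partitions pushes the real number higher.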

The second part is very important. Let's call the two outcomes good partitioning and bad partitioning.
Example:
Table LOCATION

COUNT  LOCATION  LOC_ID
90     A         1
5      B         2
3      C         3
2      D         4

Bad partitioning: Let's understand bad partitioning using the example below.

To load the above table in Informatica, let's say we partition the session on Location using key-based partitioning.
90 records belong to Location A
5 records belong to Location B
3 records belong to Location C
2 records belong to Location D
If we partition the session on this location key (that is, create multiple threads and assign a location key to each), the work is divided as follows:

The first thread is assigned Location A, the second thread Location B, and so on.
The first thread will take much longer to finish because it has far more records to process; the other three will finish quickly and sit idle (a waste of 3 threads). This is called bad partitioning, because we have not distributed the work equally across the 4 threads.
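The imbalance in this example can be put into numbers with a small Python sketch. It is not Informatica code: it simply assumes that each thread's run time is proportional to the number of rows it receives, and `skew_report` is a hypothetical helper. The row counts come from the LOCATION table above.

```python
# Row counts per location, taken from the LOCATION table above.
rows_per_location = {"A": 90, "B": 5, "C": 3, "D": 2}


def skew_report(rows_per_thread):
    """Compare the busiest thread with a perfectly even split.

    Assumes run time is proportional to row count, so the session lasts as
    long as the busiest thread does.
    """
    total = sum(rows_per_thread)
    threads = len(rows_per_thread)
    busiest = max(rows_per_thread)
    ideal = total / threads                      # rows per thread if split evenly
    return {
        "busiest": busiest,
        "ideal": ideal,
        "skew": busiest / ideal,                 # 1.0 would mean perfectly balanced
        "utilization": total / (threads * busiest),
    }


# Key-based partitioning on Location: one thread per location.
key_based = list(rows_per_location.values())
report = skew_report(key_based)

for name, value in report.items():
    print(name, round(value, 2))
```

A skew well above 1 and a low utilization both say the same thing as the text: most of the elapsed time is spent waiting for the thread that holds Location A.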
