ETL Tool – Informatica – Part 2

Power Exchange:

Informatica Power Exchange as a stand alone service or along with Power Center, helps organizations leverage data by avoiding manual coding of data extraction programs. Power Exchange supports batch, real time and changed data capture options in main frame (DB2, VSAM, IMS etc.), mid range (AS400 DB2 etc.,), and for relational databases (oracle, sql server, db2 etc) and flat files in unix, linux and windows systems.

Power Channel:

This helps to transfer large amount of encrypted and compressed data over LAN, WAN, through Firewalls, transfer files over FTP, etc.

Meta Data Exchange:

Metadata Exchange enables organizations to take advantage of the time and effort already invested in defining data structures within their IT environment when used with Power Center. For example, an organization may be using data modeling tools, such as Erwin, Embarcadero, Oracle designer, Sybase Power Designer etc for developing data
models. Functional and technical team should have spent much time and effort in creating the data model’s data structures(tables, columns, data types, procedures, functions, triggers etc). By using meta data exchange, these data structures can be imported into power center to identify source and target mappings which leverages time and effort. There is no need for Informatica developer to create these data structures once again.

Power Analyzer:

Power Analyzer provides organizations with reporting facilities. PowerAnalyzer makes accessing, analyzing, and sharing enterprise data simple and easily available to decision makers. PowerAnalyzer enables to gain insight into business processes and develop business intelligence.

With PowerAnalyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data warehouse, data mart, operational data
store, or other data storage models. PowerAnalyzer is best with a dimensional data warehouse in a relational database. It can also run reports on data in any
table in a relational database that do not conform to the dimensional model.

Super Glue:

Superglue is used for loading metadata in a centralized place from several sources. Reports can be run against this superglue to analyze meta data.

Power Mart:

Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts. Power center is used for corporate enterprise data warehouse and power mart is used for departmental data warehouses like data marts. Power Center supports global repositories and networked repositories and it can be connected to several sources.  Power Mart supports single repository and it can be connected to fewer sources when compared to Power Center. Power Mart can extensible grow to an enterprise implementation and it is easy for developer productivity through a code-less environment.

Note:This is not a complete tutorial on Informatica. We will add more Tips and Guidelines on Informatica in near future. Please visit us soon to check back. To know more about Informatica, contact its official website www.informatica.com

 

ETL Tool – Informatica – Part 1

Informatica:

Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration software and ETL software.

The important Informatica Components are:

  • Power Exchange
  • Power Center
  • Power Center Connect
  • Power Exchange
  • Power Channel
  • Metadata Exchange
  • Power Analyzer
  • Super Glue

In Informatica, all the Metadata information about source systems, target systems and transformations are stored in the Informatica repository. Informatica’s Power Center Client and Repository Server access this repository to store and retrieve metadata.

Note: To know more about Metadata and its significance, please click here.

Source and Target:

Consider a Bank that has got many branches throughout the world. In each branch data may be stored in different source systems like oracle, sql server, terradata, etc. When the Bank decides to integrate its data from several sources for its management decisions, it may choose one or more systems like oracle, sql server, terradata, etc. as its data warehouse target. Many organisations prefer Informatica to do that ETL process, because
Informatica is more powerful in designing and building data warehouses. It can connect to several sources and targets to extract meta data from sources and targets, transform and load the data into target systems.

Guidelines to work with Informatica Power Center:

Repository: This is where all the metadata information is stored in the Informatica suite. The Power Center Client and the Repository Server would access this repository to retrieve, store and manage metadata.

Power Center Client: Informatica client is used for managing users, identifying source and target systems definitions, creating mapping and mapplets, creating sessions and run workflows etc.

Repository Server: This repository server takes care of all the connections between the repository and the Power Center Client.

Power Center Server: Power Center server does the extraction from source and then loading data into targets.

Designer: Source Analyzer, Mapping Designer and Warehouse Designer are tools reside within the Designer wizard.

Source Analyzer is used for extracting metadata from source systems.

Mapping Designer is used to create mapping between sources and targets. Mapping is a pictorial representation about the flow of data from source to target.

Warehouse Designer is used for extracting metadata from target systems or metadata can be created in the Designer itself.

Data Cleansing: The PowerCenter’s data cleansing technology improves data quality by validating, correctly naming and standardization of address data. A person’s address may  not be same in all source systems because of typos  and postal code, city name may not match with address. These errors can be corrected by using data cleansing process and standardized data can be loaded in target systems (data warehouse).

Transformation: Transformations help to transform the source data according to the requirements of target system. Sorting, Filtering, Aggregation, Joining are some of the examples of transformation. Transformations ensure the quality of the data being loaded into target and this is done during the mapping process from source to target.

Workflow Manager: Workflow helps to load the data from source to target in a sequential manner. For example, if the fact tables are loaded before the lookup tables, then the target system will pop up an error message since the fact table is violating the foreign key validation. To avoid this, workflows can be created to ensure the correct flow of data from source to target.

Workflow Monitor: This monitor is helpful in monitoring and tracking the workflows created in each Power Center Server.

Power Center Connect: This component helps to extract data and metadata from ERP systems like IBM’s MQSeries, Peoplesoft, SAP, Siebel etc. and other third party applications.

Power Center Exchange: This component helps to extract data and metadata from ERP systems like IBM’s MQSeries, Peoplesoft, SAP, Siebel etc. and other third party applications.

 

Next ⇒ ETL Tool – Informatica – Part 2

 

Informatica Transformations – Part 5

XML Source Qualifier Transformation:

XML Source Qualifier is a Passive and Connected transformation. XML Source Qualifier is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources.

Advanced External Procedure Transformation:

Advanced External Procedure transformation is an Active and Connected transformation. It operates in conjunction with procedures, which are created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful in creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed before emitting any output rows.

External Procedure Transformation:

External Procedure transformation is an Active and Connected / UnConnected transformations. Sometimes, the standard transformations such as Expression transformation may not provide the functionality that you want. In such cases External procedure is useful to develop complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.

Differences between Advanced External Procedure and External Procedure Transformations:

  • External Procedure returns single value, whereas Advanced External Procedure returns multiple values.
  • External Procedure supports COM and Informatica procedures whereas AEP supports only Informatica Procedures.

 

[Thanks to Radhika for submitting this useful article on Informatica Transformations!]

 

Informatica Transformations – Part 4

Sequence Generator Transformation:

Sequence Generator transformation is a Passive and Connected transformation. It is used to create unique primary key values or cycle through a sequential range of numbers or to replace missing keys.

It has two output ports to connect transformations. By default it has two fields CURRVAL and NEXTVAL(You cannot add ports to this transformation). NEXTVAL port generates a sequence of numbers by connecting it to a transformation or target. CURRVAL is the NEXTVAL value plus one or NEXTVAL plus the Increment By value.

Stored Procedure Transformation:

Stored Procedure transformation is a Passive and Connected & UnConnected transformation. It is useful to automate time-consuming tasks and it is also used in error handling, to drop and recreate indexes and to determine the space in database, a specialized calculation etc.

The stored procedure must exist in the database before creating a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the Informatica Server. Stored Procedure is an executable script with SQL statements and control statements, user-defined variables and conditional statements.

Sorter Transformation:

Sorter transformation is a Connected and an Active transformation. It allows to sort data either in ascending or descending order according to a specified field. Also used to configure for case-sensitive sorting, and specify whether the output rows should be distinct.

Source Qualifier Transformation:

Source Qualifier transformation is an Active and Connected transformation. When adding a relational or a flat file source definition to a mapping, it is must to connect it to a Source Qualifier transformation. The Source Qualifier performs the various tasks such as overriding default SQL query, filtering records; join data from two or more tables etc.

Update Strategy Transformation:

Update strategy transformation is an Active and Connected transformation. It is used to update data in target table, either to maintain history of data or recent changes. You can specify how to treat source rows in table, insert, update, delete or data driven.

 

Next ⇒ Informatica Transformations – Part 5

 

Informatica Transformations – Part 3

Lookup Transformation:

Lookup transformation is Passive and it can be both Connected and UnConnected as well. It is used to look up data in a relational table, view, or synonym. Lookup definition can be imported either from source or from target tables.

For example, if we want to retrieve all the sales of a product with an ID 10 and assume that the sales data resides in another table. Here instead of using the sales table as one more source, use Lookup transformation to lookup the data for the product, with ID 10 in sales table.

Difference between Connected and UnConnected Lookup Transformation:

  • Connected lookup receives input values directly from mapping pipeline whereas UnConnected lookup receives values from: LKP expression from another transformation.
  • Connected lookup returns multiple columns from the same row whereas UnConnected lookup has one return port and returns one column from each row.
  • Connected lookup supports user-defined default values whereas UnConnected lookup does not support user defined values.

Normalizer Transformation:

Normalizer Transformation is an Active and Connected transformation. It is used mainly with COBOL sources where most of the time data is stored in de-normalized format. Also, Normalizer transformation can be used to create multiple rows from a single row of data.

Rank Transformation:

Rank transformation is an Active and Connected transformation. It is used to select the top or bottom rank of data. For example, to select top 10 Regions where the sales volume was very high or to select 10 lowest priced products.

Router Transformation:

Router is an Active and Connected transformation. It is similar to filter transformation. The only difference is, filter transformation drops the data that do not meet the condition whereas router has an option to capture the data that do not meet the condition. It is useful to test multiple conditions. It has input, output and default groups. For example, if we want to filter data like where State=Michigan, State=California, State=New York and all other States. It’s easy to route data to different tables.

 

Next ⇒ Informatica Transformations – Part 4

 

Informatica Transformations – Part 2

Aggregator Transformation:

Aggregator transformation is an Active and Connected transformation. This transformation is useful to perform calculations such as averages and sums (mainly to perform calculations on multiple rows or groups). For example, to calculate total of daily sales or to calculate average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX, SUM etc. can be used in aggregate transformation.

Expression Transformation:

Expression transformation is a Passive and Connected transformation. This can be used to calculate values in a single row before writing to the target. For example, to calculate discount of each product or to concatenate first and last names or to convert date to a string field.

Filter Transformation:

Filter transformation is an Active and Connected transformation. This can be used to filter rows in a mapping that do not meet the condition. For example, to know all the employees who are working in Department 10 or to find out the products that falls between the rate category $500 and $1000.

Joiner Transformation:

Joiner Transformation is an Active and Connected transformation. This can be used to join two sources coming from two different locations or from same location. For example, to join a flat file and a relational source or to join two flat files or to join a relational source and a XML source.

In order to join two sources, there must be at least one matching port. at least one matching port. While joining two sources it is a must to specify one source as master and the other as detail.

The Joiner transformation supports the following types of joins:

  • Normal Join
  • Master Outer Join
  • Detail Outer Join
  • Full Outer Join

Normal join discards all the rows of data from the master and detail source that do not match, based on the condition.

Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source and the matching rows from the master source.

Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.

Full outer join keeps all rows of data from both the master and detail sources.

 

Next ⇒ Informatica Transformations – Part 3

 

Informatica Transformations – Part 1

What are Informatica Transformations?

[Submitted by: Radhika, Michigan, US.]

In Informatica, Transformations help to transform the source data according to the requirements of target system and it ensures the quality of the data being loaded into target.

Transformations are of two types: Active and Passive.

Active Transformation:

An active transformation can change the number of rows that pass through it from source to target i.e it eliminates rows that do not meet the condition in transformation.

Passive Transformation:

A passive transformation does not change the number of rows that pass through it i.e it passes all rows through the transformation.

Transformations can be Connected or UnConnected.

Connected Transformation:

Connected transformation is connected to other transformations or directly to target table in the mapping.

UnConnected Transformation:

An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.

Following are the list of Transformations available in Informatica:

  • Aggregator Transformation
  • Expression Transformation
  • Filter Transformation
  • Joiner Transformation
  • Lookup Transformation
  • Normalizer Transformation
  • Rank Transformation
  • Router Transformation
  • Sequence Generator Transformation
  • Stored Procedure Transformation
  • Sorter Transformation
  • Update Strategy Transformation
  • XML Source Qualifier Transformation
  • Advanced External Procedure Transformation
  • External Transformation

In the following pages, we will explain all the above Informatica Transformations and their significance in the ETL process in detail.

 

Next ⇒ Informatica Transformations – Part 2

Informatica ETL Interview Questions and Answers – Part 3

Informatica Interview Questions & Answers:

P. How DTM buffer size and buffer block size are related?

  • The number of buffer blocks in a session = DTM Buffer Size / Buffer Block Size. Default settings create enough buffer blocks for 83 sources and targets. If the session contains more than 83, you might need to increase DTM Buffer Size or decrease Default Buffer Block Size.
  • (total number of sources + total number of targets)* 2] = (session buffer blocks)
  • (session Buffer Blocks) = (.9) * (DTM Buffer Size) / (Default Buffer Block Size) * (number of partitions)

Q. Difference between shortcut and reusable transformation?

  • A shortcut is a reference (link) to an object in a shared folder, these are commonly used for sources and targets that are to be shared between different environments / or projects. A shortcut is created by assigning ‘Shared’ status to a folder within the Repository Manager and then dragging objects from this folder into another open folder; This provides a single point of control / reference for the object – multiple projects don’t all have import sources and targets into their local folders.
  • A reusable transformation is usually something that is kept local to a folder, examples would be the use of a reusable sequence generator for allocating warehouse Customer Id’s which would be useful if you were loading customer details from multiple source systems and allocating unique ids to each new source-key. Many mappings could use the same sequence and the sessions would all draw from the same continuous pool of sequence numbers generated.

R. What are the types of  groups in Router transformation?

  • Input Group.
  • Output Group.
  • The designer copies property information from the input ports of the input group to create a set of output ports for each output group.
  • User defined Group.
  • Default group.
  • One cannot modify or delete default groups.

S. What are the out put files that the Informatica server creates during the session running?

  • Informatica server log: Informatica server (on UNIX) creates a log for all status and error messages (default name:pm.server.log). It also creates an error log for error messages. These files will be created in Informatica home directory.
  • Session log file: Informatica server creates session log file for each session. It writes information about session into log files such as initialization process, creation of sql commands for reader and writer threads, errors encountered and load summary. The amount of detail in session log file depends on the tracing level that you set.
  • Session detail file: This file contains load statistics for each target in mapping. Session detail includes information such as table name, number of rows written or rejected. We can view this file by double clicking on the session in monitor window.
  • Performance detail file: This file contains information known as session performance details which helps you where performance can be improved. To generate this file select the performance detail option in the session property sheet.
  • Reject file: This file contains the rows of data that the writer does not write to targets.
  • Control file: Informatica server creates control file and a target file when you run a session that uses the external loader. The control file contains the information about the target flat file such as data format and loading instructions for the external loader.
  • Post session email: Post session email allows you to automatically communicate information about a session run to designated recipients. U can create two different
    messages. One if the session completed successfully the other if the session fails.
  • Indicator file: If u use the flat file as a target, you can configure the Informatica server to create indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete or reject.
  • Output file: If session writes to a target file, the Informatica server creates the target file based on file properties entered in the session property sheet.
  • Cache files: When the Informatica server creates memory cache it also creates cache files. For the following circumstances Informatica server creates index and data cache files.

 

Informatica ETL Interview Questions and Answers – Part 2

Informatica Interview Questions & Answers:

H. How to join 2 tables, without using any condition?

  • It is not possible to actually possible to join two tables without using any condition. If we would like to join two tables, then in source qualifier transformation add all the ports from two tables(if both tables are from same database source) and write down your join condition in user defined join option or otherwise write down join condition inside the sql query option with generated sql query.
  • OR simply use a dummy condition like 1=1.

I. Can we return two columns from an unconnected lookup?

  • Yes. Concatenate two columns. However return port will be one.

J. I have an aggregater in my mapping and no group by port on
any column and I am passing 100 rows through aggregater, so how many rows I will get as output from aggregater?

  • Last row only.

K. What is the diff between union, joiner and lookup?

  • Union – Can join two tables without common port.
  • Joiner– Can join any two heterogeneous sources, but common port is necessary.
  • Lookup – We can join two tables by using sql over-ride, apart from that we can check whether the row is already existing or not.

L. What are the similarities/differences between ROUTER and FILTER?

  • Similarity between Router and Filter is that both transformation would be able to discard data based on some filter condition.
  • The Differences are… 1.Using router transformation one would able to test incoming data for multiple filter condition and if none of the condition met the incoming rows gets rerouted to DEFAULT group.
  • 2.Using filter transformation, one would be able to pass data to next transformation based on single filter pipeline condition. There is no option of routing data that doesn’t match the condition like ROUTER Transformation.

M. What is a complex mapping?

A complex mapping  generally will have the following characteristics:

  • Difficult requirement
  • More no.of transformations
  • Having difficult business logic
  • May require combination of two or more methods/combinations
  • Complex business logic
  • More than 30 unconnected lookup

N. Flat file contains n number of records, we have to load the records in target from 51 to 100. How to do that in Informatica?

  • Use sequence generator to get row no. for each record, then use filter giving the condition(row no. greater than 50 and less than 100)

O. What are the limitations for using target load type “Bulk” in the session?

  • Update strategy
  • Less number of records
  • Target should be relational DB
  • No constraints in the target table

 

Next ⇒ Informatica Interview Questions Continued …Page 3

Informatica ETL Interview Questions and Answers – Part 1

Informatica Interview Questions & Answers:

A. What are the differences between Connected and Unconnected
lookup?

  • Connected lookup participates in the mapping(dataflow), just like any other transformation. Unconnected  lookup is used when a lookup function is used instead in an expression transformation in the mapping in which case the lookup does not appear in the main flow (dataflow) of the mapping. Connected lookup can return more than one value(output port) whereas an Unconnected lookup gives only one output port. Unconnected lookups are reusable.
  • Connected transformation is connected to other transformations or directly to target table in the mapping. An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.

B. When do we use dynamic cache and static cache in connected and unconnected lookup transformations?

  • Dynamic cache used for updation of Master Table & SCD(Slowly Changing Dimensions) type 1.
  • Static used for Flatfile.

C. What is the tracing level?

  • Tracing level means that the amount of data storing in to the log files.
  • Tracing Levels are TERSE,VERBOSE,VERBOSE INITIALIZATION and NORMAL.
  • This is a kind of explanation when a session log is created….
    • Normal: It explains in a detailed manner
    • Verbose: It explains detailed explanation for each and every row

D. How many types of transformations supported by sorted input?

  • Aggregator Transformation,Joiner Transformation and Lookup Transformation support sorted input so that it will increase the session performance.

E. Name 4 output files that Informatica server creates during session running:

  • Session log
  • Workflow log
  • Errors log
  • Badfile

F. what is the difference between stop and abort?

  • Stop command immediately kills the reading process and doesn’t have any timeout period.
  • Abort command gives a time out period of 60 secs to the Informatica server to finish the dtm process else it kills the dtm process.

G. What is Update Override? What are the differences between SQL Override and Update Override?

  • Update Override – It is an option available in TARGET instance. By default, Target table is updated based on Primary key values. To update the Target table on non primary key values, we can generate the default Query and override the Query according to the requirement. Suppose for example, if we want to update the record in target table, when a column value=’AAA’ then, we can include this condition in Where clause of default Query.
  • Coming to SQL override – It is an option available in Source Qualifier and Lookup transformation where we can include joins, filters, Group by and Order by.

 

Next⇒ Informatica Interview Questions Continued …Page 2

1 2