Informatica ETL Interview Questions and Answers – Part 2

Informatica Interview Questions & Answers:

H. How to join 2 tables, without using any condition?

  • It is not possible to actually possible to join two tables without using any condition. If we would like to join two tables, then in source qualifier transformation add all the ports from two tables(if both tables are from same database source) and write down your join condition in user defined join option or otherwise write down join condition inside the sql query option with generated sql query.
  • OR simply use a dummy condition like 1=1.

I. Can we return two columns from an unconnected lookup?

  • Yes. Concatenate two columns. However return port will be one.

J. I have an aggregater in my mapping and no group by port on
any column and I am passing 100 rows through aggregater, so how many rows I will get as output from aggregater?

  • Last row only.

K. What is the diff between union, joiner and lookup?

  • Union – Can join two tables without common port.
  • Joiner– Can join any two heterogeneous sources, but common port is necessary.
  • Lookup – We can join two tables by using sql over-ride, apart from that we can check whether the row is already existing or not.

L. What are the similarities/differences between ROUTER and FILTER?

  • Similarity between Router and Filter is that both transformation would be able to discard data based on some filter condition.
  • The Differences are… 1.Using router transformation one would able to test incoming data for multiple filter condition and if none of the condition met the incoming rows gets rerouted to DEFAULT group.
  • 2.Using filter transformation, one would be able to pass data to next transformation based on single filter pipeline condition. There is no option of routing data that doesn’t match the condition like ROUTER Transformation.

M. What is a complex mapping?

A complex mapping  generally will have the following characteristics:

  • Difficult requirement
  • More no.of transformations
  • Having difficult business logic
  • May require combination of two or more methods/combinations
  • Complex business logic
  • More than 30 unconnected lookup

N. Flat file contains n number of records, we have to load the records in target from 51 to 100. How to do that in Informatica?

  • Use sequence generator to get row no. for each record, then use filter giving the condition(row no. greater than 50 and less than 100)

O. What are the limitations for using target load type “Bulk” in the session?

  • Update strategy
  • Less number of records
  • Target should be relational DB
  • No constraints in the target table

 

Next ⇒ Informatica Interview Questions Continued …Page 3

Informatica ETL Interview Questions and Answers – Part 1

Informatica Interview Questions & Answers:

A. What are the differences between Connected and Unconnected
lookup?

  • Connected lookup participates in the mapping(dataflow), just like any other transformation. Unconnected  lookup is used when a lookup function is used instead in an expression transformation in the mapping in which case the lookup does not appear in the main flow (dataflow) of the mapping. Connected lookup can return more than one value(output port) whereas an Unconnected lookup gives only one output port. Unconnected lookups are reusable.
  • Connected transformation is connected to other transformations or directly to target table in the mapping. An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.

B. When do we use dynamic cache and static cache in connected and unconnected lookup transformations?

  • Dynamic cache used for updation of Master Table & SCD(Slowly Changing Dimensions) type 1.
  • Static used for Flatfile.

C. What is the tracing level?

  • Tracing level means that the amount of data storing in to the log files.
  • Tracing Levels are TERSE,VERBOSE,VERBOSE INITIALIZATION and NORMAL.
  • This is a kind of explanation when a session log is created….
    • Normal: It explains in a detailed manner
    • Verbose: It explains detailed explanation for each and every row

D. How many types of transformations supported by sorted input?

  • Aggregator Transformation,Joiner Transformation and Lookup Transformation support sorted input so that it will increase the session performance.

E. Name 4 output files that Informatica server creates during session running:

  • Session log
  • Workflow log
  • Errors log
  • Badfile

F. what is the difference between stop and abort?

  • Stop command immediately kills the reading process and doesn’t have any timeout period.
  • Abort command gives a time out period of 60 secs to the Informatica server to finish the dtm process else it kills the dtm process.

G. What is Update Override? What are the differences between SQL Override and Update Override?

  • Update Override – It is an option available in TARGET instance. By default, Target table is updated based on Primary key values. To update the Target table on non primary key values, we can generate the default Query and override the Query according to the requirement. Suppose for example, if we want to update the record in target table, when a column value=’AAA’ then, we can include this condition in Where clause of default Query.
  • Coming to SQL override – It is an option available in Source Qualifier and Lookup transformation where we can include joins, filters, Group by and Order by.

 

Next⇒ Informatica Interview Questions Continued …Page 2

ETL Testing – Part 3

System Testing:

This environment integrates the various components and runs as a single unit. This should include sequences of events that enable those different components to run as a single unit and validate the data flow.

  • Verify all the required functionality in the validation environment.
  • Run end-to-end system test.
  • Record initialization and incremental load statistics.
  • Perform and mitigate performance of the entire system.
  • Verify error-handling processes are working as designed.
Prerequisites:
  • Finalized Implementation Checklist.
  • All integration testing complete.
  • Migration from the Test environment to the validation environment, as applicable.
  • Production configuration and data available.

Regression Testing:

Regression Testing is performed after the developer fixes a defect reported. This testing is to verify whether the identified defects are fixed and fixing of these defects does not introduce any other new defects in the system / application. This testing will also be performed when a Change Request is implemented on an existing production system. After the Change Request (CR) is approved, the testing team takes the impact analysis as input for designing the test cases for the CR.

Prerequisites:
  • Finalized Implementation Checklist.
  • All integration testing complete.

Performance Testing:

To determine the system performance under a particular workload / Service Level Agreement (SLA).Ensures system meets the performance criteria and it can detect bottle neck.

Types of Performance Testing are Load, Stress, Volume etc.

 

ETL Testing – Part 2

Test Estimation:

Effective software project estimation is one of the most challenging and important activities in the testing activities. However, it is the most essential part of a proper project planning and control. Under-estimating a project leads to under-staffing it, running the risk of low quality deliverables and resulting in loss of credibility as deadlines are missed. So it is imperative to do a proper estimation during the planning stage.

The basic steps in estimation include:
  • Estimating the size of the system to be tested.
  • Estimating the effort in person-hours (1 person hours – number of working hour in a day i.e. 8 hours)

After receiving the requirements the tester analyses the mappings that are created/modified and study about the changes made. Based on the impact analysis, the tester comes to know about how much time is needed for the whole testing process, which consists of mapping analysis, test case preparation, test execution, defect reporting, regression testing and final documentation. This calculated time is entered in the Estimation time sheet.

Integration Testing:

Integration testing is to verify the required functionality of a mapping (Single ETL / single session) in the environment specific to testing team (Test environment). This testing should ensure that correct numbers of rows (validated records) are transferred to the target from the source.

Integration testing is also used to verify initialization and incremental mappings (sessions) functionality along with the pre-session and post-session scripts for dependencies and the usage/consumption of relative indicator files to control dependencies across multiple work streams (modules). During integration testing error-handling processes, proper functionality of mapping variables and the appropriate business requirements can be validated.

Prerequisites:
  • Access to the required folders on the network.
  • Implementation Checklist for move from development to test.
  • All unit testing completed and summarized.
  • Data available in the test environment.
  • Migration to the test environment from the development environment.

Next ⇒ ETL Testing Part 3

ETL Testing – Part 1

ETL Testing:

Testing is an important phase in the project life-cycle. A structured well defined testing methodology involving comprehensive unit testing and system testing not only ensures smooth transition to the production environment but also a system without defects.

The testing phase can be broadly classified into the following categories:

  • Integration Testing
  • System Testing
  • Regression Testing
  • Performance Testing
  • Operational Qualification

Test Strategy:

A test strategy is an outline that describes the test plan. It is created to inform the project team the objective, high level scope of the testing process. This includes the testing objective, methods of testing, resources, estimated timelines, environment etc.

The test strategy is created based on high level design document.For each testing component test strategy needs to be created. based on this strategy testing process will be detailed out in the test plan.

Test Planning:

Test Planning is a key for successfully implementing the testing of a system. The deliverable is the actual “Test Plan”.  A software project test plan is a document that describes the Purpose, System Overview, Approach to Testing, Test Planning, Defect Tracking, Test Environment, Test prerequisites and References.

A key prerequisite for preparing a successful Test Plan is having approved (functional and non functional) requirements. Without the frozen requirements and specification the test plan will result in the lack of validation for the projects testing efforts.

The process of preparing a test plan is a useful way to get to know how testing of a particular system can be carried out within the given time line provided the test plan should be thorough enough.

The test plan outlines and defines the strategy and approach taken to perform end-to-end testing on a given project. The test plan describes the tasks, schedules, resources, and tools for integrating and testing the software application. It is intended for use by project personnel in understanding and carrying out prescribed test activities and in managing these activities through successful completion.

The test plan objectives are as follows:

  • To define a testing approach, scope, out of scope and methodology that encompasses integration testing, system testing, performance testing and regression testing in one plan for the business and project team.
  • To verify the functional and non functional requirements are met.
  • To coordinate resources, environments into an integrated schedule.
  • To provide a plan that outlines the contents of detailed test cases scenarios for each of the four phases of testing.
  • To determine a process for communicating issues resulting from the test phase.

The contents of a typical test plan consist of the following:

  • An introduction that includes a purpose, Definition & Acronym, Assumptions & Dependencies, In scope, Out of scope, Roles& Responsibilities and contacts. This information is obtained from the requirements specification.
  • System Overview will explain about the background and the system description.
  • A test approach for all testing levels includes test Objectives for each level, Test responsibilities, Levels of testing, various testing, Test coverage, Testing tools, Test data and Test stop criteria.
  • Test planning specifies Test schedule, Documentation deliverables, Test communication and Critical and High risk functions.

The test plan, thus, summarizes and consolidates information that is necessary for the efficient and effective conduct of testing.  Design Specification, Requirement Document and Project plan supporting the finalization of testing are located in separate documents and are referenced in the test plan.

Next ⇒ ETL Testing – Part 2  

Data Warehouse & ETL Tutorial

Data Warehouse & ETL Tutorial:

Data Warehouse is where data from different source systems are integrated, processed and stored. Data Warehouse data is a non-production data which is mainly used for analyzing and reporting purposes. Business users and higher management authorities use the data warehouse data to analyze, and to make important business decisions. Data Warehouse collects large volume of data from variant sources with many different data formats. The ETL(Extraction, Transformation and Loading) process handles these data and transforms it into a more consistent, standard formatted data. This ETL process is done with the help of an ETL tool and most widely used tools are Informatica, Data Stage, Ab Inito, Oracle Warehouse Builder etc.

This Data Warehouse and ETL tutorial section explains the data warehouse/ETL concepts, ETL tools and their usage with practical examples. Go through the links given below to read more.

 

ETL Concepts

ETL Concepts:

Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into target database.

The first step in ETL process is mapping the data between source systems and target database(data warehouse or data mart). The second step is cleansing of source data in staging area. The third step is transforming cleansed source data and then loading into the target system.

Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used instead of ETL.

Glossary of ETL Terms (Reference:www.Oracle.com):

Source System:

A database, application, file, or other storage facility from which the data in a data warehouse is derived.

Mapping:

The definition of the relationship and data flow between source and target objects.

Metadata:

Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the
data warehouse. A repository contains metadata.

Staging Area:

A place where data is processed before entering the warehouse.

Cleansing:

The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.

Transformation:

The process of manipulating data. Any manipulation beyond  copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

Transportation:

The process of moving copied or transformed data from a source to a data warehouse.

Target System:

A database, application, file, or other storage facility to which the “transformed source data” is loaded in a data warehouse.

Sample ETL Process Flow Diagram:

Sample ETL Process Flow Diagram

 

ETL Tools – What to Learn?

ETL Tools: What to Learn?

With the help of ETL tools, we can create powerful target Data Warehouses without much difficulty. Following are the various options that we have to know and learn in order to use ETL tools.

Software:

  • How to install ETL tool on server/client?

Working with an ETL Tool:

  • How to work with various options like designer, mapping, workflow, scheduling etc.,?
  • How to work with sources like DBMS, relational source databases, files, ERPs etc., and import the source definitions?
  • How to import data from data modeling tools, applications etc.,?
  • How to work with targets like DBMS, relational source databases, files, ERPs etc., and import the source definitions?
  • How to create target definitions?
  • How to create mappings between source definitions and target definitions?
  • How to create transformations?
  • How to cleanse the source data?
  • How to create a dimension, slowly changing dimensions, cube etc.,?
  • How to create and monitor workflows?
  • How to configure, monitor and run debugger?
  • How to view and generate metadata reports?

 

ETL Tools

What are ETL Tools?

ETL Tools are meant to extract, transform and load the data into Data Warehouse for decision making. Before the evolution of ETL Tools, the above mentioned ETL process was done manually by using SQL code created by programmers. This task was tedious and cumbersome in many cases since it involved many resources, complex coding and more work hours. On top of it, maintaining the code placed a great challenge among the programmers.

These difficulties are eliminated by ETL Tools since they are very powerful and they offer many advantages in all stages of ETL process starting from extraction, data cleansing, data profiling, transformation, debugging and loading into data warehouse when compared to the old method.

There are a number of ETL tools available in the market to do ETL process the data according to business/technical requirements. Following are some those.

Popular ETL Tools:

Tool NameCompany Name
InformaticaInformatica Corporation
DT/StudioEmbarcadero Technologies
DataStageIBM
Ab InitioAb Initio Software Corporation
Data JunctionPervasive Software
Oracle Warehouse BuilderOracle Corporation
Microsoft SQL Server IntegrationMicrosoft
TransformOnDemandSolonde
Transformation ManagerETL Solutions

 

1 2 3 4 5 12