Data Warehouse & ETL Tutorial

A data warehouse is where data from different source systems is integrated, processed, and stored. Data warehouse data is non-production data, used mainly for analysis and reporting. Business users and higher management use the data warehouse to analyze data and to make important business decisions. A data warehouse collects large volumes of data from various sources in many different data formats. The ETL (Extraction, Transformation and Loading) process takes this data and transforms it into consistent, standard-format data. The ETL process is carried out with the help of an ETL tool; the most widely used tools are Informatica, IBM DataStage, Ab Initio, Oracle Warehouse Builder, etc.

This Data Warehouse and ETL tutorial section explains data warehouse/ETL concepts, ETL tools, and their usage with practical examples. Go through the sections below to read more.



ETL Concepts:

ETL stands for Extraction, Transformation, and Loading. It refers to the methods involved in accessing and manipulating source data and loading it into the target database.

The first step in the ETL process is mapping the data between the source systems and the target database (data warehouse or data mart). The second step is cleansing the source data in the staging area. The third step is transforming the cleansed source data and then loading it into the target system.

Note that ETT (Extraction, Transformation, Transportation) and ETM (Extraction, Transformation, Move) are sometimes used instead of ETL.
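
For illustration, here is a minimal, hedged sketch of these three steps in Python. The file name, column names, and target table are assumptions made for the example, and SQLite stands in for the warehouse database; in a real project this logic would usually be built with the components of an ETL tool rather than hand-written code.

```python
# A minimal ETL sketch using only the Python standard library.
# File names, column names, and the target table are hypothetical.
import csv
import sqlite3

def extract(path):
    """Extract: read raw rows from a source file into memory (staging)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: cleanse and standardize the staged rows."""
    cleansed = []
    for row in rows:
        cleansed.append({
            "customer_id": int(row["customer_id"]),
            "name": row["name"].strip().title(),      # normalize casing
            "signup_date": row["signup_date"][:10],   # keep YYYY-MM-DD
        })
    return cleansed

def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO dim_customer VALUES (:customer_id, :name, :signup_date)",
        rows,
    )
    conn.commit()

if __name__ == "__main__":
    target = sqlite3.connect("warehouse.db")   # stand-in for the warehouse
    load(transform(extract("customers.csv")), target)
```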

Data comes in different forms, structured or unstructured, in huge volumes and at high velocity.

Data in Structured Format:

  • Examples: .csv files, .xls files, .dat files, .xml files, etc.
  • Structured data is stored for several years on the organization’s high-end servers.
  • The need to analyze (slice and dice) data in these structured formats led to the evolution of ETL.
In earlier times, data from different source systems such as .csv, .xls, .dat, and .xml files was loaded into the target database using traditional programming methods. The same data can be loaded much more quickly with ETL tools, which provide ready-made components for these tasks.
From the ETL perspective, an ETL developer has to know how to identify and connect to the source systems, apply the transformations, and load the data into the target database.
Extraction and loading are easy compared to transformation, since transformation has to deal with the different anomalies present in the source data.

Common Anomalies found in the Source Systems:

1. Name columns: combinations of upper-case and lower-case characters, with no predefined format.
Example: kEvin CURtis, STEven CURtis
2. Indicator columns: Yes, No, Y, N
3. Gender columns: M, F, Male, Female
4. Format columns: xxx-xx-xxxx, xx-xxx-xxxx
5. Currency columns: $1234, $1234.00, $1234.50
6. Date columns: MM/DD/YYYY, DD/MM/YYYY formats
The above-mentioned problems are resolved by the ETL developer using ETL tools, and the corrected data is loaded into the target system.
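
As an illustration, the hedged Python sketch below shows the kind of standardization rules an ETL developer would configure for these anomalies; the function names and the accepted input formats are assumptions made for the example.

```python
# Hedged sketch of cleansing rules for the anomalies listed above;
# column conventions and accepted formats are illustrative assumptions.
from datetime import datetime

def clean_name(value):
    """'kEvin CURtis' -> 'Kevin Curtis' (source has no predefined format)."""
    return " ".join(part.capitalize() for part in value.split())

def clean_indicator(value):
    """Map Yes/No/Y/N (any casing) to a single 'Y'/'N' standard."""
    return "Y" if value.strip().lower() in ("y", "yes") else "N"

def clean_gender(value):
    """Map M/F/Male/Female to 'M'/'F'."""
    return value.strip().upper()[0]

def clean_currency(value):
    """'$1234' or '$1,234.50' -> a plain numeric value."""
    return float(value.replace("$", "").replace(",", ""))

def clean_date(value):
    """Accept MM/DD/YYYY or DD/MM/YYYY input and emit ISO YYYY-MM-DD.
    In practice the format is agreed per source feed; this simply tries both."""
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {value}")

print(clean_name("kEvin CURtis"))   # Kevin Curtis
print(clean_indicator("Yes"))       # Y
print(clean_currency("$1234.50"))   # 1234.5
print(clean_date("12/31/2023"))     # 2023-12-31
```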

Sample ETL Process Flow Diagram:

[Figure: sample ETL process flow diagram]

Glossary of ETL Terms (Reference: www.Oracle.com):

Source System:

A database, application, file, or other storage facility from which the data in a data warehouse is derived.

Mapping:

The definition of the relationship and data flow between source and target objects.
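
For example, a mapping can be captured as a simple source-to-target specification; in the hedged sketch below, the source fields, target columns, and rules are hypothetical.

```python
# Hedged sketch of a source-to-target mapping definition;
# the source fields, target columns, and rules are hypothetical.
CUSTOMER_MAPPING = [
    # (source field, target column,   transformation rule)
    ("cust_no",      "customer_id",   "cast to integer"),
    ("cust_nm",      "customer_name", "trim, title case"),
    ("actv_ind",     "active_flag",   "Yes/No/Y/N -> Y/N"),
    ("open_dt",      "signup_date",   "MM/DD/YYYY -> YYYY-MM-DD"),
]

def apply_mapping(source_row, mapping):
    """Rename source fields to target columns (rules applied elsewhere)."""
    return {target: source_row[src] for src, target, _rule in mapping}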

Metadata:

Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate the scripts that build and populate the data warehouse. A repository contains metadata.

Staging Area:

A place where data is processed before entering the warehouse.

Cleansing:

The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.

Transformation:

The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
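
As a small illustration, the hedged Python sketch below integrates two hypothetical sources and aggregates the result; the row layout and column names are assumptions made for the example.

```python
# Hedged sketch of transformations beyond copying: integrating two
# hypothetical sources and aggregating the result.
from collections import defaultdict

orders = [  # hypothetical rows from an orders feed
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 2, "amount": 75.0},
]
customers = {1: "Kevin Curtis", 2: "Steven Curtis"}  # hypothetical reference source

# Aggregation: total order amount per customer.
totals = defaultdict(float)
for order in orders:
    totals[order["customer_id"]] += order["amount"]

# Integration: join the aggregated feed with the customer reference source.
summary = [
    {"customer_name": customers[cid], "total_amount": amount}
    for cid, amount in totals.items()
]
print(summary)  # rows ready to be loaded into a summary/fact table
```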

Transportation:

The process of moving copied or transformed data from a source to a data warehouse.

Target System:

A database, application, file, or other storage facility to which the transformed source data is loaded in a data warehouse.

 

Reach Us!

 

  • We provide online training in advanced OLTP Data Modeling and Dimensional Data Modeling.
  • We also teach data structures with the data analytics software “R”.
  • We provide online Data Modeling project support when you get stuck with the projects you are involved in.
  • We can analyze your business requirements and suggest solutions to create OLTP data models and dimensional data models.
  • We provide Data Modeling interview preparation sessions with a lot of Data Modeling interview questions and answers, which will help you clear any interview.

 

If you are interested, please reach us at Training@LearnDataModeling.com or 91-9080157239

 

ETL Tools: What to Learn?

With the help of ETL tools, we can create powerful target data warehouses without much difficulty. The following are the various areas we have to learn in order to use ETL tools.

Software:

  • How to install the ETL tool on the server/client?

Working with an ETL Tool:

  • How to work with various options like the designer, mappings, workflows, scheduling, etc.?
  • How to work with sources like DBMS, relational source databases, files, ERPs, etc., and import the source definitions?
  • How to import data from data modeling tools, applications, etc.?
  • How to work with targets like DBMS, relational target databases, files, ERPs, etc., and import the target definitions?
  • How to create target definitions?
  • How to create mappings between source definitions and target definitions?
  • How to create transformations?
  • How to cleanse the source data?
  • How to create a dimension, slowly changing dimensions, a cube, etc.? (a slowly changing dimension sketch follows this list)
  • How to create and monitor workflows?
  • How to configure, monitor, and run the debugger?
  • How to view and generate metadata reports?
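
As mentioned in the list above, slowly changing dimensions are a common ETL task. The hedged Python sketch below shows the basic idea of a Type 2 slowly changing dimension, where a change in a tracked attribute expires the current row and inserts a new version; the table layout and column names are illustrative assumptions, not a specific tool's implementation.

```python
# Hedged, in-memory sketch of Type 2 slowly changing dimension logic:
# when a tracked attribute changes, close the current row and add a new version.
from datetime import date

dim_customer = [
    {"customer_id": 1, "city": "Chennai", "effective_from": date(2020, 1, 1),
     "effective_to": None, "is_current": True},
]

def apply_scd2(dim_rows, customer_id, new_city, as_of):
    current = next(
        (r for r in dim_rows if r["customer_id"] == customer_id and r["is_current"]),
        None,
    )
    if current and current["city"] == new_city:
        return  # no change, nothing to do
    if current:
        # Expire the existing version instead of overwriting it (history is kept).
        current["effective_to"] = as_of
        current["is_current"] = False
    dim_rows.append({
        "customer_id": customer_id, "city": new_city,
        "effective_from": as_of, "effective_to": None, "is_current": True,
    })

apply_scd2(dim_customer, 1, "Bangalore", date(2024, 6, 1))
for row in dim_customer:
    print(row)  # the expired Chennai row and the current Bangalore row
```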

 

What are ETL Tools?

ETL tools are meant to extract, transform, and load data into the data warehouse for decision making. Before the evolution of ETL tools, the ETL process described above was done manually, using SQL code written by programmers. This task was tedious and cumbersome in many cases, since it involved many resources, complex coding, and many work hours. On top of that, maintaining the code posed a great challenge to the programmers.

ETL tools eliminate these difficulties; they are very powerful and, compared to the old method, offer many advantages in all stages of the ETL process: extraction, data cleansing, data profiling, transformation, debugging, and loading into the data warehouse.

There are a number of ETL tools available in the market to process the data according to business and technical requirements. Following are some of them.

Popular ETL Tools:

  • Informatica PowerCenter
  • IBM DataStage
  • Ab Initio
  • Oracle Warehouse Builder (OWB)