Extraction, Transformation, and Loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into a target database.
The first step in the ETL process is mapping the data between the source systems and the target database (a data warehouse or data mart). The second step is cleansing the source data in a staging area. The third step is transforming the cleansed source data and loading it into the target system.
Note that ETT (Extraction, Transformation, Transportation) and ETM (Extraction, Transformation, Move) are sometimes used instead of ETL.
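The three steps above can be sketched as a small pipeline. This is a minimal illustration, not a full ETL tool: the source file contents, table name, and column names below are invented for the example, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical in-memory text standing in for a .csv source extract.
SOURCE_CSV = "emp_id,name,gender\n101,kEvin CURtis,M\n102,STEven CURtis,Male\n"

def extract(csv_text):
    """Step 1: read rows from the source system."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Step 2: cleanse the data in the staging area."""
    for row in rows:
        row["name"] = row["name"].title()          # fix mixed upper/lower case
        row["gender"] = row["gender"][0].upper()   # M/Male -> M, F/Female -> F
    return rows

def load(rows, conn):
    """Step 3: load the cleansed data into the target table."""
    conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, gender TEXT)")
    conn.executemany("INSERT INTO employee VALUES (:emp_id, :name, :gender)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT name, gender FROM employee").fetchall())
# → [('Kevin Curtis', 'M'), ('Steven Curtis', 'M')]
```

In a real ETL tool each of these steps is a configurable component rather than hand-written code, but the flow of data from source to staging to target is the same.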
Data comes in different forms – structured or unstructured – in huge volumes and at high velocity.
Data in Structured Format:
- Example: .csv files, .xls files, .dat files, .xml files etc.
- Structured data has been stored for many years on organizations' high-end servers.
- The need to analyze (slice and dice) these structured data formats led to the evolution of ETL.
In earlier times, data from different source systems such as .csv, .xls, .dat, and .xml files was loaded into the target database with hand-written programs. The same data can be loaded much more quickly with ETL tools, which provide ready-made components for each step of the process.
From the ETL perspective, an ETL developer has to know how to identify and connect to the source systems, perform the transformations, and load the data into the target database.
Extraction and loading are straightforward compared to transformation, since transformation has to deal with the various anomalies introduced during data entry.
Common Anomalies found in the Source Systems:
1. Name: Combination of upper character and lower characters. No predefined formats.
Example: kEvin CURtis, STEven CURtis
2. Indicator Columns: Yes, No, Y, N
3. Gender Columns: M, F, Male, Female
4. Format Columns: xxx-xx-xxxx, xx-xxx-xxxx
5. Currency Columns: $1234, $1234.00, $1234.50
6. Date Columns: MM/DD/YYYY, DD/MM/YYYY formats
The ETL developer resolves the above-mentioned problems using the ETL tools, so that correct data is loaded into the target system.
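To make the anomalies concrete, here is one possible cleansing routine per column type. The function names and the choice of ISO output format are this example's own conventions, not part of any particular ETL tool; a real tool would express the same rules through its transformation components.

```python
from datetime import datetime

def clean_name(raw):
    """Anomaly 1: mixed upper/lower case -> Title Case."""
    return raw.strip().title()

def clean_indicator(raw):
    """Anomaly 2: Yes / No / Y / N -> single-letter Y or N."""
    return raw.strip().upper()[0]

def clean_gender(raw):
    """Anomaly 3: M / F / Male / Female -> single-letter M or F."""
    return raw.strip().upper()[0]

def clean_currency(raw):
    """Anomaly 5: $1234, $1,234.50, etc. -> numeric value."""
    return round(float(raw.replace("$", "").replace(",", "")), 2)

def clean_date(raw, source_format="%m/%d/%Y"):
    """Anomaly 6: parse the format the source declares, emit ISO YYYY-MM-DD.
    The source format must be known per source system; it cannot be guessed
    safely, since 03/04/2023 is ambiguous between MM/DD and DD/MM."""
    return datetime.strptime(raw.strip(), source_format).strftime("%Y-%m-%d")

print(clean_name("kEvin CURtis"))        # → Kevin Curtis
print(clean_indicator("Yes"))            # → Y
print(clean_date("12/31/2023"))          # → 2023-12-31
```

Note that the date rule takes the source's declared format as a parameter: this is exactly the mapping information an ETL developer records in the source-to-target mapping document.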
Sample ETL Process Flow Diagram:
Glossary of ETL Terms (Reference: www.Oracle.com):
Source System: A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Mapping: The definition of the relationship and data flow between source and target objects.
Metadata: Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate the scripts that build and populate the data warehouse. A repository contains metadata.
Staging Area: A place where data is processed before entering the warehouse.
Cleansing: The process of resolving inconsistencies and fixing anomalies in source data, typically as part of the ETL process.
Transformation: The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
Transportation: The process of moving copied or transformed data from a source to a data warehouse.
Target System: A database, application, file, or other storage facility to which the transformed source data is loaded in a data warehouse.
- We provide online training in advanced OLTP Data Modeling and DIMENSIONAL Data Modeling.
- We also teach data structures with the Data Analytics software "R".
- We provide online Data Modeling project support when you get stuck with the projects you are involved in.
- We can analyze your business requirements and suggest solutions to create OLTP Data Models and Dimensional Data Models.
- We provide Data Modeling interview preparation sessions with a lot of Data Modeling interview questions and answers, which will help you clear any interview.
If you are interested, please reach us at Training@LearnDataModeling.com or 91-9080157239