Role of Metadata in ETL:
When you deal with a data warehouse, various phases like Business Process Modeling, Data Modeling, ETL, Reporting etc., are inter-related with each other and they do contain their own metadata. For example in ETL, it will be very difficult for one to extract, transform and load source data into a data warehouse, if there is no metadata available for the source like where and how to get the source data.
Let us explain the role of metadata in the ETL process with the help of an example table shown below which contains information about an organisation’s employees.
Employee Name | Employee Age | Employee Salary | Employee Title |
---|---|---|---|
John Hick | 36 | $3000 | Informatica Specialist |
In the above table, the second row, containing information like John Hick, 36, $3000, Informatica Specialist are known as Data. Whereas the first row, (i.e) table header containing headings like Employee Name, Employee Age, Employee Salary, Employee Title are called as Metadata for the above said data.
An organization may be using data modeling tools, such as Erwin, Embarcadero, Oracle designer, Sybase Power Designer etc., for developing data models. Functional and technical team should have spent much time and effort in creating the data model’s data structures (tables, columns, data types, procedures, functions, triggers etc.,). By using metadata capturing tools, these data structures can be imported into metadata repository which we call it as metadata.
For example when you work with Informatica’s Metadata Exchange, it captures the metadata present in these tools and loads into the repository. There is no need for Informatica developer to create these data structures once again since metadata(data definitions) have been already captured and stored. Similarly most of the ETL tools have that capability to capture metadata from RDBMS, files, ERP, Applications etc.
In ETL, Metadata Repository is where all the metadata information about source, target, transformations, mapping, workflows, sessions etc., are stored.
From this repository, metadata can be manipulated, queried and retrieved with the help of wizards provided by metadata capturing tools. During the ETL process, when we are mapping source and target systems, we are actually mapping their metadata.
In any organization, a useful metadata often stored in a repository can be a handy resource to know about the organization’s information systems. Assume that each department in an organization may have a different business definitions, data types, attribute names for the same attribute or they may have a single business definition for many attributes. These anomalies can be overcome by properly maintaining metadata for these attributes in the centralized repository.
Thus metadata plays a vital role in explaining about how, why, where data can be found, retrieved, stored and used efficiently in an information management system.
Really Good Data