Data Modeling Development Cycle

Data Modeling Development Cycle

Following are the important phases in the Data Model Development Life Cycle.

  1. Gathering Business Requirements
  2. Conceptual Data Modeling
  3. Logical Data Modeling
  4. Physical Data Modeling
  5. Development of the schema or the database
  6. Maintenance of the data model as per the changes.
1. Gathering Business Requirements – First Phase:

Data Modelers have to interact with business analysts to get the functional requirements and with end users to find out the reporting needs.

2. Conceptual Data Modeling (CDM) – Second Phase:

This data model includes all major entities, relationships and it will not contain much detail
about attributes and is often used in the INITIAL PLANNING PHASE. Please refer the diagram below and follow the link to learn more about Conceptual Data Modeling Tutorial.

Conceptual Data Modeling - Example diagram:

3. Logical Data Modeling (LDM) – Third Phase:

This is the actual implementation of a conceptual model in a logical data model. A logical data model is the version of the model that represents all of the business requirements of an organization. Please refer the diagram below and follow the link to learn more about Logical Data Modeling Tutorial.

In the example, we have identified the entity names, attribute names, and relationship. For detailed explanation, refer to relational data modeling.

4. Physical Data Modeling (PDM) – Fourth Phase:

This is a complete model that includes all required tables, columns, relationship, database properties for the physical implementation of the database. Please refer the diagram below and follow the link to learn more about Physical Data Modeling Tutorial.

Example of Physical Data Mode

5. Database – Fifth Phase:

DBAs instruct the data modeling tool to create SQL code from physical data model. Then the SQL code is executed in server to create databases.

 

What is Data Model Versioning?

Data Model Versioning | Concepts | Advantages | Uses:

Data Model Versioning is the process of assigning either unique version names or unique version numbers to different stages of a data model. Data Modeling tools may have this versioning facility in their software itself and versions of data models may be stored in their repository.

The following information just gives an idea about how to do versioning manually.

Example:

a) <Project Name>_<mmddyyyy>; Banking_01012010

b) <Project Name>_<Version Number>; Banking_v1

During the development cycle of the data model, SME’s (Subject Matter Experts) or Business Analysts will request the data modeling team to create a new subject area for a new line of business or modify the existing subject area.

In the initial stages of development of a data model, whenever a new subject area is added to the data model or changes done to the data model, immediately, data model changes will be sent to the “project team” by email. Data Models are stored in a shared network where “project team” will have privileges to view the data model and “data modelers” will have privileges to update the data model.

Practical Example:

To start with, A bank may have “savings account” as their line of business. Later it may add a different line of business “Credit Card”. To start with, data model will have only only subject area “Savings Account”. When credit card is added to the bank’s business, SMEs or business analyst will analyze it and they will send a new requirement to the data modeling team to add different entities in the logical data model. They may also send few changes(add attribute/delete attribute) in existing “Savings Account” data model. In order to keep track of these changes, we need versioning of the data model.

Versioning Example:

Assume that this data model work will be completed within 6 months starting from Jan 2010 and ending in June 2010.

In the shared network allocated for the project team, create a folder called “Data Modeling”. Under data modeling, create sub folders like “Jan 2010”, “Feb 2010”, “Mar 2010”, “Apr 2010”, “May 2010” and “June 2010”. The logic behind this is data model updates done in that particular month are stored under that month folder.

For Data Models, Start with version V1 in January and update it to V2 in February. Whatever other changes you do within that particular month, suffix “V1” or “V2” by .1, .2 etc.

  • Date: 1st January 2010: A new requirement to create “Savings Account” data model was given by SMEs or business analysts. Assume that the project name is “Banking”. Create a data model by name “Banking_v1” and add necessary entities in the data model. Save it under “Jan 2010” folder.
  • Date: 25th January 2010: Few changes have been sent by SMEs for “Savings Account” subject area. Save the existing data model as “Banking_V1.1”, update the changes and store it under “Jan 2010” folder. Now you have two versions of the data model.
  • Date: 25th February 2010: A new requirement about “Credit Card” was sent by SMEs. Save the the latest model “Banking_v1.1” as “Banking_v2” and apply the changes. Now you have three versions of data model. Store it under “Feb 2010” folder.
Advantages:
  • Data Model Changes can be tracked. Weekly or monthly changes can be sent to the project team by email.
  • Data Model can be compared with the data base and data models can be brought in SYNC with data base.
  • Changes can be easily rolled back (Removing the changes). If SMEs or business analysts are not sure, very often these roll backs will happen.
  • Reports can be generated from the data model and sent to the “documentation team”.
  • Clarity within the project team.
  • Some times the project team may be interested in a particular version of the data model. Its easier to send that particular version of the data model.
Interview Question:
  • How do you implement data model versioning?

Data Model Repository

What is a Data Model Repository? 

A data modeling repository is a storage area, where metadata of a data model is stored. The data stored is different from the software perspective, organization’s perspective and usage perspective. Repository can be stored any where; either in a data base or locally within any system.

Example: ETL Repository and Data Modeling Repository are different based on the software/usage perspective. In Data Modeling repository, meta data related to data
modeling is stored and in ETL repository, meta data related to ETL (Extraction, Transformation, and Loading) is stored. Organization will only store the meta data that they are interested.

From the data modeling perspective, data models and relevant meta data are stored in repository.

Whenever there are several data modelers working in an organization, and if they have to access the same data models concurrently, organizations buy repository. Otherwise, they store meta data in a shared network.

When data modeling software is bought with repository tool, system administrators install repository and share the username/password to the “Data Model Repository
Administrator”. This “Data Model Repository Administrator” has super privileges.

The administrator creates usernames and allocates privileges on data models for business analysts, SME, data modelers, Application developers (development/reporting), DBA’s, Business Users, Managers etc.

Examples of Privileges Allocated:
  • Creation and Updation of Logical Data Model. Based on the needs, privileges will be allocated on ALL or FEW or ONE data model present in the organization.
  • Creation and Updation of Physical Data Model Based on the needs, privileges will be allocated on ALL or FEW or ONE data model present in the organization.
  • Creation and Updation of Logical/Physical Data Model Based on the needs, privileges will be allocated on ALL or FEW or ONE data model present in the organization.
  • View Logical Data Model or Physical Data Model or both.
  • Creation and Updation of a particular database object (tables, views, indexes etc.)

All you have to know is how to login, log out, privileges allocated, different menus present in repository and how to work in repository, etc.

Uses of Repository:

  • Helps the data modelers to work on the same data model consistently/collaboratively and merge all work activities in the same data model itself.
  • Creating different Version of the data model to keep track of changes.
  • Generating Reports from the repository.
  • Applying security to data model.
  • Back Up and Recovery of the data models.

 

Data Modeling Reports

From Data Modeling tools, reports can be easily generated for technical and business needs. The reports that have been generated from logical data model and physical data model are called as business reports and technical reports respectively. Most of the data modeling tools provide default reports like subject area reports, entity reports, attribute reports, table reports, column reports, indexing reports, relationship reports etc. The advantage of these reports is, whether they are technical or non-technical, everybody would understand what is going on within the organization.

Other than default reports provided by data modeling tools, a data modeler can also create customized reports as per the needs of an organization. For example, if an expert asks of both logical and physical reports of a particular subject area in one file(e.g in .xls), logical and physical reports can be easily merged and reports can be easily generated accordingly. Data Modeling tools provide the facility of sorting, filtering options and the reports can be exported into file formats like .xls, .doc, .xml etc.

Logical Data Model Report:

Logical Data Model Report describes information about business such as the entity names, attribute names, definitions, business rules, mapping information etc.

Logical Data Model Report Example:

Logical Data Model Report Example

Physical Data Model Report:

Physical Data Model Report describes information such as the ownership of the database, physical characteristics of a database (in oracle, table space, extents, segments, blocks, partitions etc), performance tuning (processors, indexing), table name, column name, data type, relationship between the tables, constraints, abbreviations, derivation rules, glossary, data dictionary, etc., and is used by the technical team.

Physical Data Model Report Example:

Physical Data Model Report Example

 

Data Modeling Standards | Modeling Data

Data Modeling standardization has been in practice for many years and the following section highlight the needs and implementation of the data modeling standards.

Standardization Needs | Modeling data:

Several data modelers may work on the different subject areas of a data model and all data modelers should use the same naming convention, writing definitions and business rules.

Nowadays, business to business transactions (B2B) are quite common, and standardization helps in understanding the business in a better way. Inconsistency across column names and definition would create a chaos across the business.

For example, when a data warehouse is designed, it may get data from several source systems and each source may have its own names, data types etc. These anomalies can be eliminated if a proper standardization is maintained across the organization.

Table Names Standardization:

Giving a full name to the tables, will give an idea about data what it is about. Generally, do not abbreviate the table names; however this may differ according to organization’s standards. If the table name’s length exceeds the database standards, then try to abbreviate the table names. Some general guidelines are listed below that may be used as a prefix or suffix for the table.

Examples:

Lookup – LKP – Used for Code, Type tables by which a fact table can be directly accessed.
e.g. Credit Card Type Lookup – CREDIT_CARD_TYPE_LKP

Fact – FCT – Used for transaction tables:
e.g. Credit Card Fact – CREDIT_CARD_FCT

Cross Reference – XREF – Tables that resolves many to many relationships.
e.g. Credit Card Member XREF – CREDIT_CARD_MEMBER_XREF

History – HIST – Tables the stores history.
e.g. Credit Card Retired History – CREDIT_CARD_RETIRED_HIST

Statistics – STAT – Tables that store statistical information.
e.g. Credit Card Web Statistics – CREDIT_CARD_WEB_STAT

Column Names Standardization:

Some general guidelines are listed below that may be used as a prefix or suffix for the column.

Examples:

Key – Key System generated surrogate key.
e.g. Credit Card Key – CRDT_CARD_KEY

Identifier – ID – Character column that is used as an identifier.
e.g. Credit Card Identifier – CRDT_CARD_ID

Code – CD – Numeric or alphanumeric column that is used as an identifying attribute.
e.g. State Code – ST_CD

Description – DESC – Description for a code, identifier or a key.
e.g. State Description – ST_DESC

Indicator – IND – to denote indicator columns.
e.g. Gender Indicator – GNDR_IND

Database Parameters Standardization:

Some general guidelines are listed below that may be used for other physical parameters.

Examples:

Index – Index – IDX – for index names.
e.g. Credit Card Fact IDX01 – CRDT_CARD_FCT_IDX01

Primary Key – PK – for Primary key constraint names.
e.g. CREDIT Card Fact PK01- CRDT-CARD_FCT_PK01

Alternate Keys – AK – for Alternate key names.
e.g. Credit Card Fact AK01 – CRDT_CARD_FCT_AK01

Foreign Keys – FK – for Foreign key constraint names.
e.g. Credit Card Fact FK01 – CRDT_CARD_FCT_FK01

Note:

If you are joining in a company as a data modeler, first speak with DBAs or the software configuration management team to know the standards.

Logical Data Modeler Role | Physical Data Modeler Role

Data Modeler Role:

Data Modelers fall into two major categories of Logical or Physical Data Modelers depending upon the role they play in developing Logical or Physical data models. Based on organization needs, data modelers only do logical data modeling or physical data modeling or combination of both. Nowadays, organizations prefer to hire data modelers, who can do logical as well as physical data modeling efficiently. Logical data modelers interact with stake holders, business analysts, smart management experts (SME) and developers.
Physical data modelers interact with logical data modelers and the database team.

For logical data modeling, please refer Logical Data Modeling Tutorial and for physical data modeling, please refer Physical Data Modeling Tutorial.

Logical Data Modeler Role:

Business Requirement Analysis:
  • Interact with Business Analysts to get the functional requirements.
  • Interact with end users and find out the reporting needs.
  • Conduct interviews, brain storming discussions with project team to get additional requirements.
  • Gather accurate data by data analysis and functional analysis.
Development of data model:
  • Create standard abbreviation document for logical, physical and dimensional data models.
  • Create logical, physical and dimensional data models (data warehouse data modelling).
  • Document logical, physical and dimensional data models (data warehouse data modelling).
Reports:
  • Generate reports from data model.
Review:
  • Review the data model with functional and technical team.
Support & Maintenance:
  • Assist developers, ETL, BI team and end users to understand the data model.
  • Maintain change log for each data model.

Physical Data Modeler Role:

  • Create sql code from data model and co-ordinate with DBAs to create development, testing, regression and production database.
  • Check to see data models and databases are in sync.
  • Adding data base objects (like indexes, partitions in oracle database) for performance.
  • Generating reports.

Note:

  • Based on the client’s requirements, the Logical/Physical data modeler may be asked to do PL/SQL coding, ETL process, data profiling, data analysis, DBA activities and evaluation of the software also.
  • In US, the Logical/Physical data modeler’s duty and responsibility may change due to demand/supply, regulations from USCIS/Department of Labor etc.

Steps to create Data Model

Steps to create and maintain a new data model from business requirements:

These are the general guidelines to create a standard data model and in real time, a data model may not be created in the same sequential manner as shown below. Based on the enterprise’s requirements, some of the steps may be excluded or included in addition to these.

Sometimes, data modeler may be asked to develop a data model based on the existing database. In that situation, the data modeler has to reverse engineer the database and create a data model.

Note:

As a data modeler, the intention is to create tables, columns, datatype, not null, primary keys, and foreign keys. If there is a need for sequence, views, triggers and other database parameters, the data modelers should not take a hasty decision by embedding all objects in the data model. Before creating the data model, it is a good practice to understand the database object creation process by discussing with DBAs or top-notch technical executives and take it forward.

Steps to create a Logical Data Model:

  1. Get Business requirements.
  2. Analyze Business requirements.
  3. Create High Level Conceptual Data Model. Get it approved.
  4. Create a new Logical Data Model. Add the following to the logical data model.
  5. Select target database where data modeling tool creates the scripts for physical schema.
  6. Create standard abbreviation document for naming logical and physical objects according to business/data modeling standard.
  7. Create domain.
  8. Create rule.
  9. Create default.
  10. Create Entity and add definitions.
  11. Create attribute and add definitions.
  12. Assign datatype to attribute. If a domain is already present then the domain should be attached to attribute.
  13. Add check constraint/rule or default to the columns (wherever it is required).
  14. Create primary or unique keys to attribute.
  15. Create unique index or bitmap index to attribute.
  16. Based on the analysis, create surrogate key columns.
  17. If required, create super types and sub types.
  18. Analyze the relationship between entities and Create foreign key relationship (one to many or many to many) between those entities.
  19. Create subject areas and add relevant entities to those subject areas.
  20. Align the objects in the main subject area and other subject areas.
  21. Validate the data model.
  22. Generate reports from the data model.
  23. Take a print out of the data model.
  24. Get it approved.

Steps to create a Physical Data Model:

  1. Create Physical Data Model from the existing logical data model.
  2. Add database properties to physical data model.
  3. Generate SQL Scripts from Physical Data Model. Tick or check the necessary parameters in the tool, create scripts and then forward that to DBA. (DBA will execute those scripts in database).
  4. Compare database and data model. Make sure everything is okay.
  5. Create a change log document for differences between the current version and previous version of the data model.

Maintenance of Data Models:

  • Maintain Logical & Physical Data Model.
  • For each release (version of the data model), try to compare the present version with the previous version of the data model. Similarly, try to compare the data model with the database to find out the differences.

 

Why to build or create a data model | Advantages of Data Modeling

Why to build or create a data model?

  • To avoid redundancy of data in a OLTP database.
  • In Data Warehousing, data from source systems can be transformed as per the rules and loaded into target tables.
  • In Data Warehousing, you can do data profiling by cleaning the data from source systems and load that into data warehouse columns. i.e. Same column from different source system may have different data structure and column name. In data warehouse, we can create a column as per standards and load the data.
  • In Data warehousing, several columns data help in predicting the future, which is a part of data mining.
  • In Data Warehousing or in Data Mart, you can drill down the data to a certain and you can get consolidated information. For example, with location dimension, You can group the data on a state level basis, county level basis, city level basis. With time dimension, you can drill down on a yearly basis or quarterly basis or on monthly basis.
  • A new application for OLTP (Online Transaction Processing), ODS (Operational Data Store),
    data warehouse and data marts.
  • Rewriting data models from existing systems that may need to change reports.
  • Incorrect data modeling in the existing systems.
  • A data base that has no data models.

Advantages and Importance of Data Model:

  • The goal of a data model is to make sure that all data objects provided by the functional team are completely and accurately represented.
  • Data model is detailed enough to be used by the technical team for building the physical database.
  • The information contained in the data model will be used to define the significance of business, relational tables, primary and foreign keys, stored procedures, and triggers.
  • Data Model can be used to communicate the business within and across businesses.

 

Data Modeling Concepts | What is Data Modeling | Data Modeling Overview

What is a Data Model?

A Data model is a conceptual representation of data structures (tables) required for a database and is very powerful in expressing and communicating the business requirements.

Data Modeling Overview:

A data model visually represents the nature of data, business rules governing the data, and how it will be organized in the database. A data model is comprised of two parts logical design and physical design. Data Models are created in either Top Down Approach or Bottom-Up Approach. In Top-Down Approach, data models are created by understanding and analyzing the business requirements. In Bottom Up Approach, data models are created from existing databases, which has no data models. IDEF1X is the common notation used in creating data models since it is more descriptive.

Data Modeling can be broadly classified into OLTP data modeling and OLAP data modeling. Data Models are created for OLTP databases by normalizing the data and for OLAP databases, data models are created by de-normalizing the data. Data Models cannot be frozen since update will happen on data modeling based on business requirements. Data Models looks like a blue print or like a map. Data Model is not an exact replica of the database and it will not contain all the objects or code present in the database since several objects are available in database and tonnes and tonnes of code would have been developed by developers. Usually Data Models contains the key database objects like tables, columns, relationships, constraints etc.

Data model helps functional and technical team in designing the database. Functional team normally refers to one or more Business Analysts, Business Managers, Subject Matter Experts, End Users etc., and Technical teams refers to one or more programmers, DBAs etc. Data modelers are responsible for designing the data model and they communicate with functional team to get the business requirements and technical teams to implement the database.

Data Modeling Concept:

The concept of data modeling can be better understood if we compare the development cycle of a data model to the construction of a house. For example Company ABC is planning to build a guest house (database) and it calls the building architect (data modeler) and projects its building requirements (business requirements). Building architect (data modeler) develops the plan (data model) and gives it to company ABC. Finally company ABC calls civil engineers (DBA) to construct the guest house (database).

Data Engineers, Data Modeler and Data Architect are the common titles for those who are involved in data modeling. To become an efficient data modeler, you should have an overview about the database objects, constraints, normalization and understanding the requirements correctly.

Before learning Data Modeling, one has to spend couple of hours to understand the basic concepts of SQL (Structured Query Language) like CREATE TABLE, ALTER TABLE, MODIFY TABLE, RELATIONSHIPS and SELECT commands etc.

Steps to learn Data Modeling:

  1. Fundamentals of SQL
  2. Conceptual Data Modeling
  3. Logical Data Modeling
  4. Physical Data Modeling
  5. Dimensional Data Modeling
  6. Maintenance of the data model.

If you are interested in Data Modeling Training, please reach Antony at Training@LearnDataModeling.com or 91-9080157239/91-98846 75745