Database: Sample Data Analysis

Database: Sample Data

With a simple example of designing a data model with ‘Employee related information’, most of the data structures can be easily understood. So we will try to design a data model using the sample data given below and will explain the data modeler’s involvement in the database environment.

Sample Source Data:

First Name Middle Name Last Name SSN Gender Birth Date Salary Dept. No Dept. Desc
Melinda J Schipper 765-43-21 Female 20-APR-1973 5000
Kevin A Schulte 123-45-67 Male 13-OCT-1969 5000 10 IT

Note: If you go through our topics provided under the Data Modelling Section, you will have a brief idea about how to design a data model.

Database: Sample Data Analysis

For the sample data provided in the above table, as a Data Modeler you have to design logical data model, physical data model and generate DDL scripts. In order to do the above tasks, you need to create the following:

  • Create 3 tables GENDER_LKP, DEPARTMENT_LKP, EMPLOYEE_DTL.
  • Assign correct data types to the columns.
  • Create constraints like primary key, unique key, null, not null, check to the columns.
  • Assign correct PHYSICAL names for tables and columns.
  • Select GENDER_LKP, DEPARTMENT_LKP as parent tables and EMPLOYEE_DTL as the child table and connect them.
  • Even though column ‘DTTM_STMP’ has not been present in the sample data, you have to add ‘DTTM_STMP’ to all tables to know the date and time on which a record is inserted or updated.

GENDER_LKP Analysis:

  • Since ‘male’ and ‘female’ values will be repeated for several records in EMPLOYEE_DTL table, you have to design this lookup for saving disk space. Sometimes the detail table may not show the exact business requirements also.
  • E.g. Since the example data contains both genders; you know that there are two genders. What would have happened if all records were male? By seeing the sample data, you might have come to a conclusion that there are only ‘males’ allowed as employees. By discussing with Business Analyst, you would come to know that both genders are allowed. For this purpose, you design a lookup table, which identifies the business requirements also.
  • You have to assign NOT NULL constraint for all columns since lookup tables in general should not have null value.
  • You have to create a column Gender Code, which is not present in sample data and this column should be assigned Primary Key to validate the detailed data in EMPLOYEE_DTL table.

Sample Source Data:

 

Entity Name Table

Name

Att

Name

Col

Name

Data

type

Null Option P Key F Key
Gender Lookup GENDER_LKP Gender Code GNDR_CD VARCHAR2(10) NOT NULL Yes No
Gender Lookup GENDER_LKP Gender Description GNDR_DESC VARCHAR2(50) NOT NULL No No
Gender Lookup GENDER_LKP DateTime Stamp DTTM_STMP DATE NOT NULL No No

Analysis of DEPARTMENT_LKP:

  • Since ‘Dept No’ and ‘Dept Desc’ values will be repeated for several records in EMPLOYEE_DTL table, you have to design this lookup for saving the disk space.
  • You have to assign NOT NULL constraint for all columns. Generally lookups should not have null value.
  • You have to assign ‘Dept No’ as the primary key to validate the detailed data in EMPLOYEE_DTL table.

Sample Source Data:

Entity Name Table

Name

Att

Name

Col

Name

Data

type

Null Optn P Key F Key
Department Lookup DEPART

MENT_LKP

Depart

ment Number

DPTMT_NUM NUMBER(2) NOT NULL Yes No
Department Lookup DEPART

MENT_LKP

Depart

ment Description

DPTMT_DESC VARCHAR2(50) NOT NULL No No
Department Lookup DEPART

MENT_LKP

DateTime Stamp DTTM_STMP DATE NOT NULL No No

Analysis of EMPLOYEE_DTL:

  • As of now, this example entity contains only sample of 2 records. In real time, this entity would store millions of records. In order to retrieve the data in a faster way, you have to create an additional column EMP_DTL_ID in EMPLOYEE_DTL and assign it as primary key.
  • In our example, column EMP_DTL_ID will be populated through sequence initiated by a trigger and you can see the sequence code and trigger code in later sections.
  • Even though SSN can be added as a primary key in EMPLOYEE_DTL table, you have to add a new column EMP_DTL_ID as the primary key to ensure the fastest retrieval of data.
  • Tip: Wherever it is required, you have to create new columns.

Sample Source Data:

Entity Name Table

Name

Att

Name

Col

Name

Data

type

Null Option P Key F Key
Employee Detail EMPLOYEE

_DTL

Employee Detail Identifier EMP_DTL_ID NUMBER NOT NULL Yes No
Employee Detail EMPLOYEE

_DTL

Department Number DPTMT_NUM NUMBER(2) NULL No Yes
Employee Detail EMPLOYEE

_DTL

Gender Code GNDR_CD VARCHAR2(10) NOT NULL No Yes
Employee Detail EMPLOYEE

_DTL

First Name FRST_NM VARCHAR2(30) NOT NULL No No
Employee Detail EMPLOYEE

_DTL

Last Name LST_NM VARCHAR2(30) NOT NULL No No
Employee Detail EMPLOYEE

_DTL

Middle Name MDLE_NM VARCHAR2(30) NULL No No
Employee Detail EMPLOYEE

_DTL

SSN SSN VARCHAR2(11) NOT NULL No No
Employee Detail EMPLOYEE

_DTL

Birth Date BRTH_DT DATE NOT NULL No No
Employee Detail EMPLOYEE

_DTL

Salary Amount SLRY_AMT NUMBER(7,2) NOT NULL No No
Employee Detail EMPLOYEE

_DTL

DateTime Stamp DTTM_STMP DATE NOT NULL No No

 

Leave a Reply

Your email address will not be published. Required fields are marked *