Fact Table

Fact Table | Star Schema:

The centralized table in a star schema is called as FACT table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.

In the example, “Sales Dollar” is a fact (measure) and  it can be added across several dimensions. Fact tables store different types of measures like additive, non additive and semi additive measures.

Measure Types:

  • Additive – Measures that can be added across all dimensions.
  • Non Additive – Measures that cannot be added across all dimensions.
  • Semi Additive – Measures that can be added across few dimensions and not with others.

In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called as Factless Fact tables.

Steps in designing Fact Table:

  • Identify a business process for analysis(like sales).
  • Identify measures or facts (sales dollar).
  • Identify dimensions for facts(product dimension, location dimension, time dimension, organization dimension).
  • List the columns that describe each dimension.(region name, branch name, region name).
  • Determine the lowest level of summary in a fact table(sales dollar).

Example of a Fact Table with an Additive Measure in Star Schema:

Example of a Fact Table with an Additive Measure in Star Schema

 

Designing Snowflake Schema

Snowflake Schema:

A snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables. i.e dimension table hierarchies are broken into simpler tables.

In Star Schema example we had 4 dimensions like location, product, time, organization and a fact table (sales).

In Snowflake schema, the example diagram shown below has 4 dimension tables, 4 lookup tables and 1 fact table. The reason is that hierarchies(category, branch, state, and month) are being broken out of the dimension tables(PRODUCT, ORGANIZATION, LOCATION, and TIME) respectively and shown separately. In OLAP, this Snowflake schema approach increases the number of joins and poor performance in retrieval of data. In few organizations, they try to normalize the dimension tables to save space. Since dimension tables hold less space, Snowflake schema approach may be avoided.

Example of Snowflake Schema:

Example of Snowflake Schema

 

Designing Star Schema

Star Schema: General Information

In general, an organization is started to earn money by selling a product or by providing service to the product. An organization may be at one place or may have several branches.

When we consider an example of an organization selling products throughout the world, the main four major dimensions are product, location, time and organization. Dimension tables have been explained in detail under the section Dimensions. With this example, we will try to provide detailed explanation about STAR SCHEMA.

What is Star Schema?

Star Schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables. It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table and it points towards the dimension tables. The advantage of star schema are slicing down, performance increase and easy understanding of data.

Steps in designing Star Schema:

  • Identify a business process for analysis(like sales).
  • Identify measures or facts (sales dollar).
  • Identify dimensions for facts(product dimension, location dimension, time dimension, organization dimension).
  • List the columns that describe each dimension.(region name, branch name, region name).
  • Determine the lowest level of summary in a fact table(sales dollar).

Important aspects of Star Schema & Snow Flake Schema:

  • In a star schema every dimension will have a primary key.
  • In a star schema, a dimension table will not have any parent table.
  • Whereas in a snow flake schema, a dimension table will have one or more parent tables.
  • Hierarchies for the dimensions are stored in the dimensional table itself in star schema.
  • Whereas hierarchies are broken into separate tables in snow flake schema. These hierarchies helps to drill down the data from topmost hierarchies to the lowermost hierarchies.

Glossary:

Hierarchy: A logical structure that uses ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation; for example, in a time dimension, a hierarchy might be used to aggregate data from the Month level to the Quarter level, from the Quarter level to the Year level. A hierarchy can also be used to define a navigational drill path, regardless of whether the levels in the hierarchy represent aggregated totals or not.

Level: A position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the Month, Quarter, and Year levels.

Fact Table: A table in a star schema that contains facts and connected to dimensions. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.

A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables). A fact table usually contains facts with the same level of aggregation.

Example of Star Schema:

Star Schema Diagram

In the example sales fact table is connected to dimensions location, product, time and organization. It shows that data can be sliced across all dimensions and again it is possible for the data to be aggregated across multiple dimensions. “Sales Dollar” in sales fact table can be calculated across all dimensions independently or in a combined manner which is explained below.

  • Sales Dollar value for a particular product.
  • Sales Dollar value for a product in a location.
  • Sales Dollar value for a product in a year within a location.
  • Sales Dollar value for a product in a year within a location sold or serviced by an employee

 

Slowly Changing Dimensions

What are Slowly Changing Dimensions?

Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product price changes over time; People change their names for some reason; Country and State names may change over time. These are  a few examples of Slowly Changing Dimensions since some changes are happening to them over a period of time.

Slowly Changing Dimensions are often categorized into three types namely Type1, Type2 and Type3. The following section deals with how to capture and handling these changes over time.

The “Product” table mentioned below contains a product named, Product1 with Product ID being the primary key. In the year 2004, the price of Product1 was $150 and over the time, Product1’s price changes from $150 to $350. With this information, let us explain the three types of Slowly Changing Dimensions.

Product Price in 2004:

Product ID(PK)YearProduct NameProduct Price
12004Product1$150

Type 1: Overwriting the old values.

In the year 2005, if the price of the product changes to $250, then the old values of the columns “Year” and “Product Price” have to be updated and replaced with the new values. In this Type 1, there is no way to find out the old value of the product “Product1” in year 2004 since the table now contains only the new price and year information.

Product:

Product ID(PK)YearProduct NameProduct Price
12005Product1$250

Type 2: Creating an another additional record.

In this Type 2, the old values will not be replaced but a new row containing the new values will be added to the product table. So at any point of time, the difference between the old values and new values can be retrieved and easily be compared. This would be very useful for reporting purposes.

Product:

Product ID(PK)YearProduct NameProduct Price
12004Product1$150
12005Product1$250

The problem with the above mentioned data structure is “Product ID” cannot store duplicate values of Product1  since “Product ID” is the primary key. Also, the current data structure doesn’t clearly specify the effective date and expiry date of Product1 like when the change to its price happened. So, it would be better to change the current data structure to overcome the above primary key violation.

Product:

Product ID(PK)Effective DateTime(PK)YearProduct NameProduct PriceExpiry DateTime
101-01-2004 12.00AM2004Product1$15012-31-2004 11.59PM
101-01-2005 12.00AM2005Product1$250

In the changed Product table’s Data structure, “Product ID” and “Effective DateTime” are composite primary keys. So there would be no violation of primary key constraint. Addition of new columns, “Effective DateTime” and “Expiry DateTime” provides the information about the product’s effective date and expiry date which adds more clarity and enhances the scope of this table. Type2 approach may need additional space in the data base, since for every changed record, an additional row has to be stored. Since dimensions are not that big in the real world, additional space is negligible.

Type 3: Creating new fields.

In this Type 3, the latest update to the changed values can be seen. Example mentioned below illustrates how to add new columns and keep track of the changes. From that, we are able to see the current price and the previous price of the product, Product1.

Product:

Product ID(PK)Current YearProduct NameCurrent Product PriceOld Product PriceOld Year
12005Product1$250$1502004

The problem with the Type 3 approach, is over years, if the product price continuously changes, then the complete history may not be stored, only the latest change will be stored. For example, in year 2006, if the product1’s price changes to $350, then we would not be able to see the complete history of 2004 prices, since the old values would have been updated with 2005 product information.

Product:

Product ID(PK)Current YearProduct NameCurrent Product PriceOld Product PriceOld Year
12006Product1$350$2502005

 

Time Dimension

In a relational data model, for normalization purposes, year lookup, quarter lookup, month lookup, and week lookups are not merged as a single table.  In a dimensional data modeling (star schema), these tables would be merged as a single table called TIME DIMENSION for performance and slicing data.

This dimensions helps to find the sales done on date, weekly, monthly and yearly basis. We can have a trend analysis by comparing this year sales with the previous year or this week sales with the previous week.

Example of Time Dimension

Time Dimension Diagram

Year Lookup:

Year IDYear NumberDateTimeStamp
120041/1/2005 11:23:31 AM
220051/1/2005 11:23:31 AM

Quarter Lookup:

Quarter NumberQuarter NameDateTimeStamp
1Q11/1/2005 11:23:31 AM
2Q21/1/2005 11:23:31 AM
3Q31/1/2005 11:23:31 AM
4Q41/1/2005 11:23:31 AM

Month Lookup:

Month NumberMonth NameDateTimeStamp
1January1/1/2005 11:23:31 AM
2February1/1/2005 11:23:31 AM
3March1/1/2005 11:23:31 AM
4April1/1/2005 11:23:31 AM
5May1/1/2005 11:23:31 AM
6June1/1/2005 11:23:31 AM
7July1/1/2005 11:23:31 AM
8August1/1/2005 11:23:31 AM
9September1/1/2005 11:23:31 AM
10October1/1/2005 11:23:31 AM
11November1/1/2005 11:23:31 AM
12December1/1/2005 11:23:31 AM

Week Lookup:

Week NumberDay of WeekDateTimeStamp
1Sunday1/1/2005 11:23:31 AM
2Monday1/1/2005 11:23:31 AM
3Tuesday1/1/2005 11:23:31 AM
4Wednesday1/1/2005 11:23:31 AM
5Thursday1/1/2005 11:23:31 AM
6Friday1/1/2005 11:23:31 AM
7Saturday1/1/2005 11:23:31 AM

Time Dimension:

 

Time Dim IdYear NoDay Of YearQuarter NoMonth NoMonth NameMonth Day NoWeek NoDay of WeekCal DateDate Time Stamp
120041Q11January1151/1/20041/1/2005 11:23:31 AM
2200432Q12February1512/1/20041/1/2005 11:23:31 AM
320051Q11January1171/1/20051/1/2005 11:23:31 AM
4200532Q12February1532/1/2001/1/2005 11:23:31 AM

 

 

Organization Dimension

In a relational data model, for normalization purposes, corporate office lookup, region lookup, branch lookup, and employee lookups are not merged as a single table. In a dimensional data modeling(star schema), these tables would be merged as a single table called ORGANIZATION DIMENSION for performance and slicing data.

This dimension helps us to find the products sold or serviced within the organization by the employees. In any industry, we can calculate the sales on region basis, branch basis and employee basis. Based on the performance, an organization can provide incentives to employees and subsidies to the branches to increase further sales.

Example of Organization Dimension:

Example of Organization Dimension

Corporate Lookup:

Corporate CodeCorporate NameDateTimeStamp
COAmerican Bank1/1/2005 11:23:31 AM

Region Lookup:

Region CodeRegion NameDateTimeStamp
SESouth East1/1/2005 11:23:31 AM
MWMid West1/1/2005 11:23:31 AM

Branch Lookup:

Branch CodeBranch NameDateTimeStamp
FLTMFlorida-Tampa1/1/2005 11:23:31 AM
ILCHIllinois-Chicago1/1/2005 11:23:31 AM

Employee Lookup:

Employee CodeEmployee NameDateTimeStamp
E1Paul Young1/1/2005 11:23:31 AM
E2Chris Davis1/1/2005 11:23:31 AM

Organization Dimension:

Organization Dimension IdCorporate NameRegion NameBranch NameEmployee NameDateTimeStamp
1American BankSouth EastFlorida-TampaPaul Young1/1/2005 11:23:31 AM
2American BankMid WestIllinois-ChicagoChris Davis1/1/2005 11:23:31 AM

 

Next⇒ Time Dimension

 

Product Dimension

In a relational data model, for normalization purposes, product category lookup, product sub-category lookup, product lookup, and and product feature lookups are are not merged as a single table. In a dimensional data modeling(star schema), these tables would be merged as a single table called PRODUCT DIMENSION for performance and slicing data requirements.

Example of Product Dimension:

Product Dimension Example Diagram

Product Category Lookup:

Product Category CodeProduct Category NameDateTimeStamp
1Apparel1/1/2005 11:23:31 AM
2Shoe1/1/2005 11:23:31 AM

Product Sub-Category Lookup:

Product Sub-Category CodeProduct Sub-Category NameDateTimeStamp
11Shirt1/1/2005 11:23:31 AM
12Trouser1/1/2005 11:23:31 AM
13Casual1/1/2005 11:23:31 AM
14Formal1/1/2005 11:23:31 AM

Product Lookup:

Product CodeProduct NameDateTimeStamp
1001Van Heusen1/1/2005 11:23:31 AM
1002Arrow1/1/2005 11:23:31 AM
1003Nike1/1/2005 11:23:31 AM
1004Adidas1/1/2005 11:23:31 AM

Product Feature Lookup:

Product Feature CodeProduct Feature DescriptionDateTimeStamp
10001Van-M1/1/2005 11:23:31 AM
10002Van-L1/1/2005 11:23:31 AM
10003Arr-XL1/1/2005 11:23:31 AM
10004Arr-XXL1/1/2005 11:23:31 AM
10005Nike-81/1/2005 11:23:31 AM
10006Nike-91/1/2005 11:23:31 AM
10007Adidas-101/1/2005 11:23:31 AM
10008Adidas-111/1/2005 11:23:31 AM

Product Dimension:

Product Dimension IdProduct Category NameProduct Sub-Category NameProduct NameProduct Feature DescDateTimeStamp
100001ApparelShirtVan HeusenVan-M1/1/2005 11:23:31 AM
100002ApparelShirtVan HeusenVan-L1/1/2005 11:23:31 AM
100003ApparelShirtArrowArr-XL1/1/2005 11:23:31 AM
100004ApparelShirtArrowArr-XXL1/1/2005 11:23:31 AM
100005ShoeCasualNikeNike-81/1/2005 11:23:31 AM
100006ShoeCasualNikeNike-91/1/2005 11:23:31 AM
100007ShoeCasualAdidasAdidas-101/1/2005 11:23:31 AM
100008ShoeCasualAdidasAdidas-111/1/2005 11:23:31 AM

 

Next⇒ Organization Dimension

 

Dimension Tables

Dimension Table:

Dimension table is one that describe the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables.

Location Dimension:

In a relational data modeling, for normalization purposes, country lookup, state lookup, county lookup, and city lookups are not merged as a single table. In a dimensional data modeling(star schema), these tables would be merged as a single table called LOCATION DIMENSION for performance and slicing data requirements. This location dimension helps to compare the sales in one region with another region. We may see good sales profit in one region and loss in another region. If it is a loss, the reasons for that may be a new competitor in that area, or failure of our marketing strategy etc.

Example of Location Dimension:

Example of Location Dimension

 

Country Lookup:

Country CodeCountry NameDateTimeStamp
USAUnited States Of America1/1/2005 11:23:31 AM

State Lookup:

State CodeState NameDateTimeStamp
NYNew York1/1/2005 11:23:31 AM
FLFlorida1/1/2005 11:23:31 AM
CACalifornia1/1/2005 11:23:31 AM
NJNew Jersey1/1/2005 11:23:31 AM

County Lookup:

County CodeCounty NameDateTimeStamp
NYSHShelby1/1/2005 11:23:31 AM
FLJEJefferson1/1/2005 11:23:31 AM
CAMOMontgomery1/1/2005 11:23:31 AM
NJHUHudson1/1/2005 11:23:31 AM

City Lookup:

City CodeCity NameDateTimeStamp
NYSHMAManhattan1/1/2005 11:23:31 AM
FLJEPCPanama City1/1/2005 11:23:31 AM
CAMOSHSan Hose1/1/2005 11:23:31 AM
NJHUJCJersey City1/1/2005 11:23:31 AM

Location Dimension:

Location Dimension IdCountry NameState NameCounty NameCity NameDateTimeStamp
1USANew YorkShelbyManhattan1/1/2005 11:23:31 AM
2USAFloridaJeffersonPanama City1/1/2005 11:23:31 AM
3USACaliforniaMontgomerySan Hose1/1/2005 11:23:31 AM
4USANew JerseyHudsonJersey City1/1/2005 11:23:31 AM

 

Next⇒ Product Dimension

Identifying Relationship

How to create Identifying Relationship with Cardinality of many to many relationship:

Identifying relationship:

If a foreign key column/multiple foreign key columns referencing some table, becomes a part of the primary key, and then it is an identifying relationship.

Employee and PROJECTS Table association:

Since the relationship is many to many with Employee_No (Employee Table) and Project_Code (Project table), we go with Identifying Relationship. In Identifying relationship, primary key migrates from the parent table to child table and it is a part of the primary key in Child  table. Foreign Key relationship is a thicker line.

  • Create table EMPLOYEE_PROJECT with no column.
  • Click identifying relationship in the tool bar.
  • Click Employee table first and then EMPLOYEE_PROJECT table. The line is not dotted and thicker. Employee_No column is created in EMPLOYEE_PROJECT table.
  • Similarly click identifying relationship in the tool bar. Click Project table first and then EMPLOYEE_PROJECT table. The line is not dotted and thicker. Project_Code column is created as primary key in EMPLOYEE_PROJECT table.

The example above talks about many to many relationships and it involves two tables Employee and Projects to form a third table Employee_Projects.

In some scenarios, if a primary key of the table is not able to identify uniqueness of a record in a table, then they add one more column from another table as a primary key with identifying relationship.

Save the data model with a meaningful name and save it either in your local directory or in a shared directory.

This data model is stored as “Data_Modeling_Relationship.txl”. On the tool bar, you can see the save menu.

Finally an image of the completed data model “Data_Modeling_Relationship” looks like this.

Employee_Projects_Datamodel

 

Non – Identifying Relationship

What is Non-Identifying relationship?

If a column (foreign key) / multiple columns (foreign key columns) referencing some table becomes a column in a table as a NON-PRIMARY KEY, and then it is non-identifying relationship.

How to create non-identifying relationship with cardinality of 1 to many relationship in Employee Table?

Example 1: Employee Table with Department Table:

Since the relationship is one to many with Department_No (Department Table) and Department_No (Employee table), we go with Non-Identifying Relationship. In Non-Identifying relationship, primary key migrates from the Department table to Employee table and it is not a part of the primary key in Employee table. Here Department_No becomes the foreign key column and generally Foreign Key relationship is represented by a dotted line.

  • Click Non-Identifying relationship in the toolbar.
  • Click first on Department table; you can see a dotted line and then click on Employee table. Now you can see a relationship(dotted line) created between the Department Table and Employee Table.
  • Double Click the relationship. Change the relationship name as required (EMPLOYEE_FK01).
  • Under section CARDINALITY, check Mandatory Parent and Mandatory Child, since NULL values are not allowed in both the primary key and foreign key columns. Now the cardinality is 1..1 to 1..n.
  • Cardinality describes the relationship between a parent table and a child table. When you edit properties of the relationship line, you will be requested check(enable) the check boxes for mandatory parent column, and mandatory child. We have explained various permutations and combinations below.

Parent TableChild TableMandatory ParentMandatory ChildCardinality Relationship
Department.Department_NoEmployee.Department_NoYesYes1..1 to 1..n
Department.Department_NoEmployee.Department_NoYesNo1..1 to 0..n
Department.Department_NoEmployee.Department_NoNoYes0..1 to 1..n
Department.Department_NoEmployee.Department_NoNoNo0..1 to 0..n

  • If you want to change the default foreign key index name, then change the name (EMPLOYEE_IDX01) in “Index to Foreign Key” tab. You have to double click the relationship line and there is a tab named “Index to Foreign Key”.
  • Click apply and ok. You can see how tables are connected and a new column Department_No is included in the Employee table.

Example 2: Employee Table with Title Table:

Repeat the above same procedure and you can see how Title table and Employee table are connected. You can see a new column Title_Code is included in the Employee Table.

How to create SELF REFERENTIAL OR RECURSIVE RELATIONSHIP in Employee Table:

When a Foreign key references the parent key(Primary key) of the same table, then it is called a SELF REFERENTIAL OR RECURSIVE RELATIONSHIP. The column Manager_No in the Employee table is a Foreign key which takes its values from the column Employee_No of the same Employee table.

This relationship is created by non-identifying relationship with cardinality 1 to many relationship in Employee Table:

  • Click Non-Identifying relationship in the tool bar.
  • Click the employee table twice. You can see the relationship created and a new column “FK_EMPLOYEE_NAME” appearing within the table. Go to the properties of that column and change the name to “MANAGER_NO”. So this column ‘Manager_No” will reference the values from its primary key column ‘Employee_No”.

 

1 2 3 6