Designing Star Schema

Star Schema: General Information

In general, an organization is started to earn money by selling a product or by providing service to the product. An organization may be at one place or may have several branches.

When we consider an example of an organization selling products throughout the world, the main four major dimensions are product, location, time and organization. Dimension tables have been explained in detail under the section Dimensions. With this example, we will try to provide detailed explanation about STAR SCHEMA.

What is Star Schema?

Star Schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables. It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table and it points towards the dimension tables. The advantage of star schema are slicing down, performance increase and easy understanding of data.

Steps in designing Star Schema:

  • Identify a business process for analysis(like sales).
  • Identify measures or facts (sales dollar).
  • Identify dimensions for facts(product dimension, location dimension, time dimension, organization dimension).
  • List the columns that describe each dimension.(region name, branch name, region name).
  • Determine the lowest level of summary in a fact table(sales dollar).

Important aspects of Star Schema & Snow Flake Schema:

  • In a star schema every dimension will have a primary key.
  • In a star schema, a dimension table will not have any parent table.
  • Whereas in a snow flake schema, a dimension table will have one or more parent tables.
  • Hierarchies for the dimensions are stored in the dimensional table itself in star schema.
  • Whereas hierarchies are broken into separate tables in snow flake schema. These hierarchies helps to drill down the data from topmost hierarchies to the lowermost hierarchies.

Glossary:

Hierarchy: A logical structure that uses ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation; for example, in a time dimension, a hierarchy might be used to aggregate data from the Month level to the Quarter level, from the Quarter level to the Year level. A hierarchy can also be used to define a navigational drill path, regardless of whether the levels in the hierarchy represent aggregated totals or not.

Level: A position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the Month, Quarter, and Year levels.

Fact Table: A table in a star schema that contains facts and connected to dimensions. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.

A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables). A fact table usually contains facts with the same level of aggregation.

Example of Star Schema:

Star Schema Diagram

In the example sales fact table is connected to dimensions location, product, time and organization. It shows that data can be sliced across all dimensions and again it is possible for the data to be aggregated across multiple dimensions. “Sales Dollar” in sales fact table can be calculated across all dimensions independently or in a combined manner which is explained below.

  • Sales Dollar value for a particular product.
  • Sales Dollar value for a product in a location.
  • Sales Dollar value for a product in a year within a location.
  • Sales Dollar value for a product in a year within a location sold or serviced by an employee

 

Data Warehouse frequently asked interview Questions and Answers

1. What is a data warehouse?

A data warehouse is a collection of integrated data from one or more sources, used for data analysis and reporting. Several years of data is stored in data warehouse. Data is static and not a transactional.

2. What is a data mart?

A data mart is a subset of data warehouse. Data mart gives a clear understanding of the small portion of a data warehouse. For viewing, analyzing, reporting, and documentation, data mart will be better.

3. What is the difference between a data warehouse and data mart?

Data Warehouse comprises of all subject areas, where data mart is focused on a specific subject area.

4. What is a Dimensional Data Model?

Dimensional Data Model contains one or more dimension tables and fact tables and is used for calculating the summarized data. Dimensional Data Model is used in data warehouse and data marts.

5. What is a dimension?

Dimension table is also called as lookup or reference table. The data (foreign keys) in the fact table refers to the data (primary key) in the dimension and is used for validation and calculation purpose.

6. What is a slowly changing dimension (SCD)? What are types of SCD?

Dimensions that change over time are called slowly changing dimensions. Type1, Type2 and Type3 are three types of slowly changing dimensions.

7. What is a star schema?

Star Schema is a database schema, which contains one or more dimensions and fact table representing multidimensional data. It is called as star schema because the relationship between the fact tables and dimensions looks like a star.

8. What is OLAP data modeling?

OLAP stands for ONLINE ANALYTICAL PROCESSING. The approach by which data models are constructed for analyzing data is called as OLAP data modeling. Example: Data Warehouse and Data Marts.

9. What is ETL?

ETL acronym stands for Extraction, Transformation and Loading. ETL is a process by which data stored in various sources are extracted, transformed and loaded into the Target Database.

10. What is a Fact table?

The centralized table in a star schema is called as Fact table. Fact table contains many columns referenced to dimension tables and standalone measure/fact columns. These facts or measure columns give useful and meaningful data based on some calculation.

11. What are the types of measure columns in a fact table?

Additive, Semi Additive and Non-Additive columns are three types of measure columns.

Additive means: Measures or facts that can be added across all columns.

Semi Additive means: Measures or fact that can be added across few columns.

Non Additive means: Measures that cannot be added across all dimensions.

12. What are the steps to create a Data Warehouse?

The various steps are: Analyzing the data from different sources, Data Modeling, creating databases, designing etl process, extracting data from various sources, transforming the data, loading the data into target data warehouse database/target Data Mart databases. From database, reports are generated as per the needs.