Data Warehouse Schemas
Warehousing
and Mining
Nishitha K C
Lecturer
Dept. of BCA
Table of Contents
• Data Warehouse Schemas
Star Schema
Snowflake Schema
Fact Constellation Schema
Data Warehouse Schemas
• A schema is a logical description of the entire database.
• It includes the name and description of every record type,
including all associated data items and aggregates.
• Much like a database, a data warehouse also needs to
maintain a schema.
• A database uses the relational model, while a data warehouse
uses different schemas depending on its setup and the data
it maintains.
• The three major types of data warehouse schemas are:
i. Star Schema
ii. Snowflake Schema
iii. Fact Constellation Schema
Star Schema
• In a star schema, one fact table sits in the middle, surrounded
by a number of associated dimension tables.
• This structure resembles a star and hence it is known as a
star schema.
• Each dimension in a star schema is represented by only
one dimension table.
• This dimension table contains a set of attributes.
• The fact table holds the primary information in the
data warehouse.
• It is surrounded by the smaller dimension lookup tables, which
hold the details for the entries in the fact table.
• The primary key of each dimension table is related to a
foreign key in the fact table.
• The following diagram shows the sales data of a company
with respect to the four dimensions, namely time, item,
branch, and location.
• There is a fact table at the center.
• It contains the keys to each of the four dimensions.
• The fact table also contains the measures, namely dollars
sold and units sold.
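• The sales star schema described above can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the table names (sales_fact, time_dim, and so on), the dimension attributes year, branch_name, and city, and all sample rows are assumptions made for the example, not taken from the diagram.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Four dimension tables, each with its own primary key (names assumed).
CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE item_dim     (item_key     INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (branch_key   INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table: one foreign key per dimension plus the two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

-- Made-up sample rows, for illustration only.
INSERT INTO time_dim     VALUES (1, 2023), (2, 2024);
INSERT INTO item_dim     VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO branch_dim   VALUES (1, 'Branch 1');
INSERT INTO location_dim VALUES (1, 'City A');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 1000.0, 2),
    (2, 1, 1, 1, 1500.0, 3),
    (2, 2, 1, 1,  400.0, 1);
""")

# A typical star query: one join from the fact table to a dimension table.
star_rows = conn.execute("""
    SELECT i.item_name, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i ON i.item_key = f.item_key
    GROUP BY i.item_name
    ORDER BY i.item_name
""").fetchall()
print(star_rows)
```

• Every dimension is reached from the fact table with a single join, which is what gives the schema its star shape.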
Snowflake Schema
• The snowflake schema acts like an extended version of the star
schema: additional dimension tables are added to the star
schema.
• This schema is known as snowflake due to its structure.
• In this schema, the centralized fact table is connected to
multiple dimensions.
• The dimensions are in normalized form, each one spread over
multiple related tables.
• The snowflake structure is more detailed and more structured
than the star schema.
• The difference between star and snowflake schema is that
the dimensions of snowflake schema are maintained in such
a way that they reduce the redundancy of data.
• The tables are easy to manage and maintain.
• The dimension tables are divided into separate
normalized tables.
• Once separated, they are joined back to the original
dimension table through a referential constraint.
• This schema may hamper query performance, because more
tables are involved and more joins are needed to answer a
query.
• Unlike the star schema, the dimension tables in a snowflake schema are
normalized.
• For example, the item dimension table of the star schema is normalized
and split into two dimension tables, namely the item and supplier tables.
• Now the item dimension table contains the attributes
item_key, item_name, type, brand, and supplier_key.
• The supplier key is linked to the supplier dimension table.
The supplier dimension table contains the attributes
supplier_key and supplier_type.
• Due to normalization in the snowflake schema, the
redundancy is reduced; therefore, it becomes easier to
maintain and saves storage space.
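• The item and supplier split described above can be sketched as follows, again with Python's sqlite3 module. The column names come from the text; the table names, the reduced fact table (only item_key and the two measures), and all sample rows are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supplier details live in their own normalized table.
CREATE TABLE supplier_dim (
    supplier_key  INTEGER PRIMARY KEY,
    supplier_type TEXT
);

-- The item dimension keeps only a supplier_key pointing at supplier_dim.
CREATE TABLE item_dim (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    type         TEXT,
    brand        TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);

-- Fact table reduced to one dimension key for brevity (an assumption).
CREATE TABLE sales_fact (
    item_key     INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

-- Made-up sample rows, for illustration only.
INSERT INTO supplier_dim VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item_dim VALUES
    (1, 'Laptop', 'electronics', 'BrandA', 1),
    (2, 'Phone',  'electronics', 'BrandB', 1),
    (3, 'Desk',   'furniture',   'BrandC', 2);
INSERT INTO sales_fact VALUES (1, 1000.0, 2), (2, 400.0, 1), (3, 250.0, 1);
""")

# Reaching supplier_type now takes two joins: fact -> item -> supplier.
snowflake_rows = conn.execute("""
    SELECT s.supplier_type, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN item_dim     AS i ON i.item_key = f.item_key
    JOIN supplier_dim AS s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
print(snowflake_rows)
```

• Each supplier_type value is stored once per supplier instead of once per item, which is the reduced redundancy; the price is the extra join in the query.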
Fact Constellation Schema
• A fact constellation consists of multiple fact tables.
• Two or more fact tables share the same dimension tables.
• The fact constellation schema is also known as the galaxy schema.
• It is viewed as a collection of stars and hence the name galaxy.
• The shared dimensions in this schema are known as
conformed dimensions.
• The dimensions in this schema are split into separate
dimensions with different levels of hierarchy.
• These dimensions are large and are built on the basis of
hierarchy.
• This schema is useful when aggregation of fact tables is
necessary.
• Fact constellations are considered to be more complex than
star and snowflake schemas.
• These are considered to be more flexible but hard to
implement and maintain.
• The following diagram shows two fact tables, namely sales
and shipping.
• The sales fact table is the same as that in the star schema.
• The shipping fact table has five dimensions, namely
item_key, time_key, shipper_key, from_location, to_location.
• The shipping fact table also contains two measures, namely
dollars sold and units sold.
• It is also possible to share dimension tables between fact
tables.
• For example, the time, item, and location dimension tables are
shared between the sales and shipping fact tables.
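• A minimal sketch of the sales and shipping constellation, using Python's sqlite3 module. The keys and measures of both fact tables follow the text; the table names, the dimension attributes, shipper_dim, and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions shared by both fact tables.
CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE item_dim     (item_key     INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
-- Dimensions used by only one fact table.
CREATE TABLE branch_dim   (branch_key   INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE shipper_dim  (shipper_key  INTEGER PRIMARY KEY, shipper_name TEXT);

CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

-- from_location and to_location both point at the shared location_dim.
CREATE TABLE shipping_fact (
    item_key      INTEGER REFERENCES item_dim(item_key),
    time_key      INTEGER REFERENCES time_dim(time_key),
    shipper_key   INTEGER REFERENCES shipper_dim(shipper_key),
    from_location INTEGER REFERENCES location_dim(location_key),
    to_location   INTEGER REFERENCES location_dim(location_key),
    dollars_sold  REAL,
    units_sold    INTEGER
);

-- Made-up sample rows, for illustration only.
INSERT INTO time_dim     VALUES (1, 2024);
INSERT INTO item_dim     VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO location_dim VALUES (1, 'City A'), (2, 'City B');
INSERT INTO branch_dim   VALUES (1, 'Branch 1');
INSERT INTO shipper_dim  VALUES (1, 'Shipper 1');
INSERT INTO sales_fact    VALUES (1, 1, 1, 1, 1000.0, 2), (1, 2, 1, 1, 400.0, 1);
INSERT INTO shipping_fact VALUES (1, 1, 1, 1, 2, 1000.0, 2);
""")

# Each fact table is aggregated separately and lined up through the
# shared item dimension, so rows from one fact table do not multiply
# rows from the other.
galaxy_rows = conn.execute("""
    SELECT i.item_name,
           (SELECT COALESCE(SUM(s.units_sold), 0)
              FROM sales_fact AS s WHERE s.item_key = i.item_key),
           (SELECT COALESCE(SUM(sh.units_sold), 0)
              FROM shipping_fact AS sh WHERE sh.item_key = i.item_key)
    FROM item_dim AS i
    ORDER BY i.item_name
""").fetchall()
print(galaxy_rows)
```

• Because item_dim is shared, figures from the sales and shipping fact tables can be compared item by item without duplicating the item details in either fact table.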
Thank
You