Data Warehouse Structure
What is a Multidimensional Data Model?
A multidimensional model views data in the form of a data cube. A data cube
enables data to be modeled and viewed in multiple dimensions. It is defined
by dimensions and facts.
Consider the data of a shop for items sold per quarter in the city of Delhi. The
data is shown in the table. In this 2D representation, the sales for Delhi are
shown for the time dimension (organized in quarters) and the item dimension
(classified according to the type of item sold). The fact, or measure,
displayed is rupee_sold (in thousands).
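As a minimal sketch of this 2D view, the snippet below arranges sales records by quarter (time dimension) and item type (item dimension). The records and the name `table` are hypothetical; the figures are invented for illustration and are not the values from the table referred to above.

```python
from collections import defaultdict

# Hypothetical sales records for Delhi: (quarter, item_type, rupee_sold in thousands).
# These figures are made up for illustration only.
records = [
    ("Q1", "keyboard", 80),
    ("Q1", "mouse", 40),
    ("Q2", "keyboard", 95),
    ("Q2", "mouse", 55),
    ("Q1", "keyboard", 20),  # a second keyboard sale in Q1
]

# Build the 2D view: rows are quarters, columns are item types,
# and each cell holds the summed measure rupee_sold.
table = defaultdict(lambda: defaultdict(int))
for quarter, item_type, rupee_sold in records:
    table[quarter][item_type] += rupee_sold

for quarter in sorted(table):
    print(quarter, dict(table[quarter]))
```

Each cell of the resulting grid is one value of the measure for one combination of the two dimensions.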
What is a Data Cube?
A data cube is a multidimensional matrix in which data is grouped or
combined. The data cube method goes by a few alternative names or closely
related terms, such as "multidimensional databases," "materialized views,"
and "OLAP (Online Analytical Processing)."
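One way to picture a data cube is as the set of totals for every combination of dimensions. The sketch below is an illustration under assumptions, not a description of how any particular OLAP engine works: the dimension names, the sample facts, and the helper `build_cube` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "quarter", "item")

# Hypothetical facts: (city, quarter, item, rupee_sold in thousands).
facts = [
    ("Delhi", "Q1", "keyboard", 100),
    ("Delhi", "Q2", "keyboard", 95),
    ("Delhi", "Q1", "mouse", 40),
    ("Mumbai", "Q1", "keyboard", 70),
]

def build_cube(rows):
    """Aggregate the measure for every subset of the dimensions."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)
                totals[key] += row[-1]  # last field is the measure
            names = tuple(DIMENSIONS[i] for i in dims)
            cube[names] = dict(totals)
    return cube

cube = build_cube(facts)
print(cube[("city",)])            # totals per city
print(cube[("quarter", "item")])  # totals per quarter and item
print(cube[()])                   # grand total over all dimensions
```

With three dimensions there are eight such groupings, from the grand total up to the full detail level.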
Fact Tables
A fact table may contain either detail-level facts or facts that have been
aggregated (fact tables that contain aggregated facts are often called
summary tables instead). A fact table generally contains facts at the same
level of aggregation.
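The difference between a detail-level fact table and a summary table can be sketched with Python's built-in `sqlite3` module. The table names `sales_fact` and `sales_by_quarter`, the columns, and the rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail-level fact table: one row per individual sale (hypothetical data).
CREATE TABLE sales_fact (day TEXT, quarter TEXT, item_key INTEGER, rupee_sold INTEGER);
INSERT INTO sales_fact VALUES
  ('2024-01-05', 'Q1', 1, 30),
  ('2024-02-11', 'Q1', 1, 50),
  ('2024-04-02', 'Q2', 1, 45),
  ('2024-01-20', 'Q1', 2, 40);

-- Summary table: the same measure aggregated to the quarter level.
CREATE TABLE sales_by_quarter AS
  SELECT quarter, item_key, SUM(rupee_sold) AS rupee_sold
  FROM sales_fact
  GROUP BY quarter, item_key;
""")

summary = conn.execute(
    "SELECT quarter, item_key, rupee_sold FROM sales_by_quarter "
    "ORDER BY quarter, item_key"
).fetchall()
print(summary)
```

Every row of the summary table sits at the same level of aggregation (quarter by item), which matches the point made above.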
Dimension Tables
A dimension is a structure usually composed of one or more hierarchies
that categorize data. If a dimension has no hierarchies or levels, it is
called a flat dimension or list. The primary key of each dimension table
is part of the composite primary key of the fact table. Dimensional
attributes help to describe the dimensional value. They are generally descriptive,
textual values. Dimension tables are usually smaller than the fact table.
Fact tables store data about sales, while dimension tables store data about
the geographic regions (markets, cities), clients, products, times, and channels.
The TIME table has a column for each of day, month, quarter, and year. The ITEM
table has columns for item_key, item_name, brand, type, and supplier_type.
The BRANCH table has columns for branch_key, branch_name, and
branch_type. The LOCATION table has columns of geographic data, including
street, city, state, and country.
In this scenario, the SALES table contains only four columns with IDs from the
dimension tables, TIME, ITEM, BRANCH, and LOCATION, instead of four
columns for time data, four columns for ITEM data, three columns for
BRANCH data, and four columns for LOCATION data. Thus, the size of the fact
table is significantly reduced. When we need to change an item, we need only
make a single change in the dimension table, instead of making many
changes in the fact table.
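The layout described above can be sketched with Python's built-in `sqlite3` module. The column names follow the text where it gives them; the `_dim` and `_fact` suffixes, the `time_key` and `location_key` columns, the `rupee_sold` measure column, and all row values are assumptions added for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY,
                       day INTEGER, month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY,
                       item_name TEXT, brand TEXT, type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY,
                         branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY,
                           street TEXT, city TEXT, state TEXT, country TEXT);

-- The fact table holds only the dimension keys plus the measure.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    branch_key INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    rupee_sold INTEGER,
    PRIMARY KEY (time_key, item_key, branch_key, location_key)
);

-- Hypothetical rows; the item name is deliberately misspelled.
INSERT INTO time_dim VALUES (1, 5, 1, 'Q1', 2024), (2, 2, 4, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'Keybord', 'Acme', 'peripheral', 'wholesale');
INSERT INTO branch_dim VALUES (1, 'Central', 'retail');
INSERT INTO location_dim VALUES (1, 'MG Road', 'Delhi', 'Delhi', 'India');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 80), (2, 1, 1, 1, 95);
""")

# Correct the item name once, in the dimension table only.
conn.execute("UPDATE item_dim SET item_name = 'Keyboard' WHERE item_key = 1")

rows = conn.execute("""
    SELECT t.quarter, i.item_name, l.city, SUM(s.rupee_sold)
    FROM sales_fact AS s
    JOIN time_dim AS t ON t.time_key = s.time_key
    JOIN item_dim AS i ON i.item_key = s.item_key
    JOIN location_dim AS l ON l.location_key = s.location_key
    GROUP BY t.quarter, i.item_name, l.city
    ORDER BY t.quarter
""").fetchall()
print(rows)
```

A single `UPDATE` on the dimension table is enough for every fact row to report the corrected name, because the fact table stores only the key.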
The snowflake schema is an expansion of the star schema in which each point of
the star explodes into more points. It is called a snowflake schema because
its diagram resembles a snowflake. Snowflaking is a
method of normalizing the dimension tables in a star schema. When we
normalize all the dimension tables entirely, the resultant structure resembles a
snowflake with the fact table in the middle.
The snowflake schema consists of one fact table which is linked to many
dimension tables, which can be linked to other dimension tables through a
many-to-one relationship. Tables in a snowflake schema are generally
normalized to the third normal form. Each dimension table represents exactly
one level in a hierarchy.
Example: The figure shows a snowflake schema with a Sales fact table and Store,
Location, Time, Product, Line, and Family dimension tables. The Market
dimension has two dimension tables, with Store as the primary dimension
table and Location as the outrigger dimension table. The Product dimension
has three dimension tables, with Product as the primary dimension table and
the Line and Family tables as the outrigger dimension tables.
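The Product dimension from this example can be sketched as a chain of tables using Python's built-in `sqlite3` module. Only the table names Sales, Product, Line, and Family come from the text; the column names, keys, and rows are assumptions, and the Store, Location, and Time tables are left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each table represents one level of the product hierarchy.
CREATE TABLE family (family_key INTEGER PRIMARY KEY, family_name TEXT);
CREATE TABLE line (line_key INTEGER PRIMARY KEY, line_name TEXT,
                   family_key INTEGER REFERENCES family(family_key));
CREATE TABLE product (product_key INTEGER PRIMARY KEY, product_name TEXT,
                      line_key INTEGER REFERENCES line(line_key));
CREATE TABLE sales (product_key INTEGER REFERENCES product(product_key),
                    amount INTEGER);

-- Hypothetical rows for illustration.
INSERT INTO family VALUES (1, 'Electronics');
INSERT INTO line VALUES (1, 'Peripherals', 1), (2, 'Displays', 1);
INSERT INTO product VALUES (1, 'Keyboard', 1), (2, 'Mouse', 1), (3, 'Monitor', 2);
INSERT INTO sales VALUES (1, 80), (2, 40), (3, 200), (1, 20);
""")

# Reaching the Family level means joining through Product and Line.
by_line = conn.execute("""
    SELECT f.family_name, l.line_name, SUM(s.amount)
    FROM sales AS s
    JOIN product AS p ON p.product_key = s.product_key
    JOIN line AS l ON l.line_key = p.line_key
    JOIN family AS f ON f.family_key = l.family_key
    GROUP BY f.family_name, l.line_name
    ORDER BY l.line_name
""").fetchall()
print(by_line)
```

The many-to-one links (many products per line, many lines per family) are what turn a single point of the star into a branch of the snowflake.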
A star schema stores all attributes for a dimension in one denormalized table.
This requires more disk space than a more normalized snowflake schema.
Snowflaking normalizes the dimension by moving attributes with low
cardinality into separate dimension tables that relate to the core dimension
table by using foreign keys. Snowflaking for the sole purpose of minimizing
disk space is not recommended, because it can adversely impact query
performance.
In a snowflake schema, tables are normalized to remove redundancy: the
dimension tables are broken down into multiple dimension tables.
The star schema for sales, as shown above, contains only five tables, whereas
the normalized version extends to eleven tables. Notice that in the
snowflake schema, the attributes with low cardinality in each original
dimension table are removed to form separate tables. These new tables are
connected back to the original dimension table through artificial keys.
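The step of moving a low-cardinality attribute into its own table can be sketched in plain Python. The helper `snowflake`, the sample item rows, and the generated `brand_key` column are hypothetical names used only for this illustration.

```python
# Denormalized (star-style) item dimension: brand is repeated on every row.
item_dim = [
    {"item_key": 1, "item_name": "Keyboard", "brand": "Acme"},
    {"item_key": 2, "item_name": "Mouse", "brand": "Acme"},
    {"item_key": 3, "item_name": "Monitor", "brand": "Globex"},
]

def snowflake(rows, attribute):
    """Move one low-cardinality attribute into its own table with artificial keys."""
    key_name = f"{attribute}_key"
    lookup = {}  # attribute value -> artificial key
    core = []
    for row in rows:
        value = row[attribute]
        # Reuse the key if the value was seen before, otherwise assign the next one.
        key = lookup.setdefault(value, len(lookup) + 1)
        slim = {k: v for k, v in row.items() if k != attribute}
        slim[key_name] = key
        core.append(slim)
    outrigger = [{key_name: k, attribute: v} for v, k in lookup.items()]
    return core, outrigger

item_core, brand_dim = snowflake(item_dim, "brand")
print(item_core)
print(brand_dim)
```

The core table now stores a small integer key in place of the repeated brand text, and the new brand table holds each distinct value once.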
A snowflake schema is designed for flexible querying across more complex
dimensions and relationships. It is suitable for many-to-many and one-to-many
relationships between dimension levels.