M 1.4 Multidimensional Data Model
OLAP Server Architectures
From Tables and Spreadsheets to Data Cubes
Data Cube
• In data warehousing literature, an n-D base
cube is called a base cuboid.
• The topmost 0-D cuboid, which holds the
highest level of summarization, is called the
apex cuboid.
• The lattice of cuboids forms a data cube.
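[Figure: lattice of cuboids for the dimensions time, item, location, supplier. The 0-D (apex) cuboid "all" is at the top; the 3-D cuboids are (time, item, location), (time, item, supplier), (time, location, supplier), and (item, location, supplier).]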
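The lattice can be enumerated directly. The sketch below assumes four dimensions (time, item, location, supplier) and no concept hierarchies within a dimension, so every subset of the dimensions is one cuboid and n dimensions give 2^n cuboids. The function name cuboid_lattice is made up for this illustration.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Group every subset of the dimensions by its size (0-D up to n-D)."""
    lattice = {}
    for k in range(len(dimensions) + 1):
        lattice[k] = [tuple(c) for c in combinations(dimensions, k)]
    return lattice

# Assumed dimension names for the illustration
dims = ["time", "item", "location", "supplier"]
lattice = cuboid_lattice(dims)

for k, cuboids in lattice.items():
    for cuboid in cuboids:
        # The empty group-by is the apex cuboid ("all")
        label = ", ".join(cuboid) if cuboid else "all (apex)"
        print(f"{k}-D: {label}")

# Without hierarchies, the number of cuboids is 2 ** n
total = sum(len(c) for c in lattice.values())
```

The single 4-D entry is the base cuboid and the single 0-D entry is the apex cuboid.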
Conceptual Modeling of Data Warehouses
Different forms of the multidimensional data model are:
⮚ star schema
⮚ snowflake schema
⮚ fact constellation schema
[Figure: schema diagram. Fact table columns: branch_key, location_key, units_sold, dollars_sold, avg_sales (measures). Dimension tables: branch (branch_key, branch_name, branch_type), location (location_key, street, city_key), city (city_key, city, state_or_province, country).]
Advantage of Snowflake Schema
1. The primary advantage of the snowflake schema is
the improvement in query performance due to
minimized disk storage requirements and joins
against smaller lookup tables.
2. It provides greater scalability in the interrelationship
between dimension levels and components.
3. Normalization removes redundancy, so the dimension
tables are easier to maintain.
Disadvantage of Snowflake Schema
1. The primary disadvantage of the snowflake schema is
the additional maintenance effort required by the
increasing number of lookup tables.
2. Queries are more complex and hence more difficult
to understand.
3. More tables mean more joins, and therefore longer
query execution time.
* Data Mining: Concepts and Techniques
FACT CONSTELLATION
• Sophisticated applications may require
multiple fact tables to share dimension
tables.
• A fact constellation can be viewed as a
collection of stars, and hence is also called
a galaxy schema.
[Figure: schema diagram. Sales Fact Table columns: time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales (measures). Dimension tables: time (time_key, day, day_of_the_week, month, quarter, year), item (item_key, item_name, brand, type, supplier_key), supplier (supplier_key, supplier_type), branch (branch_key, branch_name, branch_type), location (location_key, street, city_key), city (city_key, city, state_or_province, country).]
SNOWFLAKE Schema Definition in DMQL
• Distributive: An aggregate function is distributive if it can be computed in
a distributed manner as follows. Suppose the data are partitioned
into n sets.
• We apply the function to each partition, resulting in n aggregate values.
• If the result derived by applying the function to the n aggregate values is
the same as that derived by applying the function to the entire data set
(without partitioning), the function can be computed in a distributed
manner.
• For example, sum() can be computed for a data cube by first partitioning
the cube into a set of subcubes, computing sum() for each subcube, and
then summing up the sums obtained for each subcube.
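The definition above can be checked on a small example. This is a minimal sketch with made-up measure values and a hypothetical partition helper. It compares the result of combining per-partition aggregates with the aggregate over the whole data set for sum(), count(), and max(). Note that the per-partition counts are combined by adding them.

```python
def partition(values, n):
    """Split values into at most n consecutive chunks of near-equal size."""
    size = -(-len(values) // n)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

# Hypothetical measure values (e.g. units_sold per base cell)
units_sold = [12, 7, 30, 5, 18, 9, 21, 4]
parts = partition(units_sold, 3)

# sum(): sum of the partial sums equals the sum over all data
partial_sums = [sum(p) for p in parts]
sum_is_distributive = sum(partial_sums) == sum(units_sold)

# count(): partial counts are combined by adding them
partial_counts = [len(p) for p in parts]
count_is_distributive = sum(partial_counts) == len(units_sold)

# max(): max of the partial maxima equals the max over all data
partial_maxima = [max(p) for p in parts]
max_is_distributive = max(partial_maxima) == max(units_sold)

print(partial_sums, sum_is_distributive)
print(partial_counts, count_is_distributive)
print(partial_maxima, max_is_distributive)
```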
Typical OLAP Operations
• Roll up (drill-up): summarize data
– by climbing up hierarchy or by dimension reduction
• Drill down (roll down): reverse of roll-up
– from higher level summary to lower level summary or detailed
data, or introducing new dimensions
• Slice and dice: project and select
• Pivot (rotate):
– reorient the cube, visualization, 3D to series of 2D planes
• Other operations
– drill across: involving (across) more than one fact table
– drill through: through the bottom level of the cube to its back-
end relational tables (using SQL)
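Roll-up and drill-down can be sketched as re-aggregation of the same base data at different levels. The example below is only an illustration: the rows, the column names, and the aggregate helper are assumptions, and the measure is summed. Rolling up either climbs a hierarchy (month to quarter) or drops a dimension; drilling down returns to the finer grouping, which needs the detailed data.

```python
from collections import defaultdict

# Hypothetical base data: (month, quarter, city, country, units_sold)
COLUMNS = ("month", "quarter", "city", "country", "units_sold")
sales = [
    ("Jan", "Q1", "Toronto", "Canada", 10),
    ("Feb", "Q1", "Toronto", "Canada", 15),
    ("Apr", "Q2", "Toronto", "Canada", 20),
    ("Jan", "Q1", "Chicago", "USA", 30),
    ("May", "Q2", "Chicago", "USA", 25),
]

def aggregate(rows, group_by):
    """Sum units_sold for each combination of the group_by columns."""
    idx = [COLUMNS.index(c) for c in group_by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

# Detailed view: month x city
by_month_city = aggregate(sales, ["month", "city"])

# Roll-up by climbing the time hierarchy: month -> quarter
by_quarter_city = aggregate(sales, ["quarter", "city"])

# Roll-up by dimension reduction: the time dimension is removed
by_city = aggregate(sales, ["city"])

# Drill-down is the reverse step: from by_quarter_city back to
# by_month_city, recomputed from the detailed rows.
print(by_quarter_city)
print(by_city)
```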
Multidimensional Data
■ Sales volume as a function of product, month,
and region
Dimensions: Product, Location, Time
Hierarchical summarization paths
[Figure: summarization hierarchies; labels shown include Industry, Region, Year, Office, Day, Month.]
Roll-up
Drill-Down
• Initially the concept hierarchy was "day < month < quarter < year."
• Drill-down navigates from less detailed data to more detailed data.
Slice
Dice
• Dice performs a selection on two or more
dimensions of a given cube and provides a
new sub-cube.
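Slice and dice can both be sketched as a selection over cube cells. In this illustration the cube is a plain dictionary keyed by (quarter, city, item), the values are made up, and the select helper is hypothetical. A slice fixes a single dimension, while a dice restricts two or more dimensions at once.

```python
# Hypothetical cube cells: (quarter, city, item) -> units_sold
DIMS = ("quarter", "city", "item")
cube = {
    ("Q1", "Toronto", "phone"): 10,
    ("Q1", "Toronto", "computer"): 5,
    ("Q1", "Chicago", "phone"): 8,
    ("Q2", "Toronto", "phone"): 12,
    ("Q2", "Chicago", "computer"): 7,
    ("Q3", "Chicago", "phone"): 9,
}

def select(cells, **allowed):
    """Keep cells whose value on each named dimension is in the allowed list."""
    keep = {DIMS.index(d): set(vals) for d, vals in allowed.items()}
    return {
        key: value
        for key, value in cells.items()
        if all(key[i] in vals for i, vals in keep.items())
    }

# Slice: selection on one dimension (quarter = Q1)
slice_q1 = select(cube, quarter=["Q1"])

# Dice: selection on two dimensions (quarter in {Q1, Q2} and city = Toronto)
dice = select(cube, quarter=["Q1", "Q2"], city=["Toronto"])

print(slice_q1)
print(dice)
```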
Pivot
• The pivot operation is also known as
rotation.
• It rotates the data axes in view in
order to provide an alternative
presentation of data.
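For a 2-D view, rotation amounts to swapping the row and column axes. The sketch below uses a made-up city-by-item table and a hypothetical pivot helper; the values do not change, only where they are displayed.

```python
# Hypothetical 2-D view: rows are cities, columns are items
rows = ["Toronto", "Chicago"]
cols = ["phone", "computer"]
view = [
    [22, 5],  # Toronto
    [17, 7],  # Chicago
]

def pivot(row_labels, col_labels, table):
    """Swap the row and column axes of a 2-D view (a transpose)."""
    rotated = [list(column) for column in zip(*table)]
    return col_labels, row_labels, rotated

p_rows, p_cols, p_view = pivot(rows, cols, view)

# After the pivot, rows are items and columns are cities
for label, values in zip(p_rows, p_view):
    print(label, dict(zip(p_cols, values)))
```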
Summary
■ Data warehousing: A multi-dimensional model of a data warehouse
■ A data cube consists of dimensions & measures
■ Star schema, snowflake schema, fact constellations
A Starnet Query Model
• The querying of multidimensional databases
can be based on a starnet model.
• A starnet model consists of radial lines
emanating from a central point.
• Each line represents a concept hierarchy for a
dimension.
• Each abstraction level in the hierarchy is called
a footprint.