
M 1.4 Multidimensional Data Model

The document discusses multidimensional data models used in data warehousing, focusing on data cube structures and various schemas like star, snowflake, and fact constellation. It outlines the applications of data warehouses, including information processing, analytical processing, and data mining, and explains the architecture of OLAP servers. Additionally, it details the design of a star schema for a retail sales data warehouse, highlighting the fact and dimension tables along with their attributes.

Uploaded by

anaghamelayil1

A Multidimensional Data Model

* Data Mining: Concepts and Techniques 1


Data Warehouse Usage
■ Three kinds of data warehouse applications
■ Information processing
■ supports querying, basic statistical analysis, and reporting
using crosstabs, tables, charts and graphs
■ Analytical processing
■ multidimensional analysis of data warehouse data
■ supports basic OLAP operations, slice-dice, drilling, pivoting
■ Data mining
■ knowledge discovery from hidden patterns
■ supports associations, constructing analytical models,
performing classification and prediction, and presenting the
mining results using visualization tools

OLAP Server Architectures

■ Relational OLAP (ROLAP)


■ Use relational or extended-relational DBMS to store and manage
warehouse data and OLAP middleware
■ Include optimization of DBMS backend, implementation of
aggregation navigation logic, and additional tools and services
■ Greater scalability
■ Multidimensional OLAP (MOLAP)
■ Sparse array-based multidimensional storage engine

■ Fast indexing to pre-computed summarized data


■ Hybrid OLAP (HOLAP) (e.g., Microsoft SQL Server)
■ Flexibility, e.g., low level: relational, high level: array

■ Specialized SQL servers (e.g., Red Brick)


■ Specialized support for SQL queries over star/snowflake schemas

From Tables and Spreadsheets to Data Cubes

• A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
• A data cube allows data to be modeled and viewed in multiple dimensions.
– Dimension tables: a table is associated with each dimension, such as item (item_name, brand, type) or time (day, week, month, quarter, year).
– Fact table: contains measures (such as dollars_sold) and keys to each of the related dimension tables.

Data Cube
• In the data warehousing literature, an n-D base cube is called a base cuboid.
• The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid.
• The lattice of cuboids forms a data cube.


A 2-D view of sales data according to the dimensions time and item, where the sales are from branches located in the city of Vancouver. The measure displayed is dollars sold (in thousands).

A 3-D view of sales data, according to the dimensions time, item, and location. The measure displayed is dollars sold (in thousands).

A 4-D data cube representation of sales data, according to the dimensions time, item, location, and supplier. The measure displayed is dollars sold (in thousands). For improved readability, only some of the cube values are shown.


Cube: A Lattice of Cuboids

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)
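As a rough illustration of the lattice, the sketch below enumerates every cuboid for the four dimensions named on the slide. It assumes no dimension has a concept hierarchy, so each cuboid is simply a subset of the dimensions; the function name cuboids_by_level is made up for this example.

```python
from itertools import combinations

dimensions = ["time", "item", "location", "supplier"]

def cuboids_by_level(dims):
    """Return {k: [all k-D cuboids]}, one cuboid per subset of the dimensions."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboids_by_level(dimensions)
for k, cuboids in lattice.items():
    print(f"{k}-D cuboids:", cuboids)

# With n dimensions and no hierarchies there are 2**n cuboids in total.
total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

The empty tuple at level 0 plays the role of the apex cuboid ("all"), and the single tuple at level 4 is the base cuboid.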
Conceptual Modeling of Data Warehouses
The multidimensional data model can take different forms:
⮚ star schema
⮚ snowflake schema
⮚ fact constellation schema


Conceptual Modeling of Data Warehouses

• Modeling data warehouses: dimensions & measures
– Star schema: A fact table in the middle connected to a set of dimension tables
– Snowflake schema: A refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
– Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation
STAR SCHEMA
• In a star schema, the data warehouse contains a large central table, called the fact table, and a set of smaller attendant tables, called dimension tables, one for each dimension.
• The fact table contains the bulk of the data, with no redundancy.
• The star schema resembles a starburst, with the dimension tables displayed in a radial pattern around the central fact table.
Fact Tables
⮚ A fact table is the table in a star schema that contains the facts and is connected to the dimensions.
⮚ A fact table has two types of columns: those that hold facts and those that are foreign keys to the dimension tables.
⮚ The primary key of the fact table is generally a composite key made up of all of its foreign keys.


Dimension Tables

⮚ A dimension is a structure usually composed of one or more hierarchies that categorize data.
⮚ If a dimension has no hierarchies or levels, it is called a flat dimension or list.
⮚ The primary key of each dimension table is part of the composite primary key of the fact table.
⮚ Dimensional attributes help to define the dimensional value.


Characteristics of Star Schema

The star schema is well suited to data warehouse database design because of the following features:

⮚ It creates a de-normalized database that can quickly provide query responses.
⮚ It provides a flexible design that can be changed easily or added to throughout the development cycle, and as the database grows.
⮚ It reduces the complexity of metadata for both developers and end users.


Example of Star Schema

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, state_or_province, country
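The star schema above could be sketched in SQL roughly as follows, here run through Python's built-in sqlite3 module. The column types, the sample rows, and the choice to compute the average in the query (rather than store avg_sales as a column) are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time     (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                       month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
CREATE TABLE branch   (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                       state_or_province TEXT, country TEXT);
-- Fact table: one foreign key per dimension plus the measures.
CREATE TABLE sales (
    time_key     INTEGER REFERENCES time(time_key),
    item_key     INTEGER REFERENCES item(item_key),
    branch_key   INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (time_key, item_key, branch_key, location_key)
);
-- Hypothetical sample rows.
INSERT INTO time VALUES (1, 5, 'Mon', 1, 'Q1', 2005), (2, 9, 'Sat', 4, 'Q2', 2005);
INSERT INTO item VALUES (1, 'TV', 'Acme', 'home entertainment', 'wholesale'),
                        (2, 'laptop', 'Acme', 'computer', 'wholesale');
INSERT INTO branch VALUES (1, 'B1', 'retail');
INSERT INTO location VALUES (1, '1 Main St', 'Vancouver', 'British Columbia', 'Canada');
INSERT INTO sales VALUES (1, 1, 1, 1, 2, 800.0), (1, 2, 1, 1, 1, 1200.0),
                         (2, 1, 1, 1, 3, 1200.0);
""")

# A typical star join: the fact table joined to two of its dimension tables.
rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN time AS t ON t.time_key = s.time_key
    JOIN item AS i ON i.item_key = s.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()
print(rows)
```

Each dimension is reached with a single join from the fact table, which is the property the star layout is meant to provide.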
SNOWFLAKE SCHEMA
• The snowflake schema is a variant of the star schema.
• Some dimension tables are normalized, thereby further splitting the data into additional tables.
• The resulting schema graph forms a shape similar to a snowflake.
• In snowflake models, dimension tables are kept in normalized form to reduce redundancies.
• Normalized dimension tables are easy to maintain and save storage space.
⮚ The snowflake schema consists of one fact table linked to many dimension tables, which can in turn be linked to other dimension tables through many-to-one relationships.
⮚ Tables in a snowflake schema are generally normalized to the third normal form.
⮚ Each dimension table represents exactly one level in a hierarchy.
Example of Snowflake Schema

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
  supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
  city: city_key, city, state_or_province, country
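To make the effect of normalization concrete, the sketch below builds only the location branch of the snowflake (sales, location, city) in SQLite. Column types and sample rows are invented for illustration. Reaching the city name now takes two joins instead of the single join a star schema would need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city     (city_key INTEGER PRIMARY KEY, city TEXT,
                       state_or_province TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
-- Only the columns needed for this example are kept in the fact table.
CREATE TABLE sales    (location_key INTEGER REFERENCES location(location_key),
                       units_sold INTEGER, dollars_sold REAL);
-- Hypothetical sample rows.
INSERT INTO city VALUES (1, 'Vancouver', 'British Columbia', 'Canada'),
                        (2, 'Chicago', 'Illinois', 'USA');
INSERT INTO location VALUES (10, '1 Main St', 1), (11, '2 Oak Ave', 1), (12, '3 Lake Rd', 2);
INSERT INTO sales VALUES (10, 2, 100.0), (11, 1, 50.0), (12, 4, 400.0);
""")

# sales -> location -> city: the extra hop is the price of normalization.
by_city = conn.execute("""
    SELECT c.city, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN location AS l ON l.location_key = s.location_key
    JOIN city AS c ON c.city_key = l.city_key
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
print(by_city)
```

Each city's province and country are stored once in the city table, rather than once per street address as in the star version.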
Advantages of Snowflake Schema
1. The primary advantage of the snowflake schema is the improvement in query performance that can come from minimized disk storage requirements and from joining smaller lookup tables.
2. It provides greater scalability in the interrelationship between dimension levels and components.
3. Redundancy is reduced, so the dimension tables are easier to maintain.
Disadvantages of Snowflake Schema
1. The primary disadvantage of the snowflake schema is the additional maintenance effort required by the increasing number of lookup tables.
2. Queries are more complex and hence more difficult to understand.
3. More tables mean more joins, and so more query execution time.
FACT CONSTELLATION
• Sophisticated applications may require multiple fact tables to share dimension tables.
• This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact constellation.


Example of Fact Constellation

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Shipping Fact Table
  Keys: time_key, item_key, shipper_key, from_location, to_location
  Measures: dollars_cost, units_shipped

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_state, country
  shipper: shipper_key, shipper_name, location_key, shipper_type
Distinction between data warehouse and data mart
• A data warehouse collects information about subjects that span the entire organization; its scope is enterprise-wide.
• A data mart is a departmental subset of the data warehouse that focuses on selected subjects; its scope is department-wide.
• For data marts, the star or snowflake schema is commonly used.
• For data warehouses, the fact constellation schema is commonly used.


STAR SCHEMA Definition in DMQL

SNOWFLAKE Schema Definition in DMQL

Fact Constellation Schema Definition in DMQL


Consider a franchise of retail stores having its business setup only in India. The analysis requirements of the franchise include getting to know which items are purchased together by each individual consumer. They wish to know the sales figures, in terms of sales amount in Rupees as well as quantity, for the individual stores and also for the city, state and region in which they are located. They also wish to know how sales differ over different months, quarters and years; how sales figures change with the hour of the day (e.g., how sales in the morning hours differ from sales in the evening hours); how the buying habits of male consumers differ from those of female consumers; how the buying habits of married consumers differ from those of unmarried consumers; and how the buying habits of consumers vary with their native languages (e.g., Kannada, Telugu, Marathi, etc.).


Design a star schema for such a data warehouse, clearly identifying the fact table and dimension tables, their primary keys, and foreign keys. Also, mention which columns in the fact table represent dimensions and which ones represent measures or facts.

Step 1: Identify the business process to model in order to identify the fact table.
We are talking about sales here, so the fact table will be named 'Sales'. The facts or measures are:
1. Total_sales_amount
2. Total_sales_quantity


Step 2: Choose the dimensions for the fact table.
Dimensions are:
• Location(of stores),
• Date,
• Customer,
• Product,
• Time



Step 3: Choose the attributes of dimension tables.
Attributes of Location dimension:
• Location_id (primary key and surrogate key)
• city
• district
• state
• region (rural or urban)
Attributes of Date dimension:
• Date_id (primary key and surrogate key)
• day
• week
• month
• quarter
• year


Attributes of Customer dimension:
• Customer_id (primary key and surrogate key)
• name
• gender
• marital_status
• language
Attributes of Product dimension:
• Product_id (primary key and surrogate key)
• name
• type
• price
Attributes of Time dimension:
• Time_id (primary key and surrogate key)
• am_pm_indicator
The primary key of the fact table is a composite key consisting of the primary keys of all 5 dimensions. Each of these columns is also a foreign key to its dimension table.
PK of Sales is {Location_id, Date_id, Customer_id, Product_id, Time_id}
Step 4: Attribute hierarchy in the dimension tables.
Location: city -> district -> state
Date: day -> week -> month -> quarter -> year


Write one SQL statement that runs on your schema and returns
the number of purchases made during the evening hours by the
married customers and the unmarried customers in the month of
May 2005.
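One possible answer is sketched below using Python's built-in sqlite3 module. Several choices here are assumptions, not part of the exercise: the Date and Time dimensions are named date_dim and time_dim; the Location and Product tables are omitted because the query does not touch them; am_pm_indicator = 'PM' stands in for "evening hours", since that is the only time attribute the schema defines; each fact row counts as one purchase; and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, gender TEXT,
                       marital_status TEXT, language TEXT);
CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, day INTEGER, week INTEGER,
                       month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, am_pm_indicator TEXT);
CREATE TABLE sales (
    location_id INTEGER, date_id INTEGER, customer_id INTEGER,
    product_id INTEGER, time_id INTEGER,
    total_sales_amount REAL, total_sales_quantity INTEGER,
    PRIMARY KEY (location_id, date_id, customer_id, product_id, time_id)
);
-- Hypothetical sample rows.
INSERT INTO customer VALUES (1, 'A', 'F', 'married', 'Kannada'),
                            (2, 'B', 'M', 'unmarried', 'Telugu'),
                            (3, 'C', 'M', 'married', 'Marathi');
INSERT INTO date_dim VALUES (1, 15, 20, 5, 2, 2005), (2, 3, 23, 6, 2, 2005);
INSERT INTO time_dim VALUES (1, 'AM'), (2, 'PM');
INSERT INTO sales VALUES (1, 1, 1, 1, 2, 500.0, 1), (1, 1, 1, 2, 2, 200.0, 2),
                         (1, 1, 2, 1, 2, 300.0, 1), (1, 1, 3, 1, 1, 100.0, 1),
                         (1, 2, 2, 1, 2, 150.0, 1);
""")

# The single SQL statement the exercise asks for.
QUERY = """
SELECT c.marital_status, COUNT(*) AS purchases
FROM sales AS s
JOIN customer AS c ON c.customer_id = s.customer_id
JOIN date_dim AS d ON d.date_id = s.date_id
JOIN time_dim AS t ON t.time_id = s.time_id
WHERE d.month = 5 AND d.year = 2005 AND t.am_pm_indicator = 'PM'
GROUP BY c.marital_status
ORDER BY c.marital_status
"""
result = conn.execute(QUERY).fetchall()
print(result)
```

If the Time dimension also carried an hour attribute, the PM filter could be replaced by an hour range that matches the business definition of "evening".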



Data Cube Measures

• A data cube measure is a numeric function that can be evaluated at each point in the data cube space.
• A measure value is computed for a given point by aggregating the data corresponding to the respective dimension–value pairs defining the given point.


Data Cube Measures: Three Categories

• Distributive: if the result derived by applying the function to n aggregate values is the same as that derived by applying the function to all the data without partitioning
• E.g., count(), sum(), min(), max()
• Algebraic: if it can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function
• E.g., avg(), min_N(), standard_deviation(); avg() can be obtained as sum()/count()
• Holistic: if there is no constant bound on the storage size needed to describe a subaggregate; that is, there is no algebraic function with M arguments that characterizes the computation
• E.g., median(), mode(), rank()
• Distributive: An aggregate function is distributive if it can be computed in a distributed manner as follows. Suppose the data are partitioned into n sets.
• We apply the function to each partition, resulting in n aggregate values.
• If the result derived by applying the function to the n aggregate values is the same as that derived by applying the function to the entire data set (without partitioning), the function can be computed in a distributed manner.
• For example, sum() can be computed for a data cube by first partitioning the cube into a set of subcubes, computing sum() for each subcube, and then summing up the partial sums obtained for each subcube.
• Hence, sum() is a distributive aggregate function.
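A small sketch of the distributive property, using made-up numbers and an arbitrary three-way partition: applying the aggregate to each partition and then combining the partial results gives the same answer as applying it to the whole data set.

```python
data = [4, 8, 15, 16, 23, 42]
partitions = [data[0:2], data[2:4], data[4:6]]  # arbitrary split into n = 3 sets

# sum(): the sum of the partial sums equals the sum over all the data.
partial_sums = [sum(p) for p in partitions]
total_sum = sum(partial_sums)

# count(): the partial counts are combined by adding them up.
partial_counts = [len(p) for p in partitions]
total_count = sum(partial_counts)

# max(): the maximum of the partial maxima equals the overall maximum.
total_max = max(max(p) for p in partitions)

print(total_sum == sum(data), total_count == len(data), total_max == max(data))
```

Note that for count() the combining step is a sum of the partial counts, not a count of them.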


• An aggregate function is algebraic if it can be computed by an algebraic function with M arguments (where M is a bounded positive integer), each of which is obtained by applying a distributive aggregate function.
• For example, avg() (average) can be computed by sum()/count(), where both sum() and count() are distributive aggregate functions.
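The sketch below illustrates avg() as an algebraic measure with M = 2: each partition only needs to report its sum and its count. The partitions and values are invented. The naive alternative of averaging the per-partition averages is included to show why the two distributive sub-aggregates are needed.

```python
partitions = [[10, 20], [30], [40, 50, 60]]  # deliberately uneven sizes

# Each partition is summarized by two distributive aggregates: (sum, count).
summaries = [(sum(p), len(p)) for p in partitions]

total_sum = sum(s for s, _ in summaries)
total_count = sum(c for _, c in summaries)
average = total_sum / total_count

# Averaging the averages ignores partition sizes and gives a different result here.
naive = sum(s / c for s, c in summaries) / len(summaries)

print(average, naive)
```

Only two numbers per partition are stored, no matter how many rows each partition holds.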


• An aggregate function is holistic if there is no constant bound on the storage size needed to describe a subaggregate.
• That is, there does not exist an algebraic function with M arguments (where M is a constant) that characterizes the computation.
• Common examples of holistic functions include median(), mode(), and rank().
• A measure is holistic if it is obtained by applying a holistic aggregate function.
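To see why median() is holistic, the sketch below (with invented numbers) tries to combine per-partition medians and compares the result with the true median. The mismatch shows that a single summary value per partition is not enough in general.

```python
from statistics import median

partitions = [[1, 2, 9], [3, 4], [5, 100]]
all_values = [v for p in partitions for v in p]

partial_medians = [median(p) for p in partitions]  # 2, 3.5, 52.5
median_of_medians = median(partial_medians)
true_median = median(all_values)

print(median_of_medians, true_median)
```

In general, an exact median requires access to the underlying values (or a summary that grows with the data), which is what the lack of a constant storage bound refers to.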


Concept Hierarchy
A concept hierarchy defines a sequence of mappings from a set of
low-level concepts to higher-level, more general concepts.



Many concept hierarchies are implicit within the database schema.
Example:
• Consider the dimension location, which is described by the attributes number, street, city, province_or_state, zip_code, and country.
• These attributes are related by a total order, forming a concept hierarchy such as "street < city < province_or_state < country."
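A concept hierarchy can be thought of as a chain of lookup tables, each mapping a value to its parent at the next level. The sketch below assumes a three-level slice of the location hierarchy (city < province_or_state < country); the mappings and the helper name generalize are illustrative only.

```python
# One mapping per step up the hierarchy (illustrative values).
city_to_province = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "Chicago": "Illinois",
}
province_to_country = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Illinois": "USA",
}

LEVELS = ["city", "province_or_state", "country"]
STEPS = [city_to_province, province_to_country]

def generalize(city, target_level):
    """Map a city up the hierarchy to the requested, more general level."""
    value = city
    for step in STEPS[: LEVELS.index(target_level)]:
        value = step[value]
    return value

print(generalize("Vancouver", "province_or_state"))
print(generalize("Vancouver", "country"))
```

Roll-up and drill-down move along exactly this kind of chain, in opposite directions.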


The attributes of a dimension may also be organized in a partial order, forming a lattice.
Example:
A partial order for the time dimension based on the attributes day, week, month, quarter, and year is "day < {month < quarter; week} < year."

A concept hierarchy that is a total or partial order among attributes in a database schema is called a schema hierarchy.


Multidimensional Data
■ Sales volume as a function of product, month, and region
Dimensions: Product, Location, Time
Hierarchical summarization paths:
Product: Industry > Category > Product
Location: Region > Country > City > Office
Time: Year > Quarter > Month > Day, and Year > Week > Day
Cube axes shown in the figure: Product, Region, Month
Typical OLAP Operations
• Roll up (drill-up): summarize data
– by climbing up a hierarchy or by dimension reduction
• Drill down (roll down): reverse of roll-up
– from higher-level summary to lower-level summary or detailed data, or introducing new dimensions
• Slice and dice: project and select
• Pivot (rotate):
– reorient the cube, visualization, 3D to series of 2D planes
• Other operations
– drill across: involving (across) more than one fact table
– drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
Roll-up

• Roll-up is performed by climbing up a concept hierarchy for the dimension location.
• Initially the concept hierarchy was "street < city < province < country".
• On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country.
• The data is now grouped by country rather than by city.
• Roll-up can also be performed by dimension reduction, in which case one or more dimensions are removed from the data cube.
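Roll-up on location can be sketched as re-keying each cube cell by its country and summing the cells that collide. The cube below is a plain dictionary keyed by (quarter, city, item); all values and the city-to-country mapping are invented for illustration.

```python
from collections import defaultdict

city_to_country = {"Vancouver": "Canada", "Toronto": "Canada",
                   "Chicago": "USA", "New York": "USA"}

# Cells of a small cube: (quarter, city, item) -> dollars_sold (hypothetical).
cube = {
    ("Q1", "Vancouver", "computer"): 100,
    ("Q1", "Toronto", "computer"): 200,
    ("Q1", "Chicago", "computer"): 300,
    ("Q2", "Vancouver", "computer"): 150,
    ("Q1", "New York", "phone"): 50,
}

def roll_up_location(cells):
    """Climb the location hierarchy from city to country, summing the measure."""
    result = defaultdict(int)
    for (quarter, city, item), dollars in cells.items():
        result[(quarter, city_to_country[city], item)] += dollars
    return dict(result)

by_country = roll_up_location(cube)
print(by_country)
```

Dropping the location component from the key entirely, instead of mapping it, would be the dimension-reduction form of roll-up.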
Drill-down

Drill-down can be performed in two ways:
• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension
Drill-Down

• Drill-down is performed by stepping down a concept hierarchy for the dimension time.
• Initially the concept hierarchy was "day < month < quarter < year."
• On drilling down, the time dimension is descended from the level of quarter to the level of month.
• Drill-down can also be performed by adding one or more new dimensions to the data cube.
• It navigates the data from less detailed data to highly detailed data.
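Drill-down needs the more detailed data to be available. The sketch below assumes month-level cells are stored, derives the quarter-level view from them, and then drills back down into one quarter. The figures and helper names are illustrative.

```python
from collections import defaultdict

month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}

# Detailed cells: (month, item) -> dollars_sold (hypothetical values).
monthly = {("Jan", "computer"): 100, ("Feb", "computer"): 120,
           ("Mar", "computer"): 80, ("Apr", "computer"): 90}

# The summarized, quarter-level view an analyst might start from.
quarterly = defaultdict(int)
for (month, item), dollars in monthly.items():
    quarterly[(month_to_quarter[month], item)] += dollars

def drill_down(quarter):
    """Return the month-level cells that make up the given quarter."""
    return {key: value for key, value in monthly.items()
            if month_to_quarter[key[0]] == quarter}

q1_detail = drill_down("Q1")
print(dict(quarterly))
print(q1_detail)
```

The month-level cells for Q1 add back up to the Q1 total, which is what makes drill-down the reverse of roll-up.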
Slice

• The slice operation performs a selection on one particular dimension of a given cube and provides a new sub-cube.
Slice

• Here slice is performed on the dimension "time" using the criterion time = "Q1".
• It forms a new sub-cube by fixing a single value on that one dimension; the remaining dimensions are kept.
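A minimal sketch of slicing, using a dictionary cube keyed by (time, location, item) with invented values: fixing time = "Q1" leaves a 2-D sub-cube over location and item.

```python
# Cells: (time, location, item) -> dollars_sold (hypothetical values).
cube = {
    ("Q1", "Toronto", "Mobile"): 605,
    ("Q1", "Vancouver", "Modem"): 825,
    ("Q2", "Toronto", "Mobile"): 680,
    ("Q2", "Chicago", "Phone"): 31,
}

def slice_time(cells, quarter):
    """Fix the time dimension to one value and drop it from the cell keys."""
    return {(location, item): dollars
            for (time, location, item), dollars in cells.items()
            if time == quarter}

q1 = slice_time(cube, "Q1")
print(q1)
```

The result has one dimension fewer than the input, because the sliced dimension now holds a single fixed value.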
Dice
• The dice operation performs a selection on two or more dimensions of a given cube and provides a new sub-cube.
Dice

• The dice operation on the cube based on the following selection criteria involves three dimensions:
• (location = "Toronto" or "Vancouver")
• (time = "Q1" or "Q2")
• (item = "Mobile" or "Modem")
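The three criteria above can be sketched as a filter over all three dimensions at once. The cube values are invented; only the selection sets come from the slide.

```python
# Cells: (time, location, item) -> dollars_sold (hypothetical values).
cube = {
    ("Q1", "Toronto", "Mobile"): 605,
    ("Q2", "Vancouver", "Modem"): 825,
    ("Q3", "Toronto", "Mobile"): 14,
    ("Q1", "Chicago", "Mobile"): 440,
    ("Q2", "Toronto", "Security"): 400,
}

def dice(cells, times, locations, items):
    """Keep only the cells whose coordinates fall inside every selection set."""
    return {key: dollars for key, dollars in cells.items()
            if key[0] in times and key[1] in locations and key[2] in items}

sub_cube = dice(cube,
                times={"Q1", "Q2"},
                locations={"Toronto", "Vancouver"},
                items={"Mobile", "Modem"})
print(sub_cube)
```

Unlike slice, the sub-cube keeps all three dimensions; each is just restricted to a smaller set of values.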
Pivot
• The pivot operation is also known as rotation.
• It rotates the data axes in view in order to provide an alternative presentation of data.
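For a 2-D view, pivoting amounts to swapping the row and column axes. The sketch below assumes the view is stored as a dictionary of dictionaries (rows = location, columns = item) with invented values.

```python
# 2-D view: rows are locations, columns are items (hypothetical values).
table = {
    "Toronto": {"Mobile": 605, "Modem": 825},
    "Vancouver": {"Mobile": 680, "Modem": 952},
}

def pivot(view):
    """Swap the row and column axes of a dict-of-dicts 2-D view."""
    rotated = {}
    for row, columns in view.items():
        for column, value in columns.items():
            rotated.setdefault(column, {})[row] = value
    return rotated

by_item = pivot(table)
print(by_item)
```

No values change; only the presentation does, and pivoting twice returns the original view.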
Pivot

Fig. 3.10 Typical OLAP Operations

Summary
■ Data warehousing: A multi-dimensional model of a data warehouse
■ A data cube consists of dimensions & measures
■ Star schema, snowflake schema, fact constellations
■ OLAP operations: drilling, rolling, slicing, dicing and pivoting
■ Data Warehouse Architecture, Design, and Usage
■ Multi-tiered architecture
■ Business analysis design framework
■ Information processing, analytical processing, data mining, OLAM (Online Analytical Mining)
■ Implementation: Efficient computation of data cubes
■ Partial vs. full vs. no materialization
■ Indexing OLAP data: Bitmap index and join index
■ OLAP query processing
■ OLAP servers: ROLAP, MOLAP, HOLAP

A starnet Query model
• The querying of multidimensional databases can be based on a starnet model.
• A starnet model consists of radial lines emanating from a central point.
• Each line represents a concept hierarchy for a dimension.
• Each abstraction level in the hierarchy is called a footprint.


Starnet Query Model
OLAP vs OLTP

ClassRoom Code

kon2vyd
