Ch4 DW Detailed Version
Outline
Part 1:
I. Introduction to Data Warehousing
II. Architecture of Data Warehousing
III. Design and Modeling in Data Warehousing
Part 2:
Data warehouse VS Database (1/3)
Data warehouse VS Database (3/3)
OLTP VS OLAP
The goal is to provide a clear and optimized representation of data that supports
analysis.
The key concepts in dimensional modeling are facts, dimensions, and attributes.
Fact table: the primary key is a surrogate key. Fact table: several measures.
Key concepts: Dimension
A dimension is an entity that establishes the business context for the measures
(facts) used by an enterprise.
Dimensions define the who, what, where, and why of the dimensional model,
and group similar attributes into a category or subject area. Examples of
dimensions are product, geography, customers, employees, and time. Whereas
facts are numeric, dimensions are descriptive in nature (although some of those
descriptions, such as a product list price, may be numeric).
Creating a dimension lets the attributes that describe the facts be stored in a single place.
Dimension
Dimensions keep the database from being overrun with redundant data. With all
the attributes in a dimension table, they don’t have to be repeated in the fact
tables.
Example:
Take Amazon, for example. The data for an individual sale will contain the product
identification number, but will not repeat all the attributes of the product (color,
description, reviews, etc.). Those attributes are in a dimension, and each individual
sale of that product just points to them.
From a business perspective, the key purpose of dimensions is to use their
attributes to filter and analyze data based on performance measures.
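As a minimal sketch of this idea, the following keeps product attributes once in a dimension and lets each sale point to them by key. The names `product_dim`, `sales_facts`, and `total_amount_where`, and all sample values, are hypothetical and chosen only for illustration.

```python
# Hypothetical product dimension: attributes are stored once per product.
product_dim = {
    101: {"name": "Desk lamp", "color": "black", "category": "Lighting"},
    102: {"name": "Notebook", "color": "blue", "category": "Stationery"},
}

# Hypothetical sales facts: each row carries only the product key and measures.
sales_facts = [
    {"product_id": 101, "quantity": 2, "amount": 40.0},
    {"product_id": 102, "quantity": 5, "amount": 15.0},
    {"product_id": 101, "quantity": 1, "amount": 20.0},
]


def total_amount_where(attribute, value):
    """Sum the amount measure for sales whose product has attribute == value."""
    return sum(
        row["amount"]
        for row in sales_facts
        if product_dim[row["product_id"]][attribute] == value
    )


# Filter the facts through a dimension attribute.
lighting_total = total_amount_where("category", "Lighting")
print(lighting_total)
```

The fact rows never repeat the color or category; the filter reaches those attributes through the product key.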
Dimension
• Dimensions are used for
• Selection of data
• Grouping of data at the right level of detail
• Dimensions consist of dimension values
• Product dimension has values "milk", "cream", …
• Time dimension has values "1/1/2001", "2/1/2001", …
• Dimension values may have an ordering
• Used for comparing cube data across values
• Especially used for Time dimension
Dimension
• Dimensions have hierarchies with levels
• Typically 3-5 levels (of detail)
• Dimension values are organized in a tree structure
• Product: Product->Type->Category
• Store: Store->Area->City->County
• Time: Day->Month->Quarter->Year
• Dimensions have a bottom level and a top level
• Levels may have attributes
• Simple, non-hierarchical information
• Day has Workday as attribute
• Dimensions should contain much information
• Time dimension may contain holiday, season, events,…
• Good dimensions have 50-100 or more attributes/levels
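The Time hierarchy above (Day->Month->Quarter->Year) can be sketched as a function that derives every level from the bottom level. The function name `time_levels`, the label formats, and the Monday-to-Friday workday rule are assumptions made for this example.

```python
from datetime import date


def time_levels(day: date) -> dict:
    """Map a bottom-level Day value to its Month, Quarter, and Year levels."""
    quarter = (day.month - 1) // 3 + 1
    return {
        "day": day.isoformat(),
        "month": f"{day.year}-{day.month:02d}",
        "quarter": f"{day.year}-Q{quarter}",
        "year": str(day.year),
        # Level attribute (assumption: Monday to Friday counts as a workday).
        "is_workday": day.weekday() < 5,
    }


levels = time_levels(date(2001, 2, 1))
print(levels)
```

Each day maps to exactly one month, quarter, and year, which is what gives the dimension values their tree structure.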
Dimensional model: Example
Granularity: Dimensionality
Hierarchy
• Granularity of facts is important
• Level of detail
• Given by combination of bottom levels
• A dimensional hierarchy defines mappings from a set of lower-level
concepts to higher level concepts.
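A small sketch of how a hierarchy mapping changes the granularity of facts: facts stored at the day and product grain are rolled up to the month and product grain. The sample rows and the names `daily_sales`, `day_to_month`, and `roll_up_to_month` are hypothetical.

```python
from collections import defaultdict

# Hypothetical facts at the (day, product) grain: (day, product, amount).
daily_sales = [
    ("2001-01-01", "milk", 10.0),
    ("2001-01-02", "milk", 12.0),
    ("2001-01-02", "cream", 4.0),
    ("2001-02-01", "milk", 9.0),
]


def day_to_month(day: str) -> str:
    """Hierarchy mapping from the Day level to the Month level."""
    return day[:7]


def roll_up_to_month(facts):
    """Aggregate day-grain facts to the coarser (month, product) grain."""
    totals = defaultdict(float)
    for day, product, amount in facts:
        totals[(day_to_month(day), product)] += amount
    return dict(totals)


monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)
```

Going the other way is not possible: once only monthly totals are stored, the daily detail cannot be recovered, which is why the choice of grain matters.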
Data Warehouse Design
Star Schema
In a star schema, there is a central fact table surrounded by dimension
tables.
table
The fact table contains numerical measures (such as sales or revenue), and
Star Schema: Example
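One possible star schema, sketched with Python's built-in `sqlite3` module. The table names (`dim_product`, `dim_store`, `fact_sales`), columns, and rows are illustrative assumptions, not the example from the original slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the descriptive attributes.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);

-- The central fact table holds foreign keys to the dimensions plus measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    quantity INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'milk', 'dairy'), (2, 'cream', 'dairy'), (3, 'bread', 'bakery');
INSERT INTO dim_store VALUES (1, 'Oslo'), (2, 'Bergen');
INSERT INTO fact_sales VALUES (1, 1, 10, 15.0), (2, 1, 2, 6.0), (3, 2, 4, 8.0), (1, 2, 5, 7.5);
""")

# A typical star query: join the fact table to a dimension, then group by an attribute.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

Every dimension is one join away from the fact table, which is what gives the schema its star shape.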
Snowflake schema
Snowflake schema is an expanded version of a star schema in which
• Advantages
• Disadvantages
Fact constellation schema
• A fact constellation has multiple fact tables. It is also known as galaxy
schema.
• The following diagram shows two fact tables, namely sales and Inventory
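A minimal sketch of a fact constellation, assuming a sales fact table and an inventory fact table that share one item dimension. The table and column names (`dim_item`, `fact_sales`, `fact_inventory`, `units_sold`, `units_on_hand`) are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- One shared dimension.
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, name TEXT);

-- Two fact tables that both reference the shared dimension.
CREATE TABLE fact_sales (item_key INTEGER, units_sold INTEGER);
CREATE TABLE fact_inventory (item_key INTEGER, units_on_hand INTEGER);

INSERT INTO dim_item VALUES (1, 'milk'), (2, 'cream');
INSERT INTO fact_sales VALUES (1, 30), (1, 20), (2, 5);
INSERT INTO fact_inventory VALUES (1, 100), (2, 40);
""")

# Aggregate each fact table on its own, then combine through the shared dimension.
sold_vs_stock = db.execute("""
    SELECT i.name, s.units_sold, inv.units_on_hand
    FROM dim_item AS i
    JOIN (SELECT item_key, SUM(units_sold) AS units_sold
          FROM fact_sales GROUP BY item_key) AS s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_on_hand) AS units_on_hand
          FROM fact_inventory GROUP BY item_key) AS inv ON inv.item_key = i.item_key
    ORDER BY i.name
""").fetchall()
print(sold_vs_stock)
```

Aggregating each fact table before joining avoids multiplying rows when an item has several rows in both fact tables.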
From the Data Warehouse to Data Marts
• A data mart contains only those data that are specific to a particular
group. For example, the marketing data mart may contain only data
related to items, customers, and sales.
• Data marts are confined to subjects.
• Data marts are small in size.
• Data marts are customized by department
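To illustrate, a data mart can be viewed as a subject-specific subset of the warehouse. The sketch below models the warehouse as a plain dictionary of tables; the name `build_data_mart`, the table names, and the rows are hypothetical.

```python
# Hypothetical warehouse: subject area name -> rows.
warehouse = {
    "items": [{"item_id": 1, "name": "milk"}],
    "customers": [{"customer_id": 7, "segment": "retail"}],
    "sales": [{"item_id": 1, "customer_id": 7, "amount": 3.5}],
    "payroll": [{"employee_id": 42, "salary": 1000.0}],
    "suppliers": [{"supplier_id": 9, "country": "NO"}],
}


def build_data_mart(source: dict, subjects: set) -> dict:
    """Keep only the subject areas that one department needs."""
    return {name: rows for name, rows in source.items() if name in subjects}


# The marketing mart holds only items, customers, and sales.
marketing_mart = build_data_mart(warehouse, {"items", "customers", "sales"})
print(sorted(marketing_mart))
```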
The complete Decision Support System
DWH Architecture
Types of Data Warehousing Architectures
1. Centralized Data Warehouse : is a single, unified repository that stores
decision-making.
Data architecture VS Data modeling
• Data architecture applies to the higher-level view of how the enterprise
handles its data, such as how it is categorized, integrated, and stored.
• Data modeling applies to very specific and detailed rules about how pieces
of data are arranged in the database. Where data architecture is the
blueprint for your house, data modeling is the instructions for installing a
faucet.
Kimball Approach:
• Kimball emphasizes the use of dimensional modeling, creating star or
snowflake schemas. This approach focuses on designing the data
warehouse based on business processes and user requirements.
Inmon's Approach:
Inmon supports the creation of a centralized Enterprise Data Warehouse
(EDW) as the foundation. This EDW serves as a single, integrated repository for
the entire organization.
Kimball VS Inmon’s Approach
Philosophy:
Kimball: Business-driven, iterative, and agile.
Inmon: Enterprise-centric, normalized, and long-term.
Data Model:
Kimball: Dimensional modeling, star or snowflake schemas.
Inmon: Normalized data model, 3NF.
Development Approach:
Kimball: Bottom-up development, starting with data marts.
Inmon: Top-down development, starting with the enterprise data warehouse.
Data Marts:
Kimball: Considers data marts as primary deliverables.
Inmon: Views data marts as subsets of the enterprise data warehouse.
Flexibility:
Kimball: Agile and adaptable to changing business needs.
Inmon: Emphasizes a stable and scalable architecture for long-term use.
Kimball approach: Main steps
1. Choose the subject: Clearly define the business objectives and scope of
the data warehouse project.
2. Requirements Gathering: Collaborate closely with business users to
gather their reporting and analysis requirements.
3. Dimensional Modeling: Develop dimensional models using star or snowflake
schemas, and identify the dimensions and facts.
4. ETL Design and Development: Create Extract, Transform, Load (ETL)
processes based on dimensional models.
5. Data Mart Development: Develop data marts as subsets of the data
warehouse, addressing specific business needs.
6. Business Intelligence Tools Integration: Choose and integrate business
intelligence tools compatible with dimensional models.
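Steps 3 to 5 above can be sketched as a tiny ETL pipeline that turns raw rows into one dimension and one fact table. This is only an illustration under stated assumptions: the source rows, the function names `extract`, `transform`, and `load`, and the in-memory `mart` target are all hypothetical.

```python
# Hypothetical source rows, as they might arrive from an operational system.
raw_orders = [
    {"product": "milk", "category": "dairy", "amount": "3.50"},
    {"product": "cream", "category": "dairy", "amount": "2.00"},
    {"product": "milk", "category": "dairy", "amount": "1.75"},
]


def extract():
    """Extract: read the raw rows from the source."""
    return list(raw_orders)


def transform(rows):
    """Transform: split rows into a product dimension and sales facts."""
    product_dim = {}  # natural key (product name) -> dimension row
    facts = []
    for row in rows:
        name = row["product"]
        if name not in product_dim:
            # Assign the next surrogate key to a product seen for the first time.
            product_dim[name] = {
                "product_key": len(product_dim) + 1,
                "name": name,
                "category": row["category"],
            }
        facts.append({
            "product_key": product_dim[name]["product_key"],
            "amount": float(row["amount"]),  # convert text to a numeric measure
        })
    return list(product_dim.values()), facts


def load(target, dim_rows, fact_rows):
    """Load: write the dimension and fact rows into the target data mart."""
    target["dim_product"] = dim_rows
    target["fact_sales"] = fact_rows


mart = {}
dim_rows, fact_rows = transform(extract())
load(mart, dim_rows, fact_rows)
print(mart["dim_product"])
print(mart["fact_sales"])
```

The transform step is driven by the dimensional model: it decides which source fields become dimension attributes and which become measures.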