Ch4 DW Detailed Version

Uploaded by

rymachayeb

Data warehousing

Outline
Part 1:
I. Introduction to Data Warehousing
II. Architecture of Data Warehousing
III. Design and Modeling in Data Warehousing

Part 2:
IV. ETL Processes in Data Warehousing
A. Extracting Data
1. Data Extraction Techniques
2. Data Profiling
B. Transforming Data
1. Data Cleaning and Quality
2. Data Integration
C. Loading Data

Data warehouse: Definition
A data warehouse is a centralized repository that integrates and stores large volumes of structured, historical data from various sources within an organization.

It is designed to support business intelligence (BI) activities, including reporting, analysis, and decision-making.

Data warehouses provide a consolidated view of an organization's data, allowing users to analyze trends, identify patterns, and gain insights that inform strategic and operational decisions.

They play a crucial role in business intelligence by giving decision-makers a unified and consistent view of historical data.

Key characteristics
Key characteristics of a data warehouse include:

• Subject-Oriented: Data warehouses are organized around specific business subjects or areas, such as sales, finance, or customer relations, to support analytical queries and reporting within those domains.
• Integrated Data: Data from disparate sources, such as transactional databases, spreadsheets, and external systems, is integrated and transformed to ensure consistency and coherence in the warehouse. This integration is often carried out through ETL (Extract, Transform, Load) procedures.
• Time-Variant: Data in a data warehouse is time-stamped, allowing users to analyze trends and changes over time. This enables historical analysis and reporting.
• Non-Volatile: Unlike operational databases, which are frequently updated with transactional data, a data warehouse is non-volatile. Once data is loaded into the warehouse, it is typically not updated or deleted, which ensures a stable environment for analytical processing.
• Optimized for Query and Reporting: Data warehouses are structured and indexed for efficient querying and reporting. They often use denormalized schemas, such as star or snowflake schemas, to simplify and accelerate analytical queries.
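The time-variant and non-volatile characteristics can be illustrated with a minimal Python sketch using the standard-library sqlite3 module. The table `customer_history`, its columns, and the helper `load_snapshot` are hypothetical names chosen for illustration only. Each load appends time-stamped rows instead of updating existing ones, so earlier states remain available for historical analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: every row carries the date it was loaded.
conn.execute(
    "CREATE TABLE customer_history ("
    " customer_id INTEGER, city TEXT, load_date TEXT)"
)

def load_snapshot(rows, load_date):
    """Append a time-stamped snapshot; existing rows are never modified."""
    conn.executemany(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        [(cid, city, load_date) for cid, city in rows],
    )

# The customer moves between two loads; both states are kept.
load_snapshot([(1, "Tunis")], "2024-01-01")
load_snapshot([(1, "Sfax")], "2024-02-01")

history = conn.execute(
    "SELECT load_date, city FROM customer_history"
    " WHERE customer_id = 1 ORDER BY load_date"
).fetchall()
print(history)
```

An operational database would more likely overwrite the city in place, which loses the earlier value.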

Data warehouse VS Database (1/3)

Purpose:
Data warehouse: Primarily designed for analytical processing and business intelligence. It is optimized for complex queries and reporting.
Database: Designed for transactional processing and day-to-day operations. Focus is on efficient data retrieval, insertion, and updating.
Data Types:
Data warehouse: Stores large volumes of historical, structured data. Often includes data from multiple sources within the organization.
Database: Stores operational data, often in real-time. Primarily contains current and frequently updated information.
Schema Design:
Data warehouse: Uses specialized schemas like star schema or snowflake schema for efficient querying and reporting.
Database: Typically uses normalized schemas to reduce redundancy and maintain data integrity. Normalization helps in transactional processing.

Data warehouse VS Database (2/3)

Data Integration:
Data warehouse: Involves the integration of data from various sources using ETL (Extract, Transform, Load) processes to ensure consistency and coherence.
Database: May store data from a specific application or domain. Integration is focused on maintaining consistency within the operational context.
Data Volatility:
Data warehouse: Non-volatile; historical data is stored and rarely updated. Changes typically involve adding new data rather than modifying existing records.
Database: Volatile; data is frequently updated and modified as part of ongoing transactions.

Data warehouse VS Database (3/3)

Query Optimization:
Data warehouse: Optimized for complex queries.
Database: Optimized for fast retrieval and updating of individual records.
User Base:
Data warehouse: Primarily used by analysts, data scientists, and decision-makers for in-depth analysis, reporting, and business intelligence activities.
Database: Used by application developers, system administrators, and operational staff for day-to-day application support and transactional processing.
Data Processing:
Data warehouse: Online Analytical Processing (OLAP)
Database: Online Transaction Processing (OLTP)

OLTP VS OLAP

Data warehouses are tailored for analytical processing, historical analysis, and business intelligence, whereas databases are focused on supporting transactional processing and day-to-day operations.
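The contrast between the two workloads can be sketched as follows. This is an illustrative example only: a single in-memory SQLite table named `orders` (a hypothetical name, as are its columns) is used to show the two query shapes side by side, whereas in practice OLTP and OLAP queries usually run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "North", 100.0), (2, "South", 50.0), (3, "North", 25.0)],
)

# OLTP-style work: read or update one record, located by its key.
conn.execute("UPDATE orders SET amount = 120.0 WHERE order_id = 1")
one_order = conn.execute(
    "SELECT amount FROM orders WHERE order_id = 1"
).fetchone()

# OLAP-style work: scan many rows and aggregate them for analysis.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(one_order, totals)
```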

Main Components of a Data Warehouse
A data warehouse comprises several components that work together to facilitate the storage, integration, and retrieval of large volumes of data for analytical processing.
The main components of a data warehouse include:
1. Data Sources:
These are systems or applications that generate and store data. Data sources can include operational databases, external data feeds, spreadsheets, and other repositories.
2. ETL (Extract, Transform, Load) Processes:
ETL processes are responsible for extracting data from various sources, transforming it to conform to the data warehouse's structure and quality standards, and loading it into the data warehouse.
3. Data Warehouse Database:
The central repository that stores the integrated and transformed data. It is optimized for analytical querying and reporting. Data warehouses often use specialized database management systems (DBMS) designed for analytical workloads.
4. Data Marts:
Data marts are subsets of the data warehouse that focus on specific business functions or departments. They are often designed for the needs of a particular group of users.
5. OLAP (Online Analytical Processing) Servers:
OLAP servers enable users to interactively analyze and explore data in a multidimensional way. OLAP provides capabilities for slicing and dicing data, drilling down into details, and performing complex analyses.
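The slicing, dicing, and drill-down operations mentioned for OLAP servers can be sketched in plain Python. This is a simplified illustration over an in-memory list, not how an OLAP server is implemented; the `facts` data and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, product, region, amount)
facts = [
    (2023, "Q1", "milk", "North", 10),
    (2023, "Q2", "milk", "South", 20),
    (2023, "Q1", "cream", "North", 5),
    (2024, "Q1", "milk", "North", 30),
]

def aggregate(rows, key):
    """Sum the amount column, grouped by the given key function."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[4]
    return dict(totals)

# Slice: fix one dimension to a single value (year = 2023).
slice_2023 = [r for r in facts if r[0] == 2023]

# Dice: restrict several dimensions at once (milk sold in the North).
dice = [r for r in facts if r[2] == "milk" and r[3] == "North"]

# Roll-up gives totals per year; drill-down shows them per (year, quarter).
by_year = aggregate(facts, key=lambda r: r[0])
by_quarter = aggregate(facts, key=lambda r: (r[0], r[1]))
print(by_year, by_quarter)
```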

Design and Modeling in Data Warehousing
Data warehouse modeling involves designing the structure and organization of data within a data warehouse to facilitate efficient querying, reporting, and analysis.

The goal is to provide a clear and optimized representation of data that supports business intelligence and decision-making.

Dimensional modeling is much better suited to business intelligence (BI) applications and data warehousing (DW) than the normalized modeling used for transactional databases.

The key concepts in dimensional modeling are facts, dimensions, and attributes.

All these concepts can be organized in several ways, called schemas.

Dimensional modeling overview

• The fact table Tbl_Fact_Store_Sales is at the core of the dimensional model.
• Four surrounding dimensions define and put into context the store sales:
  • Tbl_Dim_Item: what products were sold
  • Tbl_Dim_Date: when those products were sold
  • Tbl_Dim_Customer: who bought the products
  • Tbl_Dim_Buyer: who bought the product for the store
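The layout described above could be declared as follows, shown here as a minimal sketch with Python's sqlite3 module. The table names come from the slide; every column name (`item_key`, `quantity`, `sales_amount`, and so on) is an assumption made for illustration, since the original diagram is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tbl_Dim_Item     (item_key     INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE Tbl_Dim_Date     (date_key     INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE Tbl_Dim_Customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Tbl_Dim_Buyer    (buyer_key    INTEGER PRIMARY KEY, buyer_name TEXT);

-- Fact table: one foreign key per dimension plus the numeric measures.
CREATE TABLE Tbl_Fact_Store_Sales (
    item_key     INTEGER REFERENCES Tbl_Dim_Item(item_key),
    date_key     INTEGER REFERENCES Tbl_Dim_Date(date_key),
    customer_key INTEGER REFERENCES Tbl_Dim_Customer(customer_key),
    buyer_key    INTEGER REFERENCES Tbl_Dim_Buyer(buyer_key),
    quantity     INTEGER,
    sales_amount REAL
);
""")

# List the tables and the dimensions the fact table points to.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
fk_targets = sorted(row[2] for row in conn.execute(
    "PRAGMA foreign_key_list(Tbl_Fact_Store_Sales)"
))
print(tables, fk_targets)
```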

Key concepts: Fact Tables

A fact is a measurement of a business activity, such as a business event or transaction, and is generally numeric.
Examples of facts are sales, expenses, and inventory levels.
Fact tables are composed of two types of columns: keys and measures.
• The first type, the key columns, consists of a group of foreign keys (FK) that point to the primary keys of the dimension tables associated with this fact table to enable business analysis. The relationship between a dimension and the fact table is one-to-many: each dimension row can be referenced by many fact rows.
• The second type of column holds the actual measures of the business activity, such as the sales revenue and order quantity. Every measurement has a grain, which is the level of detail in the measurement of an event, such as the unit of measure or currency used.
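The two column types can be seen in a small sketch, again using sqlite3. The tables `dim_product` and `fact_sales` and their columns are hypothetical. The example shows one dimension row referenced by several fact rows, and a fact row with an unknown key being rejected once foreign-key enforcement is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE fact_sales ("
    " product_key INTEGER NOT NULL REFERENCES dim_product(product_key),"
    " order_quantity INTEGER, sales_revenue REAL)"
)
conn.execute("INSERT INTO dim_product VALUES (1, 'milk')")

# One dimension row, many fact rows (one-to-many).
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 2, 3.0), (1, 1, 1.5)],
)

# A fact that points to a missing dimension row is refused.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 1, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Measures are numeric, so they can be aggregated.
total_revenue = conn.execute(
    "SELECT SUM(sales_revenue) FROM fact_sales"
).fetchone()[0]
print(rejected, total_revenue)
```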

Fact Tables: Example

[Figure: example fact table. The primary key is a surrogate key, and the table holds several measures.]

Key concepts: Dimension

A dimension is an entity that establishes the business context for the measures
(facts) used by an enterprise.
Dimensions define the who, what, where, and why of the dimensional model,
and group similar attributes into a category or subject area. Examples of
dimensions are product, geography, customers, employees, and time. Whereas
facts are numeric, dimensions are descriptive in nature (although some of those
descriptions, such as a product list price, may be numeric).
Creating a dimension lets attributes be stored in a single place instead of being repeated in the facts.

Dimensions keep the database from being overrun with redundant data. With all the attributes in a dimension table, they don't have to be repeated in the fact tables.
Example:
Take Amazon, for example. The data for an individual sale will contain the product
identification number, but will not repeat all the attributes of the product (color,
description, reviews, etc.). Those attributes are in a dimension, and each individual
sale of that product just points to them.
From a business perspective, the key purpose of dimensions is to use their attributes to filter and analyze data based on performance measures.
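The idea in the example above can be sketched in a few lines of Python. All product identifiers, attributes, and amounts below are made up for illustration. Each sale stores only the product identifier, while the attributes live once in the dimension and are looked up when grouping the measures.

```python
# Hypothetical product dimension: attributes stored once per product.
product_dim = {
    "P100": {"description": "Desk lamp", "color": "black"},
    "P200": {"description": "Notebook", "color": "blue"},
}

# Each sale stores only the product identifier and its measures.
sales = [
    {"product_id": "P100", "quantity": 1, "amount": 20.0},
    {"product_id": "P100", "quantity": 2, "amount": 40.0},
    {"product_id": "P200", "quantity": 5, "amount": 10.0},
]

def sales_by_attribute(attribute):
    """Total sales amount grouped by one attribute of the product dimension."""
    totals = {}
    for sale in sales:
        value = product_dim[sale["product_id"]][attribute]
        totals[value] = totals.get(value, 0.0) + sale["amount"]
    return totals

by_color = sales_by_attribute("color")
print(by_color)
```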
• Dimensions are used for
  • Selection of data
  • Grouping of data at the right level of detail
• Dimensions consist of dimension values
  • Product dimension has values "milk", "cream", …
  • Time dimension has values "1/1/2001", "2/1/2001", …
• Dimension values may have an ordering
  • Used for comparing cube data across values
  • Especially used for Time dimension
• Dimensions have hierarchies with levels
  • Typically 3-5 levels (of detail)
  • Dimension values are organized in a tree structure
  • Product: Product->Type->Category
  • Store: Store->Area->City->County
  • Time: Day->Month->Quarter->Year
  • Dimensions have a bottom level and a top level
• Levels may have attributes
  • Simple, non-hierarchical information
  • Day has Workday as attribute
• Dimensions should contain much information
  • Time dimension may contain holiday, season, events, …
  • Good dimensions have 50-100 or more attributes/levels
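As an illustration of a Time dimension with hierarchy levels and a level attribute, the following sketch generates one row per day. The function `build_date_dimension` and the field names are hypothetical, and treating Monday to Friday as workdays is an assumption; a real calendar would also need holidays, seasons, and events.

```python
from datetime import date, timedelta

def build_date_dimension(start, days):
    """Return one row per day with its hierarchy levels and attributes."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "date_key": int(d.strftime("%Y%m%d")),
            "day": d.isoformat(),
            "month": d.month,
            "quarter": (d.month - 1) // 3 + 1,
            "year": d.year,
            # Assumption: Monday to Friday count as workdays.
            "is_workday": d.weekday() < 5,
        })
    return rows

dim_date = build_date_dimension(date(2001, 1, 1), 7)
print(dim_date[0])
```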

Dimensional model: Example

Example: sales of supermarkets

• Facts and measures
  • Each sales record is a fact, and its sales value is a measure
• Dimensions
  • Group correlated attributes into the same dimension
  • Each sales record is associated with its values of Product, Store, Time

Granularity: Dimensionality Hierarchy
• Granularity of facts is important
  • Level of detail
  • Given by combination of bottom levels
• A dimensional hierarchy defines mappings from a set of lower-level concepts to higher-level concepts.
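The mapping from lower-level to higher-level concepts can be sketched as plain dictionaries. The store, city, and county names are placeholders, and `roll_up` is a hypothetical helper. Facts start at the finest grain (store and day) and are aggregated one level at a time.

```python
from collections import Counter

# Hypothetical hierarchy mappings: Store -> City -> County.
store_to_city = {"S1": "City A", "S2": "City A", "S3": "City B"}
city_to_county = {"City A": "County X", "City B": "County X"}

# Facts at the finest grain: one row per (store, day).
facts = [
    ("S1", "2001-01-01", 10),
    ("S2", "2001-01-01", 5),
    ("S3", "2001-01-02", 7),
]

def roll_up(rows, mapping):
    """Aggregate sales to the next level using a lower-to-higher mapping."""
    totals = Counter()
    for member, day, amount in rows:
        totals[(mapping[member], day)] += amount
    return [(member, day, amount)
            for (member, day), amount in sorted(totals.items())]

by_city = roll_up(facts, store_to_city)
by_county = roll_up(by_city, city_to_county)
print(by_city, by_county)
```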

Data Warehouse Design

• A schema is a logical description of the entire database.
• An operational database typically uses a normalized relational model, while a data warehouse uses a star, snowflake, or fact constellation schema.

Star Schema
In a star schema, there is a central fact table surrounded by dimension tables.

Each dimension in a star schema is represented by only one dimension table.

The fact table contains numerical measures (such as sales or revenue), and dimension tables provide descriptive information about the measures.

Each dimension table contains a set of attributes.

Star Schema: Example

Snowflake schema
Snowflake schema is an expanded version of a star schema in which dimension tables are normalized into several related tables.

• Advantages
  • Small saving in storage space
  • Normalized structures are easier to update and maintain
• Disadvantages
  • A schema that is less intuitive
  • Browsing through the content is more difficult
  • Degraded query performance because of additional joins
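The extra join that snowflaking introduces can be seen in a minimal sqlite3 sketch. The tables `dim_category`, `dim_product`, and `fact_sales` and their contents are hypothetical. The product category has been normalized out of the product dimension, so a report by category has to traverse two joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: the category lives in its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);

INSERT INTO dim_category VALUES (1, 'Dairy');
INSERT INTO dim_product  VALUES (10, 'milk', 1), (11, 'cream', 1);
INSERT INTO fact_sales   VALUES (10, 2.0), (11, 3.0), (10, 1.0);
""")

# Grouping by category needs two joins; in a star schema the category name
# would sit directly in dim_product and one join would be enough.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
""").fetchall()
print(rows)
```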
Snowflake schema: Example

Fact constellation schema
• A fact constellation has multiple fact tables. It is also known as a galaxy schema.
• The following diagram shows two fact tables, namely Sales and Inventory.
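A fact constellation with a sales fact table and an inventory fact table sharing one dimension might look like the sketch below. The table and column names are assumptions, since the original diagram is not available. Each fact table is aggregated on its own before the results are combined through the shared dimension, which avoids multiplying rows across the two fact tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT);

-- Two fact tables share the same item dimension.
CREATE TABLE fact_sales (
    item_key   INTEGER REFERENCES dim_item(item_key),
    units_sold INTEGER
);
CREATE TABLE fact_inventory (
    item_key      INTEGER REFERENCES dim_item(item_key),
    units_on_hand INTEGER
);

INSERT INTO dim_item       VALUES (1, 'milk'), (2, 'cream');
INSERT INTO fact_sales     VALUES (1, 3), (1, 4), (2, 1);
INSERT INTO fact_inventory VALUES (1, 20), (2, 5);
""")

# Aggregate each fact table separately, then combine on the shared dimension.
report = conn.execute("""
    SELECT i.item_name, s.sold, v.on_hand
    FROM dim_item AS i
    JOIN (SELECT item_key, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY item_key) AS s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_on_hand) AS on_hand
          FROM fact_inventory GROUP BY item_key) AS v ON v.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
print(report)
```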

From the Data Warehouse to Data Marts

• A data mart contains only the data specific to a particular group. For example, the marketing data mart may contain only data related to items, customers, and sales.
• Data marts are confined to subjects.
• Data marts are small in size.
• Data marts are customized by department.
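One simple way to picture a data mart is as a restricted view over warehouse data. This is only an approximation for illustration, because data marts are often separate physical stores. The table `warehouse_sales`, the view `marketing_mart`, and all column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical warehouse table holding more than marketing needs.
CREATE TABLE warehouse_sales (
    item TEXT, customer TEXT, amount REAL, supplier TEXT, unit_cost REAL
);
INSERT INTO warehouse_sales VALUES
    ('milk',  'Alice', 2.0, 'Supplier 1', 1.2),
    ('cream', 'Carol', 3.0, 'Supplier 2', 2.1);

-- Marketing mart: only item, customer, and sales data are exposed.
CREATE VIEW marketing_mart AS
    SELECT item, customer, amount FROM warehouse_sales;
""")

cursor = conn.execute("SELECT * FROM marketing_mart ORDER BY item")
mart_columns = [column[0] for column in cursor.description]
mart_rows = cursor.fetchall()
print(mart_columns, mart_rows)
```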

The complete Decision Support System

DWH Architecture

Types of Data Warehousing Architectures
1. Centralized Data Warehouse: a single, unified repository that stores and manages data from various sources within an organization. It serves as a centralized and integrated platform for business intelligence and decision-making.

2. Data Marts: smaller, specialized subsets of a data warehouse that focus on specific business areas, departments, or user groups. They are designed to meet the needs of a particular set of users with common interests.

3. Federated Data Warehouse: an architecture that integrates data from multiple independent data sources without physically consolidating the data into a central repository. It enables distributed data access and processing.

4. Hybrid Data Warehouse: combines elements of both centralized and distributed architectures. It may involve a mix of on-premises and cloud-based solutions, as well as a combination of centralized and federated approaches.

Extraction, Transformation, Loading: ETL tools

Data architecture VS Data modeling
• Data architecture applies to the higher-level view of how the enterprise handles its data, such as how it is categorized, integrated, and stored.

• Data modeling applies to very specific and detailed rules about how pieces of data are arranged in the database. Where data architecture is the blueprint for your house, data modeling is the instructions for installing a faucet.

Kimball Approach:
• Kimball emphasizes the use of dimensional modeling, creating star or snowflake schemas. This approach focuses on designing the data warehouse based on business processes and user requirements.

• It follows a bottom-up development approach, starting with the creation of data marts that address immediate business requirements. These data marts are then integrated to form the complete data warehouse.

• Kimball's approach involves the use of Extract, Transform, Load (ETL) processes that are specifically designed for dimensional models. This ensures the transformation of source data into a format optimized for reporting and analysis.
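An ETL process aimed at a dimensional model can be sketched as below. This is a simplified illustration under stated assumptions, not Kimball's prescribed procedure: the source data, the cleaning rules, and the tables `dim_product` and `fact_sales` are hypothetical. The load step fills the dimension first and then stores facts that reference its surrogate keys.

```python
import csv
import io
import sqlite3

# Extract: a hypothetical source export; real sources would be files or databases.
source = io.StringIO("product,amount\n milk ,2.50\nCREAM,3.00\nmilk,1.00\n")
raw_rows = list(csv.DictReader(source))

# Transform: clean product names and convert amounts to numbers.
clean_rows = [(r["product"].strip().lower(), float(r["amount"])) for r in raw_rows]

# Load: fill the dimension first, then facts that reference its surrogate keys.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT UNIQUE)"
)
conn.execute("CREATE TABLE fact_sales (product_key INTEGER, amount REAL)")
for name, amount in clean_rows:
    # Add the product only if it is not already in the dimension.
    conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (name,))
    key = conn.execute(
        "SELECT product_key FROM dim_product WHERE name = ?", (name,)
    ).fetchone()[0]
    conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (key, amount))

loaded = conn.execute(
    "SELECT p.name, SUM(f.amount) FROM fact_sales AS f"
    " JOIN dim_product AS p ON p.product_key = f.product_key"
    " GROUP BY p.name ORDER BY p.name"
).fetchall()
print(loaded)
```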

Inmon's Approach:
Inmon supports the creation of a centralized Enterprise Data Warehouse (EDW) as the foundation. This EDW serves as a single, integrated repository for the entire organization.

Inmon's approach follows a top-down development methodology. It begins with the creation of an enterprise-wide data warehouse and then focuses on building data marts to meet specific business needs.

Kimball VS Inmon’s Approach
Philosophy:
Kimball: Business-driven, iterative, and agile.
Inmon: Enterprise-centric, normalized, and long-term.
Data Model:
Kimball: Dimensional modeling, star or snowflake schemas.
Inmon: Normalized data model, 3NF.
Development Approach:
Kimball: Bottom-up development, starting with data marts.
Inmon: Top-down development, starting with the enterprise data warehouse.
Data Marts:
Kimball: Considers data marts as primary deliverables.
Inmon: Views data marts as subsets of the enterprise data warehouse.
Flexibility:
Kimball: Agile and adaptable to changing business needs.
Inmon: Emphasizes a stable and scalable architecture for long-term use.

Kimball approach: Main steps
1. Choose the subject: Clearly define the business objectives and scope of the data warehouse project.
2. Requirements Gathering: Collaborate closely with business users to gather their reporting and analysis requirements.
3. Dimensional Modeling: Develop dimensional models using star or snowflake schemas. Identify dimensions and facts.
4. ETL Design and Development: Create Extract, Transform, Load (ETL) processes based on dimensional models.
5. Data Mart Development: Develop data marts as subsets of the data warehouse, addressing specific business needs.
6. Business Intelligence Tools Integration: Choose and integrate business intelligence tools compatible with dimensional models.
