
What Is A Data Warehouse?

The document defines a data warehouse as a subject-oriented, integrated, non-volatile collection of historical data used to support management decision making. It contains summarized data from multiple operational databases. A data warehouse is organized around subjects like customers, products, sales rather than day-to-day operations. It provides a time-variant view of integrated data to support analysis and strategic decision making. Developing a data warehouse involves iterative planning, prototyping, and implementation to meet evolving analytical needs.

Uploaded by

Kishori Patil


What is a Data Warehouse?

Definition
A data warehouse is a stand-alone repository of information integrated from several, possibly heterogeneous, operational databases.
Bill Inmon's definition: a data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management's decision-making process.
It contains integrated, granular, historical data.

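Application-Orientation vs. Subject-Orientation

[Figure: application-orientation, where the operational database is organized around applications (Loans, Credit Card, Trust, Savings), vs. subject-orientation, where the data warehouse is organized around subjects (Customer, Vendor, Product, Activity)]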

Basic Features of Data Warehousing

Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. It focuses on the modeling and analysis of data rather than on day-to-day business operations.
Integrated: A data warehouse is constructed by integrating multiple heterogeneous data sources.
Time-variant: A data warehouse is a repository of historical data. It provides a view of the data over a designated time frame.
Non-volatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, or concurrency control mechanisms. It usually needs only two data-access operations: the initial loading of data and access to data.

Operational DBMS
A relational DBMS is often used for OLTP (online transaction processing).
It handles day-to-day operations such as banking, purchasing, manufacturing, registration, and accounting.
These systems typically get data into the database.
Each transaction processes information about a single entity.
Following are some examples of OLTP queries:
What is the price of a 2 GB Kingston pen drive?

Data Warehouse
The major purpose of maintaining a data warehouse is OLAP (online analytical processing).
Data warehousing systems are used for data analysis and strategic decision making.
Following are some examples of OLAP queries:
How is the student job placement rate changing over the years across different colleges of the HSNC board?
Is it financially viable to continue the production unit at location X?
Because OLAP queries involve a large amount of aggregation, a huge amount of data must be read before these queries can be answered conclusively.
The purpose of these queries is to support decision making.
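The contrast between the two query styles can be sketched with Python's built-in `sqlite3` module. The `product` and `sale` tables, their columns, and all prices and quantities are hypothetical and only for illustration. The OLTP query is a point lookup that touches one row. The OLAP query scans and aggregates many rows to show a trend.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                   sale_year INTEGER, quantity INTEGER);
""")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "2GB Kingston Pen drive", 350.0), (2, "USB mouse", 400.0)])
conn.executemany("INSERT INTO sale (product_id, sale_year, quantity) VALUES (?, ?, ?)",
                 [(1, 2022, 10), (1, 2023, 15), (2, 2022, 4), (2, 2023, 6)])

# OLTP-style query: a point lookup about a single entity.
price = conn.execute("SELECT price FROM product WHERE name = ?",
                     ("2GB Kingston Pen drive",)).fetchone()[0]

# OLAP-style query: reads and aggregates many rows to reveal a trend over time.
units_per_year = conn.execute("""
    SELECT sale_year, SUM(quantity) FROM sale
    GROUP BY sale_year ORDER BY sale_year
""").fetchall()

print(price)           # 350.0
print(units_per_year)  # [(2022, 14), (2023, 21)]
```

At warehouse scale the second query would scan millions of rows, which is why the two workloads are usually kept on separate systems.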

Characteristic | Operational Systems | Decision Support System
Data | Current values | Summarized historical data
Data structure | Optimized for transactions | Optimized for queries
Access frequency | High | Medium to low
Access type | Read, update, insert | Read
Usage | Predictable, repetitive | Ad hoc, random, heuristic
Response time | Sub-second | Several seconds to minutes
Users | Large number | Relatively small number

Operational Systems (OLTP) vs. Data Warehouse

OLTP | Warehouse (DSS)
Application oriented | Subject oriented
Used to run the business | Used to analyze the business
Detailed data | Summarized and refined data
Current, up-to-date data | Snapshot data
Isolated data | Integrated data
Repetitive access | Ad hoc access
Clerical users | Knowledge users (managers)
Performance sensitive | Performance relaxed
Few records accessed at a time (tens) | Large volumes accessed at a time (millions)
Read/update access | Mostly read access
Database size: 100 MB to 100 GB | Database size: 100 GB to a few terabytes
Thousands of users | Hundreds of users

Data Warehouse Architecture

Following are the various parts of a Data
Warehouse:

Source System
Source Data Transport Layer
Data Quality Control and Data Profiling Layer
Metadata Management Layer
Data Integration Layer
Data Processing Layer
End User Reporting Layer

Data Warehouse Architecture

Source System
The Source Systems are operational systems that
feed data into the data warehouse.
The databases in operational systems are
designed to handle business transactions.
Such databases make it difficult to access the data for other management or informational purposes.
Therefore, organizations require a data warehouse
that has integrated data from several operational
systems to understand customers, operations,
financial situation, product performance, trends
and a host of key business measurements.

Cntd

Source System
Data warehousing integrates data from several operational systems and combines it with information from other, often external, sources of data.
It is essential to identify the right data sources and to determine an efficient process for collecting the facts.

Source Data Transport Layer

This layer of the data warehouse architecture is concerned with the transmission of data from the source systems to the enterprise warehouse system.
Various tools and processes are involved in transporting data from the source systems to the enterprise warehouse system.

Data Quality Control and Data Profiling Layer
The data in a data warehouse must be complete and accurate.
Therefore, data quality must be examined before data from the source systems is loaded into the data warehouse.
Data profiling helps prevent data quality problems from being introduced into the data warehouse.
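A profiling check that runs before loading can be sketched in plain Python. The `profile` function, the sample rows, and the two metrics it reports (missing values per column and duplicate keys) are hypothetical choices for illustration. Real profiling tools compute many more statistics.

```python
# Hypothetical extract from a source system: (customer_id, name, city).
rows = [
    (1, "Asha", "Mumbai"),
    (2, "Ravi", None),
    (2, "Ravi K", "Pune"),
    (3, "", "Delhi"),
]

def profile(rows, key_index=0):
    """Report simple data-quality metrics for a list of equal-length tuples."""
    n_cols = len(rows[0])
    # Count None or empty-string values in each column.
    missing = [sum(1 for r in rows if r[i] is None or r[i] == "") for i in range(n_cols)]
    # Find key values that occur more than once.
    seen, duplicate_keys = set(), set()
    for r in rows:
        key = r[key_index]
        if key in seen:
            duplicate_keys.add(key)
        seen.add(key)
    return {"row_count": len(rows), "missing_per_column": missing,
            "duplicate_keys": sorted(duplicate_keys)}

report = profile(rows)
# Gate the load: reject the batch if any problem is found.
ok_to_load = not report["duplicate_keys"] and not any(report["missing_per_column"])
print(report)  # {'row_count': 4, 'missing_per_column': [0, 1, 1], 'duplicate_keys': [2]}
```

Because the check runs before loading, the duplicate key `2` and the missing values are caught while still in the source extract, not after they reach the warehouse.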

Metadata Management Layer

Metadata is the auxiliary descriptive data that tells the user and the analyst where data is located in the DW 2.0 environment.
Metadata describes the actual data.
It is important for designing, constructing, retrieving, and controlling the data warehouse.

Data Integration Layer

Integration is the process of combining data from different data sources to create a unified view of the data.
A lot of formatting and cleaning activities happen
in this layer so that the data is consistent across
the enterprise.
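The formatting and cleaning work in this layer can be sketched as one mapping function per source system. Each function converts that source's conventions to a shared target format. The two source layouts, their field names, and the target format (ISO 8601 dates, spelled-out gender, `C`-prefixed customer IDs) are hypothetical.

```python
from datetime import datetime

# Hypothetical records from two source systems that encode the same facts differently.
loans_system = [{"cust": "C001", "gender": "M", "opened": "03/15/2021"}]
cards_system = [{"customer_no": 1002, "sex": "female", "open_date": "2022-07-01"}]

def from_loans(rec):
    """Map a loans-system record to the unified format."""
    return {"customer_id": rec["cust"],
            "gender": {"M": "Male", "F": "Female"}[rec["gender"]],
            # Convert the US-style date MM/DD/YYYY to ISO 8601.
            "opened": datetime.strptime(rec["opened"], "%m/%d/%Y").date().isoformat()}

def from_cards(rec):
    """Map a cards-system record to the unified format."""
    return {"customer_id": f"C{rec['customer_no']:03d}",
            "gender": rec["sex"].capitalize(),
            "opened": rec["open_date"]}  # already ISO 8601

unified = [from_loans(r) for r in loans_system] + [from_cards(r) for r in cards_system]
print(unified)
# [{'customer_id': 'C001', 'gender': 'Male', 'opened': '2021-03-15'},
#  {'customer_id': 'C1002', 'gender': 'Female', 'opened': '2022-07-01'}]
```

Keeping one mapping per source isolates each system's quirks, so adding a new source means adding one function rather than changing the others.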

Data Processing Layer

This layer consists of data staging and the enterprise warehouse.
Data staging often involves complex programming, but warehouse tools that help with this process are increasingly available.
Staging may also involve data quality analysis programs and filters that identify patterns and structures within existing operational data.

End User Reporting Layer

Data Warehouse Options

There are many ways to develop an enterprise
data warehouse.
Following are the key factors to be considered
while developing a data warehouse.
Scope
Data redundancy
Type of end user

Scope
The scope of a data warehouse project can lie anywhere between two extremes:
Very broad scope (integrating all enterprise data from the beginning of time)
Very narrow scope (developing only a personal data warehouse for a single manager for a single year)
If the data warehouse is developed by taking all informational data for the entire enterprise from the beginning of time, it is very expensive and takes a large amount of time and money to build.
Therefore most organizations develop inexpensive
departmental data warehouses as first steps towards

Data Redundancy
There are essentially three levels of data redundancy
that enterprises should think about when
considering their data warehouse options:
"Virtual" or "Point-to-Point" Data Warehouses
Central Data Warehouses
Distributed Data Warehouses

Virtual Data Warehouses

This option provides end users with direct access to multiple operational databases through middleware tools.
The advantages of this approach are:
Flexibility
No data redundancy
Provides end-users with the
most current corporate
information
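A virtual data warehouse queries the operational sources directly instead of copying them. The sketch below uses SQLite's `ATTACH DATABASE` as a stand-in for the middleware layer. This is an assumption for illustration, since real deployments typically use federation or middleware products. The two in-memory databases and their `loan` and `card_txn` tables are hypothetical.

```python
import sqlite3

# Two separate "operational" databases; SQLite's ATTACH stands in for middleware.
conn = sqlite3.connect(":memory:")                   # hypothetical loans system
conn.execute("ATTACH DATABASE ':memory:' AS cards")  # hypothetical credit card system
conn.executescript("""
CREATE TABLE main.loan (customer_id TEXT, amount REAL);
CREATE TABLE cards.card_txn (customer_id TEXT, amount REAL);
INSERT INTO main.loan VALUES ('C001', 5000), ('C002', 12000);
INSERT INTO cards.card_txn VALUES ('C001', 250), ('C001', 100), ('C002', 40);
""")

# The query reads both live sources directly; nothing is copied into a warehouse.
exposure = conn.execute("""
    SELECT l.customer_id, l.amount AS loan_amount,
           COALESCE(SUM(c.amount), 0) AS card_spend
    FROM main.loan AS l
    LEFT JOIN cards.card_txn AS c ON c.customer_id = l.customer_id
    GROUP BY l.customer_id, l.amount
    ORDER BY l.customer_id
""").fetchall()
print(exposure)  # [('C001', 5000.0, 350.0), ('C002', 12000.0, 40.0)]
```

The results are always current and nothing is stored twice. The trade-off is that every analytical query puts load on the operational systems.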

Central Data Warehouses

It is a single physical repository that contains all data for a specific functional area, department, division, or enterprise.
A central data warehouse may contain information from multiple operational systems.
A central data warehouse contains time-variant data.
The advantages of this approach are:
Security
Ease of management
The disadvantages are:
Performance implications
Expansion is expensive
At times unreliable
Cost

Distributed Data Warehouses

Distributed data warehouses are those in which certain components are distributed across a number of different physical databases.
Increasingly, large organizations are pushing
decision-making down to lower and lower levels of
the organization and in turn pushing the data
needed for decision making down (or out) to the LAN
or local computer serving the local decision-maker.
Distributed Data Warehouses usually involve the
most redundant data.

Inmon's Approach to Distributed Data Warehouses

Type of End-User
Executives and managers
Power users (business and financial analysts,
engineers)
Support users (clerical, administrative)

Why Developing a Data Warehouse Is a Different Approach
To build a classical operational system, developers need to gather all requirements and build the complete system all at once.
This approach is well suited to the operational application environment, where processes are run repetitively and complete requirements can be gathered before a system is built.
Unlike a classical operational system, a data warehouse is built iteratively, one step at a time.
The first reason for this approach is that data warehouse projects tend to be large.
Another reason is that the requirements for a data warehouse are not known when it is first built.
This is because the end users of the data warehouse do not know exactly what they want.

Developing Data Warehouses

Developing a data warehouse involves activities such as careful planning, requirements definition, design, prototyping, and implementation.

Developing Strategy
The first and foremost step in developing a data warehouse is formulating a strategy that is appropriate for the organization's needs and its user population.
There are a number of strategies by which
organizations can get into data warehousing.
One way is to establish a "Virtual Data Warehouse"
environment by:
Installing a set of data access, data directory and
process management facilities
Training the end-users
Monitoring how the data warehouse facilities are
actually used
Creating, based on actual usage, a physical data warehouse to support the high-frequency requests
Cntd..

Developing Strategy
Another strategy to develop a data warehouse is as
follows:
Duplicate operational data from a single operational system
Then provide data warehouse features for it with the help of a series of information access tools

Cntd..

Developing Strategy
Ultimately, the optimal data warehousing strategy is
to select a user population based on value to the
enterprise and do an analysis of their issues,
questions and data access needs.
Based on these needs, prototype data warehouses
are built and populated so the end-users can
experiment and modify their requirements.
Based on these requirements, data can be extracted from various operational systems.
The needs of various enterprises are different and
hence there is no one approach to build a data
warehouse that will fit the needs of every enterprise.

Cntd..

Data Warehouse Design Considerations and Dimensional Modeling

Defining Dimensional Model (Star Schema Model)
The central theme of a dimensional model is the star schema.
It represents multidimensional data.
A star schema consists of:
a central fact table containing measures
a set of dimension tables
In the star schema model, the fact table is at the center of the star and the dimension tables are the points of the star.

Cntd..

Defining Dimensional Model (Star Schema Model)
A star schema represents one central set of facts.
The dimension tables contain descriptive attributes for each aspect (dimension) of those facts.
For example, in a warehouse that stores sales data, a sales fact table stores facts about sales, while dimension tables store data about locations, clients, items, times, and branches.
The primary key in each dimension table is related
to a foreign key in the fact table.
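A minimal star schema for the sales example can be sketched with `sqlite3`. To keep it short, only three of the dimensions (time, item, location) are modeled. All table names, column names, and rows are hypothetical. Each dimension has its own primary key, and the fact table holds a foreign key to each dimension plus the numeric measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one primary key each.
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, item_name TEXT, category TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, state TEXT);

-- Fact table at the center: a foreign key to every dimension plus numeric measures.
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    quantity     INTEGER,
    amount       REAL
);

INSERT INTO dim_time VALUES (1, '2024-01-05', '2024-01', 2024), (2, '2024-02-10', '2024-02', 2024);
INSERT INTO dim_item VALUES (1, 'Pen drive 2GB', 'Storage'), (2, 'Mouse', 'Peripherals');
INSERT INTO dim_location VALUES (1, 'Mumbai', 'Maharashtra'), (2, 'Pune', 'Maharashtra');
INSERT INTO fact_sales VALUES (1, 1, 1, 10, 3500), (1, 2, 2, 3, 1200), (2, 1, 2, 5, 1750);
""")

# A "star join": group by dimension attributes, aggregate the fact measures.
rows = conn.execute("""
    SELECT i.category, l.city, SUM(f.quantity) AS units
    FROM fact_sales f
    JOIN dim_item i     ON f.item_key = i.item_key
    JOIN dim_location l ON f.location_key = l.location_key
    GROUP BY i.category, l.city
    ORDER BY i.category, l.city
""").fetchall()
print(rows)  # [('Peripherals', 'Pune', 3), ('Storage', 'Mumbai', 10), ('Storage', 'Pune', 5)]
```

Every analytical query follows the same pattern: join the fact table to one hop of dimension tables, filter or group on dimension attributes, and aggregate the measures.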

Star Schema for Sales

One star can contain multiple facts.

Star Schema for Sales

Snowflake Schema
It is a modified version of the star schema.
In a star schema, if a dimension is complex and contains relationships such as hierarchies, it is flattened (denormalized) into a single dimension table.
Like the star schema model, the snowflake schema also represents a dimensional model containing a central fact table and a set of constituent dimension tables.
The major difference is that in a snowflake schema, complex dimensions are normalized into sub-dimension tables.
In a snowflake schema implementation, Warehouse Builder uses
more than one table or view to store the dimension data.
Separate tables store data pertaining to each level in the
dimension.
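Snowflaking can be sketched by moving one hierarchy level (here, a hypothetical item category) out of the item dimension into its own sub-dimension table. All names and rows are illustrative. The category is stored once per category instead of once per item. The cost is one extra join when rolling up to that level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked item dimension: the category level moves into its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT,
                       category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sales (item_key INTEGER REFERENCES dim_item(item_key), quantity INTEGER);

INSERT INTO dim_category VALUES (1, 'Storage'), (2, 'Peripherals');
INSERT INTO dim_item VALUES (1, 'Pen drive 2GB', 1), (2, 'Hard disk 1TB', 1), (3, 'Mouse', 2);
INSERT INTO fact_sales VALUES (1, 10), (2, 2), (3, 3), (1, 5);
""")

# Rolling up to the category level needs an extra join: fact -> item -> category.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.quantity)
    FROM fact_sales f
    JOIN dim_item i     ON f.item_key = i.item_key
    JOIN dim_category c ON i.category_key = c.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(rows)  # [('Peripherals', 3), ('Storage', 17)]
```

In the star version, `category_name` would simply be a column of `dim_item`. That is less normalized, but the query needs one join fewer.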

Star Schema vs. Snowflake Schema

[Figure: side-by-side diagrams labeled Star Schema and Snowflake Schema]

Granularity of Facts

Granularity refers to the level of detail in a fact table.
The highest (finest) level of granularity is generally advisable: for the highest level of granularity, data should be kept at the most detailed level.
Low granularity means data is summarized or aggregated.
Each non-key column in a fact table must be at the same level of granularity.
Say for example following levels of granularity can
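A small sketch shows why the finest grain is preferred. Daily (fine-grained) facts can always be rolled up to a coarser grain such as month, but monthly totals cannot be split back into days. The rows and the `roll_up_to_month` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fine-grained facts: one row per (date, item) with units sold.
daily = [
    ("2024-01-05", "pen drive", 4),
    ("2024-01-20", "pen drive", 6),
    ("2024-02-02", "pen drive", 5),
    ("2024-02-02", "mouse", 2),
]

def roll_up_to_month(rows):
    """Aggregate daily rows to (month, item) grain; 'YYYY-MM-DD'[:7] is the month."""
    totals = defaultdict(int)
    for day, item, units in rows:
        totals[(day[:7], item)] += units
    return dict(sorted(totals.items()))

monthly = roll_up_to_month(daily)
print(monthly)
# {('2024-01', 'pen drive'): 10, ('2024-02', 'mouse'): 2, ('2024-02', 'pen drive'): 5}
```

The monthly value 10 no longer says that it came from 4 units on one day and 6 on another. Any question at the daily level therefore needs the detailed rows.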

Additivity of Facts

A fact is something measurable: typically a numerical value that can be aggregated.
Following are the three types of facts:
Additive: facts that are additive across all
dimensions.
Semi-Additive: facts that are additive across some
of the dimensions, but not all.
Non-Additive: facts that are not additive across
any dimension.
Additive facts are the most useful facts.

Cntd..

Additivity of Facts
In general, facts representing individual transactions are fully additive, although cumulative totals are semi-additive.
Non-additive facts are usually the result of ratio or
other calculations.

Additivity of Facts
Example of an additive fact:
Dimensions: Time, Customer, Item, Location, Branch
Fact: Quantity
Quantity can be summed over all of the dimensions (Time, Customer, Item, Location, Branch).
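Additivity can be checked directly with a few hypothetical fact rows that use the five dimensions above. Grouping Quantity by any single dimension and adding up the group totals gives the same grand total. The `total_quantity` helper is an illustrative name, not part of any library.

```python
# Hypothetical sales facts: five dimension values plus the additive Quantity fact.
facts = [
    {"time": "2024-01", "customer": "C1", "item": "pen drive", "location": "Mumbai", "branch": "B1", "quantity": 4},
    {"time": "2024-01", "customer": "C2", "item": "mouse", "location": "Pune", "branch": "B2", "quantity": 3},
    {"time": "2024-02", "customer": "C1", "item": "mouse", "location": "Mumbai", "branch": "B1", "quantity": 2},
    {"time": "2024-02", "customer": "C2", "item": "pen drive", "location": "Pune", "branch": "B2", "quantity": 6},
]
DIMENSIONS = ["time", "customer", "item", "location", "branch"]

def total_quantity(rows, group_by):
    """Sum Quantity grouped by the given dimension names."""
    totals = {}
    for row in rows:
        key = tuple(row[d] for d in group_by)
        totals[key] = totals.get(key, 0) + row["quantity"]
    return totals

grand_total = sum(row["quantity"] for row in facts)
# Additive: summing the per-group totals along any dimension reproduces the grand total.
consistent = all(sum(total_quantity(facts, [d]).values()) == grand_total for d in DIMENSIONS)
print(grand_total, consistent)           # 15 True
print(total_quantity(facts, ["item"]))   # {('pen drive',): 10, ('mouse',): 5}
```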

Additivity of Facts
Example of a semi-additive fact: suppose a bank has the following table to store the current balance of each account at the end of each day.
Dimensions: Date, Account
Fact: Current_Balance
Current_Balance cannot be summed across the time (Date) dimension: adding up an account's daily balances over several dates does not give a meaningful figure. It can, however, be summed across accounts for a single date.
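The bank example can be sketched with hypothetical end-of-day balances. Summing across accounts for one date is valid. Across dates, a semi-additive fact is usually summarized with another aggregate, such as the closing (latest) balance or the average. These are common choices, not the only possible ones.

```python
# Hypothetical end-of-day balances: (date, account, current_balance).
balances = [
    ("2024-03-01", "A1", 1000), ("2024-03-01", "A2", 500),
    ("2024-03-02", "A1", 1200), ("2024-03-02", "A2", 500),
    ("2024-03-03", "A1", 900),  ("2024-03-03", "A2", 700),
]

def total_on(date):
    """Valid: sum across the Account dimension for a single date."""
    return sum(b for d, _, b in balances if d == date)

def closing_balance(account):
    """Across the Date dimension, take the latest balance instead of summing."""
    rows = sorted((d, b) for d, a, b in balances if a == account)
    return rows[-1][1]

def average_balance(account):
    """Another valid summary across dates: the average daily balance."""
    values = [b for _, a, b in balances if a == account]
    return sum(values) / len(values)

print(total_on("2024-03-03"))   # 1600
print(closing_balance("A1"))    # 900
# Summing A1's balances over the three dates would give 3100, which is not a real balance.
```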

Additivity of Facts

Example of a non-additive fact:
Dimensions: Time, Customer, Item, Location, Branch
Facts: Price, Net_Profit_Margin
Price and Net_Profit_Margin cannot be summed across any dimension.
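Net_Profit_Margin is a ratio, so neither summing nor averaging the per-row values gives a correct total. A common approach, assumed here for illustration, is to store the additive components (net profit and revenue) as facts and compute the ratio after aggregating them. The rows and field names are hypothetical.

```python
# Hypothetical rows: net profit margin = net_profit / revenue for each sale.
sales = [
    {"item": "pen drive", "revenue": 1000.0, "net_profit": 100.0},  # margin 10%
    {"item": "mouse",     "revenue": 200.0,  "net_profit": 60.0},   # margin 30%
]

# Wrong: summing per-row ratios (about 0.40, meaningless as a margin).
summed_margins = sum(s["net_profit"] / s["revenue"] for s in sales)
# Also misleading: averaging per-row ratios (about 0.20) ignores sale sizes.
avg_of_margins = summed_margins / len(sales)

# Right: aggregate the additive components first, then divide (160 / 1200).
overall_margin = sum(s["net_profit"] for s in sales) / sum(s["revenue"] for s in sales)
print(round(overall_margin, 4))  # 0.1333
```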

Helper Tables
Helper tables usually take one of two forms:
Helper tables for multi-valued dimensions
Helper tables for complex hierarchies

Multi-Valued Dimensions

Complex Hierarchies
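A helper table for a multi-valued dimension can be sketched as a bridge between a fact table and a dimension. This example assumes a Kimball-style bridge with a weighting factor: a joint bank account is owned by two customers. The tables, names, and weights are hypothetical. Weighting each account's balance by ownership share keeps the per-customer totals consistent with the overall total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
-- Helper (bridge) table: one row per (account, customer) pair; the weights
-- for each account sum to 1 so that balances are not double-counted.
CREATE TABLE account_customer_bridge (account_key INTEGER, customer_key INTEGER, weight REAL);
CREATE TABLE fact_balance (account_key INTEGER, balance REAL);

INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO account_customer_bridge VALUES (10, 1, 0.5), (10, 2, 0.5), (11, 2, 1.0);
INSERT INTO fact_balance VALUES (10, 1000), (11, 300);
""")

rows = conn.execute("""
    SELECT c.name, SUM(f.balance * b.weight) AS weighted_balance
    FROM fact_balance f
    JOIN account_customer_bridge b ON f.account_key = b.account_key
    JOIN dim_customer c            ON b.customer_key = c.customer_key
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Asha', 500.0), ('Ravi', 800.0)]
```

Without the weights, the joint account's 1000 would be counted once for each owner, and the per-customer totals would add up to 2300 instead of 1300.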
