2 Data Warehousing
Yeow Wei Choong
Anne Laurent
Databases
§ Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age
§ Information, which is created from data, becomes the basis for decision making
Decision Support Systems
§ Created to facilitate the decision making process
§ So much information that it is difficult to extract it all from a traditional database
§ Need for a more comprehensive data storage facility
– Data Warehouse
Decision Support Systems
§ Extract Information from data to use as the basis for decision making
§ Used at all levels of the Organization
§ Tailored to specific business areas
§ Interactive
§ Ad Hoc queries to retrieve and display information
§ Combines historical operational data with business activities
4 Components of DSS
1. Data Store – The DSS Database
– Business Data
– Business Model Data
– Internal and External Data
2. Data Extraction and Filtering
– Extract and validate data from the operational database and the external data sources
4 Components of DSS
3. End-User Query Tools
– Create Queries that access either the Operational or the DSS database
4. End-User Presentation Tools
– Organize and Present the Data
[Figure: Data Store (Business Data), Data Extraction/Filtering, End-User Query Tools, End-User Presentation Tools]
Differences with DSS
§ Operational
– Stored in Normalized Relational Database
– Support transactions that represent daily operations (Not Query Friendly)
§ 3 Main Differences
– Time Span
– Granularity
– Dimensionality
Time Span
§ Operational
– Real Time
– Current Transactions
– Short Time Frame
– Specific Data Facts
§ DSS
– Historic
– Long Time Frame (Months/Quarters/Years)
– Patterns
Granularity
§ Operational
– Specific Transactions that occur at a given time
§ DSS
– Shown at different levels of aggregation
– Different Summary Levels
– Decompose (drill down)
– Summarize (roll up)
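Roll up and drill down can be sketched in a few lines of Python. This is only an illustration under assumed data: the `transactions` rows and the `roll_up` helper are hypothetical names invented for the example, not part of any DSS product.

```python
from collections import defaultdict

# Hypothetical operational rows: (year, quarter, month, amount).
transactions = [
    (2023, "Q1", "Jan", 100),
    (2023, "Q1", "Feb", 150),
    (2023, "Q2", "Apr", 200),
    (2024, "Q1", "Jan", 120),
]

def roll_up(rows, level):
    """Sum amounts, grouping on the first `level` columns of each row.

    level=1 groups by year, level=2 by (year, quarter),
    level=3 by (year, quarter, month).
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row[:level]] += row[-1]
    return dict(totals)

by_year = roll_up(transactions, 1)     # most summarized view (rolled up)
by_quarter = roll_up(transactions, 2)  # one level of drill down
by_month = roll_up(transactions, 3)    # finest level available in this sample
```

Moving from `by_year` to `by_quarter` to `by_month` is a drill down; reading the same sequence in reverse is a roll up.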
Dimensionality
§ Most distinguishing characteristic of DSS data
§ Operational
– Represents atomic transactions
§ DSS
– Data is related in many ways
– Develop the larger picture
– Multi-dimensional view of data
[Figure: Data Cube]
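A multi-dimensional view can be approximated with a dictionary keyed by one coordinate per dimension. The sketch below assumes three dimensions (time, location, product) and made-up unit counts; `slice_cube` and `total_by` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical sales facts keyed by (time, location, product).
cube = {
    ("2023-Q1", "North", "Widget"): 10,
    ("2023-Q1", "South", "Widget"): 7,
    ("2023-Q1", "North", "Gadget"): 4,
    ("2023-Q2", "North", "Widget"): 12,
}

TIME, LOCATION, PRODUCT = 0, 1, 2  # position of each dimension in the key

def slice_cube(cube, axis, value):
    """Keep only the cells whose coordinate on `axis` equals `value`."""
    return {key: units for key, units in cube.items() if key[axis] == value}

def total_by(cube, axis):
    """Sum the measure over all other dimensions, leaving only `axis`."""
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[key[axis]] += units
    return dict(totals)

q1_slice = slice_cube(cube, TIME, "2023-Q1")  # a time slice of the cube
units_by_location = total_by(cube, LOCATION)  # same facts, viewed by location
units_by_product = total_by(cube, PRODUCT)    # same facts, viewed by product
```

The same facts answer different questions depending on which dimension is kept, which is what separates this view from a list of atomic transactions.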
DSS Database Requirements
§ DSS Database Scheme
– Support Complex and Non-Normalized data
§ Summarized and Aggregate data
§ Multiple Relationships
§ Queries must extract multi-dimensional time slices
§ Redundant Data
Non-Normalized Data
DSS Database Requirements
§ Data Extraction and Filtering
– DSS databases are created mainly by extracting data from operational databases combined with data imported from external sources
§ Need for advanced data extraction & filtering tools
§ Allow batch / scheduled data extraction
§ Support different types of data sources
§ Check for inconsistent data / data validation rules
§ Support advanced data integration / data formatting conflicts
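The extraction and filtering requirements above can be illustrated with a small batch step. Everything here is an assumption made for the example: the row layouts, the two date formats, the rule that amounts must be non-negative, and the helper names `parse_iso`, `parse_us`, and `extract`.

```python
from datetime import date

# Hypothetical rows from an operational database (ISO dates).
operational_rows = [
    {"order_id": 1, "sold_on": "2024-01-15", "amount": "120.50"},
    {"order_id": 2, "sold_on": "2024-02-30", "amount": "80.00"},  # impossible date
    {"order_id": 3, "sold_on": "2024-03-01", "amount": "-5.00"},  # negative amount
]
# Hypothetical rows from an external source (MM/DD/YYYY dates assumed).
external_rows = [
    {"order_id": 4, "sold_on": "03/04/2024", "amount": "42"},
]

def parse_iso(text):
    return date.fromisoformat(text)

def parse_us(text):
    month, day, year = (int(part) for part in text.split("/"))
    return date(year, month, day)

def extract(rows, parse_date):
    """Return (clean, rejected) after converting formats and validating."""
    clean, rejected = [], []
    for row in rows:
        try:
            record = {
                "order_id": row["order_id"],
                "sold_on": parse_date(row["sold_on"]),  # resolve format conflict
                "amount": float(row["amount"]),
            }
            if record["amount"] < 0:                    # validation rule
                raise ValueError("negative amount")
        except ValueError:
            rejected.append(row)
        else:
            clean.append(record)
    return clean, rejected

clean_op, rejected_op = extract(operational_rows, parse_iso)
clean_ext, rejected_ext = extract(external_rows, parse_us)
dss_rows = clean_op + clean_ext  # integrated data ready to load
```

A real extraction tool would also schedule this step and log the rejected rows; the sketch only shows the validate-and-integrate idea.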
DSS Database Requirements
§ End-User Analytical Interface
– Must support advanced data modeling and data presentation tools
– Data analysis tools
– Query generation
– Must Allow the User to Navigate through the DSS
§ Size Requirements
– VERY Large – Terabytes
– Advanced Hardware (Multiple processors, multiple disk arrays, etc.)
Very Large Databases
Data Warehouse
§ The DSS-friendly data repository for the DSS is the DATA WAREHOUSE
Fact Table
Dimensions
§ Qualifying characteristics that provide additional perspectives to a given fact
– DSS data is almost always viewed in relation to other data
§ Dimensions are normally stored in dimension tables
Star Schema for Sales
[Figure: star schema for sales, with labels Members, Dimension Tables, Fact Table]
Attributes
§ Dimension Tables contain Attributes
§ Attributes are used to search, filter, or classify facts
§ Dimensions provide descriptive characteristics about the facts through their attributes
§ Must define common business attributes that will be used to narrow a search, group information, or describe dimensions (ex.: Time / Location / Product)
§ No mathematical limit to the number of dimensions (3-D makes it easy to model)
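As a rough illustration of attributes being used to narrow a search, the sketch below filters facts through the attributes of two dimension tables. The tables, keys, and the `units_where` helper are all hypothetical.

```python
# Hypothetical dimension tables: key -> attributes.
product_dim = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gadget", "category": "Hardware"},
    3: {"name": "Manual", "category": "Books"},
}
location_dim = {
    10: {"city": "Lyon", "region": "South"},
    20: {"city": "Lille", "region": "North"},
}

# Hypothetical facts: (product_key, location_key, units_sold).
sales_facts = [(1, 10, 5), (2, 10, 3), (3, 20, 8), (1, 20, 2)]

def units_where(facts, category=None, region=None):
    """Sum units for facts whose dimension attributes match the filters."""
    total = 0
    for product_key, location_key, units in facts:
        if category is not None and product_dim[product_key]["category"] != category:
            continue
        if region is not None and location_dim[location_key]["region"] != region:
            continue
        total += units
    return total

hardware_units = units_where(sales_facts, category="Hardware")
south_hardware_units = units_where(sales_facts, category="Hardware", region="South")
```

The facts carry only keys and a measure; every descriptive filter goes through a dimension attribute.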
Attribute Hierarchies
§ Provides a Top-Down data organization
– Aggregation
– Drill-down / Roll-Up data analysis
§ Attributes from different dimensions can be grouped to form a hierarchy
[Figure: Hierarchy: TIME]
Star Schema Representation
§ Fact and Dimensions are represented by physical tables in the data warehouse database
§ Fact tables are related to each dimension table in a Many to One relationship (Primary/Foreign Key Relationships)
§ Fact Table is related to many dimension tables
– The primary key of the fact table is a composite primary key from the dimension tables
§ Each fact table is designed to answer a specific DSS question
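The representation described above can be sketched with Python's built-in `sqlite3` module. The table and column names (`time_dim`, `location_dim`, `product_dim`, `sales_fact`) are assumptions chosen for the example; the slides do not prescribe them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, region TEXT, city TEXT);
CREATE TABLE product_dim  (product_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: many fact rows per dimension row (many to one),
-- with a composite primary key built from the dimension keys.
CREATE TABLE sales_fact (
    time_id     INTEGER NOT NULL REFERENCES time_dim(time_id),
    location_id INTEGER NOT NULL REFERENCES location_dim(location_id),
    product_id  INTEGER NOT NULL REFERENCES product_dim(product_id),
    units       INTEGER NOT NULL,
    PRIMARY KEY (time_id, location_id, product_id)
);
""")

conn.execute("INSERT INTO time_dim VALUES (1, 2024, 1)")
conn.execute("INSERT INTO location_dim VALUES (1, 'North', 'Lille')")
conn.execute("INSERT INTO product_dim VALUES (1, 'Widget')")
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 1, 5)")

def try_insert(row):
    """Return True if the fact row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert((1, 1, 1, 9))  # repeats an existing composite key
orphan_ok = try_insert((1, 1, 99, 9))    # product 99 is not in product_dim
```

Both inserts are rejected: the first by the composite primary key, the second by the foreign key to `product_dim`.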
Star Schema
§ The fact table is always the largest table in the star schema
§ Each dimension record is related to thousands of fact records
§ Star Schema facilitates data retrieval functions
§ DBMS (Database Management System) first searches the Dimension Tables before the larger fact table
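A typical retrieval against a star schema restricts on dimension attributes and aggregates the matching fact rows. The sketch below uses `sqlite3` with hypothetical tables and data. It shows only the shape of the query; the order in which tables are actually visited is chosen by the engine's query planner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE sales_fact   (time_id INTEGER, location_id INTEGER, units INTEGER);

INSERT INTO location_dim VALUES (1, 'North'), (2, 'South');
INSERT INTO time_dim     VALUES (1, 2023), (2, 2024);
INSERT INTO sales_fact   VALUES (1, 1, 10), (1, 2, 7), (2, 1, 12), (2, 2, 3), (2, 1, 5);
""")

# Restrict on dimension attributes, then aggregate the matching facts.
query = """
SELECT t.year, l.region, SUM(f.units) AS total_units
FROM sales_fact AS f
JOIN time_dim AS t     ON t.time_id = f.time_id
JOIN location_dim AS l ON l.location_id = f.location_id
WHERE l.region = 'North' AND t.year = 2024
GROUP BY t.year, l.region
"""
rows = conn.execute(query).fetchall()
```

Because the dimension tables are small, restricting on them first leaves only a few keys to look up in the much larger fact table.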
Data Warehouse Implementation
§ An Active Decision Support Framework
– Not a Static Database
– Always a Work in Process
– Complete Infrastructure for Company-Wide decision support
– Hardware / Software / People / Procedures / Data
– Data Warehouse is a critical component of the Modern DSS – But not the Only critical component
How many dimensions in this representation?
Inmon & Kimball
- The Two Pioneers