
2 Data Warehousing

The document discusses databases and data warehousing. It explains that data warehouses store large amounts of historical data from multiple sources to support decision making. Data warehouses integrate data from across an organization, are organized by subject area, contain time-varying data, and never delete data once it is loaded.

Uploaded by Wafa Benzaoui
Copyright: © All Rights Reserved

Data Warehousing

Yeow Wei Choong
Anne Laurent
Databases
§ Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age
§ Information, which is created by data, becomes the basis for decision making
Decision Support Systems
§ Created to facilitate the decision-making process
§ So much information that it is difficult to extract it all from a traditional database
§ Need for a more comprehensive data storage facility
– Data Warehouse
Decision Support Systems
§ Extract Information from data to use as the basis for decision making
§ Used at all levels of the Organization
§ Tailored to specific business areas
§ Interactive
§ Ad Hoc queries to retrieve and display information
§ Combines historical operational data with business activities
4 Components of DSS
1. Data Store: The DSS Database
– Business Data
– Business Model Data
– Internal and External Data
2. Data Extraction and Filtering
– Extract and validate data from the operational database and the external data sources
3. End-User Query Tools
– Create Queries that access either the Operational or the DSS database
4. End-User Presentation Tools
– Organize and Present the Data
[Figure: Data Store (Business Data), Data Extraction/Filtering, End-User Query Tools, End-User Presentation Tools]
Differences between Operational Data and DSS Data
§ Operational
– Stored in a Normalized Relational Database
– Supports transactions that represent daily operations (Not Query Friendly)
§ 3 Main Differences
– Time Span
– Granularity
– Dimensionality
Time Span
§ Operational
– Real Time
– Current Transactions
– Short Time Frame
– Specific Data Facts
§ DSS
– Historic
– Long Time Frame (Months/Quarters/Years)
– Patterns
Granularity
§ Operational
– Specific Transactions that occur at a given time
§ DSS
– Shown at different levels of aggregation
– Different Summary Levels
– Decompose (drill down)
– Summarize (roll up)
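Roll-up and drill-down can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical list of operational transactions `(date, city, amount)` at the finest grain; `roll_up`, `drill_down`, `month_of` and `quarter_of` are illustrative helper names, not a standard API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational data at the finest grain: one row per transaction.
transactions = [
    (date(2023, 1, 5), "Paris", 120.0),
    (date(2023, 1, 20), "Lyon", 80.0),
    (date(2023, 2, 3), "Paris", 50.0),
    (date(2023, 4, 11), "Lyon", 200.0),
]

def month_of(row):
    return (row[0].year, row[0].month)

def quarter_of(row):
    return (row[0].year, (row[0].month - 1) // 3 + 1)

def roll_up(rows, level):
    """Summarize rows to a coarser grain: sum the amounts per level(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[level(row)] += row[2]
    return dict(totals)

def drill_down(rows, coarse_level, coarse_value, fine_level):
    """Decompose one coarse cell (e.g. a quarter) into finer cells (e.g. months)."""
    subset = [row for row in rows if coarse_level(row) == coarse_value]
    return roll_up(subset, fine_level)

by_quarter = roll_up(transactions, quarter_of)
q1_by_month = drill_down(transactions, quarter_of, (2023, 1), month_of)
print(by_quarter)   # {(2023, 1): 250.0, (2023, 2): 200.0}
print(q1_by_month)  # {(2023, 1): 200.0, (2023, 2): 50.0}
```

The same detail rows answer both questions; only the grouping key changes, which is why a DSS keeps detailed data and derives the summary levels from it.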
Dimensionality
§ Most distinguishing characteristic of DSS data
§ Operational
– Represents atomic transactions
§ DSS
– Data is related in many ways
– Develop the larger picture
– Multidimensional view of data
[Figure: Data Cube]
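A multidimensional view can be pictured as a cube whose cells are indexed by one value per dimension. The sketch below assumes a hypothetical 3-D cube keyed by `(time, location, product)`; `slice_cube` fixes one dimension to a single value, and `aggregate` sums the measure over the dimensions that are not kept. Both names are illustrative only.

```python
from collections import defaultdict

# Hypothetical 3-D cube: (quarter, city, product) -> units sold.
cube = {
    ("Q1", "Paris", "Laptop"): 10,
    ("Q1", "Paris", "Phone"): 25,
    ("Q1", "Lyon", "Laptop"): 4,
    ("Q2", "Paris", "Laptop"): 12,
    ("Q2", "Lyon", "Phone"): 30,
}
AXES = ("time", "location", "product")

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value, leaving a 2-D sub-cube."""
    i = AXES.index(axis)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == value}

def aggregate(cube, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [AXES.index(axis) for axis in keep]
    totals = defaultdict(int)
    for key, v in cube.items():
        totals[tuple(key[i] for i in idx)] += v
    return dict(totals)

paris = slice_cube(cube, "location", "Paris")
per_product = aggregate(cube, ["product"])
print(paris)        # {('Q1', 'Laptop'): 10, ('Q1', 'Phone'): 25, ('Q2', 'Laptop'): 12}
print(per_product)  # {('Laptop',): 26, ('Phone',): 55}
```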
DSS Database Requirements
§ DSS Database Schema
– Support Complex and Non-Normalized data
§ Summarized and Aggregate data
§ Multiple Relationships
§ Queries must extract multidimensional time slices
§ Redundant Data
[Figure: Non-Normalized Data]
DSS Database Requirements
§ Data Extraction and Filtering
– DSS databases are created mainly by extracting data from operational databases combined with data imported from external sources
§ Need for advanced data extraction & filtering tools
§ Allow batch / scheduled data extraction
§ Support different types of data sources
§ Check for inconsistent data / data validation rules
§ Support advanced data integration / resolve data formatting conflicts
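The filtering requirements above (resolving formatting conflicts between sources, applying validation rules) can be sketched as a small transform step. This assumes two hypothetical sources that encode dates differently; the per-source `DATE_FORMATS` table and the non-negative-amount rule are illustrative assumptions, not rules from the slides.

```python
from datetime import datetime

# Hypothetical extracts from two sources that format the same fields differently.
source_a = [
    {"id": "1", "sold_on": "2023-01-05", "amount": "120.50"},
    {"id": "2", "sold_on": "2023-02-30", "amount": "80"},   # impossible date
]
source_b = [
    {"id": "3", "sold_on": "05/03/2023", "amount": "99.90"},
    {"id": "4", "sold_on": "11/04/2023", "amount": "-15"},  # violates the rule below
]

# Assumed date format per source: the formatting conflict is resolved here.
DATE_FORMATS = {"a": "%Y-%m-%d", "b": "%d/%m/%Y"}

def transform(row, source):
    """Convert one raw row to the warehouse format, or return None if invalid."""
    try:
        clean = {
            "id": int(row["id"]),
            "sold_on": datetime.strptime(row["sold_on"], DATE_FORMATS[source]).date(),
            "amount": float(row["amount"]),
        }
    except ValueError:
        return None  # unparseable field: reject the row
    # Example validation rule (an assumption): amounts must be non-negative.
    return clean if clean["amount"] >= 0 else None

loaded, rejected = [], []
for source, rows in (("a", source_a), ("b", source_b)):
    for row in rows:
        result = transform(row, source)
        if result is None:
            rejected.append(row)   # kept aside for inspection, not loaded
        else:
            loaded.append(result)
```

Keeping rejected rows instead of silently dropping them lets the inconsistencies be reported back to the owners of the source systems.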
DSS Database Requirements
§ End-User Analytical Interface
– Must support advanced data modeling and data presentation tools
– Data analysis tools
– Query generation
– Must Allow the User to Navigate through the DSS
§ Size Requirements
– VERY Large: Terabytes
– Advanced Hardware (Multiple processors, multiple disk arrays, etc.)
[Figure: Very Large Databases]
Data Warehouse
§ The DSS-friendly data repository for the DSS is the DATA WAREHOUSE
§ Definition: an Integrated, Subject-Oriented, Time-Variant, Nonvolatile database that provides support for decision making
[Figure: Data Warehouse]
Integrated
§ The data warehouse is a centralized, consolidated database that integrates data derived from the entire organization
– Multiple Sources
– Diverse Sources
– Diverse Formats
Subject-Oriented
§ Data is arranged and optimized to provide answers to questions from diverse functional areas
– Data is organized and summarized by topic
§ Sales / Marketing / Finance / Distribution / Etc.
Time-Variant
§ The Data Warehouse represents the flow of data through time
§ Can contain projected data from statistical models
§ Data is periodically uploaded, and then time-dependent data is recomputed
Nonvolatile
§ Once data is entered, it is NEVER removed
§ Represents the company’s entire history
– Near-term history is continually added to it
– Always growing
– Must support terabyte databases and multiprocessors
§ Read-Only database for data analysis and query processing
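The time-variant and nonvolatile properties can be sketched with SQLite from the Python standard library: each periodic load appends rows stamped with a snapshot date, and triggers reject any UPDATE or DELETE. The `stock_snapshot` table, its columns and the trigger names are hypothetical; this is one possible enforcement mechanism, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical snapshot table: one row per product per periodic load.
    CREATE TABLE stock_snapshot (
        snapshot_date TEXT    NOT NULL,
        product_id    INTEGER NOT NULL,
        units_on_hand INTEGER NOT NULL,
        PRIMARY KEY (snapshot_date, product_id)
    );
    -- Nonvolatile: once loaded, rows can be read but never changed or removed.
    CREATE TRIGGER snapshot_no_update BEFORE UPDATE ON stock_snapshot
    BEGIN SELECT RAISE(ABORT, 'snapshot rows are read-only'); END;
    CREATE TRIGGER snapshot_no_delete BEFORE DELETE ON stock_snapshot
    BEGIN SELECT RAISE(ABORT, 'snapshot rows are read-only'); END;
""")

def load_snapshot(conn, snapshot_date, levels):
    """Append one periodic batch; earlier snapshots remain untouched."""
    conn.executemany(
        "INSERT INTO stock_snapshot VALUES (?, ?, ?)",
        [(snapshot_date, pid, units) for pid, units in levels.items()],
    )
    conn.commit()

load_snapshot(conn, "2023-01-31", {1: 40, 2: 15})
load_snapshot(conn, "2023-02-28", {1: 32, 2: 20})

# Time-variant: the full history of product 1 is still available.
history = conn.execute(
    "SELECT snapshot_date, units_on_hand FROM stock_snapshot "
    "WHERE product_id = 1 ORDER BY snapshot_date"
).fetchall()
print(history)  # [('2023-01-31', 40), ('2023-02-28', 32)]
```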
Data Marts
§ Small Data Stores
§ More manageable data sets
§ Targeted to meet the needs of small groups within the organization
§ Small, Single-Subject data warehouse subset that provides decision support to a small group of people
[Figure: Data Marts]
OLAP
§ Online Analytical Processing Tools
§ DSS tools that use multidimensional data analysis techniques
– Support for a DSS data store
– Data extraction and integration filter
– Specialized presentation interface
12 Rules of a Data Warehouse
1. Data Warehouse and Operational Environments are Separated
2. Data is integrated
3. Contains historical data over a long period of time
4. Data is snapshot data captured at a given point in time
5. Data is subject-oriented
6. Mainly read-only with periodic batch updates
7. Development Life Cycle has a data-driven approach versus the traditional process-driven approach
8. Data contains several levels of detail
– Current, Old, Lightly Summarized, Highly Summarized
9. Environment is characterized by Read-only transactions to very large data sets
10. System that traces data sources, transformations, and storage
11. Metadata is a critical component
– Source, transformation, integration, storage, relationships, history, etc.
12. Contains a chargeback mechanism for resource usage that enforces optimal use of data by end users
OLAP
§ Need for More Intensive Decision Support
§ 4 Main Characteristics
– Multidimensional data analysis
– Advanced Database Support
– Easy-to-use end-user interfaces
– Support Client/Server architecture
Multidimensional Data Analysis Techniques
§ Advanced Data Presentation Functions
– 3-D graphics, Pivot Tables, Crosstabs, etc.
– Compatible with Spreadsheets & Statistical packages
– Advanced data aggregations, consolidation and classification across time dimensions
– Advanced computational functions
– Advanced data modeling functions
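A crosstab (pivot table) turns flat fact rows into a grid with one dimension on the rows and another on the columns, plus totals. The sketch below assumes hypothetical `(region, quarter, sales)` rows; `crosstab` is an illustrative name and the "Total" row/column convention is an assumption borrowed from spreadsheet pivot tables.

```python
from collections import defaultdict

# Hypothetical flat fact rows: (region, quarter, sales).
rows = [
    ("North", "Q1", 100), ("North", "Q2", 150),
    ("South", "Q1", 80), ("South", "Q2", 60), ("South", "Q2", 40),
]

def crosstab(rows):
    """Pivot flat rows into a region x quarter grid with row and column totals."""
    grid = defaultdict(lambda: defaultdict(int))
    for region, quarter, sales in rows:
        grid[region][quarter] += sales
        grid[region]["Total"] += sales      # row total
        grid["Total"][quarter] += sales     # column total
        grid["Total"]["Total"] += sales     # grand total
    return {region: dict(cols) for region, cols in grid.items()}

table = crosstab(rows)
columns = ["Q1", "Q2", "Total"]
print("Region".ljust(8) + "".join(c.rjust(7) for c in columns))
for region in ["North", "South", "Total"]:
    print(region.ljust(8) + "".join(str(table[region].get(c, 0)).rjust(7) for c in columns))
```

Pivoting (rotating) the view just means swapping which dimension indexes the rows and which indexes the columns.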
Advanced Database Support
§ Advanced Data Access Features
– Access to many kinds of DBMSs, flat files, and internal and external data sources
– Access to aggregated data warehouse data
– Advanced data navigation (drill-downs and roll-ups)
– Ability to map end-user requests to the appropriate data source
– Support for Very Large Databases
[Figure: Drill-down]
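One way to read "map end-user requests to the appropriate data source" is aggregate navigation: answer a request from the smallest pre-aggregated table whose grain is still fine enough. This is a sketch under stated assumptions: the table names, row counts and the day < month < quarter < year ordering are all hypothetical.

```python
# Time grains ordered from finest to coarsest.
GRAINS = ["day", "month", "quarter", "year"]

# Hypothetical physical tables: (table name, time grain, approximate row count).
SOURCES = [
    ("sales_fact_daily", "day", 50_000_000),
    ("sales_agg_monthly", "month", 600_000),
    ("sales_agg_quarterly", "quarter", 200_000),
]

def choose_source(requested_grain):
    """Pick the smallest table whose grain is at least as fine as requested."""
    level = GRAINS.index(requested_grain)
    candidates = [s for s in SOURCES if GRAINS.index(s[1]) <= level]
    if not candidates:
        raise ValueError(f"no source can answer at grain {requested_grain!r}")
    return min(candidates, key=lambda s: s[2])[0]

print(choose_source("year"))   # sales_agg_quarterly
print(choose_source("month"))  # sales_agg_monthly
print(choose_source("day"))    # sales_fact_daily
```

A drill-down from quarters to days then transparently switches the query from the small aggregate table to the large detail table.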
Easy-to-Use End-User Interface
§ Graphical User Interfaces (GUIs)
§ Much more useful if access is kept simple
Client/Server Architecture
§ Framework for the new systems to be designed, developed and implemented
§ Divide the OLAP system into several components that define its architecture
– Same Computer
– Distributed among several computers
OLAP Architecture
§ 3 Main Modules
– GUI
– Analytical Processing Logic
– Data-processing Logic
[Figure: OLAP Client/Server Architecture]
Relational OLAP
§ Relational Online Analytical Processing
– OLAP functionality using relational databases and familiar query tools to store and analyze multidimensional data
§ Multidimensional data schema support
§ Data access language & query performance for multidimensional data
§ Support for Very Large Databases
Multidimensional Data Schema Support
§ Decision Support Data tends to be
– Nonnormalized
– Duplicated
– Preaggregated
§ Star Schema
– Special Design technique for multidimensional data representations
– Optimize data query operations instead of data update operations
Star Schemas
§ Data Modeling Technique to map multidimensional decision support data into a relational database
§ Current Relational modeling techniques do not serve the needs of advanced data requirements
Star Schema
§ 4 Components
– Facts
– Dimensions
– Attributes
– Attribute Hierarchies
Facts
§ Numeric measurements (values) that represent a specific business aspect or activity
§ Stored in a fact table at the center of the star schema
§ The fact table contains facts that are linked through their dimensions
§ Can be computed or derived at run time
§ Updated periodically with data from operational databases
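The point that facts "can be computed or derived at run time" can be sketched with SQLite: the fact table stores only base measures, and revenue is derived inside the query. The `sales_fact` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table storing only base measures.
conn.execute("CREATE TABLE sales_fact (time_id INTEGER, product_id INTEGER, "
             "units_sold INTEGER, unit_price REAL)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 10, 3, 2.50), (1, 11, 1, 4.00), (2, 10, 2, 2.50)])

# Revenue is a derived fact: computed when queried instead of being stored.
rows = conn.execute(
    "SELECT time_id, SUM(units_sold * unit_price) AS revenue "
    "FROM sales_fact GROUP BY time_id ORDER BY time_id"
).fetchall()
print(rows)  # [(1, 11.5), (2, 5.0)]
```

Storing the derived value instead would trade extra space and update work for faster reads; which side to favor is a design decision.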
[Figure: Star Schema for Sales, showing the Fact Table and the Dimension Tables]
Dimensions
§ Qualifying characteristics that provide additional perspectives to a given fact
– DSS data is almost always viewed in relation to other data
§ Dimensions are normally stored in dimension tables
[Figure: Star Schema for Sales, showing the Fact Table, the Dimension Tables, and their Members]
Attributes
§ Dimension Tables contain Attributes
§ Attributes are used to search, filter, or classify facts
§ Dimensions provide descriptive characteristics about the facts through their attributes
§ Must define common business attributes that will be used to narrow a search, group information, or describe dimensions (e.g., Time / Location / Product)
§ No mathematical limit to the number of dimensions (3-D makes it easy to model)
Attribute Hierarchies
§ Provide a Top-Down data organization
– Aggregation
– Drill-down / Roll-Up data analysis
§ Attributes from different dimensions can be grouped to form a hierarchy
[Figure: Hierarchy: TIME]
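A TIME attribute hierarchy (year, then quarter, then month) can be sketched as a dimension table in which each row carries every level. The data, `HIERARCHY`, `totals_at` and `drill_down_level` are hypothetical; note that cells are keyed by the full path from the top, so quarter 1 of 2023 is not merged with quarter 1 of 2024.

```python
from collections import defaultdict

# Hypothetical TIME dimension: each row carries every level of the hierarchy.
HIERARCHY = ["year", "quarter", "month"]  # top-down order
time_dim = {
    1: {"year": 2023, "quarter": 1, "month": 1},
    2: {"year": 2023, "quarter": 1, "month": 3},
    3: {"year": 2023, "quarter": 2, "month": 5},
    4: {"year": 2024, "quarter": 1, "month": 2},
}
facts = [(1, 100), (2, 40), (3, 70), (4, 90)]  # (time_id, sales)

def totals_at(level):
    """Aggregate sales at a hierarchy level, keyed by the path from the top."""
    depth = HIERARCHY.index(level) + 1
    out = defaultdict(int)
    for time_id, sales in facts:
        row = time_dim[time_id]
        out[tuple(row[name] for name in HIERARCHY[:depth])] += sales
    return dict(out)

def drill_down_level(level):
    """Next level down the hierarchy, or None at the finest level."""
    i = HIERARCHY.index(level)
    return HIERARCHY[i + 1] if i + 1 < len(HIERARCHY) else None

print(totals_at("year"))     # {(2023,): 210, (2024,): 90}
print(totals_at("quarter"))  # {(2023, 1): 140, (2023, 2): 70, (2024, 1): 90}
```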
[Figure: Star Schema for Sales, showing the Fact Table, the Dimension Tables, and their Members]
Star Schema Representation
§ Facts and Dimensions are represented by physical tables in the data warehouse database
§ Fact tables are related to each dimension table in a Many-to-One relationship (Primary/Foreign Key Relationships)
§ The Fact Table is related to many dimension tables
– The primary key of the fact table is a composite primary key formed from the primary keys of the dimension tables
§ Each fact table is designed to answer a specific DSS question
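The representation rules above can be sketched as SQLite DDL run from Python: three hypothetical dimension tables, a fact table whose composite primary key is made of the dimension keys, and foreign keys giving the many-to-one links. Table and column names are illustrative assumptions; the final query answers one example DSS question (units sold per quarter and region).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
    CREATE TABLE product_dim  (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- Many fact rows per dimension row; composite PK built from the dimension keys.
    CREATE TABLE sales_fact (
        time_id     INTEGER NOT NULL REFERENCES time_dim(time_id),
        location_id INTEGER NOT NULL REFERENCES location_dim(location_id),
        product_id  INTEGER NOT NULL REFERENCES product_dim(product_id),
        units_sold  INTEGER NOT NULL,
        PRIMARY KEY (time_id, location_id, product_id)
    );

    INSERT INTO time_dim VALUES (1, 2023, 1), (2, 2023, 2);
    INSERT INTO location_dim VALUES (1, 'Paris', 'North'), (2, 'Montpellier', 'South');
    INSERT INTO product_dim VALUES (1, 'Laptop', 'Computers'), (2, 'Phone', 'Mobile');
    INSERT INTO sales_fact VALUES (1, 1, 1, 10), (1, 2, 2, 25), (2, 1, 2, 5), (2, 2, 1, 7);
""")

# Example DSS question: units sold per quarter and region.
answer = conn.execute("""
    SELECT t.year, t.quarter, l.region, SUM(f.units_sold)
    FROM sales_fact f
    JOIN time_dim t     ON f.time_id = t.time_id
    JOIN location_dim l ON f.location_id = l.location_id
    GROUP BY t.year, t.quarter, l.region
    ORDER BY t.year, t.quarter, l.region
""").fetchall()
print(answer)
```

The composite key means a second row for the same (time, location, product) combination is rejected, and a fact row pointing at a non-existent dimension row fails the foreign-key check.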
Star Schema
§ The fact table is always the largest table in the star schema
§ Each dimension record is related to thousands of fact records
§ The Star Schema facilitates data retrieval functions
§ The DBMS (Database Management System) first searches the Dimension Tables before the larger fact table
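The idea of searching the small dimension tables before the large fact table can be illustrated with a hand-written "star join": resolve each dimension filter to a set of keys, then make one pass over the fact rows keeping only those whose foreign keys fall in every set. This is a simplified sketch with hypothetical in-memory tables; a real DBMS chooses its own plan and may use indexes or other techniques instead.

```python
# Hypothetical small dimension tables: key -> attributes.
time_dim = {1: {"year": 2023, "quarter": 1}, 2: {"year": 2023, "quarter": 2}}
location_dim = {1: {"region": "North"}, 2: {"region": "South"}, 3: {"region": "South"}}

# Hypothetical large fact table: (time_id, location_id, units_sold).
sales_fact = [(1, 1, 10), (1, 2, 25), (2, 3, 5), (2, 2, 7), (1, 3, 4)]

def star_join(time_filter, location_filter):
    """Filter the small dimensions first, then scan the fact table once."""
    time_keys = {k for k, row in time_dim.items() if time_filter(row)}
    location_keys = {k for k, row in location_dim.items() if location_filter(row)}
    return sum(units for t, loc, units in sales_fact
               if t in time_keys and loc in location_keys)

# Units sold in the South region during Q1 2023.
south_q1 = star_join(lambda r: r["year"] == 2023 and r["quarter"] == 1,
                     lambda r: r["region"] == "South")
print(south_q1)  # 29
```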
Data Warehouse Implementation
§ An Active Decision Support Framework
– Not a Static Database
– Always a Work in Progress
– Complete Infrastructure for Company-Wide decision support
– Hardware / Software / People / Procedures / Data
– The Data Warehouse is a critical component of the Modern DSS, but not the Only critical component
How many dimensions are there in this representation?
Inmon & Kimball: The Two Pioneers
William Inmon, Ralph Kimball
The Books
Exercise 1
§ What is the difference between a database and a data warehouse?
§ Describe the following:
– Data Marts
– Granularity
§ Draw an example of a Star Schema
Exercise 2
§ Match the typical OLAP operations:
– Roll up (drill-up)
– Drill down (roll down)
– Slice and dice
– Pivot (Rotate)
– Drill across
– Drill through
Exercise 3
1. What are the main dimensions of this fact table?
2. How are these dimensions linked to the fact table?
3. How can these operations permit fast queries, reports and data consolidations to be carried out?
4. From this star schema, construct an example of a query
END
