Unit-I DW - Architecture

The document provides an overview of data warehousing and data mining, including definitions of key concepts such as data warehousing, data warehouse, and OLAP. It discusses the benefits of data warehousing such as improved decision making and competitive advantage. Additionally, it contrasts data warehouses with operational databases and OLTP systems in terms of design, data contents, access patterns, and users.

Uploaded by

Harish Babu

DATA WAREHOUSING & DATA MINING

by:
Prof. Asha Ambhaikar

1
UNIT-I

Overview and Concepts

2
Contents of Unit-I

 Need for data warehousing, Basic elements of data warehousing, Trends in data warehousing.
 Planning and Requirements: Project planning and management, Collecting the requirements.
 Architecture and Infrastructure: Architectural components, Infrastructure and metadata.

3
Text Books:

 Prabhu, “Data Warehousing: Concepts, Techniques, Products and Applications”, Prentice Hall of India.
 Soman K. P., “Insight into Data Mining: Theory & Practice”, Prentice Hall of India.
 M. H. Dunham, “Data Mining: Introductory and Advanced Topics”, Pearson Education.

4
Reference Books:

 Paulraj Ponniah, “Data Warehousing Fundamentals”, John Wiley.
 Arun K. Pujari, “Data Mining Techniques”, Universities Press.
 Ralph Kimball, “The Data Warehouse Lifecycle Toolkit”, John Wiley.
 IBM, “Introduction to Building the Data Warehouse”, PHI.

5
What is Data Warehousing?

A process of transforming data into information and making it available to users in a timely enough manner to make a difference.

[Figure: Data is transformed into Information]

6
Data Warehousing --
It is a process
 A technique for assembling and managing data from various sources for the purpose of answering business questions, thus making possible decisions that were not previously possible
 A decision support database maintained separately from the organization’s operational database
7
What is a Data Warehouse?

 “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
 Data warehousing:
 The process of constructing and using data warehouses
8
Data Warehouse—Subject-Oriented

 Organized around major subjects, such as customer, product, sales.
 Focusing on the modeling and analysis of data for
decision makers, not on daily operations or transaction
processing.
 Provide a simple and concise view around particular
subject issues by excluding data that are not useful in
the decision support process.

9
Data Warehouse—Integrated
 Constructed by integrating multiple,
heterogeneous data sources
 relational databases, flat files, on-line transaction
records
 Data cleaning and data integration techniques
are applied.
 Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different
data sources
 E.g., Hotel price: currency, tax, breakfast covered, etc.
 When data is moved to the warehouse, it is
converted.
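
The hotel price example can be sketched in a few lines of Python. This is only an illustration of conversion at load time: the exchange rates, the tax rate, the record fields and the `integrate` function are all assumptions made for the sketch, not part of any real system.

```python
from dataclasses import dataclass

# Hypothetical conversion and tax rates, chosen only for illustration.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10}
TAX_RATE = 0.10


@dataclass
class HotelPrice:
    hotel: str
    amount_usd: float         # warehouse convention: always USD
    tax_included: bool        # warehouse convention: always True
    breakfast_included: bool


def integrate(record: dict) -> HotelPrice:
    """Convert one source record to the single warehouse convention."""
    amount = record["price"] * RATES_TO_USD[record["currency"]]
    if not record["tax_included"]:
        amount *= 1 + TAX_RATE  # add tax so every stored price includes it
    return HotelPrice(record["hotel"], round(amount, 2), True, record["breakfast"])


# Two sources that encode the same kind of fact differently.
source_a = {"hotel": "Alpha", "price": 100.0, "currency": "USD",
            "tax_included": False, "breakfast": True}
source_b = {"hotel": "Beta", "price": 100.0, "currency": "EUR",
            "tax_included": True, "breakfast": False}

integrated = [integrate(r) for r in (source_a, source_b)]
for row in integrated:
    print(row)
```

After conversion both prices are comparable, because currency and tax treatment follow one convention.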

10
Data Warehouse—Time Variant

 The time horizon for the data warehouse is significantly longer than that of operational systems.
 Operational database: current value data.
 Data warehouse data: provide information from a
historical perspective (e.g., past 5-10 years)
 Every key structure in the data warehouse
 Contains an element of time, explicitly or implicitly
 But the key of operational data may or may not
contain “time element”.

11
Data Warehouse—Non-Volatile

 A physically separate store of data transformed from the operational environment.
 Operational update of data does not occur in the
data warehouse environment.
 Does not require transaction processing, recovery,
and concurrency control mechanisms
 Requires only two operations in data accessing:
 initial loading of data and access of data.

12
Very Large Data Bases
 Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes
 Petabytes -- 10^15 bytes: Geographic Information Systems
 Exabytes -- 10^18 bytes: National Medical Records
 Zettabytes -- 10^21 bytes: Weather images
 Yottabytes -- 10^24 bytes: Intelligence Agency Videos

13
Data Warehousing
 Physical separation of operational and decision
support environments
 Purpose: to establish a data repository making
operational data accessible
 Transforms operational data to relational form
 Only data needed for decision support come from the
TPS
 Data are transformed and integrated into a
consistent structure
 Data warehousing (information warehousing): solves
the data access problem
 End users perform ad hoc query, reporting analysis
and visualization
14
Evolution of Data Warehouse

15
Data Warehouse vs. Heterogeneous DBMS

 Traditional heterogeneous DB integration:
 Build on top of heterogeneous databases
 Query driven approach
 When a query is posed to a client site, a meta-dictionary is
used to translate the query into queries appropriate for
individual heterogeneous sites involved, and the results are
integrated into a global answer set
 Complex information filtering, compete for resources
 Data warehouse: update-driven, high performance
 Information from heterogeneous sources is integrated in
advance and stored in warehouses for direct query and
analysis

16
Benefits of Data Warehouse

 Better information
 Better strategies and plans
 Better tactics and decisions
 More efficient processes
 Time saving
 Reduction in paper reporting

17
Data Warehousing Benefits

 Increase in knowledge worker productivity
 Supports all decision makers’ data requirements
 Provides ready access to critical data
 Insulates operational databases from ad hoc processing
 Provides high-level summary information
 Provides drill down capabilities
Yields
 Improved business knowledge
 Competitive advantage
 Enhanced customer service and satisfaction
 Easier decision making
 Streamlined business processes

18
Benefits of DW
 Executives, managers and staff are provided with improved access to data from many databases within the organization.
 Managers manage with the data they want rather than the data they get.
 Less time is spent gathering data from various systems and more time is available to analyze and act.
 Ability to quickly answer a series of questions, each of which depends upon the answer to the previous question (in seconds or minutes).
19
Data Warehouse vs. Operational DBMS
 OLTP (on-line transaction processing)
 Major task of traditional relational DBMS
 Day-to-day operations: purchasing, inventory, banking,
manufacturing, payroll, registration, accounting, etc.
 OLAP (on-line analytical processing)
 Major task of data warehouse system
 Data analysis and decision making
 Distinct features (OLTP vs. OLAP):
 User and system orientation: customer vs. market
 Data contents: current, detailed vs. historical, consolidated
 Database design: ER + application vs. star + subject
 View: current, local vs. evolutionary, integrated
 Access patterns: update vs. read-only but complex queries
20
OLTP vs. Data Warehouse

 OLTP systems are tuned for known transactions and workloads, while the workload is not known a priori in a data warehouse
 Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries)

21
OLTP vs Data Warehouse

OLTP                        Warehouse (DSS)
Application oriented        Subject oriented
Used to run business        Used to analyze business
Detailed data               Summarized and refined
Current, up to date         Snapshot data
Isolated data               Integrated data
Repetitive access           Ad hoc access
Clerical user               Knowledge user (manager)

22
OLTP vs Data Warehouse

OLTP                                     Data Warehouse
Performance sensitive                    Performance relaxed
Few records accessed at a time (tens)    Large volumes accessed at a time (millions)
Read/update access                       Mostly read (batch update)
No data redundancy                       Redundancy present
Database size 100 MB - 100 GB            Database size 100 GB - few terabytes

23
OLTP vs Data Warehouse

OLTP                                                 Data Warehouse
Transaction throughput is the performance metric     Query throughput is the performance metric
Thousands of users                                   Hundreds of users
Managed in entirety (whole)                          Managed by subsets

24
To summarize ...

 OLTP systems are used to “run” a business

 The data warehouse helps to “optimize” the business

25
OLTP vs. OLAP

                     OLTP                          OLAP
users                clerk, IT professional        knowledge worker
function             day-to-day operations         decision support
DB design            application-oriented          subject-oriented
data                 current, up-to-date,          historical, summarized,
                     detailed, flat relational,    multidimensional,
                     isolated                      integrated, consolidated
usage                repetitive                    ad hoc
access               read/write,                   lots of scans
                     index/hash on primary key
unit of work         short, simple transaction     complex query
# records accessed   tens                          millions
# users              thousands                     hundreds
DB size              100 MB-GB                     100 GB-TB
metric               transaction throughput        query throughput, response

26
The Goals of a Data Warehouse

27
Goals of Data Warehouse

Makes an organization’s information accessible.

Makes the organization’s information consistent.

Is an adaptive and durable source of information.

Is a secure support that protects the organization’s information asset.

Is the foundation for decision making.

28
Needs for Data Warehousing

29
Why Do We Need a Separate Data Warehouse?

 Missing data: decision support requires historical data, which operational DBs do not typically maintain
 Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
 Data quality: different sources typically use inconsistent data representations, codes and formats, which have to be reconciled

30
Trends in Data Warehouse

31
Three Complementary Trends
 Data Warehousing
– Consolidate data from many sources in one large repository.
– Loading, periodic synchronization of replicas.
– Semantic integration.
 OLAP:
– Complex SQL queries and views.
– Queries based on spreadsheet-style operations and “multidimensional” view of data.
– Interactive and “online” queries.
 Data Mining
– Exploratory search for interesting trends and anomalies.

32
Architecture and Infrastructure

33
Design of a Data Warehouse: A
Business Analysis Framework
 Four views regarding the design of a data warehouse
 Top-down view
 allows selection of the relevant information necessary for the
data warehouse
 Data source view
 exposes the information being captured, stored, and
managed by operational systems
 Data warehouse view
 consists of fact tables and dimension tables
 Business query view
 sees the perspectives of data in the warehouse from the view
of end-user
34
 Basic Elements of Data Warehouse

35
Basic Elements of a Data Warehouse

 Source System
 Staging Area
 Presentation Area
 End User Data Access Tools
 Metadata

36
Basic Elements of Data Warehouse

[Figure: relational databases, legacy data and purchased data pass through extraction, transformation and cleansing and an optimized loader into the data warehouse engine; query, reporting and data mining tools analyze the warehouse; a metadata repository accompanies the flow.]
37
Data Warehouse Architecture

[Figure: three-layer data warehouse architecture. Data sources (operational DBs and other sources) are extracted, transformed, loaded and refreshed, with a monitor and integrator and a metadata store, into the bottom layer (data storage: data warehouse and data marts). The middle layer is the OLAP server (OLAP engine), which serves the top layer of front-end tools: analysis tools, query and reporting tools, and data mining tools.]
38
Working of Data Warehouse

Bottom Layer:
 The bottom layer is a DW database server that is almost always a relational database system
 Data from operational databases and external sources are extracted using application program interfaces known as gateways
 It is supported by primary system
39
cont….
 It has a metadata repository (metadata is data about data)
 The repository is used to locate and extract information from the DW according to the queries given by the end users
 Metadata is the bridge between the DW and the DSS
 It provides a logical linkage between data and applications
 Metadata can pinpoint access to information across the entire DW.

40
Middle Layer:

 The middle layer consists of an OLAP server
 OLAP means On Line Analytical Processing
 It is used to perform analysis on data and transform it into useful information for decision making
 OLAP is a continuously iterative process
 OLAP servers are implemented as either ROLAP, MOLAP or HOLAP
41
Cont..

TOP Layer:
 The top layer is a client
 That is, the end user
 It consists of
1. Query and reporting tools
2. Analysis tools and
3. Data mining tools
 It acts as an interface between the user and the server
42
Cont..

 This layer takes queries from the users
 It then sends them to the servers
 It receives the resulting records back and
 Gives them as output to the end users.
 E.g., analysis for weather forecasting, predictions and so on.

43
Principles of Dimensional Modeling

44
Multidimensional Data Model

 Collection of numeric measures, which depend on a set of dimensions.
– E.g., measure Sales, dimensions Product (key: pid), Location (locid) and Time (timeid).

45
Multidimensional Data Models

 A data warehouse is based on a multidimensional data model which views data in the form of a data cube
 A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
 Dimension tables, such as item (item_name, brand, type), or time (day, week, month, quarter, year)
 Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
 In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
46
Cuboids Corresponding to the Cube

0-D (apex) cuboid: all
1-D cuboids: product; date; country
2-D cuboids: product, date; product, country; date, country
3-D (base) cuboid: product, date, country
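
The lattice can be enumerated mechanically: each cuboid corresponds to one subset of the dimensions, so n dimensions (ignoring concept hierarchies) give 2^n cuboids. A minimal Python sketch follows; the function name `cuboids` is an assumption made for this illustration.

```python
from itertools import combinations

dimensions = ("product", "date", "country")


def cuboids(dims):
    """Group every subset of the dimensions by size: level k holds the k-D cuboids."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


lattice = cuboids(dimensions)
for level, group in lattice.items():
    print(f"{level}-D cuboids:", group)

# Total number of cuboids in the lattice (2 ** number of dimensions).
total = sum(len(group) for group in lattice.values())
print("total:", total)
```

Level 0 is the apex cuboid (the empty grouping, "all") and level 3 is the base cuboid.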

47
Multidimensionality
 3-D + Spreadsheets (OLAP has this)
 Data can be organized the way managers like to see them, rather than the way that the system analysts do
 Different presentations of the same data can be arranged easily and quickly

 Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry
 Measures: money, sales volume, head count, inventory profit, actual versus forecast
 Time: daily, weekly, monthly, quarterly, or yearly

48
Multidimensional Data
 Sales volume as a function of product, month, and region
Dimensions: Product, Location, Time
Hierarchical summarization paths:
Product: Industry > Category > Product
Location: Region > Country > City > Office
Time: Year > Quarter > Month or Week > Day

[Figure: a cube with axes Product, Location and Month]
49
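A Sample Data Cube

[Figure: a sample data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum) and Country (U.S.A., Canada, Mexico, sum); the highlighted cell is the total annual sales of TV in U.S.A.]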

50
Browsing a Data Cube

 Visualization
 OLAP capabilities
 Interactive manipulation
51
OLAP Operations
 OLAP means On Line Analytical Processing.
 It is used to perform analysis on data and transform it into information for decision-making purposes.
 OLAP is a continuous iterative process.
 A common operation is to aggregate a measure over one or more dimensions.
– Find total sales.
– Find total sales for each city, or for each state.
– Find top five products ranked by total sales.
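
The three example queries can be sketched in plain Python. The rows below are hypothetical data invented for the illustration, and `total_by` is an assumed helper name; a real OLAP server would run equivalent aggregations over the fact table.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, city, state, sales)
sales = [
    ("TV", "Austin", "TX", 300),
    ("PC", "Austin", "TX", 500),
    ("TV", "Dallas", "TX", 200),
    ("VCR", "Miami", "FL", 100),
    ("PC", "Miami", "FL", 400),
]


def total_by(rows, key_index):
    """Aggregate the sales measure over one dimension (the column at key_index)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)


total_sales = sum(row[3] for row in sales)   # find total sales
by_city = total_by(sales, 1)                 # total sales for each city
by_state = total_by(sales, 2)                # total sales for each state
top_products = sorted(total_by(sales, 0).items(),
                      key=lambda kv: kv[1], reverse=True)[:5]  # top five products

print(total_sales, by_city, by_state, top_products)
```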
52
Typical OLAP Operations

 Roll up (drill-up): summarize data

 Drill down (roll down): reverse of roll-up

 Slice and dice: project and select

 Pivot : rotate

53
Roll-up and Drill Down
Higher Level of
Aggregation
 Sales Channel

 Region

 Country

 State

 Location Address

 Sales Representative

Low-level
Details
54
Slicing and Dicing

[Figure: a cube with dimensions Product (Household, Telecomm, Video, Audio), Region (Europe, Far East, India) and Sales Channel (Retail, Direct, Special); the Telecomm slice is highlighted.]


55
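A Visual Operation: Pivot (Rotate)

[Figure: a cube face with Product (Juice, Cola, Milk, Cream) against Date (3/1, 3/2, 3/3, 3/4), showing sample cell values 10, 47, 30 and 12, rotated to give an alternative view.]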
56
Typical OLAP Operations

 Roll up (drill-up): summarize data
 by climbing up a hierarchy or by dimension reduction
 This operation performs aggregation on the data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.
 When roll up is performed by dimension reduction, one or more dimensions are removed from the given cube.
 Drill down (roll down): reverse of roll-up
 from higher level summary to lower level summary or detailed data, or introducing new dimensions
 It navigates from less detailed data to more detailed data.
 This can be realized by either stepping down a concept hierarchy for a dimension or introducing additional dimensions.
57
Cont…..
 Slice and dice:
 project and select
 The slice operation performs a selection on one dimension of the given cube, resulting in a subcube
 The dice operation defines a subcube by performing a selection on two or more dimensions
 Pivot (rotate):
 It is a visualization operation that rotates the data axes in the view in order to provide an alternative presentation of the data.
 reorient the cube, visualization, 3D to series of 2D planes.
58
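
The roll-up, slice, dice and pivot operations can be sketched on a tiny cube held in a Python dictionary. The cell values, the dimension names in `DIMS` and all function names are assumptions made for this illustration; drill-down is simply the move back from an aggregated result to the finer-grained cube.

```python
from collections import defaultdict

DIMS = ("product", "city", "country", "quarter")

# Hypothetical cube cells: (product, city, country, quarter) -> sales
cells = {
    ("TV", "Toronto", "Canada", "Q1"): 10,
    ("TV", "Vancouver", "Canada", "Q1"): 20,
    ("PC", "Toronto", "Canada", "Q2"): 30,
    ("TV", "Chicago", "USA", "Q1"): 40,
    ("PC", "Chicago", "USA", "Q2"): 50,
}


def roll_up(cube, keep):
    """Aggregate over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def dice(cube, **selected):
    """Keep cells whose values fall in the allowed set for each named dimension."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in allowed for d, allowed in selected.items())}


def slice_(cube, dim, value):
    """A slice is a selection on a single dimension."""
    return dice(cube, **{dim: {value}})


def pivot(table2d):
    """Swap the two axes of a 2-D view."""
    return {(b, a): v for (a, b), v in table2d.items()}


# Climb the location hierarchy from city to country (city is dropped).
by_country_quarter = roll_up(cells, ("product", "country", "quarter"))
# Dimension reduction: only country remains.
by_country = roll_up(cells, ("country",))
q1 = slice_(cells, "quarter", "Q1")
tv_canada = dice(cells, product={"TV"}, country={"Canada"})
view = roll_up(cells, ("product", "country"))
rotated = pivot(view)
print(by_country, len(q1), tv_canada, rotated)
```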
Cont…..

 Other operations
 drill across: Executes queries involving
(across) more than one fact table
 drill through: Operation uses relational SQL
facilities to drill through the bottom level of
the data cube to its back-end relational
tables

59
Physical Design Process

60
Stars, Snowflakes & Fact Constellations:

 A multidimensional model can exist in the form of a star schema, a snowflake schema or a fact constellation (collection) schema
 Star schema: In a star schema, a data warehouse contains:
 A large central table (fact table) containing the bulk of the data with no redundancy
 a set of dimension tables, one for each dimension
 Snowflake schema:
 A snowflake schema is a refinement of the star schema,
 where some dimension tables are normalized by splitting the data into additional tables.

61
Cont….

 The difference between the snowflake and star schema models is that the dimension tables of the snowflake model can be kept in a normalized form to reduce redundancy.
 Fact constellations:
 Multiple fact tables share dimension tables,
 viewed as a collection of stars,
 therefore called galaxy schema or fact constellation

62
Fact Table

 Central table
 mostly raw numeric items

 narrow rows, a few columns at most

 large number of rows (millions to a billion)

 Access via dimensions

63
Star Schema
 A single fact table and for each dimension
one dimension table
 Does not capture hierarchies directly
[Figure: a central fact table (date, custno, prodno, cityname, ...) linked to the dimension tables time, prod, cust and city.]
64
Snowflake schema
 Represent dimensional hierarchy directly
by normalizing tables.
 Easy to maintain and saves storage
[Figure: a fact table (date, custno, prodno, cityname, ...) linked to the dimension tables Time, Prod, Cust and city, with Region as a further normalized table.]

65
Example of Star Schema

Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales

Dimension tables:
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
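
A reduced version of this star schema can be tried with Python's built-in `sqlite3` module. This is a sketch under assumptions: only two of the four dimensions are kept, the tables are named `time_dim` and `item_dim` for the illustration, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
-- The fact table holds keys to the dimension tables plus numeric measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO item_dim VALUES (10, 'TV', 'Acme'), (11, 'PC', 'Acme');
INSERT INTO sales_fact VALUES (1, 10, 2, 800.0), (1, 11, 1, 900.0), (2, 10, 3, 1200.0);
""")

# A typical star join: the fact table is reached through its dimensions.
rows = conn.execute("""
    SELECT t.quarter, i.item_name, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON t.time_key = f.time_key
    JOIN item_dim i ON i.item_key = f.item_key
    GROUP BY t.quarter, i.item_name
    ORDER BY t.quarter, i.item_name
""").fetchall()
print(rows)
```

Each dimension needs exactly one join to the fact table, which is what gives the schema its star shape.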

66
Example of Snowflake Schema

Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales

Dimension tables:
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, province_or_state, country

67
Example of Fact Constellation

Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales

Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location
Measures: dollars_cost, units_shipped

Dimension tables:
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
shipper: shipper_key, shipper_name, location_key, shipper_type
68
A Concept Hierarchy: Dimension (location)

all all

region Europe ... North_America

country Germany ... Spain Canada ... Mexico

city Frankfurt ... Vancouver ... Toronto

office L. Chan ... M. Wind

69
OLAP Is FASMI

 Fast
 Analysis of
 Shared
 Multidimensional
 Information

70
Types of OLAP Servers

ROLAP SERVERS:
 Relational On Line Analytical Processing servers are intermediate servers which lie between a relational back-end server and client front-end tools.
 They use a relational DBMS to store and manage data
 ROLAP servers support multidimensional views of data

71
Cont…
 Multidimensional OLAP (MOLAP) Servers
 These servers support multidimensional views of data
 Array-based multidimensional storage engine
 fast indexing to pre-computed summarized data
 Hybrid OLAP (HOLAP)
 HOLAP is a combination of ROLAP and MOLAP
 It gives the user flexibility, e.g., low level: relational, high level: array
 Specialized SQL servers
 specialized support for SQL queries over star/snowflake schemas

72
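From the Data Warehouse to Data Marts

[Figure: a progression from Data to Information. The data warehouse is organizationally structured and holds more history, normalized and detailed; above it, data becomes departmentally structured and then individually structured, with less history.]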
73
Data Warehouse and Data Marts

[Figure: data marts with OLAP (lightly summarized, departmentally structured) built on top of the data warehouse (organizationally structured, atomic, detailed data warehouse data).]

74
Characteristics of Data Mart

 Data marts are subsets of the DW.
 Data marts have OLAP
 A data mart is smaller than the data warehouse
 It contains information from a single department of a business or organization
 It is flexible
 Customized by department
 Its source is the departmentally structured data warehouse

[Figure: Data Mart and Data Warehouse]
75
True Warehouse

Data Sources

Data Warehouse

Data Marts

76
Data Warehouse Back-End Tools and
Utilities (ETL Tool)
 Data extraction:
 get data from multiple, heterogeneous, and external
sources
 Data cleaning:
 detect errors in the data and rectify them when
possible
 Data transformation:
 convert data from legacy or host format to warehouse
format
 Load:
 sort, summarize, consolidate, compute views, check
integrity, and build indices and partitions
 Refresh
 propagate the updates from the data sources to the
warehouse
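
The five back-end steps can be sketched as small Python functions chained together. Everything here is an assumption made for illustration: the record fields, the cleaning rule (drop records without a numeric amount) and the warehouse format (per-region totals in a dictionary) are not taken from any particular ETL tool.

```python
def extract(sources):
    """Gather raw records from several (hypothetical) sources."""
    return [rec for source in sources for rec in source]


def clean(records):
    """Detect bad records: drop those with a missing or non-numeric amount."""
    good = []
    for rec in records:
        try:
            good.append({**rec, "amount": float(rec["amount"])})
        except (KeyError, TypeError, ValueError):
            continue
    return good


def transform(records):
    """Convert to the warehouse format: trimmed, upper-case region codes."""
    return [{"region": r["region"].strip().upper(), "amount": r["amount"]}
            for r in records]


def load(warehouse, records):
    """Summarize and consolidate into per-region totals."""
    for r in records:
        warehouse[r["region"]] = warehouse.get(r["region"], 0.0) + r["amount"]
    return warehouse


def refresh(warehouse, new_sources):
    """Propagate new source data through the same pipeline."""
    return load(warehouse, transform(clean(extract(new_sources))))


crm = [{"region": "east ", "amount": "100"}, {"region": "West", "amount": None}]
erp = [{"region": "EAST", "amount": 50}, {"region": "west", "amount": "25.5"}]

warehouse = refresh({}, [crm, erp])                                   # initial load
warehouse = refresh(warehouse, [[{"region": "West", "amount": 4.5}]])  # later refresh
print(warehouse)
```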
77
Components of Data Warehouse

[Figure: components of a data warehouse. Several operational data sources and the operational data store (ODS) feed the warehouse, which holds metadata, highly summarized data, lightly summarized data, detailed data and archive/backup data; end users reach it through reporting, query and EIS tools, OLAP tools and data mining tools.]
78
Data Warehouse Components

 Operational Data Sources


 Operational Data Store (ODS )
 Load Manager
 Warehouse Manager
 Query Manager

79
1. Operational Data Sources

 Operational data for the DW is supplied from mainframes.
 The operational data is held in first-generation hierarchical and network databases.
 Departmental data, private data from workstations and servers, and external systems such as the Internet.
 Commercially available databases, or databases associated with the organization's suppliers or customers.

80
2. Operational Data Stores (ODS)

 An Operational Data Store is a repository of current and integrated operational data used for analysis.
 It is often structured and supplied with data in the same way as the data warehouse.
 But in fact it simply acts as a staging area for data to be moved into the warehouse.

81
3. Load Manager

 The Load Manager is called the back-end component
 It performs all the operations associated with the extraction and loading of the data into the warehouse.
 These operations include simple transformations of the data to prepare the data for entry into the warehouse.

82
4. Warehouse Manager
 The Warehouse Manager performs all the operations associated with the management of the data in the warehouse.
 The operations performed by this component include
 Analysis of the data to ensure consistency
 Transformation and merging of source data
 Creation of indexes and views
 Archiving and backing-up of data

83
5. Query Manager

 The Query Manager is called the front-end component.
 It performs all the operations associated with the management of user queries.
 The operations performed by this component include directing queries to the appropriate tables and scheduling the execution of queries.
 The warehouse also holds detailed data, lightly and highly summarized data, and archive/backup data.
84
Cont…

 Metadata
 End-user access tools:
 They can be categorized into five main groups
1. Data reporting and query tools
2. Application development tools
3. Executive Information System (EIS) tools
4. Online Analytical Processing (OLAP) tools &
5. Data mining tools

85
Data Flow

 Inflow: The process associated with the extraction, cleaning and loading of the data from the source systems into the warehouse.
 Up flow: The process associated with adding value to the data in the warehouse through summarizing, packaging and distribution of the data.

86
Cont…

 Down Flow: The process associated with archiving and backing up of data in the warehouse.
 Out Flow: The process associated with making the data available to the end-users.
 Meta Flow: The process associated with the management of the metadata.

87
Detailed Data

 It stores all the detailed data in the database schema.
 In most cases, the detailed data is not stored online but aggregated to the next level of detail.
 On a regular basis, detailed data is added to the warehouse to supplement the aggregated data.

88
Lightly and Highly Summarized Data
 It stores all the predefined lightly and highly aggregated data generated by the warehouse manager.
 This data is transient, as it is subject to change on an ongoing basis in order to respond to changing query profiles.
 The purpose of summary information is to
 Speed up the performance of queries.
 Remove the requirement to continuously perform summary operations such as sort or group by in answering user queries.
 The summary data is updated continuously as new data is loaded into the warehouse.
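
The idea of keeping predefined summaries up to date as data is loaded can be sketched as follows. The row layout (month, product, amount) and the function names are assumptions for this illustration; the point is only that queries read the small precomputed summary instead of regrouping the detailed rows each time.

```python
from collections import defaultdict

# Hypothetical detailed rows: (month, product, amount)
detailed = [
    ("2024-01", "TV", 100),
    ("2024-01", "PC", 200),
    ("2024-02", "TV", 150),
]


def build_summary(rows):
    """Precompute monthly totals once, so queries need no group-by at read time."""
    summary = defaultdict(int)
    for month, _product, amount in rows:
        summary[month] += amount
    return dict(summary)


def load_rows(detail, summary, new_rows):
    """Append new detailed rows and update the summary in the same step."""
    for month, product, amount in new_rows:
        detail.append((month, product, amount))
        summary[month] = summary.get(month, 0) + amount


monthly = build_summary(detailed)
load_rows(detailed, monthly, [("2024-02", "PC", 50)])

# A "query" is now a dictionary lookup rather than a scan of the detail.
print(monthly["2024-02"])
```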


89
Archive/Backup Data

 It stores detailed and summarized data for the purpose of archiving and backup.
 It may be necessary to back up online summary data if this data is kept beyond the retention period for detailed data.
 The data is transferred to storage archives such as magnetic tape or optical disk.

90
Meta data
 This area of the warehouse stores all the metadata (data about data) definitions used by all the processes in the warehouse.
 It is used for a variety of purposes:
 Extraction and loading process: metadata is used to map data sources to a common view of information within the warehouse.
 Warehouse management process: metadata is used to automate the production of summary tables.
 Query management process: metadata is used to direct a query to the most appropriate data source.
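
One way to picture the query management use of metadata is a lookup that sends each query to the smallest table able to answer it. This is a hypothetical sketch: the table names, the `grain` and `rows` fields and the `route` function are all invented for the illustration.

```python
# Hypothetical metadata: each entry records the dimensions a table stores and its size.
METADATA = [
    {"table": "sales_by_year", "grain": {"year"}, "rows": 10},
    {"table": "sales_by_month_product",
     "grain": {"year", "month", "product"}, "rows": 5_000},
    {"table": "sales_detail",
     "grain": {"year", "month", "day", "product", "store"}, "rows": 2_000_000},
]


def route(needed_dims):
    """Pick the smallest table whose grain covers every dimension the query needs."""
    candidates = [m for m in METADATA if set(needed_dims) <= m["grain"]]
    if not candidates:
        raise LookupError(f"no table covers {sorted(needed_dims)}")
    return min(candidates, key=lambda m: m["rows"])["table"]


print(route({"year"}))              # answered from the highly summarized table
print(route({"month", "product"}))  # answered from the lightly summarized table
print(route({"store"}))             # only the detailed data can answer this
```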
91
End User Access Tools

 High performance is achieved by pre-planning the requirements for joins, summarizations and periodic reports by end-users.
 There are five main groups of access tools:
 Data reporting and query tools
 Application development tools
 Executive Information System (EIS) tools
 Online Analytical Processing (OLAP) tools
 Data mining tools

92
Tools used for Data Warehouse

Most popular tools of DW are


 Informatica Tool

 Cognos Tool

 Business Intelligence Tool

 EIS

 DSS

 OLAP

 Multidimensional Analysis Tool

93
Data Warehouse Usage
 Three kinds of data warehouse applications
 Information processing
 supports querying, basic statistical analysis, and
reporting using crosstabs, tables, charts and graphs
 Analytical processing
 multidimensional analysis of data warehouse data
 supports basic OLAP operations, slice-dice, drilling, pivoting
 Data mining
 knowledge discovery from hidden patterns
 supports associations, constructing analytical models,
performing classification and prediction, and presenting the
mining results using visualization tools.

94
Summary
 Data warehouse
 A subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision-
making process
 A multi-dimensional model of a data warehouse
 Star schema, snowflake schema, fact constellations

 A data cube consists of dimensions & measures


 OLAP operations: drilling, rolling, slicing, dicing and pivoting
 OLAP servers: ROLAP, MOLAP, HOLAP

95
Important Questions
 What is Data Warehouse? Explain in detail.
 Draw and explain the Data Warehouse Architecture.
 Explain Data warehouse component with suitable
diagram.
 What is OLAP? Explain OLAP operations along with its
types.
 Explain Star Schema and snowflake Schema.
 What is multidimensional data model? Explain with neat
diagram.
 Compare the OLTP and OLAP.
 What do you mean by project planning and
requirement? Explain how it is necessary in DW.
 Explain the role of Project Management.
96
