DWDM Unit-2
Data Warehouse:
A decision support database that is maintained separately from
the organization’s operational database
Supports information processing by providing a solid platform of
consolidated, historical data for analysis.
Data warehousing provides architectures and tools for business
executives to systematically organize, understand, and use their data
to make strategic decisions.
Data warehouse systems are valuable tools in today’s
competitive, fast-evolving world. In the last several years, many firms
have spent millions of dollars on building enterprise-wide data
warehouses.
Data Warehouse subject-oriented:
Focuses on the modeling and analysis of data for decision
makers, not on daily operations or transaction processing.
Provides a simple and concise view around particular subject
issues by excluding data that are not useful in the decision support
process.
Data Warehouse integrated:
Constructed by integrating multiple, heterogeneous data sources
▪ relational databases, flat files, on-line transaction records
▪ Data cleaning and data integration techniques are applied.
▪ Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
▪ When data is moved to the warehouse, it is converted into the
warehouse's uniform representation.
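As a rough illustration of this conversion step, the Python sketch below maps rows from two hypothetical sources onto one schema. The field names (`cust_id`, `customer`, `sex`, `amount_cents`), the 1/0 gender encoding and the helper functions are all invented for the example; a real integration pipeline would also handle cleaning tasks such as missing values and duplicates.

```python
# Hypothetical source records: the two systems disagree on attribute
# names, gender encoding, and the unit used for amounts.
SOURCE_A = [
    {"cust_id": 1, "gender": "M", "amount_usd": 120.0},
    {"cust_id": 2, "gender": "F", "amount_usd": 80.5},
]
SOURCE_B = [
    {"customer": 3, "sex": 1, "amount_cents": 4500},
    {"customer": 4, "sex": 0, "amount_cents": 990},
]

GENDER_FROM_B = {1: "M", 0: "F"}  # assumed encoding used by source B


def from_a(row: dict) -> dict:
    """Source A already uses the target encoding; only rename fields."""
    return {"customer_id": row["cust_id"], "gender": row["gender"],
            "amount_usd": row["amount_usd"]}


def from_b(row: dict) -> dict:
    """Rename fields, decode gender, and convert cents to dollars."""
    return {"customer_id": row["customer"],
            "gender": GENDER_FROM_B[row["sex"]],
            "amount_usd": row["amount_cents"] / 100}


def integrate(a_rows: list, b_rows: list) -> list:
    """Convert every source row to one common schema before loading."""
    return [from_a(r) for r in a_rows] + [from_b(r) for r in b_rows]


if __name__ == "__main__":
    for row in integrate(SOURCE_A, SOURCE_B):
        print(row)
```

Keeping one small converter per source makes each source's naming and encoding quirks explicit in a single place.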
Data Warehouse time-variant:
The time horizon for the data warehouse is significantly longer than
that of operational systems
▪ Operational database: current value data
▪ Data warehouse data: provides information from a historical
perspective (e.g., past 5-10 years)
Every key structure in the data warehouse
▪ Contains an element of time, explicitly or implicitly
▪ But the key of operational data may or may not contain
a time element.
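A small sketch of the difference in keys, assuming a hypothetical feed of month-end account balances that reaches both stores: the operational dictionary is keyed by account only, while the warehouse dictionary adds the snapshot date to the key. Names such as `record_balance` and `history` are illustrative, not part of any real system.

```python
from datetime import date

# Operational store: key is the account id only, so each update
# overwrites the previous value and only the current balance survives.
operational = {}

# Warehouse store: the key includes a time element (snapshot date),
# so every historical value is kept.
warehouse = {}


def record_balance(account_id: str, snapshot_day: date, balance: float) -> None:
    operational[account_id] = balance                # current value only
    warehouse[(account_id, snapshot_day)] = balance  # history preserved


def history(account_id: str) -> list:
    """Return (date, balance) pairs for one account in time order."""
    return sorted((day, bal) for (acct, day), bal in warehouse.items()
                  if acct == account_id)


record_balance("A-1", date(2023, 1, 31), 500.0)
record_balance("A-1", date(2023, 2, 28), 650.0)
record_balance("A-1", date(2023, 3, 31), 420.0)

print(operational["A-1"])  # only the latest balance: 420.0
print(history("A-1"))      # all three monthly snapshots
```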
Data Warehouse non-volatile:
▪ A physically separate store of data transformed from the
operational environment
▪ Operational update of data does not occur in the data warehouse
environment
▪ Does not require transaction processing, recovery, and
concurrency control mechanisms
▪ Requires only two operations in data accessing:
▪ initial loading of data and access of data.
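The two access operations can be sketched as a Python class that exposes only a bulk `load` and a read-only `query`. The class name, the `(region, quarter, sales)` row layout and the data are hypothetical; the point is simply that no update or delete path exists.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, str, float]  # (region, quarter, sales): hypothetical schema


class ReadOnlyWarehouse:
    """Illustrative store that supports only loading and reading data."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def load(self, batch: Iterable[Row]) -> None:
        """Bulk load: rows are appended as immutable tuples, never changed."""
        self._rows.extend(tuple(r) for r in batch)

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        """Read-only access: returns a new list of the matching rows."""
        return [r for r in self._rows if predicate(r)]


wh = ReadOnlyWarehouse()
wh.load([("East", "Q1", 1200.0), ("West", "Q1", 900.0)])  # initial load
wh.load([("East", "Q2", 1350.0)])                          # later batch load
east = wh.query(lambda r: r[0] == "East")
print(east)  # [('East', 'Q1', 1200.0), ('East', 'Q2', 1350.0)]
```

Because the only writes are batch loads and the stored rows cannot be modified in place, this toy has no record-level updates for transaction or concurrency-control machinery to coordinate.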
Why a Separate Data Warehouse?
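▪ High performance for both systems: the operational DBMS is tuned
for OLTP (access methods, indexing, concurrency control, recovery),
while the warehouse is tuned for OLAP (complex queries,
multidimensional views, consolidation)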
▪ Business data
▪ Top-down view
▪ allows selection of the relevant information necessary
for the data warehouse
▪ Data source view
▪ exposes the information being captured, stored, and
managed by operational systems
▪ Data warehouse view
▪ consists of fact tables and dimension tables
▪ Information processing
Bitmap index:
▪ One bitmap per distinct value of the indexed column; the i-th bit
of a value's bitmap is set if the i-th row of the base table has that
value in the indexed column
▪ Not suitable for high-cardinality domains
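A minimal sketch of a bitmap index in Python, assuming each bitmap is held in a plain Python `int` (bit i stands for row i of the base table). The column values and function names are made up; production systems use compressed bitmap formats, which this does not attempt.

```python
from typing import Dict, List


def build_bitmap_index(column: List[str]) -> Dict[str, int]:
    """Map each distinct value to an int whose bit i is set when row i has it."""
    index: Dict[str, int] = {}
    for i, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << i)
    return index


def rows_matching(bitmap: int) -> List[int]:
    """Decode a bitmap back into the row numbers whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical base-table column with low cardinality (3 distinct values).
region = ["Asia", "Europe", "Asia", "America", "Europe", "Asia"]
idx = build_bitmap_index(region)

print(format(idx["Asia"], "06b")[::-1])  # "101001": row 0 printed first
# region = 'Asia' OR region = 'Europe' becomes a bitwise OR of two bitmaps.
print(rows_matching(idx["Asia"] | idx["Europe"]))  # [0, 1, 2, 4, 5]
```

The index holds one bitmap per distinct value, so a column of unique values (e.g. customer IDs) would need as many bitmaps as there are rows, each with a single bit set; this is why bitmap indexes do not suit high-cardinality domains.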
Attribute-Oriented Induction:
▪ Proposed in 1989 (KDD’89 workshop)
▪ Data generalization
▪ Similarities with OLAP:
▪ Presentation of data summarization at multiple levels of
abstraction
▪ Interactive drilling, pivoting, slicing and dicing
▪ Differences:
▪ OLAP has systematic preprocessing, query independent,
and can drill down to rather low level
▪ AOI has automated desired level allocation, and may
perform dimension relevance analysis/ranking when there
are many relevant dimensions
▪ AOI can work on data that are not in relational form
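A compact sketch of the basic AOI steps (attribute removal, attribute generalization, and merging identical generalized tuples with a count), written in Python. The student data, the concept hierarchies, the GPA cut-off and the threshold are hypothetical, and for brevity each attribute climbs only one level of its hierarchy instead of climbing repeatedly until the threshold is met.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Generalizer = Callable[[object], object]

# Hypothetical task-relevant data.
STUDENTS = [
    {"name": "Jim", "major": "CS", "birth_city": "Vancouver", "gpa": 3.67},
    {"name": "Scott", "major": "CS", "birth_city": "Montreal", "gpa": 3.70},
    {"name": "Laura", "major": "Physics", "birth_city": "Seattle", "gpa": 3.83},
    {"name": "Ana", "major": "Biology", "birth_city": "Toronto", "gpa": 3.50},
]

# Hypothetical one-level concept hierarchies.
MAJOR = {"CS": "Science", "Physics": "Science", "Biology": "Science"}
COUNTRY = {"Vancouver": "Canada", "Montreal": "Canada",
           "Toronto": "Canada", "Seattle": "USA"}
GENERALIZERS: Dict[str, Generalizer] = {
    "major": lambda m: MAJOR[m],
    "birth_city": lambda c: COUNTRY[c],
    "gpa": lambda g: "excellent" if g >= 3.75 else "very good",
}


def attribute_oriented_induction(
    rows: List[dict], generalizers: Dict[str, Generalizer], threshold: int
) -> Tuple[List[str], Counter]:
    plan: Dict[str, Generalizer] = {}  # attribute -> how to map its values
    for attr in rows[0]:
        distinct = {r[attr] for r in rows}
        if len(distinct) <= threshold:
            plan[attr] = lambda v: v          # few values: keep as is
        elif attr in generalizers:
            plan[attr] = generalizers[attr]   # attribute generalization
        # else: many values and no hierarchy -> attribute removal
    kept = [a for a in rows[0] if a in plan]
    # Merge identical generalized tuples, counting how many rows each covers.
    counts = Counter(tuple(plan[a](r[a]) for a in kept) for r in rows)
    return kept, counts


kept, counts = attribute_oriented_induction(STUDENTS, GENERALIZERS, threshold=2)
print(kept)  # ['major', 'birth_city', 'gpa']
for generalized_tuple, n in counts.items():
    print(generalized_tuple, n)
```

Here `name` is dropped because it has many distinct values and no hierarchy, while the other attributes are generalized, so four input rows collapse into two generalized tuples with counts 3 and 1.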