AIMP339 Material 2
What historical reason for data warehouse development does not primarily involve
technology?
This question is highly relevant today, because the success of data warehouse deployment
depends on this organizational capability.
You have three learning objectives in this lesson. You should be able to discuss
historical factors that drove the development and deployment of data warehouse
technology.
In your own words, you should be able to explain characteristics of data warehouses.
To clarify your understanding of operational databases and data warehouses, you should be
able to explain several differences between the two types of databases.
Databases support decision making in organizations. The traditional decision-making
hierarchy depicts management levels and volumes of decisions at each level.
Two Types of Databases
Operational databases serve lower level decision making. In the early days of relational
databases, it was assumed that operational databases would also support higher levels
of decision making.
Technology and Deployment Limitations
[Diagram: lack of integration, missing DBMS features, and performance limitations as drivers of data warehouse technology and deployments]
This difficulty spurred the development of data warehouse technology
and deployment of data warehouses starting in the mid 1990s.
The failure of operational databases to support high level decision making is a
combination of inadequacy of database technology, and limitations in
deployment of databases.
Product vendors discovered that their DBMS products lacked key features to support
summary data and analytical calculations vital for business intelligence processing.
The SQL GROUP BY clause was inadequate to specify queries involving summary
data.
The SELECT statement in SQL did not have any features for analytical
calculations such as moving averages.
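To make the moving average calculation concrete, here is a minimal sketch in Python rather than SQL. The function name, the sample monthly sales figures, and the three-month window are assumptions chosen for illustration; they do not come from the lesson.

```python
from collections import deque

def moving_average(values, window):
    """Return trailing moving averages, one per position once the window is full."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)  # holds the most recent `window` values
    running = 0.0               # sum of the values currently in buf
    averages = []
    for value in values:
        if len(buf) == window:
            running -= buf[0]   # oldest value is about to be dropped
        buf.append(value)
        running += value
        if len(buf) == window:
            averages.append(running / window)
    return averages

# Hypothetical monthly sales totals.
monthly_sales = [100, 120, 90, 150, 130]
three_month_avg = moving_average(monthly_sales, 3)
```

Each result depends on a sliding group of neighboring rows, which is the kind of calculation that a plain grouping of rows into disjoint sets does not express.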
Performance limitations
- Performance problems arise when a single database is used for both transaction
processing and business intelligence decision making
- Never solved; the remedy is to use a separate database
The lack of integration was not a design failure.
Lack of integration
- Most important issue; highlight with pen
- Management issue
- Lack of integration with transaction databases and external data
sources
- Add value: integrate, standardize, clean, and summarize both internal
and external data sources
Characteristics
- Subject-oriented: Organized around business entities (e.g., customers,
products, and employees) rather than business processes
- Integrated: many transformations to unify source data from independent
data sources (units of measure, data formats, naming conventions)
- Time-variant: historical data (time stamped); snapshots of business
processes captured at different points in time
- Nonvolatile: new data are appended periodically; existing data are not
changed; warehouse data may be archived after its usefulness declines
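The integrated and nonvolatile characteristics can be sketched in a few lines of Python. Everything named here is hypothetical: the two source systems, their field names, the sample rows, and the `load` helper are assumptions for illustration only, not part of any particular warehouse product.

```python
from datetime import date, datetime

KG_PER_LB = 0.45359237  # exact definition of the international pound

# Hypothetical rows from two independent source systems with different
# naming conventions, units of measure, and date formats.
store_a_rows = [{"cust_name": "Ann Lee", "weight_lb": 10.0, "sale_date": "03/15/2024"}]
store_b_rows = [{"customer": "Bob Roy", "weight_kg": 2.0, "sale_date": "2024-03-16"}]

def from_store_a(row):
    """Unify store A's names, units (lb to kg), and date format (MM/DD/YYYY)."""
    return {
        "customer_name": row["cust_name"],
        "weight_kg": row["weight_lb"] * KG_PER_LB,
        "sale_date": datetime.strptime(row["sale_date"], "%m/%d/%Y").date(),
    }

def from_store_b(row):
    """Store B already uses kg and ISO dates; only the field name changes."""
    return {
        "customer_name": row["customer"],
        "weight_kg": row["weight_kg"],
        "sale_date": date.fromisoformat(row["sale_date"]),
    }

def load(warehouse, rows, load_date):
    """Append time-stamped rows; existing warehouse rows are never modified."""
    for row in rows:
        warehouse.append({**row, "load_date": load_date})

warehouse = []
load(warehouse, [from_store_a(r) for r in store_a_rows], date(2024, 3, 31))
load(warehouse, [from_store_b(r) for r in store_b_rows], date(2024, 3, 31))
```

After loading, both rows share one set of field names, one unit of measure, and one date type, and each carries the date it was added.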
Comparison of Processing Environments
Transaction processing
- Primary data from transactions
- Daily operations and short-term decisions
Business intelligence processing
- Transformed secondary data
- Medium and long-term decisions
Transaction processing uses primary data from large
volumes of transactions to support daily operations and
short-term decision making of an organization.
Business intelligence:
- Integrated and standardized data: difficult to directly use operational
data
- Substantial processing for transformations and integration
- Value for decision making results from standardizing and integrating data
across organizational units and external sources
- Decisions: broad view of customers, products, production, marketing for
capacity planning, store locations, new lines of business, …
A. Data warehouse
B. Business intelligence
C. Operational database
D. Integrated database
Data Comparison
Characteristic        Operational Database   Data Warehouse
Currency              Current                Historical
Detail level          Individual             Individual and summary
Orientation           Process                Subject
Records per request   Few                    Thousands
Normalization level   Mostly normalized      Normalization relaxed
Update level          Highly volatile        Mostly refreshed (nonvolatile)
Data model            Relational             Relational (star schemas) and multidimensional (data cubes)
Operational databases largely contain current data at the individual level,
while data warehouses have historical data at both the individual and summarized
levels.
A business intelligence application, for example, may use monthly sales over a
period of several years.
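The difference between individual and summarized data can be illustrated with a short Python sketch. The transaction rows and the `monthly_totals` helper are hypothetical, and dates are assumed to be ISO strings so that the first seven characters identify the month.

```python
from collections import defaultdict

# Hypothetical individual sales transactions: (ISO sale date, amount).
transactions = [
    ("2023-01-05", 40.0),
    ("2023-01-20", 60.0),
    ("2023-02-03", 25.0),
    ("2024-01-11", 80.0),
]

def monthly_totals(rows):
    """Summarize individual transactions into one total per calendar month."""
    totals = defaultdict(float)
    for sale_date, amount in rows:
        month = sale_date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[month] += amount
    return dict(sorted(totals.items()))

summary = monthly_totals(transactions)
```

An operational database would typically work with the individual rows, while a business intelligence query would read the monthly summary across several years.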
Operational databases primarily use the relational data model, while data
warehouses use star schema patterns of tables as well as a multidimensional
data model.
In contrast, a data warehouse will typically show just one level of detail
and no many-to-many relationships.
You will learn details about schema patterns for data warehouses in Lesson
three.