CH08 DSS Turban Data Warehouse
Chapter 8:
Data Warehousing
Learning Objectives
• Understand the basic definitions and concepts
of data warehouses
• Understand data warehousing architectures
• Describe the processes used in developing and
managing data warehouses
• Explain data warehousing operations
• Explain the role of data warehouses in decision
support
• Explain data integration and the
extraction, transformation, and load (ETL)
processes
• Describe real-time (active) data
warehousing
• Understand data warehouse
administration and security issues
Main Data Warehousing
(DW) Topics
• DW definitions
• Characteristics of DW
• Data Marts
• ODS, EDW, Metadata
• DW Framework
• DW Architecture & ETL Process
• DW Development
• DW Issues
Data Warehousing
Definitions and Concepts
• Data warehouse
A physical repository where relational data are
specially organized to provide enterprise-wide,
cleansed data in a standardized format
Repository of current and historical data
Data are structured to be available in a form
ready for analytical processing activities (e.g.,
OLAP, data mining, querying, reporting)
A subject oriented, integrated, time-variant,
non-volatile collection of data for supporting
management’s decision making process.
“The data warehouse is a collection of
integrated, subject-oriented databases
designed to support DSS functions, where each
unit of data is non-volatile and relevant to some
moment in time”
• Characteristics of data warehousing
Subject oriented
Data are organized by detailed subject, such as
sales, products, or customers, containing only
information relevant for decision support.
Integrated
Data warehouses must place data from different
sources into a consistent format.
Time variant (time series)
A warehouse maintains historical data. These data are used to detect
trends, deviations, and long-term relationships for
forecasting and comparisons, leading to decision
making. Time is the one important dimension that all
data warehouses must support. Data for analysis
from multiple sources contain multiple time points
(e.g., daily, weekly, monthly views).
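The daily, weekly, and monthly views mentioned above can be illustrated with a small roll-up. This is a minimal sketch, not a warehouse implementation: the `daily_sales` rows and the `roll_up` helper are hypothetical names invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (sale_date, amount).
daily_sales = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 2), 150.0),
    (date(2024, 1, 8), 200.0),
    (date(2024, 2, 1), 50.0),
]


def roll_up(facts, key_fn):
    """Sum amounts by the time period that key_fn assigns to each date."""
    totals = defaultdict(float)
    for day, amount in facts:
        totals[key_fn(day)] += amount
    return dict(totals)


# Monthly view: the key is (year, month).
monthly = roll_up(daily_sales, lambda d: (d.year, d.month))

# Weekly view: the key is (ISO year, ISO week number).
weekly = roll_up(daily_sales, lambda d: tuple(d.isocalendar())[:2])
```

The same historical rows serve every view; only the time key changes, which is why time is treated as a dimension rather than as separate copies of the data.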
Nonvolatile
After data are entered into a data warehouse, users
cannot change or update the data.
Web based
Data warehouses are typically designed to provide
an efficient computing environment for Web-based
applications.
Relational/multidimensional
A data warehouse uses either a relational structure
or a multidimensional structure.
Client/server
A data warehouse uses the client/server architecture
to provide easy access for end users.
Real-time
Newer data warehouses provide real-time, or active,
data-access and analysis capabilities
Include metadata
Data warehouse contains metadata (data about
data) about how the data are organized and how to
effectively use them.
• There are three main types of data
warehouses:
1. Data marts
2. Operational data stores (ODS)
3. Enterprise data warehouse (EDW)
• Data mart
A departmental data warehouse that stores
only relevant data
Subset of DW (single subject area)
Dependent data mart
A subset that is created directly from a data
warehouse
Independent data mart
A small data warehouse designed for a strategic
business unit or a department, but whose source is not
an EDW
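As a rough illustration of a dependent data mart, the sketch below copies one subject area out of a warehouse. The `warehouse` rows, the `subject` field, and `build_dependent_mart` are hypothetical; a real mart would normally be a set of tables rather than a Python list.

```python
# Hypothetical enterprise warehouse rows, tagged by subject area.
warehouse = [
    {"subject": "sales", "region": "East", "amount": 120.0},
    {"subject": "sales", "region": "West", "amount": 80.0},
    {"subject": "hr", "employee_id": 7, "salary": 5000.0},
]


def build_dependent_mart(edw_rows, subject):
    """Copy only the rows for one subject area out of the warehouse."""
    return [dict(row) for row in edw_rows if row["subject"] == subject]


# The sales mart is a subset created directly from the warehouse.
sales_mart = build_dependent_mart(warehouse, "sales")
```

An independent data mart would be populated from operational sources instead, so it would not take the warehouse rows as input.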
• Operational data stores (ODS)
A type of database often used as an interim area for a
data warehouse.
A recent form of the customer information file (CIF)
Used for short-term decisions involving mission-critical
applications
Short-term memory: stores only recent information. In
comparison, a DW is long-term memory that stores
permanent information
• Oper marts
An operational data mart. It is created when
operational data need to be analyzed multidimensionally.
Data come from an ODS
• Enterprise data warehouse (EDW)
– Large-scale data warehouse that is used
across the enterprise for decision support
– Used to provide data for many types of
DSS:
• Customer relationship management (CRM)
• Supply chain management (SCM)
• Business performance management (BPM)
• Business activity monitoring (BAM)
• Product lifecycle management (PLM)
• Revenue management
• Knowledge management systems (KMS)
• Metadata
– Data about data. In a data warehouse,
metadata describe the contents/structure
of a data warehouse and the manner of its
use.
– Pattern view of metadata:
• Syntactic metadata
• Structural metadata
• Semantic metadata
Data Warehousing
Process Overview
• Organizations continuously collect data,
information, and knowledge at an
increasingly accelerated rate and store
them in computerized systems
• The number of users needing to access the
information continues to increase as a
result of improved reliability and availability
of network access, especially the Internet
• Organizations need to create data
warehouses – massive data stores of time
series data for decision support.
• Data are imported from various internal and
external sources, then cleansed and organized
in a manner consistent with the
organization’s needs.
• After the data warehouse is populated,
data marts can be loaded for any
department
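The import, cleanse, and organize steps described above can be sketched as a tiny extract, transform, and load pipeline. Everything here is an assumption made for illustration: the two source layouts, the field names, the day/month/year legacy date format, and the rule that rows without a product code are dropped.

```python
from datetime import datetime

# Hypothetical extracts from two sources with inconsistent formats.
pos_rows = [{"sku": " a-100 ", "sold_on": "2024-01-05", "total": "19.99"}]
legacy_rows = [
    {"ITEM": "A-100", "DATE": "05/01/2024", "AMT": "5.01"},
    {"ITEM": "", "DATE": "06/01/2024", "AMT": "3.00"},  # missing product code
]


def transform_pos(row):
    """Map a POS row to the warehouse's standard format."""
    return {
        "product": row["sku"].strip().upper(),
        "sale_date": datetime.strptime(row["sold_on"], "%Y-%m-%d").date(),
        "amount": float(row["total"]),
    }


def transform_legacy(row):
    """Map a legacy row (assumed day/month/year dates) to the same format."""
    return {
        "product": row["ITEM"].strip().upper(),
        "sale_date": datetime.strptime(row["DATE"], "%d/%m/%Y").date(),
        "amount": float(row["AMT"]),
    }


def cleanse(rows):
    """Drop rows that have no product code."""
    return [row for row in rows if row["product"]]


def load(target, rows):
    """Append rows to the warehouse; existing rows are never updated."""
    target.extend(rows)
    return target


warehouse = []
staged = [transform_pos(r) for r in pos_rows]
staged += [transform_legacy(r) for r in legacy_rows]
load(warehouse, cleanse(staged))
```

After the load, both surviving rows use the same product code, date type, and numeric amount, which is the "consistent format" the integrated characteristic calls for.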
[Figure: Data warehouse framework. Data sources (ERP, legacy, POS, other OLTP/Web, external data) feed an ETL process (select, extract, transform, integrate, load) that populates the enterprise data warehouse and its metadata. Data marts (marketing, engineering, finance, ...) are loaded by replication, with a "no data marts" option. API/middleware gives access to applications (visualization): routine business reporting, data/text mining, OLAP, dashboard, Web, and custom-built applications.]
[Figure: Web-based data warehouse architecture. Client (Web browser), Internet/intranet/extranet, Web server, Web pages, application server, data warehouse.]
[Figure: ETL process. Packaged application, legacy system, and other internal applications supply a transient data source; data are extracted, transformed, cleansed, and loaded into the data warehouse and data mart.]
Web Mining
[Figure: Artificial neural network. Each processing element (PE) takes inputs x1 ... xn with weights w1 ... wn, forms a weighted sum (S), and applies a transfer function (f) to produce an output Y. PEs are arranged in input, hidden, and output layers.]
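The processing element in the figure (weighted sum followed by a transfer function) can be sketched in a few lines. The figure does not say which transfer function is used, so the sigmoid below is an assumption; `processing_element` is a hypothetical name.

```python
import math


def processing_element(inputs, weights):
    """Compute Y = f(S), where S is the weighted sum of the inputs.

    The transfer function f is assumed to be the sigmoid 1 / (1 + e^-S).
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-weighted_sum))


# Two inputs whose weighted contributions cancel: S = 0.4 - 0.4 = 0.
y = processing_element([1.0, 0.5], [0.4, -0.8])
```

With a sigmoid, a weighted sum of zero maps to the midpoint of the output range, and larger sums push the output toward 1.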
Questions / Answers
Basic Concepts
of Expert Systems (ES)
• An expert system is a computer program that attempts to
imitate an expert’s reasoning processes and
knowledge in solving specific problems
• Most Popular Applied AI Technology
– Enhance Productivity
– Augment Work Forces
• Works best with narrow problem areas/tasks
• Expert systems do not replace experts, but
– Make their knowledge and experience more widely
available, and thus
– Permit non-experts to work better
Big Data - Definition
• Big [volume] Data is not new!
• Big Data means different things to people with
different backgrounds and interests
• Traditionally, “Big Data” = massive volumes of data
– E.g., volume of data at CERN, NASA, Google, …
• Where does the Big Data come from?
– Everywhere! Web logs, RFID, GPS systems, sensor
networks, social networks, Internet-based text documents,
Internet search indexes, detail call records, astronomy,
atmospheric science, biology, genomics, nuclear physics,
biochemical experiments, medical records, scientific
research, military surveillance, multimedia archives, …
The Data Size Is Getting Big,
Bigger…
• Hadron Collider - 1 PB/sec
• Boeing jet - 20 TB/hr
• Facebook - 500 TB/day
• YouTube - 1 TB/4 min
• The proposed Square Kilometer Array telescope
(proposed to be the world’s biggest telescope) - 1 EB/day
[Table: Names for Big Data Sizes]
Link: https://www.geeksinphoenix.com/blog/post/2016/11/08/what-is-a-bit-what-is-a-byte.aspx
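To compare the rates above, it helps to put them on one scale. The sketch below converts each figure to TB per day, assuming decimal units (1 PB = 1,000 TB, 1 EB = 1,000,000 TB) and assuming each source runs around the clock, which the slide does not state.

```python
# Assumed decimal unit sizes.
TB_PER_PB = 1_000
TB_PER_EB = 1_000_000

SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60
HOURS_PER_DAY = 24

# Each rate from the slide, expressed in TB per day.
rates_tb_per_day = {
    "Hadron Collider": 1 * TB_PER_PB * SECONDS_PER_DAY,   # 1 PB/sec
    "Boeing jet": 20 * HOURS_PER_DAY,                     # 20 TB/hr
    "Facebook": 500,                                      # 500 TB/day
    "YouTube": 1 * MINUTES_PER_DAY // 4,                  # 1 TB per 4 min
    "Square Kilometer Array": 1 * TB_PER_EB,              # 1 EB/day
}

# Source with the highest daily volume under these assumptions.
largest = max(rates_tb_per_day, key=rates_tb_per_day.get)
```

Under these assumptions the listed rates span more than five orders of magnitude, from hundreds of TB per day to tens of millions.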
Big Data -
Definition and Concepts
• Big Data is a misnomer!
• Big Data is more than just “big”
• The Vs that define Big Data
– Volume
– Variety
– Velocity
– Veracity
– Variability
– Value
– …
– Link: https://www.xsnet.com/blog/bid/205405/the-v-s-of-big-data-velocity-volume-value-variety-and-veracity
Big Data Analytics
• Big Data by itself, regardless of the size,
type, or speed, is worthless
• Big Data + “big” analytics = value
• With the value proposition, Big Data also
brought about big challenges
– Effectively and efficiently capturing, storing,
and analyzing Big Data
– New breed of technologies needed
(developed, purchased, hired, or
outsourced …)
• Big data analytics is the often complex
process of examining large and varied
data sets -- or big data -- to uncover
information including hidden patterns,
unknown correlations, market trends and
customer preferences that can help
organizations make informed business
decisions.
• Big Data analytics is the process of
collecting, organizing and analyzing large
sets of data (called Big Data) to discover
patterns and other useful information.
• Big Data analytics can help organizations to
better understand the information contained
within the data and will also help identify the
data that is most important to the business
and future business decisions.
• Analysts working with Big Data typically
want the knowledge that comes from
analyzing the data.
Cloud Computing and BI
• Cloud Computing
– A style of computing in which dynamically scalable
and often virtualized resources are provided over the
Internet
– Users need not have knowledge of, experience in, or
control over the technology infrastructures in the
cloud that supports them
– Related terms: utility computing, application service
provider, grid computing, on-demand computing,
software as a service (SaaS)
– Cloud = Internet
• Fragments of cloud computing
– Infrastructure as a service (IaaS)
– Platform as a service (PaaS)
– Software as a service (SaaS)
– Data as a service (DaaS)
• Example: Web-based e-mail and Google Docs
• Cloud-computing service providers
– Salesforce.com, IBM, Sun Microsystems, Microsoft
(Azure), Google, and Yahoo!
• Cloud computing, like many other IT trends,
has resulted in new offerings in business
intelligence
– Cloud-based data warehousing (by 1010data,
LogiXML, Lucid Era)
– Cloud-based dashboard and data management tools
(by Elastra, Rightscale)
• Advantages: rapid diffusion, cutting-edge
technology, less investment, …
• Concerns: loss of control and privacy, legal
liabilities, cross-border political issues, …
Machine Learning
• Machine learning (ML) is a family of artificial
intelligence technologies that is primarily
concerned with the design and development of
algorithms that allow computers to “learn” from
historical data
– ML is the process by which a computer learns from
experience
– It differs from knowledge acquisition in ES: instead
of relying on experts (and their willingness) ML relies
on historical facts
– ML helps in discovering patterns in data
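A very small example of "learning" from historical data is fitting a straight line by ordinary least squares and using it to predict a new case. This is only one simple ML technique chosen for illustration; the data values and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical facts that follow y = 2x + 1.
history_x = [1, 2, 3, 4]
history_y = [3, 5, 7, 9]

slope, intercept = fit_line(history_x, history_y)

# Use the learned pattern to predict an unseen case.
prediction = slope * 5 + intercept
```

No expert supplied the rule "double it and add one"; the parameters come entirely from the historical records, which is the contrast with knowledge acquisition in ES drawn above.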
Intelligent Agents
• Intelligent agent (IA): an autonomous computer
program that observes and acts upon an environment
and directs its activity toward achieving specific goals
• Relatively new technology
• Other names include
– Software agents
– Wizards
– Knowbots
– Intelligent software robots (Softbots)
– Bots
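The observe, act, and goal-seeking loop in the definition above can be sketched with a toy thermostat agent. The class, the simulated environment, and the one-degree effect of each action are all hypothetical simplifications.

```python
class ThermostatAgent:
    """Toy agent whose goal is to keep temperature near a target."""

    def __init__(self, target, tolerance=0.5):
        self.target = target
        self.tolerance = tolerance

    def act(self, observed_temp):
        """Choose an action based on the current observation."""
        if observed_temp < self.target - self.tolerance:
            return "heat"
        if observed_temp > self.target + self.tolerance:
            return "cool"
        return "idle"


def run(agent, temperature, steps):
    """Simulate the environment: each action shifts temperature by 1 degree."""
    history = []
    for _ in range(steps):
        action = agent.act(temperature)   # observe, then decide
        history.append(action)
        if action == "heat":
            temperature += 1.0
        elif action == "cool":
            temperature -= 1.0
    return temperature, history


final_temp, actions = run(ThermostatAgent(target=21.0), 18.0, steps=5)
```

The agent runs without user intervention at each step, which is the sense of "autonomous" in the definition.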
See www.sensenetworks.com/citysense.php for real-time
animation of the content
Recommendation Engines
• People rely on recommendations by others
– Success for retailers like Amazon.com
• Recommender systems
– Web-based information filtering system that
takes the inputs from users and then aggregates
the inputs to provide recommendations for other
users in their product or service selection
choices
• Data
– Structured ratings/rankings
– Unstructured textual comments
Major Components of
Service-Oriented DSS/BI
• Analytics-as-a-Service (AaaS)
– “Agile Analytics”
– AaaS in the cloud has economies of scale,
better scalability, and higher cost savings
– Data/Text Mining + Big Data Cloud
Computing
• Storage and access to Big Data
• Massively Parallel Processing
• In-memory processing
• In-database processing
• Resource pooling, scaling, cost and time saving, …
Speech Analytics
• Speech analytics – analysis of voice
– Content versus other Voice Features
• Two Approaches
– The Acoustic Approach
• Intensity, Pitch, Jitter, Shimmer, etc.
– The Linguistic Approach
• Lexical: words, phrases, etc.
• Disfluencies: filled pauses, hesitation, restarts, etc.
• Higher semantics: taxonomy/ontology, pragmatics
• Many uses and use cases exist
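As one small instance of the linguistic approach, the sketch below measures filled pauses in a transcript. It assumes speech has already been transcribed to text, and the list of filler words is a hypothetical, incomplete sample.

```python
import re

# Hypothetical sample of filled-pause tokens.
FILLED_PAUSES = {"um", "uh", "er", "hmm"}


def disfluency_rate(transcript):
    """Return the share of words in the transcript that are filled pauses."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    fillers = sum(1 for word in words if word in FILLED_PAUSES)
    return fillers / len(words)


rate = disfluency_rate("Um, I think, uh, the order is late")
```

The acoustic approach would instead work on the audio signal itself (intensity, pitch, and so on), which a transcript does not preserve.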