
CH08 DSS Turban Data Warehouse

The document discusses data warehousing concepts including definitions of data warehouses, characteristics, types of data warehouses like data marts, operational data stores, and enterprise data warehouses. It also describes the data warehousing process including data sources, extraction, loading, the comprehensive database, metadata, and middleware tools. The learning objectives cover understanding data warehouses, architectures, development processes, operations, and roles in decision support.

Uploaded by

Saiful Islam

Business Intelligence and

Decision Support Systems


(9th Ed., Prentice Hall)

Chapter 8:
Data Warehousing
Learning Objectives
• Understand the basic definitions and concepts
of data warehouses
• Understand data warehousing architectures
• Describe the processes used in developing and
managing data warehouses
• Explain data warehousing operations
• Explain the role of data warehouses in decision
support
Learning Objectives
• Explain data integration and the
extraction, transformation, and load (ETL)
processes
• Describe real-time (active) data
warehousing
• Understand data warehouse
administration and security issues
Main Data Warehousing
(DW) Topics
• DW definitions
• Characteristics of DW
• Data Marts
• ODS, EDW, Metadata
• DW Framework
• DW Architecture & ETL Process
• DW Development
• DW Issues
Data Warehousing
Definitions and Concepts
• Data warehouse
– A physical repository where relational data are
specially organized to provide enterprise-wide,
cleansed data in a standardized format
– Repository of current and historical data
– Data are structured to be available in a form
ready for analytical processing activities (e.g.,
OLAP, data mining, querying, reporting)
– A subject-oriented, integrated, time-variant,
non-volatile collection of data for supporting
management’s decision-making process
Data Warehousing
Definitions and Concepts
– “The data warehouse is a collection of
integrated, subject-oriented databases
designed to support DSS functions, where each
unit of data is non-volatile and relevant to some
moment in time”
Data Warehousing
Definitions and Concepts
• Characteristics of data warehousing
– Subject oriented
• Data are organized by detailed subject, such as
sales, products, or customers, containing only
information relevant for decision support.
– Integrated
• Data warehouses must place data from different
sources into a consistent format.
Data Warehousing
Definitions and Concepts
• Characteristics of data warehousing
– Time variant (time series)
• A warehouse maintains historical data, which are used to detect
trends, deviations, and long-term relationships for
forecasting and comparisons, leading to decision
making. Time is the one important dimension that all
data warehouses must support. Data for analysis
from multiple sources contain multiple time points
(e.g., daily, weekly, monthly views).
Data Warehousing
Definitions and Concepts
• Characteristics of data warehousing
– Nonvolatile
• After data are entered into a data warehouse, users
cannot change or update the data.
– Web based
• Data warehouses are typically designed to provide
an efficient computing environment for Web-based
applications.
– Relational/multidimensional
• A data warehouse uses either a relational structure
or a multidimensional structure.
Data Warehousing
Definitions and Concepts
• Characteristics of data warehousing
– Client/server
• A data warehouse uses the client/server architecture
to provide easy access for end users.
– Real-time
• Newer data warehouses provide real-time, or active,
data-access and analysis capabilities.
– Include metadata
• A data warehouse contains metadata (data about
data) about how the data are organized and how to
effectively use them.
Data Warehousing
Definitions and Concepts
• There are three main types of data
warehouses:
1. Data marts
2. Operational data stores (ODS)
3. Enterprise data warehouse (EDW)
Data Warehousing
Definitions and Concepts
• Data mart
– A departmental data warehouse that stores
only relevant data
– Subset of a DW (single subject area)
– Dependent data mart
• A subset that is created directly from a data
warehouse
– Independent data mart
• A small data warehouse designed for a strategic
business unit or a department, whose source is not
an EDW
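A dependent data mart can be pictured as a single-subject subset pulled directly from the warehouse. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the table names (warehouse_sales, mart_marketing), columns, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database standing in for the enterprise data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (dept TEXT, product TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("marketing", "campaign-A", "2024-01-05", 1200.0),
        ("finance", "audit", "2024-01-06", 300.0),
        ("marketing", "campaign-B", "2024-02-01", 800.0),
    ],
)

# Dependent data mart: a departmental subset created directly from the warehouse.
conn.execute(
    "CREATE TABLE mart_marketing AS "
    "SELECT product, sale_date, amount FROM warehouse_sales WHERE dept = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM mart_marketing ORDER BY product"
).fetchall()
print(mart_rows)
```

An independent data mart would be loaded from operational sources instead of from warehouse_sales, which is the only structural difference in this sketch.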
Data Warehousing
Definitions and Concepts
• Operational data stores (ODS)
– A type of database often used as an interim area for a
data warehouse
– A recent form of customer information file (CIF)
– Used for short-term decisions involving mission-critical
applications
– Short-term memory: stores only recent information. In
comparison, a DW is long-term memory, which stores
permanent information
• Oper marts
– An operational data mart, created when
operational data need to be analyzed multidimensionally
– Data come from an ODS
Data Warehousing Definitions
and Concepts
• Enterprise data warehouse (EDW)
– Large-scale data warehouse that is used
across the enterprise for decision support
– Used to provide data for many types of
DSS:
• Customer relationship management (CRM)
• Supply chain management (SCM)
• Business performance management (BPM)
• Business activity monitoring (BAM)
• Product lifecycle management (PLM)
• Revenue management
• Knowledge management systems (KMS)
Data Warehousing Definitions
and Concepts
• Metadata
– Data about data. In a data warehouse,
metadata describe the contents/structure
of a data warehouse and the manner of its
use.
– Pattern view of metadata:
• Syntactic metadata
• Structural metadata
• Semantic metadata
Data Warehousing
Process Overview
• Organizations continuously collect data,
information, and knowledge at an
increasingly accelerated rate and store
them in computerized systems
• The number of users needing to access the
information continues to increase as a
result of improved reliability and availability
of network access, especially the Internet
Data Warehousing
Process Overview
• Organizations need to create data
warehouses – massive data stores of time
series data for decision support.
• Data are imported from various internal and
external sources, then cleansed and organized
in a manner consistent with the
organization’s needs.
• After the data warehouse is populated,
data marts can be loaded for any
department
Data Warehousing
Process Overview
Figure: A Conceptual Framework for DW
(Data sources: ERP, legacy, POS, other OLTP/Web, external data. ETL process: select, extract, transform, integrate, load. Enterprise data warehouse with metadata and replication. Data marts: marketing, engineering, finance, ..., plus a "no data marts" option. Access: API/middleware. Applications (visualization): routine business reporting, data/text mining, OLAP, dashboard, Web, custom-built applications.)


Data Warehousing
Process Overview
• The major components of a data
warehousing process
– Data sources
– Data extraction
– Data loading
– Comprehensive database
– Metadata
– Middleware tools
Data Warehousing
Process Overview
• Data sources
– Multiple independent operational legacy systems
– External data providers (e.g., the census)
– OLTP or enterprise resource planning (ERP) systems
– Web data
• Data extraction
– Data are extracted by custom-written or commercial
software
• Data loading
– Data are loaded into a staging area, where they are transformed and
cleansed
– The data are then ready to load into the DW
Data Warehousing
Process Overview
• Comprehensive database
– This is the EDW, which supports all decision analysis by
providing relevant summarized and detailed information
from different sources.
• Metadata
– Includes software programs about data and rules for
organizing data summaries that are easy to index or
search with Web tools
• Middleware tools
– Enable access to the DW
– May use SQL queries
– Front-end applications
• Examples: data reporting, data mining, OLAP, reporting tools, data
visualization tools
Data Warehousing
Architectures

Figure: Architecture of web-based data warehousing
(Client (Web browser), Internet/intranet/extranet, Web server delivering Web pages, application server, data warehouse)


Data Integration and the
Extraction, Transformation,
and Load (ETL) Process
• Extraction, transformation, and load (ETL)
The ETL process consists of:
– Extraction:
• reading data from one or more databases
– Transformation:
• converting the extracted data from its previous form
into the form in which it needs to be so that it can be
placed into a data warehouse or simply another
database, and
– Load:
• putting the data into the data warehouse
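The three ETL steps can be sketched end to end with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the source table pos_sales, the target table dw_sales, their columns, and the cleansing rules are all hypothetical, and production ETL tools do far more (scheduling, logging, incremental loads).

```python
import sqlite3

def extract(source_conn):
    """Extraction: read raw rows from a source database."""
    return source_conn.execute(
        "SELECT customer, amount, sold_on FROM pos_sales"
    ).fetchall()

def transform(rows):
    """Transformation: normalize names, convert amounts to numbers, drop bad rows."""
    cleaned = []
    for customer, amount, sold_on in rows:
        try:
            value = float(amount)
        except (TypeError, ValueError):
            continue  # skip rows whose amount cannot be parsed
        cleaned.append((customer.strip().title(), value, sold_on))
    return cleaned

def load(target_conn, rows):
    """Load: put the transformed rows into the warehouse table."""
    target_conn.executemany("INSERT INTO dw_sales VALUES (?, ?, ?)", rows)
    target_conn.commit()

# Hypothetical operational source with inconsistent formatting.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE pos_sales (customer TEXT, amount TEXT, sold_on TEXT)")
source.executemany(
    "INSERT INTO pos_sales VALUES (?, ?, ?)",
    [
        ("  alice smith ", "19.99", "2024-03-01"),
        ("BOB JONES", "n/a", "2024-03-01"),
        ("carol diaz", "5", "2024-03-02"),
    ],
)

# Hypothetical warehouse target with a consistent format.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dw_sales (customer TEXT, amount REAL, sold_on TEXT)")

load(warehouse, transform(extract(source)))
loaded = warehouse.execute(
    "SELECT customer, amount FROM dw_sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the slide: each step can be replaced (a different source, a different cleansing rule, a different target) without touching the others.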
Data Integration and the
Extraction, Transformation,
and Load (ETL) Process

Figure 5.8: The ETL Process
(Sources: packaged application, legacy system, other internal applications, transient data source. Steps: extract, transform, cleanse, load. Targets: data warehouse, data mart.)


Real-Time Data Warehousing
• Real-time/active data warehousing
– The process of loading and providing data via a
data warehouse as they become available
– Enabling real-time data updates for real-time
analysis and real-time decision making
– Overstock.com (online retailer)
– Egg Plc (online bank)
– Real-time data warehousing (video link):
http://link.brightcove.com/services/player/bcpid1902159189001?bckey=AQ~~,AAABuqDKQ7k~,E8yDudMzkZ0kxUW-0IKpRnpU6ERPDumo&bctid=1932510719001
Definition of Data Mining
• Data mining: a misnomer?
• The nontrivial process of identifying valid,
novel, potentially useful, and ultimately
understandable patterns in data stored in
structured databases. - Fayyad et al., (1996)

• Keywords in this definition: process, nontrivial,
valid, novel, potentially useful, understandable.
• Other names: knowledge extraction, pattern
analysis, knowledge discovery, information
harvesting, pattern searching, data dredging,…
Data Mining Concepts
and Applications
• Data mining (DM)
– A process that uses statistical, mathematical,
artificial intelligence and machine-learning
techniques to extract and identify useful
information and subsequent knowledge from
large databases
– Patterns can be rules, affinities, correlation,
trends, or prediction models
– Knowledge discovery in databases (KDD)
– Process of finding mathematical patterns in
large sets of data
Association Rule Mining
• A very popular DM method in business
• Finds interesting relationships (affinities)
between variables (items or events)
• Part of machine learning family
• Employs unsupervised learning
• There is no output variable
• Also known as market basket analysis
• Often used as an example to describe DM to
ordinary people, such as the famous
“relationship between diapers and beers!”
Association Rule Mining
• Input: the simple point-of-sale transaction data
• Output: Most frequent affinities among items
• Example: according to the transaction data…
“Customers who bought a laptop computer and virus
protection software also bought an extended service plan
70 percent of the time.”
• How do you use such a pattern/knowledge?
– Put the items next to each other for ease of finding
– Promote the items as a package (do not put one on sale if the
other(s) are on sale)
– Place items apart from each other so that the customer has to
walk the aisles to search for them, and by doing so potentially
sees and buys other items
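A figure such as "70 percent of the time" is what is commonly called the confidence of a rule, and the share of all transactions containing every item in the rule is its support. These two names are standard association-rule terms but do not appear on the slides, and the transactions below are invented for illustration. A minimal sketch:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= set(basket))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0

# Hypothetical point-of-sale baskets.
transactions = [
    {"laptop", "antivirus", "service plan"},
    {"laptop", "antivirus"},
    {"laptop", "antivirus", "service plan"},
    {"laptop", "mouse"},
    {"printer", "paper"},
]

rule_support = support(transactions, {"laptop", "antivirus", "service plan"})
rule_confidence = confidence(transactions, {"laptop", "antivirus"}, {"service plan"})
print(rule_support, rule_confidence)
```

Here two of five baskets contain all three items, and two of the three laptop-plus-antivirus baskets also contain the service plan.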
Association Rule Mining
• Representative applications of association
rule mining include
– In business: cross-marketing, cross-selling, store
design, catalog design, e-commerce site design,
optimization of online advertising, product pricing,
and sales/promotion configuration
– In medicine: relationships between symptoms and
illnesses; diagnosis and patient characteristics and
treatments (to be used in medical DSS); and genes
and their functions (to be used in genomics
projects)…
Data Mining Software Tools
• Commercial
– SPSS - PASW (formerly Clementine)
– SAS - Enterprise Miner
– IBM - Intelligent Miner
– StatSoft – Statistical Data Miner
– … many more
• Free and/or Open Source
– Weka
– RapidMiner…
• Table 5.6 : Selected Data Mining software
Text Mining Concepts
• 85-90 percent of all corporate data is in some
kind of unstructured form (e.g., text)
• Unstructured corporate data is doubling in
size every 18 months
• Tapping into these information sources is not
an option, but a need to stay competitive
• Answer: text mining
– A semi-automated process of extracting
knowledge from unstructured data sources
– a.k.a. text data mining or knowledge discovery in
textual databases
Data Mining versus Text
Mining
• Both seek novel and useful patterns
• Both are semi-automated processes
• Difference is the nature of the data:
– Structured versus unstructured data
– Structured data: in databases
– Unstructured data: Word documents, PDF
files, text excerpts, XML files, and so on
• Text mining – first, impose structure on
the data, then mine the structured data
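One common way to "impose structure" on text is to turn each document into a row of word counts (a term-document matrix), which can then be mined like any other table. The slides do not prescribe a specific representation, so this bag-of-words sketch and its sample documents are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical unstructured documents.
documents = [
    "The warehouse stores historical sales data.",
    "Sales data are loaded into the warehouse nightly.",
]

def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())

# Step 1: impose structure. Each document becomes a mapping of term -> count.
counts = [Counter(tokenize(doc)) for doc in documents]

# Step 2: align all documents on a shared vocabulary (the matrix columns).
vocabulary = sorted(set().union(*counts))
matrix = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]

print(vocabulary)
for row in matrix:
    print(row)
```

Once the text is in this tabular form, the same structured-data techniques described for data mining can be applied to it.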
Web Mining
• Web mining (or Web data mining) is the
process of discovering intrinsic relationships
from Web data (textual, linkage, or usage)

Web mining has three branches:
– Web content mining. Source: unstructured textual content of the Web pages (usually in HTML format)
– Web structure mining. Source: the uniform resource locator (URL) links contained in the Web pages
– Web usage mining. Source: the detailed description of a Web site’s visits (sequence of clicks by sessions)
Web Content/Structure
Mining
• Web Content Mining
– Mining of the textual content on the Web
– Data collection via Web crawlers
– Web pages include hyperlinks
• Authoritative pages
• Hubs
• hyperlink-induced topic search (HITS)
• Web structure mining
– The development of useful information from the
links included in the Web documents.
• Popularity of a document
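Hubs and authoritative pages can be scored with hyperlink-induced topic search (HITS): a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. The following is a simplified sketch on a hypothetical four-page link graph; the fixed iteration count and the Euclidean normalization are illustrative choices.

```python
def hits(links, iterations=20):
    """Return (hub, authority) score dictionaries for a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of pages that link to this page.
        auth = {
            p: sum(hub[q] for q in pages if p in links.get(q, ()))
            for p in pages
        }
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of the authority scores of pages this page links to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# Hypothetical Web graph: A and B both link to C, and C links to D.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # page with the highest authority score
```

In this toy graph, C receives links from two pages and ends up as the strongest authority, while A and B act as its hubs.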
Web Usage Mining
• Extraction of information from data generated
through Web page visits and transactions…
– data stored in server access logs, referrer logs,
agent logs, and client-side cookies
– user characteristics and usage profiles
– metadata, such as page attributes, content
attributes, and usage data
• Clickstream data
• Clickstream analysis
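Clickstream analysis typically starts by regrouping raw log entries into the sequence of clicks per session. The sketch below assumes a simplified log of (session id, timestamp, page) tuples; real server access logs need parsing and session reconstruction first, and all values here are invented.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed log entries: (session_id, ISO timestamp, page).
log = [
    ("s1", "2024-05-01T10:00:00", "/home"),
    ("s2", "2024-05-01T10:00:05", "/home"),
    ("s1", "2024-05-01T10:00:30", "/products"),
    ("s1", "2024-05-01T10:01:10", "/cart"),
    ("s2", "2024-05-01T10:02:00", "/products"),
]

# Build the clickstream: the ordered list of pages visited in each session.
paths = defaultdict(list)
for session_id, timestamp, page in sorted(log, key=lambda entry: entry[1]):
    paths[session_id].append(page)

# Count page-to-page transitions across all sessions.
transitions = Counter()
for pages in paths.values():
    transitions.update(zip(pages, pages[1:]))

print(dict(paths))
print(transitions.most_common(1))
```

Transition counts like these are one simple input for usage profiles, for example spotting the most common next page after the home page.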
Basic Concepts
of Neural Networks
• Neural networks (NN)
– NN represent a brain metaphor for information
processing
– These models are biologically inspired rather than
an exact replica of how the brain actually functions.
Basic Concepts
of Neural Networks
• Neural computing
– Refers to a pattern recognition methodology
for machine learning
– An experimental computer design aimed at
building intelligent computers that operate in
a manner modeled on the functioning of the
human brain.
Basic Concepts
of Neural Networks
• Artificial neural network (ANN)
– The resulting model from neural computing is
called an artificial neural network (ANN) or neural
network
– Computer technology that attempts to build
computers that will operate like a human brain.
– These machines possess simultaneous memory
storage and work with ambiguous information
Biological Neural Networks

• Two interconnected brain cells (neurons)


Processing Information in
ANN
Figure: A single neuron (processing element, PE) with inputs and outputs
(Inputs x1, x2, ..., xn with weights w1, w2, ..., wn feed the neuron. Summation: $S = \sum_{i=1}^{n} X_i W_i$. Transfer function: $Y = f(S)$. Outputs: Y1, Y2, ..., Yn.)
Processing Information in
ANN
• Biological and artificial neural networks
– The main processing elements of a neural network are
individual neurons
– An artificial neuron receives the summed “information” from
other neurons
– It performs a transformation on the inputs and then passes
on the transformed information to other neurons
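The processing inside one artificial neuron can be sketched in a few lines: form the weighted sum S of the inputs, then apply a transfer function f to get the output Y. The slides do not say which transfer function is used, so the sigmoid below is an assumption, as are the input and weight values.

```python
import math

def weighted_sum(inputs, weights):
    """Summation step: S = x1*w1 + x2*w2 + ... + xn*wn."""
    return sum(x * w for x, w in zip(inputs, weights))

def transfer(s):
    """Transfer function f(S); a sigmoid is assumed here for illustration."""
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical inputs and weights for a three-input neuron.
inputs = [1.0, 0.5, -1.0]
weights = [0.4, 0.2, 0.5]

s = weighted_sum(inputs, weights)
y = transfer(s)
print(s, y)
```

With these values the positive and negative contributions cancel, so S is zero and the sigmoid returns the midpoint of its range.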
Basic Concepts
of Neural Networks
Figure: Neural Network with One Hidden Layer
(Inputs x1, x2, x3 enter the input layer of PEs, pass through a hidden layer of PEs, and reach the output layer, which produces Y1. Each PE computes a weighted sum (S) followed by a transfer function (f).)
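A forward pass through a network with one hidden layer simply repeats the weighted-sum-then-transfer step layer by layer. The sketch below assumes three inputs, two hidden processing elements, one output, sigmoid transfer functions, and no bias terms; the layer sizes and weight values are invented, and no training is shown.

```python
import math

def sigmoid(s):
    """Assumed transfer function f(S)."""
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weight_rows):
    """Each row of weights defines one PE: weighted sum, then transfer function."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)))
        for row in weight_rows
    ]

# Hypothetical weights: 3 inputs -> 2 hidden PEs -> 1 output PE.
hidden_weights = [
    [0.2, -0.1, 0.4],
    [0.5, 0.3, -0.2],
]
output_weights = [
    [0.6, -0.4],
]

x = [1.0, 0.0, 1.0]
hidden = layer(x, hidden_weights)
y1 = layer(hidden, output_weights)[0]
print(hidden, y1)
```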


Concepts and Definitions
of Artificial Intelligence
• Artificial intelligence (AI) definitions
– Artificial intelligence (AI)
• The subfield of computer science concerned with symbolic
reasoning and problem solving
• Two basic ideas:
– It involves studying the thought processes of humans (to
understand what intelligence is)
– It deals with representing and duplicating those processes
via machines
– AI has many definitions…
• Behavior by a machine that, if performed by a human being,
would be considered intelligent
• “…study of how to make computers do things at which, at the
moment, people are better”
Concepts and Definitions
of Artificial Intelligence
• Artificial intelligence (AI) definitions
– Turing test
• A test designed to measure the “intelligence”
of a computer
• A computer can be considered to be smart
only when a human interviewer, “conversing”
with both an unseen human being and an
unseen computer, cannot determine which is
which.
- Alan Turing
Figure: Test for intelligence (questions and answers)
Basic Concepts
of Expert Systems (ES)
• An expert system is a computer program that attempts to
imitate an expert’s reasoning processes and
knowledge in solving specific problems
• Most Popular Applied AI Technology
– Enhance Productivity
– Augment Work Forces
• Works best with narrow problem areas/tasks
• Expert systems do not replace experts, but
– Make their knowledge and experience more widely
available, and thus
– Permit non-experts to work better
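One way to picture how a program can imitate an expert's reasoning in a narrow problem area is a small set of if-then rules applied repeatedly to known facts (forward chaining). The slides do not specify how knowledge is represented, so the rule-based form, the troubleshooting rules, and the facts below are all assumptions made for illustration.

```python
# Hypothetical knowledge base: (set of conditions, conclusion).
rules = [
    ({"printer is on", "no pages print"}, "check the print queue"),
    ({"check the print queue", "queue is stuck"}, "restart the print spooler"),
]

def infer(facts, rules):
    """Forward chaining: keep firing rules until no new conclusion is added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

facts = {"printer is on", "no pages print", "queue is stuck"}
conclusions = infer(facts, rules) - facts
print(sorted(conclusions))
```

The second rule fires only because the first rule's conclusion becomes a known fact, which is the chaining behavior that lets a non-expert follow the expert's steps.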
Big Data - Definition
• Big [volume] Data is not new!
• Big Data means different things to people with
different backgrounds and interests
• Traditionally, “Big Data” = massive volumes of data
– E.g., volume of data at CERN, NASA, Google, …
• Where does the Big Data come from?
– Everywhere! Web logs, RFID, GPS systems, sensor
networks, social networks, Internet-based text documents,
Internet search indexes, call detail records, astronomy,
atmospheric science, biology, genomics, nuclear physics,
biochemical experiments, medical records, scientific
research, military surveillance, multimedia archives, …
The Data Size Is Getting Big,
Bigger…
• Hadron Collider - 1 PB/sec
• Boeing jet - 20 TB/hr
• Facebook - 500 TB/day
• YouTube – 1 TB/4 min.
• The proposed Square Kilometer Array
telescope (the world’s
proposed biggest
telescope) – 1 EB/day

Figure: Names for Big Data Sizes
Link: https://www.geeksinphoenix.com/blog/post/2016/11/08/what-is-a-bit-what-is-a-byte.aspx
Big Data -
Definition and Concepts
• Big Data is a misnomer!
• Big Data is more than just “big”
• The Vs that define Big Data
– Volume
– Variety
– Velocity
– Veracity
– Variability
– Value
– …
– Link: https://www.xsnet.com/blog/bid/205405/the-v-s-of-big-data-velocity-volume-value-variety-and-veracity
Big Data Analytics
• Big Data by itself, regardless of the size,
type, or speed, is worthless
• Big Data + “big” analytics = value
• With the value proposition, Big Data also
brought about big challenges
– Effectively and efficiently capturing, storing,
and analyzing Big Data
– New breed of technologies needed
(developed, purchased, hired, or
outsourced …)
Big Data Analytics
• Big data analytics is the often complex
process of examining large and varied
data sets -- or big data -- to uncover
information including hidden patterns,
unknown correlations, market trends and
customer preferences that can help
organizations make informed business
decisions.
Big Data Analytics
• Big Data analytics is the process of
collecting, organizing and analyzing large
sets of data (called Big Data) to discover
patterns and other useful information.
• Big Data analytics can help organizations to
better understand the information contained
within the data and will also help identify the
data that is most important to the business
and future business decisions.
• Analysts working with Big Data typically
want the knowledge that comes from
analyzing the data.
Cloud Computing and BI
• Cloud Computing
– A style of computing in which dynamically scalable
and often virtualized resources are provided over the
Internet
– Users need not have knowledge of, experience in, or
control over the technology infrastructures in the
cloud that support them
– Related terms: utility computing, application service
provider, grid computing, on-demand computing,
software as a service (SaaS)
– Cloud = Internet
Cloud Computing and BI
• Fragments of cloud computing
– Infrastructure as a service (IaaS)
– Platform as a service (PaaS)
– Software as a service (SaaS)
– Data as a service (DaaS)
• Example: Web-based e-mail and Google Docs
• Cloud-computing service providers
– Salesforce.com, IBM, Sun Microsystems, Microsoft
(Azure), Google, and Yahoo!
Cloud Computing and BI
• Cloud computing, like many other IT trends,
has resulted in new offerings in business
intelligence
– Cloud-based data warehousing (by 1010data,
LogiXML, Lucid Era)
– Cloud-based dashboard and data management tools
(by Elastra, Rightscale)
• Advantage: rapid diffusion, cutting-edge
technology, less investment,…
• Concerns: loss of control and privacy, legal
liabilities, cross-border political issues, …
Machine Learning
• Machine learning (ML) is a family of artificial
intelligence technologies that is primarily
concerned with the design and development of
algorithms that allow computers to “learn” from
historical data
– ML is the process by which a computer learns from
experience
– It differs from knowledge acquisition in ES: instead
of relying on experts (and their willingness) ML relies
on historical facts
– ML helps in discovering patterns in data
Intelligent Agents
• Intelligent agent (IA): an autonomous computer
program that observes and acts upon an environment
and directs its activity toward achieving specific goals
• Relatively new technology
• Other names include
– Software agents
– Wizards
– Knowbots
– Intelligent software robots (Softbots)
– Bots

• Agent - Someone employed to act on one’s behalf


Intelligent Agents
• “Intelligent agents are software entities that carry out some
set of operations on behalf of a user or another program,
with some degree of independence or autonomy and in so
doing, employ some knowledge or representation of the
user’s goals or desires.”
(“The IBM Agent”)

• Autonomous agents are computational systems that inhabit
some complex dynamic environment, sense and act
autonomously in this environment and by doing so realize a
set of goals or tasks for which they are designed
(Maes, 1995, p. 108)
Reality Mining
• Identifying aggregate patterns of human
activity trends (see sensenetworks.com by
MIT and Columbia University)
• Many devices send location information
– Cars, buses, taxis, mobile phones, cameras, and
personal navigation devices
– Using technologies such as GPS, WiFi, and cell
tower triangulation
• Enables tracking of assets, finding nearby
services, locating friends/family members, …
Reality Mining
• Citysense: finding people with similar interests

Figure: A map of an area of San Francisco with density
designation at places of interest

See www.sensenetworks.com/citysense.php for real-time
animation of the content
Recommendation Engines
• People rely on recommendations by others
– Success for retailers like Amazon.com
• Recommender systems
– Web-based information filtering system that
takes the inputs from users and then aggregates
the inputs to provide recommendations for other
users in their product or service selection
choices
• Data
– Structured → ratings/rankings
– Unstructured → textual comments
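The "take inputs from users, aggregate them, recommend to other users" idea can be sketched with structured ratings alone. This is a deliberately naive illustration that averages other users' ratings for items the target user has not rated; the users, items, and scores are hypothetical, and production recommenders typically also weigh how similar users are to one another.

```python
from collections import defaultdict

# Hypothetical structured input: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"book": 5, "lamp": 3},
    "ben": {"book": 4, "desk": 5},
    "cat": {"lamp": 2, "desk": 4, "pen": 5},
}

def recommend(user, ratings, top_n=2):
    """Rank items the user has not rated by the average rating from other users."""
    scores = defaultdict(list)
    for other, items in ratings.items():
        if other == user:
            continue
        for item, score in items.items():
            if item not in ratings[user]:
                scores[item].append(score)
    averages = {item: sum(vals) / len(vals) for item, vals in scores.items()}
    # Highest average first; ties broken alphabetically for a stable result.
    return sorted(averages, key=lambda item: (-averages[item], item))[:top_n]

suggestions = recommend("ann", ratings)
print(suggestions)
```

Unstructured inputs such as textual comments would first need the text mining step (imposing structure) before they could feed the same aggregation.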
Major Components of
Service-Oriented DSS/BI
• Analytics-as-a-Service (AaaS)
– “Agile Analytics”
– AaaS in the cloud has economies of scale,
better scalability, and higher cost savings
– Data/Text Mining + Big Data → Cloud
Computing
• Storage and access to Big Data
• Massively Parallel Processing
• In-memory processing
• In-database processing
• Resource pooling, scaling, cost and time saving, …
Speech Analytics
• Speech analytics – analysis of voice
– Content versus other Voice Features
• Two Approaches
– The Acoustic Approach
• Intensity, Pitch, Jitter, Shimmer, etc.
– The Linguistic Approach
• Lexical: words, phrases, etc.
• Disfluencies: filled pauses, hesitation, restarts, etc.
• Higher semantics: taxonomy/ontology, pragmatics
• Many uses and use cases exist
