
International Journal of Scientific and Research Publications, Volume 5, Issue 5, May 2015, ISSN 2250-3153

Data Warehousing, Data Mining, OLAP and OLTP Technologies Are Indispensable Elements to Support Decision-Making Process in Industrial World

Amandeep Kour
Assistant Professor, Department of Computer Science and Engineering,
M.B.S College of Engineering & Technology, Babliana, Jammu (J&K), India

Abstract- This paper provides an overview of data warehousing, data mining, OLAP and OLTP technologies, exploring the features, new applications and the architecture of data warehousing and data mining. The data warehouse supports on-line analytical processing (OLAP), the functional and performance requirements of which are quite different from those of the on-line transaction processing (OLTP) applications traditionally supported by operational databases. Data warehouses provide on-line analytical processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities, which facilitates effective data mining. Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. OLTP is customer-oriented and is used for transaction and query processing by clerks, clients and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives and analysts. Data warehousing and OLAP have emerged as leading technologies that facilitate data storage, organization and, in turn, meaningful retrieval. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications.

Index Terms- Data Warehousing, OLAP, OLTP, Data Mining, Decision Making and Decision Support, Data Marts, Metadata, ETL (Extraction, Transportation, Transformation and Loading), Server, Data Warehouse Architecture.
I. INTRODUCTION

Different people have different definitions for a data warehouse. The most popular definition came from Bill Inmon, who provided the following: a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.
Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
Ralph Kimball provided a more concise definition of a data warehouse: a data warehouse is a copy of transaction data specifically structured for query and analysis. This is a functional view of a data warehouse. Kimball did not address how the data warehouse is built, as Inmon did; rather, he focused on the functionality of a data warehouse.
Data warehousing is a collection of decision support technologies aimed at enabling the knowledge worker (executive, manager, analyst) to make better and faster decisions. Data warehousing technologies have been successfully deployed in many industries: manufacturing (for order shipment and customer support), retail (for user profiling and inventory management), financial services (for claims analysis, risk analysis, credit card analysis and fraud detection), transportation (for fleet management), telecommunications (for call analysis and fraud detection), utilities (for power usage analysis), and healthcare (for outcomes analysis). This paper presents a roadmap of data warehousing technologies, focusing on the special requirements that data warehouses place on database management systems (DBMSs).

II. DATA WAREHOUSING

2.1 Definition of data warehousing:
A data warehouse is a single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a form they can understand and use in a business context. Data warehousing is defined in many different ways, but not rigorously: it is a decision support database that is maintained separately from the organization's operational database and supports information processing by providing a solid platform of consolidated, historical data for analysis. Data warehousing is the process of constructing and using data warehouses.


A data warehouse is organized around major subjects, such as customer, product and sales; it focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing, and it provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process. A data warehouse draws data from operational systems, but is physically separate and serves a different purpose. Operational systems have their own databases and are used for transaction processing; a data warehouse has its own database and is used to support decision making. Once the warehouse is created, users (e.g., analysts, managers) access the data in the warehouse using tools that generate SQL (i.e., structured query language) queries or through applications such as a decision support system or an executive information system. "Data warehousing" is a broader term than "data warehouse" and is used to describe the creation, maintenance, use, and continuous refreshing of the data in the warehouse.

2.2 Designing and managing data warehouse systems: project management aspects
Several authors give an overview of the organizational roles involved in a typical data warehouse project. Meyer (2000) and Meyer/Winter (2001) present organizational requirements for data warehousing and the concept of data ownership; a two-dimensional organizational structure for large financial service companies, combining infrastructural competencies and content competencies, is derived. Auth (2003) develops a process-oriented organizational concept for metadata management, providing detailed activity chains and organizational roles. As shown above, the organizational domain of data warehouse systems still lacks the attention of data warehouse researchers compared to technical aspects. Therefore, this paper aims at providing deeper insights into the current organizational situation of data warehouse departments in practice. The organizational domain of companies can be divided into a structural, human resource, political, and symbolic dimension, and each dimension requires different design instruments (Bolman/Deal 2003, Mueller-Stewens 2003). The structural dimension focuses on goals, formal roles and relationships; structures are created to achieve the company's goals considering technological and environmental factors, and rules, policies, processes, and hierarchies are the design elements of this dimension. Drawing from psychology, the human resource dimension takes care of the needs, feelings, prejudices, and limitations of all individuals. The political dimension sees organizations as arenas: different interest groups cause conflicts while competing for power and resources, and organizational life is characterized by bargaining, negotiations and compromises. The symbolic dimension abandons the assumptions of rational behavior and views organizations as a kind of theatre. The OLAP Council (http://www.olapcouncil.org) is a good source of information on standardization efforts across the industry. Finally, a good source of references on data warehousing and OLAP is the Data Warehousing Information Center.

2.3 Data warehouse Architecture

Fig. 1 shows a data warehousing architecture. It includes tools for extracting data from multiple operational databases and external sources; for cleaning, transforming and integrating this data; for loading data into the data warehouse; and for periodically refreshing the warehouse to reflect updates at the sources and to purge data from the warehouse, perhaps onto slower archival storage. In addition to the main warehouse, there may be several departmental data marts. Data in the warehouse and data marts is stored and managed by one or more warehouse servers, which present multidimensional views of data to a variety of front-end tools: query tools, report writers, analysis tools, and data mining tools. Finally, there is a repository for storing and managing metadata, and tools for monitoring and administering the warehousing system.
An enterprise warehouse collects all of the information about subjects spanning the entire organization. A data mart is a subset of corporate-wide data that is of value to a specific group of users; its scope is confined to specific, selected groups, such as a marketing data mart, and a data mart may be independent or dependent (fed directly from the warehouse). A virtual warehouse is a set of views over operational databases; only some of the possible summary views may be materialized.
Metadata is the data defining warehouse objects. It stores a description of the structure of the data warehouse: schema, views, dimensions, hierarchies, derived data definitions, and data mart locations and contents. Operational metadata covers data lineage (the history of migrated data and its transformation path), the currency of data (active, archived, or purged), and monitoring information (warehouse usage statistics, error reports, audit trails). Metadata also records the algorithms used for summarization, the mapping from the operational environment to the data warehouse, and data related to system performance.
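The extract, clean, transform, load and refresh cycle described above can be illustrated with a minimal sketch. The following Python fragment is illustrative only: the source rows, table layout and cleaning rule are invented for the example, and a real warehouse would use dedicated ETL tooling rather than an in-memory SQLite database.

```python
import sqlite3

# Two illustrative operational sources that identify the same product
# differently, as in the "Integrated" property discussed earlier.
source_a = [("P-001", "2015-05-01", 120.0), ("P-002", "2015-05-02", 80.0)]
source_b = [("prod/001", "2015-05-03", 95.5)]

def clean(product_id: str) -> str:
    # Transformation step: map both source conventions to one warehouse key.
    return product_id.upper().replace("PROD/", "P-")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (product_id TEXT, sale_date TEXT, amount REAL)")

# Extract from each source, transform, and load into the warehouse table.
for product_id, sale_date, amount in source_a + source_b:
    conn.execute("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 (clean(product_id), sale_date, amount))
conn.commit()

# A periodic refresh would re-run the loop above against new source rows.
print(conn.execute("SELECT product_id, SUM(amount) FROM sales_fact "
                   "GROUP BY product_id").fetchall())
```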
III. OLTP AND OLAP

The job of earlier on-line operational systems was to perform transaction and query processing; hence they are also termed on-line transaction processing (OLTP) systems. Data warehouse systems serve users or knowledge workers in the role of data analysis and decision-making.

Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are called on-line analytical processing (OLAP) systems.

3.1 Major distinguishing features between OLTP and OLAP

i) Users and system orientation: OLTP is customer-oriented and is used for transaction and query processing by clerks, clients and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives and analysts.
ii) Data contents: An OLTP system manages current data in a very detailed format, while an OLAP system manages large amounts of historical data and provides facilities for summarization and aggregation. Moreover, because information is stored and managed at different levels of granularity, the data is easier to use in informed decision-making.
iii) Database design: An OLTP system generally adopts an entity-relationship data model and an application-oriented database design. An OLAP system adopts either a star or snowflake model and a subject-oriented database design (the contrast in access patterns is illustrated by the sketch below).
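The contrast between the two access patterns can be made concrete with a small example. The schema and rows below are invented for illustration; the point is only the shape of the queries: OLTP touches one current record by key, while OLAP aggregates history along dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer TEXT, region TEXT,
                     month TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, 'alice', 'north', '2015-04', 50.0),
  (2, 'bob',   'south', '2015-04', 75.0),
  (3, 'alice', 'north', '2015-05', 20.0);
""")

# OLTP-style access: fetch or update one current record by key.
print(conn.execute("SELECT * FROM orders WHERE order_id = 2").fetchone())

# OLAP-style access: summarize history along dimensions (region, month).
for row in conn.execute("""SELECT region, month, SUM(amount)
                           FROM orders GROUP BY region, month"""):
    print(row)
```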
IV. DATA MINING

Data mining is the extraction or "mining" of knowledge from a large amount of data or a data warehouse. To do this extraction, data mining combines artificial intelligence, statistical analysis and database management systems to attempt to pull knowledge from stored data. Data mining is the process of applying intelligent methods to extract data patterns. This is done using front-end tools; the spreadsheet is still the most compelling front-end application for on-line analytical processing (OLAP). Data mining is:
- The automatic discovery of relationships in typically large databases and, in some instances, the use of the discovery results in predicting relationships.
- An essential process where intelligent methods are applied in order to extract data patterns.
- A way to be proactive: prospective rather than retrospective.

1.1 Why mine data? Commercial viewpoint
- Lots of data is being collected and warehoused.
- Computing has become affordable.
- Competitive pressure is strong: provide better, customized services for an edge; information is becoming a product in its own right.

1.2 Why mine data? Scientific viewpoint
- Data is collected and stored at enormous speeds: remote sensors on satellites, telescopes scanning the skies, microarrays generating gene expression data, scientific simulations generating terabytes of data.
- Traditional techniques are infeasible for raw data.
- Data mining is used for data reduction: cataloging, classifying and segmenting data, which helps scientists in hypothesis formation.

4.3 Major Data Mining Tasks
- Classification: predictive, predicting an item's class.
- Association Rule Discovery: descriptive.
- Clustering: descriptive, finding groups of items.
- Sequential Pattern Discovery: descriptive.
- Deviation Detection: predictive, finding changes.
- Forecasting: predicting a parameter value.
- Description: describing a group.
- Link Analysis: finding relationships and associations.

4.3.1 Classification: Definition
- Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model; usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it, as in the sketch below.
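A minimal sketch of this train/test workflow, assuming scikit-learn is available (the iris data set stands in for a collection of records whose class attribute is the species):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Records with attributes; one attribute (the species) is the class.
X, y = load_iris(return_X_y=True)

# Divide the data set into training and test sets, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Build a model for the class attribute as a function of the other attributes.
model = DecisionTreeClassifier().fit(X_train, y_train)

# The test set estimates accuracy on previously unseen records.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```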
4.3.1.1 Classification: Application
- Direct Marketing
  - Goal: reduce the cost of a mailing by targeting the set of customers likely to buy a new cell-phone product.
  - Approach: use the data for a similar product introduced before. We know which customers decided to buy and which decided otherwise; this {buy, don't buy} decision forms the class attribute. Collect various demographic, lifestyle, and company-interaction related information about all such customers (type of business, where they stay, how much they earn, etc.), and use this information as input attributes to learn a classifier model.


4.3.1.2 Associations
- I = {i1, i2, …, im}: a set of literals, called items.
- Transaction d: a set of items such that d ⊆ I.
- Database D: a set of transactions.
- A transaction d contains X, a set of some items in I, if X ⊆ d.
- An association rule is an implication of the form X ⇒ Y, where X, Y ⊂ I.

4.3.1.3 Association rules
- Used to find all rules in basket data (basket data is also called transaction data).
- Analyze how items purchased by customers in a shop are related.
- Discover all rules that have support greater than a minimum support specified by the user and confidence greater than a minimum confidence specified by the user.
- Example of transaction data:
  - CD player, music CD, music book
  - CD player, music CD
  - music CD, music book
  - CD player
- Formally, let I = {i1, i2, …, im} be the total set of items, D a set of transactions, and d one transaction consisting of a set of items, d ⊆ I. An association rule is X ⇒ Y, where X ⊂ I, Y ⊂ I and X ∩ Y = ∅, with
  - support(X ⇒ Y) = |{d ∈ D : X ∪ Y ⊆ d}| / |D|
  - confidence(X ⇒ Y) = |{d ∈ D : X ∪ Y ⊆ d}| / |{d ∈ D : X ⊆ d}|.
A worked computation on the example baskets above follows.
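Using the four example baskets above, support and confidence for the rule {CD player} ⇒ {music CD} can be computed directly. The helper functions are ours, written for illustration:

```python
# The four example baskets from Section 4.3.1.3.
transactions = [
    {"CD player", "music CD", "music book"},
    {"CD player", "music CD"},
    {"music CD", "music book"},
    {"CD player"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    # conf(X => Y) = support(X union Y) / support(X).
    return support(X | Y, transactions) / support(X, transactions)

X, Y = {"CD player"}, {"music CD"}
print("support:", support(X | Y, transactions))       # 2/4 = 0.5
print("confidence:", confidence(X, Y, transactions))  # 2/3 ~= 0.67
```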
4.3.1.4 Clustering
- Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that data points in one cluster are more similar to one another, and data points in separate clusters are less similar to one another.

4.3.1.5 Clustering Applications
- Market Segmentation:
  - Goal: subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
  - Approach: collect different attributes of customers based on their geographical and lifestyle related information; find clusters of similar customers; measure the clustering quality by observing the buying patterns of customers in the same cluster vs. those from different clusters (see the sketch below).
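A minimal sketch of this market-segmentation idea, assuming scikit-learn is available; the customer attributes and the choice of two segments are invented for the example:

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented customer attributes (age, annual spend) purely to
# illustrate the market-segmentation approach described above.
customers = np.array([
    [25, 300], [27, 320], [24, 280],   # younger, lower spend
    [52, 900], [49, 880], [55, 950],   # older, higher spend
])

# Find clusters of similar customers; k = 2 segments is an assumption.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print("segment labels:", kmeans.labels_)
print("segment centers:", kmeans.cluster_centers_)
```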
V. BAYESIAN BELIEF NETWORKS

A Bayesian belief network (also known as a Bayesian network or probabilistic network) allows class conditional independencies between subsets of variables. It has two components: (1) a directed acyclic graph (called a structure) and (2) a set of conditional probability tables (CPTs). It is a (directed acyclic) graphical model of causal influence relationships; it represents dependencies among the variables and gives a specification of the joint probability distribution.

1.3 How Are Bayesian Networks Constructed?
- Subjective construction: identification of (direct) causal structure. People are quite good at identifying direct causes from a given set of variables and at judging whether the set contains all relevant direct causes.
  - Markovian assumption: each variable becomes independent of its non-effects once its direct causes are known. E.g., in S ← F → A ← T, the path S → A is blocked once we know F → A.
  - HMM (Hidden Markov Model): often used to model dynamic systems whose states are not observable, yet their outputs are.
- Synthesis from other specifications, e.g., from a formal system design: block diagrams and information flow.
- Learning from data, e.g., from medical records or student admission records.
  - Learn the parameters given the structure, or learn both structure and parameters.
  - Maximum likelihood principle: favors Bayesian networks that maximize the probability of observing the given data set.
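A Bayesian network's two components, and the way they determine a joint distribution, can be illustrated on a toy network with a single edge. The variables, structure and CPT numbers below are invented; the point is that the joint distribution factorizes along the directed acyclic graph:

```python
# A two-node network Flu -> Fever with hand-made CPTs (illustrative numbers).
p_flu = {True: 0.1, False: 0.9}               # prior P(Flu)
p_fever_given_flu = {True: 0.8, False: 0.05}  # P(Fever=True | Flu)

def joint(flu: bool, fever: bool) -> float:
    # Chain rule under the DAG: P(Flu, Fever) = P(Flu) * P(Fever | Flu).
    p_fever = p_fever_given_flu[flu] if fever else 1 - p_fever_given_flu[flu]
    return p_flu[flu] * p_fever

# The joint distribution sums to 1 over all assignments.
total = sum(joint(f, v) for f in (True, False) for v in (True, False))
print(joint(True, True), total)  # 0.08, 1.0
```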


5.2 Training Bayesian Networks: Several Scenarios
- Scenario 1: Given both the network structure and all variables observable: compute only the CPT entries.
- Scenario 2: Network structure known, some variables hidden: use a gradient descent (greedy hill-climbing) method, i.e., search for a solution along the steepest descent of a criterion function. Weights are initialized to random probability values; at each iteration the method moves towards what appears to be the best solution at the moment, without backtracking; weights are updated at each iteration and converge to a local optimum.
- Scenario 3: Network structure unknown, all variables observable: search through the model space to reconstruct the network topology.
- Scenario 4: Unknown structure, all variables hidden: no good algorithms are known for this purpose.
See D. Heckerman, A Tutorial on Learning with Bayesian Networks, in Learning in Graphical Models, M. Jordan, ed., MIT Press, 1999.

1.4 Neuron: A Hidden/Output Layer Unit

Fig. 2: Hidden/output layer diagram

For example, with n inputs:

y = sign( Σ_{i=0}^{n} w_i x_i − μ_k )

An n-dimensional input vector x is mapped onto the variable y by means of the scalar product and a nonlinear function mapping. The inputs to the unit are the outputs from the previous layer; they are multiplied by their corresponding weights to form a weighted sum, which is combined with the bias (the threshold μ_k) associated with the unit. Then a nonlinear activation function is applied, as in the sketch below.
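A direct transcription of this unit into code; the input vector, weights and threshold are illustrative values:

```python
import numpy as np

def neuron(x, w, bias):
    # Weighted sum of the inputs minus the unit's threshold, followed by
    # a nonlinear activation (here the sign function, matching
    # y = sign(sum_i w_i * x_i - mu_k) above).
    return np.sign(np.dot(w, x) - bias)

x = np.array([0.5, -1.0, 2.0])   # inputs from the previous layer
w = np.array([0.4,  0.3, 0.6])   # connection weights (illustrative values)
print(neuron(x, w, bias=0.7))    # 1.0: the weighted sum 1.1 exceeds 0.7
```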
1.5 Genetic Algorithms (GA)
- Genetic algorithms are based on an analogy to biological evolution.
- An initial population is created consisting of randomly generated rules. Each rule is represented by a string of bits; e.g., the rule "if A1 and ¬A2 then C2" can be encoded as "100". If an attribute has k > 2 values, k bits can be used.
- Based on the notion of survival of the fittest, a new population is formed that consists of the fittest rules and their offspring. The fitness of a rule is represented by its classification accuracy on a set of training examples.
- Offspring are generated by crossover and mutation.
- The process continues until a population P evolves in which each rule in P satisfies a pre-specified fitness threshold.
- GAs are slow but easily parallelizable. A toy sketch follows.
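A toy sketch in the spirit of this description. Individuals are bit strings and, for brevity, fitness simply counts 1-bits; in the rule-learning setting described above it would instead be the rule's classification accuracy on training examples:

```python
import random

random.seed(0)

def fitness(bits):
    # Stand-in fitness: count of 1-bits (would be classification accuracy
    # of the encoded rule in the setting described above).
    return sum(bits)

def crossover(a, b):
    point = random.randrange(1, len(a))  # single-point crossover
    return a[:point] + b[point:]

def mutate(bits, rate=0.1):
    # Flip each bit independently with a small probability.
    return [bit ^ (random.random() < rate) for bit in bits]

population = [[random.randint(0, 1) for _ in range(8)] for _ in range(6)]
for generation in range(20):
    # Survival of the fittest: keep the best half, breed the rest.
    population.sort(key=fitness, reverse=True)
    parents = population[:3]
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(3)]
    population = parents + children

print(max(fitness(ind) for ind in population))  # approaches 8
```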

5.4.1 Rough Set Approach
- Rough sets are used to approximately, or "roughly", define equivalence classes.
- A rough set for a given class C is approximated by two sets: a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C).
- Finding the minimal subsets (reducts) of attributes for feature reduction is NP-hard, but a discernibility matrix (which stores the differences between attribute values for each pair of data tuples) is used to reduce the computational intensity.

Fig. 3: Rough set approach
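The two approximations can be computed directly once the equivalence classes induced by an attribute subset are known. The universe, partition and target class below are invented for illustration:

```python
# Illustrative universe partitioned into equivalence classes by some
# attribute subset, plus a target class C.
equivalence_classes = [{1, 2}, {3, 4}, {5, 6}]
C = {1, 2, 3}

# Lower approximation: equivalence classes certainly inside C.
lower = set().union(*(e for e in equivalence_classes if e <= C))
# Upper approximation: equivalence classes that overlap C at all.
upper = set().union(*(e for e in equivalence_classes if e & C))

print("lower:", lower)  # {1, 2}
print("upper:", upper)  # {1, 2, 3, 4}
```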

VI. ACTIVE LEARNING
- Class labels are expensive to obtain.
- An active learner queries a human (an oracle) for labels.
- Pool-based approach: uses a pool of unlabeled data. L is a small labeled subset of D; U is a pool of unlabeled data in D. A query function carefully selects one or more tuples from U and requests labels from an oracle (a human annotator). The newly labeled samples are added to L, and a model is learned. The goal is to achieve high accuracy using as little labeled data as possible.
- Evaluated using learning curves: accuracy as a function of the number of instances queried (the number of tuples to be queried should be small).
- Research issue: how to choose the data tuples to be queried?
  - Uncertainty sampling: choose the least certain ones.
  - Reduce the version space, the subset of hypotheses consistent with the training data.
  - Reduce the expected entropy over U: find the greatest reduction in the total number of incorrect predictions.
A minimal uncertainty-sampling loop is sketched below.
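A minimal pool-based loop with uncertainty sampling, assuming scikit-learn is available. The synthetic pool and the use of the hidden labels as a stand-in oracle are assumptions of the sketch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic pool D: two Gaussian blobs (stand-ins for unlabeled tuples).
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)  # the oracle's hidden labels

labeled = [0, 50]                                      # small seed set L
pool = [i for i in range(len(X)) if i not in labeled]  # unlabeled U

for _ in range(10):
    model = LogisticRegression().fit(X[labeled], y[labeled])
    # Uncertainty sampling: query the pool tuple whose predicted class
    # probability is closest to 0.5 (the least certain one).
    proba = model.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]
    labeled.append(query)  # "ask the oracle" and add the tuple to L
    pool.remove(query)

print("accuracy:", model.score(X, y))
```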
VII. TRANSFER LEARNING: CONCEPTUAL FRAMEWORK
- Transfer learning: extract knowledge from one or more source tasks and apply the knowledge to a target task.
- Traditional learning: build a new classifier for each new task.
- Transfer learning: build a new classifier by applying existing knowledge learned from source tasks.

Fig. 8: Traditional learning framework (a separate learning system per task)
Fig. 9: Transfer learning framework (knowledge from the source tasks feeds the target task's learning system)

VIII. A CLOSER LOOK AT CMAR
- CMAR (Classification based on Multiple Association Rules: Li, Han and Pei, ICDM'01) [13].
- Efficiency: uses an enhanced FP-tree that maintains the distribution of class labels among tuples satisfying each frequent itemset.
- Rule pruning is performed whenever a rule is inserted into the tree: given two rules R1 and R2, if the antecedent of R1 is more general than that of R2 and conf(R1) ≥ conf(R2), then prune R2. CMAR also prunes rules for which the rule antecedent and class are not positively correlated, based on a χ² test of statistical significance.
- Classification is based on the generated/pruned rules: if only one rule satisfies tuple X, assign the class label of that rule; if a rule set S satisfies X, CMAR divides S into groups according to class labels, uses a weighted χ² measure to find the strongest group of rules based on the statistical correlation of rules within a group, and assigns X the class label of the strongest group. The pruning test is sketched below.
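The general-versus-specific pruning test can be sketched in isolation (this is not the full FP-tree machinery CMAR uses; the rules and confidence values are invented):

```python
# Rules as (antecedent, class, confidence); values are invented.
rules = [
    (frozenset({"a"}),      "yes", 0.80),
    (frozenset({"a", "b"}), "yes", 0.75),  # more specific, weaker
    (frozenset({"c"}),      "no",  0.70),
]

def prune(rules):
    kept = []
    for ant, cls, conf in sorted(rules, key=lambda r: len(r[0])):
        # Drop a rule if an already-kept rule is more general (its
        # antecedent is a subset of this one) and at least as confident.
        if not any(k_ant <= ant and k_conf >= conf
                   for k_ant, _, k_conf in kept):
            kept.append((ant, cls, conf))
    return kept

print(prune(rules))  # the {"a","b"} rule is pruned by the {"a"} rule
```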
REFERENCES
[1] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[2] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, 2(2): 121-168, 1998.
[3] H. Cheng, X. Yan, J. Han, and C.-W. Hsu, "Discriminative frequent pattern analysis for effective classification," ICDE'07.
[4] H. Cheng, X. Yan, J. Han, and P. S. Yu, "Direct discriminative pattern mining for effective classification," ICDE'08.
[5] N. Cristianini and J. Shawe-Taylor, Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
[6] A. J. Dobson, An Introduction to Generalized Linear Models. Chapman & Hall, 1990.
[7] G. Dong and J. Li, "Efficient mining of emerging patterns: discovering trends and differences," KDD'99.
[8] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. John Wiley, 2001.
[9] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.
[10] S. Haykin, Neural Networks and Learning Machines. Prentice Hall, 2008.
[11] D. Heckerman, D. Geiger, and D. M. Chickering, "Learning Bayesian networks: the combination of knowledge and statistical data," Machine Learning, 1995.
[12] V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic. MIT Press, 2001.
[13] W. Li, J. Han, and J. Pei, "CMAR: accurate and efficient classification based on multiple class-association rules," ICDM'01.
[14] T.-S. Lim, W.-Y. Loh, and Y.-S. Shih, "A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms," Machine Learning, 2000.


AUTHORS

First Author – Amandeep Kour, Assistant Professor, Department of Computer Science and Engineering, M.B.S College of Engineering & Technology, Babliana, Jammu (J&K), India, amandeeepkour607@gmail.com
