
Data Warehouse and Data Mining

Unit - II
Data Mining Vs Data Warehousing
Data warehousing refers to the process of compiling and organizing data into one common database, whereas data
mining refers to the process of extracting useful information from that database. The data mining process depends on
the data compiled in the data warehousing phase to recognize meaningful patterns. A data warehouse is created to
support management decision-making.
Data Warehouse:
A Data Warehouse refers to a place where data can be stored for useful mining. It is a fast computer system with
exceptionally large data storage capacity. Data from an organization's various systems is copied to the warehouse,
where it can be fetched and conformed to remove errors. Complex analytical queries can then be run against the
warehoused data.

A data warehouse combines data from numerous sources, which ensures data quality, accuracy, and consistency. A data
warehouse also boosts system performance by separating analytics processing from transactional databases. Data flows
into a data warehouse from different databases. A data warehouse works by organizing data into a schema that describes
the format and types of data. Query tools examine the data tables using this schema.
Data warehouses and databases are both relational data systems, but they are made to serve different purposes. A data
warehouse is built to store a huge amount of historical data and empowers fast queries over all the data, typically
using Online Analytical Processing (OLAP). A database is made to store current transactions and allow quick access
to specific transactions for ongoing business processes, commonly known as Online Transaction Processing (OLTP).

Data Mining:
Data mining refers to the analysis of data. It is the computer-supported process of analyzing huge sets of data that have
either been compiled by computer systems or downloaded into the computer. In the data mining process, the
computer analyzes the data and extracts useful information from it. It looks for hidden patterns within the data set and
tries to predict future behavior. Data mining is primarily used to discover and indicate relationships among the data sets.

Data mining aims to enable business organizations to view business behaviors, trends, and relationships that allow the
business to make data-driven decisions. It is also known as Knowledge Discovery in Databases (KDD). Data mining tools
utilize AI, statistics, databases, and machine learning systems to discover relationships within the data. Data mining
tools can answer business questions that were traditionally too time-consuming to resolve.
Important features of Data Mining:
The important features of Data Mining are given below:
 It utilizes automated discovery of patterns.
 It predicts likely outcomes.
 It focuses on large data sets and databases.
 It creates actionable information.

Advantages of Data Mining:


 Market Analysis:
o Data mining can predict market behavior, which helps businesses make decisions. For example, it can
predict who is likely to purchase what type of product.
 Fraud detection:
o Data mining methods can help identify which cellular phone calls, insurance claims, or credit or debit
card purchases are likely to be fraudulent.
 Financial Market Analysis:
o Data mining techniques are widely used to help model financial markets.
 Trend Analysis:
o Analyzing the current trends in the marketplace is a strategic benefit because it helps in cost
reduction and in aligning the manufacturing process with market demand.

Classification of Data Mining Systems


Data mining refers to the process of extracting important information from raw data. It analyzes the data patterns in huge
sets of data with the help of software. Ever since its development, data mining has been incorporated by researchers
in the research and development field.
With data mining, businesses can gain more profit. It has not only helped in understanding customer demand
but also in developing effective strategies to improve overall business turnover. It has helped in determining business
objectives for making clear decisions.
Data collection, data warehousing, and computer processing are some of the strongest pillars of data mining. Data
mining utilizes mathematical algorithms to segment the data and assess the probability of future events.

 Classification based on the mined Databases
 Classification based on the type of mined knowledge
 Classification based on statistics
 Classification based on Machine Learning
 Classification based on visualization
 Classification based on Information Science
 Classification based on utilized techniques
 Classification based on adapted applications

Classification Based on the mined Databases


A data mining system can be classified based on the types of databases that are mined. A database system
can be segmented based on distinct principles, such as data models or types of data, which further
assist in classifying a data mining system.
For example, if we classify a database based on the data model, we may have relational,
transactional, object-relational, or data warehouse mining systems.
Classification Based on the type of Knowledge Mined
A data mining system categorized based on the kind of knowledge mined may have the following functionalities:
 Characterization
 Discrimination
 Association and Correlation Analysis
 Classification
 Prediction
 Outlier Analysis
 Evolution Analysis

Classification Based on the Techniques Utilized


A data mining system can also be classified based on the type of techniques it incorporates. These
techniques can be described according to the degree of user interaction involved or the methods of data analysis
employed.
Classification Based on the Applications Adapted
Data mining systems classified according to the applications they are adapted for are as follows:
 Finance
 Telecommunications
 DNA
 Stock Markets
 E-mail

Examples of Classification Task


Following are some of the main examples of classification tasks:
 Classification helps in determining tumor cells as benign or malignant.
 Classification of credit card transactions as fraudulent or legitimate.
 Classification of secondary structures of protein as alpha-helix, beta-sheet, or random coil.
 Classification of news stories into distinct categories such as finance, weather, entertainment, sports, etc.
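A classification task like the fraud example can be sketched in a few lines of plain Python. The following toy example uses a simple 1-nearest-neighbour rule to label a card transaction as fraudulent or legitimate; the features, training points, and labels are invented purely for illustration.

```python
# A minimal sketch of a classification task: labeling transactions as
# "fraudulent" or "legitimate" with a 1-nearest-neighbour rule.
# The features and training points below are made up for illustration.

import math

# (amount_in_dollars, distance_from_home_km) -> label
training_data = [
    ((20.0, 2.0), "legitimate"),
    ((35.0, 5.0), "legitimate"),
    ((900.0, 800.0), "fraudulent"),
    ((1200.0, 950.0), "fraudulent"),
]

def classify(point):
    """Return the label of the nearest training example (1-NN)."""
    nearest = min(training_data,
                  key=lambda ex: math.dist(point, ex[0]))
    return nearest[1]

print(classify((25.0, 3.0)))      # near the small local purchases
print(classify((1000.0, 870.0)))  # near the large distant purchases
```

Real systems use richer models (decision trees, neural networks, etc.), but the task is the same: learn a mapping from attributes to a class label.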

Types of Data Mining
Each of the following data mining techniques addresses different business problems and provides different insights
into them. Understanding the type of business problem you need to solve will help in knowing
which technique will yield the best results. Data mining types can be divided into two
basic parts, as follows:
1. Predictive Data Mining Analysis
2. Descriptive Data Mining Analysis

1. Predictive Data Mining


As the name signifies, predictive data mining analysis works on data to help predict what may happen later
(or in the future) in the business. Predictive data mining can be further divided into four types, listed below:
 Classification Analysis
 Regression Analysis
 Time Series Analysis
 Prediction Analysis

2. Descriptive Data Mining


The main goal of the Descriptive Data Mining tasks is to summarize or turn given data into relevant information. The
Descriptive Data-Mining Tasks can also be further divided into four types that are as follows:
 Clustering Analysis
 Summarization Analysis
 Association Rules Analysis
 Sequence Discovery Analysis
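One of the descriptive tasks above, association rule analysis, can be sketched in plain Python by computing the support and confidence of a rule; the market-basket transactions and the rule {bread} -> {butter} are invented for illustration.

```python
# A minimal sketch of association rule analysis (a descriptive task):
# computing support and confidence for the rule {bread} -> {butter}
# over a handful of made-up market-basket transactions.

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent) estimated from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}))       # 2 of 4 baskets -> 0.5
print(confidence({"bread"}, {"butter"}))  # 2 of 3 bread baskets
```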

Data Mining Architecture


The significant components of data mining systems are a data source, data mining engine, data warehouse server, the
pattern evaluation module, graphical user interface, and knowledge base.

Data Source:
The actual sources of data are databases, data warehouses, the World Wide Web (WWW), text files, and other documents.
A huge amount of historical data is needed for data mining to be successful. Organizations typically store data in
databases or data warehouses. Data warehouses may comprise one or more databases, text files, spreadsheets, or other
repositories of data. Another primary source of data is the World Wide Web, or the internet.
Different processes:
Before being passed to the database or data warehouse server, the data must be cleaned, integrated, and selected. As
the information comes from various sources and in different formats, it cannot be used directly for the data mining
procedure, because the data may not be complete and accurate. So, the data first needs to be cleaned and unified. More
information than needed will be collected from the various data sources, and only the data of interest has to be selected
and passed to the server. These procedures are not as easy as they sound; several methods may be performed on the data
as part of selection, integration, and cleaning.
Database or Data Warehouse Server:
The database or data warehouse server contains the actual data that is ready to be processed. Hence, the server is
responsible for retrieving the relevant data based on the user's data mining request.
Data Mining Engine:
The data mining engine is a major component of any data mining system. It contains several modules for operating data
mining tasks, including association, characterization, classification, clustering, prediction, time-series analysis, etc.
In other words, we can say the data mining engine is the core of the data mining architecture. It comprises the
instruments and software used to obtain insights and knowledge from data collected from various data sources and
stored within the data warehouse.
Pattern Evaluation Module:
The pattern evaluation module is primarily responsible for measuring how interesting a discovered pattern is, using a
threshold value. It collaborates with the data mining engine to focus the search on interesting patterns.
This segment commonly employs interestingness measures that cooperate with the data mining modules to focus the search
towards interesting patterns. It might use an interestingness threshold to filter out discovered patterns. Alternatively, the
pattern evaluation module might be integrated with the mining module, depending on the implementation of the data
mining technique used. For efficient data mining, it is highly recommended to push the evaluation of pattern
interestingness as deep as possible into the mining procedure, so as to confine the search to only interesting patterns.
Graphical User Interface:
The graphical user interface (GUI) module communicates between the data mining system and the user. This module
helps the user to easily and efficiently use the system without knowing the complexity of the process. This module
cooperates with the data mining system when the user specifies a query or a task and displays the results.
Knowledge Base:
The knowledge base is helpful throughout the data mining process. It may be used to guide the search or to evaluate
the interestingness of the resulting patterns. The knowledge base may even contain user views and data from user
experiences that can be helpful in the data mining process. The data mining engine may receive inputs from the
knowledge base to make the results more accurate and reliable. The pattern evaluation module regularly interacts with
the knowledge base to get inputs and also to update it.

Challenges of Data Mining
Data mining, the process of extracting knowledge from data, has become increasingly important as the amount of data
generated by individuals, organizations, and machines has grown exponentially. However, data mining is not without
its challenges. In this article, we will explore some of the main challenges of data mining.

1]Data Quality
The quality of data used in data mining is one of the most significant challenges. The accuracy, completeness, and
consistency of the data affect the accuracy of the results obtained. The data may contain errors, omissions, duplications,
or inconsistencies, which may lead to inaccurate results. Moreover, the data may be incomplete, meaning that some
attributes or values are missing, making it challenging to obtain a complete understanding of the data.
Data quality issues can arise due to a variety of reasons, including data entry errors, data storage issues, data integration
problems, and data transmission errors. To address these challenges, data mining practitioners must apply data cleaning
and data preprocessing techniques to improve the quality of the data. Data cleaning involves detecting and correcting
errors, while data preprocessing involves transforming the data to make it suitable for data mining.
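The cleaning and preprocessing steps above can be sketched in plain Python. The records and the specific fixes shown (deduplication, unifying a categorical value, mean imputation) are illustrative assumptions, not a prescribed pipeline.

```python
# A minimal sketch of the cleaning/preprocessing step described above:
# removing duplicates, fixing an obvious inconsistency, and filling a
# missing value with the column mean. The records are invented.

records = [
    {"id": 1, "age": 34, "country": "usa"},
    {"id": 2, "age": None, "country": "USA"},   # missing age
    {"id": 3, "age": 29, "country": "U.S.A."},  # inconsistent spelling
    {"id": 1, "age": 34, "country": "usa"},     # exact duplicate
]

# 1. Deduplicate on the primary key.
seen, cleaned = set(), []
for r in records:
    if r["id"] not in seen:
        seen.add(r["id"])
        cleaned.append(dict(r))

# 2. Unify an inconsistent categorical attribute.
for r in cleaned:
    r["country"] = r["country"].replace(".", "").upper()

# 3. Impute missing ages with the mean of the known ones.
known = [r["age"] for r in cleaned if r["age"] is not None]
mean_age = sum(known) / len(known)
for r in cleaned:
    if r["age"] is None:
        r["age"] = mean_age

print(cleaned)
```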

2]Data Complexity
Data complexity refers to the vast amounts of data generated by various sources, such as sensors, social media, and the
internet of things (IoT). The complexity of the data may make it challenging to process, analyze, and understand. In
addition, the data may be in different formats, making it challenging to integrate into a single dataset.
To address this challenge, data mining practitioners use advanced techniques such as clustering, classification, and
association rule mining. These techniques help to identify patterns and relationships in the data, which can then be used
to gain insights and make predictions.

3]Data Privacy and Security


Data privacy and security is another significant challenge in data mining. As more data is collected, stored, and analyzed,
the risk of data breaches and cyber-attacks increases. The data may contain personal, sensitive, or confidential
information that must be protected. Moreover, data privacy regulations such as GDPR, CCPA, and HIPAA impose strict
rules on how data can be collected, used, and shared.

4]Scalability
Data mining algorithms must be scalable to handle large datasets efficiently. As the size of the dataset increases, the
time and computational resources required to perform data mining operations also increase. Moreover, the algorithms
must be able to handle streaming data, which is generated continuously and must be processed in real-time.
To address this challenge, data mining practitioners use distributed computing frameworks such as Hadoop and Spark.
These frameworks distribute the data and processing across multiple nodes, making it possible to process large datasets
quickly and efficiently.

5]Interpretability
Data mining algorithms can produce complex models that are difficult to interpret. This is because the algorithms use a
combination of statistical and mathematical techniques to identify patterns and relationships in the data. Moreover, the
models may not be intuitive, making it challenging to understand how the model arrived at a particular conclusion.
To address this challenge, data mining practitioners use visualization techniques to represent the data and the models
visually. Visualization makes it easier to understand the patterns and relationships in the data and to identify the most
important variables.

6]Ethics
Data mining raises ethical concerns related to the collection, use, and dissemination of data. The data may be used to
discriminate against certain groups, violate privacy rights, or perpetuate existing biases. Moreover, data mining
algorithms may not be transparent, making it challenging to detect biases or discrimination.

Principles of Data Visualization


 Clarity
o The visualization should be clear and easily understood by the intended audience.
 Simplicity
o Keep the visualization simple and avoid unnecessary complexity.
 Purposeful
o Understand what message or insight you want to communicate and design for that purpose.
 Consistency
o Maintain consistency in the design elements throughout the visualization.
 Contextualization
o Provide context for the data being presented.
 Accuracy
o Ensure the visualization accurately represents the underlying data.
 Visual Encoding
o Choose appropriate visual encodings for the data types you are visualizing.
 Intuitiveness
o Design the visualization to be intuitive and easy to comprehend.
 Interactivity
o Consider adding interactive elements to the visualization, such as tooltips, zooming, filtering, or
highlighting.
 Aesthetics
o Although aesthetics are subjective, a visually appealing design can engage viewers and increase their
interest in the data.
 Accessibility
o Accessibility is key; if users can’t read the data, it’s useless.
 Hierarchy
o Work out hierarchy of information early on and always remind yourself of what the purpose of
representing the data is.

Major Issues In Data Mining:


 Mining different kinds of knowledge in databases. - Different users have different needs and may be
interested in different kinds of knowledge. Therefore, data mining needs to cover a broad range of knowledge
discovery tasks.

 Interactive mining of knowledge at multiple levels of abstraction. - The data mining process needs to be
interactive because it allows users to focus the search for patterns, providing and refining data mining requests
based on returned results.

 Incorporation of background knowledge. - Background knowledge can be used to guide the discovery process
and to express the discovered patterns, not only in concise terms but also at multiple levels of abstraction.

 Data mining query languages and ad hoc data mining. - Data Mining Query language that allows the user
to describe ad hoc mining tasks, should be integrated with a data warehouse query language and optimized for
efficient and flexible data mining.

 Presentation and visualization of data mining results. - Once the patterns are discovered, they need to be
expressed in high-level languages and visual representations. These representations should be easily
understandable by the users.

 Handling noisy or incomplete data. - The data cleaning methods are required that can handle the noise,
incomplete objects while mining the data regularities. If data cleaning methods are not there then the accuracy
of the discovered patterns will be poor.

 Pattern evaluation. - This refers to the interestingness of the discovered patterns. Patterns may be
uninteresting if they represent common knowledge or lack novelty, so measures are needed to evaluate how
interesting they are.

 Efficiency and scalability of data mining algorithms. - In order to effectively extract the information from
huge amount of data in databases, data mining algorithm must be efficient and scalable.

 Parallel, distributed, and incremental mining algorithms. - Factors such as the huge size of databases, the wide
distribution of data, and the complexity of data mining methods motivate the development of parallel and
distributed data mining algorithms. These algorithms divide the data into partitions which are processed in
parallel; the results from the partitions are then merged. Incremental algorithms update the mining results as
the database changes, without having to mine the entire data again from scratch.

On Line Transaction Processing (OLTP)
An On-Line Transaction Processing (OLTP) system is a type of computer system that manages transaction-oriented
tasks. These systems are designed to quickly handle transactions and queries (insert, delete, and update), often online.
Almost every industry nowadays uses OLTP systems to keep track of its transactional data. OLTP systems mainly
focus on entering, storing, and retrieving data for daily operations like purchasing, manufacturing, payroll,
accounting, etc. Many users use these systems for short transactions. They support simple database queries, which
makes it easier for users to get quick responses.
Type of queries that an OLTP system can Process
Insert queries
OLTP systems can process insert queries that add new data to the database, such as when a customer purchases a product.
Update queries
OLTP systems can process update queries that modify existing data in the database, such as when a customer changes
their address.
Delete queries
OLTP systems can process delete queries that remove data from the database, such as when a customer cancels an order.
Simple select queries
OLTP systems can process simple select queries that retrieve data from the database, such as when a customer searches
for a product.
Join queries
OLTP systems can process join queries that retrieve data from multiple tables in the database, such as when a customer
wants to see all their orders and the corresponding product details.
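The five query types above can be sketched with Python's built-in sqlite3 module; the schema, names, and rows are invented for illustration.

```python
# A minimal sketch of the OLTP query types above, using Python's built-in
# sqlite3 module. The schema and rows are invented for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT)")

# Insert: a customer purchases a product.
cur.execute("INSERT INTO customers VALUES (1, 'Asha', 'Pune')")
cur.execute("INSERT INTO orders VALUES (10, 1, 'Laptop')")

# Update: the customer changes their address.
cur.execute("UPDATE customers SET city = 'Mumbai' WHERE id = 1")

# Simple select: look the customer up.
city = cur.execute("SELECT city FROM customers WHERE id = 1").fetchone()[0]
print(city)  # Mumbai

# Join: the customer's orders with the corresponding product details.
rows = cur.execute("""
    SELECT c.name, o.product
    FROM customers c JOIN orders o ON o.customer_id = c.id
""").fetchall()
print(rows)  # [('Asha', 'Laptop')]

# Delete: the customer cancels the order.
cur.execute("DELETE FROM orders WHERE id = 10")
conn.commit()
```

Each statement is short and touches only a few rows, which is exactly the workload OLTP systems are optimized for.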

OLAP (Online Analytical Processing)


OLAP stands for On-Line Analytical Processing. OLAP is a category of software technology that enables
analysts, managers, and executives to gain insight through fast, consistent, interactive access to a wide variety of
possible views of data that has been transformed from raw data to reflect the real dimensionality of the enterprise
as understood by the user.
OLAP implements multidimensional analysis of business information and supports complex calculations and
trend analysis. It is the essential foundation for intelligent solutions including business performance
management, planning, budgeting, forecasting, financial reporting, analysis, simulation models, knowledge
discovery, and data warehouse reporting.

Who uses OLAP and Why?
OLAP applications are used by a variety of the functions of an organization.
Finance and accounting:
o Budgeting
o Activity-based costing
o Financial performance analysis
o Financial modeling
Sales and Marketing
o Sales analysis and forecasting
o Market research analysis
o Promotion analysis
o Customer analysis
o Market and customer segmentation
Production
o Production planning
o Defect analysis

Characteristics of OLAP
OLAP methods are often summarized by the FASMI characteristics; the term is derived from the first letters of the
characteristics:
 Fast
 Analysis
 Shared
 Multidimensional
 Information

OLAP Operations in the Multidimensional Data Model


In the multidimensional model, the records are organized into various dimensions, and each dimension includes multiple
levels of abstraction described by concept hierarchies.

Consider the OLAP operations to be performed on multidimensional data. The figure shows data cubes for the
sales of a shop. The cube contains three dimensions: location, time, and item, where location is aggregated with
respect to city values, time is aggregated with respect to quarters, and item is aggregated with respect to item types.

Roll-Up
The roll-up operation (also known as the drill-up or aggregation operation) performs aggregation on a data cube, either
by climbing up a concept hierarchy for a dimension or by dimension reduction. Roll-up is like zooming out on the data
cube. The figure shows the result of a roll-up operation performed on the dimension location. The concept hierarchy for
location is defined as the order street < city < province or state < country. The roll-up operation aggregates the data by
ascending the location hierarchy from the level of the city to the level of the country.

When a roll-up is performed by dimension reduction, one or more dimensions are removed from the cube. For example,
consider a sales data cube having two dimensions, location and time. A roll-up may be performed by removing the time
dimension, resulting in an aggregation of the total sales by location, rather than by location and by time.

Example

Consider the following cube illustrating the temperatures of certain days, recorded weekly:

Temperature 64 65 68 69 70 71 72 75 80 81 83 85

Week1 1 0 1 0 1 0 0 0 0 0 1 0

Week2 0 0 0 1 0 0 1 2 0 1 0 0

Consider that we want to set up levels (hot (80-85), mild (70-75), cool (64-69)) in temperature from the above cubes.

To do this, we have to group the columns and add up the values according to the concept hierarchy. This operation is
known as a roll-up.
By doing this, we obtain the following cube:
Temperature cool mild hot

Week1 2 1 1

Week2 1 3 1

The roll-up operation groups the information by levels of temperature.

The following diagram illustrates how roll-up works.
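The roll-up in this example can be sketched in plain Python: group the temperature columns into the levels cool (64-69), mild (70-75), and hot (80-85), and add up the counts, computing the aggregate directly from the weekly cube above.

```python
# Roll-up along the temperature concept hierarchy: each raw temperature
# value maps to a level (cool/mild/hot), and the counts are summed per
# level. The input counts are the weekly cube from the example above.

temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
cube = {
    "Week1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    "Week2": [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0],
}

def level(t):
    """Concept hierarchy: temperature value -> level."""
    if 64 <= t <= 69:
        return "cool"
    if 70 <= t <= 75:
        return "mild"
    return "hot"  # 80-85

rollup = {}
for week, counts in cube.items():
    agg = {"cool": 0, "mild": 0, "hot": 0}
    for t, c in zip(temps, counts):
        agg[level(t)] += c
    rollup[week] = agg

print(rollup)
```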

Drill-Down
The drill-down operation (also called roll-down) is the reverse of roll-up. Drill-down is like zooming in on
the data cube. It navigates from less detailed data to more detailed data. Drill-down can be performed
either by stepping down a concept hierarchy for a dimension or by adding an additional dimension.
The figure shows a drill-down operation performed on the dimension time by stepping down a concept hierarchy which is
defined as day, month, quarter, and year. Drill-down occurs by descending the time hierarchy from the level of the
quarter to the more detailed level of the month.
Because a drill-down adds more details to the given data, it can also be performed by adding a new dimension to a cube.
For example, a drill-down on the central cubes of the figure can occur by introducing an additional dimension, such as
a customer group.

Example
Drill-down adds more details to the given data

Temperature cool mild hot

Day 1 0 0 0

Day 2 0 0 0

Day 3 0 0 1

Day 4 0 1 0

Day 5 1 0 0

Day 6 0 0 0

Day 7 1 0 0

Day 8 0 0 0

Day 9 1 0 0

Day 10 0 1 0

Day 11 0 1 0

Day 12 0 1 0

Day 13 0 0 1

Day 14 0 0 0

The following diagram illustrates how Drill-down works.


Slice
A slice is a subset of the cube corresponding to a single value for one member of a dimension. For example,
a slice operation is executed when the user wants a selection on one dimension of a three-dimensional cube,
resulting in a two-dimensional view. So, the slice operation performs a selection on one dimension of the given cube,
thus resulting in a subcube.
For example, if we make the selection temperature = cool, we will obtain the following cube:

Temperature cool

Day 1 0

Day 2 0

Day 3 0

Day 4 0

Day 5 1

Day 6 0

Day 7 1

Day 8 0

Day 9 1

Day 10 0

Day 11 0

Day 12 0

Day 13 0

Day 14 0

The following diagram illustrates how Slice works.

In the diagram, slice is performed on the dimension "time" using the criterion time = "Q1".
It forms a new subcube by selecting a single value on one dimension.

Dice
The dice operation defines a subcube by performing a selection on two or more dimensions.
For example, applying the selection (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot)
to the original cube, we get the following subcube (still two-dimensional):

Temperature cool hot

Day 3 0 1

Day 4 0 0

Consider the following diagram, which shows the dice operations.

The dice operation on the cube shown is based on the following selection criteria, which involve three dimensions:
o (location = "Toronto" or "Vancouver")
o (time = "Q1" or "Q2")
o (item =" Mobile" or "Modem")
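The day-level slice and dice operations above can be sketched in plain Python over the first week of the drill-down cube; storing the cube as a plain dictionary here is purely for illustration.

```python
# Slice and dice as selections over a small two-dimensional cube,
# using the day-by-temperature counts from the drill-down example
# (first seven days only, to keep the sketch short).

cube = {  # day -> {temperature level: count}
    1: {"cool": 0, "mild": 0, "hot": 0},
    2: {"cool": 0, "mild": 0, "hot": 0},
    3: {"cool": 0, "mild": 0, "hot": 1},
    4: {"cool": 0, "mild": 1, "hot": 0},
    5: {"cool": 1, "mild": 0, "hot": 0},
    6: {"cool": 0, "mild": 0, "hot": 0},
    7: {"cool": 1, "mild": 0, "hot": 0},
}

# Slice: fix a single value on one dimension (temperature = "cool").
slice_cool = {day: levels["cool"] for day, levels in cube.items()}

# Dice: select on two or more dimensions:
# (day = 3 OR day = 4) AND (temperature = cool OR temperature = hot).
dice = {day: {lvl: cnt for lvl, cnt in cube[day].items()
              if lvl in ("cool", "hot")}
        for day in (3, 4)}

print(slice_cool)
print(dice)  # {3: {'cool': 0, 'hot': 1}, 4: {'cool': 0, 'hot': 0}}
```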

Pivot
The pivot operation is also called rotation. Pivot is a visualization operation that rotates the data axes in order to
provide an alternative presentation of the data. It may involve swapping the rows and columns, or moving one of the
row dimensions into the column dimension.

Consider the following diagram, which shows the pivot operation.
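Pivoting can be sketched in plain Python as swapping the row and column axes of a small table; the three days below are taken from the drill-down example.

```python
# Pivot (rotation): the same counts, re-presented with temperature
# levels as rows and days as columns.

table = {  # day -> {temperature level: count}
    "Day 3": {"cool": 0, "mild": 0, "hot": 1},
    "Day 4": {"cool": 0, "mild": 1, "hot": 0},
    "Day 5": {"cool": 1, "mild": 0, "hot": 0},
}

# Rotate: rows become columns and columns become rows.
pivoted = {}
for day, levels in table.items():
    for lvl, count in levels.items():
        pivoted.setdefault(lvl, {})[day] = count

print(pivoted["hot"])  # {'Day 3': 1, 'Day 4': 0, 'Day 5': 0}
```

No data changes during a pivot; only the presentation of the axes does.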


Types of OLAP
There are three main types of OLAP servers, as follows:

ROLAP stands for Relational OLAP, an application based on relational DBMSs.


MOLAP stands for Multidimensional OLAP, an application based on multidimensional DBMSs.
HOLAP stands for Hybrid OLAP, an application using both relational and multidimensional techniques.

Relational OLAP (ROLAP) Server


ROLAP servers are intermediate servers that stand between a relational back-end server and client front-end tools.
They use a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to
provide the missing pieces.
ROLAP servers include optimizations for each DBMS back end, implementation of aggregation navigation logic, and
additional tools and services.
ROLAP technology tends to have greater scalability than MOLAP technology.
ROLAP systems work primarily from the data that resides in a relational database, where the base data and dimension
tables are stored as relational tables. This model permits multidimensional analysis of the data.
This technique relies on manipulating the data stored in the relational database to give the appearance of traditional
OLAP's slicing and dicing functionality. In essence, each method of slicing and dicing is equivalent to adding a
"WHERE" clause to the SQL statement.
Relational OLAP Architecture
ROLAP Architecture includes the following components
o Database server.
o ROLAP server.
o Front-end tool.
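The "WHERE clause" point above can be sketched with Python's built-in sqlite3 module: a dice over city and quarter becomes a WHERE clause, and the aggregation becomes GROUP BY. The sales table and its rows are invented for illustration.

```python
# A minimal sketch of the ROLAP idea: the fact data lives in a relational
# table, and slicing/dicing is expressed as a WHERE clause, with
# aggregation done by GROUP BY. Table and rows are invented.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (city TEXT, quarter TEXT, item TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Toronto",   "Q1", "Mobile", 100.0),
    ("Toronto",   "Q2", "Modem",   50.0),
    ("Vancouver", "Q1", "Mobile",  80.0),
    ("Vancouver", "Q3", "Modem",   40.0),
])

# A dice on (city, quarter), written as a WHERE clause.
rows = cur.execute("""
    SELECT city, SUM(amount)
    FROM sales
    WHERE city IN ('Toronto', 'Vancouver') AND quarter IN ('Q1', 'Q2')
    GROUP BY city
    ORDER BY city
""").fetchall()
print(rows)  # [('Toronto', 150.0), ('Vancouver', 80.0)]
```

A real ROLAP server generates such SQL automatically from the user's cube operations and adds aggregation navigation on top.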


Multidimensional OLAP (MOLAP) Server


A MOLAP system is based on a native logical model that directly supports multidimensional data and operations. Data
is stored physically in multidimensional arrays, and positional techniques are used to access it.
One of the significant distinctions of MOLAP from ROLAP is that data is summarized and stored in an
optimized format in a multidimensional cube, instead of in a relational database. In the MOLAP model, data is structured
into proprietary formats according to the client's reporting requirements, with the calculations pre-generated on the cubes.
MOLAP Architecture
MOLAP Architecture includes the following components
o Database server.
o MOLAP server.
o Front-end tool.

The MOLAP structure primarily reads pre-compiled data. It has limited capabilities to dynamically create
aggregations or to evaluate results that have not been pre-calculated and stored.
Hybrid OLAP (HOLAP) Server
HOLAP incorporates the best features of MOLAP and ROLAP into a single architecture. HOLAP systems store the larger
quantities of detailed data in relational tables, while the aggregations are stored in pre-calculated
cubes. HOLAP can also drill through from the cube down to the relational tables for detailed data. Microsoft
SQL Server 2000 provides a hybrid OLAP server.
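The HOLAP split, summaries served from a pre-calculated store and drill-through served from relational tables, can be sketched as follows (the schema and the `query` helper are hypothetical):

```python
import sqlite3

# Detail rows stay in the relational store (the ROLAP side).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("East", "Laptop", 100.0), ("East", "Phone", 80.0),
    ("West", "Laptop",  90.0), ("West", "Phone", 70.0),
])

# Pre-calculated aggregates live in a cube-like structure (the MOLAP side).
agg_by_region = {r: a for r, a in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region")}

def query(region, product=None):
    # Summary queries are answered from the pre-calculated aggregates...
    if product is None:
        return agg_by_region[region]
    # ...while drill-through to detail falls back to the relational tables.
    return conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ? AND product = ?",
        (region, product)).fetchone()[0]

print(query("East"))           # 180.0, served from the aggregate store
print(query("East", "Phone"))  # 80.0, drill-through to the relational table
```

For the summary path this behaves like MOLAP; for the drill-through path it behaves like ROLAP, which mirrors the performance trade-off described in the comparison below.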


Difference between ROLAP, MOLAP, and HOLAP

Full form:
o ROLAP stands for Relational Online Analytical Processing.
o MOLAP stands for Multidimensional Online Analytical Processing.
o HOLAP stands for Hybrid Online Analytical Processing.

Storage of aggregations:
o ROLAP: The ROLAP storage mode causes the aggregations of the partition to be stored in indexed views in the relational database that was specified in the partition's data source.
o MOLAP: The MOLAP storage mode causes the aggregations of the partition, and a copy of its source data, to be stored in a multidimensional structure in Analysis Services when the partition is processed.
o HOLAP: The HOLAP storage mode combines attributes of both MOLAP and ROLAP. Like MOLAP, HOLAP causes the aggregations of the partition to be stored in a multidimensional structure in an SQL Server Analysis Services instance.

Copy of source data:
o ROLAP: ROLAP does not cause a copy of the source data to be stored in the Analysis Services data folders. Instead, when the result cannot be derived from the query cache, the indexed views in the data source are accessed to answer queries.
o MOLAP: The MOLAP structure is highly optimized to maximize query performance. The storage can be on the computer where the partition is defined or on another computer running Analysis Services. Because a copy of the source data resides in the multidimensional structure, queries can be resolved without accessing the partition's source data.
o HOLAP: HOLAP does not cause a copy of the source data to be stored. For queries that access only summary data in the aggregations of a partition, HOLAP is the equivalent of MOLAP.

Query performance:
o ROLAP: Query response is frequently slower with ROLAP storage than with the MOLAP or HOLAP storage modes. Processing time is also frequently slower with ROLAP.
o MOLAP: Query response times can be reduced substantially by using aggregations. The data in the partition's MOLAP structure is only as current as the most recent processing of the partition.
o HOLAP: Queries that access source data, for example drilling down to an atomic cube cell for which there is no aggregation data, must retrieve it from the relational database and will not be as fast as they would be if the source data were stored in the MOLAP structure.


Following are the differences between OLAP and OLTP systems.

1) Users: OLTP systems are designed for office workers, while OLAP systems are designed for decision-makers.
Therefore, while an OLTP system may be accessed by hundreds or even thousands of clients in a huge enterprise, an
OLAP system is likely to be accessed only by a select class of managers and may be used by only dozens of users.

2) Functions: OLTP systems are mission-critical. They support the day-to-day operations of an enterprise and are largely
performance- and availability-driven. These operations carry out simple, repetitive transactions. OLAP systems are
management-critical: they support enterprise decision-making tasks through detailed analysis.

3) Nature: Although SQL queries return a set of data, OLTP systems are designed to process one record at a time, for
example, a record related to a customer who may be on the phone or in the store. OLAP systems are not designed to deal with
individual customer records. Instead, they involve queries that deal with many records at a time and provide summary or
aggregate information to a manager. OLAP applications involve data stored in a data warehouse that has been
extracted from many tables and possibly from more than one enterprise database.
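The contrast between record-at-a-time OLTP access and aggregate OLAP access can be illustrated with two queries against the same hypothetical table (names and values invented for the sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "alice", 25.0), (2, "bob", 40.0), (3, "alice", 15.0), (4, "carol", 60.0),
])

# OLTP-style access: a short transaction touching one record, e.g. looking up
# the order of the customer currently on the phone.
oltp_row = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (2,)).fetchone()

# OLAP-style access: an aggregate query scanning many records to produce a
# summary for a manager.
olap_summary = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(oltp_row)      # (40.0,)
print(olap_summary)  # [('alice', 40.0), ('bob', 40.0), ('carol', 60.0)]
```

The OLTP query reads a single row by key, while the OLAP query must touch every row to compute its aggregates, which is why the two workloads are engineered so differently.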

4) Design: OLTP database operations are designed to be application-oriented, while OLAP operations are designed to
be subject-oriented. OLTP systems view the enterprise data as a collection of tables (possibly based on an entity-relationship
model). OLAP operations view enterprise information as multidimensional.

5) Data: OLTP systems usually deal only with the current state of the data. For example, a record about an employee who
left three years ago may not be available in the Human Resources system; the old data may have been archived on
some type of stable storage medium and may not be accessible online. On the other hand, OLAP systems need historical
data spanning several years, since trends are often essential in decision-making.

6) Kind of use: OLTP systems are used for both read and write operations, while OLAP systems usually do not update
the data.

7) View: An OLTP system focuses primarily on the current data within an enterprise or department and does not
refer to historical data or data in other organizations. In contrast, an OLAP system spans multiple versions of a database
schema, due to the evolutionary nature of an organization. An OLAP system also deals with information that originates
from different organizations, integrating information from many data stores. Because of their huge volume, these data are
stored on multiple storage media.

8) Access Patterns: The access pattern of an OLTP system consists primarily of short, atomic transactions. Such a system
requires concurrency control and recovery techniques. However, access to OLAP systems is mostly read-only,
because these data warehouses store historical information.

The biggest difference between an OLTP and an OLAP system is the amount of data analyzed in a single transaction.
Whereas an OLTP system handles many concurrent users and queries touching only a single record or a limited collection of
records at a time, an OLAP system must be able to operate on millions of records to answer a single query.
