Unit - II
Data Mining Vs Data Warehousing
Data warehousing refers to the process of compiling and organizing data into one common database, whereas data
mining refers to the process of extracting useful data from that database. The data mining process depends on the data
compiled in the data warehousing phase to recognize meaningful patterns. A data warehouse is created to support
management decision-making systems.
Data Warehouse:
A data warehouse refers to a place where data can be stored for useful mining. It is like a fast computer system with
exceptionally large data storage capacity. Data from an organization's various systems is copied to the warehouse,
where it can be cleansed and conformed to remove errors. Complex analytical queries can then be run against the
warehoused data.
A data warehouse combines data from numerous sources, which ensures data quality, accuracy, and consistency. It also
boosts system performance by separating analytics processing from transactional databases. Data flows into a
data warehouse from different databases. A data warehouse works by organizing data into a schema that describes the
format and types of data; query tools then examine the data tables using this schema.
Data warehouses and databases are both relational data systems, but they are built to serve different purposes. A data
warehouse is built to store large amounts of historical data and to support fast queries across all of it, typically
using Online Analytical Processing (OLAP). A database is built to store current transactions and to allow quick access
to specific transactions for ongoing business processes, commonly known as Online Transaction Processing (OLTP).
Data Mining:
Data mining refers to the analysis of data. It is the computer-supported process of analyzing huge sets of data that have
either been compiled by computer systems or downloaded into the computer. In the data mining process, the
computer analyzes the data and extracts useful information from it. It looks for hidden patterns within the data set and
tries to predict future behavior. Data mining is primarily used to discover and indicate relationships among the data sets.
Data Warehouse and Data Mining
Data mining aims to enable business organizations to view business behaviors, trends, and relationships so that the
business can make data-driven decisions. It is also known as Knowledge Discovery in Databases (KDD). Data mining tools
utilize AI, statistics, databases, and machine learning systems to discover relationships within the data. Data mining
tools can answer business questions that were traditionally too time-consuming to resolve.
Important features of Data Mining:
The important features of Data Mining are given below:
It utilizes the automated discovery of patterns.
It predicts the expected results.
It focuses on large data sets and databases.
It creates actionable information.
Classification of Data Mining Systems:
Classification based on the mined databases
Classification based on the type of mined knowledge
Classification based on statistics
Classification based on machine learning
Classification based on visualization
Classification based on information science
Classification based on utilized techniques
Classification based on adapted applications
Types of Data Mining
Each of the following data mining techniques serves different business problems and provides a different insight
into each of them. Understanding the type of business problem you need to solve will help in knowing which
technique to use and which will yield the best results. Data mining can be divided into two
basic types:
1. Predictive Data Mining Analysis
2. Descriptive Data Mining Analysis
Data Source:
The actual sources of data are databases, data warehouses, the World Wide Web (WWW), text files, and other documents.
You need a large amount of historical data for data mining to be successful. Organizations typically store data in
databases or data warehouses. Data warehouses may comprise one or more databases, text files, spreadsheets, or other
repositories of data. Sometimes even plain text files or spreadsheets may contain information. Another primary source
of data is the World Wide Web, or the internet.
Different processes:
Before being passed to the database or data warehouse server, the data must be cleaned, integrated, and selected. Because
the information comes from various sources and in different formats, it cannot be used directly for the data mining
procedure: the data may be incomplete or inaccurate. So the data first needs to be cleaned and unified. Since more
information than needed is collected from the various data sources, only the data of interest is selected and passed
to the server. These procedures are not as easy as they sound; several methods may be performed on the data
as part of selection, integration, and cleaning.
Database or Data Warehouse Server:
The database or data warehouse server contains the actual data, ready to be processed. The server is
responsible for retrieving the relevant data for data mining as per the user's request.
Data Mining Engine:
The data mining engine is a major component of any data mining system. It contains several modules for performing data
mining tasks, including association, characterization, classification, clustering, prediction, and time-series analysis.
In other words, the data mining engine is the core of the data mining architecture. It comprises the instruments and software
used to obtain insights and knowledge from the data collected from various data sources and stored within the data
warehouse.
Pattern Evaluation Module:
The pattern evaluation module is primarily responsible for measuring the interestingness of a pattern by using a
threshold value. It collaborates with the data mining engine to focus the search on interesting patterns.
This segment commonly employs interestingness measures that cooperate with the data mining modules to focus the search
towards interesting patterns, and it may use an interestingness threshold to filter out discovered patterns. Alternatively,
the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data
mining techniques used. For efficient data mining, it is highly recommended to push the evaluation of pattern
interestingness as deep as possible into the mining procedure, to confine the search to only the interesting patterns.
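The role of the pattern evaluation module can be sketched in a few lines of Python. The pattern list and the threshold values below are illustrative assumptions, not taken from any particular system: discovered association patterns are filtered by interestingness (support and confidence) thresholds.

```python
# Hypothetical sketch of a pattern evaluation module: discovered association
# patterns are filtered by interestingness (support/confidence) thresholds.
# The pattern list and threshold values are illustrative, not from the text.

patterns = [
    {"rule": "bread -> butter", "support": 0.40, "confidence": 0.80},
    {"rule": "milk -> bread",   "support": 0.05, "confidence": 0.90},
    {"rule": "tea -> sugar",    "support": 0.30, "confidence": 0.60},
]

MIN_SUPPORT = 0.10      # interestingness thresholds (assumed values)
MIN_CONFIDENCE = 0.70

def is_interesting(p):
    """Keep only patterns that clear both thresholds."""
    return p["support"] >= MIN_SUPPORT and p["confidence"] >= MIN_CONFIDENCE

interesting = [p["rule"] for p in patterns if is_interesting(p)]
print(interesting)  # only 'bread -> butter' clears both thresholds
```

In a real system this filter would run inside the mining loop itself, as the text suggests, so that uninteresting candidates are pruned early rather than generated and then discarded.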
Graphical User Interface:
The graphical user interface (GUI) module communicates between the data mining system and the user. It helps the user
use the system easily and efficiently, without needing to know the complexity of the process. The module cooperates
with the data mining system when the user specifies a query or a task, and it displays the results.
Knowledge Base:
The knowledge base is helpful throughout the data mining process. It can guide the search or evaluate the
interestingness of the resulting patterns. The knowledge base may even contain user views and data from user experiences
that can be helpful in the data mining process. The data mining engine may receive inputs from the knowledge base to
make the results more accurate and reliable, and the pattern evaluation module regularly interacts with the knowledge
base to get inputs and to update it.
Challenges of Data Mining
Data mining, the process of extracting knowledge from data, has become increasingly important as the amount of data
generated by individuals, organizations, and machines has grown exponentially. However, data mining is not without
its challenges. In this article, we will explore some of the main challenges of data mining.
1] Data Quality
The quality of data used in data mining is one of the most significant challenges. The accuracy, completeness, and
consistency of the data affect the accuracy of the results obtained. The data may contain errors, omissions, duplications,
or inconsistencies, which may lead to inaccurate results. Moreover, the data may be incomplete, meaning that some
attributes or values are missing, making it challenging to obtain a complete understanding of the data.
Data quality issues can arise due to a variety of reasons, including data entry errors, data storage issues, data integration
problems, and data transmission errors. To address these challenges, data mining practitioners must apply data cleaning
and data preprocessing techniques to improve the quality of the data. Data cleaning involves detecting and correcting
errors, while data preprocessing involves transforming the data to make it suitable for data mining.
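A minimal data-cleaning sketch in Python (the records and rules below are assumed for illustration): inconsistent casing is normalized, exact duplicates are removed, and a missing numeric attribute is filled with the mean of the known values.

```python
# A minimal data-cleaning sketch (assumed records): normalize inconsistent
# text casing, drop exact duplicates, and fill a missing numeric attribute
# with the column mean.

records = [
    {"name": "Alice", "age": 30},
    {"name": "alice", "age": 30},     # duplicate after case normalization
    {"name": "Bob",   "age": None},   # missing value
    {"name": "Carol", "age": 40},
]

# 1) Normalize casing so duplicates can be detected.
for r in records:
    r["name"] = r["name"].strip().title()

# 2) Remove duplicates while preserving order.
seen, cleaned = set(), []
for r in records:
    key = (r["name"], r["age"])
    if key not in seen:
        seen.add(key)
        cleaned.append(r)

# 3) Fill missing ages with the mean of the known ages.
known = [r["age"] for r in cleaned if r["age"] is not None]
mean_age = sum(known) / len(known)
for r in cleaned:
    if r["age"] is None:
        r["age"] = mean_age

print(cleaned)
```

Real preprocessing pipelines apply many more such rules, but each follows this same detect-and-correct pattern.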
2] Data Complexity
Data complexity refers to the vast amounts of data generated by various sources, such as sensors, social media, and the
internet of things (IoT). The complexity of the data may make it challenging to process, analyze, and understand. In
addition, the data may be in different formats, making it challenging to integrate into a single dataset.
To address this challenge, data mining practitioners use advanced techniques such as clustering, classification, and
association rule mining. These techniques help to identify patterns and relationships in the data, which can then be used
to gain insights and make predictions.
4] Scalability
Data mining algorithms must be scalable to handle large datasets efficiently. As the size of the dataset increases, the
time and computational resources required to perform data mining operations also increase. Moreover, the algorithms
must be able to handle streaming data, which is generated continuously and must be processed in real-time.
To address this challenge, data mining practitioners use distributed computing frameworks such as Hadoop and Spark.
These frameworks distribute the data and processing across multiple nodes, making it possible to process large datasets
quickly and efficiently.
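The partition-and-merge idea behind frameworks such as Hadoop and Spark can be illustrated with a toy Python sketch; in a real framework each partition would be processed on a separate node rather than in a local loop.

```python
# Toy illustration of the partition-and-merge idea behind distributed
# frameworks: split the dataset into partitions, aggregate each partition
# independently (on separate nodes in a real cluster), then merge the
# partial results.

from collections import Counter

data = ["ok", "error", "ok", "ok", "error", "ok", "ok", "warn"]

def split(seq, n):
    """Split seq into n roughly equal partitions."""
    size = (len(seq) + n - 1) // n
    return [seq[i:i + size] for i in range(0, len(seq), size)]

partitions = split(data, 3)
partials = [Counter(p) for p in partitions]   # "map": aggregate each partition
merged = sum(partials, Counter())             # "reduce": merge partial counts
print(merged)
```

Because each partial count depends only on its own partition, the "map" step parallelizes trivially; only the cheap merge step needs the combined results.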
5] Interpretability
Data mining algorithms can produce complex models that are difficult to interpret. This is because the algorithms use a
combination of statistical and mathematical techniques to identify patterns and relationships in the data. Moreover, the
models may not be intuitive, making it challenging to understand how the model arrived at a particular conclusion.
To address this challenge, data mining practitioners use visualization techniques to represent the data and the models
visually. Visualization makes it easier to understand the patterns and relationships in the data and to identify the most
important variables.
6] Ethics
Data mining raises ethical concerns related to the collection, use, and dissemination of data. The data may be used to
discriminate against certain groups, violate privacy rights, or perpetuate existing biases. Moreover, data mining
algorithms may not be transparent, making it challenging to detect biases or discrimination.
Major Issues in Data Mining:
Interactive mining of knowledge at multiple levels of abstraction. - The data mining process needs to be
interactive because it allows users to focus the search for patterns, providing and refining data mining requests
based on the returned results.
Incorporation of background knowledge. - Background knowledge can be used to guide the discovery process and
to express the discovered patterns, not only in concise terms but at multiple levels of abstraction.
Data mining query languages and ad hoc data mining. - A data mining query language that allows the user
to describe ad hoc mining tasks should be integrated with a data warehouse query language and optimized for
efficient and flexible data mining.
Presentation and visualization of data mining results. - Once the patterns are discovered, they need to be
expressed in high-level languages and visual representations. These representations should
be easily understandable by the users.
Handling noisy or incomplete data. - Data cleaning methods are required that can handle noise and
incomplete objects while mining the data regularities. Without such methods, the accuracy
of the discovered patterns will be poor.
Pattern evaluation. - This refers to the interestingness of the discovered patterns. A pattern may be
uninteresting because it represents common knowledge or lacks novelty, so interestingness measures are needed
to guide and evaluate the discovery.
Efficiency and scalability of data mining algorithms. - In order to effectively extract information from the
huge amount of data in databases, data mining algorithms must be efficient and scalable.
Parallel, distributed, and incremental mining algorithms. - Factors such as the huge size of databases, the wide
distribution of data, and the complexity of data mining methods motivate the development of parallel and
distributed data mining algorithms. These algorithms divide the data into partitions which are processed in
parallel; the results from the partitions are then merged. Incremental algorithms update the mining results as
the database is updated, without having to mine the entire data again from scratch.
On-Line Transaction Processing (OLTP)
An On-Line Transaction Processing (OLTP) system is a type of computer system that manages transaction-related
tasks. These systems are made to handle transactions and queries (insert, delete, and update) quickly, often over
the internet. Almost every industry nowadays uses OLTP systems to keep track of its transactional data. OLTP systems
mainly focus on entering, storing, and retrieving data, covering daily operations like purchasing, manufacturing,
payroll, and accounting. Many users perform short transactions on these systems. They support simple database
queries, which makes it easy for users to get quick responses.
Type of queries that an OLTP system can Process
Insert queries
OLTP systems can process insert queries that add new data to the database, such as when a customer purchases a product.
Update queries
OLTP systems can process update queries that modify existing data in the database, such as when a customer changes
their address.
Delete queries
OLTP systems can process delete queries that remove data from the database, such as when a customer cancels an order.
Simple select queries
OLTP systems can process simple select queries that retrieve data from the database, such as when a customer searches
for a product.
Join queries
OLTP systems can process join queries that retrieve data from multiple tables in the database, such as when a customer
wants to see all their orders and the corresponding product details.
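The query types above can be demonstrated with a small, self-contained Python sketch using an in-memory SQLite database; the tables and sample rows are assumed for illustration.

```python
# Runnable sketch of the OLTP query types above, using an in-memory SQLite
# database. Table names and sample rows are assumed for illustration.

import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT)")

# Insert query: a customer purchases a product.
cur.execute("INSERT INTO customers VALUES (1, 'Alice', 'Pune')")
cur.execute("INSERT INTO orders VALUES (10, 1, 'Mobile')")

# Update query: the customer changes their address.
cur.execute("UPDATE customers SET city = 'Mumbai' WHERE id = 1")

# Simple select query: retrieve the customer's current city.
row = cur.execute("SELECT city FROM customers WHERE id = 1").fetchone()
print(row)  # ('Mumbai',)

# Join query: the customer's orders with the corresponding product details.
orders = cur.execute(
    "SELECT c.name, o.product FROM customers c "
    "JOIN orders o ON o.customer_id = c.id"
).fetchall()
print(orders)  # [('Alice', 'Mobile')]

# Delete query: the customer cancels the order.
cur.execute("DELETE FROM orders WHERE id = 10")
con.commit()
```

Each statement is short and touches only a few rows, which is exactly the access pattern OLTP systems are tuned for.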
Page 9 of 20
Data Warehouse and Data Mining
Who uses OLAP and Why?
OLAP applications are used across a variety of functions of an organization.
Finance and accounting:
o Budgeting
o Activity-based costing
o Financial performance analysis
o Financial modeling
Sales and Marketing
o Sales analysis and forecasting
o Market research analysis
o Promotion analysis
o Customer analysis
o Market and customer segmentation
Production
o Production planning
o Defect analysis
Characteristics of OLAP
OLAP methods are often summarized by the FASMI characteristics, a term derived from the first letters of:
Fast
Analysis
Shared
Multidimensional
Information
Consider the OLAP operations to be performed on multidimensional data. The figure shows a data cube for the
sales of a shop. The cube contains the dimensions location, time, and item, where location is aggregated with
respect to city values, time is aggregated with respect to quarters, and item is aggregated with respect to item types.
Roll-Up
The roll-up operation (also known as the drill-up or aggregation operation) performs aggregation on a data cube, either
by climbing up a concept hierarchy or by dimension reduction. Roll-up is like zooming out on the data cube. The figure
shows the result of a roll-up operation performed on the dimension location. The concept hierarchy for location is
defined as the order street < city < province or state < country. The roll-up operation aggregates the data by ascending
the location hierarchy from the level of city to the level of country.
When roll-up is performed by dimension reduction, one or more dimensions are removed from the cube. For example,
consider a sales data cube having the two dimensions location and time. Roll-up may be performed by removing the time
dimension, resulting in an aggregation of the total sales by location, rather than by location and by time.
Example
Consider the following cube, recording how many days in each week a given temperature was observed:

Temperature  64 65 68 69 70 71 72 75 80 81 83 85
Week1         1  0  1  0  1  0  0  0  0  0  1  0
Week2         0  0  0  1  0  0  1  2  0  1  0  0

Suppose we want to set up levels (hot (80-85), mild (70-75), cool (64-69)) for temperature in the above cube.
To do this, we group the columns and add up the values according to the concept hierarchy. This operation is known
as a roll-up. By doing this, we obtain the following cube:

Temperature  cool mild hot
Week1           2    1   1
Week2           1    3   1
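The same roll-up can be recomputed in plain Python from the weekly temperature counts, grouping the columns by the cool/mild/hot concept hierarchy and summing each group.

```python
# Recomputing the roll-up in plain Python: the per-temperature counts are
# grouped into the concept-hierarchy levels cool/mild/hot and summed.

temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
cube = {
    "Week1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    "Week2": [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0],
}

# Concept hierarchy: each level covers a range of raw temperatures.
levels = {"cool": range(64, 70), "mild": range(70, 76), "hot": range(80, 86)}

rolled = {
    week: {
        level: sum(v for t, v in zip(temps, values) if t in rng)
        for level, rng in levels.items()
    }
    for week, values in cube.items()
}
print(rolled)
```

Computing the aggregates directly from the raw counts like this is a useful sanity check on any hand-built roll-up table.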
Drill-Down
The drill-down operation (also called roll-down) is the reverse of roll-up. Drill-down is like zooming in on
the data cube. It navigates from less detailed data to more detailed data. Drill-down can be performed by
either stepping down a concept hierarchy for a dimension or adding an additional dimension.
The figure shows a drill-down operation performed on the dimension time by stepping down a concept hierarchy
defined as day < month < quarter < year. Drill-down descends the time hierarchy from the level of quarter to the
more detailed level of month.
Because drill-down adds more detail to the given data, it can also be performed by adding a new dimension to a cube.
For example, a drill-down on the central cube of the figure can occur by introducing an additional dimension, such as
customer group.
Example
Drill-down adds more detail to the given data; here the weekly cube is expanded to daily values:

Temperature  cool mild hot
Day 1           0    0   0
Day 2           0    0   0
Day 3           0    0   1
Day 4           0    1   0
Day 5           1    0   0
Day 6           0    0   0
Day 7           1    0   0
Day 8           0    0   0
Day 9           1    0   0
Day 10          0    1   0
Day 11          0    1   0
Day 12          0    1   0
Day 13          0    0   1
Day 14          0    0   0
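A small Python sketch of drill-down on this data: given the daily (finer-granularity) cool-temperature counts behind the weekly cube, we can navigate from a weekly aggregate back down to the days that produced it.

```python
# Drill-down sketch: given the daily records behind the weekly cube,
# navigate from a weekly aggregate back down to the per-day detail.
# Daily cool-column values follow the drill-down table above.

daily_cool = {
    1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 0, 7: 1,        # Week1
    8: 0, 9: 1, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0,   # Week2
}

def drill_down(week):
    """Return the per-day detail behind one week's aggregated count."""
    days = range(1, 8) if week == "Week1" else range(8, 15)
    return {d: daily_cool[d] for d in days}

detail = drill_down("Week1")
print(sum(detail.values()))  # the weekly roll-up value for cool: 2
```

Note that drill-down is only possible because the finer-granularity data still exists; the cube merely changes which level of the hierarchy is displayed.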
Slice
A slice is a subset of the cube corresponding to a single value for one or more members of a dimension. For example,
a slice operation is executed when the user wants a selection on one dimension of a three-dimensional cube,
resulting in a two-dimensional subcube. So the slice operation performs a selection on one dimension of the given cube,
thus resulting in a subcube.
For example, if we make the selection temperature = cool, we obtain the following cube:

Temperature cool
Day 1 0
Day 2 0
Day 3 0
Day 4 0
Day 5 1
Day 6 0
Day 7 1
Day 8 0
Day 9 1
Day 10 0
Day 11 0
Day 12 0
Day 13 0
Day 14 0
In the figure, slice is performed on the dimension time using the criterion time = "Q1".
It forms a new subcube by selecting one or more dimensions.
Dice
The dice operation defines a subcube by performing a selection on two or more dimensions.
For example, applying the selection (time = Day 3 OR time = Day 4) AND (temperature = cool OR temperature = hot)
to the original cube, we get the following subcube (still two-dimensional):

Temperature  cool hot
Day 3           0   1
Day 4           0   0
In the figure, the dice operation on the cube is based on the following selection criteria, involving three dimensions:
o (location = "Toronto" or "Vancouver")
o (time = "Q1" or "Q2")
o (item =" Mobile" or "Modem")
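Slice and dice can be sketched on a small dict-based cube in Python; the cube below holds (day, temperature-level) cells, with values taken from the daily example above.

```python
# Slice and dice sketched on a small dict-based cube keyed by
# (day, temperature-level). Cell values follow the daily example above.

cube = {
    ("Day 3", "cool"): 0, ("Day 3", "mild"): 0, ("Day 3", "hot"): 1,
    ("Day 4", "cool"): 0, ("Day 4", "mild"): 1, ("Day 4", "hot"): 0,
    ("Day 5", "cool"): 1, ("Day 5", "mild"): 0, ("Day 5", "hot"): 0,
}

# Slice: fix ONE dimension (temperature = "cool") -> a lower-dimensional view.
slice_cool = {day: v for (day, lvl), v in cube.items() if lvl == "cool"}

# Dice: select on TWO dimensions -> a subcube.
dice = {
    (day, lvl): v for (day, lvl), v in cube.items()
    if day in ("Day 3", "Day 4") and lvl in ("cool", "hot")
}

print(slice_cool)  # one value per day for the cool level
print(dice)        # the 2x2 subcube for days 3-4, levels cool/hot
```

The difference is simply how many dimensions the selection constrains: one for slice, two or more for dice.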
Pivot
The pivot operation is also called rotation. Pivot is a visualization operation that rotates the data axes in order to
provide an alternative presentation of the data. It may involve swapping the rows and columns, or moving one of the
row dimensions into the column dimensions.
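Pivoting a small two-dimensional view can be sketched in Python as a transpose of a nested dict; the weekly cool/mild/hot counts are used here as an assumed input.

```python
# Pivot (rotation) sketch: swap the row and column dimensions of a small
# 2-D view, implemented as a plain transpose of a nested dict.

view = {
    "Week1": {"cool": 2, "mild": 1, "hot": 1},
    "Week2": {"cool": 1, "mild": 3, "hot": 1},
}

# Rotate: temperature levels become rows, weeks become columns.
pivoted = {}
for week, row in view.items():
    for level, value in row.items():
        pivoted.setdefault(level, {})[week] = value

print(pivoted["cool"])  # {'Week1': 2, 'Week2': 1}
```

No values change during a pivot; only the presentation of the axes is rearranged.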
Types of OLAP
There are three main types of OLAP servers: Relational OLAP (ROLAP), Multidimensional OLAP (MOLAP), and
Hybrid OLAP (HOLAP).
Multidimensional OLAP (MOLAP) Server
The MOLAP structure primarily reads precompiled data. It has limited capabilities to dynamically create
aggregations or to evaluate results that have not been pre-calculated and stored.
Hybrid OLAP (HOLAP) Server
HOLAP incorporates the best features of MOLAP and ROLAP into a single architecture. HOLAP systems keep the
larger quantities of detailed data in relational tables, while the aggregations are stored in pre-calculated
cubes. HOLAP can also drill through from the cube down to the relational tables for detailed data. Microsoft
SQL Server 2000 provides a hybrid OLAP server.
1) Users: OLTP systems are designed for office workers, while OLAP systems are designed for decision-makers.
Therefore, while an OLTP system may be accessed by hundreds or even thousands of clients in a large enterprise, an
OLAP system is likely to be accessed only by a select class of managers, perhaps only dozens of users.
2) Functions: OLTP systems are mission-critical. They support the day-to-day operations of an enterprise and are largely
performance- and availability-driven. These operations are simple and repetitive. OLAP systems are
management-critical: they support an enterprise's decision-making tasks through detailed analysis.
3) Nature: Although SQL queries return a set of records, OLTP systems are designed to process one record at a time, for
example, a record related to a customer who may be on the phone or in the store. OLAP systems are not designed to deal
with individual customer records; instead, they handle queries that touch many records at a time and provide summary or
aggregate information to a manager. OLAP applications use data stored in a data warehouse that has been
extracted from many tables, and possibly from more than one enterprise database.
4) Design: OLTP database operations are designed to be application-oriented, while OLAP operations are designed to
be subject-oriented. OLTP systems view the enterprise data as a collection of tables (possibly based on an entity-
relationship model); OLAP operations view enterprise information as multidimensional.
5) Data: OLTP systems usually deal only with the current status of data. For example, a record about an employee who
left three years ago may no longer be available in the Human Resources system; the old data may have been archived on
some type of stable storage medium and may not be accessible online. OLAP systems, on the other hand, need historical
data over several years, since trends are often essential in decision making.
6) Kind of use: OLTP systems are used for read and write operations, while OLAP systems normally do not update
the data.
7) View: An OLTP system focuses primarily on the current data within an enterprise or department, without referring
to historical data or data in other organizations. In contrast, an OLAP system spans multiple versions of a database
schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates
from different organizations, integrating information from many data stores. Because of their huge volume, these data
are stored on multiple storage media.
8) Access Patterns: The access pattern of an OLTP system consists mainly of short, atomic transactions. Such a system
requires concurrency control and recovery techniques. Access to OLAP systems, however, is mostly read-only,
because these data warehouses store historical information.
The biggest difference between an OLTP and an OLAP system is the amount of data analyzed in a single transaction.
Whereas an OLTP system handles many concurrent users and queries touching only a single record or a limited collection
of records at a time, an OLAP system must be able to operate on millions of records to answer a single query.