
The Full Form of KDD Is

The data warehouse backend process involves data extraction, data cleansing, data transformation and data loading. The main OLAP operations are roll-up, drill-down, slice, dice and pivot. Roll-up aggregates data along a dimension hierarchy or by removing dimensions. Drill-down navigates to more granular data by descending a hierarchy or adding dimensions. Slice selects a subset of data on one dimension. Dice selects subsets across multiple dimensions. Pivot rotates or transforms the view of data in a cube.

Uploaded by

Arshad Ali

a) The full form of KDD is

i) Knowledge database
ii) Knowledge discovery in databases
iii) Knowledge data division
iv) Knowledge data definition
b) You are given data about seismic activity in Japan, and you want to predict the magnitude of the
next earthquake. This is an example of
i) Supervised learning
ii) Unsupervised learning
iii) Serration
iv) Dimensionality reduction
c) Which of the following is not involved in data mining?
i) Knowledge extraction
ii) Data archaeology
iii) Data exploration
iv) Data transformation
d) ………………….. is a comparison of the general features of the target class data objects against the
general features of objects from one or multiple contrasting classes.
i) Data characterization
ii) Data classification
iii) Data discrimination
iv) Data selection
e) A Bayesian classifier is
i) A class of learning algorithm that tries to find an optimum classification of a set of
examples using the probabilistic theory.
ii) Any mechanism employed by a learning system to constrain the search space of a
hypothesis.
iii) An approach to the design of learning algorithms that is inspired by the fact that when
people encounter new situations, they often explain them by reference to familiar
experiences, adapting the explanation to fit the new situation.
iv) None of the above.
f) The output of KDD is
i) Data
ii) Information
iii) Query
iv) Useful information
g) Cluster is
i) Group of similar objects that differ significantly from other objects
ii) Operations on a database to transform or simplify data in order to prepare it for a
machine learning algorithm
iii) Symbolic representation of facts or ideas from which information can potentially be
extracted
iv) None of the above
h) Background knowledge refers to
i) Additional acquaintance used by a learning algorithm to facilitate the learning process
ii) A neural network that makes use of a hidden layer
iii) It is a form of automatic learning
iv) None of the above
i) Case-based learning is
i) A class of learning algorithm that tries to find an optimum classification of a set of
examples using the probabilistic theory
ii) Any mechanism employed by a learning system to constrain the search space of a
hypothesis
iii) An approach to the design of learning algorithms that is inspired by the fact that when
people encounter new situations, they often explain them by reference to familiar
experiences, adapting the explanation to fit the new situation.
iv) None of the above.
j) Some telecommunication companies want to segment their customers into distinct groups in
order to send appropriate subscription offers. This is an example of
i) Supervised learning
ii) Data extraction
iii) Serration
iv) Unsupervised learning

2. a) Compare and contrast the data warehouse system and the operational database system.

Operational Database | Data Warehouse

Operational systems are designed to support high-volume transaction processing. | Data warehousing systems are typically designed to support high-volume analytical processing (i.e., OLAP).

Operational systems are usually concerned with current data. | Data warehousing systems are usually concerned with historical data.

Data within operational systems are updated regularly according to need. | Non-volatile: new data may be added regularly, but once added it is rarely changed.

It is designed for real-time business dealings and processes. | It is designed for analysis of business measures by subject area, categories, and attributes.

It is optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | It is optimized for extent loads and high, complex, unpredictable queries that access many rows per table.

It is optimized for validation of incoming information during transactions and uses validation data tables. | It is loaded with consistent, valid information and requires no real-time validation.

It supports thousands of concurrent clients. | It supports a few concurrent clients relative to OLTP.

Operational systems are widely process-oriented. | Data warehousing systems are widely subject-oriented.

Operational systems are usually optimized to perform fast inserts and updates of relatively small volumes of data. | Data warehousing systems are usually optimized to perform fast retrievals of relatively high volumes of data.

Data in. | Data out.

A small amount of data is accessed. | A large amount of data is accessed.

Relational databases are created for Online Transaction Processing (OLTP). | Data warehouses are designed for Online Analytical Processing (OLAP).

b) Describe the steps involved in data mining when viewed as a process of knowledge discovery.

Answer:
1. Data Cleaning
2. Data Integration
3. Data Selection
4. Data transformation
5. Data Mining
6. Pattern Evaluation
7. Knowledge presentation
Explanation:
The steps involved in data mining or data analytics, when viewed as a process of
knowledge discovery, include the following:
Step 1. Data cleaning: this involves the removal of noise and inconsistent data.
Step 2. Data integration: this involves the combination of data from multiple sources.
Step 3. Data selection: this is the step in which the data relevant to the analysis
task are retrieved from the database.
Step 4. Data transformation: this is the step in which the data are transformed into
forms appropriate for mining, for example by performing aggregation operations.
Step 5. Data mining: this step involves the extraction of data patterns through
specific techniques.
Step 6. Pattern evaluation: this step involves the identification of the patterns that
represent knowledge, based on interestingness measures.
Step 7. Knowledge presentation: this is the step in which visualization and
knowledge representation methods are used to present the mined knowledge to
users.
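
The seven steps above can be sketched as a chain of small functions. This is only a toy illustration: the records, the field names, and the "total quantity per item" pattern are hypothetical and are not part of the original answer.

```python
from collections import Counter

# Hypothetical toy records from two sources (illustrative only).
store_a = [
    {"customer": "c1", "item": "milk", "qty": 2},
    {"customer": "c2", "item": "bread", "qty": None},  # missing value
    {"customer": "c1", "item": "bread", "qty": 1},
]
store_b = [
    {"customer": "c3", "item": "milk", "qty": 1},
    {"customer": "c3", "item": "bread", "qty": 3},
    {"customer": "c4", "item": "eggs", "qty": 12},
]

def clean(records):
    """Step 1: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Step 2: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Step 3: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Step 4: aggregate into total quantity per item."""
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["qty"]
    return totals

def mine(totals):
    """Step 5: extract a simple pattern (items ranked by total quantity)."""
    return totals.most_common()

def evaluate(patterns, min_qty):
    """Step 6: keep only patterns that meet a threshold measure."""
    return [(item, qty) for item, qty in patterns if qty >= min_qty]

def present(patterns):
    """Step 7: render the result for the user."""
    return [f"{item}: {qty} units" for item, qty in patterns]

data = integrate(clean(store_a), clean(store_b))
data = select(data, ["item", "qty"])
report = present(evaluate(mine(transform(data)), min_qty=4))
print(report)
```

Real systems replace each function with far richer logic, but the order of the calls mirrors the order of the steps in the answer.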
3. a) What is the data warehouse backend process? Explain briefly.

b) Explain the different OLAP operations.

Roll-up (Drill-up): The roll-up operation performs aggregation on a data cube, either by climbing
up a concept hierarchy for a dimension or by dimension reduction.

• Performing roll-up by climbing up a concept hierarchy:

o Consider a hierarchy defined as the total order “street < city < province or state < country.”

o Rather than grouping the data by city, the resulting cube groups the data by country.

• Performing roll-up using dimension reduction:

o One or more dimensions are removed from the given cube.

o Consider a sales data cube containing only the two dimensions location and time.

o Roll-up may be performed by removing the time dimension, resulting in an aggregation of the
total sales by location, rather than by both location and by time.
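
Both forms of roll-up can be sketched on a small cube held in a Python dictionary. The sales figures and the city-to-country mapping below are hypothetical and serve only to illustrate the two aggregations.

```python
from collections import defaultdict

# Hypothetical sales keyed by (city, quarter); the values are illustrative only.
sales = {
    ("Toronto", "Q1"): 100, ("Toronto", "Q2"): 120,
    ("Vancouver", "Q1"): 80, ("Vancouver", "Q2"): 90,
    ("Chicago", "Q1"): 150, ("Chicago", "Q2"): 160,
}
# One level of the concept hierarchy: city < country.
city_to_country = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}

def roll_up_hierarchy(cube, mapping):
    """Roll up by climbing the location hierarchy from city to country."""
    out = defaultdict(int)
    for (city, quarter), amount in cube.items():
        out[(mapping[city], quarter)] += amount
    return dict(out)

def roll_up_drop_time(cube):
    """Roll up by dimension reduction: remove the time dimension."""
    out = defaultdict(int)
    for (city, _quarter), amount in cube.items():
        out[city] += amount
    return dict(out)

by_country = roll_up_hierarchy(sales, city_to_country)
by_city = roll_up_drop_time(sales)
print(by_country)
print(by_city)
```

Drill-down goes the other way, so it cannot be computed from `by_country` or `by_city` alone; it needs the more detailed cube (`sales` here).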

Drill-down: Drill-down is the reverse of roll-up. It navigates from less detailed data to more
detailed data. Drill-down can be realized by either stepping down a concept hierarchy for a
dimension or introducing additional dimensions.

• Performing a drill-down operation by stepping down a concept hierarchy

o Consider time defined as “day < month < quarter < year.”

o Drill-down occurs by descending the time hierarchy from the level of quarter to the more
detailed level of month.
• Performing a drill-down operation by adding new dimensions to a cube

o Consider the central cube of the figure

o A drill-down on the central cube can occur by introducing an additional dimension, such as customer group.

Slice: The slice operation performs a selection on one dimension of the given cube, resulting in
a subcube.

o The figure shows a slice operation where the sales data are selected from the central cube for
the dimension time using the criterion time = “Q1”.

Dice: The dice operation defines a subcube by performing a selection on two or more
dimensions.

o The figure shows a dice operation on the central cube based on the following selection criteria
that involve three dimensions: (location = “Toronto” or “Vancouver”) and (time = “Q1” or “Q2”)
and (item = “home entertainment” or “computer”).
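
Slice and dice can be sketched as two filters over a cube keyed by (location, time, item). The cell values and the helper names `slice_cube` and `dice_cube` are hypothetical; only the selection criteria come from the text above.

```python
# Hypothetical cube keyed by (location, time, item); values are illustrative only.
DIMS = ("location", "time", "item")
cube = {
    ("Toronto", "Q1", "computer"): 10,
    ("Toronto", "Q2", "phone"): 20,
    ("Vancouver", "Q1", "home entertainment"): 30,
    ("Vancouver", "Q3", "computer"): 40,
    ("Chicago", "Q1", "computer"): 50,
    ("New York", "Q2", "security"): 60,
}

def slice_cube(cube, dim, value):
    """Select one value on one dimension; that dimension is dropped from the keys."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, criteria):
    """Select on several dimensions; criteria maps a dimension name to allowed values."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in allowed for d, allowed in criteria.items())
    }

q1_slice = slice_cube(cube, "time", "Q1")
diced = dice_cube(cube, {
    "location": {"Toronto", "Vancouver"},
    "time": {"Q1", "Q2"},
    "item": {"home entertainment", "computer"},
})
print(q1_slice)
print(diced)
```

The slice returns a 2-D subcube (location by item), while the dice keeps all three dimensions and only narrows the values allowed on each.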

Pivot (rotate): Pivot is a visualization operation that rotates the data axes in view in order to
provide an alternative presentation of the data.

o The figure shows a pivot operation where the item and location axes in a 2-D slice are rotated.

o Other examples include rotating the axes in a 3-D cube, or transforming a 3-D cube into a
series of 2-D planes.
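
A pivot of a 2-D slice amounts to swapping its two axes. The sketch below assumes a hypothetical slice keyed by (item, location); the numbers and the helper `as_rows` are illustrative only.

```python
# Hypothetical 2-D slice keyed by (item, location); values are illustrative only.
slice_2d = {
    ("computer", "Toronto"): 10,
    ("computer", "Vancouver"): 40,
    ("phone", "Toronto"): 20,
}

def pivot(table):
    """Rotate a 2-D table by swapping its row and column axes."""
    return {(col, row): value for (row, col), value in table.items()}

def as_rows(table):
    """Group cells by their first axis so the layout is easy to read."""
    rows = {}
    for (row, col), value in table.items():
        rows.setdefault(row, {})[col] = value
    return rows

print(as_rows(slice_2d))         # items as rows, locations as columns
print(as_rows(pivot(slice_2d)))  # locations as rows, items as columns
```

The cell values are unchanged by the rotation; only the presentation differs.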

Other OLAP operations (extra points for reference)

• Drill-across operation executes queries involving more than one fact table.

• Drill-through operation uses relational SQL facilities to drill through the bottom level of a data
cube down to its back-end relational tables.

• Other operations include ranking the top N or bottom N items in lists, as well as computing
moving averages, growth rates, interests, internal rates of return, depreciation, currency
conversions, and statistical functions.

• OLAP offers analytical modeling capabilities, including a calculation engine for deriving ratios,
variance, and so on, and for computing measures across multiple dimensions.

• It can generate summarizations, aggregations, and hierarchies at each granularity level and at
every dimension intersection.

• OLAP also supports functional models for forecasting, trend analysis, and statistical analysis.
In this context, an OLAP engine is a powerful data analysis tool.
4. a) What is data cleaning? Describe various approaches for cleaning data that have missing values.

b) Use the two methods below to normalize the following group of data:

200; 300; 400; 600; 1000

Min-max normalization by setting min = 0 and max = 1

z-score normalization
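
One possible way to work through the two methods on the data above is sketched below. It assumes the population standard deviation (dividing by n) for the z-score; if the sample standard deviation is intended, the z-score values differ. The function names are hypothetical.

```python
import math

values = [200, 300, 400, 600, 1000]

def min_max(data, new_min=0.0, new_max=1.0):
    """Min-max normalization onto the interval [new_min, new_max]."""
    lo, hi = min(data), max(data)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in data]

def z_score(data):
    """Z-score normalization using the population standard deviation (assumption)."""
    mean = sum(data) / len(data)
    std = math.sqrt(sum((v - mean) ** 2 for v in data) / len(data))
    return [(v - mean) / std for v in data]

print(min_max(values))                        # [0.0, 0.125, 0.25, 0.5, 1.0]
print([round(z, 4) for z in z_score(values)])  # [-1.0607, -0.7071, -0.3536, 0.3536, 1.7678]
```

Here the mean is 500 and the population standard deviation is the square root of 80000, about 282.84.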

c) What is the value range of the following normalization methods?

Min-max normalization

z-score normalization

normalization by decimal scaling.

5. a) Write and explain pseudocode for the Apriori algorithm. Explain the terms

i) support count;

ii) confidence.

b) Consider a database D consisting of transactions. Suppose the minimum support count is 2 (i.e.,
min_sup = 20%) and the minimum confidence required is 70%. Find the frequent itemsets using the Apriori
algorithm. Explain each step with a diagram:

6. Draw a decision tree for the following data set. Use entropy as the node selection mechanism:

7. a) What are the main requirements for cluster analysis?

b) Explain the different basic clustering methods.

8. a) What is multilevel association rule mining? Explain the different approaches to multilevel
association rule mining.

b) Explain the naïve Bayesian classification algorithm.

9. a) With a neat diagram, explain the architecture of a data warehouse. Explain the terms ROLAP, MOLAP
and HOLAP.

b) What are the differences between the three main types of data warehouse usage: information
processing, analytical processing, and data mining? Discuss the motivation behind OLAP mining (OLAM).
