The Full Form of KDD Is
1. a) The full form of KDD is
i) Knowledge database
ii) Knowledge discovery in databases
iii) Knowledge data division
iv) Knowledge data definition
b) You are given data about seismic activity in Japan, and you want to predict the magnitude of the
next earthquake. This is an example of
i) Supervised learning
ii) Unsupervised learning
iii) Serration
iv) Dimensionality reduction
c) Which of the following is not involved in data mining?
i) Knowledge extraction
ii) Data archaeology
iii) Data exploration
iv) Data transformation
d) ………………….. is a comparison of the general features of the target class data objects against the
general features of objects from one or multiple contrasting classes.
i) Data characterization
ii) Data classification
iii) Data discrimination
iv) Data selection
e) A Bayesian classifier is
i) A class of learning algorithm that tries to find an optimum classification of a set of
examples using the probabilistic theory.
ii) Any mechanism employed by a learning system to constrain the search space of a
hypothesis.
iii) An approach to the design of learning algorithms that is inspired by the fact that when
people encounter new situations, they often explain them by reference to familiar
experiences, adapting the explanation to fit the new situation.
iv) None of the above.
f) The output of KDD is
i) Data
ii) Information
iii) Query
iv) Useful information
g) Cluster is
i) Group of similar objects that differ significantly from other objects
ii) Operations on a database to transform or simplify data in order to prepare it for a
machine learning algorithm
iii) Symbolic representation of facts or ideas from which information can potentially be
extracted
iv) None of the above
h) Background knowledge refers to
i) Additional acquaintance used by a learning algorithm to facilitate the learning process
ii) A neural network that makes use of a hidden layer
iii) It is a form of automatic learning
iv) None of the above
i) Case-based learning is
i) A class of learning algorithm that tries to find an optimum classification of a set of
examples using the probabilistic theory
ii) Any mechanism employed by a learning system to constrain the search space of a
hypothesis
iii) An approach to the design of learning algorithms that is inspired by the fact that when
people encounter new situations, they often explain them by reference to familiar
experiences, adapting the explanation to fit the new situation.
iv) None of the above.
j) Some telecommunication companies want to segment their customers into distinct groups in
order to send appropriate subscription offers. This is an example of
i) Supervised learning
ii) Data extraction
iii) Serration
iv) Unsupervised learning
2. a) Compare and contrast data warehouse systems and operational database systems.
Operational database system | Data warehouse system
Designed to support high-volume transaction processing. | Typically designed to support high-volume analytical processing (i.e., OLAP).
Usually concerned with current data. | Usually concerned with historical data.
Data are updated regularly, as needed. | Non-volatile: new data may be added regularly, but once added it is rarely changed.
Designed for real-time business dealings and processes. | Designed for analysis of business measures by subject area, category, and attribute.
Optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table.
Optimized for validation of incoming information during transactions; uses validation data tables. | Loaded with consistent, valid information; requires no real-time validation.
Largely process-oriented. | Largely subject-oriented.
Usually optimized to perform fast inserts and updates of relatively small volumes of data. | Usually optimized to perform fast retrievals of relatively high volumes of data.
Relational databases are created for On-Line Transaction Processing (OLTP). | Data warehouses are designed for On-Line Analytical Processing (OLAP).
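As a rough illustration of the contrast above, the sketch below uses Python's built-in sqlite3 module. The `sales` table, its columns, and all values are assumptions made up for this example. The first access is a typical operational (OLTP-style) operation that touches a single row; the second is a typical analytical (OLAP-style) query that aggregates over many rows.

```python
import sqlite3

# In-memory database; the `sales` table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, city TEXT, quarter TEXT, amount REAL)"
)
rows = [
    ("Toronto", "Q1", 100.0),
    ("Toronto", "Q2", 150.0),
    ("Vancouver", "Q1", 80.0),
    ("Vancouver", "Q2", 120.0),
]
conn.executemany("INSERT INTO sales (city, quarter, amount) VALUES (?, ?, ?)", rows)

# OLTP-style access: a short transaction that adds one row, then reads it back by key.
with conn:
    conn.execute(
        "INSERT INTO sales (city, quarter, amount) VALUES (?, ?, ?)",
        ("Toronto", "Q3", 50.0),
    )
single_row = conn.execute(
    "SELECT city, amount FROM sales WHERE id = ?", (5,)
).fetchone()

# OLAP-style access: one query that scans every row and summarizes by city.
totals_by_city = conn.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
).fetchall()
conn.close()
```

A real warehouse would hold far more rows and use a dedicated schema, but the access pattern (keyed single-row work versus wide aggregation) is the point of the comparison.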
b) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
Answer:
1. Data Cleaning
2. Data Integration
3. Data Selection
4. Data transformation
5. Data Mining
6. Pattern Evaluation
7. Knowledge presentation
Explanation:
The steps involved in data mining, when viewed as a process of knowledge discovery,
are the following:
Step 1. Data cleaning: this involves removing noise and inconsistent data.
Step 2. Data integration: this involves combining data from multiple sources.
Step 3. Data selection: this is the step where data relevant to the analysis task
are retrieved from the database.
Step 4. Data transformation: this is the step in which the data are transformed into
forms appropriate for mining, for example by performing aggregation operations.
Step 5. Data mining: this step involves the extraction of data patterns through
specific techniques.
Step 6. Pattern evaluation: this step identifies the interesting patterns that
represent knowledge, based on interestingness measures.
Step 7. Knowledge presentation: this is the step in which visualization and
knowledge representation methods are used to present the mined knowledge to
users.
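The seven steps can be sketched as a chain of small functions. This is only a toy illustration: the records, field names, and thresholds are hypothetical, and the "mining" step is reduced to picking the most frequently bought items rather than a real mining technique.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
source_a = [
    {"customer": "c1", "item": "milk", "qty": 2},
    {"customer": "c2", "item": "bread", "qty": None},
]
source_b = [
    {"customer": "c1", "item": "bread", "qty": 1},
    {"customer": "c3", "item": "milk", "qty": 3},
    {"customer": "c3", "item": "tea", "qty": 1},
]

def clean(records):
    # Step 1: drop records that contain a missing value.
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    # Step 2: combine records from multiple sources into one list.
    return [r for source in sources for r in source]

def select(records, fields):
    # Step 3: keep only the fields relevant to the analysis task.
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    # Step 4: aggregate quantities per item.
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["qty"]
    return totals

def mine(totals, k):
    # Step 5 (toy stand-in for a mining technique): the k best-selling items.
    return totals.most_common(k)

def evaluate(patterns, min_qty):
    # Step 6: keep only patterns that pass a simple interestingness threshold.
    return [(item, qty) for item, qty in patterns if qty >= min_qty]

def present(patterns):
    # Step 7: render the surviving patterns for the user.
    return [f"{item}: {qty} units" for item, qty in patterns]

data = integrate(clean(source_a), clean(source_b))
totals = transform(select(data, ["item", "qty"]))
report = present(evaluate(mine(totals, 3), min_qty=2))
```

Here `report` ends up containing the single line for milk, because the other items fall below the assumed threshold of 2 units.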
3. a) What is the data warehouse back-end process? Explain briefly.
Roll-up (Drill-up): The roll-up operation performs aggregation on a data cube, either by climbing
up a concept hierarchy for a dimension or by dimension reduction.
o Consider a hierarchy defined as the total order “street < city < province or state < country.”
o Rather than grouping the data by city, the resulting cube groups the data by country.
o Consider a sales data cube containing only the two dimensions location and time.
o Roll-up may be performed by removing the time dimension, resulting in an aggregation of the
total sales by location, rather than by both location and by time.
Drill-down: Drill-down is the reverse of roll-up. It navigates from less detailed data to more
detailed data. Drill-down can be realized by either stepping down a concept hierarchy for a
dimension or introducing additional dimensions.
o For example, drill-down occurs by descending the time hierarchy from the level of quarter to the
more detailed level of month.
o Drill-down can also be performed by adding new dimensions to a cube.
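Roll-up and drill-down can both be viewed as re-aggregating the same facts at a different level of detail. The following is a minimal sketch in plain Python; the fact rows, the column names, and the `aggregate` helper are all hypothetical and not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, country, quarter, month, sales).
facts = [
    ("Toronto", "Canada", "Q1", "Jan", 10),
    ("Toronto", "Canada", "Q1", "Feb", 20),
    ("Vancouver", "Canada", "Q1", "Jan", 5),
    ("Chicago", "USA", "Q1", "Mar", 30),
    ("Chicago", "USA", "Q2", "Apr", 40),
]
COLUMNS = ("city", "country", "quarter", "month")

def aggregate(rows, dims):
    """Sum sales over the given dimensions (a tiny stand-in for a cube)."""
    idx = [COLUMNS.index(d) for d in dims]
    cube = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        cube[key] += row[-1]
    return dict(cube)

# Starting cube: sales by city and quarter.
base = aggregate(facts, ("city", "quarter"))

# Roll-up by climbing the location hierarchy from city to country.
rolled_up = aggregate(facts, ("country", "quarter"))

# Roll-up by dimension reduction: the time dimension is removed entirely.
by_location_only = aggregate(facts, ("country",))

# Drill-down by descending the time hierarchy from quarter to month.
drilled_down = aggregate(facts, ("city", "month"))
```

For instance, the three Canadian rows in Q1 collapse into one cell in `rolled_up`, while `drilled_down` separates Toronto's January and February sales again.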
Slice: The slice operation performs a selection on one dimension of the given cube, resulting in
a subcube.
o The figure shows a slice operation where the sales data are selected from the central cube for
the dimension time using the criterion time = “Q1”.
Dice: The dice operation defines a subcube by performing a selection on two or more
dimensions.
o The figure shows a dice operation on the central cube based on the following selection criteria
that involve three dimensions: (location = “Toronto” or “Vancouver”) and (time = “Q1” or “Q2”)
and (item = “home entertainment” or “computer”).
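Slice and dice are both selections on cube cells; they differ only in how many dimensions the selection constrains. The sketch below assumes a cube stored as a dictionary keyed by (location, time, item); the cell values and the `select_cells` helper are hypothetical.

```python
# Hypothetical cube cells keyed by (location, time, item) -> sales.
cube = {
    ("Toronto", "Q1", "computer"): 100,
    ("Toronto", "Q2", "phone"): 60,
    ("Vancouver", "Q1", "home entertainment"): 80,
    ("Vancouver", "Q3", "computer"): 70,
    ("Chicago", "Q1", "computer"): 90,
}
DIMS = ("location", "time", "item")

def select_cells(cells, criteria):
    """Keep cells whose value on each constrained dimension is in the allowed set."""
    def keep(key):
        return all(key[DIMS.index(dim)] in allowed for dim, allowed in criteria.items())
    return {key: value for key, value in cells.items() if keep(key)}

# Slice: selection on a single dimension (time = "Q1").
slice_q1 = select_cells(cube, {"time": {"Q1"}})

# Dice: selection on three dimensions, using the criteria quoted above.
dice = select_cells(
    cube,
    {
        "location": {"Toronto", "Vancouver"},
        "time": {"Q1", "Q2"},
        "item": {"home entertainment", "computer"},
    },
)
```

With this sample data the slice keeps the three Q1 cells, and the dice keeps only the Toronto computer cell and the Vancouver home entertainment cell.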
Pivot (rotate): Pivot is a visualization operation that rotates the data axes in view in order to
provide an alternative presentation of the data.
o The figure shows a pivot operation where the item and location axes in a 2-D slice are rotated.
o Other examples include rotating the axes in a 3-D cube, or transforming a 3-D cube into a
series of 2-D planes.
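For a 2-D slice, pivoting amounts to swapping the two axes, which is a transpose. A minimal sketch with hypothetical labels and values:

```python
# Hypothetical 2-D slice: rows are items, columns are locations.
items = ["computer", "phone"]
locations = ["Toronto", "Vancouver"]
slice_2d = [
    [100, 70],  # computer: Toronto, Vancouver
    [60, 40],   # phone:    Toronto, Vancouver
]

# Pivot (rotate): rows become locations and columns become items.
pivoted = [list(column) for column in zip(*slice_2d)]

# The same cell is now addressed as [location][item] instead of [item][location].
vancouver_phone = pivoted[locations.index("Vancouver")][items.index("phone")]
```

The data are unchanged by the pivot; only the presentation (which dimension runs along which axis) differs.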
• Drill-across operation executes queries involving more than one fact table.
• Drill-through operation uses relational SQL facilities to drill through the bottom level of a data
cube down to its back-end relational tables.
• Other OLAP operations may include ranking the top N or bottom N items in lists, as well as
computing moving averages, growth rates, interest, internal rates of return, depreciation,
currency conversions, and statistical functions.
• OLAP offers analytical modeling capabilities, including a calculation engine for deriving ratios,
variance, and so on, and for computing measures across multiple dimensions.
• It can generate summarizations, aggregations, and hierarchies at each granularity level and at
every dimension intersection.
• OLAP also supports functional models for forecasting, trend analysis, and statistical analysis.
In this context, an OLAP engine is a powerful data analysis tool.
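A few of the calculations mentioned above (top N and bottom N ranking, moving averages, growth rates) can be sketched in plain Python. The monthly sales figures and the helper names are hypothetical, and a real OLAP engine would compute these inside its own calculation engine.

```python
import heapq

# Hypothetical monthly sales, in calendar order.
monthly_sales = {"Jan": 120, "Feb": 90, "Mar": 150, "Apr": 110, "May": 170}

# Ranking: the top 2 and bottom 2 months by sales.
top_2 = heapq.nlargest(2, monthly_sales.items(), key=lambda kv: kv[1])
bottom_2 = heapq.nsmallest(2, monthly_sales.items(), key=lambda kv: kv[1])

def moving_average(values, window):
    """Simple moving average over consecutive windows of the given size."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def growth_rates(values):
    """Period-over-period growth, as a fraction of the previous period."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]

values = list(monthly_sales.values())
ma3 = moving_average(values, 3)
growth = growth_rates(values)
```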
4. a) What is data cleaning? Describe various approaches for cleaning data that have missing values.
b) Use the two methods below to normalize the following group of data:
z-score normalization
Min-max normalization
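The data group for this question is not reproduced here, so the sketch below uses hypothetical values purely to show the two formulas: min-max normalization maps v to (v - min) / (max - min) * (new_max - new_min) + new_min, and z-score normalization maps v to (v - mean) / standard deviation. Using the population standard deviation (`statistics.pstdev`) is an assumption; some texts use the sample standard deviation instead.

```python
import statistics

# Hypothetical values; the question's own data group is not available.
data = [200, 300, 400, 600, 1000]

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score_normalize(values):
    """Centre values on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

min_max = min_max_normalize(data)
z_scores = z_score_normalize(data)
```

Min-max normalization preserves the relative spacing of the values within a fixed range, while z-score normalization is useful when the minimum and maximum are unknown or when outliers dominate the range.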
5. a) Write and explain pseudocode for the Apriori algorithm. Explain the terms
i) support count;
ii) confidence.
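For reference, the support count of an itemset is the number of transactions that contain it, and the confidence of a rule A => B is support_count(A and B together) / support_count(A). The following is a minimal, unoptimized sketch of Apriori's level-wise search in Python rather than pseudocode; the transactions, the threshold, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
    {"bread", "milk", "beer"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)

def apriori(transactions, min_count):
    """Return {frequent itemset: support count} using level-wise search."""
    items = sorted({item for t in transactions for item in t})
    level = [
        frozenset([item]) for item in items
        if support_count(frozenset([item]), transactions) >= min_count
    ]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support_count(itemset, transactions)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        current = set(level)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        level = [c for c in candidates if support_count(c, transactions) >= min_count]
        k += 1
    return frequent

frequent_itemsets = apriori(transactions, min_count=3)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

With a minimum support count of 3, the four single items and the pair {bread, milk} are frequent, and the rule bread => milk has confidence 3/4.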
6. Draw a decision tree for the following data set. Use entropy as the node selection mechanism:
9. a) With a neat diagram, explain the architecture of a data warehouse. Explain the terms ROLAP, MOLAP
and HOLAP.
b) What are the differences between the three main types of data warehouse usage: information
processing, analytical processing, and data mining? Discuss the motivation behind OLAP mining (OLAM).