DM QB
Module-1
1. Discuss the components of data mining.
2. Draw and explain Data Mining Architecture.
3. Write a short note on the KDD process.
4. Why is it called data mining rather than knowledge mining?
5. Explain the classification of Data Mining systems.
6. What are the issues in Data Mining? Explain in detail.
7. Briefly explain the four schemes for integrating a Data Mining
system with a Data Warehouse.
Module-2
1. Using the data for age: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25,
25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) Use min-max normalization to transform the value.
(b) Use z-score normalization to transform the value.
2. What is noise? Explain data smoothing methods as a noise
removal technique. Partition the given data into bins of size 3
(equal-frequency partitioning), then smooth by bin means, by bin
medians, and by bin boundaries. Consider the data: 10, 2, 19, 18,
20, 18, 25, 28, 22.
3. Explain various data normalization techniques with a suitable
example.
4. Describe data reduction methods and explain any one method.
5. What is a concept hierarchy? Explain the types of concept
hierarchy generation methods.
6. Explain attribute selection methods with a suitable example.
7. Define the terms: 1) mode 2) variance 3) standard deviation
4) quartile.
8. Suppose that the data for analysis includes the attribute age. The age
values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20,
20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52,
70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (i.e.,
bimodal, trimodal, etc.).
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile
(Q3) of the data?
(e) Give the five-number summary of the data.
(f) Show a boxplot of the data.
(g) Use min-max normalization to transform the value 35 for age onto the
range [0.0, 1.0].
(h) Use z-score normalization to transform the value 35 for age, where
the standard deviation of age is 12.94 years.
(i) Use normalization by decimal scaling to transform the value 35 for
age.
9. In real-world data, tuples with missing values for some attributes are a
common occurrence. Describe various methods for handling this problem.
10. Suppose a group of 12 sales price records has been sorted as follows:
5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215
Partition them into three bins by each of the following methods:
(a) equal-frequency (equidepth) partitioning
(b) equal-width partitioning
(c) clustering
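For checking answers to parts (a) and (b) of question 10, the two partitioning schemes can be sketched in Python as below. This is a minimal sketch, not a reference solution: the function names `equal_frequency_bins` and `equal_width_bins` are made up for illustration. It assumes the number of values divides evenly by the number of bins, that the values are not all identical, and that a value on a bin boundary goes into the upper bin (the maximum stays in the last bin).

```python
def equal_frequency_bins(values, n_bins):
    """Split the sorted values into n_bins bins holding the same number of values."""
    ordered = sorted(values)
    size = len(ordered) // n_bins  # assumption: len(values) is divisible by n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]


def equal_width_bins(values, n_bins):
    """Split the values into n_bins intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumption: hi > lo, so width is non-zero
    bins = [[] for _ in range(n_bins)]
    for v in sorted(values):
        # The maximum lands exactly on the last boundary, so clamp it into the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        bins[index].append(v)
    return bins


prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
print(equal_frequency_bins(prices, 3))
print(equal_width_bins(prices, 3))
```

With three bins the interval width is (215 - 5) / 3 = 70, so equal-width partitioning puts most of the values in the first bin, while equal-frequency partitioning puts four values in each bin.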
Module-3
1. Briefly explain Market Basket Analysis.
2. Generate frequent itemsets and derive association rules from them using the Apriori
algorithm. Minimum support is 2 and minimum confidence is 50%.
3. Write a note on Association Rule Mining.
4. Explain the two measures of rule interestingness: support and
confidence.
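The two measures in question 4 can be illustrated with a short Python sketch. The transactions below are hypothetical and invented purely for illustration, since the question bank does not supply a dataset. The helper names `support_count`, `support`, and `confidence` are likewise assumptions. Support is taken as the fraction of transactions containing an itemset, and confidence of A => B as count(A and B) / count(A), assuming A occurs at least once.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent.

    Assumes the antecedent appears in at least one transaction.
    """
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)


print(support({"bread", "milk"}, transactions))       # support of {bread, milk}
print(confidence({"bread"}, {"milk"}, transactions))  # confidence of bread => milk
```

In this made-up data, {bread, milk} appears in 2 of 4 transactions and bread appears in 3, so the rule bread => milk has support 2/4 and confidence 2/3.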