Data Mining-1,2,3,4, & 5-Units & Qps
PART – A
(25 Marks)
PART – B
(50 Marks)
2.a) Explain how to integrate data mining system with a data warehouse.
b) “Data preprocessing is necessary before data mining process”. Justify your answer. [5+5]
OR
3.a) Differentiate between data mining and data warehouse.
b) Discuss the major issues in data mining. [5+5]
---ooOoo---
Code No: 157BC R18
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, July/August - 2022
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any five questions
All questions carry equal marks
---
---ooOoo---
Code No: 157BC R18
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, February/March - 2022
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any Five Questions
All Questions Carry Equal Marks
---
1. Explain the need of data preprocessing and various forms of preprocessing. [15]
2. What is a data warehouse? Demonstrate the integration of a data mining system with a data
warehouse, with a neat diagram. [15]
3. Apply the FP-Growth algorithm to the following data to find frequent item sets; take the
support threshold as 30%. [15]
8. How can mining techniques be applied to an unstructured text database? Explain with an
example. [15]
---ooOoo---
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, January/February - 2023
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Note: i) Question paper consists of Part A, Part B.
ii) Part A is compulsory, which carries 25 marks. In Part A, Answer all questions.
iii) In Part B, Answer any one question from each unit. Each question carries 10 marks
and may have a, b as sub questions.
PART – A
(25 Marks)
1.a) Define data mining. [2]
b) What is meant by outlier analysis? [3]
c) Define maximal frequent item set. [2]
d) How to compute confidence of an association rule? Give example. [3]
e) What is meant by test data? [2]
f) What is the significance of information gain? [3]
g) What is cluster analysis? [2]
h) What are the drawbacks of single linkage clustering? [3]
i) Give examples of unstructured text. [2]
j) List the applications of web usage mining. [3]
PART – B
(50 Marks)
2. Discuss the steps in knowledge discovery process and compare it with data access and
information retrieval. [10]
OR
3.a) Appraise usage of smoothing in data transformation.
b) Evaluate distance measures for dissimilarity computation. [5+5]
4. Apply the FP-Growth algorithm to the following transactional database to find frequent item
sets. [10]
TID List of items
001 I1,I3,I5,I7
002 I1,I5,I6,I7
003 I6,I7
004 I2,I3,I6,I7
005 I8,I1,I6
006 I2,I5,I8
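The first pass of FP-Growth on the database above can be sketched as follows. This is only an illustration, not a model answer: the question states no support threshold, so the minimum support count of 2 (`MIN_COUNT`) is an assumption, as is breaking ties between equally frequent items by item name. The sketch counts item supports and rewrites each transaction in descending support order, which is the order in which items would be inserted into the FP-tree.

```python
from collections import Counter

# Transactions copied from the question (TID -> items).
transactions = {
    "001": ["I1", "I3", "I5", "I7"],
    "002": ["I1", "I5", "I6", "I7"],
    "003": ["I6", "I7"],
    "004": ["I2", "I3", "I6", "I7"],
    "005": ["I8", "I1", "I6"],
    "006": ["I2", "I5", "I8"],
}

# ASSUMPTION: the question gives no threshold; 2 is chosen for illustration.
MIN_COUNT = 2

# Scan 1: support count of every single item.
item_counts = Counter(item for items in transactions.values() for item in items)

# Keep only the items that meet the assumed threshold.
frequent = {item: n for item, n in item_counts.items() if n >= MIN_COUNT}


def order_key(item):
    # Descending support; ties broken by item name (an assumed convention).
    return (-frequent[item], item)


# Scan 2: drop infrequent items and reorder each transaction.
ordered = {
    tid: sorted((i for i in items if i in frequent), key=order_key)
    for tid, items in transactions.items()
}

for tid, items in ordered.items():
    print(tid, items)
```

With the assumed threshold every item is frequent (I6 and I7 occur 4 times, I1 and I5 3 times, I2, I3 and I8 twice), so no item is dropped and only the ordering within each transaction changes.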
OR
5.a) Appraise the limitations of Apriori and suggest mechanisms to improve it.
b) Explain item merging concept for mining closed frequent item sets. [5+5]
6. State classification problem and discuss a general approach to solve classification problem.
[10]
OR
7.a) Discuss decision tree overfitting and pruning techniques.
b) Justify the selection of k value for kNN classifier. [5+5]
8. Discuss hierarchical methods for clustering and contrast agglomerative and divisive
approaches. [10]
OR
9.a) With suitable data explain statistical based outlier detection.
b) Criticize the evaluation metrics used for clusters. [5+5]
10. Discuss the data mining tasks applicable to text databases. [10]
OR
11.a) Discover episode rules for the text given in this question paper.
b) Give a brief note on PageRank algorithm used in web structure mining. [5+5]
---ooOoo---
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, July/August - 2022
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any five questions
All questions carry equal marks
---
3. Find all the frequent item sets using the Apriori algorithm for the given data,
where min-sup = 2. [15]
4.a) List out different kinds of Association Rules with an example for each.
b) Explain about maximal frequent Item set and closed frequent Item set. [7+8]
---ooOoo---
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, February/March - 2022
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any five questions
All questions carry equal marks
---
3. Write a note on Maximal Frequent Item Set and Closed Frequent Item Set. [15]
4. Explain the Apriori algorithm for finding frequent item sets with an example. [15]
---ooOoo---
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, March - 2021
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any Five Questions
All Questions Carry Equal Marks
---
3. Explain market basket analysis and its relevance to association rules. Explain the Apriori
algorithm using the following transactional data, assuming that the minimum support is 22%.
Illustrate with an example.
TID LIST OF ITEMS
001 Milk, dal, sugar, bread
002 Dal, sugar, wheat, jam
003 Milk, bread, curd, paneer
004 Wheat, paneer, dal, sugar
005 Milk, paneer, bread
006 Wheat, dal, paneer, bread. [15]
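A level-wise Apriori pass over the six transactions above can be sketched as below. This is a minimal illustration under stated assumptions, not a model answer: the 22% figure is read as a minimum support fraction and converted to a count with a ceiling (`ceil(0.22 * 6) = 2`), item names are lower-cased, and the function name `apriori` is hypothetical.

```python
from itertools import combinations
from math import ceil

# Transactions from the question, with item names lower-cased.
transactions = [
    {"milk", "dal", "sugar", "bread"},
    {"dal", "sugar", "wheat", "jam"},
    {"milk", "bread", "curd", "paneer"},
    {"wheat", "paneer", "dal", "sugar"},
    {"milk", "paneer", "bread"},
    {"wheat", "dal", "paneer", "bread"},
]


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    # ASSUMPTION: a fractional threshold is rounded up to a whole count.
    min_count = ceil(min_support * len(transactions))

    def count(candidates):
        # One scan of the database per level.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {item for t in transactions for item in t}
    singles = count({frozenset([i]) for i in items})
    current = {c: n for c, n in singles.items() if n >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c: n for c, n in count(candidates).items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


frequent = apriori(transactions, 0.22)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Under these assumptions jam and curd are dropped at the first level, nine 2-itemsets survive, and the three frequent 3-itemsets are {bread, milk, paneer}, {dal, sugar, wheat} and {dal, paneer, wheat}, each with support count 2.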
5. How can neural networks be used for data classification? Which algorithm is suitable?
Explain with an example. [15]
--ooOoo--
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, September - 2021
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any Five Questions
All Questions Carry Equal Marks
---
1.a) What is data mining? Discuss the challenges associated with data mining.
b) Illustrate any three measures for dissimilarity of numeric data. [8+7]
3. Generate the strong association rules for the following transactions using the Apriori
algorithm, with minsup = 30% and minconf = 65%. [15]
4. Consider the following training data set to construct naïve Bayesian classifier and classify
the test case: Attr1=M, Attr2=Q, Attr3=? Explain the process. [15]
--ooOoo--
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, October/November - 2020
DATA MINING
(Common to CSE, IT)
Time: 2 hours Max. Marks: 75
Answer any Five Questions
All Questions Carry Equal Marks
---
1. What is the need for processing of data? Explain the techniques for Data processing.
[15]
2. Discuss the characteristics of Data warehousing and steps in building a data warehousing
architecture. [15]
3. How can the efficiency of the Apriori algorithm be improved? Explain mining quantitative
association rules with appropriate examples, using support count = 2 on the following data:
T1 = {fever, cold, sore throat, running nose, difficulty breathing}
T2 = {cold, sore throat}
T3 = {cold, fever}
T4 = {difficulty breathing, fever}
T5 = {fever, cold, difficulty breathing}
T6 = {cold, fever, running nose} [15]
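Support counts and rule confidence for transactions T1 to T6 can be checked with a short sketch. It treats each symptom as a plain Boolean item, so it illustrates only the support and confidence arithmetic, not quantitative rule mining itself. The helper names `support_count` and `confidence` are hypothetical, and confidence is taken as support_count(X ∪ Y) / support_count(X).

```python
# Transactions T1..T6 from the question, each as a set of symptoms.
transactions = [
    {"fever", "cold", "sore throat", "running nose", "difficulty breathing"},
    {"cold", "sore throat"},
    {"cold", "fever"},
    {"difficulty breathing", "fever"},
    {"fever", "cold", "difficulty breathing"},
    {"cold", "fever", "running nose"},
]

MIN_COUNT = 2  # support count given in the question


def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def confidence(antecedent, consequent):
    """confidence(X -> Y) = support_count(X ∪ Y) / support_count(X)."""
    both = set(antecedent) | set(consequent)
    return support_count(both) / support_count(antecedent)


# {cold, fever} occurs in T1, T3, T5, T6, so it meets the threshold.
print(support_count({"cold", "fever"}) >= MIN_COUNT)   # True
print(confidence({"cold"}, {"fever"}))                  # 0.8
print(confidence({"difficulty breathing"}, {"fever"}))  # 1.0
```

The rule {difficulty breathing} -> {fever} has confidence 1.0 because all three transactions containing difficulty breathing (T1, T4, T5) also contain fever.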
7. How efficient is the K-medoids algorithm on large data sets? Illustrate with example. [15]
---ooOoo---
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, December - 2019
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
PART – A
(25 Marks)
PART – B
(50 Marks)
2. Discuss data mining as a step in knowledge discovery process and various challenges
associated. [10]
OR
3. Use a flowchart to summarize the following procedures for attribute subset selection:
a) Stepwise forward selection
b) Stepwise backward elimination. [10]
4. Classify frequent pattern mining methods and explain the criteria followed for
classification. [10]
OR
5. Apply apriori algorithm to find frequent itemsets from the following transactional database.
Let min_sup = 30%. [10]
TID Items_bought
1 Pen, notebook, ruler
2 Pencil, eraser, sharpener
3 Pen, ruler, chart, sharpener
4 Pencil, clip, eraser
5 Ruler, pin, story book, pen
6 Marker, chart, sketchpens
6. State classification problem and briefly explain general approaches to solve it. [10]
OR
7. Apply the Naïve Bayesian classifier to identify the class label (campus_placement) of the new
sample/student <7 to 8, ‘Fair’, ‘Excellent’, ‘No’>. [10]
8. Suppose that the data mining task is to cluster the following eight students into three
clusters, and the distance function is Manhattan. Assign records 1, 2 and 3 as the centroids of the
three clusters respectively. Use the k-means algorithm to show the final three clusters. [10]
OR
9. Appraise the importance of outlier detection and its application. Explain any one approach
for outlier detection. [10]
10. Discuss various kinds of patterns to be mined from web/server logs in web usage mining.
[10]
OR
11. Compare and contrast text mining with web content mining using lucid examples. [10]
---ooOoo---