
Data Mining-1,2,3,4, & 5-Units & Qps

Uploaded by

yashrafsonu4

Code No: 157BC R18

JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD


B. Tech IV Year I Semester Examinations, January/February - 2023
DATA MINING
(Common to CSE, IT, ITE)
Time: 3 Hours Max. Marks: 75

Note: i) Question paper consists of Part A, Part B.


ii) Part A is compulsory, which carries 25 marks. In Part A, answer all questions.
iii) In Part B, Answer any one question from each unit. Each question carries 10 marks
and may have a, b as sub questions.

PART – A
(25 Marks)

1.a) What is a data warehouse? [2]


b) List out the applications of data mining. [3]
c) What is meant by association rule mining? [2]
d) Write a short note on the SPM algorithm. [3]
e) Why are decision trees useful? [2]
f) List the advantages of using decision trees. [3]
g) Discuss the two approaches to improve the quality of hierarchical clustering. [2]
h) List the applications of cluster analysis. [3]
i) Define data stream mining. [2]
j) Give the taxonomy of web mining. [3]

PART – B
(50 Marks)

2.a) Explain how to integrate a data mining system with a data warehouse.
b) “Data preprocessing is necessary before data mining process”. Justify your answer. [5+5]
OR
3.a) Differentiate between data mining and data warehouse.
b) Discuss the major issues in data mining. [5+5]

4.a) Write a short note on constraint-based association mining.
b) Describe various types of association rules. [5+5]
OR
5. Explain in detail about frequent pattern mining in data mining. [10]
6. Describe Bayesian Belief Network with an example. [10]
OR
7.a) Briefly explain classification problems and general approaches to solve them.
b) Explain the merits and de-merits of the lazy learning method. [5+5]
8. Explain the following.
a) Cluster analysis.
b) Grid–based methods. [5+5]
OR
9.a) How is the density-based method used for clustering?
b) Illustrate the K-means algorithm with an example. [4+6]

10. Explain the following.


a) Spatial data mining.
b) Text mining. [5+5]
OR
11. Discuss various kinds of patterns to be mined from web/server logs in web usage mining. [10]

---ooOoo---
Code No: 157BC R18
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, July/August - 2022
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any five questions
All questions carry equal marks
---

1.a) Write short notes on data mining task primitives.


b) Discuss in detail about data preprocessing. [7+8]

2. Explain the following:


a) Integration of data mining system with a data warehouse.
b) Classification of data mining systems. [7+8]

3.a) How do you find frequent patterns in data mining? Explain.


b) Explain constraint based association mining. [7+8]

4.a) What are the measures of association rule mining? Explain.


b) Write short notes on SPM. [8+7]

5.a) Compare the methods of classification and prediction.


b) How do you evaluate the performance of a classification model? Explain. [7+8]

6. Discuss in detail about rule-based classification. [15]

7.a) Explain K-means algorithm with an example.


b) What are the key issues in hierarchical clustering? [9+6]

8. Explain the following:


a) Spatial data mining.
b) Mining sequence patterns in transactional databases. [7+8]

---ooOoo---
Code No: 157BC R18
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, February/March - 2022
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any Five Questions
All Questions Carry Equal Marks
---

1. Explain the need of data preprocessing and various forms of preprocessing. [15]

2. What is a data warehouse? Demonstrate integrating data mining system with a data
warehouse with a neat diagram. [15]

3. Apply the FP-Growth algorithm to the following data to find frequent item sets. Take the
support threshold as 30%. [15]

TID   List of ItemIDs
1     I1, I2, I4, I5
2     I2, I4, I7
3     I2, I3, I4, I5
4     I1, I3, I4, I7
5     I1, I2, I3, I4, I5
6     I3, I4, I5, I6

4.a) How do you identify subgraphs in a graph?


b) Give an overview of correlation analysis. [8+7]

5.a) Explain classification as a two step process.


b) State Bayes' theorem. How is this concept used in classification? [8+7]

6. What is a decision tree? Explain decision tree induction algorithm. [15]

7.a) Contrast k-means clustering with k-medoids clustering approach.


b) Discuss the merits and demerits of hierarchical approaches for clustering. [8+7]

8. How are mining techniques applied to an unstructured text database? Explain with an example. [15]

---ooOoo---
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, January/February - 2023
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Note: i) Question paper consists of Part A, Part B.
ii) Part A is compulsory, which carries 25 marks. In Part A, Answer all questions.
iii) In Part B, Answer any one question from each unit. Each question carries 10 marks
and may have a, b as sub questions.

PART – A
(25 Marks)
1.a) Define data mining. [2]
b) What is meant by outlier analysis? [3]
c) Define maximal frequent item set. [2]
d) How to compute confidence of an association rule? Give example. [3]
e) What is meant by test data? [2]
f) What is the significance of information gain? [3]
g) What is cluster analysis? [2]
h) What are the drawbacks of single linkage clustering? [3]
i) Give examples of unstructured text. [2]
j) List the applications of web usage mining. [3]

PART – B
(50 Marks)
2. Discuss the steps in knowledge discovery process and compare it with data access and
information retrieval. [10]
OR
3.a) Appraise usage of smoothing in data transformation.
b) Evaluate distance measures for dissimilarity computation. [5+5]

4. Apply the FP-Growth algorithm to the following transactional database to find frequent item
sets. [10]

TID   List of items
001   I1, I3, I5, I7
002   I1, I5, I6, I7
003   I6, I7
004   I2, I3, I6, I7
005   I8, I1, I6
006   I2, I5, I8
OR
5.a) Appraise the limitations of Apriori and suggest mechanisms to improve it.
b) Explain item merging concept for mining closed frequent item sets. [5+5]

6. State the classification problem and discuss a general approach to solving it. [10]
OR
7.a) Discuss decision tree overfitting and pruning techniques.
b) Justify the selection of k value for kNN classifier. [5+5]

8. Discuss hierarchical methods for clustering and contrast agglomerative and divisive
approaches. [10]
OR
9.a) With suitable data explain statistical based outlier detection.
b) Criticize the evaluation metrics used for clusters. [5+5]

10. Discuss the data mining tasks applicable to text databases. [10]
OR
11.a) Discover episode rules for the text given in this question paper.
b) Give a brief note on PageRank algorithm used in web structure mining. [5+5]

---ooOoo---
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, July/August - 2022
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any five questions
All questions carry equal marks
---

1.a) Discuss about challenging issues in Data Mining.


b) What is preprocessing? Explain about Data Transformation techniques. [7+8]

2.a) Explain about the Data Cleaning techniques in detail.


b) Write about Data Mining Tasks with examples. [7+8]

3. How to find all the frequent item sets using Apriori algorithm for the given data
where min-sup = 2. [15]

4.a) List out different kinds of Association Rules with an example for each.
b) Explain about maximal frequent Item set and closed frequent Item set. [7+8]

5. Describe Naïve Bayesian Classification method with an example. [15]

6.a) How to solve a classification problem using k-nearest neighbor algorithm?


b) Explain about the measures for selecting the best split. [8+7]

7.a) List out various clustering methods.


b) How to cluster the data sets using k-means clustering algorithm? [5+10]

8.a) Explain about unstructured text mining.


b) What is web content mining? Discuss in detail. [7+8]

---ooOoo---
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, February/March - 2022
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any five questions
All questions carry equal marks
---

1.a) Explain Various Data Mining Functionalities with an example.


b) Illustrate about Data Mining Task Primitives. [8+7]

2.a) What is Data Cleaning? Describe various methods of Data Cleaning.


b) Discuss about the Issues to be considered during Data Integration. [7+8]

3. Write a note on Maximal Frequent Item Set and Closed Frequent Item Set. [15]

4. Explain the Apriori algorithm for finding frequent item sets with an example. [15]

5. Discuss about Decision tree induction algorithm with an example. [15]

6. Discuss about Naïve-Bayes classification algorithm with an example. [15]

7. Write the Partitioning Around Medoids (PAM) algorithm. [15]

8. Explain about hierarchy of categories in text mining. [15]

---ooOoo---
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, March - 2021
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any Five Questions
All Questions Carry Equal Marks
--

1. a) How to handle redundancy in data integration?


b) Explain principal component analysis as a method of dimensionality reduction. [7+8]

2. How can we mine closed frequent item sets? Explain. [15]

3. Explain market basket analysis and its relevance to association rules. Explain the Apriori
algorithm using the following transactional data, assuming that the minimum support is 22%.
Illustrate with an example. [15]

TID   List of items
001   milk, dal, sugar, bread
002   dal, sugar, wheat, jam
003   milk, bread, curd, paneer
004   wheat, paneer, dal, sugar
005   milk, paneer, bread
006   wheat, dal, paneer, bread
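
A hand-worked answer to this question can be checked with a short level-wise search. The following is only a minimal sketch of the Apriori idea (join, prune, count), not a reference solution. The function name `apriori`, the lower-cased item names, and the reading of 22% as a minimum relative support (so an itemset needs at least 2 of the 6 transactions) are assumptions made for illustration.

```python
from itertools import combinations

# The six transactions from the question, with item names lower-cased.
transactions = [
    {"milk", "dal", "sugar", "bread"},
    {"dal", "sugar", "wheat", "jam"},
    {"milk", "bread", "curd", "paneer"},
    {"wheat", "paneer", "dal", "sugar"},
    {"milk", "paneer", "bread"},
    {"wheat", "dal", "paneer", "bread"},
]


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    min_count = min_support * len(transactions)

    def count(candidates):
        # Support count = number of transactions containing the candidate.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    level = {c: s for c, s in count(singles).items() if s >= min_count}

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c: s for c, s in count(candidates).items() if s >= min_count}
    return frequent


frequent = apriori(transactions, 0.22)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With this threshold, single items such as jam and curd (one transaction each) are dropped at the first level, so no candidate containing them is ever generated.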

4. Discuss the K-nearest neighbor classification algorithm and its characteristics. [15]

5. How can neural networks be used for data classification? Which algorithm is suitable?
Explain with an example. [15]

6. Explain various issues and challenges in data mining. [15]

7.a) Describe web usage mining.


b) Explain about Text Clustering with an illustrative example. [7+8]

8.a) Write and explain about the k-medoids algorithm.


b) Describe distance based outlier detection. [8+7]

--ooOoo--
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, September - 2021
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75
Answer any Five Questions
All Questions Carry Equal Marks
---

1.a) What is data mining? Discuss the challenges associated with data mining.
b) Illustrate any three measures for dissimilarity of numeric data. [8+7]

2.a) How to handle missing values in data mining process?


b) Explain the steps in principal component analysis for data reduction. [7+8]

3. Generate the strong association rules for the following transactions using the Apriori
algorithm, with minsup = 30% and minconf = 65%. [15]

Trans-id  List of items
T1        Paneer, cheese, garlic, ginger, butter
T2        Bread, butter, cheese, milk, sugar
T3        Milk, tea powder, sugar, bread
T4        Noodles, pasta, butter, cheese
T5        Paneer, peas, baby corn, butter
T6        Bread, jam, butter, eggs
T7        Bread, cheese, butter, milk
T8        Paneer, butter, eggs, sugar
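
The rule-generation half of this question (support and confidence checks) can be sketched as below. Note that this sketch does not use Apriori's candidate pruning: it simply counts every subset of every transaction, which is only practical because the transactions are tiny. Item names are lower-cased, and the variable names (`counts`, `strong_rules`) are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# The eight transactions from the question, with item names lower-cased.
transactions = [
    {"paneer", "cheese", "garlic", "ginger", "butter"},
    {"bread", "butter", "cheese", "milk", "sugar"},
    {"milk", "tea powder", "sugar", "bread"},
    {"noodles", "pasta", "butter", "cheese"},
    {"paneer", "peas", "baby corn", "butter"},
    {"bread", "jam", "butter", "eggs"},
    {"bread", "cheese", "butter", "milk"},
    {"paneer", "butter", "eggs", "sugar"},
]
min_sup, min_conf = 0.30, 0.65
n = len(transactions)

# Exhaustive support counting: every non-empty subset of each transaction.
counts = Counter()
for t in transactions:
    for k in range(1, len(t) + 1):
        for subset in combinations(sorted(t), k):
            counts[frozenset(subset)] += 1

# A rule lhs -> rhs is strong if support(lhs U rhs) >= min_sup and
# confidence = support(lhs U rhs) / support(lhs) >= min_conf.
strong_rules = []
for itemset, c in counts.items():
    if len(itemset) < 2 or c / n < min_sup:
        continue
    for k in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), k):
            lhs = frozenset(lhs)
            confidence = c / counts[lhs]
            if confidence >= min_conf:
                strong_rules.append((lhs, itemset - lhs, c / n, confidence))

for lhs, rhs, sup, conf in sorted(strong_rules, key=lambda r: sorted(r[0])):
    print(f"{sorted(lhs)} -> {sorted(rhs)}  support={sup:.3f}  confidence={conf:.2f}")
```

Confidence is asymmetric: cheese -> butter has confidence 4/4, while butter -> cheese has only 4/7 and falls below the 65% threshold.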

4. Consider the following training data set to construct a naïve Bayesian classifier and classify
the test case: Attr1=M, Attr2=Q, Attr3=? Explain the process. [15]

Attr1  Attr2  Attr3
M      B      T
M      S      T
G      Q      T
H      S      T
G      Q      T
G      Q      F
G      S      F
H      B      F
H      Q      F
M      B      F

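
The arithmetic behind this question can be sketched in a few lines. This is an illustrative check, assuming Attr3 is the class label and that probabilities are plain relative frequencies with no Laplace smoothing; the function name `naive_bayes_scores` is hypothetical.

```python
from collections import Counter

# Training rows from the question: (Attr1, Attr2, Attr3), Attr3 = class label.
rows = [
    ("M", "B", "T"), ("M", "S", "T"), ("G", "Q", "T"), ("H", "S", "T"),
    ("G", "Q", "T"), ("G", "Q", "F"), ("G", "S", "F"), ("H", "B", "F"),
    ("H", "Q", "F"), ("M", "B", "F"),
]


def naive_bayes_scores(rows, query):
    """Return P(class) * product of P(attribute value | class) per class."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for label, n_c in class_counts.items():
        score = n_c / len(rows)  # prior P(class)
        for i, value in enumerate(query):
            # Relative frequency of this attribute value within the class.
            matches = sum(1 for r in rows if r[-1] == label and r[i] == value)
            score *= matches / n_c
        scores[label] = score
    return scores


scores = naive_bayes_scores(rows, ("M", "Q"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

For the test case, the score for T is 0.5 × 2/5 × 2/5 = 0.08 and for F is 0.5 × 1/5 × 2/5 = 0.04, so the sketch predicts Attr3 = T.
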
5.a) Discuss the significance of information gain in decision tree induction.
b) Explain k-nearest neighbor algorithm with an example. [7+8]

6.a) How to evaluate clustering algorithms? Provide illustrations.


b) Explain the key issues, strengths and weaknesses of hierarchical clustering algorithms. [7+8]

7.a) Discuss the applications of web usage mining.


b) Explain web structure mining with a suitable algorithm. [7+8]

8.a) How is unstructured text converted into features in text mining?


b) Demonstrate clustering of text documents using appropriate similarity measures. [7+8]

--ooOoo--
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, October/November - 2020
DATA MINING
(Common to CSE, IT)
Time: 2 hours Max. Marks: 75
Answer any Five Questions
All Questions Carry Equal Marks
---

1. What is the need for processing of data? Explain the techniques for Data processing. [15]

2. Discuss the characteristics of Data warehousing and steps in building a data warehousing
architecture. [15]

3. How can the efficiency of Apriori Algorithm be improved? Explain mining quantitative
association rules with appropriate examples with support count = 2
T1 = {fever, cold, sore throat, running nose, difficulty breathing}
T2 = {cold, sore throat}
T3 = {cold, fever}
T4 = {difficulty breathing, fever}
T5 = {fever, cold, difficulty breathing}
T6 = {cold, fever, running nose} [15]

4.a) Explain types of association rules in data mining.


b) Describe constraint-based association mining. [7+8]

5. Explain Naïve-Bayes classification technique with an illustrative example. [15]

6. What is the need for an efficient implementation of decision tree induction? Explain
entropy, information gain and Gini index with an example on sales data. [15]

7. How efficient is the K-medoids algorithm on large data sets? Illustrate with an example. [15]

8.a) Discuss the basic measures for text retrieval.


b) Explain briefly about Web mining. [7+8]

---ooOoo---
Code No: 137BQ R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year I Semester Examinations, December - 2019
DATA MINING
(Common to CSE, IT)
Time: 3 Hours Max. Marks: 75

Note: This question paper contains two parts A and B.
Part A is compulsory which carries 25 marks. Answer all questions in Part A. Part B
consists of 5 Units. Answer any one full question from each unit. Each question carries 10
marks and may have a, b as sub questions.

PART – A
(25 Marks)

1. a) Define data mining. [2]


b) List the methods of filling missing values. [3]
c) Define closed frequent itemset. [2]
d) What is the need of confidence measure in association rule mining? [3]
e) List the measures for selecting best split in decision tree construction. [2]
f) Quote an example for Bayesian belief network. [3]
g) What are the limitations of single linkage algorithm? [2]
h) List the typical requirements of clustering in data mining. [3]
i) What is meant by stop words? [2]
j) Give the taxonomy of web mining. [3]

PART – B
(50 Marks)

2. Discuss data mining as a step in knowledge discovery process and various challenges
associated. [10]
OR
3. Use a flowchart to summarize the following procedures for attribute subset selection:
a) Stepwise forward selection
b) Stepwise backward elimination. [10]

4. Classify frequent pattern mining methods and explain the criteria followed for
classification. [10]
OR
5. Apply apriori algorithm to find frequent itemsets from the following transactional database.
Let min_sup = 30%. [10]
TID Items_bought
1 Pen, notebook, ruler
2 Pencil, eraser, sharpener
3 Pen, ruler, chart, sharpener
4 Pencil, clip, eraser
5 Ruler, pin, story book, pen
6 Marker, chart, sketchpens
6. State classification problem and briefly explain general approaches to solve it. [10]
OR
7. Apply the Naïve Bayesian classifier to identify the class label (Campus_placement) of the new
sample/student <7 to 8, ‘Fair’, ‘Excellent’, ‘No’>. [10]

SID  CGPA     Coding Skills  Soft Skills  Hackathon Participation  Campus_placement
1    7 to 8   Excellent      Fair         Yes                      Yes
2    8 to 9   Fair           Excellent    Yes                      Yes
3    9 to 10  Poor           Fair         No                       Yes
4    5 to 6   Poor           Excellent    No                       No
5    7 to 8   Excellent      Poor         No                       No
6    8 to 9   Fair           Fair         Yes                      Yes
7    9 to 10  Poor           Poor         No                       No

8. Suppose that the data mining task is to cluster the following eight students into three
clusters, and the distance function is Manhattan. Assign records 1, 2 and 3 as the initial
centroids of the three clusters respectively. Use the k-means algorithm to show the final three clusters. [10]

RecordID Height(cms) Weight(kgs)


1 145 35
2 165 55
3 170 90
4 135 60
5 140 50
6 160 75
7 150 40
8 155 65
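
The iterations for this question can be traced with the sketch below. It assumes that centroids are recomputed as per-attribute means (as in standard k-means, even though the distance is Manhattan) and that a record equidistant from two centroids goes to the lower-numbered cluster. The question does not specify a tie-breaking rule, and records 4 and 6 are both tied in the first pass, so a different rule can give different clusters. The function names are illustrative.

```python
# Records from the question: id -> (height in cm, weight in kg).
records = {
    1: (145, 35), 2: (165, 55), 3: (170, 90), 4: (135, 60),
    5: (140, 50), 6: (160, 75), 7: (150, 40), 8: (155, 65),
}


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """k-means with Manhattan distance; ties go to the lowest cluster index."""
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: min() returns the first (lowest) index on a tie.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: manhattan(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no record changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids


ids = list(records)
points = [records[i] for i in ids]
assignment, centroids = kmeans(points, [records[1], records[2], records[3]])

clusters = {j + 1: [i for i, a in zip(ids, assignment) if a == j] for j in range(3)}
print(clusters)
print(centroids)
```

Under these assumptions the assignments stop changing after the second pass, giving clusters {1, 4, 5, 7}, {2, 6, 8} and {3}.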

OR
9. Appraise the importance of outlier detection and its application. Explain any one approach
for outlier detection. [10]

10. Discuss various kinds of patterns to be mined from web/server logs in web usage mining.
[10]
OR
11. Compare and contrast text mining with web content mining using lucid examples. [10]

--ooOoo--
