Assignment 2 Slot8 TTS3208 Summer
Department of CSE
Academic Year: 2021-2022
Assignment – II
Task Allotment
List of Tasks
Task No | Question | Course Outcome | Knowledge Level
1. A. Suppose that the values for a given set of data are grouped into intervals.
The intervals and corresponding frequencies are as follows.
age frequency
1–5 200
5–15 450
15–20 300
20–50 1500
50–80 700
80–110 44
Using the data for age given, plot an equal-width histogram of width 10.
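The given intervals have unequal widths, so an equal-width histogram needs the frequencies redistributed first. The sketch below is one possible approach, not a prescribed method: it assumes values are spread uniformly inside each original interval and that bins start at 0 (both are assumptions, as is the helper name `equal_width_bins`), then prints a rough text histogram.

```python
# Grouped data from the task: (lower bound, upper bound, frequency).
intervals = [(1, 5, 200), (5, 15, 450), (15, 20, 300),
             (20, 50, 1500), (50, 80, 700), (80, 110, 44)]


def equal_width_bins(intervals, width=10, start=0, end=110):
    """Redistribute grouped frequencies into equal-width bins.

    Assumption: values are uniformly distributed inside each original interval.
    """
    bins = {(lo, lo + width): 0.0 for lo in range(start, end, width)}
    for lo, hi, freq in intervals:
        density = freq / (hi - lo)  # frequency per unit of age
        for b_lo, b_hi in bins:
            overlap = min(hi, b_hi) - max(lo, b_lo)
            if overlap > 0:
                bins[(b_lo, b_hi)] += density * overlap
    return bins


bins = equal_width_bins(intervals)
for (lo, hi), count in bins.items():
    # One '#' per 25 observations, purely for display.
    print(f"{lo:>3}-{hi:<3} {count:8.1f} " + "#" * round(count / 25))
```

The bar heights depend on the uniformity assumption; a different assumption about how values fall inside each interval would give different bars.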
(CO3 K2, CO4 K3, CO5 K2)
B. The following table shows the midterm and final exam grades obtained for
students in a database course.
X 2 6 7 10 9 10 13 12 34 55 90 20
Y 25 65 75 59 35 25 55 18 19 20 29 10
B. A database has five transactions. Let min_sup = 50% and min_conf = 60%.
(CO3 K2, CO4 K3, CO5 K2)
TID items bought
T100 {F, I, S, H, E, R}
T200 {G, O, A, T, S}
T300 {D, O, V, E, S}
T400 {L, I, O, N, E, S, S}
T500 {S, U, G, A, R, C, A, N, E}
Explore all frequent itemsets using FP-growth and find the strong association
rules.
C. Discuss a case study on the impact of data mining on weather forecasting.
3. A. Suppose a hospital tested the age and body fat data for randomly selected
adults with the following result:
age 23 23 27 27 39 41 47 49 50
%fat 9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2
B. A database has five transactions. Let min_sup = 60% and min_conf = 80%.
(CO3 K2, CO4 K3, CO5 K2)
TID items bought
T100 {M, O, N, K, E, Y}
T200 {D, O, N, K, E, Y}
T300 {M, A, K, E}
T400 {M, U, C, K, Y}
T500 {C, O, O, K, I, E}
Explore all frequent itemsets using FP-growth and find the strong association
rules.
C. Discuss a case study on impacts of data mining in HR.
4. A. Suppose that the data for analysis includes the attribute age. The age values
for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22,
25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) Compute the mean and median of the data.
(b) Compute the mode of the data.
(c) Compute the midrange of the data.
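The three parts above can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative, and `multimode` (Python 3.8+) is used because the data may have more than one mode.

```python
from statistics import mean, median, multimode

# Age values from the task, already in increasing order.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

age_mean = mean(ages)                         # arithmetic mean
age_median = median(ages)                     # middle value of the sorted data
age_modes = multimode(ages)                   # every value with the highest count
age_midrange = (min(ages) + max(ages)) / 2    # average of smallest and largest

print(f"mean={age_mean:.2f} median={age_median} "
      f"modes={age_modes} midrange={age_midrange}")
```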
B. A database has five transactions. Let min_sup = 60% and min_conf = 80%.
(CO3 K2, CO4 K3, CO5 K2)
TID items bought
T100 {M, O, N, K, E, Y}
T200 {D, O, N, K, E, Y}
T300 {M, A, K, E}
T400 {M, U, C, K, Y}
T500 {C, O, O, K, I, E}
(a) Find all frequent itemsets using Apriori.
(b) List all of the strong association rules (with support s and confidence c)
matching the following metarule, where X is a variable representing customers,
and item_i denotes variables representing items (e.g., “A”, “B”, etc.):
(a) Suppose that the association rule “hot dogs=> hamburgers” is mined. Given
a minimum support threshold of 25% and a minimum confidence threshold of
50%, is this association rule strong?
(b) Based on the given data, is the purchase of hot dogs independent of the
purchase of hamburgers? If not, what kind of correlation relationship exists
between the two?
(a) Suppose that the association rule “hot dogs=> hamburgers” is mined. Given
a minimum support threshold of 25% and a minimum confidence threshold of
50%, is this association rule strong?
(b) Based on the given data, is the purchase of hot dogs independent of the
purchase of hamburgers? If not, what kind of correlation relationship exists
between the two?
8. A. Suppose a hospital tested the age and body fat data for 9 randomly
selected adults with the following result:
(CO3 K2, CO4 K3, CO5 K2)
age 52 54 54 56 57 58 58 60 61
%fat 34.6 42.5 28.8 33.4 30.2 34.1 32.9 41.2 35.7
                Icecream   not Icecream   Σ row
Chocolate       3,500      2,300          5,800
not Chocolate   2,500        700          3,200
Σ col           6,000      3,000          9,000
(a) Suppose that the association rule “Icecream=> Chocolate” is mined. Given
a minimum support threshold of 25% and a minimum confidence threshold of
50%, is this association rule strong?
(b) Based on the given data, is the purchase of Icecream independent of the
purchase of Chocolate? If not, what kind of correlation relationship exists
between the two?
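As a rough check of both parts, support, confidence, and lift can be computed directly from the contingency table. The sketch below reads the table as: 3,500 transactions contain both items, 6,000 contain Icecream, 5,800 contain Chocolate, out of 9,000 in total. Lift is used here as the correlation measure, which is one common choice and an assumption about what the question intends.

```python
# Counts read from the contingency table.
n_both = 3500        # Icecream and Chocolate together
n_icecream = 6000    # column total for Icecream
n_chocolate = 5800   # row total for Chocolate
n_total = 9000       # all transactions

min_sup, min_conf = 0.25, 0.50

# Rule: Icecream => Chocolate
support = n_both / n_total
confidence = n_both / n_icecream
is_strong = support >= min_sup and confidence >= min_conf

# lift = P(A and B) / (P(A) * P(B)); 1 means independent,
# below 1 negative correlation, above 1 positive correlation.
lift = support / ((n_icecream / n_total) * (n_chocolate / n_total))

print(f"support={support:.3f} confidence={confidence:.3f} "
      f"strong={is_strong} lift={lift:.3f}")
```

A rule can pass both thresholds and still have a lift below 1, which is why part (b) is asked separately from part (a).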
B. The following table shows the midterm and final exam grades obtained for
students in a database course.
(CO3 K2, CO4 K3, CO5 K2)
(a) Plot the data. Do x and y seem to have a linear relationship?
(b) Use the method of least squares to find an equation for the prediction of a
student’s final exam grade based on the student’s midterm grade in the course.
X Mid 72 50 81 74 94 86 59 83 65 33 88 81
Y Final 84 63 77 78 90 75 49 79 77 52 74 90
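For part (b), the least-squares line y = w0 + w1·x has slope w1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept w0 = ȳ − w1·x̄. A minimal sketch in plain Python follows; the variable names and the example midterm grade of 86 are illustrative assumptions.

```python
# Midterm (x) and final (y) grades from the table.
x = [72, 50, 81, 74, 94, 86, 59, 83, 65, 33, 88, 81]
y = [84, 63, 77, 78, 90, 75, 49, 79, 77, 52, 74, 90]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: covariance-style sum over the squared deviations of x.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"final = {intercept:.3f} + {slope:.4f} * midterm")

# Example prediction for a hypothetical midterm grade of 86.
predicted = intercept + slope * 86
print(f"predicted final for midterm 86: {predicted:.1f}")
```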
B. A database has five transactions. Let min_sup = 50% and min_conf = 60%.
TID items bought
T100 {F, I, S, H, E, R}
T200 {G, O, A, T, S}
T300 {D, O, V, E, S}
T400 {L, I, O, N, E, S, S}
T500 {S, U, G, A, R, C, A, N, E}
(a) Find all frequent itemsets using Apriori and find the strong
association rules
B. A database has five transactions. Let min_sup = 60% and min_conf = 80%.
TID items bought
T100 {M, O, N, K, E, Y}
T200 {D, O, N, K, E, Y}
T300 {M, A, K, E}
T400 {M, U, C, K, Y}
T500 {C, O, O, K, I, E}
Explore all frequent itemsets using Apriori and find the strong association rules
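A compact way to check a hand-worked answer is a small level-wise Apriori implementation. The sketch below is illustrative rather than optimized; it treats each transaction as a set (so the repeated O in T500 counts once), and the function names `apriori` and `strong_rules` are assumptions.

```python
import math
from itertools import combinations

transactions = [
    {"M", "O", "N", "K", "E", "Y"},   # T100
    {"D", "O", "N", "K", "E", "Y"},   # T200
    {"M", "A", "K", "E"},             # T300
    {"M", "U", "C", "K", "Y"},        # T400
    {"C", "O", "K", "I", "E"},        # T500
]


def apriori(transactions, min_sup):
    """Return {itemset: support count} for every frequent itemset."""
    min_count = math.ceil(min_sup * len(transactions) - 1e-9)

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    singles = {frozenset([i]) for t in transactions for i in t}
    level = {s: count(s) for s in singles if count(s) >= min_count}
    frequent, k = {}, 1
    while level:
        frequent.update(level)
        k += 1
        # Join: merge frequent (k-1)-itemsets that together form a k-itemset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        counted = {c: count(c) for c in candidates}
        level = {c: n for c, n in counted.items() if n >= min_count}
    return frequent


def strong_rules(frequent, n_transactions, min_conf):
    """Return (lhs, rhs, support, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = n / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, n / n_transactions, conf))
    return rules


frequent = apriori(transactions, min_sup=0.60)
rules = strong_rules(frequent, len(transactions), min_conf=0.80)
for lhs, rhs, sup, conf in rules:
    print(f"{sorted(lhs)} => {sorted(rhs)}  s={sup:.0%} c={conf:.0%}")
```

Swapping in the other transaction table and thresholds only requires changing `transactions`, `min_sup`, and `min_conf`.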
B. A random sample of 435 people were surveyed and each person was asked
to report the highest education level they obtained. Is gender independent
of education level? The data that resulted from the survey is summarized in
the following table:
Female 55 65 60 46 226
Male 45 44 60 60 209
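The independence question is usually answered with a chi-square test on the observed counts. The sketch below is a plain-Python version under two assumptions: the first four numbers in each row are counts for four education levels (the column headers are not shown in the table above) with the fifth being the row total, and the test uses a 0.05 significance level, whose critical value for 3 degrees of freedom is 7.815.

```python
# Observed counts: rows are Female, Male; columns are the four education levels.
observed = [[55, 65, 60, 46],
            [45, 44, 60, 60]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 7.815  # chi-square, 3 degrees of freedom, alpha = 0.05 (assumed level)

reject_independence = chi2 > critical_value
print(f"chi2={chi2:.3f} dof={dof} reject independence: {reject_independence}")
```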
B. Using the data set for predicting borrowers who will default on loan
payments, classify the following borrowers using Bayes’ rule.
1. { Homeowner=yes, marital status=single, Annual Income=300k}
2. { Homeowner=yes, marital status=married, Annual Income=50k}
C. Cluster the following ten points (with (x, y) representing locations) into three
clusters: A1(10, 9), A2(4, 9), A3(6, 6), A4(9, 7), A5(6, 5), A6(9, 4), A7(1, 9), A8(5, 9),
A9(9, 2), A10(10, 2), using the k-means algorithm.
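The task does not state the initial cluster centers, so the sketch below assumes A1, A2, and A3 as starting centroids; that choice is hypothetical, and a different one may lead to different clusters. It uses Euclidean distance and stops when assignments no longer change.

```python
points = {
    "A1": (10, 9), "A2": (4, 9), "A3": (6, 6), "A4": (9, 7), "A5": (6, 5),
    "A6": (9, 4), "A7": (1, 9), "A8": (5, 9), "A9": (9, 2), "A10": (10, 2),
}


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, centroids, max_iter=100):
    """Plain k-means; returns ({point name: cluster index}, final centroids)."""
    centroids = list(centroids)
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = {
            name: min(range(len(centroids)),
                      key=lambda i: sq_dist(p, centroids[i]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [points[n] for n, c in assignment.items() if c == i]
            if members:
                centroids[i] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return assignment, centroids


# Assumed initial centroids: A1, A2, A3.
assignment, centroids = kmeans(
    points, [points["A1"], points["A2"], points["A3"]])
clusters = [sorted(n for n, c in assignment.items() if c == i) for i in range(3)]
print(clusters)
print(centroids)
```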
16. A. Suppose a hospital tested the age and body fat data for randomly selected
adults with the following result:
age 23 23 27 27 39 41 47 49 50
%fat 9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2
B. Consider a car theft dataset with the attributes Color, Type, and Origin;
the class label, Stolen, can be either yes or no.
Example Color Type Origin Stolen?
1 Red Sports Domestic Yes
2 Red Sports Domestic No
3 Red Sports Domestic Yes
4 Yellow Sports Domestic No
5 Yellow Sports Imported Yes
6 Yellow SUV Imported No
7 Yellow SUV Imported Yes
8 Yellow SUV Domestic No
9 Red SUV Imported No
10 Red Sports Imported Yes
11 Blue Sports Imported Yes
12 Green SUV Domestic No
Classify the cars using Bayes’ rule based on the following features
1. {Blue Domestic Sports}
2. {Red Imported SUV}
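A naive Bayes classifier for this table multiplies the class prior by the per-attribute conditional probabilities and picks the larger score. The sketch below uses raw relative frequencies without Laplace smoothing, which is an assumption; with smoothing, the zero count for Blue among non-stolen cars would not force that score to zero. The function name `classify` is illustrative.

```python
from collections import Counter

# (Color, Type, Origin, Stolen?) for examples 1..12 in the table.
rows = [
    ("Red", "Sports", "Domestic", "Yes"), ("Red", "Sports", "Domestic", "No"),
    ("Red", "Sports", "Domestic", "Yes"), ("Yellow", "Sports", "Domestic", "No"),
    ("Yellow", "Sports", "Imported", "Yes"), ("Yellow", "SUV", "Imported", "No"),
    ("Yellow", "SUV", "Imported", "Yes"), ("Yellow", "SUV", "Domestic", "No"),
    ("Red", "SUV", "Imported", "No"), ("Red", "Sports", "Imported", "Yes"),
    ("Blue", "Sports", "Imported", "Yes"), ("Green", "SUV", "Domestic", "No"),
]


def classify(color, car_type, origin):
    """Return (predicted label, {label: unnormalized posterior score})."""
    label_counts = Counter(r[3] for r in rows)
    scores = {}
    for label, n_label in label_counts.items():
        score = n_label / len(rows)  # prior P(label)
        for idx, value in enumerate((color, car_type, origin)):
            matches = sum(1 for r in rows if r[3] == label and r[idx] == value)
            score *= matches / n_label  # P(attribute = value | label), no smoothing
        scores[label] = score
    return max(scores, key=scores.get), scores


label_1, scores_1 = classify("Blue", "Sports", "Domestic")
label_2, scores_2 = classify("Red", "SUV", "Imported")
print(label_1, scores_1)
print(label_2, scores_2)
```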
17. A. Use the two methods below to normalize the following group of data: 150,
350, 400, 650, 850, 900, 1000
(a) min-max normalization by setting min = 0 and max = 1
(b) z-score normalization
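Both normalizations are one-line formulas: min-max maps v to (v − min) / (max − min) for a target range of [0, 1], and z-score maps v to (v − mean) / standard deviation. The sketch below uses the population standard deviation (`pstdev`), which is an assumption; using the sample standard deviation (`stdev`) would give slightly smaller magnitudes.

```python
from statistics import mean, pstdev

data = [150, 350, 400, 650, 850, 900, 1000]

# (a) min-max normalization onto [0, 1]
lo, hi = min(data), max(data)
min_max = [(v - lo) / (hi - lo) for v in data]

# (b) z-score normalization (population standard deviation assumed)
mu = mean(data)
sigma = pstdev(data)
z_scores = [(v - mu) / sigma for v in data]

for v, m, z in zip(data, min_max, z_scores):
    print(f"{v:>5}  min-max={m:.4f}  z={z:+.4f}")
```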
B. A random sample of 435 people were surveyed and each person was asked
to report the highest education level they obtained. Is gender independent of
education level? The data that resulted from the survey is summarized in the
following table:
Female 55 65 60 46 226
Male 45 44 60 60 209