Isp565 - Its665 Feb 22
INSTRUCTIONS TO CANDIDATES:
2. Late submission is allowed, but a penalty will be imposed, subject to the lecturer's
consideration.
3. All answers must be submitted through MS Teams within the given time.
QUESTION 1
Table 1 shows a database listing the garments and sports equipment purchased by six (6)
customers. The minimum support is 50% and the minimum confidence is 80%.
Table 1
Customer ID    List of items
T1             coat, blouse, trousers
T2             coat, sari, rods, coat
T3             helmet, trousers, rods
T4             blouse, coat, sari, rods
T5             nets, rods, coat, sari
T6             rods, kurta, sari
a) Determine the frequent k-itemsets using the Apriori algorithm for k=1. Clearly show the
steps involved.
(4 marks)
b) Determine the frequent k-itemsets using the Apriori algorithm for k=2 and k=3. Clearly show
the steps involved.
(6 marks)
c) From (b), generate the possible rules for k=2 and identify the rules that satisfy the minimum confidence.
(7 marks)
d) From (b), generate the possible rules for k=3 that follow the form (a → b ∩ c). Provide the
conclusion.
(4 marks)
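The level-wise procedure that Question 1 asks for can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a model answer: the helper names (`count`, `support`, `apriori`, `rules`) are hypothetical, each transaction is treated as a set (so the repeated "coat" in T2 counts once), and confidence is taken as count(antecedent and consequent together) divided by count(antecedent).

```python
from fractions import Fraction
from itertools import combinations

# Transactions from Table 1; sets drop the repeated "coat" in T2 (assumption).
TRANSACTIONS = [
    {"coat", "blouse", "trousers"},
    {"coat", "sari", "rods"},
    {"helmet", "trousers", "rods"},
    {"blouse", "coat", "sari", "rods"},
    {"nets", "rods", "coat", "sari"},
    {"rods", "kurta", "sari"},
]
MIN_SUPPORT = Fraction(50, 100)     # 50%, from the question
MIN_CONFIDENCE = Fraction(80, 100)  # 80%, from the question


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def support(itemset):
    """Support as an exact fraction of all transactions."""
    return Fraction(count(itemset), len(TRANSACTIONS))


def apriori(max_k):
    """Return {k: {itemset: support}} for the frequent k-itemsets up to max_k."""
    candidates = {frozenset([item]) for t in TRANSACTIONS for item in t}
    frequent = {}
    for k in range(1, max_k + 1):
        level = {c: support(c) for c in candidates if support(c) >= MIN_SUPPORT}
        if not level:
            break
        frequent[k] = level
        # Join: unite pairs of frequent k-itemsets that give a (k+1)-itemset.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        }
    return frequent


def rules(itemset):
    """Yield (antecedent, consequent, confidence) for every split of itemset."""
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            yield lhs, itemset - lhs, Fraction(count(itemset), count(lhs))


if __name__ == "__main__":
    for k, level in apriori(3).items():
        for itemset, sup in level.items():
            print(k, sorted(itemset), f"support={float(sup):.2f}")
            for lhs, rhs, conf in rules(itemset):
                verdict = "kept" if conf >= MIN_CONFIDENCE else "rejected"
                print("   ", sorted(lhs), "->", sorted(rhs),
                      f"confidence={float(conf):.2f}", verdict)
```

Exact fractions are used instead of floats so that a rule sitting exactly on the 80% threshold is not rejected by rounding.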
QUESTION 2
a) You are given the following dissimilarity matrix between the six cities in Peninsular
Malaysia.
i) Perform complete link hierarchical clustering. Show the first two iterations and draw
its dendrogram(s).
(8 marks)
ii) Continue with the remaining iterations and draw the dendrogram(s). State
the number of cluster(s) if the threshold is 170.
(6 marks)
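The complete-link procedure in part (a) can be sketched as below. The dissimilarity matrix for the six cities is not reproduced in this text, so the sketch uses four hypothetical points (P, Q, R, S) with made-up distances; only the threshold of 170 comes from the question. The function name `complete_link` is likewise an assumption for illustration.

```python
from itertools import combinations

# Hypothetical pairwise dissimilarities; NOT the matrix from the exam.
HYPOTHETICAL_DIST = {
    frozenset(("P", "Q")): 50,
    frozenset(("P", "R")): 200,
    frozenset(("P", "S")): 220,
    frozenset(("Q", "R")): 180,
    frozenset(("Q", "S")): 210,
    frozenset(("R", "S")): 90,
}


def complete_link(labels, dist, threshold):
    """Agglomerative clustering with complete linkage.

    Merging stops once the closest pair of clusters is farther apart
    than threshold. Returns (clusters, merges), where merges records
    (cluster_a, cluster_b, merge_height) in the order performed.
    """
    clusters = [frozenset([label]) for label in labels]
    merges = []

    def linkage(a, b):
        # Complete link: distance between clusters is the LARGEST
        # pairwise distance between their members.
        return max(dist[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        height = linkage(a, b)
        if height > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b), height))
    return clusters, merges


clusters, merges = complete_link(["P", "Q", "R", "S"], HYPOTHETICAL_DIST, 170)
for a, b, height in merges:
    print(f"merge {a} + {b} at height {height}")
print("clusters at threshold 170:", [sorted(c) for c in clusters])
```

The recorded merge heights are the values that would be drawn on the vertical axis of the dendrogram; cutting at the threshold leaves the clusters that were not merged below it.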
b) Given three objects represented by A(10, 4, 2, 8), B(3, 12, 1, 6) and C(15, 4, 1, 7).
ii) Construct the dissimilarity matrix for the objects. Which objects are the most similar
to each other? Justify your answer.
(5 marks)
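A dissimilarity matrix for the three objects can be computed as in the sketch below. The distance measure is not stated in the surviving text of this question, so Euclidean distance is assumed here; the function name `dissimilarity_matrix` is also hypothetical.

```python
import math
from itertools import combinations

# Objects from part (b) of the question.
OBJECTS = {
    "A": (10, 4, 2, 8),
    "B": (3, 12, 1, 6),
    "C": (15, 4, 1, 7),
}


def dissimilarity_matrix(objects):
    """Map each unordered pair of labels to its Euclidean distance (assumed measure)."""
    return {
        (p, q): math.dist(objects[p], objects[q])
        for p, q in combinations(sorted(objects), 2)
    }


matrix = dissimilarity_matrix(OBJECTS)
for (p, q), d in matrix.items():
    print(f"d({p}, {q}) = {d:.3f}")

# The most similar pair is the one with the smallest dissimilarity.
closest = min(matrix, key=matrix.get)
print("most similar pair:", closest)
```

Only the lower (or upper) triangle is stored because the matrix is symmetric and its diagonal is zero.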
(2 Points)
(2 Points)
O Supervised learning
O Reinforcement learning
O Unsupervised learning
O Hybrid learning
2/14/2022
4
In predicting the cases of COVID-19, the final number of total patients can be
considered as the ______. *
(2 Points)
O Outcome
O Attribute
O Observation
O Features
(2 Points)
O performance
O robustness
O high dimensionality
O The output of a data mining task can be in the form of patterns, trends or rules that are implicit in the
data.
O Data mining is an Artificial Intelligence powered tool that can discover useful information from human
that can then be used to improve actions.
O Data mining can discover the anomalies that might be significant in a particular business.
O Data mining can best be described as business intelligence (BI) technology that has various techniques
to extract comprehensible, hidden and useful information from a population of data.
"Children over age of 2 whose Body Mass Index (BMI) is less than the 5th
percentile are considered underweight". This statement refers to
(2 Points)
O wisdom
O information
O knowledge
O data
8
The following are problems in mining a huge amount of data, except:
(2 Points)
O Time and speed - time taken to achieve a certain level of accuracy or for evaluation.
O Cost of the learning set - related to sample size for training and cost to achieve good accuracy.
Which one of the following correctly refers to the task of the classification?
(2 Points)
10
(2 Points)
O Classifying the propensity of the COVID-19 patient to have prolonged COVID-19 or not.
11
II. where a given data is randomly partitioned into two independent sets.
III. a procedure that has a single parameter called k that refers to the number of
groups that a given data sample is to be split into. As such, the procedure is often
called k-fold cross-validation.
(3 Points)
O I, III, and IV
O I and II
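The k-fold procedure described in statement III can be sketched as follows. This is an illustrative outline only: the function name `k_fold_indices` and the optional `seed` parameter are assumptions, and real projects would typically rely on a library routine instead.

```python
import random


def k_fold_indices(n_samples, k, seed=None):
    """Yield (train_indices, test_indices) for each of the k folds.

    Every sample lands in exactly one test fold. If seed is given,
    the indices are shuffled first; otherwise the original order is kept.
    """
    indices = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for fold, (train, test) in enumerate(k_fold_indices(10, 3), start=1):
    print(f"fold {fold}: train={train} test={test}")
```

Each fold serves once as the test set while the remaining k-1 folds form the training set, which is what distinguishes this from the single random split in statement II.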
Ensemble Methods are used to increase the accuracy of the classifier's model.
Which of these is NOT an ensemble method? *
(3 Points)
The confusion matrix for multiple classes shown in Table 1 is
for questions 13 to 15.
13
(3 Points)
O 109 "good" instances are correctly classified and 8 "poor" instances are correctly classified as "poor".
O 11 "good" instances are incorrectly classified as "poor" and 109 instances are correctly classified as
"good".
O 8 "poor" instances are wrongly classified as "good" and 11 "good" instances are wrongly classified as
"poor".
O 112 instances are correctly classified as "poor" and 8 "poor" instances are incorrectly classified as
"good".
14
(3 Points)
(3 Points)
Questions 16-20 are based on the paragraphs below
(Paragraphs 1 - 4).
[Paragraph 1]
PUTRAJAYA: The Health Ministry has identified 22 new COVID-19 clusters nationwide, of which 16
are workplace-related. Out of the 16 workplace clusters, eight are linked to factories in Selangor,
Negeri Sembilan, Melaka, Kedah and Johor. One of them is the Industri Waja 2 cluster, which
involves workers of a factory at Kawasan Perusahaan Telok Panglima Garang in Kuala Langat,
Selangor.
[Paragraph 2]
A total of 249 individuals were involved in a targeted screening exercise which managed to detect
29 cases. There were also four new clusters linked to construction sites in Kuala Lumpur and
Selangor. One of the clusters is related to workers of a construction site at Lingkaran Eco Majestic
in Beranang, Hulu Langat. A total of 279 individuals had undertaken a targeted screening exercise,
of which 24 have been confirmed as COVID-19 positive.
[Paragraph 3]
The other workplace clusters are related to employees of supermarkets and a public institute.
There were also six clusters classified as community outbreaks. This includes the Dah Sawi cluster
in Kulim, Kedah, which is traced to house visits by several individuals.
[Paragraph 4]
The index case is a 50-year-old man who tested positive on June 26, after developing symptoms
on June 21. Some 35 people linked to the cluster have been screened, and 27 have so far tested
positive. Nationwide, there are now 880 active clusters.
Source: https://www.thestar.com.my/news/nation/2021/06/28/covid-19-16-out-of-22-new-clusters-related-to-workplaces
16
Using exact matching, identify the correct pair for the following document-term
matrix for the first and second paragraphs of the news above. *
(3 Points)
17
Using exact matching, identify the correct pair for the following document-term
matrix for the second and fourth paragraphs of the news above.
(3 Points)
18
(3 Points)
19
20
State your conclusion about the similarity of the documents based on the cosine similarity
that has been calculated above. *
(3 Points)
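Questions 16 to 20 rely on building a document-term matrix by exact matching and then comparing documents with cosine similarity. The sketch below illustrates both steps on two short hypothetical sentences, not on the exam paragraphs or the terms chosen in the original matrices (which are not reproduced in this text). The function names and the whitespace tokenisation are assumptions.

```python
import math
from collections import Counter

# Hypothetical documents for illustration only.
DOCS = [
    "new clusters linked to factories",
    "new clusters linked to construction sites",
]


def term_vector(text):
    """Count exact (case-folded, whitespace-separated) term occurrences."""
    return Counter(text.lower().split())


def document_term_matrix(docs):
    """Return (vocabulary, rows) with one row of term counts per document."""
    vectors = [term_vector(d) for d in docs]
    vocab = sorted(set().union(*vectors))
    return vocab, [[v[t] for t in vocab] for v in vectors]


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocab, rows = document_term_matrix(DOCS)
print(vocab)
for row in rows:
    print(row)
print(f"cosine similarity = {cosine_similarity(rows[0], rows[1]):.4f}")
```

A value close to 1 indicates that the two documents share most of their terms in similar proportions, while a value close to 0 indicates little overlap.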
Student's Details
21
Matric Number *
22
Name *
23
Group *
O CS259 4A
O CS259 4B