Isp565 - Its665 Feb 22

Exam Paper

CONFIDENTIAL CS/FEB 2022/ISP565/Set 1

UNIVERSITI TEKNOLOGI MARA


FINAL EXAMINATION

COURSE DATA MINING


COURSE CODE ISP565 / ITS665
EXAMINATION FEBRUARY 2022
TIME 90 MINUTES

INSTRUCTIONS TO CANDIDATES:

1. This question paper consists of PART 2: Short Structured Questions (2 QUESTIONS).

2. Late submission is allowed, but a penalty will be applied, subject to the lecturer's
consideration.

3. All answers need to be submitted through the MS Teams within the given time.

4. Please prepare your worksheet in softcopy.

5. Always keep a backup copy of your answers in your worksheet.

6. Answer ALL questions in English.

7. Save your file: CS2594X_studentID_name.pdf

DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO DO SO


This examination paper consists of 33 printed pages

© Hak Cipta Universiti Teknologi MARA CONFIDENTIAL


QUESTION 1
Table 1 shows a database of garments and sports equipment purchased by six (6)
customers. The minimum support is 50% and the minimum confidence is 80%.

Table 1
Customer ID List of items
T1 coat, blouse, trousers
T2 coat, sari, rods, coat
T3 helmet, trousers, rods
T4 blouse, coat, sari, rods
T5 nets, rods, coat, sari
T6 rods, kurta, sari

a) Determine the frequent k-itemsets using the Apriori algorithm for k=1. Clearly show the
steps involved.
(4 marks)

b) Determine the frequent k-itemsets using the Apriori algorithm for k=2 and k=3. Clearly show
the steps involved.
(6 marks)

c) From (b), generate the possible rules for k=2 and identify the satisfied rules.
(7 marks)

d) From (b), generate the possible rules for k=3 that satisfy this rule (a → b ∩ c). Provide the
conclusion.
(4 marks)

e) Discuss TWO (2) limitations of Market Basket Analysis.


(4 marks)
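
The support and confidence calculations that Question 1 relies on can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a model answer: the function names `support_count`, `apriori`, and `confidence` are introduced here for illustration only, and each basket is treated as a set, so the repeated "coat" in T2 is counted once.

```python
from itertools import combinations

# Transactions copied from Table 1. Each basket is a set, so the duplicate
# "coat" in T2 collapses to a single item (an assumption of this sketch).
transactions = [
    {"coat", "blouse", "trousers"},
    {"coat", "sari", "rods"},
    {"helmet", "trousers", "rods"},
    {"blouse", "coat", "sari", "rods"},
    {"nets", "rods", "coat", "sari"},
    {"rods", "kurta", "sari"},
]

def support_count(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

def apriori(baskets, min_support):
    """Return {k: {itemset: count}} for all frequent k-itemsets."""
    min_count = min_support * len(baskets)
    items = sorted(set().union(*baskets))
    candidates = [frozenset([item]) for item in items]
    frequent, k = {}, 1
    while candidates:
        counts = {c: support_count(c, baskets) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        if not level:
            break
        frequent[k] = level
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))]
        k += 1
    return frequent

def confidence(antecedent, consequent, baskets):
    """confidence(A -> B) = support(A and B together) / support(A)."""
    both = support_count(antecedent | consequent, baskets)
    return both / support_count(antecedent, baskets)

frequent = apriori(transactions, min_support=0.5)
for k, level in frequent.items():
    for itemset, count in level.items():
        print(k, sorted(itemset), count)
```

A rule such as sari → rods can then be checked against the 80% threshold with `confidence(frozenset({"sari"}), frozenset({"rods"}), transactions)`.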

QUESTION 2
a) You are given the following dissimilarity matrix between the six cities in Peninsular
Malaysia.

                 Kangar (K)  Alor Setar (A)  Ipoh (I)  Rawang (R)  Kuantan (N)  Jerantut (J)
Kangar (K)            0
Alor Setar (A)       58            0
Ipoh (I)            290          240             0
Rawang (R)          462          407           173         0
Kuantan (N)         706          651           417       241           0
Jerantut (J)        649          600           360       185         164            0

i) Perform complete link hierarchical clustering. Show the first two iterations and draw
its dendrogram(s).
(8 marks)

ii) Continue to show the remaining iterations and draw its dendrogram(s). Conclude
the number of cluster(s) if the threshold is 170.

(6 marks)
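
The complete-link procedure asked for in part (a) can be sketched as below. This is an illustrative sketch rather than a marking scheme: the helper names `complete_link`, `agglomerate`, and `clusters_at` are assumptions introduced here, and the distances are copied from the matrix above.

```python
from itertools import combinations

# Lower-triangle distances copied from the dissimilarity matrix in part (a).
labels = ["K", "A", "I", "R", "N", "J"]
lower = [
    [0],
    [58, 0],
    [290, 240, 0],
    [462, 407, 173, 0],
    [706, 651, 417, 241, 0],
    [649, 600, 360, 185, 164, 0],
]
dist = {
    frozenset([labels[i], labels[j]]): lower[i][j]
    for i in range(len(labels)) for j in range(i)
}

def complete_link(a, b):
    """Complete link: the largest pairwise distance between two clusters."""
    return max(dist[frozenset([x, y])] for x in a for y in b)

def agglomerate(points):
    """Merge the two closest clusters repeatedly; return (a, b, height) per merge."""
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: complete_link(*pair))
        merges.append((sorted(a), sorted(b), complete_link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

def clusters_at(merges, n_points, threshold):
    """Clusters left if only merges at or below `threshold` are kept."""
    return n_points - sum(1 for _, _, height in merges if height <= threshold)

merges = agglomerate(labels)
for a, b, height in merges:
    print(a, "+", b, "at", height)
```

Each printed line corresponds to one iteration, and the merge heights give the vertical positions of the joins in the dendrogram. Calling `clusters_at(merges, len(labels), 170)` applies the threshold from part (ii).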

b) Given three objects represented by A(10, 4, 2, 8), B(3, 12, 1, 6) and C(15, 4, 1, 7).

i) Compute the Euclidean distance between the three objects.


(6 marks)

ii) Construct the dissimilarity matrix for the objects. Which objects are the most similar
to each other? Justify your answer.
(5 marks)
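
The Euclidean distance and dissimilarity matrix in part (b) can be computed with a short sketch like the one below. The names `euclidean` and `dissimilarity_matrix` are assumptions chosen for illustration; the coordinates are those given for A, B and C.

```python
import math
from itertools import combinations

# Object coordinates copied from part (b).
objects = {"A": (10, 4, 2, 8), "B": (3, 12, 1, 6), "C": (15, 4, 1, 7)}

def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def dissimilarity_matrix(points):
    """Pairwise distances keyed by (name, name) for each unordered pair."""
    return {
        (m, n): euclidean(points[m], points[n])
        for m, n in combinations(list(points), 2)
    }

matrix = dissimilarity_matrix(objects)
closest_pair = min(matrix, key=matrix.get)  # smallest distance = most similar

for pair, d in matrix.items():
    print(pair, round(d, 3))
print("most similar:", closest_pair)
```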

END OF QUESTION PAPER


FINAL ASSESSMENT (FEBRUARY 2022)
ISP565 DATA MINING (PART 1) (DR SHUZLINA)
Please read and fill in the STUDENT'S DECLARATION FORM before answering this question.
This question paper is for the first part of the Final Assessment. Answer ALL questions.
20 MULTIPLE CHOICE QUESTIONS
DATE: 14 FEBRUARY 2022
TIME : 9:00AM-10:30AM [90 MINUTES]

Please confirm everything before the submission.


The form allows only ONE response per person.
This is an individual assessment, DO NOT share questions or answers with others.

* Required

* This form will record your name, please fill your name.

1

What are the tasks of Data Mining?

(2 Points)

O All of the options

O Prediction and categorization

O Cluster analysis and summarization

O Anomaly and Deviation analysis

O Association and Market basket analysis


2

Searching for patterns in uncategorized data refers to ______.

(2 Points)

O None of the options

O Supervised learning

O Reinforcement learning

O Unsupervised learning

O Hybrid learning

3

Which of the following options is the correct combination of the pre-processing
tasks in the knowledge discovery process? *
(2 Points)

O Data transformation, Data cleaning, Data selection, Modeling

O None of the options

O Data integration, Data transformation, Data cleaning, Clustering

O Data cleaning, Data selection, Classification, Data transformation

O Data selection, Data cleaning, Data transformation, Data integration

4

In predicting the cases of COVID-19, the final number of total patients can be
considered as the ______. *
(2 Points)

O None of the options

O Outcome

O Attribute

O Observation

O Features

5

The efficiency of data mining algorithms is based on execution time, while
the scalability of data mining algorithms refers to the increment of attributes or
instances. The statement is best described as the ______ of the algorithms.

(2 Points)

O performance

O none of the options

O robustness

O high dimensionality

O diverse data types


6

Which of the following statements is FALSE? *


(2 Points)

O The output of a data mining task can be in the form of patterns, trends or rules that are implicit in the
data.

O Data mining is an Artificial Intelligence powered tool that can discover useful information from human
that can then be used to improve actions.

O Data mining can discover the anomalies that might be significant in a particular business.

O All of the options.

O Data mining can best be described as business intelligence (BI) technology that has various techniques
to extract comprehensible, hidden and useful information from a population of data.

7

"Children over the age of 2 whose Body Mass Index (BMI) is less than the 5th
percentile are considered underweight". This statement refers to ______.

(2 Points)

O none of the options

O wisdom

O information

O knowledge

O data

8

The following are problems in mining huge amounts of data, except

(2 Points)

O Integration - the ability to merge different sources of data type.

O Ethics - Proliferation of security and privacy concerns by organizations.

O Time and speed - time taken to achieve a certain level of accuracy or for evaluation.

O None of the options

O Cost of the learning set - related to sample size for training and cost to achieve good accuracy.

9

Which one of the following correctly refers to the task of classification?

(2 Points)

O Classification is a learning process without training and testing.

O Partitioning of a set of objects into several classes.

O Segmenting the objects into several subgroups.

O None of the options

O Determining a set of instances into categories.

10

Which of the following is NOT a data mining task?

(2 Points)

O Classifying the propensity of the COVID-19 patient to have prolonged COVID-19 or not.

O Forecasting the air traffic levels based on existing routes.

O Analyzing the relationship of Facebook users towards a particular product.

O Dividing the customers of a company according to their location.

O None of the options

11

Cross-validation in model evaluation is

I. a statistical method used to estimate the performance (or accuracy) of the
models.

II. where a given data is randomly partitioned into two independent sets.

III. a procedure that has a single parameter called k that refers to the number of
groups that a given data sample is to be split into. As such, the procedure is often
called k-fold cross-validation.

IV. used to protect against overfitting in a predictive model, particularly in a case
where the amount of data may be limited.

(3 Points)

O None of the options

O I, III, and IV

O All of the options

O I and II

O I, II, and III
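
The k-fold procedure described in statement III can be illustrated with a small index-splitting sketch. The function name `k_fold_splits` and the fixed seed are assumptions made for this example; the sketch only shows how the sample is partitioned into k groups, with each group used once as the test set, and does not train any model.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # random partitioning
    folds = [indices[i::k] for i in range(k)]    # k groups of near-equal size
    for i, test in enumerate(folds):
        # Training data is every fold except the one held out for testing.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_splits(n_samples=10, k=5))
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

The performance estimate is then typically the average of the k scores obtained by fitting on each `train` split and evaluating on the matching `test` split.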


12

Ensemble Methods are used to increase the accuracy of the classifier's model.
Which of these is NOT an ensemble method? *
(3 Points)

O Combining a set of heterogeneous classifiers.

O Averaging the prediction over a collection of classifiers.

O Weighted vote with a collection of classifiers.

O None of the options

O Reduces the size of a collection of classifiers.

The confusion matrix for multiple classes shown in Table 1 is used
for questions 13 to 15.

13

Which of the following statements is FALSE?

(3 Points)

Table 1: Confusion matrix of air quality

                            Good air quality (Predicted)   Poor air quality (Predicted)
Good air quality (Actual)   109                            11
Poor air quality (Actual)   8                              112

O 109 "good" instances are correctly classified and 8 "poor" instances are correctly classified as "poor".

O 11 "good" instances are incorrectly classified as "poor" and 109 instances are correctly classified as
"good".

O 8 "poor" instances are wrongly classified as "good" and 11 "good" instances are wrongly classified as
"poor".

O None of the options

O 112 instances are correctly classified as "poor" and 8 "poor" instances are incorrectly classified as
"good".

14

Calculate the correct values of sensitivity and precision.

(3 Points)

O Sensitivity = 93.33%, precision = 90.83%.

O Sensitivity = 90.83%, precision = 93.16%.

O Sensitivity = 93.16%, precision = 91.06%.

O None of the options

O Sensitivity = 91.06%, precision = 93.16%.


15

The following statements are TRUE except

(3 Points)

O None of the options

O True Negative value is 112.

O The number of actual Poor instances is 123.

O The total number of instances is 240.

O The accuracy of the model is 92.08%.
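
The measures referred to in questions 13 to 15 can be derived from the four cells of Table 1 as sketched below. The function name `binary_metrics` is an assumption introduced here, and the sketch assumes that "Good air quality" is the positive class, which the paper does not state explicitly; swapping the classes changes the sensitivity and precision values.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard confusion-matrix measures for a two-class problem."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),   # recall of the positive class
        "precision": tp / (tp + fp),     # correct share of positive predictions
        "specificity": tn / (tn + fp),   # recall of the negative class
        "accuracy": (tp + tn) / total,
    }

# Cell counts from Table 1, ASSUMING "Good air quality" is the positive class:
# TP = good predicted good, FN = good predicted poor,
# FP = poor predicted good, TN = poor predicted poor.
metrics = binary_metrics(tp=109, fn=11, fp=8, tn=112)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```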

Questions 16-20 are based on the paragraphs below (Paragraphs 1 - 4).

[Paragraph 1]

PUTRAJAYA: The Health Ministry has identified 22 new COVID-19 clusters nationwide, of which 16
are workplace-related. Out of the 16 workplace clusters, eight are linked to factories in Selangor,
Negeri Sembilan, Melaka, Kedah and Johor. One of them is the Industri Waja 2 cluster, which
involves workers of a factory at Kawasan Perusahaan Telok Panglima Garang in Kuala Langat,
Selangor.

[Paragraph 2]

A total of 249 individuals were involved in a targeted screening exercise which managed to detect
29 cases. There were also four new clusters linked to construction sites in Kuala Lumpur and
Selangor. One of the clusters is related to workers of a construction site at Lingkaran Eco Majestic
in Beranang, Hulu Langat. A total of 279 individuals had undertaken a targeted screening exercise,
of which 24 have been confirmed as COVID-19 positive.

[Paragraph 3]

The other workplace clusters are related to employees of supermarkets and a public institute.
There were also six clusters classified as community outbreaks. This includes the Dah Sawi cluster
in Kulim, Kedah, which is traced to house visits by several individuals.

[Paragraph 4]

The index case is a 50-year-old man who tested positive on June 26, after developing symptoms
on June 21. Some 35 people linked to the cluster have been screened, and 27 have so far tested
positive. Nationwide, there are now 880 active clusters.

Source: https://www.thestar.com.my/news/nation/2021/06/28/covid-19-16-out-of-22-new-clusters-related-to-workplaces

16

Using exact matching, identify the correct document-term matrix entries for the first
and second paragraphs of the news article above. *
(3 Points)

O COVID-19 (1,1); clusters (2,4); workers (0,1); Selangor (2,2).

O COVID-19 (1,1); clusters (2,2); workers (1,1); Selangor (2,1).

O None of the options

O COVID-19 (1,1); clusters (3,2); workers (1,0); Selangor (1,1).

O COVID-19 (1,1); clusters (2,3); workers (1,1); Selangor (2,1).

17

Using exact matching, identify the correct document-term matrix entries for the second
and fourth paragraphs of the news article above.

(3 Points)

O screening (2,0); symptoms (0,1); positive (1,2).

O screening (3,0); symptoms (0,1); positive (1,2).

O screening (2,0); symptoms (0,1); positive (1,1).

O None of the options

O screening (0,2); symptoms (1,0); positive (2,1).
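
Building a document-term matrix by exact matching, as questions 16 and 17 require, can be sketched as follows. The function name `term_counts`, the tokenizer pattern, and the two sample sentences are all hypothetical and are not taken from the news paragraphs; "exact matching" is assumed here to mean case-sensitive whole-token equality, so "cluster" and "clusters" are counted as different terms.

```python
import re

def term_counts(text, terms):
    """Count exact, case-sensitive whole-token matches of each term."""
    # Tokens are runs of letters, digits and hyphens (keeps "COVID-19" whole).
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    return [tokens.count(term) for term in terms]

# Hypothetical documents, used only to illustrate the counting.
docs = [
    "Two clusters were reported. Both clusters involve workers.",
    "One cluster was reported among workers.",
]
terms = ["clusters", "workers", "cluster"]

# One row per document, one column per term.
matrix = [term_counts(doc, terms) for doc in docs]
for term, column in zip(terms, zip(*matrix)):
    print(term, column)
```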

18

Compute the similarity between these two documents (Paragraph 1 and Paragraph 2)
using cosine similarity.

(3 Points)

O The similarity between document 1 and 2 is 0.955.

O The similarity between document 1 and 2 is 8.

O The similarity between document 1 and 2 is 2.646.

O The similarity between document 1 and 2 is 3.162.

O None of the options

19

Compute the similarity between these two documents (Paragraph 2 and Paragraph 4)
using cosine similarity. *
(3 Points)

O The similarity between documents 2 and 4 is 0.399.

O None of the options

O The similarity between documents 2 and 4 is 4.970.

O The similarity between documents 2 and 4 is 2.236.

O The similarity between documents 2 and 4 is 2.

20

Conclude the similarity result for the documents based on the cosine similarity
that has been calculated above. *
(3 Points)

O Document 2 and Document 4 are the most similar.

O Document 1 and Document 2 are the most similar.

O None of the options

O Document (1,2) is dissimilar with documents (2,4).

O Document (1,2) is similar with documents (2,4).
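
The cosine similarity used in questions 18 to 20 is the dot product of two term-count vectors divided by the product of their lengths. A minimal sketch is given below; the function name `cosine_similarity` and the example vectors are hypothetical and are not the counts from the news paragraphs.

```python
import math

def cosine_similarity(u, v):
    """dot(u, v) / (|u| * |v|) for two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical term-count vectors for two documents.
doc_x = [2, 1, 0]
doc_y = [0, 1, 1]

print(round(cosine_similarity(doc_x, doc_y), 3))
```

For non-negative count vectors the result lies between 0 and 1, so a value closer to 1 indicates that two documents use the chosen terms in more similar proportions.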

Student's Details

21

Matric Number *

22

Name *

23

Group *

O CS259 4A

O CS259 4B
