Isp565 - Its665 Feb 22

Exam Paper

Uploaded by

MASHITAH MAISARAH ZAILI

CONFIDENTIAL CS/FEB 2022/ISP565/Set 1

UNIVERSITI TEKNOLOGI MARA

FINAL EXAMINATION

COURSE          : DATA MINING
COURSE CODE     : ISP565 / ITS665
EXAMINATION     : FEBRUARY 2022
TIME            : 90 MINUTES

INSTRUCTIONS TO CANDIDATES:

1. This question paper consists of PART 2: Short Structured Questions (2 QUESTIONS).

2. Late submission is allowed, but a penalty will be imposed, subject to the lecturer's
consideration.

3. All answers must be submitted through MS Teams within the given time.

4. Please prepare your worksheet in softcopy.

5. Always keep a backup copy of your answers in your worksheet.

6. Answer ALL questions in English.

7. Save your file: CS2594X_studentID_name.pdf

DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO DO SO

This examination paper consists of 33 printed pages

© Hak Cipta Universiti Teknologi MARA CONFIDENTIAL


QUESTION 1
Table 1 shows a database of garments and sports equipment purchased by six (6)
customers. The minimum support is 50% and the minimum confidence is 80%.

Table 1
Customer ID    List of items
T1             coat, blouse, trousers
T2             coat, sari, rods, coat
T3             helmet, trousers, rods
T4             blouse, coat, sari, rods
T5             nets, rods, coat, sari
T6             rods, kurta, sari

a) Determine the frequent k-itemsets using the Apriori algorithm for k=1. Clearly show the
steps involved.
(4 marks)

b) Determine the frequent k-itemsets using the Apriori algorithm for k=2 and k=3. Clearly show
the steps involved.
(6 marks)

c) From (b), generate the possible rules for k=2 and identify the satisfied rules.
(7 marks)

d) From (b), generate the possible rules for k=3 that satisfy this rule (a → b ∩ c). Provide the
conclusion.
(4 marks)

e) Discuss TWO (2) limitations of Market Basket Analysis.

(4 marks)
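The Apriori steps asked for in parts (a) to (d) can be sketched in Python as follows. This is a minimal illustration, not a marking scheme: the function names `support_count`, `apriori` and `strong_rules` are hypothetical, each transaction is treated as a set (so the repeated "coat" in T2 counts once), and support is compared as a count against 50% of the six transactions.

```python
from itertools import combinations

# Transactions from Table 1, stored as sets (assumption: duplicates count once).
transactions = {
    "T1": {"coat", "blouse", "trousers"},
    "T2": {"coat", "sari", "rods"},
    "T3": {"helmet", "trousers", "rods"},
    "T4": {"blouse", "coat", "sari", "rods"},
    "T5": {"nets", "rods", "coat", "sari"},
    "T6": {"rods", "kurta", "sari"},
}

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)

def apriori(transactions, min_support):
    """Return {k: {frequent k-itemset: support count}}."""
    min_count = min_support * len(transactions)
    items = sorted({i for t in transactions.values() for i in t})
    current = [frozenset([i]) for i in items]   # candidate 1-itemsets
    frequent, k = {}, 1
    while current:
        counts = {c: support_count(c, transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        if not level:
            break
        frequent[k] = level
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def strong_rules(frequent, transactions, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    rules = []
    for k, level in frequent.items():
        if k < 2:
            continue
        for itemset, count in level.items():
            for r in range(1, k):
                for lhs in combinations(sorted(itemset), r):
                    lhs = frozenset(lhs)
                    conf = count / support_count(lhs, transactions)
                    if conf >= min_conf:
                        rules.append((lhs, itemset - lhs, conf))
    return rules

frequent = apriori(transactions, min_support=0.5)
rules = strong_rules(frequent, transactions, min_conf=0.8)

for k, level in frequent.items():
    print(k, {tuple(sorted(s)): n for s, n in level.items()})
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

Changing `min_support` or `min_conf` shows how sensitive the set of surviving itemsets and rules is to the two thresholds.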


QUESTION 2
a) You are given the following dissimilarity matrix between the six cities in Peninsular
Malaysia.

                  Kangar  Alor Setar  Ipoh  Rawang  Kuantan  Jerantut
                  (K)     (A)         (I)   (R)     (N)      (J)
Kangar (K)        0
Alor Setar (A)    58      0
Ipoh (I)          290     240         0
Rawang (R)        462     407         173   0
Kuantan (N)       706     651         417   241     0
Jerantut (J)      649     600         360   185     164      0

i) Perform complete link hierarchical clustering. Show the first two iterations and draw
its dendrogram(s).
(8 marks)

ii) Continue to show the remaining iterations and draw its dendrogram(s). Conclude
the number of cluster(s) if the threshold is 170.

(6 marks)
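The complete link procedure in part (a) can be checked with a short Python sketch. It is a naive illustration under stated assumptions: the helper names `d` and `complete_link` are hypothetical, cluster distance is the maximum pairwise dissimilarity (the complete link definition), and the threshold of 170 is interpreted as cutting the dendrogram at that height, so only merges at or below 170 are kept.

```python
from itertools import combinations

# Lower-triangle dissimilarities between the six cities, keyed by label pair.
dist = {
    ("A", "K"): 58,
    ("I", "K"): 290, ("I", "A"): 240,
    ("R", "K"): 462, ("R", "A"): 407, ("R", "I"): 173,
    ("N", "K"): 706, ("N", "A"): 651, ("N", "I"): 417, ("N", "R"): 241,
    ("J", "K"): 649, ("J", "A"): 600, ("J", "I"): 360, ("J", "R"): 185,
    ("J", "N"): 164,
}

def d(a, b):
    """Symmetric lookup of the dissimilarity between two cities."""
    return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

def complete_link(labels):
    """Agglomerative clustering; returns a list of (height, cluster_x, cluster_y)."""
    clusters = [frozenset([label]) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for x, y in combinations(clusters, 2):
            # Complete link: distance between clusters is the farthest pair.
            height = max(d(a, b) for a in x for b in y)
            if best is None or height < best[0]:
                best = (height, x, y)
        height, x, y = best
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
        merges.append((height, sorted(x), sorted(y)))
    return merges

merges = complete_link(["K", "A", "I", "R", "N", "J"])
for height, x, y in merges:
    print(f"merge {x} + {y} at height {height}")

# Assumed interpretation of the threshold: keep merges with height <= 170.
threshold = 170
n_clusters = 6 - sum(1 for height, _, _ in merges if height <= threshold)
print("clusters at threshold", threshold, "=", n_clusters)
```

The merge heights printed by the loop are the heights at which the branches of the dendrogram join.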

b) Given three objects represented by A(10, 4, 2, 8), B(3, 12, 1, 6) and C(15, 4, 1, 7).

i) Compute the Euclidean distance between the three objects.

(6 marks)

ii) Construct the dissimilarity matrix for the objects. Which objects are the most similar
to each other? Justify why.
(5 marks)
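The Euclidean distances in part (b) can be verified with a few lines of Python. This sketch relies on the standard library function `math.dist` (available from Python 3.8); the variable names are illustrative only.

```python
import math
from itertools import combinations

# The three objects from part (b).
points = {
    "A": (10, 4, 2, 8),
    "B": (3, 12, 1, 6),
    "C": (15, 4, 1, 7),
}

# Euclidean distance: square root of the sum of squared coordinate differences.
distances = {
    (p, q): math.dist(points[p], points[q])
    for p, q in combinations(points, 2)
}

for (p, q), value in distances.items():
    print(f"d({p}, {q}) = {value:.3f}")

# The most similar pair is the one with the smallest distance.
closest = min(distances, key=distances.get)
print("most similar pair:", closest)
```

The dictionary `distances` holds the entries of the lower triangle of the dissimilarity matrix; the diagonal entries are zero.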

END OF QUESTION PAPER


FINAL ASSESSMENT (FEBRUARY 2022)
ISP565 DATA MINING (PART 1) (DR SHUZLINA)
Please read and fill up the STUDENT'S DECLARATION FORM before answering this question.
This question paper is for the first part of the Final Assessment. Answer ALL questions.
20 MULTIPLE CHOICE QUESTIONS
DATE: 14 FEBRUARY 2022
TIME : 9:00AM-10:30AM [90 MINUTES]

Please confirm everything before the submission.

The form allows only ONE response per person.
This is an individual assessment, DO NOT share questions or answers with others.

* Required

* This form will record your name, please fill your name.

1. What are the tasks of Data Mining?

(2 Points)

O All of the options

O Prediction and categorization

O Cluster analysis and summarization

O Anomaly and Deviation analysis

O Association and Market basket analysis

2. Searching for patterns in uncategorized data refers to ______.

(2 Points)

O None of the options

O Supervised learning

O Reinforcement learning

O Unsupervised learning

O Hybrid learning

3. Which of the following options is the correct combination of the pre-processing
tasks in the knowledge discovery process? *
(2 Points)

O Data transformation, Data cleaning, Data selection, Modeling

O None of the options

O Data integration, Data transformation, Data cleaning, Clustering

O Data cleaning, Data selection, Classification, Data transformation

O Data selection, Data cleaning, Data transformation, Data integration

4. In predicting the cases of COVID-19, the final number of total patients can be
considered as the ______. *
(2 Points)

O None of the options

O Outcome

O Attribute

O Observation

O Features

5. The efficiency of data mining algorithms is based on execution time, meanwhile,
the scalability of data mining algorithms refers to the increment of attributes or
instances. The statement is best described as ______ of the
algorithms.

(2 Points)

O performance

O none of the options

O robustness

O high dimensionality

O diverse data types

6. Which of the following statements is FALSE? *

(2 Points)

O The output of a data mining task can be in the form of patterns, trends or rules that are implicit in the
data.

O Data mining is an Artificial Intelligence powered tool that can discover useful information from human
that can then be used to improve actions.

O Data mining can discover the anomalies that might be significant in a particular business.

O All of the options.

O Data mining can best be described as business intelligence (BI) technology that has various techniques
to extract comprehensible, hidden and useful information from a population of data.

7. "Children over the age of 2 whose Body Mass Index (BMI) is less than the 5th
percentile are considered underweight". This statement refers to ______.

(2 Points)

O none of the options

O wisdom

O information

O knowledge

O data

8. The following are the problems in mining huge amounts of data except

(2 Points)

O Integration - the ability to merge different sources of data type.

O Ethics - Proliferation of security and privacy concerns by organizations.

O Time and speed - time taken to achieve a certain level of accuracy or for evaluation.

O None of the options

O Cost of the learning set - related to sample size for training and cost to achieve good accuracy.

9. Which one of the following correctly refers to the task of classification?

(2 Points)

O Classification is a learning process without training and testing.

O Partitioning of a set of objects into several classes.

O Segmenting the objects into several subgroups.

O None of the options

O Determining a set of instances into categories.

10. Which of the following is NOT a data mining task?

(2 Points)

O Classifying the propensity of the COVID-19 patient to have prolonged COVID-19 or not.

O Forecasting the air traffic levels based on existing routes.

O Analyzing the relationship of Facebook users towards a particular product.

O Dividing the customers of a company according to their location.

O None of the options

11. Cross-validation in model evaluation is

I. a statistical method used to estimate the performance (or accuracy) of the
models.

II. where a given data set is randomly partitioned into two independent sets.

III. a procedure that has a single parameter called k that refers to the number of
groups that a given data sample is to be split into. As such, the procedure is often
called k-fold cross-validation.

IV. used to protect against overfitting in a predictive model, particularly in a case
where the amount of data may be limited.

(3 Points)

O None of the options

O I, III, and IV

O All of the options

O I and II

O I, II, and III
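
The k-fold procedure described in statement III can be illustrated with a small Python sketch. It is a hedged example, not part of the question: the generator name `k_fold_indices` and the sample size of 10 are assumptions, and the sketch only produces index splits, leaving model training and scoring out.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # random partitioning
    folds = [indices[i::k] for i in range(k)]      # k roughly equal groups
    for i in range(k):
        test_idx = folds[i]                        # one fold held out for testing
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=10, k=5))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={sorted(train_idx)} test={sorted(test_idx)}")
```

Each sample is used for testing exactly once and for training k-1 times, which is why the averaged score is a more stable performance estimate than a single train/test split.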

12. Ensemble Methods are used to increase the accuracy of the classifier's model.
Which of these is NOT an ensemble method? *
(3 Points)

O Combining a set of heterogeneous classifiers.

O Averaging the prediction over a collection of classifiers.

O Weighted vote with a collection of classifiers.

O None of the options

O Reduces the size of a collection of classifiers.

The confusion matrix for multiple classes shown in Table 1 is
for questions 13 to 15.

13. Which of the following statements is FALSE?

(3 Points)

Table 1: Confusion matrix of air quality

                            Good air quality (Predicted)    Poor air quality (Predicted)
Good air quality (Actual)   109                             11
Poor air quality (Actual)   8                               112

O 109 "good" instances are correctly classified and 8 "poor" instances are correctly classified as "poor".

O 11 "good" instances are incorrectly classified as "poor" and 109 instances are correctly classified as
"good".

O 8 "poor" instances are wrongly classified as "good" and 11 "good" instances are wrongly classified as
"poor".

O None of the options

O 112 instances are correctly classified as "poor" and 8 "poor" instances are incorrectly classified as
"good".

14. Calculate the correct values of sensitivity and precision.

(3 Points)

O Sensitivity = 93.33%, precision = 90.83%.

O Sensitivity = 90.83%, precision = 93.16%.

O Sensitivity = 93.16%, precision = 91.06%.

O None of the options

O Sensitivity = 91.06%, precision = 93.16%.

15. The following statements are TRUE except

(3 Points)

O None of the options

O True Negative value is 112.

O The number of actual Poor instances is 123.

O The total number of instances is 240.

O The accuracy of the model is 92.08%.
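
The measures used in questions 13 to 15 can be computed from the air quality confusion matrix with the sketch below. It assumes that "Good air quality" is the positive class, which the paper does not state explicitly; under the opposite assumption the roles of the four cells swap.

```python
# Cells of the air quality confusion matrix.
# Assumption: "Good air quality" is treated as the positive class.
tp = 109   # actual good, predicted good
fn = 11    # actual good, predicted poor
fp = 8     # actual poor, predicted good
tn = 112   # actual poor, predicted poor

total = tp + fn + fp + tn
sensitivity = tp / (tp + fn)        # share of actual positives that were found
precision = tp / (tp + fp)          # share of predicted positives that were right
accuracy = (tp + tn) / total        # share of all instances classified correctly

print(f"total instances = {total}")
print(f"actual poor     = {fp + tn}")
print(f"sensitivity     = {sensitivity:.2%}")
print(f"precision       = {precision:.2%}")
print(f"accuracy        = {accuracy:.2%}")
```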

Questions 16-20 are based on the paragraphs below
(Paragraphs 1 - 4).

[Paragraph 1]

PUTRAJAYA: The Health Ministry has identified 22 new COVID-19 clusters nationwide, of which 16
are workplace-related. Out of the 16 workplace clusters, eight are linked to factories in Selangor,
Negeri Sembilan, Melaka, Kedah and Johor. One of them is the Industri Waja 2 cluster, which
involves workers of a factory at Kawasan Perusahaan Telok Panglima Garang in Kuala Langat,
Selangor.

[Paragraph 2]

A total of 249 individuals were involved in a targeted screening exercise which managed to detect
29 cases. There were also four new clusters linked to construction sites in Kuala Lumpur and
Selangor. One of the clusters is related to workers of a construction site at Lingkaran Eco Majestic
in Beranang, Hulu Langat. A total of 279 individuals had undertaken a targeted screening exercise,
of which 24 have been confirmed as COVID-19 positive.

[Paragraph 3]

The other workplace clusters are related to employees of supermarkets and a public institute.
There were also six clusters classified as community outbreaks. This includes the Dah Sawi cluster
in Kulim, Kedah, which is traced to house visits by several individuals.

[Paragraph 4]

The index case is a 50-year-old man who tested positive on June 26, after developing symptoms
on June 21. Some 35 people linked to the cluster have been screened, and 27 have so far tested
positive. Nationwide, there are now 880 active clusters.

Source: https://www.thestar.com.my/news/nation/2021/06/28/covid-19-16-out-of-22-new-clusters-related-to-workplaces

16. Consider exact matching to identify the correct pair for the following document-term
matrix for the first and second paragraphs in the news above. *
(3 Points)

O COVID-19 (1,1); clusters (2,4); workers (0,1); Selangor (2,2).

O COVID-19 (1,1); clusters (2,2); workers (1,1); Selangor (2,1).

O None of the options

O COVID-19 (1,1); clusters (3,2); workers (1,0); Selangor (1,1).

O COVID-19 (1,1); clusters (2,3); workers (1,1); Selangor (2,1).
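
Exact-match term counting for a document-term matrix can be sketched as below. The tokenisation rule is an assumption, since the paper does not define it: text is split on whitespace, surrounding punctuation is stripped, and matching is case-sensitive, so "cluster" and "clusters" are different terms. The function name `term_counts` is hypothetical.

```python
import string

PARAGRAPH_1 = (
    "PUTRAJAYA: The Health Ministry has identified 22 new COVID-19 clusters "
    "nationwide, of which 16 are workplace-related. Out of the 16 workplace "
    "clusters, eight are linked to factories in Selangor, Negeri Sembilan, "
    "Melaka, Kedah and Johor. One of them is the Industri Waja 2 cluster, "
    "which involves workers of a factory at Kawasan Perusahaan Telok Panglima "
    "Garang in Kuala Langat, Selangor."
)

PARAGRAPH_2 = (
    "A total of 249 individuals were involved in a targeted screening exercise "
    "which managed to detect 29 cases. There were also four new clusters linked "
    "to construction sites in Kuala Lumpur and Selangor. One of the clusters is "
    "related to workers of a construction site at Lingkaran Eco Majestic in "
    "Beranang, Hulu Langat. A total of 279 individuals had undertaken a targeted "
    "screening exercise, of which 24 have been confirmed as COVID-19 positive."
)

def term_counts(text, terms):
    """Count exact, case-sensitive matches of each term among the tokens."""
    tokens = [token.strip(string.punctuation) for token in text.split()]
    return {term: tokens.count(term) for term in terms}

terms = ["COVID-19", "clusters", "workers", "Selangor"]
counts_1 = term_counts(PARAGRAPH_1, terms)
counts_2 = term_counts(PARAGRAPH_2, terms)

# One row per term: (count in Paragraph 1, count in Paragraph 2).
matrix = {term: (counts_1[term], counts_2[term]) for term in terms}
for term, pair in matrix.items():
    print(f"{term}: {pair}")
```

The same function can be reused for any other pair of paragraphs and term list by changing the inputs.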

17. Consider exact matching to identify the correct pair for the following document-term
matrix for the second and fourth paragraphs in the news above.

(3 Points)

O screening (2,0); symptoms (0,1); positive (1,2).

O screening (3,0); symptoms (0,1); positive (1,2).

O screening (2,0); symptoms (0,1); positive (1,1).

O None of the options

O screening (0,2); symptoms (1,0); positive (2,1).

18. Compute the similarity between these 2 documents (Paragraph 1 and Paragraph 2)
using Cosine similarity.

(3 Points)

O The similarity between document 1 and 2 is 0.955.

O The similarity between document 1 and 2 is 8.

O The similarity between document 1 and 2 is 2.646.

O The similarity between document 1 and 2 is 3.162.

O None of the options

19. Compute the similarity between these 2 documents (Paragraph 2 and Paragraph 4)
using Cosine similarity. *
(3 Points)

O The similarity between documents 2 and 4 is 0.399.

O None of the options

O The similarity between documents 2 and 4 is 4.970.

O The similarity between documents 2 and 4 is 2.236.

O The similarity between documents 2 and 4 is 2.
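
Cosine similarity between two term-count vectors is the dot product divided by the product of the vector lengths. The sketch below is illustrative: the function name `cosine_similarity` is hypothetical, and the count vectors are assumptions obtained by counting the listed terms with exact, case-sensitive matching in the paragraphs above.

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|) for two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Assumed counts for the terms (COVID-19, clusters, workers, Selangor).
doc1 = (1, 2, 1, 2)   # Paragraph 1
doc2 = (1, 2, 1, 1)   # Paragraph 2

# Assumed counts for the terms (screening, symptoms, positive).
doc2_b = (2, 0, 1)    # Paragraph 2
doc4_b = (0, 1, 2)    # Paragraph 4

sim_12 = cosine_similarity(doc1, doc2)
sim_24 = cosine_similarity(doc2_b, doc4_b)

print(f"cosine(Paragraph 1, Paragraph 2) = {sim_12:.3f}")
print(f"cosine(Paragraph 2, Paragraph 4) = {sim_24:.3f}")
```

A value close to 1 means the two vectors point in nearly the same direction, so the pair with the larger cosine is the more similar one. Small differences in the third decimal place can arise when intermediate square roots are rounded by hand.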

20. Conclude the similarity result for the documents based on the cosine similarity
that has been calculated above. *
(3 Points)

O Document 2 and Document 4 are the most similar.

O Document 1 and Document 2 are the most similar.

O None of the options

O Document (1,2) is dissimilar with documents (2,4).

O Document (1,2) is similar with documents (2,4).

Student's Details

Matric Number *

Name *

Group *

O CS259 4A

O CS259 4B
