
COMP1942 Question Paper

COMP1942 Exploring and Visualizing Data (Spring Semester 2022)


Online Midterm Examination (Question Paper)
Date: 30 March, 2022 (Wednesday)
Time: 10:35am-11:40am
Duration: 1 hour 5 minutes

Instructions:
(1) Guideline
(a) Please follow all instructions about the exam guideline (e.g., your face video capturing) stated in
the Canvas website.
(b) For the sake of space, we do not write them again.
(2) Question
(a) There are 2 parts in this exam, Part A (Short/Long Question) and Part B (Multiple-Choice
Question).
(b) Please answer all questions in Part A and Part B. The exam is worth 100 marks in total.
(3) Answer Sheet
(a) Please submit your answers in PDF to the Canvas website.
(b) Please use the cover page stated in the Canvas website as the first page of your PDF file. This
cover page includes your information and an agreement with your signature.
(c) Please start to write your answers starting on the second page of your PDF file.
(d) The PDF file should “clearly” show your answers without any blurred images. No marks will be
given to any “blurred” parts in the PDF file. Please make sure that the PDF file shows your
answers clearly.
(4) Online Exam
(a) This is an online exam in which you may access all online materials.
However, you are not allowed to communicate with other people (except the instructor and the tutors
in this course) in any form (including but not limited to orally, electronically and in writing)
during the entire exam period together with the pre-15-minute preparation time and the
post-15-minute buffer time.
(5) File Submission
(a) We allow a 15-minute buffer for uploading your PDF file. Remember to start uploading your file
at around 11:40am. Canvas will terminate any upload that is still in progress at 11:55am.
(6) Zero-Score Regulation
(a) If your face could not be shown in your video for at least 10 seconds in the exam period together
with the pre-15-minute preparation time and the post-15-minute buffer time, your exam score will
be set to 0 (even though you submit your PDF file in Canvas).
(b) If you do not submit the first cover page which is filled and signed completely, your exam score
will be set to 0.
(c) We only mark your latest PDF file uploaded by 11:55am. Your exam score will be set to 0 if we
could not see any PDF file uploaded by 11:55am (even though you do the question paper or you
“could” upload your PDF file after 11:55am).


Part A (Short/Long Question)


In this part, there are 2 short/long questions, namely Q1 and Q2. This part is worth 40 marks
(out of 100).

Q1 (20 Marks) (Version A)

We are given the following table containing 20 transactions and 16 items, namely a, b, … p, represented in a
binary matrix format. Please do the following parts.

TID a b c d e f g h i j k l m n o p
1 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0
2 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0
3 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
4 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
5 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
6 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0
8 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
11 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0
12 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0
13 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0
14 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
15 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0
16 1 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0
17 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
18 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
20 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1

(a) For each of the following answers, please show steps and show the answer rounded up to 2 decimal places.
(i) What is the confidence of rule “{a, c} → b”?
(ii) What is the lift ratio of rule “{a, c} → b”?
(iii) What is the support of rule “{a, c} → b”?
(b) Suppose that the support threshold is set to 3.
Apply the algorithm of FP-growth and generate all the conditional FP-trees.
You are required to draw the original FP-tree and all conditional FP-trees.
What are the frequent itemsets generated?
You do not need to give the frequency of each frequent itemset.
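
As a rough illustration of the quantities asked for in part (a), the sketch below computes the support, confidence and lift ratio of a rule on the Version A table. The transaction strings are transcribed from the binary matrix above. The helper names `count` and `rule_metrics` are hypothetical, and reporting support as a transaction count (rather than a fraction) is an assumption about the convention in use.

```python
# Transactions of Q1 (Version A), transcribed from the binary matrix:
# each string lists the items whose column holds a 1 for that TID.
TRANSACTIONS = [set(t) for t in [
    "acfh", "abcl", "fj", "bd", "acf", "abc", "bgj", "b", "ef", "ef",
    "abcm", "fk", "fm", "bg", "cn", "acfi", "co", "abc", "f", "flp",
]]

def count(itemset, transactions=TRANSACTIONS):
    """Number of transactions containing every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def rule_metrics(antecedent, consequent, transactions=TRANSACTIONS):
    """Return (support count, confidence, lift) of antecedent -> consequent."""
    n = len(transactions)
    both = count(set(antecedent) | set(consequent), transactions)
    confidence = both / count(antecedent, transactions)
    # Expected confidence: fraction of all transactions containing the consequent.
    expected_confidence = count(consequent, transactions) / n
    return both, confidence, confidence / expected_confidence

support, confidence, lift = rule_metrics("ac", "b")
print(support, round(confidence, 2), round(lift, 2))
```

The lift ratio is taken here as confidence divided by expected confidence, which matches the definition of expected confidence given in Q4.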

Q1 (20 Marks) (Version B)

We are given the following table containing 20 transactions and 16 items, namely g, h, … v, represented in a
binary matrix format. Please do the following parts.

TID g h i j k l m n o p q r s t u v
1 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0
2 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0
3 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
4 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
5 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
6 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0
8 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
11 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0
12 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0
13 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0
14 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
15 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0
16 1 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0
17 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
18 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
20 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1

(a) For each of the following answers, please show steps and show the answer rounded up to 2 decimal places.
(i) What is the confidence of rule “{g, i} → h”?
(ii) What is the lift ratio of rule “{g, i} → h”?
(iii) What is the support of rule “{g, i} → h”?
(b) Suppose that the support threshold is set to 3.
Apply the algorithm of FP-growth and generate all the conditional FP-trees.
You are required to draw the original FP-tree and all conditional FP-trees.
What are the frequent itemsets generated?
You do not need to give the frequency of each frequent itemset.
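
The first pass of FP-growth in part (b) can be sketched as follows: count the items, discard those below the support threshold, and reorder every transaction by descending item frequency before it is inserted into the FP-tree. This is a minimal sketch only. The function names are hypothetical, ties are broken alphabetically as an assumption, and the tree construction itself is not shown.

```python
from collections import Counter

# Transactions of Q1 (Version B), transcribed from the binary matrix.
TRANSACTIONS_B = [
    "giln", "ghir", "lp", "hj", "gil", "ghi", "hmp", "h", "kl", "kl",
    "ghis", "lq", "ls", "hm", "it", "gilo", "iu", "ghi", "l", "lrv",
]

def frequent_items(transactions, threshold):
    """Items whose frequency meets the threshold, most frequent first."""
    counts = Counter(item for t in transactions for item in t)
    kept = [(item, c) for item, c in counts.items() if c >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

def ordered_transactions(transactions, threshold):
    """Drop infrequent items and sort the rest by descending frequency."""
    ranked = frequent_items(transactions, threshold)
    rank = {item: i for i, (item, _) in enumerate(ranked)}
    ordered = [sorted((i for i in t if i in rank), key=rank.get)
               for t in transactions]
    return [t for t in ordered if t]  # skip transactions left empty

items = frequent_items(TRANSACTIONS_B, threshold=3)
paths = ordered_transactions(TRANSACTIONS_B, threshold=3)
print(items)
print(paths[0])
```

Each list in `paths` is one path to insert into the original FP-tree, sharing prefixes with earlier paths where possible.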


Q2 (20 Marks) (Version A)

(a) The following shows the error report for a decision tree found from a given training dataset.
Class # Cases # Errors % Error
Yes 5 2 40%
No 8 3 37.5%
Overall 13 5 38.46%

(i) Is it possible to know the confusion matrix for this decision tree according to this error report? If yes,
please give the confusion matrix. If no, please explain it and give the minimum set of additional
information so that we could give the confusion matrix.
(ii) Is it possible to know the decile-wise lift chart for this decision tree according to this error report? If
yes, please give the decile-wise lift chart and write down the height of each bar at the top of the bar in
the chart clearly (rounded up to 2 decimal places). If no, please explain it and give the minimum set
of additional information so that we could give the decile-wise lift chart.

(b) Consider the following table where the first three columns correspond to the input attributes and the fourth
column corresponds to the target attribute.
Age Education Married Insurance
young high no yes
old high yes yes
old low yes yes
old low yes yes
young low no no
young low no no
young low no no
old low no no

We want to train a CART decision tree classifier to predict whether a new customer will buy an insurance
policy or not. We define the value of attribute Insurance to be the label of a record.
(i) Please find a CART decision tree according to the above example. In the decision tree, whenever we
process (1) a node in which at least 80% of the records have the same label or (2) a node containing at
most 2 records, we stop splitting this node.
Please show all of your steps and express the numbers rounded up to 4 decimal places.
(ii) Consider an old unmarried customer with high education. Please predict whether it is likely that this
customer will buy an insurance policy or not.
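
A possible way to check the split evaluation in part (b)(i) is sketched below, assuming that CART scores a split by the reduction in Gini impurity. The table is the Version A training data above. The names `gini`, `split_gini` and `ATTRIBUTES` are hypothetical, and the stopping conditions of the question are not implemented.

```python
from collections import Counter

# Training table of Q2(b) (Version A): (Age, Education, Married, Insurance).
RECORDS = [
    ("young", "high", "no", "yes"),
    ("old", "high", "yes", "yes"),
    ("old", "low", "yes", "yes"),
    ("old", "low", "yes", "yes"),
    ("young", "low", "no", "no"),
    ("young", "low", "no", "no"),
    ("young", "low", "no", "no"),
    ("old", "low", "no", "no"),
]
ATTRIBUTES = {"Age": 0, "Education": 1, "Married": 2}

def gini(labels):
    """Gini impurity 1 - sum(p_i^2) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(records, attribute):
    """Weighted Gini impurity of the children after splitting on one attribute."""
    col = ATTRIBUTES[attribute]
    groups = {}
    for r in records:
        groups.setdefault(r[col], []).append(r[-1])
    return sum(len(g) / len(records) * gini(g) for g in groups.values())

root = gini([r[-1] for r in RECORDS])
gains = {a: root - split_gini(RECORDS, a) for a in ATTRIBUTES}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(name, round(gain, 4))
```

The attribute with the largest gain is the candidate for the root split; the same functions can be applied again to each child node that does not meet a stopping condition.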

Q2 (20 Marks) (Version B)

(a) The following shows the error report for a decision tree found from a given training dataset.
Class # Cases # Errors % Error
Yes 5 2 40%
No 8 3 37.5%
Overall 13 5 38.46%

(i) Is it possible to know the confusion matrix for this decision tree according to this error report? If yes,
please give the confusion matrix. If no, please explain it and give the minimum set of additional
information so that we could give the confusion matrix.
(ii) Is it possible to know the decile-wise lift chart for this decision tree according to this error report? If
yes, please give the decile-wise lift chart and write down the height of each bar at the top of the bar in
the chart clearly (rounded up to 2 decimal places). If no, please explain it and give the minimum set
of additional information so that we could give the decile-wise lift chart.

(b) Consider the following table where the first three columns correspond to the input attributes and the fourth
column corresponds to the target attribute.
Age Education Gender Insurance
young high male yes
old high female yes
old low female yes
old low female yes
young low male no
young low male no
young low male no
old low male no

We want to train a CART decision tree classifier to predict whether a new customer will buy an insurance
policy or not. We define the value of attribute Insurance to be the label of a record.
(i) Please find a CART decision tree according to the above example. In the decision tree, whenever we
process (1) a node in which at least 80% of the records have the same label or (2) a node containing at
most 2 records, we stop splitting this node.
Please show all of your steps and express the numbers rounded up to 4 decimal places.
(ii) Consider an old male customer with high education. Please predict whether it is likely that this
customer will buy an insurance policy or not.


Part B (Multiple-Choice Question)


In this part, there are 12 multiple-choice questions, namely Q3-Q14. This part is worth 60 marks
(out of 100). Each question is worth 5 marks (out of 100). In your answer sheet, please write down the
following table on one of the pages of your PDF submission. In the corresponding cell, write down the answer
for each question.

Note: Please write the letter clearly (i.e., A, B, C, D or E) for each answer so that it can be distinguished
from other letters easily. In the past, some students wrote letters so unclearly that they could be read as
either of two letters. One example is that the hand-written letter “B” (from some students) is similar to the
hand-written letter “E”. There are more examples which are not included here. In any case, if we judge your
letter to be unclear, 0 marks will be given to you for that question, even if you “thought” that your answer
was correct.

Part B

Question Your Answer


Q3
Q4
Q5
Q6
Q7
Q8
Q9
Q10
Q11
Q12
Q13
Q14


Q3. Which of the following statement(s) is/are true? Consider the Apriori approach.
(1) It is always true that the number of itemsets in L3 just after the counting step is smaller than the
number of itemsets in C3 just after the prune step.
(2) It is always true that the number of itemsets in C3 just after the prune step is smaller than the number
of itemsets in C3 just after the join step.
(3) It is always true that the number of itemsets in C3 just after the join step is smaller than the number
of itemsets in C2 just after the join step.

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices
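
For reference, the join and prune steps that Q3 refers to can be sketched as below. This is a minimal illustration, not the course's implementation: the function names are hypothetical and the example set `L2` is made up for demonstration. The counting step (scanning the transactions to obtain L3 from C3) is not shown.

```python
from itertools import combinations

def join_step(large_prev):
    """Join (k-1)-itemsets that agree on their first k-2 items."""
    prev = sorted(tuple(sorted(s)) for s in large_prev)
    candidates = set()
    for x, y in combinations(prev, 2):
        if x[:-1] == y[:-1]:
            candidates.add(x + (y[-1],))
    return candidates

def prune_step(candidates, large_prev):
    """Keep candidates whose (k-1)-subsets are all in the previous large set."""
    prev = {tuple(sorted(s)) for s in large_prev}
    return {c for c in candidates
            if all(sub in prev for sub in combinations(c, len(c) - 1))}

# Hypothetical large 2-itemsets, used only to exercise the two steps.
L2 = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
C3_joined = join_step(L2)
C3_pruned = prune_step(C3_joined, L2)
print(sorted(C3_joined), sorted(C3_pruned))
```

In this example the join step produces two candidates and the prune step removes the one whose subset {c, d} is not in `L2`.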

Q4. Which of the following statement(s) is/are true?


(1) The lift ratio of an association rule in the form of “X → Y” is equal to the lift ratio of an association
rule in the form of “Y → X” where X and Y are two itemsets.
(2) It is always true that the number of itemsets with frequency at least 20 is smaller than the number
of itemsets with frequency at least 10.
(3) The expected confidence of the consequent of an association rule in the form of “X → Y” (where X
and Y are two itemsets) is defined to be the total number of transactions containing Y divided by the
total number of transactions in the given table.

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices

Q5. Which of the following statement(s) is/are true?


(1) It is always true that the total number of non-root nodes in the FP-tree is smaller than or equal to
the total number of occurrences of frequent items in all transactions.
(2) It is always true that in the FP-tree, the count stored in a node N is larger than or equal to the count
stored in each of its child nodes (i.e., each of the nodes under node N).
(3) In general, the total number of non-root nodes in the FP-tree constructed according to the original
item order is smaller than or equal to the total number of non-root nodes in the FP-tree constructed
according to the reverse item order.

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices

Q6. In XLMiner, given a table T, we set some parameters and generated the following output in association
rule mining.

Which of the following statement(s) is/are true?


(1) The confidence of “B → E” is larger than the confidence of “E → B”.
(2) If we set the confidence threshold to 65% and the support threshold to 6 in XLMiner on table T, the
total number of rules generated is 4.
(3) We have enough information to say that there is no association rule with (a) confidence at least 50%
and at most 60% and (b) support at least 5.

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices

Q7. Which of the following statement(s) is/are true?
(1) Consider the original k-means method. The mean of a cluster is equal to the sum of all data points
in this cluster divided by the total number of all data points in this cluster.
(2) Compared with the original k-means method, the advantage of sequential k-means method is that
we could obtain the clustering results whenever there is a new point.
(3) Consider the forgetful sequential k-means method where parameter a is set to a real number greater
than 0 and smaller than 1. It is always true that the weight of an old data point is smaller than the
weight of a new data point.

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices
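
The two sequential variants mentioned in Q7 can be sketched as follows. The update rules used here are an assumption based on the commonly taught formulation (mean ← mean + (1/n)(x − mean) for sequential k-means, and mean ← mean + a(x − mean) for the forgetful version); the function names are hypothetical.

```python
def nearest(means, x):
    """Index of the mean closest to x (squared Euclidean distance)."""
    return min(range(len(means)),
               key=lambda i: sum((m - v) ** 2 for m, v in zip(means[i], x)))

def sequential_update(means, counts, x):
    """Move the nearest mean towards x by 1/n, n being its updated point count."""
    i = nearest(means, x)
    counts[i] += 1
    means[i] = [m + (v - m) / counts[i] for m, v in zip(means[i], x)]
    return i

def forgetful_update(means, x, a):
    """Move the nearest mean towards x by a fixed fraction a (0 < a < 1)."""
    i = nearest(means, x)
    means[i] = [m + a * (v - m) for m, v in zip(means[i], x)]
    return i

# Two hypothetical clusters, each assumed to hold one point so far.
means, counts = [[0.0, 0.0], [10.0, 10.0]], [1, 1]
sequential_update(means, counts, [2.0, 4.0])   # updates cluster 0
forgetful_update(means, [12.0, 10.0], a=0.5)   # updates cluster 1
print(means, counts)
```

Under the assumed forgetful rule, a point that arrived j updates ago contributes to the mean with weight a(1 − a)^j, so its influence decays as newer points arrive.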

Q8. Consider eight data points.


The following matrix shows the pairwise distances between any two points.
     1   2   3   4   5   6   7   8
1    0
2   11   0
3    5  13   0
4   12   2  14   0
5    7  17   1  18   0
6   13   4  15   5  20   0
7    9  15  12  16  15  19   0
8   11  20  12  21  17  22  30   0

Consider the agglomerative approach to group these points with distance group average linkage.

Which of the following statement(s) is/are true?


(1) Suppose that we want to find 3 clusters. The three clusters are {8}, {1, 3, 5, 7} and {2, 4, 6}.
(2) Suppose that we want to find 4 clusters. The total number of points in each of two clusters is equal
to each other, and the total number of points in each of the other two clusters is equal to each other
too.
(3) Suppose that we want to find 5 clusters. The largest cluster contains data point 3.

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices
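
The agglomerative procedure with group average linkage can be traced with a short sketch like the one below. The distances are the lower triangle of the matrix in Q8. The function names are hypothetical, and group average is taken to mean the average of all pairwise distances between the members of two clusters.

```python
from itertools import combinations

# Lower triangle of the pairwise distance matrix of Q8;
# row i holds d(i, 1), d(i, 2), ..., d(i, i-1).
LOWER = {
    2: [11],
    3: [5, 13],
    4: [12, 2, 14],
    5: [7, 17, 1, 18],
    6: [13, 4, 15, 5, 20],
    7: [9, 15, 12, 16, 15, 19],
    8: [11, 20, 12, 21, 17, 22, 30],
}
DIST = {frozenset((i, j + 1)): d
        for i, row in LOWER.items() for j, d in enumerate(row)}

def group_average(c1, c2):
    """Mean of all pairwise distances between members of two clusters."""
    total = sum(DIST[frozenset((p, q))] for p in c1 for q in c2)
    return total / (len(c1) * len(c2))

def agglomerate(points, k):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: group_average(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)

for k in (5, 4, 3):
    print(k, sorted(sorted(c) for c in agglomerate(range(1, 9), k)))
```

Running the loop for k = 5, 4 and 3 shows the clusterings that the three statements refer to.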

Q9. In XLMiner, given a table T, in Raymond’s PC, we set some parameters and generated the following
output in k-means clustering.

Which of the following statement(s) is/are true?
(1) Consider the two clusters in the final output. Before we perform the k-means clustering, the initial
mean of one cluster is (57.8333333, 73.5) and the initial mean of another cluster is (10.25, 11).
(2) In XLMiner’s input dialog box, we chose “Fixed Start” under category “Options”.
(3) Suppose that student “Peter” set the same parameters as shown above in his PC and generated the
output in k-means clustering. Due to the randomness of k-means clustering, it is possible that the
clustering result in this output obtained from his PC is different from the clustering result in the
output obtained from Raymond’s PC.

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices

Q10. Consider the following table T with 4 records and 5 attributes, namely X1, X2, …, X5.

Record No. X1 X2 X3 X4 X5
1 1 1 0 0 1
2 1 0 0 1 1
3 0 1 1 1 0
4 0 1 1 0 1

Which of the following statement(s) is/are true?


(1) The monothetic approach could be used for clustering because table T contains binary attributes.
(2) The Jaccard’s coefficient between Record 1 and Record 2 is equal to 0.5.
(3) The matching coefficient between Record 3 and Record 4 is equal to 0.4.

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices
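
The two similarity measures in Q10 can be checked with a small sketch such as the following. It assumes the usual definitions: Jaccard's coefficient counts 1-1 matches over the positions where at least one record has a 1, and the matching coefficient counts all agreeing positions over the total number of attributes. The function names are hypothetical.

```python
# Table T of Q10: record number -> values of X1..X5.
T = {
    1: (1, 1, 0, 0, 1),
    2: (1, 0, 0, 1, 1),
    3: (0, 1, 1, 1, 0),
    4: (0, 1, 1, 0, 1),
}

def jaccard(x, y):
    """1-1 matches divided by positions where at least one value is 1."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either

def matching(x, y):
    """Agreeing positions (1-1 and 0-0) divided by all positions."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)

print(jaccard(T[1], T[2]), matching(T[3], T[4]))
```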

Q11. Consider the following 2 matrices, namely A and B.

$A = \begin{pmatrix} 10 & 20 \\ 30 & 40 \end{pmatrix}, \quad B = \begin{pmatrix} 7 \\ 8 \end{pmatrix}$

Which of the following statement(s) is/are true?


(1) The matrix multiplication between A and B (i.e., $AB$) is equal to $\begin{pmatrix} 230 \\ 530 \end{pmatrix}$.
(2) The matrix multiplication between the transpose of A and B (i.e., $A^T B$) is equal to $\begin{pmatrix} 460 \\ 310 \end{pmatrix}$.
(3) The determinant of A is 200.

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices
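
The three quantities in Q11 can be verified with plain Python, as in the sketch below. The helper names `transpose`, `matmul` and `det2` are hypothetical, and `det2` only handles 2×2 matrices.

```python
A = [[10, 20], [30, 40]]
B = [[7], [8]]

def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def det2(m):
    """Determinant of a 2x2 matrix: ad - bc."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

print(matmul(A, B), matmul(transpose(A), B), det2(A))
```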

Q12. Which of the following statement(s) is/are true?


(1) The information gain of an attribute A used in ID3 is equal to the information gain used in C4.5
divided by SplitInfo(A).
(2) Consider a lift chart. It is always true that the value of the y-axis increases monotonically when the
x-axis value increases in the lift chart.
(3) In the decile-wise lift chart, it is possible that the height of a bar for a decile is greater than 4.0 (in
the y-axis).

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices

Q13. Which of the following statement(s) is/are true? Consider two clusters, namely A and B.
(1) It is possible that the distance between Cluster A and Cluster B under the single linkage is equal to
the distance between Cluster A and Cluster B under the complete linkage.
(2) It is possible that the distance between Cluster A and Cluster B under the median linkage is equal
to the distance between Cluster A and Cluster B under the centroid linkage.
(3) The agglomerative approach is a process of splitting the large cluster into two clusters iteratively.

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices


Q14. Given a table T containing 3 input attributes (i.e., “No. of Phones”, “Age”, “Weight”) and 1 target
attribute “Insurance”, we want to predict whether a customer will buy an insurance policy. In XLMiner,
given this table T, we set some parameters and generated the following output in the classification tree.

Which of the following statement(s) is/are true?


(1) Suppose that we have one customer with age 20 and weight 66 kg having 2 phones. According to
the above output, the decision tree generated by XLMiner predicts that this customer will buy an
insurance policy.
(2) Suppose that we have one customer with age 19 and weight 70 kg having 1 phone. According to
the above output, the decision tree generated by XLMiner predicts that this customer will buy an
insurance policy.
(3) The total number of terminal nodes is greater than the total number of decision nodes.

A. Statements (1) and (2) only
B. Statements (1) and (3) only
C. Statements (2) and (3) only
D. Statements (1), (2) and (3)
E. None of the above choices

End of Paper
