Exam 2019/2020
Duration: 1h 30m - Biannual Exam of Information Retrieval & Data Mining - Academic year: 2019/2020
By Dr. B. LOUNNAS
1. Cite some differences between exact-match retrieval and best-match retrieval. (1pt)
2. One of the preprocessing steps in any retrieval model is stemming. What is the benefit of this
task? (0.5pt)
3. CRISP-DM defines and validates a data mining process that is applicable in any application
domain. Briefly, what are the objectives of each step? (2pt)
4. What are the steps of building a classifier? (1pt)
5. What is the purpose of calculating idf in the vector space model? (1pt)
6. In the Boolean model we encountered a problem called “Feast or Famine”. What is this
problem? (0.5pt)
You are given the tf values of three (3) documents from a collection of 806791 documents, and
the tf and idf values of four (4) terms:
Compute the two top scoring documents on the query “best car insurance” for each of the following
SMART notation schemes:
o nnn.ltc
o ntc.nnn
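As an aid for checking hand calculations, the two schemes can be sketched in Python. This is a minimal sketch under stated assumptions: the SMART notation is read as document-scheme.query-scheme, "l" is taken as 1 + log10(tf), "t" multiplies by the given idf, and "c" is cosine normalization. The tf and idf numbers below are hypothetical placeholders, not the values from the exam table, and the names weight_vector and score are illustrative only.

```python
import math

def weight_vector(tf, idf, scheme):
    """Weight a tf dictionary with a three-letter SMART scheme such as 'ltc'."""
    tf_mode, df_mode, norm_mode = scheme
    weights = {}
    for term, freq in tf.items():
        if tf_mode == "l":              # logarithmic tf (base 10 assumed)
            w = 1 + math.log10(freq) if freq > 0 else 0.0
        else:                           # "n": natural (raw) tf
            w = float(freq)
        if df_mode == "t":              # multiply by idf; "n" leaves w unchanged
            w *= idf[term]
        weights[term] = w
    if norm_mode == "c":                # cosine (unit-length) normalization
        length = math.sqrt(sum(w * w for w in weights.values()))
        if length > 0:
            weights = {t: w / length for t, w in weights.items()}
    return weights

def score(doc_tf, query_tf, idf, notation):
    """Dot product of document and query vectors under 'ddd.qqq' notation."""
    doc_scheme, query_scheme = notation.split(".")
    d = weight_vector(doc_tf, idf, doc_scheme)
    q = weight_vector(query_tf, idf, query_scheme)
    return sum(d.get(term, 0.0) * q[term] for term in q)

# Hypothetical placeholder values; substitute the numbers from the exam table.
idf = {"car": 2.0, "auto": 3.0, "insurance": 2.0, "best": 1.0}
docs = {
    "Doc1": {"car": 5, "auto": 1, "insurance": 0, "best": 2},
    "Doc2": {"car": 1, "auto": 6, "insurance": 4, "best": 0},
    "Doc3": {"car": 3, "auto": 0, "insurance": 5, "best": 3},
}
query = {"car": 1, "auto": 0, "insurance": 1, "best": 1}  # "best car insurance"

for notation in ("nnn.ltc", "ntc.nnn"):
    ranking = sorted(docs, key=lambda name: score(docs[name], query, idf, notation),
                     reverse=True)
    print(notation, "top two:", ranking[:2])
```

Because the sketch normalizes only over the four listed terms, it matches a hand calculation that treats those terms as the whole vocabulary.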
You are stranded on a deserted island. Mushrooms of various types grow widely all over the island, but
no other food is anywhere to be found. Some of the mushrooms have been determined as poisonous
and others as not (determined by your former companions’ trial and error). You are the only one
remaining on the island. You have the following data to consider:
Example  NotHeavy  Smelly  Spotted  Smooth  Edible
A        1         0       0        0       1
B        1         0       1        0       1
C        0         1       0        1       1
D        0         0       0        1       0
E        1         1       1        0       0
F        1         0       1        1       0
G        1         0       0        1       0
H        0         1       0        0       0
U        0         1       1        1       ?
V        1         1       0        1       ?
W        1         1       0        0       ?
3. What is the information gain of the attribute you chose in the previous question?
4. Build the decision tree based only on the observations, as you did in question 2.
5. Classify mushrooms U, V and W using the decision tree.
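The entropy and information-gain calculations behind these questions can be checked with a short Python sketch. It assumes base-2 logarithms and uses only the eight labelled examples A to H from the table. The function names are illustrative, not part of the exam.

```python
import math

ATTRIBUTES = ["NotHeavy", "Smelly", "Spotted", "Smooth"]

# Labelled examples from the table: four attribute values, then Edible.
TRAINING = {
    "A": (1, 0, 0, 0, 1),
    "B": (1, 0, 1, 0, 1),
    "C": (0, 1, 0, 1, 1),
    "D": (0, 0, 0, 1, 0),
    "E": (1, 1, 1, 0, 0),
    "F": (1, 0, 1, 1, 0),
    "G": (1, 0, 0, 1, 0),
    "H": (0, 1, 0, 0, 0),
}

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        h -= p * math.log2(p)
    return h

def information_gain(rows, index):
    """Entropy of Edible minus the weighted entropy after splitting on one attribute."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in (0, 1):
        subset = [row[-1] for row in rows if row[index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

rows = list(TRAINING.values())
gains = {name: information_gain(rows, i) for i, name in enumerate(ATTRIBUTES)}
best_attribute = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: gain = {gain:.4f}")
print("Attribute with the highest gain:", best_attribute)
```

Applying information_gain again to the subset of rows on each branch gives the attributes for the lower levels of the tree.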
Trace the results of using the Apriori algorithm on the grocery store example with support threshold
s=33.34% and confidence threshold c=60%. Show the candidate and frequent itemsets for each database
scan. Enumerate all the final frequent itemsets. Also indicate the association rules that are generated
and highlight the strong ones.
Transaction ID  Items
T1              HotDogs, Buns, Ketchup
T2              HotDogs, Buns
T3              HotDogs, Coke, Chips
T4              Chips, Coke
T5              Chips, Ketchup
T6              HotDogs, Coke, Chips
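A hand trace of this exercise can be cross-checked with a small Python sketch of Apriori. It assumes support is the fraction of the six transactions that contain an itemset, and confidence is support(itemset) / support(left-hand side). The function names are illustrative. Note that with six transactions a threshold of 33.34% requires an itemset to appear in at least three of them.

```python
from itertools import combinations

TRANSACTIONS = [
    {"HotDogs", "Buns", "Ketchup"},   # T1
    {"HotDogs", "Buns"},              # T2
    {"HotDogs", "Coke", "Chips"},     # T3
    {"Chips", "Coke"},                # T4
    {"Chips", "Ketchup"},             # T5
    {"HotDogs", "Coke", "Chips"},     # T6
]
MIN_SUPPORT = 0.3334
MIN_CONFIDENCE = 0.60

def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)

def apriori():
    """Return {frequent itemset: support}, printing each database scan."""
    items = sorted(set().union(*TRANSACTIONS))
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        scanned = {c: support(c) for c in candidates}
        kept = {c: s for c, s in scanned.items() if s >= MIN_SUPPORT}
        print(f"Scan {k}: candidates = {[sorted(c) for c in scanned]}")
        print(f"Scan {k}: frequent   = {[sorted(c) for c in kept]}")
        frequent.update(kept)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent

def association_rules(frequent):
    """Return (lhs, rhs, confidence, is_strong) for every rule from frequent itemsets."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]
                rules.append((lhs, itemset - lhs, confidence,
                              confidence >= MIN_CONFIDENCE))
    return rules

frequent = apriori()
all_rules = association_rules(frequent)
for lhs, rhs, confidence, strong in all_rules:
    flag = "strong" if strong else "weak"
    print(f"{sorted(lhs)} -> {sorted(rhs)}: confidence = {confidence:.2f} ({flag})")
```

The lookup frequent[lhs] is safe because every subset of a frequent itemset is itself frequent, which is the property the prune step relies on.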