Chapter 5: Retrieval Effectiveness
• Evaluation of IR systems
• Relevance judgement
• Performance measures
• Recall, Precision
• Single-valued measures
• User-centred measures
Why System Evaluation?
• Any system needs validation and verification
–Check whether the system is right or not
–Check whether it is the right system or not
• It provides the ability to measure the difference between
IR systems
–How well do our search engines work?
–Is system A better than B? Under what conditions?
• Evaluation drives what to study
–Identify techniques that work well and those that do not
–There are many retrieval models/algorithms. Which one is the
best?
–What is the best component for:
• Index term selection (tokenization, stop-word removal,
stemming, normalization…)
• Term weighting (TF, IDF, TF*IDF, P(R|D)…)
• Similarity measures (cosine, Euclidean, string editing…)
Evaluation Criteria
What are the main evaluation measures to check the
performance of an IR system?
• Efficiency
– Time and space complexity
❑ Speed: retrieval time, indexing time, query processing time
❑ The space taken by the corpus vs. the index file
• Index size: determine the index/corpus size ratio
• Is there a need for compression?
• Effectiveness
–How well is the system able to retrieve relevant documents
from the collection (measured by precision & recall)?
–Is system X better than other systems?
–User satisfaction: how “good” are the documents returned in
response to a user query?
–Relevance of results to the user's information need
Types of Evaluation Strategies
•User-centered evaluation
– Given several users, and at least two retrieval
systems
• Have each user try the same task on both systems
• Measure which system works the “best” for the user's
information need
• How do we measure user satisfaction?
•System-centered evaluation
– Given documents, queries, and relevance
judgments
• Try several variations of the system
• Measure which system returns the “best” hit list
The Notion of Relevance Judgment
• Relevance is a measure of the correspondence between a
document and a query.
–Construct a document-query relevance matrix as judged by:
(i) the user who posed the retrieval problem;
(ii) an external judge;
(iii) an information specialist
–Is the relevance judgment made by the user and by an
external judge the same?
                  Relevant      Irrelevant
Retrieved             A              B   (“type one error”)
Not retrieved         C              D
                 (“type two error”)

Recall = |{Relevant} ∩ {Retrieved}| / |{Relevant}|

Precision = |{Relevant} ∩ {Retrieved}| / |{Retrieved}|

[Figure: the precision/recall trade-off. At low recall the system returns
relevant documents but misses many useful ones; as recall approaches 1 it
returns most of the relevant documents but also includes lots of junk.]
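As a minimal sketch of the two formulas above (the function names and example document IDs here are made up for illustration), precision and recall can be computed in Python directly from the sets of relevant and retrieved documents:

# Minimal sketch: precision and recall computed from the relevant and
# retrieved document sets. The IDs below are made-up examples.

def precision(relevant, retrieved):
    """|Relevant ∩ Retrieved| / |Retrieved|"""
    return len(relevant & retrieved) / len(retrieved) if retrieved else 0.0

def recall(relevant, retrieved):
    """|Relevant ∩ Retrieved| / |Relevant|"""
    return len(relevant & retrieved) / len(relevant) if relevant else 0.0

relevant = {"d1", "d3", "d5", "d8"}      # documents judged relevant
retrieved = {"d1", "d2", "d3", "d4"}     # documents returned by the system
print(precision(relevant, retrieved))    # 2/4 = 0.5
print(recall(relevant, retrieved))       # 2/4 = 0.5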
Need for Interpolation
•Two issues:
–How do you compare performance across
queries?
–Does the sawtooth shape give an intuitive picture of what's going on?
[Figure: raw recall/precision curve for a single query; precision (y-axis,
0-1) plotted against recall (x-axis, 0-1) shows a sawtooth shape.]
Solution: Interpolation!
Interpolate a precision value for each standard recall level
Interpolation
• It is a general form of precision/recall calculation
• Precision changes with recall (it is not a single fixed point)
– It is an empirical fact that on average as recall increases,
precision decreases
• Interpolate precision at 11 standard recall levels:
– r_j ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0},
where j = 0, 1, …, 10
• The interpolated precision at the j-th standard recall
level is the maximum known precision at any recall
level between the jth and (j + 1)th level:
[Figure: interpolated recall/precision curve for the example query
(precision on the y-axis, recall on the x-axis).]
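A short Python sketch of this interpolation step (using the common convention that the interpolated precision at level r_j is the maximum observed precision at any recall ≥ r_j; the sample points are taken from Exercise 2 later in the chapter):

# Sketch: 11-point interpolated precision for one query.
# Convention assumed here: interpolated precision at standard level r_j is
# the maximum observed precision at any recall level >= r_j.

def interpolate_11pt(observed):
    """observed: list of (recall, precision) points for one query."""
    levels = [j / 10 for j in range(11)]
    return [max([p for r, p in observed if r >= r_j], default=0.0)
            for r_j in levels]

# (recall, precision) points from Exercise 2
points = [(0.167, 1.0), (0.333, 0.667), (0.5, 0.6),
          (0.667, 0.5), (0.833, 0.556), (1.0, 0.429)]
print(interpolate_11pt(points))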
Exercise
• Let the total number of relevant documents = 6; compute recall and
precision at each cut-off point n:
n doc # relevant Recall Precision
1 588 x 0.167 1
2 589 x 0.333 1
3 576
4 590 x 0.5 0.75
5 986
6 592 x 0.667 0.667
7 984
8 988
9 578
10 985
11 103
12 591
13 772 x 0.833 0.38
14 990
One relevant document is missing, so the system never reaches 100% recall.
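A sketch of how the table above is computed (the relevant set is read off the rows marked “x”; one relevant document never appears in the ranking):

# Sketch: recall/precision at each cut-off point of the ranked list above.

ranking = [588, 589, 576, 590, 986, 592, 984, 988, 578, 985, 103, 591, 772, 990]
relevant_retrieved = {588, 589, 590, 592, 772}   # rows marked "x"
total_relevant = 6                               # one relevant doc is never retrieved

hits = 0
for n, doc in enumerate(ranking, start=1):
    if doc in relevant_retrieved:
        hits += 1
        print(f"n={n:2d}  doc={doc}  R={hits / total_relevant:.3f}  P={hits / n:.3f}")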
Interpolating a Recall/Precision
Curve: Exercise
[Figure: blank recall/precision grid (precision 0-1.0 on the y-axis) for
plotting the interpolated curve from the exercise above.]
Computing Recall/Precision Points:
Exercise 2
Let total # of relevant docs = 6. Check each new recall point:

n   doc #  relevant
1    588      x      R = 1/6 = 0.167;  P = 1/1 = 1
2    576
3    589      x      R = 2/6 = 0.333;  P = 2/3 = 0.667
4    342
5    590      x      R = 3/6 = 0.5;    P = 3/5 = 0.6
6    717
7    984
8    772      x      R = 4/6 = 0.667;  P = 4/8 = 0.5
9    321      x      R = 5/6 = 0.833;  P = 5/9 = 0.556
10   498
11   113
12   628
13   772
14   592      x      R = 6/6 = 1.0;    P = 6/14 = 0.429
Interpolating a Recall/Precision Curve:
Exercise 2
[Figure: blank recall/precision grid (precision 0-1.0 on the y-axis) for
plotting the interpolated curve from Exercise 2.]
Interpolating across queries
• For each query, calculate precision at 11 standard
recall levels
• Compute average precision at each standard recall
level across all queries.
• Plot average precision/recall curves to evaluate
overall system performance on a document/query
corpus.
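A hedged sketch of this averaging step (interpolate_11pt is the same helper sketched earlier; the per-query point lists below are placeholders, not real data):

# Sketch: average the 11-point interpolated precision over all queries.

def interpolate_11pt(observed):
    levels = [j / 10 for j in range(11)]
    return [max([p for r, p in observed if r >= r_j], default=0.0)
            for r_j in levels]

def average_11pt(per_query_points):
    """per_query_points: one (recall, precision) list per query."""
    curves = [interpolate_11pt(points) for points in per_query_points]
    return [sum(curve[j] for curve in curves) / len(curves) for j in range(11)]

# Placeholder data: two queries with made-up recall/precision points
query1 = [(0.2, 1.0), (0.6, 0.5), (1.0, 0.4)]
query2 = [(0.5, 0.8), (1.0, 0.3)]
print(average_11pt([query1, query2]))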
F Measure
• Harmonic mean of recall and precision:
F = 2PR / (P + R) = 2 / (1/R + 1/P)
• Compared to the arithmetic mean, both precision and recall need to
be high for the harmonic mean to be high.
• What if no relevant documents exist?
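A minimal sketch of the harmonic-mean F measure (returning 0 when both precision and recall are 0, e.g. when no relevant document is retrieved):

# Sketch: F measure as the harmonic mean of precision and recall.

def f_measure(precision, recall):
    if precision + recall == 0:        # e.g. nothing relevant was retrieved
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.75, 0.5))   # 0.6
print(f_measure(0.0, 0.0))    # 0.0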
E Measure
• Associated with Van Rijsbergen
• Allows user to specify importance of recall and
precision
• It is a parameterized F measure: a variant that allows weighting
the emphasis on precision versus recall:
E = (1 + β²)PR / (β²P + R) = (1 + β²) / (β²/R + 1/P)
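A sketch of the parameterized measure as written on this slide (β = 1 reduces to the harmonic-mean F measure; larger β puts more weight on recall, smaller β on precision):

# Sketch: parameterized E/F-beta measure, E = (1 + b^2)PR / (b^2*P + R).

def e_measure(precision, recall, beta=1.0):
    denom = beta**2 * precision + recall
    return (1 + beta**2) * precision * recall / denom if denom else 0.0

print(e_measure(0.75, 0.5, beta=1.0))   # 0.6, same as F
print(e_measure(0.75, 0.5, beta=2.0))   # ≈ 0.536, recall-weighted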
Other measures
• Noise = retrieved irrelevant docs / retrieved docs
• Silence/Miss = non-retrieved relevant docs / relevant docs
– Noise = 1 – Precision; Silence = 1 – Recall
Miss = |{Relevant} ∩ {NotRetrieved}| / |{Relevant}|

Fallout = |{Retrieved} ∩ {NotRelevant}| / |{NotRelevant}|
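A sketch of these ratios in terms of the four contingency-table cells from earlier in the chapter (a = relevant & retrieved, b = irrelevant & retrieved, c = relevant & not retrieved, d = irrelevant & not retrieved; the counts below are made up):

# Sketch: noise, silence (miss) and fallout from the contingency counts.

def noise(a, b):       # retrieved irrelevant / retrieved  = 1 - precision
    return b / (a + b) if (a + b) else 0.0

def silence(a, c):     # non-retrieved relevant / relevant = 1 - recall
    return c / (a + c) if (a + c) else 0.0

def fallout(b, d):     # retrieved irrelevant / all irrelevant documents
    return b / (b + d) if (b + d) else 0.0

print(noise(a=20, b=10))        # 10/30 ≈ 0.333
print(silence(a=20, c=5))       # 5/25  = 0.2
print(fallout(b=10, d=965))     # 10/975 ≈ 0.010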
Programming Assignment (Due date: ____)
• Design an IR system for one of the local languages, following the
principles discussed in class.
– Form a group of up to three members
1. Construct an inverted file (vocabulary file & posting file)
– Taking a corpus of N documents, generate content-bearing index terms and
organize them in an inverted file indexing structure; include frequencies
(TF, DF, CF) & the position/location of each term in each document (a
minimal starting-point sketch is given after this assignment).
2. Develop a vector space retrieval model
– Implement a vector space model that retrieves relevant documents in
ranked order
3. Test your system using five queries (with three to six words)
and report its performance
• Required: write a publishable report that has an abstract (½ page),
introduction, problem statement & objective (1 page), literature
review (2 pages), methods used & architecture of your system (1
page), experimentation (test results & findings) (2-3 pages),
concluding remarks with one basic recommendation (1 page), and
references (1 page).
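As a starting point only, and not a prescribed solution, the sketch below shows one possible shape for steps 1 and 2: an inverted file that keeps term frequencies and positions, and a TF*IDF vector space ranker using cosine similarity. The tokenizer, stop-word list, corpus and query are placeholders and would have to be replaced with ones suited to the chosen local language.

# Minimal starting-point sketch (not a prescribed solution): an inverted
# file with term positions, plus TF*IDF cosine ranking.

import math
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and"}            # placeholder stop-word list

def tokenize(text):
    # Placeholder analyzer: lowercase word tokens with stop words removed.
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]

def build_inverted_index(corpus):
    """corpus: {doc_id: text}. Returns {term: {doc_id: [positions]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in corpus.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)  # TF = len(positions), DF = len(postings)
    return index

def rank(query, index, n_docs):
    """Rank documents against the query by TF*IDF cosine similarity."""
    doc_vectors = defaultdict(dict)              # doc_id -> {term: tf*idf weight}
    for term, postings in index.items():
        idf = math.log(n_docs / len(postings))
        for doc_id, positions in postings.items():
            doc_vectors[doc_id][term] = len(positions) * idf
    q_terms = tokenize(query)
    q_vec = {t: q_terms.count(t) * math.log(n_docs / len(index[t]))
             for t in set(q_terms) if t in index}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    scores = {}
    for doc_id, vec in doc_vectors.items():
        dot = sum(w * q_vec.get(t, 0.0) for t, w in vec.items())
        d_norm = math.sqrt(sum(w * w for w in vec.values()))
        if dot and d_norm and q_norm:
            scores[doc_id] = dot / (d_norm * q_norm)
    return sorted(scores.items(), key=lambda s: s[1], reverse=True)

corpus = {1: "retrieval of information",
          2: "information systems and evaluation",
          3: "evaluation of retrieval systems"}
index = build_inverted_index(corpus)
print(rank("retrieval evaluation", index, n_docs=len(corpus)))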