4 IRModels

Uploaded by

hailemariamhg93

IR models

• Why IR models?
• Boolean IR Model
• Vector space IR model
• Probabilistic IR model
What is Information Retrieval?
• Information retrieval is the process of searching a large, unstructured corpus for relevant documents that satisfy the user's information need.
• It is a tool that finds and selects, from a collection of items, a subset that serves the user's purpose.
• Much IR research focuses specifically on text retrieval, but there are many other interesting areas:
  - Cross-language vs. multilingual information retrieval
  - Multimedia (audio, video & image) information retrieval (QBIC, WebSeek, SaFe)
  - Question answering (AskJeeves, Answerbus)
  - Digital and virtual libraries
Information Retrieval Serves as a Bridge
• An information retrieval system serves as a bridge between the world of authors and the world of readers/users.
• That is, writers present a set of ideas in a document using a set of concepts. Users then query the IR system for relevant documents that satisfy their information need.

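[Figure: the IR system shown as a black box between the user and the documents]
Typical IR System Architecture
[Figure: a query string and the document corpus are the inputs to the IR system, which outputs a ranked list of relevant documents (1. Doc1, 2. Doc2, 3. Doc3, ...)]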
Our focus during IR system design
• Improving the effectiveness of the system
  • The concern here is retrieving more documents that are relevant to the user's query
  • Effectiveness is measured in terms of precision, recall, …
  • Main emphasis: text operations (such as stemming, stopword removal, normalization, etc.), weighting schemes, matching algorithms, …
• Improving the efficiency of the system
  • The concern here is
    • reducing search time, indexing time, access time, …
    • reducing the storage space requirement of the system
    • space-time tradeoffs
  • Main emphasis:
    • Compression
    • Index term selection (free text or content-bearing terms)
    • Indexing structures
Subsystems of an IR system
The two subsystems of an IR system are indexing and searching.
• Indexing:
  • An offline process of organizing documents using keywords extracted from the collection
  • Indexing is used to speed up access to the desired information in the document collection in response to a user's query
• Searching:
  • An online process that scans the document corpus to find relevant documents that match the user's query
Indexing Subsystem
[Figure: indexing pipeline. Documents → assign document identifier (document IDs) → tokenization (tokens) → stopword removal (non-stoplist tokens) → stemming & normalization (stemmed terms) → term weighting (weighted index terms) → index file]
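The indexing stages named above (tokenization, stopword removal, stemming and normalization, term weighting, index file) can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's implementation: the stopword list and the `STEMS` lookup table are hypothetical stand-ins for a real stoplist and stemmer, the stored weight is simply the raw term frequency, and the three sample documents are the ones used in the Boolean and vector-space examples of these slides.

```python
import re
from collections import Counter, defaultdict

# Assumed toy stopword list; real systems use much longer lists.
STOPWORDS = {"a", "an", "in", "of", "the"}

# Hypothetical lookup table standing in for a real stemmer.
STEMS = {"shipment": "ship", "damaged": "damage",
         "delivery": "deliver", "arrived": "arrive"}


def index_terms(text):
    """Tokenize, lowercase, drop stopwords, then map tokens to stems."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [STEMS.get(tok, tok) for tok in tokens if tok not in STOPWORDS]


def build_index(docs):
    """Return an inverted index: term -> {document id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(index_terms(text)).items():
            index[term][doc_id] = tf
    return dict(index)


docs = {
    "D1": "Shipment of gold damaged in a fire",
    "D2": "Delivery of silver arrived in a silver truck",
    "D3": "Shipment of gold arrived in a truck",
}
index = build_index(docs)
print(index["silver"])  # postings for the term "silver"
```

Storing postings per term, rather than terms per document, is what lets the searching subsystem look up only the query terms instead of scanning every document.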
Searching Subsystem
[Figure: searching pipeline. Query → parse query (query tokens) → stopword removal (non-stoplist tokens) → stemming & normalization (stemmed terms) → query term weighting (query terms) → similarity measure against the index terms from the index (relevant document set) → ranking (ranked document set)]
IR Models - Basic Concepts
• IR systems usually adopt index terms to index and retrieve documents
• Each document is represented by a set of representative keywords or index terms (called a Bag of Words)
  • An index term is a word that is useful for remembering the document's main themes
• Not all terms are equally useful for representing the document contents:
  • less frequent terms allow identifying a narrower set of documents
• No ordering information is attached to the Bag of Words identified from the document collection.
IR Models - Basic Concepts
• One central problem for IR systems is predicting the degree of relevance of documents for a given query
  - Such a decision usually depends on a ranking algorithm, which attempts to establish a simple ordering of the documents retrieved
  - Documents appearing at the top of this ordering are considered more likely to be relevant
• Thus ranking algorithms are at the core of IR systems
  - The IR model determines the predictions of what is relevant and what is not, based on the notion of relevance implemented by the system
IR models

[Figure: overview of IR models, including probabilistic relevance]
How to find relevant documents for a query?
• Step 1: Map documents and queries into the term-document vector space. Note that queries are treated as short documents.
• Represent both documents and queries as N-dimensional vectors in a term-document matrix, which shows the occurrence of terms in the document collection or query:

$$\vec{d_j} = (t_{1,j}, t_{2,j}, \ldots, t_{N,j}); \qquad \vec{q_k} = (t_{1,k}, t_{2,k}, \ldots, t_{N,k})$$

      T1   T2   …   TN
D1    …    …    …   …
D2    …    …    …   …
:     :    :        :
DM    …    …    …   …
Qi    …    …    …   …

- The document collection is mapped to a term-by-document matrix
- Each document is viewed as a vector in a multidimensional space
  • Nearby vectors are related
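Step 1 can be illustrated with a small sketch that builds the term-by-document count matrix, with the query appended as one more row. The two toy documents and the helper name `term_document_matrix` are assumptions made for this illustration, and tokenization is reduced to whitespace splitting.

```python
from collections import Counter


def term_document_matrix(docs, query):
    """Return (terms, rows): one count vector per document plus one for the query 'Q'."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    tokenized["Q"] = query.lower().split()
    # The vocabulary defines the N dimensions of the vector space.
    terms = sorted({tok for toks in tokenized.values() for tok in toks})
    rows = {}
    for name, toks in tokenized.items():
        counts = Counter(toks)
        rows[name] = [counts[term] for term in terms]
    return terms, rows


# Hypothetical toy collection, not taken from the slides.
docs = {"D1": "gold shipment fire", "D2": "silver delivery silver truck"}
terms, rows = term_document_matrix(docs, "gold silver truck")
print(terms)
for name, vector in rows.items():
    print(name, vector)
```

Every row has the same length, so documents and the query can be compared directly as vectors in the same space.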
How to find relevant documents for a query?
• Step 2: Queries and documents are represented as weighted vectors, wij
  - Why do we need weighting techniques?
    - To know the importance of a term in describing the content of a given document.
  - There are binary and non-binary weighting techniques. What is the difference between the two?
  - What method would you recommend for computing the weights of term i in document j and in query q, wij and wiq?

      T1    T2    …   TN
D1    w11   w12   …   w1N
D2    w21   w22   …   w2N
:     :     :         :
DM    wM1   wM2   …   wMN
Qi    wi1   wi2   …   wiN

• An entry in the matrix corresponds to the "weight" of a term in the document; zero means the term does not occur in the document.
• Normalize for vector length to avoid the effect of document length
How to find relevant documents for a query?
• Step 3: Rank documents (in increasing or decreasing order) based on their closeness to the query.
  - Documents are ranked by the degree of their closeness to the query.
  - How is the closeness of a document to the query measured?
    - It is determined by a similarity/dissimilarity score calculation
  - How many matching (similarity/dissimilarity) measures do you know? Which one is best for IR?

$$\mathrm{sim}(d_j, q) = \frac{\vec{d_j} \cdot \vec{q}}{|\vec{d_j}|\,|\vec{q}|} = \frac{\sum_{i=1}^{n} w_{i,j}\, w_{i,q}}{\sqrt{\sum_{i=1}^{n} w_{i,j}^{2}}\;\sqrt{\sum_{i=1}^{n} w_{i,q}^{2}}}$$
How to evaluate Models?
• We need to investigate what procedures the IR models follow and what techniques they use:
  • What weighting technique do the IR models use to measure the importance of terms in documents?
    • Do they use binary or non-binary weights?
  • What matching technique do the IR models use?
    • Do they measure similarity or dissimilarity?
  • Do they apply exact matching or partial matching when finding relevant documents for a given query?
  • Do they apply the best-matching principle to measure the degree of relevance of documents, so that results can be displayed in ranked order?
  • Is any ranking mechanism applied before the relevant documents are displayed to the users?
The Boolean Model
• The Boolean model is a simple model based on set theory
  - The Boolean model imposes a binary criterion for deciding relevance
  - Documents must exactly match the query
• Terms are either present or absent. Thus,
  wij ∈ {0,1}
• The similarity is binary as well:

$$\mathrm{sim}(q, d_j) = \begin{cases} 1 & \text{if the document satisfies the Boolean query} \\ 0 & \text{otherwise} \end{cases}$$

  - Note that no weights between 0 and 1 are assigned; the only values are 0 and 1

      T1    T2    …   TN
D1    w11   w12   …   w1N
D2    w21   w22   …   w2N
:     :     :         :
DM    wM1   wM2   …   wMN
The Boolean Model: Example
Given the following three documents, construct the term-document matrix and find the relevant documents retrieved by the Boolean model for the query "gold silver truck"
• D1: "Shipment of gold damaged in a fire"
• D2: "Delivery of silver arrived in a silver truck"
• D3: "Shipment of gold arrived in a truck"
The table below shows the document-term (ti) matrix

        arrive  damage  deliver  fire  gold  silver  ship  truck
D1
D2
D3
query

Also find the documents relevant for the queries:
(a) gold delivery; (b) ship gold; (c) silver truck
The Boolean Model: Further Example
• Given the following, determine the documents retrieved by a Boolean-model-based IR system
• Index terms: K1, …, K8
• Documents:
  1. D1 = {K1, K2, K3, K4, K5}
  2. D2 = {K1, K2, K3, K4}
  3. D3 = {K2, K4, K6, K8}
  4. D4 = {K1, K3, K5, K7}
  5. D5 = {K4, K5, K6, K7, K8}
  6. D6 = {K1, K2, K3, K4}
• Query: K1 ∧ (K2 ∨ ¬K3)
• Answer: {D1, D2, D4, D6} ∩ ({D1, D2, D3, D6} ∪ {D3, D5}) = {D1, D2, D6}
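The set-theoretic evaluation in this example maps directly onto Python's set operators: AND is intersection, OR is union, and NOT is the complement with respect to the whole collection. The sketch below evaluates the query K1 AND (K2 OR NOT K3) over the six documents; the helper name `containing` is an assumption introduced for illustration.

```python
docs = {
    "D1": {"K1", "K2", "K3", "K4", "K5"},
    "D2": {"K1", "K2", "K3", "K4"},
    "D3": {"K2", "K4", "K6", "K8"},
    "D4": {"K1", "K3", "K5", "K7"},
    "D5": {"K4", "K5", "K6", "K7", "K8"},
    "D6": {"K1", "K2", "K3", "K4"},
}
all_docs = set(docs)


def containing(term):
    """Set of document ids whose index terms include `term`."""
    return {doc_id for doc_id, terms in docs.items() if term in terms}


# NOT K3 is the complement of the K3 posting set within the collection.
not_k3 = all_docs - containing("K3")

# K1 AND (K2 OR NOT K3)
result = containing("K1") & (containing("K2") | not_k3)
print(sorted(result))
```

Because the result is a plain set, every retrieved document is equally "relevant"; the model gives no basis for ordering D1, D2 and D6.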
Exercise
Given the following four documents:
• D1 = "computer information retrieval"
• D2 = "computer retrieval"
• D3 = "information"
• D4 = "computer information"

• What are the relevant documents retrieved for the queries:
  • Q1 = "information ∧ retrieval"
  • Q2 = "information ∧ ¬computer"
Drawbacks of the Boolean Model
• Retrieval is based on a binary decision criterion with no notion of partial matching
• No ranking of the documents is provided (absence of a grading scale)
• The information need has to be translated into a Boolean expression, which most users find awkward
• The Boolean queries formulated by users are most often too simplistic
  - As a consequence, the Boolean model frequently returns either too few or too many documents in response to a user query
Vector-Space Model
• This is the most commonly used strategy for measuring the relevance of documents to a given query, because:
  - The use of binary weights is too limiting
  - Non-binary weights allow partial matches to be taken into account
• These term weights are used to compute a degree of similarity between a query and each document
  - A ranked set of documents provides better matching
• The idea behind the VSM is that
  - the meaning of a document is conveyed by the words used in that document
  - the VSM represents documents and queries as vectors in a multi-dimensional space, where each dimension corresponds to a unique term (word) from the entire corpus. The weights of the terms in these vectors are typically derived from their frequency in the document and their importance in the corpus.
Vector-Space Model
To find relevant documents for a given query:
• First, map documents and queries into the term-document vector space. Note that queries are treated as short documents.
• Second, in the vector space, queries and documents are represented as weighted vectors, wij
  - There are different weighting techniques; the most widely used one is computing the TF*IDF weight for each term
• Third, a similarity measure is used to rank documents by the closeness of their vectors to the query.
  - To measure the closeness of documents to the query, the cosine similarity score is used by most search engines
Computing weights

$$w_{ij} = \frac{\mathrm{freq}(i,j)}{\max_k \mathrm{freq}(k,j)} \times \log\frac{N}{n_i}$$

$$w_{iq} = \left[0.5 + 0.5 \times \frac{\mathrm{freq}(i,q)}{\max_k \mathrm{freq}(k,q)}\right] \times \log\frac{N}{n_i}$$
Example: Computing weights
• A collection includes 10,000 documents
  - The term tA appears 20 times in a particular document j
  - The most frequent term tk in document j appears 50 times
  - The term tA appears in 2,000 documents of the collection
• Compute the TF*IDF weight of term tA (logarithm to base 2).
  - tf(A,j) = freq(A,j) / max(freq(k,j)) = 20/50 = 0.4
  - idf(A) = log2(N/DFA) = log2(10,000/2,000) = log2(5) = 2.32
  - wAj = tf(A,j) * idf(A) = 0.4 * 2.32 = 0.928
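The arithmetic of this example can be checked with a short sketch. The function name `tf_idf` and its parameter names are assumptions introduced here; the logarithm base is exposed as a parameter because the value 2.32 for log(5) corresponds to base 2.

```python
import math


def tf_idf(freq, max_freq, n_docs, doc_freq, log_base=2):
    """Max-normalized term frequency times inverse document frequency."""
    tf = freq / max_freq                       # 20 / 50 = 0.4
    idf = math.log(n_docs / doc_freq, log_base)  # log2(10000 / 2000) = log2(5)
    return tf * idf


w = tf_idf(freq=20, max_freq=50, n_docs=10_000, doc_freq=2_000)
print(round(w, 3))
```

At full precision the weight is about 0.929; rounding the IDF to 2.32 before multiplying gives 0.928.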
Similarity Measure
• A similarity measure is a function that computes the degree of similarity/dissimilarity between document j and the user's query.

$$\mathrm{sim}(d_j, q) = \frac{\vec{d_j} \cdot \vec{q}}{|\vec{d_j}|\,|\vec{q}|} = \frac{\sum_{i=1}^{n} w_{i,j}\, w_{i,q}}{\sqrt{\sum_{i=1}^{n} w_{i,j}^{2}}\;\sqrt{\sum_{i=1}^{n} w_{i,q}^{2}}}$$

• Using a similarity score between the query and each document:
  • It is possible to apply best matching, so that documents are ranked for retrieval in order of presumed relevance.
  • It is possible to enforce a threshold to control the size of the retrieved set of documents.
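Vector Space with Term Weights and Cosine Similarity Measure
Di = (d1i, w1di; d2i, w2di; …; dti, wtdi)
Q = (q1i, w1qi; q2i, w2qi; …; qti, wtqi)

[Figure: two-dimensional term space with axes Term A and Term B (both from 0 to 1.0), showing the vectors Q = (0.4, 0.8), D1 = (0.8, 0.3) and D2 = (0.2, 0.7)]

$$\mathrm{sim}(Q, D_i) = \frac{\sum_{j=1}^{t} w_{jq}\, w_{jd_i}}{\sqrt{\sum_{j=1}^{t} (w_{jq})^2 \;\sum_{j=1}^{t} (w_{jd_i})^2}}$$

$$\mathrm{sim}(Q, D_2) = \frac{(0.4 \times 0.2) + (0.8 \times 0.7)}{\sqrt{[(0.4)^2 + (0.8)^2]\,[(0.2)^2 + (0.7)^2]}} = \frac{0.64}{\sqrt{0.424}} \approx 0.98$$

$$\mathrm{sim}(Q, D_1) = \frac{(0.4 \times 0.8) + (0.8 \times 0.3)}{\sqrt{[(0.4)^2 + (0.8)^2]\,[(0.8)^2 + (0.3)^2]}} = \frac{0.56}{\sqrt{0.584}} \approx 0.73$$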
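The two-term example can be reproduced with a minimal cosine similarity function. This is a sketch for dense vectors of equal length; the function name `cosine` and the zero-vector convention (returning 0.0) are choices made here, not something the slides specify.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


Q, D1, D2 = (0.4, 0.8), (0.8, 0.3), (0.2, 0.7)
print(round(cosine(Q, D2), 3), round(cosine(Q, D1), 3))
```

At full precision sim(Q, D2) is about 0.983 and sim(Q, D1) about 0.733, so D2 ranks above D1; rounding intermediate values can shift the second decimal.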
Example Vector-Space Model
• Suppose a user issues the query Q = "gold silver truck". The database collection consists of three documents with the following content:
  D1: "Shipment of gold damaged in a fire"
  D2: "Delivery of silver arrived in a silver truck"
  D3: "Shipment of gold arrived in a truck"
• Show the retrieval results in ranked order.
  1. Assume that full-text terms are used during indexing, without removing common terms (stop words) and without stemming.
  2. Assume that content-bearing terms are selected during indexing.
  3. Also compare your results with and without normalizing term frequency.
Example VSM: Weighting
IDF = log10(N/DF), with N = 3 documents.

           Counts (TF)                      W = TF*IDF
Terms      Q   D1  D2  D3    DF   IDF      Q      D1     D2     D3
arrive     0   0   1   1     2    0.176    0      0      0.176  0.176
damage     0   1   0   0     1    0.477    0      0.477  0      0
deliver    0   0   1   0     1    0.477    0      0      0.477  0
fire       0   1   0   0     1    0.477    0      0.477  0      0
gold       1   1   0   1     2    0.176    0.176  0.176  0      0.176
silver     1   0   2   0     1    0.477    0.477  0      0.954  0
ship       0   1   0   1     2    0.176    0      0.176  0      0.176
truck      1   0   1   1     2    0.176    0.176  0      0.176  0.176
Example VSM: Weighting
Terms      Q      D1     D2     D3
arrive     0      0      0.176  0.176
damage     0      0.477  0      0
deliver    0      0      0.477  0
fire       0      0.477  0      0
gold       0.176  0.176  0      0.176
silver     0.477  0      0.954  0
ship       0      0.176  0      0.176
truck      0.176  0      0.176  0.176
Example VSM: Similarity Measure
• Compute the cosine similarity between the query and each document, e.g. Sim(q,d1)
• First, for each document and for the query, compute the vector lengths (zero terms ignored)

$$|d_1| = \sqrt{0.477^2 + 0.477^2 + 0.176^2 + 0.176^2} = \sqrt{0.517} = 0.719$$

$$|d_2| = \sqrt{0.176^2 + 0.477^2 + 0.954^2 + 0.176^2} = \sqrt{1.1996} = 1.095$$

$$|d_3| = \sqrt{0.176^2 + 0.176^2 + 0.176^2 + 0.176^2} = \sqrt{0.124} = 0.352$$

$$|q| = \sqrt{0.176^2 + 0.477^2 + 0.176^2} = \sqrt{0.2896} = 0.538$$

• Next, compute the dot products (zero products ignored)
Q*d1 = 0.176*0.176 = 0.0310
Q*d2 = 0.477*0.954 + 0.176*0.176 = 0.4862
Q*d3 = 0.176*0.176 + 0.176*0.176 = 0.0620
Example VSM: Ranking
Now, compute the similarity scores
Sim(q,d1) = 0.0310 / (0.538*0.719) = 0.0801
Sim(q,d2) = 0.4862 / (0.538*1.095) = 0.8246
Sim(q,d3) = 0.0620 / (0.538*0.352) = 0.3271
Finally, sort and rank the documents in descending order of similarity score
Rank 1: Doc 2 = 0.8246
Rank 2: Doc 3 = 0.3271
Rank 3: Doc 1 = 0.0801
• Exercise: using normalized TF, rank the documents with the cosine similarity measure. Hint: normalize the TF of term i in document j by the maximum frequency of any term k in document j.
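The whole worked example (raw TF, IDF = log10(N/DF), cosine ranking) can be reproduced end to end with the sketch below. The documents are entered already reduced to the eight content-bearing, stemmed terms of the weighting table, which is an assumption about the preprocessing; the helper names `weights` and `cosine` are introduced here for illustration.

```python
import math
from collections import Counter

# Documents after stopword removal and stemming (assumed preprocessing).
docs = {
    "D1": ["ship", "gold", "damage", "fire"],
    "D2": ["deliver", "silver", "arrive", "silver", "truck"],
    "D3": ["ship", "gold", "arrive", "truck"],
}
query = ["gold", "silver", "truck"]

N = len(docs)
# Document frequency: number of documents containing each term.
df = Counter(term for terms in docs.values() for term in set(terms))
idf = {term: math.log10(N / n) for term, n in df.items()}


def weights(terms):
    """Sparse TF*IDF vector (raw term frequency, no normalization)."""
    tf = Counter(terms)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


q_vec = weights(query)
scores = {name: cosine(q_vec, weights(terms)) for name, terms in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 4))
```

At full precision this gives roughly 0.825 for D2, 0.327 for D3 and 0.080 for D1, i.e. the ranking D2, D3, D1. Swapping the raw count in `weights` for a max-normalized frequency is the only change needed for the normalized-TF exercise.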
Vector-Space Model
• Advantages:
  • Term weighting improves the quality of the answer set, since it allows relevant documents to be displayed in ranked order
  • Partial matching allows retrieval of documents that approximate the query conditions
  • The cosine ranking formula sorts documents according to their degree of similarity to the query

• Disadvantages:
  • Assumes independence of index terms. It does not relate one term to another, which makes it hard to capture semantic relationships in relevance ranking
  • Computationally expensive, since it measures the similarity between each document and the query
Exercise 1
Suppose the database collection consists of the following documents.
c1: Human machine interface for Lab ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user-perceived response time to error measure
M1: The generation of random, binary, unordered trees
M2: The intersection graph of paths in trees
M3: Graph minors: Widths of trees and well-quasi-ordering
M4: Graph minors: A survey
Query:
Find documents relevant to "human computer interaction"
Exercise 2
• Consider these documents:
Doc 1 breakthrough drug for schizophrenia
Doc 2 new schizophrenia drug
Doc 3 new approach for treatment of schizophrenia
Doc 4 new hopes for schizophrenia patients
• Draw the term-document incidence matrix for this document collection.
• Draw the inverted index representation for this collection.

• For the document collection shown above, what are the returned results
for the queries:
• schizophrenia AND drug
• for AND NOT(drug OR approach)
Probabilistic Model
• IR is an uncertain process
  • Mapping the information need to a query is not perfect
  • Mapping documents to index terms gives only a logical representation
  • Query terms and index terms often mismatch
• This situation leads to several statistical approaches: probability theory, fuzzy logic, theory of evidence, language modeling, etc.
• The probabilistic retrieval model is a rigorous formal model that attempts to predict the probability that a given document will be relevant to a given query, i.e. Prob(R | (q, di))
  • It uses probability to estimate the "odds" of relevance of a document to a query.
  • It relies on accurate estimates of probabilities
Probability Ranking Principle
• The relevance of a given document to a user's query can be determined by a probability score
  • A high probability, prob(rel | di, q), means users are more likely to get relevant information by reading document di.
• A probabilistic retrieval model follows the probability ranking principle
  • You have a collection of documents
  • A set of relevant documents needs to be returned for queries issued by users
  • Intuitively, we want the "best" document to be first, the second best second, etc.
  • According to the probability ranking principle, documents are ranked in decreasing order of probability of relevance to the user's information need
Terms Existence in Relevant Document
N = the total number of documents in the collection
n = the total number of documents that contain term ti
R = the total number of relevant documents retrieved
r = the total number of relevant documents retrieved that contain term ti

Document relevance for term ti:
                            Relevant docs   Non-relevant docs   Total
Docs including term ti      r               n-r                 n
Docs excluding term ti      R-r             N-R-(n-r)           N-n
Total                       R               N-R                 N

$$w_i = \log \frac{(r + 0.5)(N - n - R + r + 0.5)}{(n - r + 0.5)(R - r + 0.5)}$$
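As a quick check on the weight formula, consider the case where no relevance information is available yet. Setting R = r = 0 (an assumption made here purely for illustration) collapses the weight to an IDF-like quantity:

$$w_i = \log \frac{(0 + 0.5)(N - n - 0 + 0 + 0.5)}{(n - 0 + 0.5)(0 - 0 + 0.5)} = \log \frac{N - n + 0.5}{n + 0.5}$$

For instance, with base-10 logarithms and N = 6, a term occurring in n = 1, 2 or 3 documents gets a weight of about 0.56, 0.26 or 0.00 respectively: the rarer the term, the higher its weight.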
Computing term probabilities
Probabilistic Model Example
d Document vectors <tfd,t>
cold day eat hot lot nine old pea pizza pot
1 1 1 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1
6 1 1
wt 0.26 0.56 0.56 0.26 0.56 0.56 0.56 0.0 0.0 0.26
• q1 = eat
• q2 = eat pizza
• q4 = eat hot pizza
Improving the Ranking
• Now, suppose
• we have shown the initial ranking to the user
• the user has labeled some of the documents as relevant ("relevance feedback")
• We now have
• N documents in collection, R are known relevant documents
• ni documents containing ti, out of which ri are relevant
Relevance weighted Example
d Document vectors <tfd,t>
cold day eat hot lot nine old pea pizza pot Relevance
1 1 1 1 1 NR
2 1 1 1 R
3 1 1 1 NR
4 1 1 1 NR
5 1 1 NR
6 1 1 NR
wt -0.33 0.00 0.00 -0.33 0.00 0.00 0.00 0.62 0.62 0.95
• query = hot pizza
• Document 2 is relevant
Probabilistic Retrieval Example
• D1: “Cost of paper is up.” (relevant)
• D2: “Cost of jellybeans is up.” (not relevant)
• D3: “Salaries of CEO’s are up.” (not relevant)
• D4: “Paper: CEO’s labor cost up.” (????)
Probabilistic Retrieval Example
      cost    paper   jellybean  salary  CEO     labor   up      Relevance
D1    1       1       0          0       0       0       1       R
D2    1       0       1          0       0       0       1       NR
D3    0       0       0          1       1       0       1       NR
D4    1       1       0          0       1       1       1       ??
Wij   0.477   1.176   -0.477     -0.477  -0.477  0.222   -0.222

• D1 = 0.477 + 1.176 + (-0.222) = 1.431
• D2 = 0.477 + (-0.477) + (-0.222) = -0.222
• D3 = (-0.477) + (-0.477) + (-0.222) = -1.176
• D4 = 1.176 + (-0.477) + 0.222 + 0.477 + (-0.222) = 1.176
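The term weights and document scores in this example follow from the relevance weighting formula with N = 3 judged documents and R = 1 relevant document, using base-10 logarithms. The sketch below recomputes them; the function name `rsj_weight`, the lowercase term spellings, and the representation of documents as term sets are assumptions made for illustration.

```python
import math


def rsj_weight(N, n, R, r):
    """Relevance weight of a term with 0.5 smoothing (base-10 logarithm)."""
    return math.log10(((r + 0.5) * (N - n - R + r + 0.5)) /
                      ((n - r + 0.5) * (R - r + 0.5)))


# Judged documents: (set of terms, is_relevant)
labeled = {
    "D1": ({"cost", "paper", "up"}, True),
    "D2": ({"cost", "jellybean", "up"}, False),
    "D3": ({"salary", "ceo", "up"}, False),
}
vocab = ["cost", "paper", "jellybean", "salary", "ceo", "labor", "up"]

N = len(labeled)
R = sum(1 for _, relevant in labeled.values() if relevant)

weights = {}
for term in vocab:
    n = sum(1 for terms, _ in labeled.values() if term in terms)
    r = sum(1 for terms, relevant in labeled.values() if relevant and term in terms)
    weights[term] = rsj_weight(N, n, R, r)


def score(terms):
    """Sum of the weights of the terms present in a document."""
    return sum(weights[term] for term in terms)


d4 = {"cost", "paper", "ceo", "labor", "up"}
print({term: round(w, 3) for term, w in weights.items()})
print(round(score(d4), 3))
```

The unjudged document D4 scores about 1.176, close to the relevant document D1 (about 1.431) and well above the two non-relevant documents.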
Exercise
• Consider the collection below. The collection has 5 documents and each document is described by two terms. The initial guess of relevance to a particular query Q is given in the table below. Assuming the query Q has a total of 2 relevant documents in this collection, solve the following questions.

Document  T1  T2  Relevance
D1        1   1   R
D2        0   1   NR
D3        1   0   NR
D4        1   0   R
D5        0   1   NR

• Using the probabilistic term weighting formula, calculate the new weight for each of the query terms in Q
• Rank the documents according to their probability of relevance with the new query
Probabilistic model
• The probabilistic model uses probability theory to model the uncertainty in the retrieval process
  • Assumptions are made explicit
  • The term weight without relevance information is IDF
• Relevance feedback can improve the ranking by giving better term probability estimates
• Advantages of the probabilistic model over the vector-space model
  • Strong theoretical basis
  • Since it is based on probability theory, it is very well understood
  • Easy to extend
• Disadvantages
  • Models are often complicated
  • No term frequency weighting
• Which is better: vector-space or probabilistic?
  • Both are approximately as good as each other
  • Depends on collection, query, and other factors
Thank you
