Unit 5: Query Operations and Languages

Uploaded by

abhayadyo1

IR - CHAPTER 5 2011

QUERY LANGUAGE
- A query is the formulation of a user information need.
- Most query languages try to use the content (semantics) and the structure (syntax) of the text to
find relevant documents.
- The retrieval unit is the basic element that can be retrieved as an answer to a query.
- The retrieval unit can be a file, a document, a web page, etc.
• Keyword Based Queries
- Single Word Queries
- Context Queries
- Boolean Queries
- Natural Language Queries
• Pattern Matching
• Structural Queries

1. KEYWORD BASED QUERY

- A query is composed of keywords, and the documents containing those keywords are searched for.
• Single Word Queries
- The most elementary query that can be formulated in text retrieval is a word.
- Documents are assumed to be long sequences of words.
- A word is a sequence of letters surrounded by separators.
- Some characters are not letters but do not split a word, for example the hyphen in “co-education”.
- The result of a word query is the set of documents containing at least one of the words
of the query.
- Further, the resulting documents are ranked according to their degree of similarity to
the query, e.g. using tf (term frequency) and idf (inverse document frequency) weights.
• Context Queries
- Many systems complement single word queries with the ability to search for words in a
given context, i.e. near other words.

- Words that appear near each other may signal a higher likelihood of relevance than words that
appear far apart.
- Two types of context queries:
1. Phrasal Queries
- A phrase is a sequence of single word queries.
- An occurrence of the phrase is a sequence of words in the text.
- Relevant documents are those that contain the specific phrase, i.e. an ordered list of
contiguous words. For example, if stemming is applied and intervening stop words are ignored,
“buy camera” matches “buy a camera”, “buying the cameras”, etc.
- Requires an inverted index that also stores the positions of each keyword in a
document.
- To evaluate a phrase, retrieve the documents and positions for each individual word, intersect the
document lists, and finally check for ordered contiguity of the keyword positions.

2. Proximity Queries
- A more relaxed version of the phrase query.
- In this case, a sequence of single words or phrases is given together with a
maximum allowed distance between them.
- That is, a list of words with specific maximal distance constraints between terms.
- For example: “dogs” and “race” within 4 words matches
“……dog will begin the race……” (with stemming, so that “dog” matches “dogs”).
- May also perform stemming and/or not count stop words.
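The phrase and proximity checks described above can be sketched with a small positional index. This is a minimal illustration under stated assumptions, not a production design: the function names and sample documents are hypothetical, words are matched exactly (no stemming, no stop-word handling), and the distance between two words is taken to be the difference of their positions.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each word to {doc_id: [positions]}; docs is {doc_id: text}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def phrase_query(index, words):
    """Ids of documents that contain the words contiguously and in order."""
    postings = [index.get(w, {}) for w in words]
    # Step 1: intersect the document lists of all query words.
    candidates = set(postings[0]).intersection(*postings[1:])
    hits = set()
    for doc_id in candidates:
        # Step 2: a start position p matches if word k occurs at position p + k.
        if any(all(p + k in postings[k][doc_id] for k in range(1, len(words)))
               for p in postings[0][doc_id]):
            hits.add(doc_id)
    return hits

def proximity_query(index, w1, w2, max_dist):
    """Ids of documents where w1 and w2 occur within max_dist positions, in either order."""
    p1, p2 = index.get(w1, {}), index.get(w2, {})
    return {d for d in set(p1) & set(p2)
            if any(abs(a - b) <= max_dist for a in p1[d] for b in p2[d])}

# Hypothetical sample collection.
DOCS = {
    1: "the dog will begin the race",
    2: "buy camera today",
    3: "camera to buy",
    4: "the race is over and the owner walks the dog",
}
INDEX = build_positional_index(DOCS)
print(phrase_query(INDEX, ["buy", "camera"]))     # {2}
print(proximity_query(INDEX, "dog", "race", 4))   # {1}
```

Document 3 contains both words of the phrase but in the wrong order, and document 4 contains “dog” and “race” eight positions apart, so neither is returned.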
• Boolean Queries
- A Boolean query has a syntax composed of basic queries that retrieve documents and of
Boolean operators that work on their operands and deliver sets of documents.
- Since this scheme is in general compositional, a query syntax tree is naturally defined,
where the leaves correspond to the basic queries and the internal nodes to the
operators.

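- For example:

Fig.: Query syntax tree for translate AND (syntax OR syntactic). It retrieves all the documents that
contain the word “translate” as well as either the word “syntax” or the word “syntactic”.
- The operators most commonly used in Boolean queries are:
1. OR (e1 OR e2): selects documents that satisfy e1 or e2
2. AND (e1 AND e2): selects documents that satisfy both e1 and e2
3. BUT (e1 BUT e2): selects documents that satisfy e1 but not e2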
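As a rough sketch of how such a query syntax tree can be evaluated, the leaves below look up document sets in an inverted index and the internal nodes apply set operations. The index contents, the tuple encoding of the tree and the name `evaluate` are assumptions made for illustration only.

```python
# Hypothetical toy inverted index: term -> set of document ids.
INDEX = {
    "translate": {1, 2, 4},
    "syntax": {2, 3},
    "syntactic": {4, 5},
}

def evaluate(node, index):
    """Evaluate a syntax tree given as a term (str) or a tuple (operator, left, right)."""
    if isinstance(node, str):              # leaf: basic single-word query
        return index.get(node, set())
    op, left, right = node
    a, b = evaluate(left, index), evaluate(right, index)
    if op == "AND":
        return a & b                       # documents satisfying both operands
    if op == "OR":
        return a | b                       # documents satisfying either operand
    if op == "BUT":
        return a - b                       # documents satisfying e1 but not e2
    raise ValueError(f"unknown operator: {op}")

query = ("AND", "translate", ("OR", "syntax", "syntactic"))
print(sorted(evaluate(query, INDEX)))      # [2, 4]
```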
• Natural Language
- Full text queries are given as arbitrary strings.
- All the documents matching a portion of the user query are retrieved.
- Higher ranking is assigned to those documents that match more parts of the query.
- Typically, this approach is used with the vector space model.

2. PATTERN MATCHING
- A pattern is a set of syntactic features that must occur in a text segment.
- Those segments satisfying the pattern specifications are said to match the pattern.
- Examples:
1. Prefixes: A pattern that matches the start of a word. For example: “anti” matches “antiquity”,
“antibody”, etc.

2. Suffixes: A pattern that matches the end of a word. For example: “ix” matches “fix”, “matrix”,
etc.

3. Substrings: A pattern that matches any contiguous sequence of characters within a word. For example: “rapt”
matches “enrapture”, “velociraptor”, etc.

4. Ranges: A pair of strings that matches any word lying alphabetically between them. For example:
“tin” to “tix” matches “tip”, “tire”, “title”, etc.
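The four pattern types above can be tried out on a small vocabulary with plain string operations. The vocabulary and function names here are hypothetical, and the range is assumed to include its two end points.

```python
from bisect import bisect_left, bisect_right

# Hypothetical vocabulary, kept sorted so that ranges become contiguous slices.
VOCAB = sorted(["antibody", "antiquity", "enrapture", "fix", "matrix",
                "tip", "tire", "title", "velociraptor"])

def prefix_match(prefix):
    return [w for w in VOCAB if w.startswith(prefix)]

def suffix_match(suffix):
    return [w for w in VOCAB if w.endswith(suffix)]

def substring_match(sub):
    return [w for w in VOCAB if sub in w]

def range_match(low, high):
    # Binary search for the first word >= low and the first word > high.
    return VOCAB[bisect_left(VOCAB, low):bisect_right(VOCAB, high)]

print(prefix_match("anti"))       # ['antibody', 'antiquity']
print(suffix_match("ix"))         # ['fix', 'matrix']
print(substring_match("rapt"))    # ['enrapture', 'velociraptor']
print(range_match("tin", "tix"))  # ['tip', 'tire', 'title']
```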

ALLOWING ERRORS
- What if the query or a document contains misspellings?
- Judge the similarity of words using the edit distance.

EDIT (LEVENSHTEIN) DISTANCE

- Minimum number of character deletions, insertions or replacements needed to make two strings
equivalent.
- For example: “misspell” to “mispell” is distance 1.
“misspell” to “mistell” is distance 2.
“misspell” to “misspelling” is distance 3.

REGULAR EXPRESSIONS
- Some text retrieval systems allow searching for regular expressions.
- Examples:
1. (u|e)nabl(e|ing) matches
- unable, unabling, enable, enabling
2. (un|en)*able matches
- able, unable, unenable, enununenable
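The two patterns can be checked with Python's standard `re` module, writing alternation as `|`. The helper `matches` and the word lists are illustrative assumptions; `fullmatch` is used so that a word counts only if the whole word fits the pattern.

```python
import re

def matches(pattern, words):
    """Return the words that match the pattern in full."""
    compiled = re.compile(pattern)
    return [w for w in words if compiled.fullmatch(w)]

print(matches(r"(u|e)nabl(e|ing)", ["unable", "enabling", "enabled", "nable"]))
# ['unable', 'enabling']
print(matches(r"(un|en)*able", ["able", "unable", "enununenable", "untenable"]))
# ['able', 'unable', 'enununenable']
```

Note that “untenable” is rejected by the second pattern, because the letter “t” cannot be produced by repeating “un” or “en”.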

RELEVANCE FEEDBACK & QUERY EXPANSION

- In most collections, the same concept may be referred to using different words (synonyms).
- For example, a search for “restaurant” should also match “café”.

- The user can address this problem manually by refining the query.
- The system can also help with query refinement.
- The methods by which a system tackles this problem fall into two classes.
1. Global Methods
- Query expansion/reformulation with a thesaurus or WordNet.
- Techniques like spelling correction.
2. Local Methods
- Relevance feedback
- Pseudo-relevance feedback
- Indirect relevance feedback

RELEVANCE FEEDBACK
- The idea of relevance feedback is to involve the user in the retrieval process so as to improve the
final result set.
- In particular, the user gives feedback on the relevance of documents in an initial set of results.

BASIC PROCEDURE
- The user issues a query.
- The system returns an initial set of retrieval results.
- The user marks some returned documents as relevant or non-relevant.
- The system computes a better representation of the information need based on the user feedback.
- The system displays a revised set of retrieval results.
- Seeing some documents may lead users to refine their understanding of the information they are
seeking.
- Image search provides a good example of relevance feedback.

WHEN DOES RELEVANCE FEEDBACK WORK?

- The success of relevance feedback depends on certain assumptions.
- Firstly, the user has to have sufficient knowledge to be able to make an initial query which is at
least somewhere close to the documents they desire.
- Secondly, the relevance feedback approach requires relevant documents to be similar to each
other, i.e. they should cluster.

CASES WHERE RELEVANCE FEEDBACK ALONE IS NOT SUFFICIENT

1. Misspellings
- If the user spells a term differently from the way it is spelled in any document in the
collection, then relevance feedback is unlikely to be effective.
2. Cross language information retrieval (CLIR)
- Relevant documents written in different languages are hard to cluster together; documents in the
same language cluster more closely.
3. Mismatch of searcher’s vocabulary versus collection vocabulary
- For example: if the user searches for “laptop” but all the documents use the term “notebook
computer”, then the query fails.

PSEUDO RELEVANCE FEEDBACK
- Also called blind relevance feedback.
- Provides a method for automatic local analysis.
- Uses relevance feedback methods without explicit user input.
- Simply assumes that the top m retrieved documents are relevant and uses them to reformulate the query.
- Allows for query expansion that includes terms that are correlated with the query terms.
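A minimal sketch of this idea: take the top m documents of an initial ranking, count the terms they contain, and append the most frequent new terms to the query. The function name, parameters and sample documents are hypothetical, and a real system would normally weight terms (and drop stop words) rather than use raw counts.

```python
from collections import Counter

def pseudo_relevance_feedback(query_terms, ranked_docs, m=2, n_new_terms=2):
    """Expand the query with frequent terms from the top m ranked documents.

    ranked_docs: list of token lists, best match first (from an initial retrieval).
    """
    counts = Counter()
    for doc in ranked_docs[:m]:            # assume the top m documents are relevant
        counts.update(t for t in doc if t not in query_terms)
    expansion = [t for t, _ in counts.most_common(n_new_terms)]
    return list(query_terms) + expansion

# Hypothetical initial ranking for the query "laptop".
RANKED = [
    ["laptop", "notebook", "computer", "battery"],
    ["notebook", "computer", "review"],
    ["garden", "tools"],
]
expanded = pseudo_relevance_feedback(["laptop"], RANKED, m=2, n_new_terms=2)
print(expanded)   # the original term plus "notebook" and "computer"
```

The third document is ignored because only the top m = 2 results are assumed relevant, which is also why the method can drift if the initial ranking is poor.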

INDIRECT RELEVANCE FEEDBACK

- On the web, DirectHit introduced the idea of ranking more highly those documents that users chose to
look at more often.
- Clicks on links were assumed to indicate that the page was likely to be relevant to the query.
- This approach is a form of clickstream mining.

THESAURUS
- A thesaurus provides information on synonyms and semantically related words and phrases.
- The IR system might also suggest search terms by means of a thesaurus.
- A user can also be allowed to browse lists of the terms that are in the inverted index and thus find
good terms that appear in the collection.
- For example: Physician
Syn (Synonyms): doc, doctor, MD, medical, medicines, medico
Rel (Related): medic, general practitioner, surgeon

THESAURUS BASED QUERY EXPANSION

- For each term t in a query, expand the query with synonyms and related words of t from the
thesaurus.
- May weight added terms less than original query terms.
- Generally increases recall.
- May significantly decrease precision, particularly with ambiguous terms.
- For example: “interest rate” → “interest rate fascinate evaluate”, where the added words are synonyms of the wrong senses of “interest” and “rate”.
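The expansion step can be sketched as a dictionary lookup in which added terms receive a lower weight than the original query terms. The thesaurus contents, the weight of 0.5 and the name `expand_query` are assumptions for illustration.

```python
# Hypothetical toy thesaurus; a real system would load a curated resource.
THESAURUS = {
    "interest": ["fascinate"],
    "rate": ["evaluate"],
}

def expand_query(terms, thesaurus, added_weight=0.5):
    """Return {term: weight}: original terms get 1.0, added terms a lower weight."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        for related in thesaurus.get(t, []):
            # setdefault never lowers the weight of a term already in the query.
            weights.setdefault(related, added_weight)
    return weights

print(expand_query(["interest", "rate"], THESAURUS))
# {'interest': 1.0, 'rate': 1.0, 'fascinate': 0.5, 'evaluate': 0.5}
```

The output also shows the precision problem: both added words come from senses of “interest” and “rate” that have nothing to do with finance.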

WORDNET
- WordNet is a lexical database for the English language.
- It groups English words into sets of synonyms called synsets and provides various semantic relations
between these synonym sets.
- WordNet is a more detailed database of semantic relationships between English words than a plain thesaurus.
- Developed by the cognitive psychologist George Miller and a team at Princeton University.
- About 144,000 English words.

WORDNET SYNSET RELATIONSHIPS

1. Antonym: front → back
2. Attribute: benevolence → good (noun to adjective)

3. Pertainym: alphabetical → alphabet (adjective to noun)
4. Similar: unquestioning → absolute
5. Cause: kill → die
6. Entailment: breathe → inhale
7. Holonym: chapter → text (part of)
8. Meronym: computer → CPU (whole of)
9. Hyponym: tree → plant (specialization)
10. Hypernym: fruit → apple (generalization)

WORDNET QUERY EXPANSION

- Add synonyms in the same synset.
- Add hyponyms to add more specialized terms.
- Add hypernyms to generalize a query.
- Note: Y is a holonym of X if X is a part of Y.
Y is a meronym of X if Y is a part of X.

SPELLING CORRECTION
- Correcting spelling errors in queries.
- For instance, we may wish to retrieve documents containing the term “carrot” when the user types
the query “carot”.
- Two techniques for solving this problem:
i. Edit distance
ii. K-gram overlap

IMPLEMENTING SPELLING CORRECTION

- Of various alternative correct spellings for a misspelled query, choose the nearest one (i.e. the
smallest edit distance).

- When two correctly spelled queries are tied, select the one that is more common. For example:
“grunt” and “grant” both seem equally plausible as corrections for “grnt”. The tie is then broken
by choosing the term (grunt or grant) that occurs more often in the collection, or that users type more often in their queries.
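The selection rule above (nearest term first, more common term on a tie) can be sketched as follows. The frequency table is made up for illustration, and `correct` is a hypothetical helper that scans the whole vocabulary, which is only practical for small vocabularies.

```python
def edit_distance(s1, s2):
    """Levenshtein distance, keeping only the previous row of the table."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j - 1] + (c1 != c2),   # replace (or keep if equal)
                            prev[j] + 1,                # delete from s1
                            curr[j - 1] + 1))           # insert into s1
        prev = curr
    return prev[-1]

def correct(query_term, term_counts):
    """Pick the vocabulary term nearest to query_term; break ties by higher count."""
    return min(term_counts,
               key=lambda t: (edit_distance(query_term, t), -term_counts[t]))

# Hypothetical collection frequencies.
TERM_COUNTS = {"grant": 120, "grunt": 15, "carrot": 40, "great": 300}
print(correct("grnt", TERM_COUNTS))    # grant
print(correct("carot", TERM_COUNTS))   # carrot
```

Both “grant” and “grunt” are at distance 1 from “grnt”, so the higher count decides; “great” is more frequent overall but is at distance 2 and therefore loses.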

FORMS OF SPELLING CORRECTION

- Two forms:
i. Isolated term correction
ii. Context sensitive correction
- Isolated term correction corrects a single query term at a time, even in a
multi-term query.
- But sometimes isolated term correction fails to detect an error, because each term on its own is a valid word.
- For example: “flew form Nepal” contains a misspelling of the term “from”, but it is not detected by
isolated term correction. In such cases we need context sensitive correction.

EDIT DISTANCE
- Given two character strings S1 and S2, the edit distance between them is the minimum number of
edit operations required to transform S1 into S2.
- The most common edit operations are:
i. Insert a character into a string.
ii. Delete a character from a string.
iii. Replace a character of a string by another character.
- Edit distance is also called Levenshtein distance.
- Algorithm:
EDITDISTANCE (S1, S2)
  int M[i, j] = 0 for all i, j
  for i = 1 to |S1|
    do M[i, 0] = i
  for j = 1 to |S2|
    do M[0, j] = j
  for i = 1 to |S1|
    do for j = 1 to |S2|
      do M[i, j] = min { M[i-1, j-1] + (if S1[i] = S2[j] then 0 else 1),
                         M[i-1, j] + 1,
                         M[i, j-1] + 1 }
  return M[|S1|, |S2|]
- The [i, j] entry of the matrix (after execution of the algorithm) holds the edit distance between the
strings consisting of the first i characters of S1 and the first j characters of S2.
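A direct Python transcription of the algorithm above is given here as a sketch. The function names are chosen for illustration, and the only change from the pseudocode is the index shift: Python strings are 0-indexed, so character i of S1 is `s1[i - 1]`.

```python
def edit_distance_matrix(s1, s2):
    """Fill and return the (len(s1)+1) x (len(s2)+1) matrix M of the algorithm."""
    m = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(1, len(s1) + 1):
        m[i][0] = i                        # delete all i characters of the prefix
    for j in range(1, len(s2) + 1):
        m[0][j] = j                        # insert all j characters of the prefix
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            m[i][j] = min(m[i - 1][j - 1] + cost,   # replace or keep
                          m[i - 1][j] + 1,          # delete
                          m[i][j - 1] + 1)          # insert
    return m

def edit_distance(s1, s2):
    """Edit distance is the bottom-right entry M[|S1|, |S2|]."""
    return edit_distance_matrix(s1, s2)[-1][-1]

print(edit_distance("misspell", "mistell"))      # 2
print(edit_distance("misspell", "misspelling"))  # 3
print(edit_distance("carot", "carrot"))          # 1
```

The running time and memory are both proportional to |S1| × |S2|, since every cell of the matrix is filled exactly once.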

K-GRAM INDEXES FOR SPELLING CORRECTION

- A k-gram is a sequence of k characters.
- Example: “cas”, “ast”, “stl” are 3-grams occurring in the term “castle”.
- Use the k-gram index to retrieve vocabulary terms that have many k-grams in common with the
query.
- Example:

Fig.: Matching at least two of the three 2-grams in the query “bord”
- The query “bord” has three bigrams: “bo”, “or” and “rd”. Suppose we want to retrieve vocabulary
terms that contain at least two of these bigrams. We
would enumerate aboard, boardroom and border.
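The k-gram lookup can be sketched as follows: build an index from each bigram to the vocabulary terms containing it, then count, for each term, how many of the query's bigrams it shares. The vocabulary, function names and the threshold parameter `min_overlap` are assumptions for illustration, and no word-boundary markers are added to the k-grams.

```python
from collections import Counter, defaultdict

def kgrams(term, k=2):
    """All overlapping character k-grams of term (no boundary markers)."""
    return {term[i:i + k] for i in range(len(term) - k + 1)}

def build_kgram_index(vocabulary, k=2):
    """Map each k-gram to the set of vocabulary terms that contain it."""
    index = defaultdict(set)
    for term in vocabulary:
        for gram in kgrams(term, k):
            index[gram].add(term)
    return index

def candidates(query, index, k=2, min_overlap=2):
    """Vocabulary terms sharing at least min_overlap distinct k-grams with query."""
    hits = Counter()
    for gram in kgrams(query, k):
        for term in index.get(gram, ()):
            hits[term] += 1
    return sorted(t for t, n in hits.items() if n >= min_overlap)

# Hypothetical vocabulary.
VOCAB = ["aboard", "about", "ardent", "boardroom", "border", "morbid"]
INDEX = build_kgram_index(VOCAB)
print(candidates("bord", INDEX))   # ['aboard', 'boardroom', 'border']
```

As the output shows, k-gram overlap alone can return terms such as “boardroom” that are far from the query in edit distance, so the candidates are usually filtered further, for example by edit distance.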
