
Irs 3

This document discusses query processing and operations. It covers different types of queries including keyword-based querying, phrase queries, proximity queries, Boolean queries, and natural language queries. Keyword-based querying is further broken down into single word queries and context queries like phrase and proximity queries. Boolean queries use operators like AND, OR, and NOT to combine keywords. Natural language queries aim to understand the structure and meaning of questions or narratives.

Uploaded by

Vanraj Pardeshi
Copyright
© All Rights Reserved

Module III

CHAPTER 3 : Query Processing and Operations

Syllabus
Query Languages: Keyword-based Querying, Pattern Matching, Structural Queries, Query Protocols; Query Operations: User relevance feedback; Multimedia IR models: Data Modeling
Self-learning Topics: Proximity Queries and Wildcard Queries

3.1 WHAT IS QUERY PROCESSING ?

GQ. Write the types of queries.


Query processing is the activity of extracting data from a database.
It proceeds through several steps to fetch the data from the database.

The steps involved are: parsing, translation, and optimization.

Queries applied to structured and unstructured data stored in databases, combined with information retrieval techniques, can lead to faster and more efficient processing of data.
When a database is queried, it generates results using one of multiple available plans.

3.1.1 Query Languages

(1) Keyword-based Querying
(2) Pattern Matching
(3) Structural Queries
(4) Query Protocols
3.2 KEYWORD-BASED QUERYING

GQ. Write a short note on Keyword-based Querying.

It is the simplest and most widely used kind of IR query.
It requires the user to simply enter keyword combinations to look for documents.
The majority of the time, people look for similar documents online.
A logical AND operator creates an implied connection between the query keyword terms.
For example, when searching for "information retrieval", the retrieved result will be documents that contain both "information" and "retrieval".
Additionally, the majority of systems also retrieve documents that merely contain the words "information" or "retrieval" in them.
Before delivering the filtered query keywords to the IR engine, most systems preprocess the data by removing the most frequent words (stopwords), such as a, the, of, and so on.
The order of these terms in the query is typically ignored by IR systems.
Keyword searches are supported by all retrieval models.

3.2.1 Types of Keyword-Based Querying

GQ. What are the different types of Keyword-based Querying?

(a) Single Word Queries
(b) Context Queries
    Phrase
    Proximity
(c) Boolean Queries
    OR, AND, BUT
(d) Natural Language
(e) Wildcard Queries

(a) Single Word Queries
A query is formulated by a word.
A document is formulated by long sequences of words.
A word is a sequence of letters surrounded by separators, for example, the word 'online'.
The division of the text into words is not arbitrary.
Word queries return a list of documents that contain at least one of the query words.
The level of similarity between the returned documents and the query determines their ranking.
Term frequency and inverse document frequency are commonly used to support ranking.

(b) Context Queries
Search words in a given context, that is, near other words.
Words that are near together suggest a higher possibility of relevance than words that are far apart.

Types :
(1) Phrase (2) Proximity

Phrase
It is a sequence of single-word queries.
An occurrence of the phrase is a sequence of words, for example, "enhance retrieval".
The phrase is generally enclosed within double quotes.
Each retrieved document must contain at least one instance of the exact phrase.

Proximity
Proximity refers to search that accounts for how close within a record multiple items should be to each other.
It is a more relaxed version of the phrase query.
Here, a sequence of single words or phrases, and a maximum allowed distance between them, are specified.
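The difference between a phrase query and a proximity query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `tokenize`, `phrase_match` and `proximity_match`, the simple regular-expression tokenizer, the sample sentence, and the reading of "maximum allowed distance" as a difference in word positions are all introduced here and are not part of the original notes.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words (assumed, very simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_match(text, phrase):
    """True if the words of `phrase` occur contiguously and in the same order."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))

def proximity_match(text, first, second, max_distance, ordered=True):
    """True if `first` and `second` occur at most `max_distance` word positions apart."""
    words = tokenize(text)
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    for i in pos_first:
        for j in pos_second:
            gap = j - i
            if ordered and 0 < gap <= max_distance:
                return True
            if not ordered and 0 < abs(gap) <= max_distance:
                return True
    return False

doc = "New indexes enhance the power of retrieval in large collections."
print(phrase_match(doc, "enhance retrieval"))           # exact phrase is absent
print(proximity_match(doc, "enhance", "retrieval", 4))  # the words are 4 positions apart
```

The `ordered` flag reflects that a system may or may not require the words to appear in the same order as in the query.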

(New Syll. w.e.f. academic year 22-23) (M7-87) Tech-Neo Publications
For example, "enhance retrieval" should occur within 4 words will match "...enhance the power of retrieval...".
The words or phrases may or may not be required to appear in the same order as in the query.

(c) Boolean Queries
Boolean queries give a syntax composed of atoms that retrieve documents, and of Boolean operators which work on their operands.
Some IR systems allow keyword combinations using the formulations of AND, OR, NOT, ( ), +, and -.
For example, a query syntax tree is shown in Fig. 3.2.1.

[Figure: syntax tree with the operators AND and OR over the terms translation, syntax and syntactic]
Fig. 3.2.1 : A query syntax tree

AND requires that both terms be found.
OR lets either term be found.
NOT means any record containing the second term will be excluded.
( ) means the Boolean operators can be nested using parentheses.
+ is equivalent to AND, requiring the term; the '+' should be placed directly in front of the search term.
- is equivalent to AND NOT and means to exclude the term; the '-' should be placed directly in front of the search term not wanted.
Complex Boolean queries can be built out of these operators and their combinations, and they are evaluated according to the classical rules of Boolean algebra.
No ranking is possible, because a document either satisfies such a query (is "relevant") or does not satisfy it (is "nonrelevant").
A document is retrieved for a Boolean query if the query is logically true as an exact match in the document.

Fuzzy Boolean
Retrieve documents appearing in some operands (the AND may require it to appear in more operands than the OR).

(d) Natural Language
It is a generalization of "fuzzy Boolean".
A query is an enumeration of words and context queries.
All the documents matching a portion of the user query are retrieved.
A few natural language search engines aim to understand the structure and meaning of queries written in natural language text, generally as a question or narrative.
The system tries to formulate answers for these queries from retrieved results.
Semantic models can provide support for this query type.

3.3 PATTERN MATCHING

GQ. Define Pattern Matching.
GQ. Write a short note on Pattern Matching.

Data retrieval: allow the retrieval of pieces of text that have some property (match a pattern).
A pattern is a set of syntactic features that must occur in a text segment.

3.3.1 Types of Pattern Based Querying

GQ. What are the different types of Pattern Matching based Querying?

(a) Words
Basic pattern.
A string which must be a word in the text.

(b) Prefixes
A string which must form the beginning of the text word.
For example, 'inter' in words 'interactive', 'international', etc.

(c) Suffixes
A string which must form the termination of the text word.
For example, 'dom' in words such as 'kingdom'.

(d) Substrings
A string which can appear within a text word.
For example, 'pal' in words 'palm', 'pals', 'principal'.

(e) Ranges
Matches any word lying between a pair of strings in lexicographical order (alphabetical order).
For example, the range between 'held' and 'hold' will retrieve words such as 'hoax' and 'hissing'.

(f) Allowing errors
A word together with an error threshold.
Retrieve all text words which are 'similar' to the given word.
The pattern or text may have errors from typing, spelling, or from optical character recognition.
Models which can be used for information retrieval are:
    Edit distance
        The minimum number of character insertions, deletions, and replacements needed to make two strings equal, for example, 'flower' and 'flo wer' (edit distance 1).
    Maximum allowed edit distance
        The query specifies the maximum number of allowed errors for a word to match the pattern.
        Extended to search substrings and not only words.

(g) Regular expressions
General pattern built up by simple strings and the following operators:
    union: if e1 and e2 are regular expressions, then (e1|e2) matches what e1 or e2 matches.
    concatenation: if e1 and e2 are regular expressions, the occurrences of (e1 e2) are formed by the occurrences of e1 immediately followed by those of e2.
    repetition: if e is a regular expression, then (e*) matches a sequence of zero or more contiguous occurrences of e.
For example, 'pro(blem|tein)(s|ε)(0|1|2)*' matches 'problem2' and 'proteins'.

3.4 STRUCTURAL QUERIES

GQ. What are structural Queries? Give different structures.
GQ. Explain the following data structures giving suitable examples :
    (a) Fixed (b) Hypertext (c) Hierarchical

Allow the user to query the text based on its structure.
Mixing contents and structure in queries:
    Contents : words, phrases, or patterns
    Structural constraints : containment, proximity, or other restrictions on structural elements
Three main structures :
(a) Fixed structure
(b) Hypertext structure
(c) Hierarchical structure

3.4.1 Fixed Structure
Fixed structure for text retrieval, such as a form, is shown in Fig. 3.4.1.
For example : Mail archive
Each mail has a sender, a receiver, a date, a subject and a body field as a fixed structure.
Easy to search a mail based on a date, a receiver, a subject and so on.
Other examples : Log file (Document : a fixed set of fields)
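As an illustration of mixing contents and structure over a fixed structure, the mail archive can be modelled as records that all share the same fields. The sketch below is hypothetical: the `Mail` dataclass, the `search` helper, the sample messages and the date format are assumptions made for this example, not something prescribed by the notes.

```python
from dataclasses import dataclass

@dataclass
class Mail:
    sender: str
    receiver: str
    date: str      # assumed ISO date string, e.g. "2023-01-15"
    subject: str
    body: str

def search(mails, body_word=None, **field_values):
    """Return mails whose fields equal `field_values` (structure) and whose body contains `body_word` (content)."""
    results = []
    for mail in mails:
        fields_ok = all(getattr(mail, name) == value for name, value in field_values.items())
        content_ok = body_word is None or body_word.lower() in mail.body.lower().split()
        if fields_ok and content_ok:
            results.append(mail)
    return results

archive = [
    Mail("ann@example.org", "bob@example.org", "2023-01-15", "Index update", "The inverted index was rebuilt"),
    Mail("bob@example.org", "ann@example.org", "2023-01-16", "Re: Index update", "Thanks, retrieval is faster now"),
    Mail("ann@example.org", "carl@example.org", "2023-01-16", "Meeting", "Agenda attached"),
]

hits = search(archive, body_word="index", sender="ann@example.org")
print([m.subject for m in hits])
```

Because every document has the same fixed set of fields, a structural constraint reduces to a simple field comparison; hypertext and hierarchical structures need graph or tree traversal instead.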

[Figure: a form made up of fields, each field holding text]
Fig. 3.4.1 : Form-like fixed structure

3.4.2 Hypertext Structure
Search by content and structure.
A hypertext is a directed graph where nodes hold some text (text contents).
The links represent connections between nodes or between positions inside nodes (structural connectivity).
The user had to manually traverse the hypertext nodes, following links, to search for what he wanted.
Fig. 3.4.2 represents a hypertext structure with the nodes and links.

[Figure: nodes connected by directed links]
Fig. 3.4.2 : Hypertext structure

Hypertext : WebGlimpse
WebGlimpse combines browsing and searching on the Web.
It supports traditional navigation and enables searching for content nearby the current node.

3.4.3 Hierarchical Structure
Intermediate level of flexibility.
Lies between fixed structure and hypertext structure.
Represents the recursive decomposition of the text.
Fig. 3.4.3 represents a schematic view of the hierarchical structure.

Fig. 3.4.3 : Hierarchical structure

Fig. 3.4.4 gives an example of a hierarchical structure with the page of a book, its semantic view and a parsed query to retrieve the figure.

[Figure: a book page ("Introduction", "We cover...", "Structural..."); its semantic view (a chapter containing sections, each with a title, and a figure); and the parsed query (figure / section / title "structural", joined by the operators "in" and "with")]
Fig. 3.4.4 : An example - hierarchical structure
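The hierarchical example can be sketched as a small tree search. The node kinds (chapter, section, title, figure) are taken from Fig. 3.4.4; the `Node` class, the sample tree and the function `figures_in_sections_titled` are illustrative assumptions, and the query is read here as "figures contained in a section whose title contains a given word".

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                      # e.g. "chapter", "section", "title", "figure"
    text: str = ""
    children: List["Node"] = field(default_factory=list)

def descendants(node):
    """Yield every node below `node` in the hierarchy."""
    for child in node.children:
        yield child
        yield from descendants(child)

def figures_in_sections_titled(root, word):
    """Find figure nodes contained in a section whose title contains `word`."""
    found = []
    for node in [root, *descendants(root)]:
        if node.kind != "section":
            continue
        titled = any(c.kind == "title" and word.lower() in c.text.lower() for c in node.children)
        if titled:
            found.extend(d for d in descendants(node) if d.kind == "figure")
    return found

book = Node("chapter", children=[
    Node("section", children=[Node("title", "Introduction"), Node("paragraph", "We cover ...")]),
    Node("section", children=[Node("title", "Structural Queries"), Node("figure", "Fig. A")]),
])
print([f.text for f in figures_in_sections_titled(book, "structural")])
```

The structural part of the query is answered by containment in the tree, while the content part is an ordinary word match on the title text.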

3.5 SAMPLE OF HIERARCHICAL MODELS

GQ. Discuss different samples of hierarchical structure models with suitable examples.
GQ. Explain PAT in detail.

3.5.1 PAT Expressions
Built on the same index as the text index.
Structure is presumptively indicated in the text by tags.
Structure is defined in terms of initial and final tags.
A region is defined by each pair of initial and final tags.
The model allows for the areas of a region to overlap or nest.
PAT is a text searching system.
Developed at the University of Waterloo.
PAT interprets text as a set of suffix strings.
For example, indexing every word in this sentence yields the 12 strings:
    For example, indexing every word in this sentence yields the 12 strings
    example, indexing every word in this sentence yields the 12 strings
    indexing every word in this sentence yields the 12 strings
    every word in this sentence yields the 12 strings
    word in this sentence yields the 12 strings
    in this sentence yields the 12 strings
    this sentence yields the 12 strings
    sentence yields the 12 strings
    yields the 12 strings
    the 12 strings
    12 strings
    strings

3.5.2 Overlapped Lists
The model considers the use of an inverted list to index words as well as regions.
The model allows set union to be performed and regions to be combined.
The model allows for the areas of a region to overlap, but not to nest.
A 'followed by' operator adds the extra restriction requiring that the first region come before the second area.
An 'n words' operator creates the region containing all of the text's sequences of n words.
It is not clear whether overlapping is good or not for capturing the structural properties.

3.5.3 Lists of References
The model makes the definition and querying of structured text uniform.
The structure of the document is fixed and hierarchical.
All possible regions are defined at indexing time.
Overlap and nest are not allowed.
All elements must be of the same type, e.g. only sections, or only paragraphs.
The answer to the query is seen as a list of 'references'.
A reference is a pointer to a region of the database.

3.5.4 Proximal Nodes
This model tries to find a good compromise between expressiveness and efficiency.
It does not define a specific language, but a model in which it is shown that a number of useful operators can be included while achieving good efficiency.
The structure of the document is fixed and hierarchical.
The model allows nested elements but no overlaps.

3.5.5 Tree Matching
The model relies on tree inclusion.
It interprets the structure of both the text database and the query as trees, in order to determine the embedding of the query into the database, respecting the hierarchical relationships between the query's nodes.
The leaves of the query can be not only structural elements but also patterns, meaning that the ancestor of the leaf must contain that pattern.
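The suffix-string view of text used by PAT (Section 3.5.1) can be reproduced with a short sketch. The helper name `suffix_strings` and the choice of splitting on whitespace are assumptions made for illustration; a real PAT index stores positions in the text rather than copies of the strings.

```python
def suffix_strings(text):
    """Return one suffix string per word: the text from that word to the end."""
    words = text.split()
    return [" ".join(words[i:]) for i in range(len(words))]

sentence = "For example, indexing every word in this sentence yields the 12 strings"
suffixes = suffix_strings(sentence)
print(len(suffixes))

# Sorting the suffix strings groups together all those that share a prefix.
for s in sorted(suffixes, key=str.lower)[:3]:
    print(s)
```

The sentence has 12 words, so indexing every word gives 12 suffix strings, matching the list in Section 3.5.1.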
3.6 QUERY PROTOCOLS

GQ. Write a short note on Query Protocols.

(a) Z39.50
Approved by the American National Standards Institute (ANSI) and the National Information Standards Organization (NISO) in 1995.
Can be implemented on any platform.
Query bibliographical information using a standard interface between the client and the host database manager.
Along with the query language, the protocol also specifies a way in which client and server establish a session, communicate and exchange information, etc.
The Z39.50 protocol is part of WAIS.

Z39.50 Brief history
Work on the Z39.50 protocol began in the 1970s and led to successive versions in 1988, 1992, 1995 and 2003:
    Z39.50-1988 (version 1)
    Z39.50-1992 (version 2)
    Z39.50-1995 (version 3)
    Z39.50-2003 (version 4)

[Figure: WWW client, WWW / Z39.50 server, Z39.50 client, Z39.50 repository, digital library]
Fig. 3.6.1 : Using Z39.50 over the WWW

(b) WAIS
Wide Area Information Service
Beginning in the 1990s
Network publishing protocol
Query databases through the Internet

(c) CCL
Common Command Language
NISO proposal based on Z39.50
Defines 19 commands
More popular in Europe
Based on the classical Boolean model

(d) CD-RDx
Compact Disk Read only Data exchange
Uses client server architecture on most platforms
Client is generic
Server is designed and provided by the CD-ROM publisher
Allows fixed length fields, images and audio
Supported by CIA, NASA and GSA

(e) SFQL
Structured Full-text Query Language
Based on SQL
Uses client server architecture
Adopted as a standard by the aerospace community
Documents are rows in relational tables which are tagged using SGML
Answer format has header and message area

3.7 TRENDS AND RESEARCH ISSUES
Table 3.7.1 shows the different basic queries allowed in the different models.

Probabilistic and Bayesian Belief Network (BBN) models incorporate set operations.

Table 3.7.1 : Relationship between types of queries and models

    Model                            Queries allowed
    Boolean                          Words
    Vector                           Words
    Probabilistic                    Words
    Bayesian Belief Network (BBN)    Words

3.8 QUERY LANGUAGE TAXONOMY
Fig. 3.8.1 represents the types of operations covered so far and how they can be structured.

[Figure labels: structured queries, Boolean queries, fuzzy Boolean, natural language, basic queries, keywords and context, words, phrases, proximity, pattern matching, substrings, prefixes, suffixes, errors, regular expressions, extended patterns]
Fig. 3.8.1 : Query Language Taxonomy

3.9 QUERY OPERATIONS
It is difficult to formulate queries which are well designed for retrieval purposes.
Improving the initial query formulation through query expansion and term reweighting.
Approaches based on:
    feedback information from the user
    information derived from the set of documents initially retrieved (called the local set of documents)
    global information derived from the document collection

3.10 USER RELEVANCE FEEDBACK

GQ. Define Relevance feedback model.
GQ. Give short notes for User Relevance Feedback. OR Give brief notes about the user Relevance feedback method and how it is used in query expansion.
GQ. What are the two basic approaches in User Relevance Feedback for query processing?

The user receives a list of searched documents and, after reviewing, marks the relevant documents.
A selection of key terms or expressions that are attached to the documents identified by the user as relevant.

Definition : Relevance Feedback Model
After initial retrieval, results are presented, allowing the user to provide feedback on the relevance of one or more of the retrieved documents.
Use this feedback information to reformulate the query and produce new results based on the reformulated query. This allows a more interactive multi-pass process.

Two basic operations :
    Query expansion : addition of new terms from relevant documents (expand queries with the vector model)
    Term reweighting : modification of term weights based on the user relevance judgement

The usage of user relevance feedback to :
(a) expand queries with the vector model
(b) reweight query terms with the probabilistic model
(c) reweight query terms with a variant of the probabilistic model
3.11 VECTOR MODEL

GQ. How do you calculate the term weighting in document and query term weight in the Vector Model?

Define :

Weight :
    Let $k_i$ be a generic index term in the set $K = \{k_1, \ldots, k_t\}$.
    A weight $w_{i,j} > 0$ is associated with each index term $k_i$ of a document $d_j$.

Document index term vector :
    The document $d_j$ is associated with an index term vector $\vec{d}_j$ represented by
    $\vec{d}_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$    ...(3.11.1)

The term weighting :
    $w_{i,j} = f_{i,j} \times \log \frac{N}{n_i}$    ...(3.11.2)
    where $N$ is the total number of documents and $n_i$ is the number of documents which contain the index term $k_i$.

The normalized frequency :
    $f_{i,j} = \frac{freq_{i,j}}{\max_l freq_{l,j}}$    ...(3.11.3)
    $freq_{i,j}$ is the raw frequency of $k_i$ in the document $d_j$.

Inverse document frequency for $k_i$ :
    $idf_i = \log \frac{N}{n_i}$    ...(3.11.4)

The query term weight :
    $w_{i,q} = \left(0.5 + \frac{0.5 \, freq_{i,q}}{\max_l freq_{l,q}}\right) \times \log \frac{N}{n_i}$    ...(3.11.5)

Query vector :
    The query vector $\vec{q}$ is defined as $\vec{q} = (w_{1,q}, w_{2,q}, \ldots, w_{t,q})$.

$D_r$ : set of relevant documents identified by the user
$D_n$ : set of non-relevant documents among the retrieved documents
$C_r$ : set of relevant documents among all documents in the collection
$\alpha, \beta, \gamma$ : tuning constants

3.11.1 Query Expansion and Term Reweighting for the Vector Model

GQ. What are the three classic and similar ways to calculate the modified query $\vec{q}_m$?

Ideal case : if the complete set $C_r$ of relevant documents to a given query $q$ were known, the best query vector would be given by

    $\vec{q}_{opt} = \frac{1}{|C_r|} \sum_{\forall \vec{d}_j \in C_r} \vec{d}_j - \frac{1}{N - |C_r|} \sum_{\forall \vec{d}_j \notin C_r} \vec{d}_j$    ...(3.11.6)

The relevant documents $C_r$ are not known a priori; they are what the search is looking for.

The 3 classic and similar ways to calculate the modified query $\vec{q}_m$ are:

Standard_Rocchio :
    $\vec{q}_m = \alpha \vec{q} + \frac{\beta}{|D_r|} \sum_{\forall \vec{d}_j \in D_r} \vec{d}_j - \frac{\gamma}{|D_n|} \sum_{\forall \vec{d}_j \in D_n} \vec{d}_j$    ...(3.11.7)

Ide_Regular :
    $\vec{q}_m = \alpha \vec{q} + \beta \sum_{\forall \vec{d}_j \in D_r} \vec{d}_j - \gamma \sum_{\forall \vec{d}_j \in D_n} \vec{d}_j$    ...(3.11.8)

Ide_Dec_Hi :
    $\vec{q}_m = \alpha \vec{q} + \beta \sum_{\forall \vec{d}_j \in D_r} \vec{d}_j - \gamma \max_{non\text{-}relevant}(\vec{d}_j)$    ...(3.11.9)

$D_r$ and $D_n$ are the document sets which the user judged.
The Rocchio formulation is basically a direct adaptation of Equation (3.11.6) in which the terms of the original query are added in.

Advantages : Simplicity and good results
Disadvantages : No optimality criterion is adopted
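The three feedback formulas of Section 3.11.1 can be compared on a toy example. This is a minimal sketch under stated assumptions: vectors are plain Python lists of term weights, the tuning constants default to 1, the function names are introduced here, and for Ide_Dec_Hi the caller is assumed to pass the single highest-ranked non-relevant document as `top_non_relevant`. Negative weights are left as they come out rather than clipped.

```python
def vector_sum(vectors, size):
    """Component-wise sum of a list of equal-length vectors."""
    total = [0.0] * size
    for vec in vectors:
        total = [a + b for a, b in zip(total, vec)]
    return total

def standard_rocchio(q, relevant, non_relevant, alpha=1.0, beta=1.0, gamma=1.0):
    """Eq. (3.11.7): the two sums are normalized by |D_r| and |D_n|."""
    pos = vector_sum(relevant, len(q))
    neg = vector_sum(non_relevant, len(q))
    b = beta / len(relevant) if relevant else 0.0
    g = gamma / len(non_relevant) if non_relevant else 0.0
    return [alpha * qi + b * p - g * n for qi, p, n in zip(q, pos, neg)]

def ide_regular(q, relevant, non_relevant, alpha=1.0, beta=1.0, gamma=1.0):
    """Eq. (3.11.8): same as above but without normalizing the sums."""
    pos = vector_sum(relevant, len(q))
    neg = vector_sum(non_relevant, len(q))
    return [alpha * qi + beta * p - gamma * n for qi, p, n in zip(q, pos, neg)]

def ide_dec_hi(q, relevant, top_non_relevant, alpha=1.0, beta=1.0, gamma=1.0):
    """Eq. (3.11.9): only one non-relevant document is subtracted."""
    pos = vector_sum(relevant, len(q))
    return [alpha * qi + beta * p - gamma * n for qi, p, n in zip(q, pos, top_non_relevant)]

q = [1.0, 0.0, 1.0]                          # original query weights for 3 index terms
D_r = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]     # documents judged relevant
D_n = [[0.0, 0.0, 1.0]]                      # documents judged non-relevant

print(standard_rocchio(q, D_r, D_n))
print(ide_regular(q, D_r, D_n))
print(ide_dec_hi(q, D_r, D_n[0]))
```

In each case the second term, absent from the original query, receives a positive weight (query expansion), while the third term is pushed down because it appears only in the non-relevant document (term reweighting).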

3.12 TERM REWEIGHTING FOR THE PROBABILISTIC MODEL

GQ. How do you calculate the term reweighting in the Probabilistic Model?

Similarity :
    The correlation between the vectors $\vec{d}_j$ and $\vec{q}$; this correlation can be quantified as:
    $sim(d_j, q) = \frac{\vec{d}_j \bullet \vec{q}}{|\vec{d}_j| \times |\vec{q}|}$

The probabilistic model, according to the probabilistic ranking principle :
    $P(k_i | R)$ : the probability of observing the term $k_i$ in the set $R$ of relevant documents
    $P(k_i | \bar{R})$ : the probability of observing the term $k_i$ in the set $\bar{R}$ of non-relevant documents

The similarity of a document $d_j$ to a query $q$ can be expressed as:

    $sim(d_j, q) \propto \sum_{i=1}^{t} w_{i,q} \, w_{i,j} \left( \log \frac{P(k_i | R)}{1 - P(k_i | R)} + \log \frac{1 - P(k_i | \bar{R})}{P(k_i | \bar{R})} \right)$    ...(3.12.1)

For the initial search
The above equation is estimated using the following assumptions:
    $P(k_i | R) = 0.5$
    $P(k_i | \bar{R}) = \frac{n_i}{N}$    ...(3.12.2)
$n_i$ is the number of documents which contain the index term $k_i$.

For the feedback search
$P(k_i | R)$ and $P(k_i | \bar{R})$ can be approximated as:
    $P(k_i | R) = \frac{|D_{r,i}|}{|D_r|}$    ...(3.12.3)
    $P(k_i | \bar{R}) = \frac{n_i - |D_{r,i}|}{N - |D_r|}$    ...(3.12.4)
$D_r$ is the set of relevant documents according to the user judgement.
$D_{r,i}$ is the subset of $D_r$ composed of the documents that contain the term $k_i$.

The similarity of $d_j$ to $q$ :

    $sim(d_j, q) \propto \sum_{i=1}^{t} w_{i,q} \, w_{i,j} \log \left[ \frac{|D_{r,i}|}{|D_r| - |D_{r,i}|} \div \frac{n_i - |D_{r,i}|}{N - |D_r| - (n_i - |D_{r,i}|)} \right]$    ...(3.12.5)

No query expansion occurs in the procedure.

Adjustment factor
Because $|D_r|$ and $|D_{r,i}|$ are in certain cases small, a 0.5 adjustment factor is added to $P(k_i | R)$ and $P(k_i | \bar{R})$:
    $P(k_i | R) = \frac{|D_{r,i}| + 0.5}{|D_r| + 1}$    ...(3.12.6)
    $P(k_i | \bar{R}) = \frac{n_i - |D_{r,i}| + 0.5}{N - |D_r| + 1}$    ...(3.12.7)

Alternative adjustment factor $n_i / N$ :
    $P(k_i | R) = \frac{|D_{r,i}| + \frac{n_i}{N}}{|D_r| + 1}$    ...(3.12.8)
    $P(k_i | \bar{R}) = \frac{n_i - |D_{r,i}| + \frac{n_i}{N}}{N - |D_r| + 1}$    ...(3.12.9)

Advantages
(1) The feedback process is directly related to the derivation of new weights for query terms, and the term reweighting is optimal under the assumptions of term independence and binary document indexing.

Disadvantages
(1) Document term weights are not taken into account during the feedback loop.
(2) Weights of terms in the previous query formulations are also disregarded.
(3) No query expansion is used.
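The feedback estimates with the 0.5 adjustment factor can be plugged into the log-odds term of Eq. (3.12.1) as follows. This is a sketch only: the function name `term_relevance_weight`, its argument names and the collection sizes in the example are hypothetical, and binary weights $w_{i,q} = w_{i,j} = 1$ are assumed so that the function returns the contribution of one matching term.

```python
import math

def term_relevance_weight(n_i, N, D_r, D_ri, adjustment=0.5):
    """Log-odds weight of term k_i using the adjusted estimates (3.12.6) and (3.12.7).

    n_i  : number of documents containing the term
    N    : total number of documents
    D_r  : number of documents judged relevant by the user
    D_ri : number of those relevant documents that contain the term
    """
    p_rel = (D_ri + adjustment) / (D_r + 1)              # P(k_i | R)
    p_non = (n_i - D_ri + adjustment) / (N - D_r + 1)    # P(k_i | not R)
    return math.log(p_rel / (1 - p_rel)) + math.log((1 - p_non) / p_non)

N, D_r = 1000, 10
w_specific = term_relevance_weight(n_i=100, N=N, D_r=D_r, D_ri=8)   # rare term, mostly in relevant docs
w_common = term_relevance_weight(n_i=900, N=N, D_r=D_r, D_ri=9)     # term found almost everywhere
print(round(w_specific, 3), round(w_common, 3))
```

Passing `adjustment=n_i / N` instead of 0.5 gives the alternative estimates of Eqs. (3.12.8) and (3.12.9). A term concentrated in the relevant documents gets a large positive weight, while a term that occurs in most of the collection can even get a negative one.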
3.13 A VARIANT OF PROBABILISTIC TERM REWEIGHTING

GQ. Discuss the variant of probabilistic term reweighting.

In 1983, Croft extended the above weighting scheme by suggesting distinct initial search methods and by adapting the probabilistic formula to include within-document frequency weights.

The variant of probabilistic term reweighting :

    $sim(d_j, q) \propto \sum_{i=1}^{t} w_{i,q} \, w_{i,j} \, F_{i,j,q}$    ...(3.13.1)

$F_{i,j,q}$ is a factor which depends on the triple $[k_i, d_j, q]$.
Distinct formulations are used for the initial search and the feedback searches.

Initial search :
    $F_{i,j,q} = (C + idf_i) \, \bar{f}_{i,j}$    ...(3.13.2)
    $\bar{f}_{i,j} = K + (1 - K) \frac{f_{i,j}}{\max(f_{i,j})}$    ...(3.13.3)
$\bar{f}_{i,j}$ is a normalized within-document frequency. $C$ and $K$ should be adjusted according to the collection.

Feedback searches :
    $F_{i,j,q} = \left( C + \log \frac{P(k_i | R)}{1 - P(k_i | R)} + \log \frac{1 - P(k_i | \bar{R})}{P(k_i | \bar{R})} \right) \bar{f}_{i,j}$    ...(3.13.4)

Advantages
(1) Considers within-document frequency.
(2) Adopts normalized frequencies.
(3) Introduces the constants C and K to provide greater flexibility.

Disadvantages
(1) Constitutes a more complex formulation.
(2) No query expansion.

3.14 MULTIMEDIA IR

GQ. Discuss the architecture of a Multimedia IR system.
GQ. Give basic steps for data retrieval in a Multimedia IR system.
GQ. Write a short note on data retrieval in a Multimedia IR system.

The architecture of a Multimedia IR system depends on two main factors :
(1) The peculiar characteristics of multimedia data
(2) The kinds of operations to be performed on such data

A Multimedia IR system supports a variety of data and different kinds of media :
(1) Text, images (both still and moving), graphs, and sound
(2) Mix of structured and unstructured data
(3) Metadata
(4) Semi-structured data
(5) Data whose structure may not match, or only partially match, the structure prescribed by the data schema
(6) The system must typically extract some features from the multimedia objects.

Data retrieval
Exploiting data attributes and the content of multimedia objects.

Basic steps for data retrieval :
(1) Query specification
    Fuzzy predicates (Find all images similar to a car)
    Content-based predicates (Find all objects containing an apple)
    Object attributes (Find all red images)
    Structural predicates (Find all multimedia containing a video clip)
(2) Query processing and optimization
    The query is parsed and compiled into an internal form
(3) Query answer
    The retrieved answers are returned to the user in decreasing order of relevance
(4) Query iteration
    The query execution is iterated until the user is satisfied

Combine DBMS and IR technology
    DBMS : Data modeling capabilities
    IR system : Similarity-based query capabilities

3.15 DATA MODELING

GQ. Explain the role of data modeling in a Multimedia IR system.

Main tasks in data modeling are :
(1) A data model should be defined by which the user can specify the data to be stored into the system
    o Support conventional and multimedia data types
    o Provide methods to analyze, retrieve, and query such data
(2) Provide a model for the internal representation of multimedia data

Object-oriented DBMS
    Provide a rich data model
    More suitable for modeling both multimedia data types and their semantic relationships
    A class in an OODBMS is characterized by both attributes and operations
    Classes are also related by inheritance hierarchies; hence a multimedia class is a specialization of one or more super classes

Drawback of OODBMSs
    The performance of storage techniques, query processing, and transaction management is not comparable to that of relational DBMSs
    Highly non-standard

Object-relational DBMS
    Extend the relational model
    Represent complex data types
    o Maintain the performance and the simplicity of relational DBMSs and related query languages
    Define abstract data types to allow one to define ad hoc data types for multimedia data

Multimedia data representation inside the system
    Using attributes is not sufficient to describe data
    Information is extracted from objects to use during query processing
    A multimedia object is represented as a set of features
    Features can be assigned manually, automatically, or using a hybrid approach
    Values of some specific features are assigned to an object by comparing the object with some previously classified objects
    Feature extraction cannot be precise
    A weight is usually assigned to each feature value, representing the uncertainty of assigning such a value to that feature
    For example, 80% sure that a shape is a square

3.16 SQL3

GQ. Explain the role of SQL3 in a Multimedia IR system.
GQ. Write a short note on the MULTOS Data Model.
GQ. Explain the MULTOS Data Model with a proper example.

Support an extensible type system
Provide constructs to define user-dependent abstract data types, in an object-oriented like manner
Provides three types of collection data types
    Sets, multisets, and lists
    The elements of a collection must have compatible types
Provides a restricted form of object identifier that supports sharing and avoids data duplication.

Example : MULTOS (MULTimedia Office Server)
    Multimedia document server
    Client/server
    Support filing and retrieval of multimedia objects
Documents are described by :
    logical structure : title, intro, chapter, ...
    layout structure : pages, frames, ...
    conceptual structure : allows content-based queries
Documents similar in conceptual structure are grouped into conceptual types.
Example : Generic_Letter
The conceptual structure of the types Generic_Letter and Business_Product_Letter is shown in Fig. 3.16.1 and Fig. 3.16.2 respectively.

[Figure: Document with components Place, Date, Receiver+, Sender and Letter_body; Receiver and Sender each have a Name and an Address; each Address has Street, City and Country]
Fig. 3.16.1 : Conceptual structure of the type Generic_Letter

Image data in MULTOS
Analysis
    low level : detect objects and positions
    high level : image interpretation
Result of analysis
    description of objects found and their classes
    certainty values
Indices are used for fast access to this information
    Object index : includes pointers to objects and certainty values
    Cluster index, with fuzzy clusters of similar images

3.17 WILDCARD QUERIES
It supports regular expressions and pattern matching-based searching in text.
Retrieval models do not directly provide support for this query type.
In IR systems, certain kinds of wildcard search support may be implemented.
Example : Usually words ending with trailing characters (for example, 'data*' would retrieve data, database, datapoint, dataset, and so on).
Providing support for wildcard searches in IR systems involves preprocessing overhead and is not considered worth the cost by many Web search engines today.
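A trailing wildcard such as 'data*' can be expanded against a sorted vocabulary, which also hints at where the preprocessing overhead comes from: the vocabulary has to be built and kept sorted before any query arrives. The sketch below is illustrative; `trailing_wildcard` and the sample vocabulary are hypothetical names and data, and only a single trailing '*' is handled.

```python
import bisect

def trailing_wildcard(sorted_vocab, pattern):
    """Expand a pattern such as 'data*' into the vocabulary words that start with the prefix."""
    if not pattern.endswith("*") or "*" in pattern[:-1]:
        raise ValueError("only a single trailing '*' is handled in this sketch")
    prefix = pattern[:-1]
    start = bisect.bisect_left(sorted_vocab, prefix)   # first word >= prefix
    matches = []
    for word in sorted_vocab[start:]:
        if not word.startswith(prefix):
            break                                      # sorted order: no later word can match
        matches.append(word)
    return matches

vocabulary = sorted(["data", "database", "datapoint", "dataset", "date", "query", "retrieval"])
print(trailing_wildcard(vocabulary, "data*"))
```

Each expanded word can then be looked up as an ordinary single-word query.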
[Figure: tree with Name and Address (Street, City, Country) components, together with Company_Logo (Image), Signature, Product_Presentation (Text), Product_Description (Text) and Product_Cost (Text)]
Fig. 3.16.2 : Complete conceptual structure of the type Business_Product_Letter

3.18 GENERAL QUESTIONS

Q. 1 What are the advantages and disadvantages of query processing?
Ans. :
Advantages
(1) It is simple : the modified term weights are computed directly from the set of retrieved documents.
(2) It gives good results : this is observed experimentally and is due to the fact that the modified query vector does reflect a portion of the intended query semantics.
Disadvantages
(1) No optimality criterion is adopted.

Q. 2 Explain the process for the User Relevance Feedback method.
Ans. :
It is the most popular query reformulation strategy.
In a relevance feedback cycle, the user is presented with a list of the retrieved documents.
The user then examines them and marks those which are relevant.
Only the top 10 (or 20) ranked documents are examined.
Selecting important terms, or expressions, attached to the documents.
Enhancing the importance of these terms in a new query formulation.
The new query will be :
(1) Moved towards the relevant documents
(2) Away from the non-relevant ones.

Q. 3 What are the advantages of the User Relevance Feedback method?
Ans. :
(1) It shields the user from the details of the query reformulation process, because all the user has to provide is a relevance judgement on documents.
(2) It breaks down the whole searching task into a sequence of small steps which are easier to grasp.
(3) It provides a controlled process designed to emphasize some terms (relevant ones) and de-emphasize others (non-relevant ones).

Q. 4 Discuss the parameters used in calculating a weight for a document term or query term.
Ans. :
(1) Term Frequency (tf) : Term frequency is the number of times a term i appears in document j (tf_ij).
(2) Document Frequency (df) : Number of documents a term i appears in (df_i).
(3) Inverse Document Frequency (idf) : A discriminating measure for a term i in the collection, i.e., how discriminating term i is. idf_i = log10(n / df_i), where n is the number of documents.
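The three parameters from the answer to Q. 4 can be computed on a toy collection as follows. This is a minimal sketch: the function name `term_statistics`, the three sample documents and the whitespace tokenization are assumptions made for illustration, while the idf formula is the log10(n / df_i) form used in this chapter.

```python
import math
from collections import Counter

def term_statistics(documents):
    """Return (tf, df, idf) for a list of tokenized documents.

    tf[j][term] : number of times the term appears in document j
    df[term]    : number of documents the term appears in
    idf[term]   : log10(n / df[term]), with n the number of documents
    """
    n = len(documents)
    tf = [Counter(doc) for doc in documents]
    df = Counter(term for doc in documents for term in set(doc))
    idf = {term: math.log10(n / count) for term, count in df.items()}
    return tf, df, idf

docs = [
    "information retrieval system".split(),
    "query processing in information retrieval".split(),
    "database query optimization".split(),
]
tf, df, idf = term_statistics(docs)
print(tf[1]["query"], df["information"], round(idf["database"], 3))
```

A term that occurs in fewer documents gets a larger idf, which is what makes it more discriminating.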
