5 The Term Vocabulary & Posting List

Uploaded by

Ajitesh Thawait


THE TERM VOCABULARY AND POSTINGS LISTS
INFORMATION RETRIEVAL AND WEB SEARCH

LECTURE 8

Outline
 Elaborate basic indexing
 Preprocessing to form the term vocabulary
 Documents
 Tokenization
 What terms do we put in the index?

Recall the basic indexing pipeline

 Collect the documents to be indexed
 Documents to be indexed: Friends, Romans, countrymen.
 Tokenize the text (tokenizer)
 Token stream: Friends Romans Countrymen
 Do linguistic pre-processing of tokens (linguistic modules)
 Modified tokens: friend roman countryman
 Index the documents that each term occurs in (indexer)
 Inverted index:
 friend → 2, 4
 roman → 1, 2
 countryman → 13, 16
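The four pipeline stages can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function names `tokenize`, `normalize`, and `build_index` are hypothetical, and the "linguistic" step is a toy rule (lowercase, strip a trailing "s") that would not turn "countrymen" into "countryman" the way a real lemmatizer would.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into tokens on runs of letters (a deliberately simple rule)."""
    return re.findall(r"[A-Za-z]+", text)

def normalize(token):
    """Toy linguistic step (assumption): lowercase and strip a trailing 's'."""
    token = token.lower()
    if token.endswith("s") and len(token) > 3:
        token = token[:-1]
    return token

def build_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[normalize(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Two made-up documents; the IDs are illustrative only.
docs = {
    1: "Friends, Romans, countrymen.",
    2: "Romans and friends",
}
index = build_index(docs)
print(index["friend"])  # [1, 2]
print(index["roman"])   # [1, 2]
```

Each dictionary key plays the role of a term, and each sorted list of document IDs plays the role of its postings list.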

Introduction to
Information Retrieval

Obtaining the Character Sequence In A Document

What exactly is a document?
 Before indexing, we need to understand the content of the digital document
 Documents are stored in digital formats, so they are called digital documents
 A document is the input to the indexing process
 It takes the form of bytes in a file (of some file type), which may reside on a web server

What exactly is a document?

 In order to work with documents
 We need to extract the character sequence from the documents

A document is a sequence of bytes in a file (that is how it is stored on the machine).

Take this sequence of bytes and extract the character sequence from it.

This sequence of characters is what we tokenize.

What exactly is a document?

File/Document → Byte sequence → (process into) Character sequence → Tokenize

What exactly is a document?

•We need to understand certain properties of a document before we can extract the character sequence from the byte sequence

•This task is called parsing a document

Parsing a Document: Bytes in File → Character Sequence

 Need to understand the format of the document

1) What is its format?
Documents come in different formats:
 PDF
 Word
 Excel
 HTML, etc.
Character sequences are extracted from each of these kinds of documents using various techniques.

Parsing a Document
2) What is its language?
 Plain English, Hindi, Bengali, etc.
 The tokenization and linguistic pre-processing steps depend on the language, i.e.
 the kind of tokenization you do
 the kind of linguistic pre-processing you do

We should know the language of the document in advance so that we can use an appropriate scheme for tokenization.

Parsing a Document
3) Which character set is it using?
 ASCII, Unicode UTF-8, or any other vendor-specific standard.

Working out the format, language, and character set in order to answer these three questions is a classification problem.

Parsing a Document

The solution to this classification problem is supervised learning.

 Classification: a machine is trained to assign something to a class.
 Here the classes correspond to
 different document formats
 different languages
 different character sets

In practice, these classification tasks are often done heuristically.

Now we can say that we can extract the character sequence out of the bytes in a file.
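As a rough illustration of the heuristic route, the sketch below tries a short list of candidate encodings and keeps the first one that decodes the bytes without error. The function name `decode_bytes` and the candidate list are assumptions made for this example; production systems typically rely on a dedicated detection library instead.

```python
def decode_bytes(raw, candidates=("utf-8", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Assumption: candidates are ordered from strictest to most permissive.
    Latin-1 accepts any byte sequence, so it only works as a last resort.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit the byte sequence")

# The same characters stored under two different character sets.
print(decode_bytes("café".encode("utf-8")))    # decoded as UTF-8
print(decode_bytes("café".encode("latin-1")))  # UTF-8 fails, falls back to Latin-1
```

The order of the candidates matters: a permissive encoding placed first would "succeed" on every input and hide the real character set.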

Complications: Formats/Language

 Complications
 Documents being indexed can include documents from many different languages.
 Example: a combination of English and Hindi
 A single index may have to contain terms of several languages.
 Some documents in Hindi
 Some documents in French
 Some documents in German, and so on

We need to tokenize and apply linguistic processing separately for documents belonging to different languages.

Complications: Formats/Language
 Sometimes a document or its components can contain multiple languages/formats
 English email with French pdf attachments

These are some of the complications you have to deal with if you build an information retrieval system for the web.

This problem is usually solved by licensing a software library that handles decoding document formats and character encodings.
Everything discussed above concerns extraction of the character sequence.

What is a Unit Document

 A file?
 An email? (Perhaps one of many in an mbox.)
 An email with 5 attachments?
 A group of files (PPT or LaTeX as HTML pages)

What is a unit document?

 A file?
 A Word file, a PDF file, an email, a PPT file, and so on
 If there are 10 plays of Shakespeare, then we could have 10 documents.
 Assign IDs to those documents

 An email (perhaps one of many in an mbox)?

 A traditional Unix (mbox-format) email file stores a sequence of email messages (an email folder) in one file,
 but we need each email message as a separate document

Example of Documents

 An email with attachments?

 Many email messages now contain attached documents, and we might then want to regard the email message and each contained attachment as separate documents.
 For an email with 5 attachments, a total of 6 documents are created: a single email is split into multiple documents.
 If an email message has an attached zip file, you might want to decode/unzip the zip file and regard each file it contains as a separate document.

Example of Documents

 A group of files (PPT or LaTeX as HTML)

 Suppose there are 30 slides in a PPT.
 30 HTML pages are generated for these slides and stored as separate files (here we don’t treat each HTML page as a separate document)
 We might combine the 30 pages into a single document

Multiple files into a single document

Example of Unit Document

 Let's say you have a huge book in PDF format

 Split the book into individual chapters, and treat each chapter as a separate document

Book → Chapter 1, Chapter 2, ..., Chapter n (individual documents)

Unit Document

 Why would you want to do such a thing?

 Why would it sometimes make sense to split an entire book into a number of documents instead of treating it as a single document?
 What are the advantages and disadvantages of doing that?

Let's go through an example to understand this.

Issues of Indexing Granularity

 Granularity: the scale or level of detail in a set of data.

 For a collection of books, it would usually be a bad idea to index an entire book as a document

Example (Issue)

 Consider a document (book) on “Middle Ages in Europe”.
 The book contains terms such as
 Christ (appearing in many places)
 This period also saw
 the rise of universities in Europe
 so the term University also appears in some of the chapters
 Search for the terms “Christ” and “University”
 The book above may be listed in the results,
even though the query has no relevance to it

Example (Issue)

 For example, take the query “Christ University”

 Will the “Middle Ages in Europe” book be relevant to us? Probably not

 Document size is too large

 Precision: Low
 Recall: High

What could be done?

 Instead
 Index each chapter or paragraph as a mini-document.

 Matches are then more likely to be relevant

 Since the documents are smaller, it will be much easier for the user to find the relevant passages in the document.

 Document size is small

 Precision: High
 Recall: Low

Solution

 The problems with large document units can be alleviated by use of explicit or implicit proximity search.

Proximity search looks for documents where two or more separately matching term occurrences are within a specified distance, measured as the number of intermediate words or characters.

 An IR system should be designed to offer choices of granularity.
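A proximity check can be sketched as follows. This is an illustrative example only: the helper names `positions` and `within_k` are hypothetical, and distance is measured here as the difference between token positions (so adjacent words are at distance 1), which is one possible convention.

```python
def positions(tokens, term):
    """Positions at which `term` occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == term]

def within_k(tokens, term_a, term_b, k):
    """True if some occurrences of the two terms are at most k positions apart."""
    pos_a, pos_b = positions(tokens, term_a), positions(tokens, term_b)
    i = j = 0
    # Walk both sorted position lists, always advancing the smaller position.
    while i < len(pos_a) and j < len(pos_b):
        if abs(pos_a[i] - pos_b[j]) <= k:
            return True
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return False

# A long "book" where the two terms are far apart, and a short text where they are adjacent.
book = ["christ"] + ["filler"] * 50 + ["university"]
page = "welcome to christ university".split()
print(within_k(book, "christ", "university", 3))  # False
print(within_k(page, "christ", "university", 1))  # True
```

With such a check, a whole book can stay one indexing unit and still be rejected for the query when the two terms never occur near each other.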

Solution

 For this choice to be made well,

 the person who is deploying the system must have
 a good understanding of the document collection,
 the users and their information needs, and
 usage patterns.

Vocabulary of Terms

TOKENS AND TERMS

Vocabulary of Terms

Tokenization
 Word – A delimited string of characters as it appears in the text.
 Token – An instance of a word or term occurring in a document.
 Example: Friends, Romans, Countrymen. The tokens are:
 Friends
 Romans
 Countrymen
 If there are two occurrences of the word “Friends” in the document, then two tokens are generated, each with the same string “Friends”

A token is the output of the tokenizer

Vocabulary of Terms

Tokenization
 Term – A “normalized” word (case, morphology, spelling
etc); an equivalence class of words.
 It is an entry in the dictionary of the inverted index
 So both instances of “Friends” map to the same term in the index, and that term would be “friend”

A term is what is stored in the dictionary

Vocabulary of Terms

Tokenization
 Input: “Friends, Romans and Countrymen”
 Output: tokens (white space and punctuation marks are removed from the input)
 Friends
 Romans
 Countrymen
 Each such token is now a candidate for an index entry, after further processing (linguistic processing)
 But what are valid tokens to emit?

Vocabulary of Terms

Processing done on documents needs to be done on the query as well

Tokenization
 Issues in tokenization:
 Finland’s capital → how would you tokenize this? Finland? Finlands? Finland’s?
The way you deal with apostrophes can impact the performance of the system.
Dropping the apostrophe and what follows makes sense here, giving “Finland”.
“Aren’t” is the short form of “are not”.
If we use the same conversion as above, we get “aren”.

If your query is “aren’t”, would this document be returned?

Yes, provided the query is processed in the same way: “aren’t” in the query is also converted into “aren”, so documents containing “aren” would be in the result.

Tokenization
Kevin O’Brien (Irish cricketer)
Suppose we apply the same technique.
We would get two separate tokens, “Kevin” and “O”, because everything after the apostrophe in “O’Brien” is removed.

That would be a disaster.

So this type of generalization does not work with names.
Dealing with apostrophes is a non-trivial problem.
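To make the trade-off concrete, here is a small sketch comparing three apostrophe policies on the examples above. The three function names are hypothetical, the patterns only cover ASCII letters, and the code uses the straight apostrophe (') rather than the typographic one used on the slides.

```python
import re

def split_on_apostrophe(text):
    """Treat the apostrophe as a separator and keep every piece."""
    return re.findall(r"[A-Za-z]+", text)

def drop_after_apostrophe(text):
    """Keep only the part of each word that precedes the first apostrophe."""
    return [word.split("'")[0] for word in re.findall(r"[A-Za-z']+", text)]

def keep_apostrophe(text):
    """Keep apostrophes inside words, so a name like O'Brien stays one token."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)

for sample in ["Finland's capital", "aren't", "Kevin O'Brien"]:
    print(sample)
    print("  split:", split_on_apostrophe(sample))
    print("  drop: ", drop_after_apostrophe(sample))
    print("  keep: ", keep_apostrophe(sample))
```

No single policy is right for all three inputs, which is why the same policy has to be applied to both documents and queries, whichever one is chosen.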

Vocabulary of Terms

Processing done on documents needs to be done on the query as well

Tokenization
How would you tokenize Hewlett-Packard?

The question is how you deal with the hyphen (-).

 Hewlett-Packard → Hewlett and Packard as two tokens? Is there any consequence of splitting them or not?

A query may contain Hewlett Packard (with a space).

If Hewlett-Packard is kept as a single token, a document containing Hewlett-Packard may not be returned (i.e., it would not be considered relevant).

Tokenization
Suppose Hewlett-Packard is split into two separate tokens, Hewlett and Packard.

Then, if the query contains Hewlett Packard or Hewlett-Packard, documents containing Hewlett and Packard would be returned.

Vocabulary of Terms

Processing done on documents needs to be done on the query as well

Tokenization
 Hewlett-Packard → Hewlett and Packard as two tokens?
 state-of-the-art: break up the hyphenated sequence.
 co-education: you may want to preserve the hyphen in this case
 lowercase, lower-case, lower case?
 It can be effective to get the user to put in possible hyphens

 San Francisco: one token or two?

 How do you decide it is one token?
You may have a list of city names available to you.

Both starting letters are capitalized, so it may be the name of either a person or a place.

Vocabulary of Terms

Numbers
 3/20/91 Mar. 12, 1991 20/3/91
 55 B.C.
 B-52 (need to preserve the hyphen)
 My PGP key is 324a3df234cb23e
 (800) 234-2333 (need to preserve the hyphen)
 Often have embedded spaces
 Older IR systems may not index numbers
 But often very useful: think about things like looking up error codes/stack traces on the web, e.g. the “404 not found” error
 (One answer is using n-grams)
 Will often index “meta-data” separately
 Creation date, format, etc.

Vocabulary of Terms

Tokenization: language issues

 French
 L'ensemble → one token or two?
 L ? L’ ? Le ?
 Want l’ensemble to match with un ensemble
 Until at least 2003, it didn’t on Google
 Internationalization!

 German noun compounds are not segmented

 Lebensversicherungsgesellschaftsangestellter
 ‘life insurance company employee’
 German retrieval systems benefit greatly from a compound splitter module
 Can give a 15% performance boost for German
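A compound splitter can be approximated with a dictionary lookup, as in the sketch below. It is a toy under stated assumptions: the function `split_compound`, the four-word lexicon, and the treatment of "s" as the only linking element are all invented for illustration, and real splitters use large lexicons and frequency information.

```python
def split_compound(word, lexicon, linkers=("", "s")):
    """Split `word` into lexicon entries, allowing an optional linking element
    between parts. Returns a list of parts, or None if no split is found."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try the longest known prefix first, then recurse on the remainder.
    for end in range(len(word) - 1, 0, -1):
        head = word[:end]
        if head not in lexicon:
            continue
        rest = word[end:]
        for link in linkers:
            if rest.startswith(link) and len(rest) > len(link):
                tail = split_compound(rest[len(link):], lexicon, linkers)
                if tail is not None:
                    return [head] + tail
    return None

# Hypothetical mini-lexicon covering only the example from the slide.
lexicon = {"leben", "versicherung", "gesellschaft", "angestellter"}
parts = split_compound("Lebensversicherungsgesellschaftsangestellter", lexicon)
print(parts)  # ['leben', 'versicherung', 'gesellschaft', 'angestellter']
```

Indexing the parts alongside the full compound lets a query for one component match documents that only contain the long compound.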

Vocabulary of Terms

Tokenization: language issues

 Chinese and Japanese have no spaces between
words:
 莎拉波娃现在居住在美国东南部的佛罗里达。
 Not always guaranteed a unique tokenization
 Further complicated in Japanese, with multiple
alphabets intermingled
 Dates/amounts in multiple formats
フォーチュン500社は情報不足のため時間あた$500K(約6,000万円)

Katakana Hiragana Kanji Romaji

End-user can express query entirely in hiragana!

Vocabulary of Terms

Tokenization: language issues

 Arabic (or Hebrew) is basically written right to left,
but with certain items like numbers written left to
right
 Words are separated, but letter forms within a word
form complex ligatures

 [Arabic example sentence; reading direction: ← → ← → ←]
 ‘Algeria achieved its independence in 1962 after 132 years of French occupation.’
 With Unicode, the surface presentation is complex, but the stored form is straightforward

Vocabulary of Terms

Stop words
 With a stop list, you exclude from the dictionary
entirely the commonest words. Intuition:
 They have little semantic content: the, a, and, to, be
 There are a lot of them: ~30% of postings for top 30 words

Common words such as a, an, and, are, as, ... have little value in helping select documents, so they are dropped.
 A general strategy for determining a stop word list is to sort the terms by collection frequency
 Collection frequency: the number of times the term t appears in the document collection

Stop words

When you build an index, you can also keep track of collection frequency along with document frequency and term frequency. Collection frequency is thus the third kind of frequency.
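The three frequencies can be computed together while scanning the collection, as in this sketch. The function names `frequency_tables` and `stop_list` and the three tiny documents are assumptions for illustration; tokenization here is a plain whitespace split.

```python
from collections import Counter

def frequency_tables(docs):
    """Return (term frequency per document, document frequency, collection frequency)."""
    tf = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    df = Counter()
    cf = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # one count per document that contains the term
        cf.update(counts)         # every occurrence in the collection counts
    return tf, df, cf

def stop_list(cf, size):
    """Candidate stop words: the `size` terms with the highest collection frequency."""
    return [term for term, _ in cf.most_common(size)]

# Made-up mini collection.
docs = {
    1: "the king of denmark and the queen",
    2: "the flights to london",
    3: "to be or not to be",
}
tf, df, cf = frequency_tables(docs)
print(cf["the"], df["the"], tf[1]["the"])  # 3 2 2
print(stop_list(cf, 2))
```

In practice the sorted list is usually inspected by hand before terms are added to the stop list, since a frequent term may still carry meaning in a given domain.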

Vocabulary of Terms

Stop words

 But the trend is away from doing this:

 Good compression techniques mean the space for including stop words in a system is very small
 Good query optimization techniques mean you pay little at query time for including stop words.
 You need them for:
 Phrase queries: “King of Denmark”
 Various song titles, etc.: “Let it be”, “To be or not to be”
 “Relational” queries: “flights to London”

Vocabulary of Terms

Normalization to terms
 We need to “normalize” words in indexed text as
well as query words into the same form
 We want to match U.S.A. and USA
 Result is terms: a term is a (normalized) word type,
which is an entry in our IR system dictionary
 We most commonly implicitly define equivalence classes of terms by, e.g.,
 deleting periods to form a term
 U.S.A., USA → USA
 deleting hyphens to form a term
 anti-discriminatory, antidiscriminatory → antidiscriminatory

Vocabulary of Terms

Normalization is heavily language dependent

Normalization: other languages
 Accents: e.g., French résumé vs. resume.
 Umlauts: e.g., German: Tuebingen vs. Tübingen
 Should be equivalent
 Most important criterion:
 How are users likely to write their queries for these words?

 Even in languages that standardly have accents, users often may not type them
 Often best to normalize to a de-accented term
 Tuebingen, Tübingen, Tubingen → Tubingen
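The period, hyphen, and accent rules can be sketched with the standard `unicodedata` module. The function name `normalize_term` is hypothetical, and the sketch deliberately does not handle the "ue" spelling of an umlaut (Tuebingen), which would need an extra language-specific rule.

```python
import unicodedata

def normalize_term(token):
    """Delete periods and hyphens, then strip accents and umlauts."""
    token = token.replace(".", "").replace("-", "")
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

for raw in ["U.S.A.", "anti-discriminatory", "résumé", "Tübingen"]:
    print(raw, "->", normalize_term(raw))
```

The same function has to be applied to query words, otherwise a query for "USA" would not reach documents indexed from "U.S.A.".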

Vocabulary of Terms

Normalization: other languages

 Normalization of things like date forms
 7月30日 vs. 7/30
 Japanese use of kana vs. Chinese characters

 Tokenization and normalization may depend on the language and so are intertwined with language detection
 Morgen will ich in MIT … (is this German “mit”?)

 Crucial: need to “normalize” indexed text as well as query terms into the same form

Vocabulary of Terms

Case folding
 Reduce all letters to lower case
 exception: upper case in mid-sentence?
 e.g., General Motors
 Fed vs. fed (Federal Reserve System )
 SAIL vs. sail
 Often best to lower case everything, since
users will use lowercase regardless of
‘correct’ capitalization…

 Google example:
 Query C.A.T.
 #1 result is for “cat” (well, Lolcats) not Caterpillar Inc.

Vocabulary of Terms

Normalization to terms

 An alternative to equivalence classing is to do asymmetric expansion (query expansion)
 An example of where this may be useful
 Enter: window Search: window, windows
 Enter: windows Search: Windows, windows, window
 Enter: Windows Search: Windows
 Potentially more powerful, but less efficient

Vocabulary of Terms

Thesauri and soundex

 Do we handle synonyms and homonyms?
 E.g., by hand-constructed equivalence classes
 car = automobile color = colour
 We can rewrite to form equivalence-class terms
 When the document contains automobile, index it under car-automobile (and vice-versa)
 Or we can expand a query
 When the query contains automobile, look under car as well
 What about spelling mistakes?
 One approach is soundex, which forms equivalence classes of words based on phonetic heuristics
 Will see in coming lectures

Vocabulary of Terms

Lemmatization
 Reduce inflectional/variant forms to base form
 The word lemmatization is derived from lemma, which refers to the root form of a particular word
 This is a sophisticated NLP technique
 E.g.,
 am, are, is → be
 car, cars, car's, cars' → car
 the boy's cars are different colors → the boy car be different color
Plural forms are converted into the singular form
 Lemmatization implies doing “proper” reduction to dictionary headword form

Vocabulary of Terms

Stemming

Stemming is a cruder form of normalization.

 Reduce terms to their “roots” before indexing

 “Stemming” suggests crude affix chopping
 language dependent
 e.g., automate(s), automatic, automation all reduced to automat.

Original text: for example compressed and compression are both accepted as equivalent to compress.
After stemming: for exampl compress and compress ar both accept as equival to compress

Vocabulary of Terms

Porter’s algorithm (developed by Martin Porter)

 Commonest algorithm for stemming English
 Results suggest it’s at least as good as other stemming
options

 The algorithm has 5 phases of reductions

 phases applied sequentially
 each phase consists of a set of commands
 sample convention: Of the rules in a compound command,
select the one that applies to the longest suffix.
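The longest-suffix convention can be illustrated with the first group of rules from the Porter stemmer (step 1a: SSES → SS, IES → I, SS → SS, S → nothing). The sketch below covers only that one rule group, not the full five-phase algorithm, and the function name `apply_longest_suffix_rule` is hypothetical.

```python
# Step 1a rules of the Porter stemmer as (suffix, replacement) pairs.
STEP_1A = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]

def apply_longest_suffix_rule(word, rules=STEP_1A):
    """Apply the single rule whose suffix is the longest one matching the word."""
    matching = [(suffix, repl) for suffix, repl in rules if word.endswith(suffix)]
    if not matching:
        return word
    suffix, repl = max(matching, key=lambda rule: len(rule[0]))
    return word[: len(word) - len(suffix)] + repl

for word in ["caresses", "ponies", "caress", "cats"]:
    print(word, "->", apply_longest_suffix_rule(word))
```

Without the longest-suffix convention, "caress" would match the plain S rule and lose its final letter; the SS → SS rule exists precisely to prevent that.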

FASTER POSTINGS MERGES: SKIP POINTERS/SKIP LISTS
Skip Pointers

Faster postings merges via Skip pointers/Skip lists

 Extension to posting list data structures
 Way to increase the efficiency of using
posting lists.

Skip Pointers

Recall basic merge

 Walk through the two postings lists simultaneously, in time linear in the total number of postings entries

Brutus: 2 4 8 41 48 64 128
Caesar: 1 2 3 8 11 17 21 31
Result: 2 8

If the list lengths are m and n, the merge takes O(m+n) operations.

Can we do better?
Yes (if index isn’t changing too fast).
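The basic linear merge can be written as a two-pointer walk over the sorted lists. This is a minimal sketch with a hypothetical function name `intersect`; postings are represented as plain Python lists of document IDs.

```python
def intersect(p1, p2):
    """Intersect two sorted postings lists in O(m + n) steps."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer that sits on the smaller document ID
        else:
            j += 1
    return answer

brutus = [2, 4, 8, 41, 48, 64, 128]
caesar = [1, 2, 3, 8, 11, 17, 21, 31]
print(intersect(brutus, caesar))  # [2, 8]
```

Each loop iteration advances at least one pointer, which is where the O(m+n) bound comes from.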

Skip Pointers

Recall basic merge

Can we do better?
Yes (if the index isn’t changing too fast),
 i.e.,
 new entries are not being added to or deleted from the postings list

 Use a skip list by augmenting postings lists with skip pointers (at indexing time)

Skip Pointers

Look into an example

 A skip pointer is a pointer that points from a particular node to some other node far ahead in the same list.

List 1: 2 4 8 41 48 64 128 (skip pointers to 41 and 128)
List 2: 1 2 3 8 11 17 21 31 (skip pointers to 11 and 31)

Skip Pointers

Benefits of adding skip pointers

 Let us see how we can use these skip pointers to speed up our search and how we add them

2 4 8 41 48 64 128 (skip pointers to 41 and 128)

The intervening entries are not useful for the answer and can be skipped.

Skip Pointers

Augment postings with skip pointers

(at indexing time)

p1: 2 4 8 41 48 64 128
p2: 1 2 3 8 11 17 21 31
Result: 2 8

The pointers p1 and p2 advance through the two lists, and so on. After 8 is matched, p1 is at 41 and p2 is at 11, so the skip from 11 to 31 lets p2 pass over the intermediate entries between 11 and 31.

 Two questions need to be answered

 Where do we place skip pointers?
 How do we merge efficiently using skip pointers?

Skip Pointers

Where do we place skips?

How many skip pointers should we add?

 Tradeoff:
 More skips → shorter skip spans → more likely to skip. But lots of comparisons to skip pointers.
 Fewer skips → few pointer comparisons, but then long skip spans → few successful skips.

Skip Pointers

Placing skips
 Simple heuristic: for a postings list of length L, use √L evenly-spaced skip pointers.

 Easy if the index is relatively static; harder if L keeps changing because of updates (deleting/inserting elements). A static index is best.

Skip Pointers

Important Points
 If the index is small → it fits entirely into memory (both the dictionary and the postings lists can fit into main memory)

 If the corpus is large → postings may have to be stored on disk, while the dictionary is kept in memory.

 If your index is entirely in memory → using skip pointers will help
 because you end up doing fewer operations to traverse a particular postings list if you follow the skip pointers.

Skip Pointers

Skips
 Only AND queries.
 Does not work with OR queries. Why?

Skip Pointers

Algorithm: Postings lists intersection with skip pointers
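The algorithm figure did not survive extraction, so the following is a hedged reconstruction of the usual idea rather than the slide's own pseudocode. The names `build_skips` and `intersect_with_skips` are hypothetical, skip pointers are stored as a dictionary from list index to target index, and they are spaced floor(√L) entries apart (`math.isqrt` needs Python 3.8 or later).

```python
import math

def build_skips(postings):
    """Place skip pointers every floor(sqrt(L)) entries: index -> target index."""
    step = math.isqrt(len(postings))
    if step < 2:
        return {}
    return {i: i + step for i in range(0, len(postings) - step, step)}

def intersect_with_skips(p1, p2):
    """Intersect two sorted postings lists, following skips that do not overshoot."""
    skips1, skips2 = build_skips(p1), build_skips(p2)
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            if i in skips1 and p1[skips1[i]] <= p2[j]:
                # Keep skipping while the skip target is still <= the other entry.
                while i in skips1 and p1[skips1[i]] <= p2[j]:
                    i = skips1[i]
            else:
                i += 1
        else:
            if j in skips2 and p2[skips2[j]] <= p1[i]:
                while j in skips2 and p2[skips2[j]] <= p1[i]:
                    j = skips2[j]
            else:
                j += 1
    return answer

long_list = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
print(intersect_with_skips(long_list, [47]))  # [47]
```

A skip is taken only when its target is not larger than the current entry of the other list, so no matching document ID can be jumped over.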

Skip Pointers

Exercise Problems
 Problem 1:
 We have a two-word query. For one term the postings list consists of the following 16 entries
[4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
and for the other it is the one-entry postings list
[47]
Work out how many comparisons would be done to intersect the two postings lists with the following two strategies. Briefly justify your answer.
(a) Using standard postings lists.
(b) Using postings lists stored with skip pointers, with a skip length of √L

Skip Pointers

Problem 1 Solution
 (a) The number of comparisons would be 11, as shown
 (4,47), (6,47), (10,47), (12,47), (14,47), (16,47), (18,47), (20,47), (22,47), (32,47), (47,47)
 (b) Total length of the postings list L = 16
 Skip length √L = √16 = 4
 Skip pointers: 4 → 14, 14 → 22, 22 → 120, 120 → 180

4 6 10 12 14 16 18 20 22 32 47 81 120 122 157 180 (skip targets: 14, 22, 120, 180)

Skip Pointers

Problem 1 Solution

 14 < 47, 22 < 47, and 120 > 47

 So the skip to 120 is not taken, and the only remaining comparisons are (32,47) and (47,47)

 The number of comparisons would be 6

 (4,47), (14,47), (22,47), (120,47), (32,47), (47,47)

Skip Pointers

Problem 2

Skip Pointers

Problem 2 Solution
 (a) A skip pointer is followed only once, 24 → 75

 (b) 18 postings comparisons will be made by the algorithm in total (with skip pointers)
 (3,3), (5,5), (9,89), (15,89), (24,89), (75,89), (92,89), (81,89), (84,89), (89,89), (95,92), (95,115), (95,96), (97,96), (97,97), (99,100), (100,100), (101,115)

 (c) 19 postings comparisons would be made if the postings lists are intersected without the use of skip pointers
 (3,3), (5,5), (89,9), (89,15), (89,24), (89,39), (89,60), (89,68), (89,75), (89,81), (89,84), (89,89), (95,92), (95,96), (97,96), (97,97), (99,100), (100,100), (101,115)

Skip Pointers

Assignment - II
 Why are skip pointers not useful for queries of the form
x OR y?
 Exercise 1.6, 1.11, 1.9, 2.2, 2.3
