Intro Notes

Intro notes to coa

Uploaded by

Khushi Gupta 7027

Information Retrieval and Web Search

Introduction

Information Retrieval (IR)
• The indexing and retrieval of textual
documents.
• Searching for pages on the World Wide
Web is the “killer app.”
• Concerned firstly with retrieving documents that are
relevant to a query.
• Concerned secondly with retrieving efficiently from
large sets of documents.

Typical IR Task

• Given:
– A corpus of textual natural-language
documents.
– A user query in the form of a textual string.
• Find:
– A ranked set of documents that are relevant to
the query.

IR System

[Figure: a document corpus and a query string are input to the IR system, which outputs ranked documents (1. Doc1, 2. Doc2, 3. Doc3, ...).]
Relevance

• Relevance is a subjective judgment and may
include:
– Being on the proper subject.
– Being timely (recent information).
– Being authoritative (from a trusted source).
– Satisfying the goals of the user and his/her
intended use of the information (information
need).

Keyword Search

• Simplest notion of relevance is that the
query string appears verbatim in the
document.
• Slightly less strict notion is that the words
in the query appear frequently in the
document, in any order (bag of words).
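
The two notions of relevance above can be sketched in a few lines. This is a minimal illustration, not a production matcher: the helper names (`tokenize`, `verbatim_match`, `bag_of_words_score`) and the three toy documents are assumptions made for the example.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def verbatim_match(query, document):
    """Strict notion: the query string appears verbatim (ignoring case)."""
    return query.lower() in document.lower()

def bag_of_words_score(query, document):
    """Looser notion: total occurrences of the query words, in any order."""
    counts = Counter(tokenize(document))
    return sum(counts[w] for w in set(tokenize(query)))

# Toy corpus (hypothetical).
docs = {
    "doc1": "Web search is the killer app of information retrieval.",
    "doc2": "Retrieval of information from the web: search engines index the web.",
    "doc3": "Bats are flying mammals.",
}
query = "web search"

verbatim_hits = [d for d, text in docs.items() if verbatim_match(query, text)]
ranked = sorted(docs, key=lambda d: bag_of_words_score(query, docs[d]), reverse=True)
print(verbatim_hits)  # only doc1 contains the exact string
print(ranked)         # doc2 ranks first under bag of words
```

Note that doc2 never contains the exact string "web search", yet it scores highest under the bag-of-words notion because "web" occurs twice and "search" once.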

Problems with Keywords

• May not retrieve relevant documents that
include synonymous terms.
– “restaurant” vs. “café”
– “PRC” vs. “China”
• May retrieve irrelevant documents that
include ambiguous terms.
– “bat” (baseball vs. mammal)
– “Apple” (company vs. fruit)
– “bit” (unit of data vs. act of eating)
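
Both failure modes are easy to reproduce with a whole-word keyword matcher. The sketch below uses a hypothetical `contains_keyword` helper and invented example sentences; it only illustrates the problem and does not fix it.

```python
def contains_keyword(keyword, document):
    """True if the keyword occurs as a whole word in the document (ignoring case)."""
    words = [w.strip(".,;:!?").lower() for w in document.split()]
    return keyword.lower() in words

# Toy documents (hypothetical).
docs = {
    "dining": "A cozy café serving lunch near the station.",
    "baseball": "He swung the bat and hit a home run.",
    "zoology": "The bat is a nocturnal flying mammal.",
}

# Synonym problem: the relevant document says "café", never "restaurant".
missed = not contains_keyword("restaurant", docs["dining"])

# Ambiguity problem: a query about the animal also matches the baseball text.
bat_hits = [name for name, text in docs.items() if contains_keyword("bat", text)]

print(missed)    # True: the relevant document is not retrieved
print(bat_hits)  # both senses of "bat" are retrieved
```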

Beyond Keywords

• We will cover the basics of keyword-based
IR, but…
• We will focus on extensions and recent
developments that go beyond keywords.
• We will cover the basics of building an
efficient IR system, but…
• We will focus on basic capabilities and
algorithms rather than systems issues that
allow scaling to industrial-size databases.

Intelligent IR

• Taking into account the meaning of the
words used.
• Taking into account the order of words in
the query.
• Adapting to the user based on direct or
indirect feedback.
• Taking into account the authority of the
source.

IR System Architecture

[Figure: IR system architecture. Components: User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, Database Manager, Index (inverted file), Text Database. Labeled flows: User Need, Text, Logical View, User Feedback, Query, Retrieved Docs, Ranked Docs.]

IR System Components
• Text Operations forms index words (tokens).
– Stopword removal
– Stemming
• Indexing constructs an inverted index of
word-to-document pointers.
• Searching retrieves documents that contain a
given query token from the inverted index.
• Ranking scores all retrieved documents
according to a relevance metric.
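
The four components above can be chained into a toy pipeline. This is a sketch under stated assumptions: the stopword list is a tiny illustrative one, `naive_stem` is a crude suffix stripper standing in for a real stemmer, and the relevance metric is simply summed term frequency. None of these choices come from the slides.

```python
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}  # tiny illustrative list

def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def text_operations(text):
    """Text Operations: tokenize, remove stopwords, and stem."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [naive_stem(t) for t in tokens if t and t not in STOPWORDS]

def build_inverted_index(corpus):
    """Indexing: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for term in text_operations(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, query):
    """Searching and Ranking: score documents by summed term frequency."""
    scores = defaultdict(int)
    for term in text_operations(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Toy corpus (hypothetical).
corpus = {
    "d1": "Indexing and searching documents",
    "d2": "The spider indexed web documents and web pages",
    "d3": "Ranking of the retrieved pages",
}
index = build_inverted_index(corpus)
results = search(index, "indexing web pages")
print(results)
```

Because the query and the documents pass through the same text operations, "indexing" in the query matches "indexed" in d2. Only documents that share at least one term with the query are ever scored, which is what makes the inverted index efficient.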

IR System Components (continued)

• User Interface manages interaction with the
user:
– Query input and document output.
– Relevance feedback.
– Visualization of results.
• Query Operations transform the query to
improve retrieval:
– Query expansion using a thesaurus.
– Query transformation using relevance feedback.
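
Query expansion with a thesaurus can be sketched as follows. The `THESAURUS` dictionary is a hypothetical, hand-built stand-in for a real lexical resource, and `expand_query` and `matches` are names invented for this example.

```python
# Hypothetical hand-built thesaurus; a real system would use a much larger resource.
THESAURUS = {
    "restaurant": {"café", "diner"},
    "china": {"prc"},
}

def expand_query(query, thesaurus=THESAURUS):
    """Return the query terms plus any listed synonyms, in first-seen order."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term, *sorted(thesaurus.get(term, ()))]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

def matches(terms, document):
    """True if any of the terms occurs as a word in the document."""
    words = {w.strip(".,;:!?").lower() for w in document.split()}
    return any(t in words for t in terms)

doc = "A cozy café serving lunch near the station."
plain = matches(["restaurant"], doc)       # the original query misses the document
expanded_terms = expand_query("restaurant")
improved = matches(expanded_terms, doc)    # the expanded query finds it
print(expanded_terms, plain, improved)
```

Expansion trades precision for recall: adding synonyms retrieves more relevant documents but can also pull in documents that use a synonym in an unrelated sense.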

Web Search

• Application of IR to HTML documents on
the World Wide Web.
• Differences:
– Must assemble document corpus by spidering
the web.
– Can exploit the structural layout information
in HTML (XML).
– Documents change uncontrollably.
– Can exploit the link structure of the web.
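
As a rough illustration of exploiting HTML structure and links, the sketch below uses Python's standard `html.parser` module to pull out a page's title and outgoing links. The `PageParser` class and the sample page are assumptions for the example; a real spider would also fetch pages over the network, resolve relative URLs, and respect crawl policies.

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collect the <title> text and outgoing link targets of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Sample page (hypothetical).
page = """<html><head><title>IR Course</title></head>
<body><p>See the <a href="/syllabus.html">syllabus</a> and
<a href="https://example.org/notes">notes</a>.</p></body></html>"""

parser = PageParser()
parser.feed(page)
print(parser.title)  # structural information: the page title
print(parser.links)  # link structure: candidates for the spider to visit next
```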

Web Search System

[Figure: a spider crawls the Web to build the document corpus; the corpus and a query string are input to the IR system, which outputs ranked documents (1. Page1, 2. Page2, 3. Page3, ...).]
Other IR-Related Tasks

• Automated document categorization
• Information filtering (spam filtering)
• Information routing
• Automated document clustering
• Recommending information or products
• Information extraction
• Information integration
• Question answering

History of IR

• 1960-70’s:
– Initial exploration of text retrieval systems for
“small” corpora of scientific abstracts, and law
and business documents.
– Development of the basic Boolean and vector-
space models of retrieval.
– Prof. Salton and his students at Cornell
University were the leading researchers in the
area.

IR History Continued

• 1980’s:
– Large document database systems, many run by
companies:
• Lexis-Nexis
• Dialog
• MEDLINE

IR History Continued

• 1990’s:
– Searching FTPable documents on the Internet
• Archie
• WAIS
– Searching the World Wide Web
• Lycos
• Yahoo
• AltaVista

IR History Continued

• 1990’s continued:
– Organized Competitions
• NIST TREC
– Recommender Systems
• Ringo
• Amazon
• NetPerceptions
– Automated Text Categorization & Clustering

IR History Continued

• 2000’s
– Link analysis for Web Search
• Google
– Automated Information Extraction
– Parallel Processing
• MapReduce
– Question Answering
• TREC Q/A track

IR History Continued

• 2000’s continued:
– Multimedia IR
• Image
• Video
• Audio and music
– Cross-Language IR
• DARPA TIDES
– Document Summarization
– Learning to Rank

IR History Continued

• 2010’s
– Intelligent Personal Assistants
• Siri
• Cortana
• Google Now
• Alexa
– Complex Question Answering
• IBM Watson
– Distributional Semantics
– Deep Learning

Recent IR History

• 2020’s
– Large Language Models (LLMs)
• ELMo
• BERT
• GPT-1, GPT-2, GPT-3
– Chatbots
• ChatGPT, GPT-4
• Reinforcement Learning from Human Feedback
(RLHF)

Related Areas

• Database Management
• Library and Information Science
• Artificial Intelligence
• Natural Language Processing
• Machine Learning

Database Management

• Focused on structured data stored in
relational tables rather than free-form text.
• Focused on efficient processing of well-
defined queries in a formal language (SQL).
• Clearer semantics for both data and queries.
• Recent move towards semi-structured data
(XML) brings it closer to IR.

Library and Information Science

• Focused on the human user aspects of
information retrieval (human-computer
interaction, user interface, visualization).
• Concerned with effective categorization of
human knowledge.
• Concerned with citation analysis and
bibliometrics (structure of information).
• Recent work on digital libraries brings it
closer to CS & IR.

Artificial Intelligence

• Focused on the representation of knowledge,
reasoning, and intelligent action.
• Formalisms for representing knowledge and
queries:
– First-order Predicate Logic
– Bayesian Networks
• Recent work on web ontologies and
intelligent information agents brings it
closer to IR.

Natural Language Processing

• Focused on the syntactic, semantic, and
pragmatic analysis of natural language text
and discourse.
• Ability to analyze syntax (phrase structure)
and semantics could allow retrieval based
on meaning rather than keywords.

Natural Language Processing:
IR Directions
• Methods for determining the sense of an
ambiguous word based on context (word
sense disambiguation).
• Methods for identifying specific pieces of
information in a document (information
extraction).
• Methods for answering specific NL
questions from document corpora or
structured data like Freebase or Google’s
Knowledge Graph.

Machine Learning

• Focused on the development of
computational systems that improve their
performance with experience.
• Automated classification of examples
based on learning concepts from labeled
training examples (supervised learning).
• Automated methods for clustering
unlabeled examples into meaningful
groups (unsupervised learning).

Machine Learning:
IR Directions
• Text Categorization
– Automatic hierarchical classification (Yahoo).
– Adaptive filtering/routing/recommending.
– Automated spam filtering.
• Text Clustering
– Clustering of IR query results.
– Automatic formation of hierarchies (Yahoo).
• Learning for Information Extraction
• Text Mining
• Learning to Rank
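
To make the text categorization direction concrete, here is a minimal sketch of automated spam filtering as supervised learning. The slides do not name a specific algorithm; this example assumes a multinomial Naive Bayes classifier with add-one smoothing, and the four training messages are invented for illustration.

```python
import math
from collections import Counter

def train(examples):
    """Learn per-label word counts and label counts from (text, label) pairs."""
    word_counts, label_counts = {}, Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability (add-one smoothing)."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)  # log prior
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)   # smoothed likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Labeled training examples (hypothetical).
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lecture notes for the course", "ham"),
]
model = train(training)
print(classify("claim free money", *model))
print(classify("lecture agenda", *model))
```

The same bag-of-words representation used for retrieval serves here as the feature set, which is one reason text categorization fits naturally alongside IR.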
