Search engines

Uploaded by

saifiimrankhan2006


Search engines

A search engine is a program that searches documents for specified keywords and returns a list of the
documents in which those keywords were found. Strictly speaking, search engines are a general class of
programs; however, the term is often used specifically for systems like Google, Bing and Yahoo! Search
that enable users to search for documents on the World Wide Web.
Web Search Engines
Typically, Web search engines work by sending out a spider to fetch as many documents as possible.
Another program, called an indexer, then reads these documents and creates an index based on the
words contained in each document. Each search engine uses a proprietary algorithm to create its
indices such that, ideally, only meaningful results are returned for each query.
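The spider-and-indexer pipeline described above can be illustrated with a minimal sketch. It assumes the spider has already fetched the pages; the `FETCHED_PAGES` dictionary, its example URLs, and the function names are hypothetical, and a real engine's proprietary indexing and ranking are far more elaborate than this simple inverted index.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for pages a spider has already fetched (URL -> text).
FETCHED_PAGES = {
    "http://example.com/a": "Search engines index documents by keyword.",
    "http://example.com/b": "A spider fetches documents from the Web.",
    "http://example.com/c": "The indexer reads documents and builds an index.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    """Indexer step: map each word to the set of documents containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the documents that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

index = build_index(FETCHED_PAGES)
print(search(index, "documents index"))
```

Because the index maps words to documents, a query is answered by intersecting a few sets rather than rescanning every page.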
A concept search (or conceptual search) is an automated information retrieval method that is used to
search electronically stored unstructured text (for example, digital archives, email, scientific
literature, etc.) for information that is conceptually similar to the information provided in a search
query. In other words, the ideas expressed in the information retrieved in response to a concept search
query are relevant to the ideas contained in the text of the query.
Why Concept Search?
Concept search techniques were developed because of limitations imposed by classical
Boolean keyword search technologies when dealing with large, unstructured digital collections of
text. Keyword searches often return results that include many non-relevant items (false positives) or
that exclude too many relevant items (false negatives) because of the effects
of synonymy and polysemy. Synonymy means that two or more words in the same language
have the same meaning, and polysemy means that many individual words have more than one
meaning.
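A small sketch makes the two failure modes concrete. The four documents and the `keyword_search` helper below are made up for illustration; the helper does plain exact-word matching, which is the behavior the paragraph above attributes to keyword search.

```python
# Hypothetical mini-collection used only to illustrate the two problems.
DOCS = {
    "d1": "the car would not start this morning",
    "d2": "the automobile would not start this morning",
    "d3": "the manager decided to fire the employee",
    "d4": "the fire spread through the forest",
}

def keyword_search(docs, keyword):
    """Return ids of documents that contain the keyword as an exact word."""
    return sorted(doc_id for doc_id, text in docs.items()
                  if keyword in text.split())

# Synonymy: d2 is about the same thing but uses "automobile",
# so it is missed (a false negative).
print(keyword_search(DOCS, "car"))

# Polysemy: a user interested in combustion also gets d3, where
# "fire" means dismissal (a false positive).
print(keyword_search(DOCS, "fire"))
```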
Polysemy is a major obstacle for all computer systems that attempt to deal with human
language. In English, most frequently used terms have several common meanings. For example, the
word fire can mean a combustion activity, to terminate employment, to launch, or to excite (as in fire
up). For the 200 most-polysemous terms in English, the typical verb has more than twelve common
meanings, or senses. The typical noun from this set has more than eight common senses. For the 2000
most-polysemous terms in English, the typical verb has more than eight common senses and the
typical noun has more than five.
In addition to the problems of polysemy and synonymy, keyword searches can exclude
inadvertently misspelled words as well as variations on the stems (or roots) of words (for
example, strike vs. striking). Keyword searches are also susceptible to errors introduced by optical
character recognition (OCR) scanning processes, which can introduce random errors into the text of
documents (often referred to as noisy text).
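The following sketch shows one way a system might tolerate stem variations and misspellings. It is an illustration under stated assumptions, not the method of any particular search engine: `crude_stem` is a deliberately naive suffix stripper (real systems use proper stemmers), the `VOCABULARY` list is hypothetical, and spelling correction relies on `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Suffixes removed by the naive stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s", "e")

# Hypothetical list of words known to the index.
VOCABULARY = ["search", "strike", "engine"]

def crude_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def correct_spelling(word, vocabulary, cutoff=0.8):
    """Return the closest vocabulary word, or the input if none is close."""
    matches = difflib.get_close_matches(word.lower(), vocabulary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else word

# Exact matching treats the two forms as different words...
print("strike" == "striking")
# ...but both reduce to the same crude stem.
print(crude_stem("strike"), crude_stem("striking"))
# A misspelled query term is mapped back to a known word.
print(correct_spelling("serch", VOCABULARY))
```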
A concept search can overcome these challenges by employing word sense disambiguation (WSD)
and other techniques to help it derive the actual meanings of the words, and their underlying
concepts, rather than simply matching character strings as keyword search technologies do.
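As a rough illustration of word sense disambiguation, the sketch below uses a simplified variant of the Lesk approach: it picks the sense whose description shares the most words with the surrounding sentence. The text does not say which WSD technique concept search engines use, so this is a swapped-in example; the `SENSES` glosses and the stopword list are hand-written assumptions, not taken from a real dictionary.

```python
# Hypothetical hand-written glosses for three senses of "fire".
SENSES = {
    "fire": {
        "combustion": "flames heat smoke burning of fuel or wood",
        "dismiss": "terminate the employment of a worker or employee from a job",
        "launch": "shoot or launch a weapon gun missile or rocket",
    }
}

STOPWORDS = {"a", "an", "the", "of", "or", "to", "from", "and", "was", "his", "in"}

def content_words(text):
    """Return the set of lowercase words in the text, minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def disambiguate(word, sentence):
    """Pick the sense whose gloss overlaps most with the sentence context.

    Ties (including zero overlap) fall back to the first sense listed.
    """
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("fire", "the company decided to fire the employee from his job"))
print(disambiguate("fire", "smoke and flames rose as the fire spread"))
```

Once a sense is chosen, a system could match documents on the sense rather than on the character string "fire".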

Uses of Concept Search:

• eDiscovery - Concept-based search technologies are increasingly being used for Electronic
Document Discovery (EDD or eDiscovery) to help enterprises prepare for litigation. In
eDiscovery, the ability to cluster, categorize, and search large collections of unstructured text on a
conceptual basis is much more efficient than traditional linear review techniques. Concept-based
searching is becoming accepted as a reliable and efficient search method that is more likely to
produce relevant results than keyword or Boolean searches.
• Enterprise Search and Enterprise Content Management (ECM) - Concept search
technologies are being widely used in enterprise search. As the volume of information within the
enterprise grows, the ability to cluster, categorize, and search large collections of unstructured
text on a conceptual basis has become essential.
• Content-Based Image Retrieval (CBIR) - Content-based approaches are being used for the
semantic retrieval of digitized images and video from large visual corpora. One of the earliest
content-based image retrieval systems to address the semantic problem was the ImageScape
search engine. In this system, the user could make direct queries for multiple visual objects such
as sky, trees, water, etc. using spatially positioned icons in a WWW index containing more than
ten million images and videos using key frames. The system used information theory to
determine the best features for minimizing uncertainty in the classification. The semantic gap is
often mentioned in regard to CBIR. It refers to the gap between the information
that can be extracted from visual data and the interpretation that the same data have for a user in a
given situation. The ACM SIGMM Workshop on Multimedia Information Retrieval is dedicated
to studies of CBIR.
• Multimedia and Publishing - Concept search is used by the multimedia and publishing
industries to provide users with access to news, technical information, and subject matter
expertise coming from a variety of unstructured sources. Content-based methods for multimedia
information retrieval (MIR) have become especially important when text annotations are missing
or incomplete.
• Digital Libraries and Archives - Images, videos, music, and text items in digital libraries and
digital archives are being made accessible to large groups of users (especially on the Web)
through the use of concept search techniques. For example, the Executive Daily Brief (EDB), a
business information monitoring and alerting product developed by EBSCO Publishing, uses
concept search technology to provide corporate end users with access to a digital library
containing a wide array of business content. In a similar manner, the Music Genome
Project spawned Pandora, which employs concept searching to spontaneously create individual
music libraries or virtual radio stations.
• Genomic Information Retrieval (GIR) - GIR applies concept search techniques to genomic
literature databases to overcome the ambiguities of scientific literature.
• Human Resources Staffing and Recruiting - Many human resources staffing and recruiting
organizations have adopted concept search technologies to produce resume search results that
are more accurate and relevant than loosely related keyword results.

Effective Concept Searching

The effectiveness of a concept search can depend on a variety of elements including the dataset being
searched and the search engine that is used to process queries and display results. However, most
concept search engines work best for certain kinds of queries:
• Effective queries are composed of enough text to adequately convey the intended concepts.
Effective queries may include full sentences, paragraphs, or even entire documents. Queries
composed of just a few words are not as likely to return the most relevant results.
• Effective queries do not include concepts that are not the object of the search.
Including too many unrelated concepts in a query can negatively affect the relevancy of the result
items.
• Effective queries are expressed in a full-text, natural language style similar to that of the
documents being searched. For example, queries composed of excerpts from an
introductory science textbook would not be as effective for concept searching if the dataset being
searched is made up of advanced, college-level science texts. Substantial queries that better
represent the overall concepts, styles, and language of the items being sought
are generally more effective.
Guidelines for Evaluating a Concept Search Engine

1. Result items should be relevant to the information need expressed by the concepts contained
in the query statements, even if the terminology used by the result items is different from the
terminology used in the query.
2. Result items should be sorted and ranked by relevance.
3. Relevant result items should be quickly located and displayed. Even complex queries should
return relevant results fairly quickly.
4. Query length should be non-fixed, i.e., a query can be as long as deemed necessary. A
sentence, a paragraph, or even an entire document can be submitted as a query.
5. A concept query should not require any special or complex syntax. The concepts contained in
the query can be clearly and prominently expressed without using any special rules.
6. Combined queries using concepts, keywords, and metadata should be allowed.
7. Relevant portions of result items should be usable as query text simply by selecting the item
and telling the search engine to find similar items.
8. Query-ready indexes should be created relatively quickly.
9. The search engine should be capable of performing federated searches. Federated searching
enables concept queries to be used for simultaneously searching multiple data sources for
information; the results are then merged, sorted, and displayed together.
10. A concept search should not be affected by misspelled words, typographical errors, or OCR
scanning errors in either the query text or in the text of the dataset being searched.
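Guidelines 2 and 9 (ranking by relevance and federated searching) can be sketched together as follows. The two source functions and their canned results are hypothetical, and the sketch assumes that relevance scores from different sources are directly comparable, which real federated systems usually have to arrange through score normalization.

```python
def source_a(query):
    """Hypothetical data source returning (document id, relevance score) pairs."""
    return [("doc1", 0.9), ("doc2", 0.4)]

def source_b(query):
    """A second hypothetical source that overlaps with the first on doc2."""
    return [("doc2", 0.7), ("doc3", 0.5)]

def federated_search(query, sources):
    """Query every source, merge duplicates, and rank by relevance.

    A document returned by several sources keeps its highest score.
    Results are sorted by descending score, then by id for stable output.
    """
    merged = {}
    for search_fn in sources:
        for doc_id, score in search_fn(query):
            merged[doc_id] = max(score, merged.get(doc_id, 0.0))
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))

for doc_id, score in federated_search("concept search", [source_a, source_b]):
    print(doc_id, score)
```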
