
Search Engine Functionality For LLP: Apache Lucene

Apache Lucene is a free and open-source information retrieval software library written in Java. It allows developers to add full-text search and indexing capabilities to applications. Solr is an open-source enterprise search platform built on Lucene that provides powerful indexing, searching, and retrieval capabilities across various repositories. It allows developers to easily develop search and analytics applications through REST-like APIs and a web interface for administration. Both Lucene and Solr use tokenization, filtering, and analysis to process content for indexing and searching.

Uploaded by

vikashvardhan
Copyright
© Attribution Non-Commercial (BY-NC)

Search Engine Functionality for LLP

Apache Lucene Library and Solr Enterprise Search Server

Apache Lucene

• A high-performance, full-featured text search engine library written entirely in Java.

• It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Features: Lucene is designed to make it easy to add indexing and search capability to a broad range of applications, including:

• Searchable email: An email application could let users search archived messages and add new messages to the index as they arrive.

• Online documentation search: A documentation reader (CD-based, Web-based, or embedded within the application) could let users search online documentation or archived publications.

• Searchable Webpages: A Web browser or proxy server could build a personal search engine to index every Webpage a user has visited, allowing users to easily revisit pages.

• Website search: A CGI program could let users search your Website.

• Content search: An application could let the user search saved documents for specific content; this could be integrated into the Open Document dialog.

• Version control and content management: A document management system could index documents, or document versions, so they can be easily retrieved.

• News and wire service feeds: A news server or relay could index articles as they arrive.

Usage: Lucene can be used as follows:

• Indexing side: Write code to add Documents to the index.

• Search side: Write code to transform the user's query into Lucene Query instances.

• Submit the Query to Lucene to search.

• Display the results.
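
The four steps above can be sketched with a toy in-memory inverted index. This is only an illustration of the flow, assuming whitespace tokenization and AND semantics; the class name TinyIndex and its methods are hypothetical and are not Lucene's API.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index; a stand-in for a Lucene index, not its API."""

    def __init__(self):
        self.docs = []                     # doc id -> stored fields
        self.postings = defaultdict(set)   # (field, term) -> set of doc ids

    def add_document(self, fields):
        """Indexing side: store the document and index each field's terms."""
        doc_id = len(self.docs)
        self.docs.append(fields)
        for name, content in fields.items():
            for term in content.lower().split():
                self.postings[(name, term)].add(doc_id)
        return doc_id

    def search(self, field, query_text):
        """Search side: turn user text into terms and require all of them."""
        terms = query_text.lower().split()
        if not terms:
            return []
        hits = set.intersection(
            *(self.postings.get((field, t), set()) for t in terms)
        )
        return [self.docs[i] for i in sorted(hits)]


index = TinyIndex()
index.add_document({"subject": "Quarterly report",
                    "body": "Sales rose in the third quarter"})
index.add_document({"subject": "Lunch",
                    "body": "Pizza in the kitchen at noon"})

# Submit the query, then display the results.
for hit in index.search("body", "third quarter"):
    print(hit["subject"])   # prints "Quarterly report"
```

A real Lucene index also scores and ranks hits; this sketch only returns matching documents in insertion order.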

A Document is one or more Fields. A Field consists of a name, content, and metadata on how to handle the content. Content is made searchable by analyzing it. Analysis is performed by chaining together a Tokenizer, which splits an input stream into words (tokens), and zero or more TokenFilters, which can alter (for example, stem) or remove tokens.
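
The analysis chain can be pictured as a tokenizer feeding a sequence of filters. The sketch below uses plain Python generators; the function names, the stop-word list, and the crude "strip a trailing s" stemmer are assumptions made for illustration and do not reproduce Lucene's Tokenizer or TokenFilter classes.

```python
import re

STOP_WORDS = {"a", "an", "the", "and", "of", "to"}   # illustrative list only


def tokenizer(text):
    """Split the input string into word tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase_filter(tokens):
    """Alter each token: fold to lower case."""
    for token in tokens:
        yield token.lower()


def stop_filter(tokens):
    """Remove tokens that appear in the stop-word list."""
    for token in tokens:
        if token not in STOP_WORDS:
            yield token


def naive_stem_filter(tokens):
    """Alter tokens with a deliberately crude stemmer: drop a trailing 's'."""
    for token in tokens:
        yield token[:-1] if len(token) > 3 and token.endswith("s") else token


def analyze(text, filters):
    """Chain the tokenizer with zero or more filters, in order."""
    stream = tokenizer(text)
    for token_filter in filters:
        stream = token_filter(stream)
    return list(stream)


chain = [lowercase_filter, stop_filter, naive_stem_filter]
print(analyze("The Hurricanes and the Storms", chain))
# prints ['hurricane', 'storm']
```

Filter order matters: the stop filter here only matches lower-case words, so it has to run after the lower-case filter.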

Indexing: The process of preparing and adding text to Lucene. The key point is that Lucene only indexes Strings, i.e.:

• Lucene doesn't care about XML, Word, PDF, etc.

• There are many good open source extractors available.

• We need to convert whatever file format we have into plain text (Strings) before handing it to Lucene.
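
As an illustration of the conversion step, the sketch below pulls plain text out of an HTML page using only the Python standard library. HTML is chosen here as an assumption for the example; the formats named above (Word, PDF) would need a dedicated extractor, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the page's text as one string, ready to be indexed."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.parts)


page = ("<html><head><style>p {color: red}</style></head>"
        "<body><h1>Release notes</h1>"
        "<p>Search is <b>faster</b> now.</p></body></html>")
print(html_to_text(page))   # prints "Release notes Search is faster now."
```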

Solr
• Solr is an open source enterprise search server based on the
Lucene Java search library, with XML/HTTP and JSON APIs, hit
highlighting, faceted search, caching, replication, a web
administration interface and many more features. It runs in a
Java servlet container such as Tomcat.

Features: It is packaged as a Java 5 webapp (WAR) with a web services-like API. We put documents in it (called "indexing") via XML over HTTP, and we query it via HTTP GET and receive XML results.

• Advanced Full-Text Search Capabilities

• Optimized for High Volume Web Traffic

• Standards Based Open Interfaces - XML and HTTP

• Server statistics exposed over JMX for monitoring

• Scalability - Efficient Replication to other Solr Search Servers

• Flexible and Adaptable with XML configuration

• Extensible Plugin Architecture

[Figure: The Solr admin console]


Usage: Conceptually, Solr can be broken down into four main
areas:

• Schema (schema.xml): describes the data
• Configuration (solrconfig.xml): describes how people can interact with the data
• Indexing
• Searching

As with Lucene, content is made searchable by analyzing it, that is, by chaining together a Tokenizer and zero or more TokenFilters. The Solr schema makes it easy to configure this analysis process without code.

Configuration: The solrconfig.xml file specifies how Solr should handle indexing, highlighting, faceting, search, and other requests, as well as attributes specifying how caching should be handled and how Lucene should manage the index.

Indexing and searching: Both happen via HTTP requests sent to the Solr server. The index is modified by POSTing XML documents containing instructions to add (or update) documents, delete documents, or commit pending adds and deletes.

• Loading data: Send XML add commands over HTTP. For example:

<add><doc>
  <field name="id">canes</field>
  <field name="name">Carolina Hurricanes</field>
</doc></add>
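
A minimal sketch of how a client might build these update messages, using Python's standard xml.etree.ElementTree so that field values are escaped correctly. The helper names are hypothetical. The resulting strings would be sent as the body of an HTTP POST to the server's update URL, which depends on the deployment and is not shown here.

```python
import xml.etree.ElementTree as ET


def build_add(fields):
    """Build an <add> message for one document from a name -> value dict."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


def build_delete_by_id(doc_id):
    """Build a <delete> message that removes one document by its id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


# Pending adds and deletes only take effect after a commit message is sent.
COMMIT = "<commit/>"

print(build_add({"id": "canes", "name": "Carolina Hurricanes"}))
print(build_delete_by_id("canes"))
print(COMMIT)
```

Building the XML with a library rather than string concatenation means characters such as & and < in field values are escaped automatically.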

• Querying data: Send an HTTP GET or POST whose parameters specify the query options:

o http://solr/select?q=electronics

o http://solr/select?q=electronics&sort=price+desc
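
Query URLs like the two above can be assembled with Python's standard urllib.parse, which takes care of encoding spaces and special characters. The helper name select_url is hypothetical, and http://solr is the placeholder host used in the examples above.

```python
from urllib.parse import urlencode


def select_url(base, **params):
    """Build a select URL from a base address and query parameters."""
    return base + "/select?" + urlencode(params)


print(select_url("http://solr", q="electronics"))
# prints http://solr/select?q=electronics
print(select_url("http://solr", q="electronics", sort="price desc"))
# prints http://solr/select?q=electronics&sort=price+desc
```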

• The canonical response format is XML:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="14" start="0">
    <doc>
      <arr name="cat">
        <str>electronics</str>
        <str>connector</str>
      </arr>
      <arr name="features">
        <str>car power adapter, white</str>
      </arr>
      <str name="id">F8V7067APLKIT</str>
      ...
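
A response in this format can be read with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree. The SAMPLE string is a shortened copy of the response above with the elided remainder left out and the open tags closed so that it parses (which is why it holds one document although numFound is 14); the function name parse_response is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, closed-off copy of the sample response (an assumption, see above).
SAMPLE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="14" start="0">
    <doc>
      <arr name="cat"><str>electronics</str><str>connector</str></arr>
      <arr name="features"><str>car power adapter, white</str></arr>
      <str name="id">F8V7067APLKIT</str>
    </doc>
  </result>
</response>"""


def parse_response(xml_text):
    """Return (status, numFound, docs); each doc is a dict of field values."""
    root = ET.fromstring(xml_text)
    status = int(
        root.find("lst[@name='responseHeader']/int[@name='status']").text
    )
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for child in doc:
            if child.tag == "arr":          # multi-valued field -> list
                fields[child.get("name")] = [value.text for value in child]
            else:                           # single value, kept as text
                fields[child.get("name")] = child.text
        docs.append(fields)
    return status, int(result.get("numFound")), docs


status, num_found, docs = parse_response(SAMPLE)
print(status, num_found, docs[0]["id"], docs[0]["cat"])
```

All field values are kept as text here; a fuller client would convert int, float, and date elements to native types.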

Lucene v. Solr

Lucene                                                  Solr
Embedded/lightweight                                    Server-side
No container                                            HTTP as communication language
Provide low-level control over all aspects of process   Want ease of setup and configuration
Thick clients                                           Can be used for non-Java clients
Distributed                                             Replication/caching out-of-the-box
Need to use features not available in Solr              JDK 1.5
JDK 1.4

Links for installation and documentation:

Lucene:

http://lucene.apache.org/java/2_4_0/gettingstarted.html (official website)

http://www.ibm.com/developerworks/web/library/wa-lucene2/?S_TACT=105AGY82&S_CMP=GENSITE

Solr:

http://lucene.apache.org/solr/tutorial.html (official website)

http://www.ibm.com/developerworks/opensource/library/j-solr-update/index.html?ca=drs-
