Search Engine Functionality For LLP: Apache Lucene

Apache Lucene is a free and open-source information retrieval software library written in Java. It allows developers to add full-text search and indexing capabilities to applications. Solr is an open-source enterprise search platform built on Lucene that provides powerful indexing, searching, and retrieval capabilities across various repositories. It allows developers to easily develop search and analytics applications through REST-like APIs and a web interface for administration. Both Lucene and Solr use tokenization, filtering, and analysis to process content for indexing and searching.

Uploaded by

vikashvardhan

Search Engine Functionality for LLP

Apache Lucene Library and Solr Enterprise Search Server

Apache Lucene

• A high-performance, full-featured text search engine library written entirely in Java.

• It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Features: Lucene is designed to make it easy to add indexing and search capability to a broad range of applications, including:

• Searchable email: An email application could let users search archived messages and add new messages to the index as they arrive.

• Online documentation search: A documentation reader (CD-based, Web-based, or embedded within the application) could let users search online documentation or archived publications.

• Searchable Webpages: A Web browser or proxy server could build a personal search engine to index every Webpage a user has visited, allowing users to easily revisit pages.

• Website search: A CGI program could let users search your Website.

• Content search: An application could let the user search saved documents for specific content; this could be integrated into the Open Document dialog.

• Version control and content management: A document management system could index documents, or document versions, so they can be easily retrieved.

• News and wire service feeds: A news server or relay could index articles as they arrive.

Usage: Lucene can be used as follows:

• Indexing side: Write code to add Documents to the index.

• Search side: Write code to transform the user's query into Lucene Query instances.

• Submit the Query to Lucene to search.

• Display the results.
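The four steps above can be sketched with a toy in-memory inverted index. This is only an illustration in Python, not Lucene's API: the names ToyIndex, add_document, and search are hypothetical, and real Lucene adds analysis, scoring, and on-disk storage that this sketch leaves out.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical stand-in for a search index; not Lucene's API."""

    def __init__(self):
        self.docs = []                      # stored documents, by position
        self.postings = defaultdict(set)    # term -> set of document positions

    def add_document(self, fields):
        """Indexing side: add a document (a dict of field name -> string)."""
        doc_id = len(self.docs)
        self.docs.append(fields)
        for text in fields.values():
            for term in text.lower().split():
                self.postings[term].add(doc_id)
        return doc_id

    def search(self, query):
        """Search side: turn the user's query into terms and AND them together."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.docs[i] for i in sorted(hits)]


index = ToyIndex()
index.add_document({"subject": "Quarterly report", "body": "Sales rose in March"})
index.add_document({"subject": "Lunch", "body": "Report to the cafeteria at noon"})

# Submit the query, then display the results.
results = index.search("report sales")
for doc in results:
    print(doc["subject"])
```

Requiring every query term to match is a simplification chosen to keep the sketch short; it is not a statement about how Lucene combines terms by default.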

A Document is one or more Fields. A Field consists of a name, content, and metadata on how to handle the content. Content is made searchable by analyzing it. Analysis is completed by chaining together a Tokenizer, which splits an input stream into words (tokens), and zero or more TokenFilters, which can alter (for example, stem) or remove tokens.
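The chaining described above can be illustrated with a small Python sketch. It mirrors the idea of a tokenizer followed by filters but is not Lucene's Tokenizer or TokenFilter API; every function name, the stop-word list, and the crude "strip a trailing s" stemmer are assumptions made for the example.

```python
import re


def tokenizer(text):
    """Split an input string into word tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase_filter(tokens):
    """Alter each token: fold it to lower case."""
    return [t.lower() for t in tokens]


STOP_WORDS = {"a", "an", "the", "of", "and"}  # illustrative list only


def stop_filter(tokens):
    """Remove tokens that carry little meaning for search."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem_filter(tokens):
    """Alter each token with a deliberately crude stemmer: strip a trailing 's'."""
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def analyze(text, filters=(lowercase_filter, stop_filter, naive_stem_filter)):
    """Run the tokenizer, then each filter in order."""
    tokens = tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens


print(analyze("The Hurricanes and the history of hockey"))
```

The order of the filters matters: the stop filter here expects lower-case input, so it runs after the lower-case filter.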

Indexing: This is the process of preparing and adding text to Lucene. The key point is that Lucene only indexes Strings, i.e.:

• Lucene doesn’t care about XML, Word, PDF, etc.

• There are many good open source extractors available.

• We need to convert whatever file format we have into strings that Lucene can index.
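As a rough illustration of that conversion step, the Python sketch below reduces an XML document to a plain string using only the standard library. The function name xml_to_text and the sample document are made up for the example; formats such as Word or PDF would need one of the third-party extractors mentioned above.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_source):
    """Return the text content of an XML document with all markup removed."""
    root = ET.fromstring(xml_source)
    # itertext() walks the element tree and yields only the text nodes.
    pieces = (piece.strip() for piece in root.itertext())
    return " ".join(p for p in pieces if p)


sample = """
<article>
  <title>Search basics</title>
  <body>An index maps <b>terms</b> to documents.</body>
</article>
"""

print(xml_to_text(sample))
```

The resulting string is what would be handed to the indexing code; the original markup plays no part in the search.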

Solr
• Solr is an open source enterprise search server based on the
Lucene Java search library, with XML/HTTP and JSON APIs, hit
highlighting, faceted search, caching, replication, a web
administration interface and many more features. It runs in a
Java servlet container such as Tomcat.

Features: Solr is packaged as a Java 5 web application (WAR) with a web-services-like API. We put documents into it (called "indexing") via XML over HTTP, and we query it via HTTP GET and receive XML results.

• Advanced Full-Text Search Capabilities

• Optimized for High Volume Web Traffic

• Standards Based Open Interfaces - XML and HTTP

• Server statistics exposed over JMX for monitoring

• Scalability - Efficient Replication to other Solr Search Servers

• Flexible and Adaptable with XML configuration

• Extensible Plugin Architecture

[Figure: the Solr admin console]

Usage: Conceptually, Solr can be broken down into four main
areas:

• Schema (schema.xml) - describes the data

• Configuration (solrconfig.xml) - describes how people can interact with the data

• Indexing

• Searching

As with Lucene, content is made searchable by analyzing it: a Tokenizer is chained together with zero or more TokenFilters. The Solr schema makes it easy to configure this analysis process without code.
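To give a feel for what "without code" means, the Python sketch below holds a field type of the kind that could appear in schema.xml and lists the analysis chain it declares. The field type name "text_basic" is invented for the example, and the choice of a whitespace tokenizer plus a lower-case filter is an assumption; check the schema shipped with your Solr version for the definitions it actually uses.

```python
import xml.etree.ElementTree as ET

# A field type as it might appear in schema.xml; "text_basic" is a made-up name.
FIELD_TYPE = """
<fieldType name="text_basic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""


def analysis_chain(field_type_xml):
    """List the tokenizer and filter classes in the order they are declared."""
    analyzer = ET.fromstring(field_type_xml).find("analyzer")
    return [(step.tag, step.get("class")) for step in analyzer]


for tag, cls in analysis_chain(FIELD_TYPE):
    print(tag, cls)
```

Changing the analysis then means editing the XML (for example, adding another filter element) instead of writing and compiling Java.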

Configuration: The solrconfig.xml file specifies how Solr should handle indexing, highlighting, faceting, search, and other requests, as well as attributes specifying how caching should be handled and how Lucene should manage the index.

Indexing and searching: Both happen via HTTP requests sent to the Solr server. The index is modified by POSTing XML documents containing instructions to add (or update) documents, delete documents, or commit pending adds and deletes.
• Loading data: Send XML add commands over HTTP. For example:

<add><doc>
  <field name="id">canes</field>
  <field name="name">Carolina Hurricanes</field>
</doc></add>
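A client could build and post such an add command roughly as follows. This is a sketch using only the Python standard library; the helper name build_add_command is hypothetical, and the update URL (host, port, and path) is an assumption that depends on how Solr is deployed. The request is constructed but deliberately not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_command(fields):
    """Serialize one document as a Solr <add><doc>...</doc></add> command."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


# Assumed endpoint; adjust host, port, and path to your own installation.
UPDATE_URL = "http://localhost:8983/solr/update"

body = build_add_command({"id": "canes", "name": "Carolina Hurricanes"})
request = urllib.request.Request(
    UPDATE_URL,
    data=body.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8"},
    method="POST",
)

print(body)
# urllib.request.urlopen(request) would send it; a second POST whose body is
# <commit/> makes the pending add visible to searches.
```

Building the XML with ElementTree instead of string concatenation means characters such as & or < in field values are escaped automatically.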

• Querying data: Use HTTP GET or POST, with parameters specifying the query options:

o http://solr/select?q=electronics

o http://solr/select?q=electronics&sort=price+desc
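The query URLs above can be assembled programmatically so that parameter values are escaped correctly. The Python sketch below reuses the placeholder base URL from the examples; the helper name select_url is hypothetical.

```python
from urllib.parse import urlencode

# "http://solr/select" is the placeholder base URL used in the examples above.
BASE_URL = "http://solr/select"


def select_url(query, **options):
    """Build a select URL; urlencode escapes spaces as '+' and other unsafe characters."""
    params = {"q": query, **options}
    return BASE_URL + "?" + urlencode(params)


print(select_url("electronics"))
print(select_url("electronics", sort="price desc"))
```

Note that the sort value is written with a space ("price desc") and only becomes "price+desc" once it is URL-encoded.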

• Canonical response format is XML:

...
</lst>
<result name="response" numFound="14" start="0">
  <doc>
    <arr name="cat">
      <str>electronics</str>
      <str>connector</str>
    </arr>
    <arr name="features">
      <str>car power adapter, white</str>
    </arr>
    <str name="id">F8V7067APLKIT</str>
    ...
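A client typically reads the hit count and the fields of each document out of such a response. The Python sketch below parses a shortened, made-up response of the same shape; the function name parse_results is hypothetical, and the name="cat" attribute on the array element is an assumption about the elided markup.

```python
import xml.etree.ElementTree as ET

# A shortened, made-up response in the same shape as the fragment above.
RESPONSE = """
<response>
  <result name="response" numFound="14" start="0">
    <doc>
      <arr name="cat"><str>electronics</str><str>connector</str></arr>
      <str name="id">F8V7067APLKIT</str>
    </doc>
  </result>
</response>
"""


def parse_results(xml_text):
    """Return (numFound, list of field dicts) from a Solr XML response."""
    result = ET.fromstring(xml_text).find("result")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for child in doc:
            if child.tag == "arr":      # multi-valued field
                fields[child.get("name")] = [v.text for v in child]
            else:                       # single-valued field
                fields[child.get("name")] = child.text
        docs.append(fields)
    return int(result.get("numFound")), docs


total, docs = parse_results(RESPONSE)
print(total, docs[0]["id"], docs[0]["cat"])
```

numFound is the total number of matches, which can be larger than the number of doc elements actually returned in one response.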

Lucene v. Solr

Lucene                                    Solr
Embedded / lightweight                    Server-side
No container                              HTTP as communication language
Provides low-level control over all       Want ease of setup and configuration
aspects of the process
Thick clients                             Can be used for non-Java clients
Need to use features not available        Distributed replication/caching
in Solr                                   out of the box
JDK 1.4                                   JDK 1.5

Links for installation and documentation:

Lucene:

http://lucene.apache.org/java/2_4_0/gettingstarted.html (official website)

http://www.ibm.com/developerworks/web/library/wa-lucene2/?S_TACT=105AGY82&S_CMP=GENSITE

Solr:

http://lucene.apache.org/solr/tutorial.html (official website)

http://www.ibm.com/developerworks/opensource/library/j-solr-update/index.html?ca=drs-
