Lucene / SOLR
SOLR Introduction
Why do we need a search engine?
What is Lucene/SOLR?
Advantages of SOLR
SOLR architecture
Query syntax
Working with SOLR: feed data, query data
SOLR installation
SOLR configuration
Database
What is Lucene/SOLR?
Lucene
Apache Lucene is a free, open-source information retrieval software library.
Lucene is only an indexing and search library, not a standalone server.
Lucene is written in Java and has been ported to other languages, including Delphi, Perl, C#, C++, Python, Ruby, and PHP.
Solr
Solr is a search server built on top of Lucene.
Solr is a Java web application (WAR) that can be deployed in any servlet container, e.g. Jetty or Tomcat.
Solr exposes its features over HTTP as a REST-like service.
Advantages of SOLR
Open source/free
Administration interface
Rich document parsing and indexing (PDF, Word, HTML, etc.)
Full-text search
Faceted search and filtering
Multi-server support
SOLR architecture
SOLR Shard
Query Syntax
Keyword matching
title:foo - Search for the word "foo" in the title field.
title:"foo bar" - Search for the phrase "foo bar" in the title field.
-title:bar - Search for everything except documents with "bar" in the title field.
Wildcard matching
title:foo* - Search for any word that starts with "foo" in the title field.
title:foo*bar - Search for any word that starts with "foo" and ends with "bar" in the title field.
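Proximity matching
"foo bar"~number
number = 0: exact phrase match
number = 1: the terms may be one position out of place, e.g. "foo baz bar"
number = 2: the terms may also appear in reverse order, e.g. "bar foo"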
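The slop number in "foo bar"~number can be illustrated with a small calculation. The sketch below is a simplified model, not Lucene's actual scorer: it assumes every query term occurs exactly once in the document and ignores repeated terms. The class and method names (SlopSketch, requiredSlop) are hypothetical.

```java
// Simplified model of phrase slop; not Lucene's real implementation.
final class SlopSketch {

    // docPositions[i] is the position in the document of the i-th term of
    // the query phrase. Each position is shifted back by the term's index
    // in the phrase; the spread of the shifted positions is the slop that
    // this simplified model requires for a match.
    static int requiredSlop(int[] docPositions) {
        if (docPositions.length == 0) {
            return 0;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < docPositions.length; i++) {
            int shifted = docPositions[i] - i;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        // Query phrase "foo bar": index 0 is foo, index 1 is bar.
        System.out.println(requiredSlop(new int[] {0, 1})); // document "foo bar": 0
        System.out.println(requiredSlop(new int[] {0, 2})); // document "foo baz bar": 1
        System.out.println(requiredSlop(new int[] {1, 0})); // document "bar foo": 2
    }
}
```

In this model a document matches "foo bar"~n when requiredSlop is at most n, so swapping two adjacent terms costs 2 rather than 1.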
Range searches
field:[a TO z] - Search for documents whose field value is in the range a to z, inclusive.
field:[* TO 100] - Search for all values less than or equal to 100.
field:[100 TO *] - Search for all values greater than or equal to 100.
field:[* TO *] - Match all documents that have a value in the field.
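When range clauses are built in application code, the open-ended forms are easy to get wrong. A minimal sketch of a helper that produces the four forms listed above; the class and method names (RangeQuerySketch, range) are hypothetical, and the helper does no escaping of special characters.

```java
// Sketch: builds inclusive range clauses in the square-bracket syntax.
final class RangeQuerySketch {

    // A null bound is written as "*", which leaves that side of the range open.
    static String range(String field, Object lower, Object upper) {
        String from = (lower == null) ? "*" : String.valueOf(lower);
        String to = (upper == null) ? "*" : String.valueOf(upper);
        return field + ":[" + from + " TO " + to + "]";
    }

    public static void main(String[] args) {
        System.out.println(range("field", "a", "z"));   // field:[a TO z]
        System.out.println(range("field", null, 100));  // field:[* TO 100]
        System.out.println(range("field", 100, null));  // field:[100 TO *]
        System.out.println(range("field", null, null)); // field:[* TO *]
    }
}
```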
Nested query
_query_:"field:*lap" OR _query_:"field:*tran"
_query_:"{!dismax qf=somefield}cat dog"
Join
{!join from=inner-id to=outer-id}zzz:vvv
Equivalent SQL:
SELECT xxx, yyy FROM collection1 WHERE outer-id IN (SELECT inner-id FROM collection1 WHERE zzz = 'vvv')
Faceted Search
q=inStock:true&facet=true&facet.field=cat&facet.limit=5
<response>
  <responseHeader><status>0</status><QTime>4</QTime></responseHeader>
  <result numFound="12" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="cat">
        <int name="electronics">10</int>
        <int name="memory">3</int>
        <int name="drive">2</int>
        <int name="hard">2</int>
        <int name="monitor">2</int>
      </lst>
    </lst>
  </lst>
</response>
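The faceted-search parameters q, facet, facet.field and facet.limit are sent as an ordinary HTTP query string. A minimal sketch of assembling that request URL with the standard library only; the base URL (a core named collection1 on localhost:8080) and the class name FacetUrlSketch are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: assembles a faceted-search request as a full URL.
final class FacetUrlSketch {

    // Assumed location of the select handler; adjust to the actual server.
    static final String SELECT_URL = "http://localhost:8080/solr/collection1/select";

    // Joins the parameters into a query string, URL-encoding names and values.
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // The same parameters as in the faceted-search example, in the same order.
    static Map<String, String> facetParams() {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", "inStock:true");
        params.put("facet", "true");
        params.put("facet.field", "cat");
        params.put("facet.limit", "5");
        return params;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(SELECT_URL, facetParams()));
    }
}
```

The colon in inStock:true is percent-encoded as %3A in the resulting URL; the server decodes it back before parsing the query.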
SolrJ
Feed data
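import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// make a connection to the Solr server
SolrServer server = new HttpSolrServer("http://localhost:8080/solr/");

// prepare a doc
final SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", 1);
doc1.addField("firstName", "First Name");
doc1.addField("lastName", "Last Name");
final Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);

// add docs to Solr and make them searchable
server.add(docs);
server.commit();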
Query data
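import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// match all documents, sorted by first name in ascending order
final SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.addSortField("firstName", SolrQuery.ORDER.asc);

// run the query on the same server object used to feed data
final QueryResponse rsp = server.query(query);
final SolrDocumentList solrDocumentList = rsp.getResults();

// read the stored fields of each hit
for (final SolrDocument doc : solrDocumentList) {
    final String firstName = (String) doc.getFieldValue("firstName");
    final String id = String.valueOf(doc.getFieldValue("id"));
}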
SOLR installation
Ref:
http://wiki.apache.org/solr/SolrInstall
http://wiki.apache.org/solr/SolrTomcat
http://lucene.apache.org/solr/4_2_1/tutorial.html
Extract solr-4.2.1.zip to D:\Project\solr_web\solr-4.2.1
Copy resource\solr-4.2.1\example\solr to D:\Project\solr_web\solr (this directory is SOLR_HOME)
Copy resource\solr-4.2.1\dist\solr-4.2.1.war to SOLR_HOME and rename it to solr.war
Open SOLR_HOME\collection1\conf\solrconfig.xml and modify the <dataDir> element:
<dataDir>${solr.data.dir:D:/Project/solr_web/solr/collection1/data}</dataDir>
Create a Tomcat Context file (solr.xml) like this:
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="D:/Project/solr_web/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="D:/Project/solr_web/solr" override="true"/>
</Context>
Copy this file (solr.xml) to tomcat.7.0.35\conf\Catalina\localhost
Start Tomcat
Open the SOLR dashboard at http://localhost:8080/solr/#/
SOLR Configuration
Ref:
http://wiki.apache.org/solr/SolrConfigXml
http://wiki.apache.org/solr/SchemaXml
A Solr server configuration needs at least two XML files: solrconfig.xml and schema.xml.
solrconfig.xml: contains the general configuration of a core: memory sizes, data path, transaction settings, and so on.
schema.xml: contains the data definitions: structure, data types, field names.
schema.xml
field: a field that Solr will index
<field name="firstName" type="string" indexed="true" stored="true"/>
dynamicField: like a field, but its name is given as a wildcard pattern instead of a fixed name
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
name="*_i" will match any field name ending in _i (like myid_i, z_i)
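The way a dynamicField pattern selects field names can be sketched in a few lines. This is an illustration of the matching rule only, not Solr's own code; it assumes the pattern carries at most one '*', at the start or at the end of the name, and the class name DynamicFieldSketch is hypothetical.

```java
// Sketch: how a dynamicField name pattern such as "*_i" selects field names.
final class DynamicFieldSketch {

    // Returns true if fieldName matches the pattern. A leading '*' matches any
    // prefix, a trailing '*' matches any suffix; without '*' the names must be equal.
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName);
    }

    public static void main(String[] args) {
        System.out.println(matches("*_i", "myid_i"));    // true
        System.out.println(matches("*_i", "z_i"));       // true
        System.out.println(matches("*_i", "firstName")); // false
    }
}
```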