Nutch Installation Guide

This document provides instructions for installing and configuring Apache Nutch for use in the NTER system. It describes downloading and extracting Nutch software, creating required directories, configuring Nutch properties in NTER's portal-ext.properties file, and updating performance settings. The appendix lists the specific configuration used for the deployment on the website www.nterlearning.org.


 

NUTCH INSTALLATION & CONFIGURATION 
GUIDE FOR USE IN THE NTER SYSTEM 

Prepared By:  
Leigh Moulder, SRI International 
leigh.moulder@sri.com 
TABLE OF CONTENTS 

Document Change Log
Nutch Server Information
    Account Information
    Installation Locations
    Resources
Master Nutch Installation
    Gather Software
    Nutch Home Directory
    Configure NTER
Upgrading Nutch to Release 1.4
Appendix A – Deployed Configuration
    Account Information
    Installation Locations
 
 
   

DOCUMENT CHANGE LOG 
Release Date  Document Version  Notes
8/1/2011      1.0               Initial Release
10/1/2011     1.1               Updated document formatting
1/17/2012     1.2               Updated documentation for Nutch 1.4 Release
2/17/2012     1.3               Simplified installation steps
 
   

NUTCH SERVER INFORMATION 

ACCOUNT INFORMATION 

Account  Referenced As  Value 


Nutch Server host  ${nutch.host} 
Nutch Server user  ${nutch.user} 
Tomcat account  ${tomcat.user}
Solr host  ${solr.host} 
Solr URL  ${solr.url} 
Solr core  ${solr.core} 
Solr user (optional)  ${solr.user} 
Solr password (optional)  ${solr.password}
 

INSTALLATION LOCATIONS 

Directory  Referenced As  Value 


Tomcat home  ${catalina.home}
Tomcat base  ${catalina.base}
Nutch Home directory  ${nutch.home}
 

RESOURCES 
Nutch Download page  http://www.apache.org/dist/nutch/apache-nutch-1.4-bin.tar.gz
 
   

MASTER NUTCH INSTALLATION 

The Master Nutch installation only needs to be performed once per NTER deployment.  It is designed to run on the 
‘Master’ NTER node and provides full‐text crawling for all other NTER instances. 
 

GATHER SOFTWARE 

The majority of Nutch is included with the NTER course-portlet webapp.  As such, these instructions assume NTER
has been deployed successfully.

1. Download and extract the Nutch binary file to the /tmp directory. 

cd /tmp
wget http://www.apache.org/dist/nutch/apache-nutch-1.4-bin.tar.gz
tar xzf apache-nutch-1.4-bin.tar.gz
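
Later steps copy the plugins directory out of the extracted tree, so it can help to confirm where that directory sits inside the archive before relying on a fixed path. The following is a minimal sketch, assuming a tar that reads gzip archives with the z flag; find_plugins_dir is a hypothetical helper name and is not part of Nutch or NTER.

```shell
#!/bin/sh
# Print the path of the first "plugins" directory nested inside a gzipped
# tarball. find_plugins_dir is a hypothetical helper, not part of Nutch.
find_plugins_dir() {
    archive=$1
    # List the archive members, keep those that live under a plugins/
    # directory below the top level, cut each path off after "plugins",
    # and print only the first result.
    tar tzf "$archive" \
        | grep '/plugins/' \
        | sed 's#/plugins/.*#/plugins#' \
        | head -n 1
}

# Example call (the archive path is the one used in this guide):
#   find_plugins_dir /tmp/apache-nutch-1.4-bin.tar.gz
```

If the printed path differs from the one used in the copy step, adjust the copy command to match what the archive actually contains.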
 

NUTCH HOME DIRECTORY 

1. Create the Nutch home and data directories. 

cd /
mkdir -p ${nutch.home}
mkdir -p ${nutch.home}/data
mkdir -p ${nutch.home}/urls
 

2. Copy the Nutch plugins to the Nutch home directory. 

cd ${nutch.home}
cp -r /tmp/apache-nutch-1.4/runtime/local/plugins .
 

3. Make the Tomcat account the owner of everything in the Nutch home directory.

cd ${nutch.home}
chown -R ${tomcat.user}:${tomcat.user} *
 

4. Once NTER is configured with the correct Nutch home properties (below), all necessary data directories are
created automatically.
5. Due to the tight integration between Nutch and the course-portlet, no other configuration or binary files are
needed.
6. During various crawl stages, Nutch needs to create temporary directories.  These are created under the
working directory of the calling service, in this case ${catalina.base}.  Ensure that ${tomcat.user} owns this
directory and all of its subdirectories.

chown -R ${tomcat.user}:${tomcat.user} ${catalina.base}
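
The directory layout and ownership set up in the steps above can be checked before moving on to the NTER configuration. This is a minimal sketch under stated assumptions: check_nutch_home is a hypothetical helper, and the directory and account passed to it stand in for ${nutch.home} and ${tomcat.user}. It skips the home directory itself in the ownership check, because step 3 changes the owner of its contents only.

```shell
#!/bin/sh
# Check that a Nutch home directory contains data/, urls/ and plugins/ and
# that everything below it belongs to the expected account.
# check_nutch_home is a hypothetical helper, not part of Nutch or NTER.
check_nutch_home() {
    home_dir=$1
    owner=$2
    status=0

    # An unknown account would make the ownership test meaningless.
    if ! id "$owner" >/dev/null 2>&1; then
        echo "unknown account: $owner"
        return 2
    fi

    # These three subdirectories are the ones created or copied by this guide.
    for sub in data urls plugins; do
        if [ ! -d "$home_dir/$sub" ]; then
            echo "missing directory: $home_dir/$sub"
            status=1
        fi
    done

    # Report the first entry below the home that another account owns.
    stray=$(find "$home_dir" ! -path "$home_dir" ! -user "$owner" | head -n 1)
    if [ -n "$stray" ]; then
        echo "not owned by $owner: $stray"
        status=1
    fi

    return "$status"
}

# Example call, using the values from Appendix A:
#   check_nutch_home /var/lib/nutch tomcat6
```

The function prints one line per problem and returns a non-zero status if any were found, so it can also be used as a guard in a deployment script.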


 

CONFIGURE NTER 

1. Make the following updates to NTER’s portal-ext.properties file.
a. nter.nutch.role : Should only be set if this is the master Nutch node.  If so, set to “master”. 
b. nter.nutch.home.dir :  Set to the Nutch home directory created above. 
c. nter.nutch.indexer.type : Determines the type of indexer used by Nutch.  Currently, the only valid option 
is “solr”. 
d. nter.nutch.solr.url : The URL of the Solr index server. 
e. nter.nutch.solr.user : The user account used to connect to the Solr index.  This is only needed if security 
has been configured on the Solr server. 
f. nter.nutch.solr.password : The password for the user account used to connect to the Solr index.  This is 
only needed if security has been configured on the Solr server. 

##
## Nutch Settings
##
nter.nutch.role=master
nter.nutch.home.dir=${nutch.home}
nter.nutch.indexer.type=solr
nter.nutch.solr.url=${solr.url}/solr/${solr.core}
nter.nutch.solr.user=${solr.user}
nter.nutch.solr.password=${solr.password}
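
A quick check that the required keys made it into the properties file can catch typos before Tomcat is restarted. The sketch below assumes the file uses the key=value form shown above; check_nutch_props is a hypothetical helper, and the path to portal-ext.properties depends on the NTER deployment and is not specified here.

```shell
#!/bin/sh
# Report which required Nutch keys from this guide are absent from a
# properties file. check_nutch_props is a hypothetical helper, not part of NTER.
check_nutch_props() {
    props_file=$1
    status=0
    # nter.nutch.role is skipped because it is set only on the master node;
    # the Solr user and password are skipped because they are optional.
    for key in nter.nutch.home.dir nter.nutch.indexer.type nter.nutch.solr.url; do
        # Escape the dots so they match literally rather than any character.
        pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
        if ! grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$props_file"; then
            echo "missing: $key"
            status=1
        fi
    done
    return "$status"
}

# Example call (replace the argument with the real location of the file):
#   check_nutch_props /path/to/portal-ext.properties
```

The check only confirms that each key is present; it does not validate the values, for example whether the Solr URL is reachable.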
  

2. Optionally, update any additional Nutch configuration settings.  The following performance settings can be
changed in the portlet.xml file, located at ${catalina.base}/webapps/course-portlet/WEB-INF/portlet.xml.

Property      Default Value  Description
crawlTimer    30             The interval (in minutes) between Nutch crawls.
threadsLimit  5              The maximum number of concurrent threads used to fetch web pages. Increasing this value can improve crawl speed since more threads are used concurrently. However, too high of a value can cause server performance issues.
depthLimit    10             The maximum URL depth to traverse. Decreasing this value will speed up crawling and indexing, but reduce the number of pages crawled.  Increasing this value will increase index time, and increase the depth of information.
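
Before editing, it is useful to see how the three settings currently appear in portlet.xml. The sketch below only assumes that each property name occurs as plain text in the file and that its value sits on the same line or the next one; the actual XML layout is not described in this guide. show_crawl_settings is a hypothetical helper.

```shell
#!/bin/sh
# Print every line of a portlet.xml that mentions one of the three crawl
# settings, together with the line that follows it.
# show_crawl_settings is a hypothetical helper, not part of NTER.
show_crawl_settings() {
    portlet_xml=$1
    # -n adds line numbers, -A 1 also prints the line after each match,
    # and -E enables the alternation between the three property names.
    grep -n -A 1 -E 'crawlTimer|threadsLimit|depthLimit' "$portlet_xml"
}

# Example call, using the Tomcat base from Appendix A:
#   show_crawl_settings /var/lib/tomcat6/webapps/course-portlet/WEB-INF/portlet.xml
```

The line numbers in the output make it easy to jump to the right place in an editor.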
 

3. Restart Tomcat to have the changes take effect. 

/etc/init.d/tomcat6 restart
 
 

UPGRADING NUTCH TO RELEASE 1.4 

Due to NTER’s implementation of Nutch, no data is stored or used for future crawls.  Because of this, the simplest 
way to upgrade a previous Nutch installation is to remove the existing Nutch directory and perform a clean 
installation.   
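
The remove-and-reinstall approach can be sketched as a small function. This is an illustration under stated assumptions, not an official upgrade script: reinstall_nutch_home is a hypothetical helper, its first argument stands in for ${nutch.home}, and its second is the plugins directory from the freshly extracted Nutch release.

```shell
#!/bin/sh
# Remove an existing Nutch home and recreate it with fresh plugins.
# reinstall_nutch_home is a hypothetical helper, not part of Nutch or NTER.
reinstall_nutch_home() {
    nutch_home=$1
    plugins_src=$2

    # Refuse an empty path or "/", since the next step deletes the directory.
    case $nutch_home in
        ''|/) echo "refusing to remove '$nutch_home'" >&2; return 1 ;;
    esac

    # Make sure the new plugins exist before deleting anything.
    if [ ! -d "$plugins_src" ]; then
        echo "no plugins directory at $plugins_src" >&2
        return 1
    fi

    # Nothing is kept between crawls, so the old directory can simply go.
    rm -rf "$nutch_home"
    mkdir -p "$nutch_home/data" "$nutch_home/urls"
    cp -r "$plugins_src" "$nutch_home/plugins"
}

# Example call, using paths from this guide:
#   reinstall_nutch_home /var/lib/nutch /tmp/apache-nutch-1.4/runtime/local/plugins
```

The ownership change and Tomcat restart described in the installation steps still need to be performed afterwards.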
   

APPENDIX A – DEPLOYED CONFIGURATION 

The following configuration was used for www.nterlearning.org. 

ACCOUNT INFORMATION 

Account  Referenced As  Value 


Nutch Server host  ${nutch.host}  nterlearning.org
Nutch Server user  ${nutch.user}  root
Tomcat account  ${tomcat.user} tomcat6
Solr host (optional)  ${solr.host}  search.nterlearning.org
Solr URL (optional)  ${solr.url}  http://search.nterlearning.org/solr 
Solr core (optional)  ${solr.core}  nutch
 

INSTALLATION LOCATIONS 

Directory  Referenced As  Value 


Tomcat home  ${catalina.home} /usr/share/tomcat6
Tomcat base  ${catalina.base} /var/lib/tomcat6
Nutch Home directory  ${nutch.home} /var/lib/nutch  (maps to /mnt/nutch) 
 
 
