IDOL Server 5 Rev4
Administrator's Guide
Information in this document is subject to change without notice. No part of this document may be reproduced or
transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express permission
of Autonomy Systems Ltd.
Windows is a trademark of Microsoft Corp., UNIX is a trademark of X/OPEN Ltd.
Autonomy Dashboard, Autonomy Desktop Suite, DAH, DIH, DiSH, IAS, IDOL server, Portal-in-a-Box and Retina
are trademarks of Autonomy Systems Ltd.
Table of Contents
Preface................................................................................................................................................ i
Autonomy ................................................................................................................................... i
Contact ...................................................................................................................................... ii
Downloading manual updates from Automater .........................................................................iii
Typographical conventions ........................................................................................................iii
Related documentation ............................................................................................................ iv
About results
26. Results ................................................................................................................................. 265
Relevance ranking ................................................................................................................ 265
Manipulating the relevance of query results ......................................................................... 266
Setting up a field process to boost result relevance ........................................................ 266
Using the BIAS field specifier to boost result relevance .................................................. 269
Using multipliers to boost result relevance ...................................................................... 271
Using Reference fields to filter results at query time ............................................................. 272
Displaying additional fields with results ................................................................................. 275
Configure IDOL server to automatically display additional fields .................................... 275
Display additional fields for individual queries ................................................................. 276
About fields
27. Fields .................................................................................................................................... 279
Processing fields and documents that contain specific fields ............................................... 281
Index fields ............................................................................................................................ 285
Setting up Index fields ..................................................................................................... 285
NumericDateType fields ........................................................................................................ 287
Setting up memory mapping for numerical date fields .................................................... 287
Numerical fields .................................................................................................................... 289
FieldCheckType fields ........................................................................................................... 291
Reference fields .................................................................................................................... 293
Setting up Reference fields ............................................................................................. 293
Simultaneously using KillDuplicates and Combine on Reference fields ......................... 295
Highlight fields ....................................................................................................................... 297
Setting up Highlight fields ................................................................................................ 297
Agentboolean fields .............................................................................................................. 299
Storing Boolean agents in agentboolean fields ............................................................... 299
Matching documents against agentboolean categories .................................................. 300
Meta fields ............................................................................................................................. 301
Changing field values ............................................................................................................ 303
About languages
28. Languages ........................................................................................................................... 307
Running IDOL server in multiple languages ......................................................................... 309
Checking which languages are set up in IDOL server .................................................... 311
Defining language types in IDOL server's configuration file ............................................ 312
Configuring IDOL server to associate language types with documents .......................... 314
Adding language type fields to documents ...................................................................... 318
Defining a default language type in IDOL server's configuration file ............................... 319
Enabling Automatic Language Detection ........................................................................ 320
Specifying the language type of your query .................................................................... 321
Converting results to a specific encoding ........................................................................ 322
Returning documents in multiple languages for your query ............................................ 323
Returning documents in a specific language for your query ........................................... 324
Administration
30. Administering IDOL server ................................................................................................. 357
Executing configuration changes .......................................................................................... 358
Deleting documents from IDOL server by reference ............................................................. 359
Deleting individual documents and ranges of documents from IDOL server ........................ 360
Restoring deleted documents to IDOL server ....................................................................... 361
Creating a new database in IDOL server .............................................................................. 362
To send a DRECREATEDBASE command to IDOL server ............................................ 362
To add a database to IDOL server's configuration file ..................................................... 363
Deleting a database and all the documents it contains ......................................................... 364
Deleting all documents from a database ............................................................................... 365
Expiring documents ............................................................................................................... 366
Exporting IDX documents from IDOL server ......................................................................... 369
Exporting XML documents from IDOL server ........................................................................ 371
Changing the index date, expire date or database of IDOL server documents ..................... 373
Changing field values in IDOL server documents ................................................................. 375
Compacting IDOL server's Data index .................................................................................. 376
Backing up IDOL server's Data index ................................................................................... 378
Initializing IDOL server's Data index ..................................................................................... 381
Exporting users, roles, agents and profiles ........................................................................... 382
Importing users, roles, agents and profiles ........................................................................... 383
Setting up log streams ........................................................................................................... 384
Appendices
Appendix A: The IDOL server configuration file........................................................................ 389
Displaying help on configuration settings .............................................................................. 389
Modifying configuration parameter values ............................................................................. 390
Configuration file sections ..................................................................................................... 391
[License] section .............................................................................................................. 392
[Service] section .............................................................................................................. 392
[Server] section ................................................................................................................ 393
[TermCache] section ....................................................................................................... 393
[IndexCache] section ....................................................................................................... 393
[SectionBreaking] section ................................................................................................ 394
[Paths] section ................................................................................................................. 394
[Databases] section ......................................................................................................... 394
[Schedule] section ........................................................................................................... 395
[Summary] section ........................................................................................................... 395
Preface
Autonomy
Autonomy employs a fundamentally different and unique combination of technologies to enable
computers to form an understanding of a page of text, web pages, emails, voice, documents
and people.
Autonomy's solution is therefore able to power any application dependent upon unstructured
information within every market sector, including: e-commerce, customer relationship
management, knowledge management, enterprise information portals and online publishing
applications.
This is evidenced by the significant penetration of the technology in a diversity of vertical
markets and has been achieved principally because every market sector needs to manage and
leverage the benefits of unstructured information.
Autonomy was founded in 1996 and has offices in Boston, Chicago, Dallas, San Francisco,
New York, and Washington, D.C. in the United States, as well as offices throughout EMEA,
including Amsterdam, Brussels, Cambridge, Frankfurt, Milan, Paris, Oslo, and Sydney. In July
1998, the company went public on the EASDAQ exchange (EASDAQ:AUTN). Autonomy
floated on The NASDAQ National Market (NASDAQ: AUTNY) in May 2000, and on the London
Stock Exchange (LSE: AU.) in November 2000.
Contact
To contact Autonomy, please get in touch with your nearest location listed below.
Switchboard:
Fax:
for information:
autonomy@autonomy.com
for support:
uksupport@autonomy.com
The Help Desk operates from 9.30 am to 6.00 pm (GMT) Monday to Friday.
Website: www.autonomy.com
USA
Autonomy Inc.
One Market
Spear Street Tower
San Francisco
CA 94105
Help Desk:
Switchboard:
Fax:
for information:
info@us.autonomy.com
for support:
support@us.autonomy.com
The Help Desk operates from 9.30 am to 6.00 pm (CST) Monday to Friday, toll-free.
Website: www.autonomy.com
2. Enter your Username and Password, and click on the Login button.
3.
4.
Under the Documentation and Release Notes heading, click on the Click here link,
then click on the Manuals folder to display the latest available manual versions. You can
display any of the manuals in your browser and download them.
Note: the manual's version number (for example, version 4.1.x) corresponds to the product
version. The last number of the product version has been replaced with an x for all manuals
as this number relates to minor product releases that have no effect on the documentation. If
a manual has a revision number (for example revision 5), it indicates that this manual has
been revised since it was first released. Automater always contains the latest available
revision of all manuals.
Typographical conventions
Autonomy documentation uses the following typographical conventions.
Formatting convention:
Type of information:
Bold type
Actions
Parameters
Courier font
Configuration examples
<text>
Related documentation
You should use the IDOL server manual in connection with the following:
IAS manual
The IAS manual contains details on how you can use Autonomy's Intellectual Asset
Protection System (IAS) to ensure secure access through authentication and role
permissions.
DiSH manual
The DiSH (Distributed Service Handler) manual contains details on how you can use a
DiSH server to administer and control multiple Autonomy services.
Retina manual
The Retina manual contains details on the setup and usage of the Retina user interface.
Online help
Online help is provided for IDOL server's action commands and configuration parameters.
Please see Displaying online help on page 61 for details on how to display help.
1. Autonomy infrastructure
"Today, 80% of business is conducted on unstructured information." Gartner Group
"85 per cent of all data stored is held in an unstructured format." Butler Group
"Unstructured data doubles every three months." Gartner Group
Information that you need in order to conduct business successfully comprises the following types:
In the past, companies could only make use of 20% of the information that was relevant to them. In
order to deal with this information they used keyword search engines, tagging schemes, collaborative
filtering or linguistic methods. These methods were not only costly and time-consuming but also non-scalable and inaccurate, and they diverted focus from core business.
80% of relevant information could not be utilized.
Autonomy's software infrastructure allows you to utilize 100% of the information that is relevant to you.
It automates all the business processes that formerly had to be dealt with manually.
By developing a patented combination of Bayesian Inference, Shannon's information theory and
pattern matching, Autonomy has enabled computers to understand unstructured, structured and semi-structured information. This means that Autonomy's software infrastructure solves a fundamental
problem that affects every industry, and can be used in virtually any application that handles
unstructured information:
E-Commerce
CRM
Knowledge Management
Business Intelligence
Online Publishing
Autonomy's software infrastructure is fully scalable and allows you to process information:
automatically
in real time
in any language
IDOL server
Using Autonomy connectors, Autonomy's Intelligent Data Operating Layer (IDOL) server integrates
unstructured, semi-structured and structured information from multiple repositories through an
understanding of the content, delivering a real time environment in which operations across
applications and content are automated, removing all the manual processes involved in getting the
right information to the right people at the right time.
Connectors
Connectors enable automatic content aggregation from any type of local or remote repository (for
example, a database, a web site, a real-time telephone conversation etc.), forming a unified solution
across all information assets within the organization.
Interfaces
Portlets are windows that can be set up in Autonomy's Portal-in-a-Box or third party portals. Each
portlet contains an application that allows the portals' end users to benefit from a variety of IDOL
server functionality.
Retina is an easy-to-use web interface application that provides a full range of retrieval methods
that adjust to the individual user's proficiency.
Autonomy Desktop Suite brings the power of Autonomy to every desktop. Conducting a real-time analysis of the ideas involved in the content of any opened desktop application, Desktop
Suite's ActiveKnowledge or Active Windows Extensions module provides real-time links to
relevant internal and external information without users being needlessly diverted from their work
in progress to perform a search or retrieval operation.
Distributed systems
Autonomy's distribution solutions facilitate linear scaling of systems through faster command
execution and reduction of processing time.
DAH (Distributed Action Handler) enables the distribution of ACI (Autonomy Content
Infrastructure) action commands to multiple Autonomy IDOL servers, providing failover and load
balancing.
DIH (Distributed Index Handler) enables distributed indexing of documents into multiple
Autonomy IDOL servers, providing failover and load balancing.
Administration
DiSH (Distributed Service Handler) provides crucial maintenance, administration, control and
monitoring functionality for the Autonomy infrastructure. DiSH delivers a unified way to
communicate with all Autonomy services, such as connectors, DIH and DAH, from a
centralized location.
PODS
Autonomy's Product Orientated Drop-in Solutions (PODS) allow Autonomy solutions to be easily integrated
with third party applications and solution providers. PODS enable organizations to make their existing
applications compatible with IDOL with minimal configuration and administration requirements. Making
IDOL server a part of any solution delivers the direct benefits of content automation and the ability to
perform a vast range of IDOL server operations, irrespective of file format or location.
Aggregation & Distribution
Connectors aggregate content from various repositories and index it into IDOL server or, if the content
needs to be distributed across multiple IDOL servers, a DIH (Distributed Index Handler).
Distributed Administration
The DiSH (Distributed Service Handler) enables administrators to maintain, configure and control
multiple Autonomy services via the Autonomy Service Dashboard, a front-end web interface.
Security
The Autonomy IAS (Intellectual Asset Protection System) ensures secure access through
authentication and role permissions. When a user logs on to a front end (for example, Retina or a third
party portal), the user's authentication details are sent to IDOL server, which returns the user's security details
to the front end, where they are stored until the user logs off or the session times out. Every time the
user issues a query, these security details are attached to the query string that is sent to IDOL server.
The group servers store the user group information of repositories that store users in groups. This
allows the front end to quickly retrieve user security information from the group servers, and send the
query and the user's security information to IDOL server in order to check if the user is permitted to
view result documents before they are displayed to the user.
IDOL server passes the user's security details to the security libraries for the data repositories that
contain result documents for the user's query. The security libraries then check the user's security
details against the ACLs for the documents that match the query. If the user is entitled to view a
document, it is returned as a result to the front end.
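The ACL check described above can be sketched as follows. This is a minimal illustration in Python, not IDOL server's implementation: the Document class, the acl set of group names and the filter_by_acl function are hypothetical names, and a real ACL format depends on the security library of each repository.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical result document with an access control list (ACL)."""
    reference: str
    acl: set = field(default_factory=set)  # groups allowed to view the document


def filter_by_acl(results, user_groups):
    """Return only the documents whose ACL shares a group with the user."""
    allowed = set(user_groups)
    return [doc for doc in results if doc.acl & allowed]


# Hypothetical result set for one query.
results = [
    Document("report.doc", {"finance", "board"}),
    Document("handbook.pdf", {"all-staff"}),
    Document("salaries.xls", {"hr"}),
]

# The user's security details, reduced here to a list of group names.
visible = filter_by_acl(results, ["all-staff", "finance"])
print([doc.reference for doc in visible])
```

Only documents that pass the check are returned to the front end, so a user never sees a result that the source repository would not have allowed.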
Agents
Alerting
Categorization
Channels
Clustering
Collaboration
Dynamic Thesaurus
Eduction
Expertise
Hyperlinking
Mailing
Profiling
Retrieval
Spelling Correction
Summarization
Taxonomy Generation
Note: your license determines which of these operations your IDOL server installation can perform.
Agents
Agents provide the facilities to find and monitor information from a configurable list of Internet
and Intranet sites, News Feeds, Chat Streams and internal repositories that is highly relevant to
the explicit interests of a user. Agents are created in a very user-friendly way using the following
options:
IDOL server provides the conceptual information that is needed to create agents. The server
accepts a piece of content (training text, a document or a set of documents) or reference
(identifier) and returns an encoded representation of the concepts, including each concept's
specific underlying patterns of terms and associated probabilistic ratings.
Users can retrain their agents by submitting a piece of content (training text, a document or a
set of documents) whose concepts the server uses to adapt the agent.
Alerting
IDOL server analyzes the data in new documents as it receives them and compares
the concepts in the documents with users' agents. If new data matches a user's agent, IDOL server
immediately notifies the user by email or through a third party system (for example, by SMS or a pager).
Categorization
IDOL server can automatically categorize data with no requirement for manual input
whatsoever. The flexibility of Autonomy's Categorization feature allows you to precisely derive
categories using concepts found within unstructured text. This ensures that all data is classified
in the correct context with the utmost accuracy. Autonomy's Categorization feature is a
completely scalable solution capable of handling high volumes of information with extreme
accuracy and total consistency.
Rather than relying on rigid rule-based category definitions such as legacy keyword and
Boolean operators, Autonomy's infrastructure relies on an elegant pattern matching process
based on concepts to categorize documents and automatically insert tag data sets, route
content or alert users to highly relevant information pertinent to the user's profile.
This highly efficient process means that Autonomy is able to categorize upwards of four million
documents in 24 hours per CPU instance, which is approximately one document every 22
milliseconds. Autonomy hooks into virtually all repositories and data formats, respecting all
security and access entitlements, delivering complete reliability.
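The per-document figure follows from simple arithmetic on the stated throughput. The short Python sketch below (the function name is illustrative) converts a daily document count into the average time available per document; four million documents in 24 hours corresponds to an average of 21.6 milliseconds each.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in 24 hours


def ms_per_document(documents_per_day):
    """Average time available per document, in milliseconds."""
    return MS_PER_DAY / documents_per_day


print(ms_per_document(4_000_000))
```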
Category matching
IDOL server accepts a category or piece of content and returns categories ranked by
conceptual similarity. This determines for which categories the piece of content is most
appropriate, so that the piece of content can subsequently be tagged, routed or filed
accordingly.
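As a rough illustration of ranking categories by similarity to a piece of content, the Python sketch below scores each category with cosine similarity over simple word counts. This is a stand-in technique chosen for brevity, not Autonomy's concept-based pattern matching, and the category names and training text are invented for the example.

```python
import math
from collections import Counter


def term_vector(text):
    """Lower-case bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_categories(content, categories):
    """Return (category name, score) pairs, most similar first."""
    vec = term_vector(content)
    scores = [(name, cosine(vec, term_vector(training))) for name, training in categories.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical categories, each described by a little training text.
categories = {
    "finance": "shares market bank interest rates",
    "sport": "football match goal league",
}
ranking = rank_categories("bank raises interest rates", categories)
print(ranking[0][0])
```

The top-ranked category is the one the content would be tagged with, routed to or filed under.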
Channels
IDOL server can automatically provide users with a set of hierarchical channels with highly
relevant information pertinent to the respective channel. Eliminating the requirement for manual
intervention or pre-tagging, real-time information is dynamically updated into the channels
automatically, minimizing the maintenance effort required. Moreover, the administrator can add
and remove channels on the fly, without having to re-categorize all of the data.
Clustering
IDOL server can automatically cluster information. Clustering is the process of taking a large
repository of unstructured data, agents or profiles and automatically partitioning the data so that
similar information is clustered together. Each cluster represents a concept area within the
knowledge base and contains a set of items with common properties.
Collaboration
IDOL server automatically matches users with common explicit interest agents or similar
implicit profiles. This information can be used to create virtual expert knowledge groups.
Dynamic Thesaurus
When it executes queries, IDOL server can automatically suggest alternative queries, allowing
users to quickly produce a variety of relevant result sets.
Eduction
Eduction identifies concepts in the document in order to add tags to the kind of content you
specify:
Tag training
Plain Tagging
ConceptValue Tagging
Expertise
IDOL server accepts a natural language or Boolean search string and returns users who own
matching agents or profiles. This allows instant identification of experts in any subject at hand,
eliminating time-consuming searches for specialists, and unnecessary researching of subjects
for which expert knowledge is already available.
Hyperlinking
Hyperlinks can be automatically generated in real time. These link to contextually similar
content and can be used to recommend related articles, documents, affinity products or
services, or media content that relates to textual content. Because links are automatically
inserted at the time a document is retrieved, they can include references to documents and
articles written long before, or hyperlinks from archived material can link to the latest news or
material on that subject.
Mailing
IDOL server matches agents and profiles against its document content at regular intervals,
and automatically notifies users of documents that match their agents and/or profiles by
sending them email.
Profiling
IDOL server automatically creates interest and expertise profiles for users, in real time.
Interest profiles are created by tracking the content that a user views and extracting a
conceptual understanding of it. IDOL server then uses this understanding to keep users'
interest profiles up-to-date. Interest profiles can be used to target information at users,
to recommend content to users, to alert users to the existence of content and to put users in touch
with other users who have similar interests.
Expertise profiles are created by tracking the content that a user creates and extracting a
conceptual understanding of it. IDOL server then uses this understanding to keep users'
expertise profiles up-to-date. Expertise profiles can be used to trace users who are experts in
particular subject areas.
Retrieval
IDOL server offers a range of retrieval methods, from simple legacy keyword search to
sophisticated conceptual querying:
Conceptual matching
IDOL server accepts a piece of content (a sentence, paragraph or page of text, the body of an
e-mail, a record containing human readable information, or the derived contextual information
of an audio or speech snippet) or reference (identifier) as input and returns references to
conceptually related documents ranked by relevance, or contextual distance. This is used to
generate automatic hyperlinks between pieces of content.
Advanced Keyword search
IDOL server matches any term or phrase that appears in quotation marks in its exact pre-stemmed form.
AND
XOR/EOR
WNEAR
NOT
NEAR
BEFORE
OR
DNEAR
AFTER
Exact Phrase
Exact phrase search lets you search for an exact phrase by putting quotation marks around a string of
words. For example: "world market"
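To show what an exact phrase match means in practice, the Python sketch below checks whether the words of a phrase occur consecutively and in order in a piece of text. It is an illustration only: the contains_phrase function is a hypothetical helper, it splits on whitespace and ignores punctuation handling, and it says nothing about how IDOL server tokenizes or stems text.

```python
def contains_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively, in order, in `text`."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))


print(contains_phrase("the world market grew last year", "world market"))
print(contains_phrase("the market of the world", "world market"))
```

The first text contains the two words side by side and matches; the second contains both words but not as a phrase, so it does not.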
Field restrictions
Simple field restrictions within a query's text restrict results to documents that contain specific
values in specific fields.
Field text queries
Field text queries provide a wide range of field specifiers that you can use in order to query
fields, restrict query results or bias query result scores.
Fuzzy queries
If a search string is not quite accurate (for example, if it contains spelling mistakes) a fuzzy
query returns results that contain words that are similar to the entered string. (Note that you
need to enable fuzzy queries before you can use them).
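The idea behind a fuzzy query can be sketched with Python's standard difflib module, which finds strings that closely resemble a given word. This is an assumption-laden illustration rather than IDOL server's algorithm: the fuzzy_expand function, the list of index terms and the 0.75 similarity cutoff are all invented for the example.

```python
import difflib


def fuzzy_expand(query, index_terms, cutoff=0.75):
    """Map each query word to the indexed terms that closely resemble it."""
    return {
        word: difflib.get_close_matches(word, index_terms, n=3, cutoff=cutoff)
        for word in query.lower().split()
    }


# Hypothetical terms held in an index.
index_terms = ["market", "marker", "world", "word", "finance"]

# A query containing two spelling mistakes.
expanded = fuzzy_expand("wrold markt", index_terms)
print(expanded)
```

Each misspelled word is mapped to the similar terms that do exist in the index, and those terms can then be used to retrieve results.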
Parametric search
Advanced Parametric Refinement is used to provide an improved user experience coupled with
increased productivity via an advanced real time information discovery process. Real time
navigation across multiple taxonomies is supported with no additional manual configuration
necessary, including full access to intersections of diverse taxonomy definitions.
From the complete set of field names present within the corpus, a subset of fields can
be defined in the server's configuration as being of type "Parametric". These fields are known as
parametric fields.
When documents are indexed, IDOL server creates and stores a structure containing information about all
tag-value pairs that occur within the defined parametric fields (a tag-value pair is defined where a
field contains a textual or numerical value; the field name is considered paired with that
value). The user can then query IDOL server with the name of one or more parametric fields. IDOL
server returns a list of all textual values that appear within the given field or fields in the
documents stored in the server.
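The tag-value structure described above can be pictured with a short Python sketch. It is a simplified model under stated assumptions: documents are plain dictionaries, and build_parametric_index and field_values are hypothetical helper names, not IDOL server actions or configuration parameters.

```python
from collections import defaultdict


def build_parametric_index(documents, parametric_fields):
    """Record every value seen in each parametric field, with document counts."""
    index = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        for name in parametric_fields:
            if name in doc:
                index[name][doc[name]] += 1
    return index


def field_values(index, name):
    """Return the distinct values stored for one parametric field, sorted."""
    return sorted(index.get(name, {}))


# Hypothetical documents; only "author" and "format" are treated as parametric.
documents = [
    {"title": "Q1 report", "author": "Smith", "format": "pdf"},
    {"title": "Q2 report", "author": "Jones", "format": "pdf"},
    {"title": "Memo", "author": "Smith", "format": "doc"},
]
index = build_parametric_index(documents, ["author", "format"])
print(field_values(index, "author"))
```

Because the structure is built when documents are indexed, listing the values of a field at query time is a lookup rather than a scan of every document.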
Spelling Correction
IDOL server can automatically spell check query text that it receives and suggest correct
spellings for terms that it doesn't contain.
Summarization
IDOL server accepts a piece of content and returns a summary of the information. IDOL server
can generate different types of summary:
Conceptual summaries
Summaries that contain the most salient concepts of the content.
Contextual summaries
Summaries that relate to the context of the original inquiry - allowing the most applicable
dynamic summary to be provided in the results of a given inquiry.
Quick summaries
Summaries that comprise a few sentences of the result documents.
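Of the three summary types, the quick summary is the simplest to picture. The Python sketch below returns the first few sentences of a text; it assumes that the leading sentences are the ones used and that sentences end in a full stop, question mark or exclamation mark, neither of which is stated in this guide. The quick_summary function is a hypothetical helper.

```python
import re


def quick_summary(text, sentences=2):
    """Return the first few sentences of a document as its summary."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(parts[:sentences])


text = "IDOL server indexes content. It answers queries. It also builds profiles."
print(quick_summary(text))
```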
Taxonomy Generation
IDOL server's automatic taxonomy generation feature can automatically understand and create
deep hierarchical contextual taxonomies of information. Clustering or any other conceptual
operation can be used as a seed for the process. The resulting taxonomy can be used to
provide insight into specific areas of the information, provide an overall information landscape,
or as training material for automatic categorization, which then allows information to be placed
into a formally dictated and controlled category hierarchy.
Automatic taxonomy based on cluster result
IDOL server can use cluster results to build taxonomies
automatically and in real time.
Automatic taxonomy to category generation
Once the automatic taxonomy generation process has taken place, IDOL server contextually understands
the type of data it is dealing with. From this, it generates a deep hierarchical contextual taxonomy,
also known as an information landscape. Much like automatic cluster to category
generation, this feature takes the taxonomy results and uses that data to create categories (in
order to categorize information using the Categorization operation).
System architecture
General
IDOL server uses the ACI (Autonomy Content Infrastructure) Client API to communicate with custom-built applications that retrieve data using HTTP commands. This communication is implemented over
HTTP using XML and can adhere to SOAP.
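The request and reply pattern described above can be sketched in Python using only the standard library. Everything specific in this sketch is an assumption made for illustration: the host, the port number, the URL layout, the action and parameter names, and the element names in the sample XML reply are hypothetical and should be checked against the action command reference for your installation. The sketch builds a request URL and parses a canned reply; it does not contact a server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_action_url(host, port, action, **params):
    """Compose an HTTP request URL carrying an action and its parameters."""
    query = urlencode({"action": action, **params})
    return f"http://{host}:{port}/?{query}"


def parse_references(xml_text):
    """Extract the text of every <reference> element from an XML reply."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iter("reference")]


# Hypothetical server location and action command.
url = build_action_url("localhost", 9000, "Query", text="world market")

# Hypothetical XML reply, standing in for what a server might return.
sample_reply = """<response>
  <hit><reference>http://example.com/a.html</reference></hit>
  <hit><reference>http://example.com/b.html</reference></hit>
</response>"""

print(url)
print(parse_references(sample_reply))
```

Keeping URL construction and XML parsing in separate functions means a custom-built application can swap in a real HTTP call between the two without changing either.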
Security
Text queries
IDOL server contains data that has been aggregated from one or more repositories. In this example
each of the repositories has its own group server, which stores the repository's user names and the
groups that these users belong to. IDOL server aggregates this security information from the group
servers.
When a user logs on to a client, the user's authentication details are sent to IDOL server, which returns the
user's security details to the client, where they are stored until the user logs off or the session times
out. Every time the user issues a text query from a client, these security details are attached to the query
string that is sent to IDOL server.
Using the security information in the query string, IDOL server checks if the user who has sent the
query is permitted to access the documents that match the query (matching the security string against
the documents' ACLs), and returns all matching documents that the user is permitted to see to the
client.
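The access check described above can be pictured with a short sketch. It assumes, purely for illustration, that a user's security details reduce to a set of group names and that each document's ACL is the set of groups allowed to read it. The real security string format is not described in this section, and the names `Document` and `visible_results` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    reference: str
    # Groups allowed to read the document (an assumed, simplified ACL model).
    acl: set = field(default_factory=set)


def visible_results(matches, user_groups):
    """Return only the matching documents whose ACL shares a group with the user."""
    groups = set(user_groups)
    return [doc for doc in matches if doc.acl & groups]


matches = [
    Document("report.doc", {"finance"}),
    Document("memo.txt", {"staff", "finance"}),
    Document("plan.pdf", {"board"}),
]
print([doc.reference for doc in visible_results(matches, {"staff"})])
```

Documents that match the query but fail the ACL check are simply left out of the result list, which mirrors the behavior described above.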
Column headings: DRE 3 | IDOL server qmethods | IDOL server ACI actions
Concept matching
Agent creation
Agent matching
Agent retraining
Agent alerting
Profile creation
Profile matching
Profile retraining
Profile alerting
Categorization
Summarization
Clustering
Active matching
Retrieval
Available functionality
Fuzzy queries
Proximity search
XML indexing
Fields printing
Compound sorting
Result sorting can be based on more than one field at a time.
Agentboolean matching
A Boolean query can be specified within a document; the query text
must match it before the document can be returned.
Highlighting
Terms or sentences, within a document or buffer of your
choice, can be highlighted if they satisfy certain criteria.
Restriction by document ID
Results can be restricted to a certain range of
document IDs.
Field printing
IDOL server can return specific fields or all fields, or combine
content into a single field.
Sorting
Results can be sorted by any field numerically or
alphabetically, or by document ID, as well as the usual
methods.
Term information
IDOL server can retrieve the total number of occurrences,
the APCM weighting, and document occurrences of any or
all terms, sorted by any of the values.
Result biasing
Results can be biased on a sliding scale of your choice,
according to the value in a certain field.
Wildcard handling
Wildcard terms can now contain characters in all encodings.
Improved performance
IDOL server is significantly faster than DRE 3
in many cases, including:
Parametric refinement
A parametric search allows you to search for items by their
characteristics (values in certain fields).
Parametric field dependence
IDOL server allows you to find parametric fields that occur
together.
Parametric counts
You can enable parametric count to find out how many
documents contain a specific parametric value.
Spellchecker
IDOL server suggests spelling corrections for misspelled
words in queries.
Query summary
IDOL server returns result documents with a summary that
comprises their best terms and phrases.
Taxonomy generation
IDOL server creates taxonomies and stores them in
categories and/or an XML file.
You cannot run IDOL server with restricted file system permissions (for example disk
quotas, file handle limits or memory limits).
Your file system must permit file locking (this means that you cannot run IDOL server
on an NFS mount, for example).
If you are running anti-virus software on the machine that hosts IDOL server, you
should ensure that it doesn't monitor the IDOL server directories as this can have a
serious impact on IDOL server's performance.
Supported platforms
Microsoft Windows NT4, 2000, XP and 2003
Linux (all versions) kernel 2.2, 2.4 and 2.6
Sun Solaris for SPARC versions 5 - 9
Sun Solaris for Intel version 9
AIX version 4.3, 5 and 5.1
HP-UX for PA-RISC version 10, 11 and 11i
HP-UX for Itanium version 11i
Tru64 version 5.1
Note:
if you are installing IDOL server on Solaris, you require the libiconv library file which you
can download from http://www.gnu.org/software/libiconv/.
IDOL server also supports other POSIX UNIX versions on request.
1.
2.
The installation opens with the Welcome dialog. Read the text and click on Next.
3.
4.
5.
6.
If you selected to install IDOL server, the IDOL server Port Settings dialog is displayed.
Enter the following, and click on Next:
ACI Port
The port that client machines use to send action commands to IDOL server. By default this
is 9000.
This entry sets the Port parameter in IDOL server's configuration file.
Index Port
The port that administrative client machines use to index documents into IDOL server (and
to administer IDOL server). By default this is 9001.
This entry sets the IndexPort parameter in IDOL server's configuration file.
Service Port
Enter the port number that IDOL server will use for DiSH communication. By default this is
9002. Note that this port must not be used by any other service.
This entry sets the ServicePort parameter in IDOL server's configuration file.
8.
If you selected to install the DiSH server, the DiSH Server Port Settings dialog is displayed.
Enter the following, and click on Next:
ACI Port
The port that client machines use to send action commands to the DiSH server. By default
this is 20000.
This entry sets the Port parameter in the DiSH configuration file.
Service Port
Enter the port number by which service commands can be sent to DiSH. By default this is
20003. Note that this port must not be used by any other service.
This entry sets the ServicePort parameter in the DiSH configuration file.
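Because the service ports must not be used by any other service, it can help to check the default ports before you run the installer. The following is a minimal sketch, not part of the product: the helper `port_is_free` is hypothetical and only tests whether a TCP socket can be bound on the local machine at the moment it runs.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


# Default ports named in the installer dialogs above.
defaults = {
    "IDOL ACI Port": 9000,
    "IDOL Index Port": 9001,
    "IDOL Service Port": 9002,
    "DiSH ACI Port": 20000,
    "DiSH Service Port": 20003,
}
for name, port in defaults.items():
    print(name, port, "free" if port_is_free(port) else "in use")
```

A port reported as free can still be taken by another process later, so treat the result as a quick pre-installation check only.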
9.
Directory structure
Once the installation of IDOL server is completed and you have started your IDOL server, your
installation directory contains the following files and subdirectories (note that bold font indicates
folders):
AutonomyServiceDashboard
webapps
configuration
idol.cfg
retina.cfg
DiSH
audit
documentTracking
errors
graphs
license
logs
Folder that contains log files for each configured log stream.
queue
uid
DISH.exe
DiSH executable.
DISH.cfg
license.log
licensekey.dat
service.log
IDOL
agentstore
category
category
cluster
2DMAPS
Folder that contains the 2D maps in gif format that have been
generated from clusters using the ClusterServe2DMap action.
CLUSTEREXPORT
CLUSTERS
SGCLUSTDOCS
SGDATA
SGPICCACHE
Folder that caches the images (in gif format) that have been
generated from data sets using the ClusterSGPicServe action.
SNAPSHOTS
imex
license
queue
taxonomy
uid
community
license
queue
temp
uid
users
content
dynterm
license
main
nodetable
numeric
Folder that contains memory mapped files for fast field text
operation on numeric fields.
queue
refindex
status
tagindex
uid
indextasks
incoming
langfiles
dic
logs
Folder that contains log files for each configured log stream.
modules
queue
templates
uid
<InstallationName>.cfg
<InstallationName>.exe
<InstallationName>.log
<InstallationName>cfg.log
license.log
service.log
IDOLUninstallerData
resource
Uninstall.exe
jre
UninstallerData
resource
webapps
IDOLserver_InstallLog.log
1.
2.
The Welcome text is displayed. Read the text and press Enter.
3.
4.
IDOL Server
Installs Autonomy IDOL server.
Alternatively, if you don't want to install some of the components, enter a comma-separated list of
the components that you do not want to install, and press Enter.
5.
If you selected to install IDOL server, the IDOL server Port Settings are displayed.
Enter a number and press Enter for each of the following:
ACI Port
The port that client machines use to send action commands to IDOL server. By default this
is 9000.
This entry sets the Port parameter in IDOL server's configuration file.
Index Port
The port that administrative client machines use to index documents into IDOL server (and
to administer IDOL server). By default this is 9001.
This entry sets the IndexPort parameter in IDOL server's configuration file.
Service Port
Enter the port number that IDOL server will use for DiSH communication. By default this is
9002. Note that this port must not be used by any other service.
This entry sets the ServicePort parameter in IDOL server's configuration file.
7.
If you selected to install the DiSH server, the DiSH Server Port Settings are displayed.
Enter a number and press Enter for each of the following:
ACI Port
The port that client machines use to send action commands to the DiSH server. By default
this is 20000.
This entry sets the Port parameter in the DiSH configuration file.
Service Port
Enter the port number by which service commands can be sent to DiSH. By default this is
20003. Note that this port must not be used by any other service.
This entry sets the ServicePort parameter in the DiSH configuration file.
8.
9.
Directory structure
Once the installation of IDOL server is completed and you have started your IDOL server, your
installation directory contains the following files and subdirectories (note that bold font indicates
folders):
AutonomyServiceDashboard
webapps
configuration
idol.cfg
retina.cfg
DiSH
audit
documentTracking
errors
graphs
license
logs
Folder that contains log files for each configured log stream.
queue
uid
DISH.exe
DiSH executable.
DISH.cfg
license.log
licensekey.dat
service.log
IDOL
agentstore
category
category
cluster
2DMAPS
Folder that contains the 2D maps in gif format that have been
generated from clusters using the ClusterServe2DMap action.
CLUSTEREXPORT
CLUSTERS
SGCLUSTDOCS
SGDATA
SGPICCACHE
Folder that caches the images (in gif format) that have been
generated from data sets using the ClusterSGPicServe action.
SNAPSHOTS
imex
license
queue
taxonomy
uid
community
license
queue
temp
uid
users
content
dynterm
license
main
nodetable
numeric
Folder that contains memory mapped files for fast field text
operation on numeric fields.
queue
refindex
status
tagindex
uid
indextasks
incoming
langfiles
dic
logs
Folder that contains log files for each configured log stream.
modules
queue
templates
uid
<InstallationName>.cfg
<InstallationName>.exe
<InstallationName>.log
<InstallationName>cfg.log
license.log
service.log
IDOLUninstallerData
jre
UninstallerData
webapps
IDOLserver_InstallLog.log
Please refer to your Retina manual for details on how to deploy Retina.
Licensing
The licensing that enables you to run Autonomy solutions is facilitated by an Autonomy DiSH server.
You must have a running Autonomy DiSH server that resides on a machine with a static known IP
address, MAC address or Volume Name.
To obtain a license, you need to contact Autonomy Support and request a license file for your specific
installation. This license file is tied to the IP address and ACI port of your DiSH server, and cannot be
transferred between machines. When you receive this file from Autonomy Support, save it as
licensekey.dat in the DiSH subdirectory of your IDOL server installation.
Note that you can revoke licenses at any time, for example, if you want to re-allocate them to different
clients or if you want to change a client's IP address.
Important
You MUST NOT:
change the IP address of the machine on which a licensed module is running (if you are using
an IP address to lock your license).
change the service port of a module without first revoking the license.
replace the network card of a client without first revoking the license.
You can send the following command from a web browser to the running DiSH server in order to check
for free licenses.
http://<DiSH_host>:<DiSH_ACI_port>/action=LicenseInfo
In response to this command DiSH returns the requested license information. In the following example,
one IDOL server license is available for allocation to a client:
<autn:Product>
<autn:ProductType>IDOLSERVER</autn:ProductType>
<autn:Client>
<autn:IP>192.123.51.23</autn:IP>
<autn:ServicePort>1823</autn:ServicePort>
<autn:IssueDate>1063192283</autn:IssueDate>
<autn:IssueDateText>10/09/2003 12:11:23</autn:IssueDateText>
</autn:Client>
<autn:TotalSeats>2</autn:TotalSeats>
<autn:SeatsInUse>1</autn:SeatsInUse>
</autn:Product>
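The number of free licenses in such a response is TotalSeats minus SeatsInUse. The following sketch shows one way to compute this from a response fragment. It rests on assumptions: the fragment above does not show a namespace declaration for the autn prefix, so the code wraps it in an element that binds the prefix to a placeholder URI, and the helper name `free_seats` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (an assumption); it only needs to bind the
# "autn" prefix so that the fragment can be parsed.
AUTN = "urn:example:autn"

SAMPLE = """<autn:Product>
<autn:ProductType>IDOLSERVER</autn:ProductType>
<autn:TotalSeats>2</autn:TotalSeats>
<autn:SeatsInUse>1</autn:SeatsInUse>
</autn:Product>"""


def free_seats(fragment):
    """Map each product type to TotalSeats minus SeatsInUse."""
    root = ET.fromstring(f'<wrapper xmlns:autn="{AUTN}">{fragment}</wrapper>')
    ns = {"autn": AUTN}
    result = {}
    for product in root.findall("autn:Product", ns):
        name = product.findtext("autn:ProductType", namespaces=ns)
        total = int(product.findtext("autn:TotalSeats", namespaces=ns))
        used = int(product.findtext("autn:SeatsInUse", namespaces=ns))
        result[name] = total - used
    return result


print(free_seats(SAMPLE))  # {'IDOLSERVER': 1}
```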
<DiSH_host>
The IP address of the machine on which DiSH resides.
<DiSH_ACI_port>
The ACI port of DiSH (this must be the Port specified in the DiSH configuration file's [Server]
section).
<product_type>
The product type of the Autonomy solution whose license you want to revoke from the
inaccessible client.
<client_host>
The IP address of the inaccessible client.
<client_service_port>
The port by which service commands are sent to the Autonomy solution on the inaccessible
client (this is set by the ServicePort parameter in the Autonomy module configuration file's
[Service] section).
Error : failed to decrypt license keys. Please contact Autonomy support. Error code is
SERVICE:<ERROR_CODE>
Contact Autonomy Support and provide them with the exact error message and your license file.
Error : failed to update the license from the license server. Shutting down
Failed to retrieve a license from the DiSH server or from the backup cache. Ensure that your DiSH
server can be contacted.
Error : your license keys are invalid. Please contact Autonomy support. Error code is
SERVICE:<ERROR_CODE>
Your license keys appear to be corrupt. Contact Autonomy Support and provide them with the exact
error message and your license file.
Failed to revoke license from server. An instance of this application is already running. Please
stop the other instance first
You cannot revoke a license from a running service. Stop the service and try again.
Your license keys are invalid. Please contact Autonomy support. Error code is ACISERVER:<ERROR_CODE>
Failed to retrieve a license from the DiSH server. Contact Autonomy Support and provide them with
the exact error message and your license file.
Hardware
IDOL server's performance depends on your system hardware (operating system, disk,
system memory and so on).
Document types
IDOL server requires less space for storing documents with high image content than for
documents that contain a lot of textual information.
Query types
Field text or parametric queries, for example, require more processing power than a simple
text query.
Query performance
Your system architecture is dependent on your requirement for response time and the
number of concurrent users.
For specific sizing requirements, please consult the Autonomy Sizing Service (you can contact the
Autonomy Sizing Service via Autonomy Support).
Note: to set up a distributed IDOL server installation, you need a license for each IDOL server instance
that you want to install.
1.
Install the IDOL server that you want to be the main operations server. During the installation,
select to install all the Autonomy solutions that the installer comprises.
2.
Install the IDOL server that you want to be the data server. Only install the IDOL server (not DiSH
or web applications) and point it to the DiSH that you have installed with the main operations IDOL
server.
3.
4.
Open the configuration file of the Data IDOL server in a text editor.
5.
Check if the configuration file contains any of the following sections, and remove them if they are
present (these are sections that are not relevant to data storage operations):
[User]
[UserSecurityFields]
[UserSecurity]
Including all its subsections:
[Autonomy]
[NT]
[Notes]
[LDAP]
[Documentum]
[Exchange]
[Netware]
[Role]
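Removing these sections by hand is error-prone in a long configuration file. As a rough sketch, the clean-up could be scripted as follows. The helper `remove_sections` is hypothetical, it assumes that section names can be compared case-insensitively, and the setting `ExampleSetting` in the sample is a placeholder rather than a real parameter. Work on a backup copy of the configuration file.

```python
def remove_sections(cfg_text, unwanted):
    """Drop every [Section] named in `unwanted`, together with its settings."""
    unwanted = {name.lower() for name in unwanted}
    kept, skipping = [], False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        # A section header switches skipping on or off for the lines below it.
        if stripped.startswith("[") and stripped.endswith("]"):
            skipping = stripped[1:-1].lower() in unwanted
        if not skipping:
            kept.append(line)
    return "\n".join(kept)


USER_SECTIONS = [
    "User", "UserSecurityFields", "UserSecurity", "Autonomy", "NT", "Notes",
    "LDAP", "Documentum", "Exchange", "Netware", "Role",
]

sample = (
    "[Server]\nPort=9000\n"
    "[User]\nExampleSetting=1\n"
    "[LDAP]\nExampleSetting=2\n"
    "[Databases]\nExampleSetting=3"
)
print(remove_sections(sample, USER_SECTIONS))
```

The same function can be reused with a different list of section names when you edit the Main operations IDOL server's configuration file.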
7.
Open the configuration file of the Main operations IDOL server in a text editor.
8.
Check if the configuration file contains any of the following sections, and remove them if they are
present (these are sections that are relevant only to data storage operations, which the Data IDOL server handles):
[TermCache]
[IndexCache]
[SectionBreaking]
[Databases]
[Database<N>]
[FieldProcessing]
Including all its subsections:
[SetIndexFields]
[SetIndexAndWeightHigher]
[SetSectionBreakFields]
[SetDateFields]
[SetDatabaseFields]
[SetReferenceFields]
[SetTitleFields]
[SetHighlightFields]
[SetSourceFields]
[DetectNT_V4Security]
[DetectNotes_V4Security]
[DetectNetware_V4Security]
[DetectExchange_V4Security]
Add a [DataDRE] section to the configuration file, and use the Host and ACIPort settings to
specify the location and port of the Data IDOL server. This allows the Main operations IDOL
server to communicate with the Data IDOL server.
For example:
[DataDRE]
Host=1.23.45.6
ACIPort=9000
DRE4 or higher
UAServer 4 or higher
Export data
2.
3.
4.
5.
2.
Check if the template lists all the fields that you want to export. If it does not, you need to
set up each field that you want to export in addition.
Save the template file as contentbody.txt (if your templates directory already contains a
contentbody.txt file, you need to delete this first).
4.
5.
Stop the DRE after the command has finished (you can check this using
http://<DRE_host>:<ACI_Port>/action=GetRequestLog).
Stop the DRE after the command has finished (you can check this using
http://<DRE_host>:<ACI_Port>/action=GetRequestLog).
If you indexed IDX files or a mix of IDX and XML files into your IDOL server Suir
Note: the method outlined in the following steps ensures that the sections into which your data has
been indexed are preserved. If you do not use sectioning or it is not important to you, it is
recommended that you use the List action (see If you indexed XML files into IDOL server
Suir on page 53) to export your data instead. The List action will change the number of sections
that IDOL server contains once the data has been transferred.
1.
2.
Check if the template lists all the fields that you want to export. If it does not, you need to
set up each field that you want to export in addition.
Set up each additional field that you want to export before the #DRECONTENT field,
using the following format:
#DREFIELD MyField="<!-- ATNMY_FIELD PREFIX="" SUFFIX="" MATCH = "*/MyField" TRIM="0" ESCAPE="0" -->"
For example, to set up an AUTHOR field:
#DREFIELD AUTHOR="<!-- ATNMY_FIELD PREFIX="" SUFFIX="" MATCH = "*/AUTHOR" TRIM="0" ESCAPE="0" -->"
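If you need to add many fields, the template lines can be generated rather than typed. The following is a small sketch with a hypothetical helper, `drefield_line`. It assumes that the MATCH value is written without a line break, as `*/<field name>`.

```python
def drefield_line(field_name):
    """Build one #DREFIELD template line for the given field name."""
    return (
        f'#DREFIELD {field_name}="<!-- ATNMY_FIELD PREFIX="" SUFFIX="" '
        f'MATCH = "*/{field_name}" TRIM="0" ESCAPE="0" -->"'
    )


for name in ["AUTHOR", "MyField"]:
    print(drefield_line(name))
```

Paste the generated lines into the template before the #DRECONTENT field, as described above.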
3.
Save the template file as contentbody.txt (if your templates directory already contains a
contentbody.txt file, you need to delete this first).
4.
Stop IDOL server Suir after the command has finished (you can check this using
http://<DRE_host>:<ACI_Port>/action=GetRequestLog).
2.
Stop the IDOL server Suir after the command has finished (you can check this using
http://<IDOL server_host>:<ACI_Port>/action=GetRequestLog).
to:
CATEGORY
IDOL\category\category
CLUSTER
IDOL\category\cluster
TAXONOMY
IDOL\category\taxonomy
LAUNE\LAUNE\CATEGORY
IDOL\category\category
LAUNE\LAUNE\CLUSTER
IDOL\category\cluster
LAUNE\LAUNE\TAXONOMY
IDOL\category\taxonomy
to:
UAServer\data
IDOL\community\users
NORE\NORE\data
IDOL\community\users
2.
3.
Stop IDOL server once the process has finished (you can check this using
http://<IDOL server_host>:<ACI_Port>/action=IndexerGetStatus).
2.
3.
Stop IDOL server once the process has finished (you can check this using
http://<IDOL server_host>:<ACI_Port>/action=IndexerGetStatus).
1.
2.
If you executed Step 2 to copy categories, taxonomies or clusters to IDOL server 5, issue the
following command from your web browser in order to index all your categories:
http://<host>:<ACI_port>/action=CategorySyncCatDre
<host>
Enter the IP address (or name) of the machine on which IDOL server's Category index is
located.
<ACI_port>
Enter the port number by which action commands are sent to IDOL server (this is specified by
the Port setting in the IDOL server configuration file's [Server] section).
3.
If you executed Step 3 to copy users to IDOL server 5, issue the following command from your
web browser in order to index all your users' agents and profiles:
http://<host>:<ACI_port>/action=Index
<host>
Enter the IP address (or name) of the machine on which IDOL server's Agent index is located.
<ACI_port>
Enter the port number by which action commands are sent to IDOL server (this is specified by
the Port setting in the IDOL server configuration file's [Server] section).
4.
You have now finished upgrading and can run your IDOL server.
Requesting support
Check that content has been moved successfully to IDOL server 5. If IDOL server does not behave as
expected, please check your log files and contact Autonomy Support.
2.
Enter your Username and Password, and click on the Login button.
3.
Click on the New Request menu option and issue your ticket (see
http://automater.autonomy.com/helpdesk/help/Submitting_a_new_support_request.htm
for details).
uksupport@autonomy.com
USA:
support@us.autonomy.com
2.
1.
2.
Select the <InstallationName>IDOL server service, and click on the Stop button to stop
IDOL server.
3.
http://<host>:<port>/action=Help
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the ACI port by which commands are sent to IDOL server (this is specified by the Port
setting in the IDOL server configuration file's [Server] section).
Example:
http://12.3.4.56:4000/action=Help
This command uses port 4000 to request Help on action commands from IDOL server which is located
on a machine with the IP address 12.3.4.56.
Note: to display help on configuration settings, click on the config help link in the top right-hand
corner (see Displaying help on configuration settings on page 389).
http://<host>:<port>/action=<action>&<mandatory_parameters>&<optional_parameters>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the ACI port by which commands are sent to IDOL server (this is set by the Port
parameter in the IDOL server configuration file's [Server] section).
<action>
Enter the name of the action that you want IDOL server to execute (for example, Query).
<mandatory_parameters>
Enter the parameters that the action that you have specified requires (not all actions require
parameters).
<optional_parameters>
You can enter optional parameters for the action that you have specified (optional parameters
are not available for all actions).
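When commands are sent from a script instead of a browser, the same URL pattern can be assembled programmatically. The sketch below uses a hypothetical helper, `action_url`. It assumes that parameter values can be URL-encoded in the usual way (spaces become `+`), and it reuses the host and port from the Help example and the Summarize parameters that appear elsewhere in this guide.

```python
from urllib.parse import urlencode


def action_url(host, port, action, params=None):
    """Assemble http://<host>:<port>/action=<action>&<parameters>."""
    url = f"http://{host}:{port}/action={action}"
    if params:
        url += "&" + urlencode(params)
    return url


print(action_url("12.3.4.56", 4000, "Help"))
print(action_url("12.3.4.56", 4000, "Summarize",
                 {"Summary": "Concept", "Sentences": 1, "Text": "some text"}))
```

Note that the action follows the port directly after the slash; there is no `?` separator in this URL format.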
You can also configure IDOL server to process documents that it receives (for example from an
Autonomy connector) before it indexes them. You can set up a simple process by configuring IDOL
server to execute a single task on incoming documents, or set up a complex process by configuring
IDOL server to combine a number of tasks.
The available tasks allow you to do one or more of the following:
categorize documents
index documents
Storing content
Disabling content storage
If you don't require IDOL server to return the content of fields or summaries with results, you can set
NodeTableStoreContent in the [Server] section of IDOL server's configuration file to false in order to save
the memory that the storing of fields normally requires.
If you disable content storage, the performance of the following actions is affected:
GetContent
GetTagValues
Disabled.
List
Query
Only the references and the title of results are returned. You cannot
restrict by fields.
Suggest
Only the references and the title of results are returned. You cannot
restrict by fields.
SuggestOnText
Only the references and the title of results are returned. You cannot
restrict by fields.
Summarize
TermGetBest
IDOL server saves a document's best terms on indexing. These are the
only terms available.
2.
3.
Create a section for the database field identifying process, in which you create a property for the
process (a property is later defined by one or more applicable configuration parameters). Identify
the fields that you want to associate with the process.
You can use the PropertyMatch parameter to identify a specific value that fields must have in
order to be processed.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField
PropertyMatch=*myString*
[MySecondProcess]
Property=MySecondProperty
PropertyFieldCSVs=*/MyOtherField,*/MyOtherSecondField
[DatabaseFields]
Property=Database
PropertyFieldCSVs=*/DREDBNAME,*/DB,*/Database
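To picture how the wildcard patterns in PropertyFieldCSVs and PropertyMatch select fields, consider the following sketch. It is an illustration only: it assumes that `*` behaves like a shell-style wildcard, which may differ from IDOL server's actual matching rules, and the helper `field_selected` and the field path `DOCUMENT/MyField` are hypothetical.

```python
from fnmatch import fnmatchcase


def field_selected(path, value, field_patterns, property_match=None):
    """Return True if the field path matches one of the patterns and, when a
    property_match pattern is given, the field value matches it as well."""
    if not any(fnmatchcase(path, pattern) for pattern in field_patterns):
        return False
    return property_match is None or fnmatchcase(value, property_match)


patterns = "*/MyField,*/MySecondField".split(",")
print(field_selected("DOCUMENT/MyField", "has myString inside", patterns, "*myString*"))
print(field_selected("DOCUMENT/MyField", "no match here", patterns, "*myString*"))
print(field_selected("DOCUMENT/Other", "has myString inside", patterns, "*myString*"))
```

Under these assumptions, only the first field is selected: the second has the right name but the wrong value, and the third has a name that no pattern covers.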
4.
Create a section for your indexing property in which you set the DatabaseType parameter to true.
For example:
[MyFirstProperty]
HiddenType=true
[MySecondProperty]
Index=true
[Database]
DatabaseType=TRUE
6.
Save IDOL server's configuration file and restart your IDOL server for your
changes to take effect.
Index fields
Store fields that contain text which you want to query frequently as Index fields. Index fields
are processed linguistically when they are stored in IDOL server. This means that stemming
and stoplists are applied to text in Index fields before it is stored, which allows IDOL
server to process queries for these fields more quickly (typically DRETITLE and
DRECONTENT are fields that should be set up as Index fields).
You should not store URLs or content that you are unlikely to use in Index fields. You should
also not store fields as Index fields that will be queried frequently but whose value is only ever
going to be queried in its entirety. It is more efficient to query such values using a field
specifier (for example, MATCH).
Indexing all fields in documents could potentially slow down the indexing process and increase
disk usage and requirements.
See Index fields on page 285 for details on how to set up Index fields.
numeric fields
Store fields that contain numerical values or dates as numeric fields and numeric date fields.
When these fields are indexed, IDOL server stores them in a fast-look-up table in memory
which enables it to return the fields more quickly.
See Numerical fields on page 289 and NumericDateType fields on page 287 for details on
how to set up numeric and numeric date fields.
FieldCheckType fields
If a large number of the documents that you want to store in IDOL server contains a field
whose entire value will frequently be used to restrict results, you should store this field as a
FieldCheckType field. When this field is indexed, IDOL server stores it in a fast-look-up table
in memory which enables it to return the field more quickly.
See FieldCheckType fields on page 291 for details on how to set up FieldCheckType fields.
ordinary fields
By default IDOL server stores all fields that are not identified as special fields as ordinary
fields.
Note: you can query all stored fields using field specifiers in field text queries (see Field text
queries on page 199). Index fields can also be queried using text queries.
*/<tag_name>/_ATTR_<attribute_name>
<tag_name>
Enter the name of the tag.
<attribute_name>
Enter the name of the attribute you want IDOL server to read.
For example:
2.
3.
Create a section for the indexing process, in which you create a property for the process (a
property is later defined by one or more applicable configuration parameters). Identify the fields
that you want to associate with the processes.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField
PropertyMatch=*myString*
[IndexingFields]
Property=IndexFields
PropertyFieldCSVs=*/FIELD/_ATTR_ANIMAL,*/FIELD/_ATTR_COLOR,*/
ROOM/_ATTR_Name,*/ITEM/_ATTR_Type
4.
5.
Create a section for your indexing property in which you set the Index parameter to true.
For example:
[MyFirstProperty]
HiddenType=true
[IndexFields]
Index=true
6.
Save IDOL server's configuration file and restart your IDOL server for your
changes to take effect.
Optimizing indexing
The speed of the indexing process is usually less critical than the speed of the query process.
However, when large amounts of data are indexed into IDOL server, it is still important to improve the
efficiency of the process where possible. In addition, the way you configure the indexing process can
affect the efficiency of the query process.
IDOL server creates a representation of the new data in the index cache.
2.
The cache is synchronized with data that IDOL server currently contains, and the new data is
stored on disk and removed from the index cache.
When you are scheduling indexing, you should consider this chapter's recommendations on IDOL
server content (particularly on selecting fields to be indexed), and on running indexing and querying
processes at different times. In addition, the delayed synchronization feature allows you to change the
stage at which the index cache is synchronized with IDOL server, depending on whether your priority is
achieving fast query speeds or making new information available to the user as quickly as possible.
Delayed synchronization
The delayed synchronization feature allows you to select how the index cache is synchronized with
IDOL server's data. This is useful in systems where indexing tasks are scheduled at times when IDOL
server is also handling queries.
By default, synchronization occurs as soon as a representation of data has been made in the index
cache. New data is available to the user (as query results) quickly, so you should use this setting in
systems where up-to-date data is the priority. However, synchronization uses resources that IDOL server
could otherwise use for querying. Delayed synchronization reduces this impact by
collecting multiple data representations in the index cache and then synchronizing them all with IDOL
server's data in one go. This is useful in systems where query speed is more important than having up-to-date data.
Note: delayed synchronization is recommended if you are indexing a lot of small files (files that are
smaller than 100MB).
The following parameter in the [Server] section of IDOL server's configuration file allows you to specify
whether the indexing process uses delayed synchronization:
DelayedSync
Enter true if you want IDOL server to delay synchronization. If you set DelayedSync to true,
IDOL server only stores data on disk when:
the index cache contains some data and the time out specified by MaxSyncDelay has
expired
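The timeout condition above can be modeled with a toy example. This is a conceptual sketch, not IDOL server's implementation: the class `IndexCache` is hypothetical, the delay is assumed to be measured in seconds from the moment the first pending representation arrives, and only the timeout condition listed above is modeled.

```python
class IndexCache:
    """Toy model of an index cache that synchronizes after a timeout."""

    def __init__(self, max_sync_delay):
        self.max_sync_delay = max_sync_delay  # assumed unit: seconds
        self.pending = []                     # representations not yet on disk
        self.first_pending_at = None          # arrival time of oldest pending item
        self.on_disk = []

    def add(self, representation, now):
        """Collect a new data representation in the cache."""
        if not self.pending:
            self.first_pending_at = now
        self.pending.append(representation)

    def maybe_sync(self, now):
        """Store pending data on disk only if the cache holds data and the
        delay has expired. Return True if a synchronization took place."""
        if self.pending and now - self.first_pending_at >= self.max_sync_delay:
            self.on_disk.extend(self.pending)
            self.pending.clear()
            self.first_pending_at = None
            return True
        return False


cache = IndexCache(max_sync_delay=600)
cache.add("doc-a", now=0)
cache.add("doc-b", now=100)
print(cache.maybe_sync(now=300))  # delay not yet expired
print(cache.maybe_sync(now=600))  # both representations stored in one go
```

The point of the model is that several small indexing jobs are written to disk in a single synchronization, which leaves more resources for queries in between.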
Alert
to alert users to new documents that IDOL server has received, if these
documents are similar to agents that the users own.
Cat
Educe
FieldOp
FileWriter
HTTP
to send an HTTP call out to a web interface (for example, you can connect to a
third-party web application in order to store your data on a legacy SQL database).
LP
OCR
Route
if you want to combine multiple tasks, you can use Route tasks in order to specify
conditions that determine which task IDOL server executes next (you can, for
example, use a Route task to route documents that IDOL server receives to an
OCR task or a Cat task).
Index
if you want to index data that has been processed by tasks into IDOL server, you
need to use an Index task.
2.
In the [Server] section, use the StartTask parameter to specify the first task that you want IDOL
server to execute on incoming data.
3.
Create a section for the specified StartTask and for any other task that you want IDOL server to
execute before indexing.
Note:
you can give each task section a name of your choice; the type of task that each section
contains is identified by the Module parameter.
refer to the online help for details on which settings are available for the different task
types (see Displaying help on configuration settings on page 389).
you can use the NextTask and OnFailureTask settings to determine which task IDOL
server executes next, after it has carried out a task.
you can set up a Route task which allows you to direct data to appropriate processes
depending on fields that the data contains.
set up an Index task if you want to index data into IDOL server's Data index after it has
been processed.
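Before restarting IDOL server, it can be useful to check that the tasks are chained in the order you intend. The sketch below follows StartTask and then each NextTask through a configuration. It is a simplification under stated assumptions: the helper `task_chain` is hypothetical, the file is assumed to be simple enough for Python's configparser to read, and OnFailureTask and Route conditions are ignored.

```python
import configparser

SAMPLE = """
[Server]
StartTask=MyACITask

[MyACITask]
Module=ACI
NextTask=MyIndexTask

[MyIndexTask]
Module=Index
"""


def task_chain(cfg_text):
    """Return (task name, module) pairs in the order the tasks would run."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(cfg_text)
    chain, seen = [], set()
    task = parser.get("Server", "StartTask", fallback=None)
    # Stop at a missing section or at a task that was already visited (a loop).
    while task and task not in seen and parser.has_section(task):
        seen.add(task)
        chain.append((task, parser.get(task, "Module", fallback="?")))
        task = parser.get(task, "NextTask", fallback=None)
    return chain


print(task_chain(SAMPLE))  # [('MyACITask', 'ACI'), ('MyIndexTask', 'Index')]
```

If the last entry in the chain is not an Index task, the processed data is not indexed into IDOL server.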
4.
5.
In this example, IDOL server is instructed to execute the MyACITask on any documents that it
receives.
The MyACITask automatically generates titles for incoming documents. Every time IDOL server
performs this task on an incoming document, it executes a Summarize action with the parameter
Summary set to Concept, the parameter Sentences set to 1 and the parameter Text set to the
content of the incoming document's DRECONTENT field:
action=Summarize&Summary=Concept&Sentences=1&Text=<DRECONTENT_value>
When the action returns its result, IDOL server creates a DRETITLE field in the document and uses it
to store the content of the result's autn:summary field.
Configuration:
[Server]
StartTask=MyACITask
[MyACITask]
Module=ACI
Action=Summarize
Params=Summary,Sentences
Values=Concept,1
Fields=DRECONTENT
ReMapToFields=Text
XMLPaths=autnresponse/responsedata/autn:summary
XMLFieldNames=DRETITLE
NextTask=MyIndexTask
[MyIndexTask]
Module=Index
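The way the Params, Values, Fields and ReMapToFields settings of this example combine into the action string can be sketched in Python. This is an illustration only, not IDOL server code: the helper name build_action_query and the dictionary layout are assumptions, and URL encoding is left out so that the result has the same form as the action shown above.

```python
def build_action_query(task, document):
    """Assemble the action string for an ACI task (illustrative sketch)."""
    pairs = [("action", task["Action"])]
    # Params and Values are parallel comma-separated lists of fixed parameters.
    pairs += zip(task["Params"].split(","), task["Values"].split(","))
    # Each document field in Fields is sent under the parameter name that
    # appears at the same position in ReMapToFields.
    for field, param in zip(task["Fields"].split(","),
                            task["ReMapToFields"].split(",")):
        pairs.append((param, document[field]))
    return "&".join(f"{name}={value}" for name, value in pairs)


# Settings copied from the [MyACITask] section of the example.
my_aci_task = {
    "Action": "Summarize",
    "Params": "Summary,Sentences",
    "Values": "Concept,1",
    "Fields": "DRECONTENT",
    "ReMapToFields": "Text",
}
document = {"DRECONTENT": "<DRECONTENT_value>"}  # placeholder content
query = build_action_query(my_aci_task, document)
print(query)
```

With the placeholder document, the sketch reproduces the action string given in the example description.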
In this example, IDOL server is instructed to execute the MyACITask on any documents that it
receives.
The MyACITask automatically generates titles for incoming documents. Every time IDOL server
performs this task on an incoming document, it executes a Summarize action with the parameter
Summary set to Concept, the parameter Sentences set to 1 and the parameter Text set to the
content of the incoming document's DRECONTENT field:
action=Summarize&Summary=Concept&Sentences=1&Text=<DRECONTENT_value>
When the action returns its result, IDOL server creates a DRETITLE field in the document and uses it
to store the content of the result's autn:summary field. The document is then indexed into a different
IDOL server specified using the IdolServer setting. This server's index port is requested via its ACI
port, so that the document to be indexed can automatically be routed to the correct index port.
Configuration:
[Server]
StartTask=MyACITask
[MyACITask]
Module=ACI
Action=Summarize
Params=Summary,Sentences
Values=Concept,1
Fields=DRECONTENT
ReMapToFields=Text
XMLPaths=autnresponse/responsedata/autn:summary
XMLFieldNames=DRETITLE
NextTask=MyIndexTask
In this example, IDOL server is instructed to execute the MyCatTask on any documents that it
receives.
The MyCatTask matches documents that it receives against categories that IDOL server's Category
index contains and returns matching categories. It then tags the incoming documents according to
which categories they match, and forwards them to the MyHTTPTask.
The MyHTTPTask maps the Category and DRECONTENT fields of incoming documents to equivalent
fields in a SQL database, and sends the following HTTP call via a web interface to this SQL database:
http://sqlengine/insert?Category=<Category field value>&Content=<Content field value>
This HTTP call stores the content of the specified fields in the SQL database.
Configuration:
[Server]
StartTask=MyCatTask
[MyCatTask]
Module=Cat
TextFields=DRECONTENT
TagField=CategoryTag
NextTask=MyHTTPTask
[MyHTTPTask]
Module=HTTP
URL=http://sqlengine/insert
Fields=Category,DRECONTENT
RemapToFields=Category,Content
In this example, IDOL server is instructed to execute the MyRouteTask on any documents that it
receives.
The MyRouteTask checks if incoming documents contain an OCR field. If they do, the documents are
forwarded to the MyOCRTask, otherwise they are forwarded to the MyCatTask.
The MyOCRTask evaluates the quality of the files that contain an OCR field. Files whose quality is
satisfactory are forwarded to the MyIndexTask. Files whose quality is unsatisfactory are forwarded to
the MyFileWriterTask which writes the files to disk.
The MyCatTask matches documents that it receives from the MyRouteTask against categories that
IDOL server's Category index contains and returns matching categories. It then tags the incoming
documents according to which categories they match, and forwards them to the MyIndexTask.
The MyIndexTask indexes the files it receives into IDOL server's Data index.
Configuration:
[Server]
StartTask=MyRouteTask
[MyRouteTask]
Module=Route
Condition=Exists
Parameter1=OCR
OnTrueTask=MyOCRTask
OnFalseTask=MyCatTask
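The decision that the MyRouteTask section describes can be sketched in Python as follows. This is a simplified model, not IDOL server's implementation: the function name next_task is hypothetical, documents are represented as plain dictionaries, and only the Exists condition used in this example is modelled.

```python
def next_task(route, document):
    """Return the task that a Route section selects for a document (sketch)."""
    if route["Condition"] == "Exists":
        # Exists is true when the document contains the field in Parameter1.
        matched = route["Parameter1"] in document
    else:
        raise ValueError("only the Exists condition is modelled in this sketch")
    return route["OnTrueTask"] if matched else route["OnFalseTask"]


# Settings copied from the [MyRouteTask] section of the example.
my_route_task = {
    "Condition": "Exists",
    "Parameter1": "OCR",
    "OnTrueTask": "MyOCRTask",
    "OnFalseTask": "MyCatTask",
}

scanned = next_task(my_route_task, {"OCR": "scanned text", "DRECONTENT": "..."})
plain = next_task(my_route_task, {"DRECONTENT": "..."})
print(scanned, plain)
```

A document with an OCR field is sent to MyOCRTask; any other document is sent to MyCatTask.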
In this example, IDOL server is instructed to execute the MyFirstRouteTask on any documents that it
receives.
The MyFirstRouteTask checks if incoming documents contain an OCR field. If they do, the
documents are forwarded to the MyOCRTask, otherwise they are forwarded to the
MySecondRouteTask.
The MyOCRTask evaluates the quality of the files that contain an OCR field. Files whose quality is
satisfactory are forwarded to the MyIndexTask. Files whose quality is unsatisfactory are forwarded to
the MyFileWriterTask which writes the files to disk.
The MySecondRouteTask checks if documents that it receives are BIF files (for example, by
checking if they contain a BIF field). If they are, the documents are forwarded to the MyLPTask,
otherwise they are forwarded to the MyCatTask.
The MyLPTask converts the legacy profiles in the BIF files that it receives from the
MySecondRouteTask, and stores them in IDOL server's Category index.
The MyCatTask matches documents that it receives from the MySecondRouteTask against
categories that IDOL server's Category index contains and returns matching categories. It then tags
the incoming documents according to which categories they match, and forwards them to the
MyIndexTask.
The MyIndexTask indexes the files it receives into IDOL server's Data index.
using a connector
The Autonomy connectors (for example, File System Fetch, HTTPFetch, Oracle Fetch
and so on) allow you to retrieve documents from different repositories and import them
into IDX file format only. Please refer to the appropriate connector manual for further
information on how to import documents.
manually
You can create a text file in XML or IDX format (see Appendix D: manually creating
IDX files on page 431), which contains the information that you want to index into your
IDOL server in specific IDOL server fields.
Once documents have been imported into XML or IDX file format, you can index them into IDOL
server:
using a connector
The Autonomy connectors allow you to index the IDX files that they have created into the
IDOL server that they connect to. Please refer to the appropriate connector manual for
further information on how to index documents.
directly
You can index XML and IDX files into an IDOL server using an HTTP request that you
can issue from your web browser.
Note: depending on where the data that IDOL server indexes is located, the indexing process takes
place in the following order:
IDOL server indexes a locally accessible
file:
Index commands
Index commands are used by Autonomy connectors to index data into IDOL server. You can also use
them to directly index data into IDOL server.
Note: before you index data into IDOL server, you should consider the points outlined in Storing
content in IDOL server on page 83.
<file_name> or <path>
DREDbName=<database_name>
Optional:
ACLFields=<ACL_fields>
CantHaveFields=<forbidden_fields>
DatabaseFields=<database_fields>
DateFields=<date_fields>
Delete
DocumentDelimiters=<doc_delimiters>
DocumentFormat=<doc_format>
ExpiryDateFields=<expiry_date_fields>
FlattenIndexFields=<fields>
IDXFieldPrefix=<prefix>
IndexFields=<index_fields>
KeepExisting=<true/false>
KillDuplicates=<kill_duplicates_option>
LanguageFields=<language_fields>
LanguageType=<language_type>
MustHaveFields=<required_fields>
SectionFields=<section_fields>
SecurityFields=<security_fields>
SecurityType=<security_type>
TitleFields=<title_fields>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
<file_name>
The IDX or XML file that you want to index.
<path>
The full path to the IDX or XML file that you want to index.
DREDbName=<database_name>
The IDOL server database into which you want the document to be indexed. You don't need to
specify this if your IDX or XML files already contain a database field (by default, IDOL server is
configured to read from this field which database the files should be indexed into).
<optional_parameters>
You can enter one or more of the following parameters (note that you must separate individual
parameters with an ampersand):
ACLFields=<ACL_fields>
Allows you to specify the fields in the document from which you want IDOL server to read ACLs
(Access Control Lists).
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a
*/ to it.
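The rules for writing a field list can be sketched in Python as follows. This is an assumption-laden illustration rather than IDOL server's own parsing logic: the function name normalize_field_specs is hypothetical, and the sketch treats any entry without a slash as a bare FieldName that receives the */ prefix.

```python
def normalize_field_specs(value):
    """Split a comma-separated field list and apply the default '*/' prefix."""
    if " " in value:
        # The list must not contain spaces before or after a comma.
        raise ValueError("field lists must not contain spaces")
    specs = []
    for spec in value.split(","):
        if "/" not in spec:
            # A bare FieldName is treated as */FieldName (assumption: entries
            # that already contain a path are left unchanged).
            spec = "*/" + spec
        specs.append(spec)
    return specs


specs = normalize_field_specs("DRETITLE,/DRECONTENT,*/URL")
print(specs)
```

Rejecting spaces up front mirrors the requirement that there must be no space before or after a comma.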
CantHaveFields=<forbidden_fields>
Allows you to specify the fields in XML documents that are discarded before the documents are
indexed. By default all fields are stored in IDOL server.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a
*/ to it.
For example:
&CantHaveFields=*/StandardHeader
In this example, any StandardHeader fields that a document contains are discarded before the
document is indexed.
DatabaseFields=<database_fields>
Allows you to specify the fields in the document that contain the name of the database in which
you want the document to be stored.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a
*/ to it.
For example:
&DatabaseFields=Document/DREDBName,*/myDB
In this example, IDOL server indexes the document into the database with the name that is
contained in any DREDBName field below the Document level and with the name that is
contained in any fields called myDB.
DocumentDelimiters=<doc_delimiters>
Allows you to specify the fields in a file that indicate the beginning and end of a document, so
that the documents are indexed individually. Make sure that document delimiters are not nested.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a
*/ to it.
For example:
&DocumentDelimiters=*/DOCUMENT,*/SPEECH
In this example, the beginning and end of individual documents in a file are marked by opening
and closing DOCUMENT and SPEECH tags.
DocumentFormat=<doc_format>
If a document that you are indexing has an ambiguous format that IDOL server cannot easily
identify as XML or IDX, DocumentFormat allows you to specify the format of the file. Enter
XML or IDX.
IDXFieldPrefix=<prefix>
When you index an IDX file it is transformed into XML by placing it under the Document
subtree (each of the IDX file's fields is prefixed with Document, so that a simple XML hierarchy
is constructed). If you don't want this subtree to be called Document, IDXFieldPrefix allows
you to specify an alternative name.
IndexFields=<index_fields>
Allows you to specify the fields in the document that you want to index explicitly into IDOL
server. Indexing fields explicitly optimizes the query process when you restrict queries using
these fields. Index fields should hold data that is particularly significant to you (for example the
title of the document), and that you are likely to use frequently in order to restrict queries.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a
*/ to it.
For example:
&IndexFields=*/DRECONTENT,*/DRETITLE
In this example, the DRECONTENT and DRETITLE fields in documents are explicitly indexed
into IDOL server.
KeepExisting=<true/false>
If you have set KillDuplicates to Reference, ReferenceMatch<N> or <FieldName>, you can
set KeepExisting to true if you want IDOL server to discard the document it has received for
indexing and keep the matching document that it already contains instead.
LanguageType=<language_type>
Allows you to specify the language type of documents (if the document does not contain fields
from which IDOL server can read the language type of the document).
For example:
&LanguageType=myEnglish
In this example, the file is indexed with the language type myEnglish. The way IDOL server
handles this language type is determined by the way it has been defined in IDOL server's
configuration file (that is by the settings that you have associated with this language type in the
configuration file).
MustHaveFields=<required_fields>
Allows you to specify the fields in a document (IDX only) that are stored in IDOL server. By
default all fields are stored in IDOL server. Document fields that are not listed are discarded
which means that they cannot be queried or printed.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a
*/ to it.
For example:
&MustHaveFields=*/DRECONTENT,*/DRETITLE
In this example, IDOL server only stores a document's DRECONTENT and DRETITLE fields.
SecurityFields=<security_fields>
Allows you to specify the fields in the document that contain the security type of the document.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a
*/ to it.
For example:
&SecurityFields=Document/DRESecurity,*/mySecurity
In this example, IDOL server reads the security type of documents from any DRESecurity field
below the Document level and any mySecurity fields.
SecurityType=<security_type>
Allows you to specify the security type of documents (for example, if the document does not
contain fields from which IDOL server can read the security type of the document).
For example:
&SecurityType=mySecurity
In this example, the file is indexed with the security type mySecurity. The way IDOL server
handles this security type is determined by the way it has been defined in IDOL server's
configuration file (that is by the settings that you have associated with this security type in the
configuration file).
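Because each optional parameter is introduced by an ampersand, a set of parameters can be assembled mechanically. The following Python sketch shows one way to do this; the function name optional_parameters is hypothetical, and the values are passed through unencoded, as they appear in the examples of this section.

```python
def optional_parameters(params):
    """Join optional index parameters, each introduced by an ampersand."""
    return "".join(f"&{name}={value}" for name, value in params.items())


# Parameter values taken from the examples in this section.
suffix = optional_parameters({
    "CantHaveFields": "*/StandardHeader",
    "LanguageType": "myEnglish",
    "SecurityType": "mySecurity",
})
print(suffix)
```

The resulting string is intended to be appended to an index command after its mandatory parameters.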
Command parameters:
Mandatory:
<data>
Optional:
ACLFields=<ACL_fields>
CantHaveFields=<forbidden_fields>
DatabaseFields=<database_fields>
DateFields=<date_fields>
Delete
DocumentDelimiters=<doc_delimiters>
DocumentFormat=<doc_format>
DREDbName=<database_name>
ExpiryDateFields=<expiry_date_fields>
FlattenIndexFields=<fields>
IDXFieldPrefix=<prefix>
IndexFields=<index_fields>
KeepExisting=<true/false>
LanguageFields=<language_fields>
LanguageType=<language_type>
MustHaveFields=<required_fields>
SectionFields=<section_fields>
SecurityFields=<security_fields>
SecurityType=<security_type>
TitleFields=<title_fields>
<killduplicates_option>
NONE
REFERENCE
REFERENCEMATCH<N>
<FieldName>
<data>
The data that you want to index. This has to be in IDX or XML format.
<optional_parameters>
You can enter one or more of the following parameters (note that you must separate individual
parameters with an ampersand):
ACLFields=<ACL_fields>
Allows you to specify the fields in the document from which you want IDOL server to read ACLs
(Access Control Lists).
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a
*/ to it.
For example:
&ACLFields=*/AUTONOMYMETADATA
In this example, IDOL server reads ACLs from any fields that are called
AUTONOMYMETADATA.
CantHaveFields=<forbidden_fields>
Allows you to specify the fields in XML data that are discarded before the data is indexed.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a
*/ to it.
For example:
&CantHaveFields=*/StandardHeader
In this example, any StandardHeader field that a document contains is discarded before the
data is indexed.
DocumentFormat=<doc_format>
If data that you are indexing has an ambiguous format that IDOL server cannot easily identify
as XML or IDX, DocumentFormat allows you to specify the format of the data. Enter XML or
IDX.
DREDbName=<database_name>
Allows you to specify the IDOL server database into which you want the data to be indexed.
ExpiryDateFields=<expiry_date_fields>
Allows you to specify the fields in the data that contain the expiry date of the data (that is the
date when the data is deleted, unless you have set ExpireIntoDatabase in IDOL server's
configuration file to move the data to another database).
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a
*/ to it.
For example:
&ExpiryDateFields=Document/DREExpiryDate,*/myExpiryDate
In this example, IDOL server reads the expiry date from any DREExpiryDate field below the
Document level and from any fields called myExpiryDate.
IDXFieldPrefix=<prefix>
When you index IDX data it is transformed into XML by placing it under the Document subtree
(each of the IDX file's fields is prefixed with Document, so that a simple XML hierarchy is
constructed). If you don't want this subtree to be called Document, IDXFieldPrefix allows you
to specify an alternative name.
SecurityType=<security_type>
Allows you to specify the security type of documents (for example, if the document does not
contain fields from which IDOL server can read the security type of the document).
For example:
&SecurityType=mySecurity
In this example, the file is indexed with the security type mySecurity. The way IDOL server
handles this security type is determined by the way it has been defined in IDOL server's
configuration file (that is by the settings that you have associated with this security type in the
configuration file).
TitleFields=<title_fields>
Allows you to specify the field in the document from which you want IDOL server to read the
document's title. If a document contains several of these fields, IDOL server reads its title from
the first field it finds in the document.
If you want to specify multiple fields you must separate them with commas (there must be no
space before or after a comma). You can use wildcards.
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. If you just specify the FieldName, IDOL server automatically adds a
*/ to it.
For example:
&TitleFields=*/DRETITLE
In this example, IDOL server reads a document's title from its DRETITLE field.
is indexed as:
CHOLMONDLEI
WARNER
CHOLMONDLEYWARN
Barnes&Noble
is indexed as:
BARN
NOBL
BARNESNOBL
At query time, queries for terms that contain a hyphen or an ampersand are treated as follows:
http://<host>:<port>/action=Query&Text=Cholmondley-Warner
This query returns documents that contain "Cholmondley-Warner", "Cholmondley" or "Warner",
but also documents that contain, for example, "Cholmondley-Smythe" (documents that contain
"Cholmondley-Warner" would be returned with the highest relevance).
http://<host>:<port>/action=Query&Text=Barnes%26Noble
Note that in this query the ampersand has been escaped because it forms part of the query text
and should not be treated as a query syntax character by IDOL server.
This query returns documents that contain "Barnes&Noble", "Barnes" or "Noble", but also
documents that contain, for example, "Barnes&Greenough" (documents that contain
"Barnes&Noble" would be returned with the highest relevance).
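If you build query URLs in an application, the escaping described above can be done with a standard URL-encoding routine. The following Python sketch uses urllib.parse.quote from the standard library; the function name query_url and the host and port values are placeholders chosen for illustration.

```python
from urllib.parse import quote


def query_url(host, port, text):
    """Build a Query action URL, escaping reserved characters in the text."""
    # safe="" makes quote() escape every reserved character, including '&'.
    return f"http://{host}:{port}/action=Query&Text={quote(text, safe='')}"


ampersand_url = query_url("localhost", 9000, "Barnes&Noble")
hyphen_url = query_url("localhost", 9000, "Cholmondley-Warner")
print(ampersand_url)
print(hyphen_url)
```

The ampersand is escaped as %26, while the hyphen needs no escaping and is passed through unchanged.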
2. In the [Server] section, set the KillDuplicates parameter to the Reference fields by which you
want to eliminate duplicates (you can identify fields that contain document references by setting
up an appropriate field process; see Setting up Reference fields on page 293). This ensures
that whenever a document is indexed that has the same Reference field value as a document that
IDOL server already contains, IDOL server deletes the document that it already contains and
replaces it with the new one.
3. Save IDOL server's configuration file and start IDOL server. You can now index documents into
IDOL server.
Note: fields are identified as Reference fields through field processes in the IDOL server configuration
file (see Reference fields on page 293). If you use a <FieldName> Reference field to eliminate
duplicate documents, IDOL server automatically reads any fields that are listed alongside this field for
the PropertyFieldCSVs parameter in the field process, and also uses these fields to eliminate
duplicate documents.
For example:
[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE,*/URL
In this example, if KillDuplicates has been set to DREREFERENCE, IDOL server uses both a
document's DREREFERENCE field and URL field to eliminate duplicate copies.
If you want to define multiple reference fields but don't want them all to be used for document
elimination, you need to set up multiple field processes (see Using Reference fields to eliminate
duplicate copies of documents during indexing on page 105).
For example:
[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE
[SetMoreReferenceFields]
Property=Reference
PropertyFieldCSVs=*/URL
In this example, if KillDuplicates has been set to DREREFERENCE, IDOL server uses only a
document's DREREFERENCE field to eliminate duplicate copies, not its URL field.
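The difference between the two configurations above can be sketched in Python. This is a simplified model of the behavior described in the note, not IDOL server code: the function name duplicate_check_fields is hypothetical, and field processes are represented as plain dictionaries.

```python
def duplicate_check_fields(field_processes, kill_duplicates_field):
    """Return the fields compared when KillDuplicates names a Reference field."""
    for process in field_processes.values():
        if process.get("Property") != "Reference":
            continue
        # Strip the path part (for example "*/") to obtain the field names.
        fields = [spec.split("/")[-1]
                  for spec in process["PropertyFieldCSVs"].split(",")]
        if kill_duplicates_field in fields:
            # Every field listed alongside the named field is also used.
            return fields
    return []


one_process = {
    "SetReferenceFields": {
        "Property": "Reference",
        "PropertyFieldCSVs": "*/DREREFERENCE,*/URL",
    },
}
two_processes = {
    "SetReferenceFields": {
        "Property": "Reference",
        "PropertyFieldCSVs": "*/DREREFERENCE",
    },
    "SetMoreReferenceFields": {
        "Property": "Reference",
        "PropertyFieldCSVs": "*/URL",
    },
}

shared = duplicate_check_fields(one_process, "DREREFERENCE")
separate = duplicate_check_fields(two_processes, "DREREFERENCE")
print(shared, separate)
```

Listing the fields in one process makes them act together; splitting them across processes keeps URL out of the duplicate check.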
http://<IPAddress>:<Port>/action=IndexerGetStatus
<IPAddress>
Enter the IP address (or name) of the machine on which IDOL
server is installed.
<Port>
Enter the Port that you have specified in the IDOL server configuration
file's [Server] section.
The IndexerGetStatus command displays the status of IDOL server's index queue:
-1     Finished
-2
-3
-4     The database into which you are trying to index could not be found.
-5     Bad parameter
-6     Database exists
-7     Queued
-8     Unavailable
-9     Out of Memory
-10    Interrupted
-11
-12    Retrying interrupted command
-13    Backup in progress
-14
-15
-16    Index paused
-17    Index restarted
-18    Index cancelled
-19
-20    Index languagetype not found
-21
Note: if the IndexerGetStatus command returns a positive number, this number indicates the
percentage of the indexing queue that has been completed.
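An application that polls IndexerGetStatus can translate the returned number along these lines. The Python sketch below is illustrative only: the function name describe_status is hypothetical, it assumes that the status has already been extracted from the response as an integer, and it includes only the codes for which this section gives a description.

```python
# Descriptions copied from the status list in this section.
STATUS_TEXT = {
    -1: "Finished",
    -4: "The database into which you are trying to index could not be found.",
    -5: "Bad parameter",
    -6: "Database exists",
    -7: "Queued",
    -8: "Unavailable",
    -9: "Out of Memory",
    -10: "Interrupted",
    -12: "Retrying interrupted command",
    -13: "Backup in progress",
    -16: "Index paused",
    -17: "Index restarted",
    -18: "Index cancelled",
    -20: "Index languagetype not found",
}


def describe_status(code):
    """Translate an IndexerGetStatus number into readable text (sketch)."""
    if code > 0:
        # A positive number is the percentage of the queue that is complete.
        return f"{code}% of the indexing queue completed"
    return STATUS_TEXT.get(code, "no description listed")


print(describe_status(-1))
print(describe_status(42))
```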
2.
3. Save IDOL server's configuration file and restart your IDOL server in order to apply your
changes.
Once you have set up document tracking in IDOL server, you can add IDOL server as a child service
to your DiSH server and use the Autonomy Service Dashboard to track documents. Please refer to
your DiSH documentation and Autonomy Service Dashboard online help for details.
Agents
Users can store queries in the form of agents in order to always be up-to-date on the
latest available information. Users can edit and retrain their agents.
Profiling
A profile is a set of agents that are trained using the documents the user is looking at,
and return data that matches the user's interests. You can set up your application so that
every time a user looks at a document, the profile decides whether this document is
relevant to its agent's training. It then either updates the training with the document's
content or creates a new profile agent for the user.
Collaboration
You can match users with common agents or similar profiles.
Alerting
When IDOL server receives new content that matches a user's agents, the user is
immediately notified by email or a third-party system (for example, by SMS or a
pager).
Mailing
IDOL server matches the agents and profiles against its document content at regular
intervals, and automatically notifies users of documents that match their agents and/or
profiles by sending them email.
Expertise
IDOL server accepts a natural language or Boolean search string and returns users who
own matching agents or profiles. This allows instant identification of experts in any
subjects at hand, eliminating time consuming searches for specialists, and unnecessary
researching of subjects for which expert knowledge is already available.
Creating users
To create a flat user structure
Use the UserAdd action to create individual users.
For example:
http://<IPAddress>:<Port>/action=UserAdd&UserName=JaneBrown&Password=Sesame
<IPAddress>
Enter the IP address (or name) of the machine on which IDOL
server is installed.
<Port>
Enter the Port that you have specified in the IDOL server configuration
file's [Server] section.
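If you create users from a script rather than from a web browser, the UserAdd URL can be assembled programmatically. The following Python sketch only builds the URL string and does not contact a server; the function name action_url is hypothetical, and the host and port values are placeholders.

```python
from urllib.parse import urlencode


def action_url(host, port, action, **params):
    """Build an action URL in the form used throughout this guide."""
    # urlencode escapes reserved characters and keeps the parameter order.
    query = urlencode({"action": action, **params})
    return f"http://{host}:{port}/{query}"


user_url = action_url("localhost", 9000, "UserAdd",
                      UserName="JaneBrown", Password="Sesame")
print(user_url)
```

The same helper can be used for other actions by changing the action name and its parameters.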
Decide how you want to structure your users. You can, for example, group them according to
their roles and responsibilities in a company.
2. Use the RoleAdd action to create a role for each user group.
For example:
http://<IPAddress>:<Port>/action=RoleAdd&RoleName=Sales
3.
4.
5.
Set DeferLogin to true in the [Server] section of IDOL server's configuration file.
2.
When a user utilizes IDOL server for the first time, IDOL server creates a user with that user name and
allocates the default role's permissions and settings to this user.
Note: you can set DeferLoginSyncDuration in the [Server] section of IDOL server's configuration file
in order to specify how often IDOL server syncs the users it stores with the users in the third party
system.
8. Setting up security
If IDOL server's default security settings do not suit your environment, you can apply specific
security settings to documents that are indexed into IDOL server by identifying fields in the documents
that determine which security settings are appropriate to each of the documents (unless you want to
specify the security property of a document every time you index a document by sending an additional
parameter).
For details on the settings that the [Security] section can contain and on how you can configure them,
please refer to IDOL server's online help (see Displaying help on configuration settings on
page 389).
To set up automatic security application for documents:
1.
2. In the [Security] section, list the security types that you want to use, and specify the security keys
that identify IDOL server's security type.
For example:
[Security]
SecurityInfoKeys=123,234,345,456
0=NT
1=Netware
2=Notes
3=Exchange
3. Define a section for each of the security types that you have defined (the section must have the
same name as the security type), and specify appropriate settings for each security type in order
to determine how IDOL server handles this security type.
For example:
[NT]
SecurityCode=1
Library=nt_security.dll
Type=AUTONOMY_SECURITY_V4_NT_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Netware]
SecurityCode=2
Library=netware_security.dll
Type=AUTONOMY_SECURITY_NETWARE_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Notes]
SecurityCode=3
Library=notes_security.dll
Type=AUTONOMY_SECURITY_V4_NOTES_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Exchange]
SecurityCode=4
Library=exchange_security.dll
Type=AUTONOMY_SECURITY_EXCHANGE_MAPPED
ReferenceField=*/AUTONOMYMETADATA
4. In the [FieldProcessing] section, set up processes that allow IDOL server to recognize the
security type of documents (unless you want to specify the security property of a document every
time you index a document by sending an additional parameter). If you are using a version 4
security type (for example, AUTONOMY_SECURITY_V4_NOTES_MAPPED), you must include a
process that defines how you want to handle metadata.
For example:
[FieldProcessing]
Number=5
0=DetectNT
1=DetectNetware
2=DetectNotes
3=DetectExchange
4=DefineMetaData
5. Create a section for each of the processes that you have listed, in which you create a property for
the process (security properties always point to a defined security type). Identify the field that you
want to associate with the processes (when identifying the fields from which IDOL server can read
a document's security type you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to).
You can use the PropertyMatch parameter to identify a specific value that fields must have in
order to be processed.
Note: the properties that you create must not have the same name as processes.
For example:
[DetectNT]
Property=SetNTProperty
PropertyFieldCSVs=*/DRESECURITYTYPE
PropertyMatch=*nt
[DetectNetware]
Property=SetNetwareProperty
PropertyFieldCSVs=*/DRESECURITYTYPE
PropertyMatch=*netware
[DetectNotes]
Property=SetNotesProperty
PropertyFieldCSVs=*/DRESECURITYTYPE
PropertyMatch=*notes
[DetectExchange]
Property=SetExchangeProperty
PropertyFieldCSVs=*/DRESECURITYTYPE
PropertyMatch=*exchange
[DefineMetaData]
Property=HideMetaData
PropertyFieldCSVs=*/AUTONOMYMETADATA
6. List all the Properties that you have created in a [Properties] section.
For example:
[Properties]
0=SetNTProperty
1=SetNetwareProperty
2=SetNotesProperty
3=SetExchangeProperty
4=HideMetaData
7. Create a section for each of the properties and specify appropriate configuration settings for each.
These configuration parameters define the processes that are applied to all the fields (or all
documents that contain the fields) that you have previously associated with the processes.
Note: if you are using a version 4 security type (for example,
AUTONOMY_SECURITY_V4_NOTES_MAPPED), you must set ACLType to true in the section
that sets up how IDOL server handles metadata, in order to implement optimized security.
[SetNTProperty]
SecurityType=NT
[SetNetwareProperty]
SecurityType=Netware
[SetNotesProperty]
SecurityType=Notes
[SetExchangeProperty]
SecurityType=Exchange
[HideMetaData]
HiddenType=true
ACLType=true
8.
9.
Note: for details on ensuring security in an Autonomy infrastructure, please refer to your IAS manual.
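The detection processes configured in this procedure can be pictured with the following Python sketch, which maps the value of a document's DRESECURITYTYPE field to a security type using the PropertyMatch patterns from the example. It is an illustration under stated assumptions, not IDOL server's matching logic: the function name detect_security_type is hypothetical, the wildcard is assumed to behave like a shell-style glob, and matching is assumed to ignore case.

```python
from fnmatch import fnmatchcase

# PropertyMatch pattern and resulting SecurityType, taken from the
# [Detect...] and [Set...Property] sections of the example configuration.
DETECTORS = [
    ("*nt", "NT"),
    ("*netware", "Netware"),
    ("*notes", "Notes"),
    ("*exchange", "Exchange"),
]


def detect_security_type(document):
    """Return the security type selected for a document, or None (sketch)."""
    value = document.get("DRESECURITYTYPE", "").lower()
    for pattern, security_type in DETECTORS:
        if fnmatchcase(value, pattern):
            return security_type
    return None


print(detect_security_type({"DRESECURITYTYPE": "nt"}))
print(detect_security_type({"DRESECURITYTYPE": "autonomy_notes"}))
```

A document without a DRESECURITYTYPE field matches none of the patterns, so the sketch returns None for it.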
Alternatively, you can display the IDOL server online help, and click on the request log link in the top
right-hand corner. This displays the help's Log page, which contains the log of requests that the
GetRequestLog action returns.
IDOL server operations
11. Agents
Agents automatically find documents that you are interested in. A user who is interested in
football and gardening could, for example, create a Real Madrid agent and a Pest Control agent. Each
agent is given training text when it is created. This training provides an example of the type of text the
agent is looking for, so that the agent returns only documents, profiles, categories or other agents that
conceptually match its training.
Note that while agents are matched against all of IDOL server's databases by default (these store IDOL
server's data content, agents, profiles and categories), you can restrict the matching to one or more
databases (see Querying with an agent on page 122).
For example:
A user creates a Mortgage agent and trains it with text that is similar to the type of results he
expects the agent to return. The user can train the agent with text that he types himself or with
documents. Once the user has finished training the agent and specifying details for it (such as
the maximum number of results the agent can return, the minimum conceptual similarity of
results and so on), he can run the agent. The user can edit or retrain the agent at any time in
order to fine tune it.
Creating an agent
You can create agents using the AgentAdd action command. For details on this action, please refer to
the IDOL server online help (see Displaying online help on page 61).
For example:
http://12.3.4.56:4000/action=AgentAdd&UserName=Administrator&AgentName=Global+Warming&Training=Factors+affecting+global+warming&FieldMinScore=60
This command uses port 4000 to create an agent called Global Warming for the Administrator user.
The agent is stored in IDOL server's Agent index, which is situated on a machine with the IP address
12.3.4.56. The agent is trained to find documents whose concept matches the concept of the text
Factors affecting global warming. Only documents that have a conceptual relevance of at least 60%
to this text can be returned as results.
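If you generate action commands from a script, the parameter values need to be URL-encoded (note the + characters that replace spaces in the example above). The following is a minimal sketch that builds the same command with Python's standard library. The build_action_url helper is a hypothetical name introduced for illustration and is not part of IDOL server.

```python
from urllib.parse import urlencode


def build_action_url(host, port, action, **parameters):
    """Build an action command URL; spaces in values are encoded as '+'."""
    query = urlencode({"action": action, **parameters})
    return f"http://{host}:{port}/{query}"


url = build_action_url(
    "12.3.4.56",
    4000,
    "AgentAdd",
    UserName="Administrator",
    AgentName="Global Warming",
    Training="Factors affecting global warming",
    FieldMinScore=60,
)
print(url)
```

The same helper can be reused for the other agent actions described in this chapter by changing the action name and parameters.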
Editing an agent
You can edit agents using the AgentEdit action command. For details on this action, please refer to
the IDOL server online help (see Displaying online help on page 61).
For example:
http://12.3.4.56:4000/action=AgentEdit&UserName=Administrator&AgentName=Global+Warming&FieldMinScore=75
This command uses port 4000 to change the value of the Global Warming agent's MinScore field to
75.
Retraining an agent
You can retrain agents using the AgentRetrain action command. For details on this action, please
refer to the IDOL server online help (see Displaying online help on page 61). When an agent is
retrained, the concepts of its training are modified with the concepts of the text that is used for the
retraining.
For example:
http://12.3.4.56:4000/action=AgentRetrain&UserName=Administrator&AgentName=Global+Warming&PositiveDocs=534+352+4534
This command uses port 4000 to retrain the Administrator user's Global Warming agent with the
documents that have the IDs 534, 352 and 4534.
Copying an agent
You can copy an agent using the AgentCopy action command. For details on this action, please refer
to the IDOL server online help (see Displaying online help on page 61). Copying an agent is useful if
you want to use one of your agents or another user's agent as a template. You can copy the agent and
then modify the copy.
For example:
http://12.3.4.56:4000/action=AgentCopy&UserName=Administrator&AgentName=Global+Warming&DestinationUserName=JSmith&DestinationAgentName=Environment
This command uses port 4000 to copy the Administrator user's Global Warming agent details. The
agent's details are copied to the JSmith user's Environment agent.
Deleting an agent
You can delete an agent from IDOL server's Agent index using the AgentDelete action command. For
details on this action, please refer to the IDOL server online help (see Displaying online help on
page 61).
For example:
http://12.3.4.56:4000/action=AgentDelete&UserName=Administrator&AgentName=Global+Warming
This command deletes the Administrator user's Global Warming agent from IDOL server.
12. Alerting
IDOL server analyzes data in new documents (when it receives the documents) and compares the
concepts in the documents with users' agents. If new data matches a user's agent, IDOL server
immediately notifies the user by email.
2. Add a new section for the task. The name that you give this section must be unique.
For example:
[MyAlertTask]
3. Add the following line to the section, in order to identify the task as an Alert task:
Module=Alert
4. Use the IDOLserver parameter to specify the IP address (or name) of the machine that stores
IDOL server's Agent index, and the port that is used to query this Agent index.
For example:
IDOLserver=123.4.5.67
5. Specify the fields in the new documents that you want to use to query fields in IDOL server's
Agent index.
For example:
Fields=Text,Title
6. Specify the fields in IDOL server's Agent index that you want to query with the document fields
you have specified in step 5.
For example:
FieldMappings=DRECONTENT,DRETITLE
7. Specify settings for your mail server. For details on available settings, please refer to the IDOL
server online help (see Displaying help on configuration settings on page 389).
For example:
SMTPServer=smtp.company.com
SMTPPort=25
SMTPSendFrom=administrator@mycompany.com
SMTPSendFromUsername=administrator
SMTPSendFromPassword=secret
SMTPSubject=Alert: new document DREREFERENCE
8. Specify any other settings that you want to apply to your Alert task. For details on available
settings, please refer to the online help (see Displaying help on configuration settings on
page 389).
For example:
[MyAlertTask]
Module=Alert
IDOLserver=123.4.5.67
Fields=DRECONTENT,DRETITLE
FieldMappings=DRECONTENT,title
SMTPServer=smtp.company.com
SMTPPort=25
SMTPSendFrom=administrator@mycompany.com
SMTPSendFromUsername=administrator
SMTPSendFromPassword=secret
SMTPSubject=Alert: new document DREREFERENCE
AttachFileFromReference=true
AlwaysSendAttachment=true
Template=AlertTemplate1.txt
AttachmentTemplate=AlertTemplate2.txt
9. Save the IDOL server configuration file and restart IDOL server for your configuration changes to
take effect.
Open the alertTemplate.html template file in a text editor and save it with a new name.
Alternatively, you can create a new file.
To display any of the following in alert emails, enter the associated field in the template:

To display: a document's reference
Enter: DREREFERENCE
For example:
Ref: DREREFERENCE
If you enter the above in the template and a document's DREREFERENCE field contains
the value http://news.bbc.co.uk/index.html, the alert email will contain the text:
Ref: http://news.bbc.co.uk/index.html

To display: a document's title
Enter: DRETITLE

To display: a document's content
Enter: DRECONTENT
For example:
DRECONTENT
If you enter the above in the template and a document's DRECONTENT field contains the
value Brine shrimp: popular with aquarists. High in protein and a nice snack for many
freshwater fish, the alert email will contain the text:
Brine shrimp: popular with aquarists. High in protein and a nice snack for many
freshwater fish

To display: a field value
Enter: "FIELD<field_name>"
For example:
Author: "FIELDauthor"
If you enter the above in the template and a document's author field contains the value JR
Hartley, the alert email will contain the text:
Author: JR Hartley

RESULTLINKS

To display: the agent's relevance to the document
Enter: RESULTWEIGHT
For example:
Relevance: RESULTWEIGHT %
If you enter the above in the template, and a result agent has a conceptual similarity of 78%
to the document, the alert email will contain the text:
Relevance: 78 %

To display: the agent's training
Enter: AGENTTRAINING
Note: you should enter the fields in the position you want them to be displayed in the email.
2.
If you have set the SendToList configuration parameter to false, you can also include the
AGENTNAME and USERNAME fields in a template. (If SendToList is set to true, a single email
is sent to all users, so it is not possible for separate user and agent names to be displayed in the
email.) The email that the template creates will display the name of the agent that the new
document matches, and the user to whom the email is sent.
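The way template fields are replaced with document values can be illustrated with a short script. The following is a minimal sketch of the substitution idea only. It assumes a document is available as a Python dictionary, covers just a few of the fields listed above, and does not claim to reproduce how IDOL server itself processes templates. The render_alert_template function is a hypothetical name.

```python
import re

# Matches "FIELD<field_name>" (including the quotes) or one of the fixed fields.
PLACEHOLDER = re.compile(
    r'"FIELD(\w+)"|(DREREFERENCE|DRETITLE|DRECONTENT|RESULTWEIGHT)'
)


def render_alert_template(template, document, result_weight):
    """Replace template fields with values from the document in one pass."""

    def substitute(match):
        if match.group(1) is not None:  # "FIELD<field_name>"
            return str(document.get(match.group(1), ""))
        if match.group(2) == "RESULTWEIGHT":
            return str(result_weight)
        return str(document.get(match.group(2), ""))

    return PLACEHOLDER.sub(substitute, template)


template = 'Ref: DREREFERENCE\nAuthor: "FIELDauthor"\nRelevance: RESULTWEIGHT %'
document = {
    "DREREFERENCE": "http://news.bbc.co.uk/index.html",
    "author": "JR Hartley",
}
print(render_alert_template(template, document, 78))
```

Replacing all fields in a single pass avoids substituting a field name that happens to occur inside a document's own content.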
3.
13. Categorization
IDOL server's Categorization operation allows you to do the following:
categorize data
You can automatically tag, categorize and index documents.
suggest categories
You can suggest conceptually similar categories for documents, text and other categories.
match categories
You can match categories against data, agents, profiles and other categories.
from scratch
from clusters
by copying categories
by generating a taxonomy
from XML
Note that all categories are stored on disk. They only become available for querying if they are indexed
into IDOL server's Category index.
Once you have created categories you can:
Mandatory tags
If you want to create an XML file to set up your category structure, you must use the following
Autonomy tags:
<autn:categories>
Marks the beginning of the XML categories that IDOL server reads. When you use the
CategoryImportFromXML action, IDOL server reads the XML within the opening and
closing <autn:categories> tags.
Required tags within <autn:categories>
tag name                   number allowed
<autn:category>            one or more

tag name                   number allowed
<autn:name>                one

tag name                   number allowed
<autn:positivetraining>    one
<autn:negativetraining>    one
<autn:details>             one
<autn:settings>            one
<autn:name>
Sets the name of the category. You must include one <autn:name> within each set of
<autn:category> tags.
Required tags within <autn:name>
none
Example content
<autn:name>UKpolitics</autn:name>
Optional tags
<autn:positivetraining>
Sets the positive training for a category. IDOL server identifies concepts that belong to
the category from this training set. You can include one <autn:positivetraining> within
each set of <autn:category> tags.
Required tags within <autn:positivetraining>
tag name                   number allowed
<autn:trainingtext>        one
<autn:trainingdoc>         one or more
<autn:negativetraining>
Sets the negative training for a category. IDOL server identifies concepts that do not
belong to the category from this training set. You can include one
<autn:negativetraining> within each set of <autn:category> tags.
Required tags within <autn:negativetraining>
tag name                   number allowed
<autn:trainingtext>        one
<autn:trainingdoc>         one or more
<autn:details>
Sets training details for the category. This can include the following:
You can include one set of <autn:details> within each set of <autn:category> tags.
Required tags within <autn:details>
none
Optional tags within <autn:details>
tag name                                          number allowed
<autn:boolean>                                    one
<autn:modifiedterms> and <autn:modifiedweights>   one
<autn:settings>
Sets additional details for the category. You can include one set of <autn:settings>
within each set of <autn:category> tags.
Required tags within <autn:settings>
tag name                   number allowed
<autn:categoryparameters>  one
<autn:trainingtext>
Sets a training text for a category. You can include only one <autn:trainingtext> within
each set of <autn:positivetraining> or <autn:negativetraining> tags.
Required tags within <autn:trainingtext>
tag name                   number allowed
<autn:training>            one
<autn:trainingdoc>
Sets a training document for a category. You can include any number of training
documents within each set of <autn:positivetraining> or <autn:negativetraining>
tags; each one must be marked by its own <autn:trainingdoc> tag.
Required tags within <autn:trainingdoc>
tag name                   number allowed
<autn:trainingdoc>         one
<autn:title>               one
<autn:boolean>
Sets Boolean training for a category. You can include one <autn:boolean> within each
set of <autn:details> tags.
Required tags within <autn:boolean>
none
Example content
<autn:boolean>(phone AND mobile)</autn:boolean>
<autn:generatedterms>
Sets terms for a category. Only do this if you are editing an existing category from
which you can take the terms. You can include one <autn:generatedterms> within
each set of <autn:details> tags.
Note: if you are specifying terms for a category, then you must enter a corresponding list
of weights with <autn:generatedweights> tags.
Required tags within <autn:generatedterms>
none
Example content
<autn:generatedterms>LYMPH,MISDIAGNOS,PATHOLOGI</autn:generatedterms>
<autn:generatedweights>
Sets weights for a category's terms. Only do this if you are editing an existing category
from which you can take the weights. You can include one <autn:generatedweights>
within each set of <autn:details> tags.
Note: if you are specifying weights for a category, then you must enter a corresponding
list of terms with <autn:generatedterms> tags.
Required tags within <autn:generatedweights>
none
Example content
<autn:generatedweights>5960,4035,4001</autn:generatedweights>
<autn:categoryparameters>
Sets additional category information. This can include the following details:
tag name                   number allowed
<autn:numresults>          one
<autn:threshold>           one
<autn:[My_Field]>          any number
<autn:training>
Sets the training text to be used for training a category. You can enter one text with
<autn:training> for each set of <autn:trainingtext> or <autn:trainingdoc> tags.
Required tags within <autn:training>
none
Example content
<autn:trainingtext>The internet is coming to the South Pole following a decision to lay a
fibre-optic cable nearly two thousand kilometres across the polar ice. It will be one of the
most dramatic and challenging engineering tasks ever carried out in Antarctica. It will
take years to design and construct, but when finished it will revolutionise
communications with the South Pole. </autn:trainingtext>
<autn:title>
Sets the title of a training document to be used for training a category. You can enter one
title with <autn:title> for each set of <autn:trainingdoc> tags.
Required tags within <autn:title>
none
Example content
<autn:title>Internet to reach South Pole. </autn:title>
<autn:numresults>
Sets the number of results you require from category queries. You can include one
<autn:numresults> for each category within <autn:categoryparameters> tags.
Required tags within <autn:numresults>
none
Example content
<autn:numresults>10</autn:numresults>
<autn:threshold>
Sets the threshold you require for results of category queries. You can include one
<autn:threshold> for each category within <autn:categoryparameters> tags.
Required tags within <autn:threshold>
none
Example content
<autn:threshold>25</autn:threshold>
<autn:author>Dickens</autn:author>
Examples
The minimum information you can give in your XML:
<?xml version="1.0" encoding="UTF-8" ?>
<autn:categories xmlns:autn="http://schemas.autonomy.com/aci/">
<autn:category>
<autn:name>MyCategory</autn:name>
</autn:category>
</autn:categories>
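If you generate category XML from a script, you can build the minimal structure and check the rule that each <autn:category> contains exactly one <autn:name>. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. Both function names are hypothetical, and the check only looks at categories directly below <autn:categories>.

```python
import xml.etree.ElementTree as ET

AUTN = "http://schemas.autonomy.com/aci/"  # namespace used in the example above
ET.register_namespace("autn", AUTN)


def build_categories_xml(names):
    """Build the minimal category XML: one <autn:name> per <autn:category>."""
    root = ET.Element(f"{{{AUTN}}}categories")
    for name in names:
        category = ET.SubElement(root, f"{{{AUTN}}}category")
        ET.SubElement(category, f"{{{AUTN}}}name").text = name
    return ET.tostring(root, encoding="unicode")


def categories_without_single_name(xml_text):
    """Return the positions of categories that lack exactly one <autn:name>."""
    root = ET.fromstring(xml_text)
    categories = root.findall(f"{{{AUTN}}}category")
    return [
        index
        for index, category in enumerate(categories)
        if len(category.findall(f"{{{AUTN}}}name")) != 1
    ]


xml_text = build_categories_xml(["MyCategory"])
print(xml_text)
print(categories_without_single_name(xml_text))
```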
Training categories
Note: you only need to train categories that you have created with the CategoryCreate action.
Categories that you have created or imported using another action are already trained (you can,
however, retrain them).
You can use the CategorySetTraining action to train a category. A category's training can comprise
text, documents, a Boolean expression and category content, or a combination of all of these. These
elements serve to identify text, documents, agents, profiles and other categories that match the
category.
For example:
http://<host>:<port>/action=CategorySetTraining&Category=323499876022105571056&DocID=238,785,9912&BuildNow=true
In this example, IDOL server is instructed to train the category with the ID 323499876022105571056
using the content of the documents with the IDs 238, 785 and 9912. The BuildNow parameter instructs
IDOL server to build the category immediately, so that it becomes active. You can also activate the
category at a later point using the CategoryBuild action (see Building categories on page 144).
Retraining categories
You can use the CategorySetTraining action to retrain a category. You can use text, documents, a
Boolean expression and category content or a combination of all of these to retrain a category. When a
category is retrained, its original training is merged with the new training supplied.
For example:
http://<host>:<port>/action=CategorySetTraining&Category=323499876022105571056&Boolean=dog+AND+NOT+cat&BuildNow=true
In this example, IDOL server is instructed to retrain the category with the ID 323499876022105571056
using the Boolean expression dog AND NOT cat. The BuildNow parameter instructs IDOL server to
build the category immediately, so that it becomes active. You can also activate the category at a later
point using the CategoryBuild action (see Building categories on page 144).
Moving categories
You can use the CategoryMove action to move individual categories in the category hierarchy.
For example:
http://<host>:<port>/action=CategoryMove&Category=124365780934532&Parent=123098234987345876
In this example, IDOL server is instructed to move the category that has the ID 124365780934532 to
the category with the ID 123098234987345876 (to make category 123098234987345876 the new
parent of category 124365780934532).
replace categories
activate categories
build categories
delete categories
sync IDOL server's Category index with the categories stored on disk
Replacing categories
You can use the CategoryReplace action to replace a category with another category.
For example:
http://<host>:<port>/action=CategoryReplace&FromCategory=123456789012345&ToCategory=98765432109876&BuildNow=true
In this example, IDOL server is instructed to replace the 98765432109876 category with the
123456789012345 category. The BuildNow parameter instructs IDOL server to build the categories
immediately, so they become active. You can also activate the category at a later point using the
CategoryBuild action (see Building categories on page 144).
Building categories
You can use the CategoryBuild action to build a category. You need to build a category after you have
created a new category and trained it, as well as every time you retrain a category. Building a category
identifies the concepts of the category's training and indexes the category into IDOL server's
Category index.
Note: if you have trained or retrained a category using the CategorySetTraining action with
BuildNow set to true, you do not have to execute a CategoryBuild action, as the category was built
immediately after it was trained.
For example:
http://<host>:<port>/action=CategoryBuild&Category=32349987602210557106
In this example, IDOL server is instructed to build the category with the ID 32349987602210557106.
Deleting categories
You can use the CategoryDelete action to delete a category. Deleting a category removes the
category from disk and from IDOL server's Category index.
For example:
http://<host>:<port>/action=CategoryDelete&Category=32349987602210557106
In this example, IDOL server is instructed to delete the category with the ID 32349987602210557106.
Categorizing data
You can configure IDOL server to automatically categorize data and index it.
To automatically categorize documents before they are stored in IDOL server, you need to set up a Cat
task. IDOL server matches incoming documents against categories that its Category index contains
and returns matching categories. It then tags the incoming documents according to which categories
they match.
For details on how to set up a Cat task, please see Processing data before indexing it on page 73.
Suggesting categories
IDOL server can suggest conceptually similar categories for:
documents
text
categories
Matching categories
You can use the CategoryQuery action to match categories against data, agents, profiles and other
categories.
For example:
http://<host>:<port>/action=CategoryQuery&Category=32349987602210557106
In this example, IDOL server matches the category with the ID 32349987602210557106 against all its
databases and returns conceptually similar data, agents, profiles and categories.
14. Channels
IDOL server can automatically provide users with a set of hierarchical channels with highly relevant
information pertinent to the respective channel. Eliminating the requirement for manual intervention or
pre-tagging, real-time information is dynamically updated into the channels automatically, minimizing
the maintenance effort required. Moreover, the administrator can add and remove channels on the fly,
without having to re-categorize all of the data.
15. Clustering
IDOL server can automatically cluster information in order to make trends and developments in this
information visible. Clustering is the process of taking a large repository of unstructured data and
automatically partitioning it, so that similar information is clustered together. Each cluster represents a
concept area within the knowledge base and contains a set of items with common properties.
To cluster information, you need to take a snapshot of data that IDOL server stores. You can then
automatically cluster data within this snapshot (this does not require the setup of an initial taxonomy).
IDOL server takes a snapshot of the data it stores and, based on these snapshots, clusters related
information together. Each cluster represents a concept area that contains a set of items, which share
common properties.
Generating snapshots
The ClusterSnapshot action allows you to take a snapshot of the data stored in IDOL server's Data
index (by default this comprises the IDOL server databases News and Archive). A snapshot
represents the content of the Data index at a particular time, and enables you to generate cluster
information and spectrographs at a later point, even if the Data index has changed. You can use a
single snapshot to generate both cluster information and spectrograph data in order to save processing
time.
Each snapshot is time-stamped (with the number of seconds since 1st January 1970) and
stored in binary cls format in the Snapshots subdirectory of IDOL server's Cluster directory in your
IDOL server installation directory. This allows you to have several snapshots with the same name (for
example, of one particular IDOL server) and snapshots with different names (for example, of different
data sets).
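A time stamp expressed as the number of seconds since 1st January 1970 can be converted to a readable date as follows. This is a minimal sketch that assumes the count is taken from midnight UTC. The function name is illustrative only.

```python
from datetime import datetime, timezone


def snapshot_time(seconds_since_1970):
    """Convert a snapshot time stamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds_since_1970, tz=timezone.utc)


# 86400 seconds is exactly one day after 1st January 1970.
print(snapshot_time(86400).isoformat())  # 1970-01-02T00:00:00+00:00
```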
You can set up a schedule that executes the ClusterSnapshot action at regular intervals (see
Setting up schedules on page 160).
Note: the IDOL server Data index of which you are taking a snapshot should ideally contain at least
several thousand documents with good quality content (that is, relevant text for various topics).
The ClusterSGDataGen action allows you to generate spectrograph data from a set of snapshots that
you have taken using the ClusterSnapshot action.
Each spectrograph data set takes a succession of clusters from different time periods, calculates
cluster similarity measures across days, and applies a graph theoretic matching algorithm.
Calculations are made as to the conceptual spread of a cluster and its general quality. The size
(number of documents in a cluster) and quality of a cluster is represented by width and intensity on the
spectrograph.
All spectrograph data sets that you are generating are stored in the Sgdata subdirectory of the Cluster
directory in your IDOL server installation directory.
You can set up a schedule that executes the ClusterSGDataGen action at regular intervals (see
Setting up schedules on page 160).
You can retrieve the spectrograph image, data or documents using the ClusterSGPicServe,
ClusterSGDataServe and ClusterSGDocsServe actions, which are executed by the Spectrograph
applet.
The ClusterCluster action allows you to analyze clusters in a snapshot that you have taken using the
ClusterSnapshot action.
Clustering is a multi-stage, hybrid algorithm. After IDOL server's Adaptive Probabilistic Concept
Modelling (APCM) technology has identified similar documents, a hierarchical agglomerative
clustering algorithm groups documents into conceptually similar areas. Dynamic binding and fixating
produces the required clusters, whose titles are generated automatically by cross-correlating important
concepts within a cluster with concepts within the titles of documents in that cluster.
You can set up a schedule that executes the ClusterCluster action at regular intervals (see Setting
up schedules on page 160).
Depending on which parameters you combine the action with, you can generate:
WhatsHot information
WhatsHot information is the most relevant information that is available for the clusters that
IDOL server identifies in your snapshot. Unlike WhatsNew information this is not restricted to
new information, which means that it can be used to follow the progress of particular news
items over time.
You can cluster WhatsHot information from a snapshot and use the Autonomy HotNews
portlet to display this information in a portal. You can also generate a 2D map from WhatsHot
information and display it in a portal using the Autonomy 2DMap portlet.
The 2D map visualizes the similarities and differences between clusters. A
dimensionality reduction algorithm is used to maintain inter-cluster similarity measures, so
that clusters that are close together have some similarity and clusters that are not similar are
not close together. The distribution of documents throughout the space, along with non-linear
remapping, is then used to create the landscape.
WhatsNew information
WhatsNew information is the latest information that is available for the clusters that IDOL
server identifies in your snapshot.
You can generate WhatsNew information by comparing two snapshots (that have the same
name or different names).
The results of the ClusterCluster action are saved in cfg files in the Clusters subdirectory of the IDOL
server installation's Cluster directory from where you can retrieve them in XML format using the
ClusterResults action.
If you have configured the ClusterCluster action to generate a 2D map of WhatsHot cluster
information, you can use the ClusterServe2DMap action to return this map in one of the supported
image formats (that is, GIF, PNG or JPEG).
Configuring clustering
You can take a snapshot of the data content IDOL server stores. This snapshot identifies clusters of
conceptually similar documents, which enables you to generate a view of trends in the data. You don't
need to generate an initial taxonomy in order to take a snapshot.
A set of data can contain a few large clusters or many small clusters, as well as a number of outliers
that aren't part of any cluster. Clusters may consist of highly similar documents or of less closely
related ones. What constitutes optimal clustering depends to some extent on how you intend to use
your clusters, but the aim of clustering is always to generate an accurate characterization of the data
content in your IDOL server.
By default IDOL server uses internal settings to produce clusters. These default settings do not usually
need to be changed, but in some cases you may require more or less detail in your clusters, or the
amount and nature of your data may mean that default clustering is not satisfactory. You can optimize
clustering in these cases by setting parameters that adjust the size of the units on which clusters are
based, the degree of conceptual similarity that documents within clusters must have, or the number of
clusters that are created.
building "seeds"
Seed-building is implemented when the ClusterSnapshot action is executed. IDOL server takes a
sample of the documents it stores and tries to associate individual documents with each other, based
on the similarity of the concepts that the documents contain. Each group produced at this stage,
consisting of a sample document and the documents similar to it, is a seed. IDOL server stops trying to build a seed when
the seed meets the requirements that SeedSize specifies or when there are no more documents that
meet the similarity requirement that SeedBindLevel specifies (whichever condition is reached first).
IDOL server discards any seeds that don't reach the required size. The number of clusters you specify
with NumClusters affects the number of sample documents from which IDOL server tries to create
seeds at this stage (note that you can adjust the relationship between the number you specify here and
the size of the sample used by changing the value of StartingSuggestOverrideFactor).
Grouping seeds into clusters is implemented when the ClusterSGDataGen or ClusterCluster actions
are executed. IDOL server tries to create clusters by grouping seeds together. The grouping is based
on the similarity of the concepts that the seeds or clusters contain. Clustering is complete when the
number of clusters specified by NumClusters has been created, or when no more clusters can be
created that meet the similarity requirement specified by BindLevel (whichever condition is reached
first). Clusters that don't meet the quality requirement set by BindLevel or the size requirement set by
MinClusterDocs are discarded.
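The seed-building stage described above can be pictured with a toy example. The following sketch is an illustration only and is not IDOL server's algorithm: it substitutes a simple word-overlap (Jaccard) measure for conceptual similarity, expresses the bind level as a fraction between 0 and 1, and treats the seed size as both the stopping point and the required size. All function and parameter names are hypothetical stand-ins for SeedSize and SeedBindLevel.

```python
def word_overlap(first, second):
    """Jaccard similarity of two texts' word sets (stand-in for concepts)."""
    a, b = set(first.split()), set(second.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def build_seeds(documents, samples, seed_size, seed_bind_level):
    """Grow one seed per sample document and discard undersized seeds."""
    seeds = []
    for sample in samples:
        seed = [sample]
        # Consider the most similar documents first.
        candidates = sorted(
            (doc for doc in documents if doc != sample),
            key=lambda doc: word_overlap(sample, doc),
            reverse=True,
        )
        for doc in candidates:
            if len(seed) >= seed_size:
                break  # the seed has reached the requested size
            if word_overlap(sample, doc) < seed_bind_level:
                break  # no remaining document is similar enough
            seed.append(doc)
        if len(seed) >= seed_size:
            seeds.append(seed)  # seeds that stay too small are discarded
    return seeds


documents = [
    "global warming climate",
    "global warming ice",
    "global warming climate change",
    "football match result",
]
print(build_seeds(documents, [documents[0], documents[3]], 2, 0.4))
```

In this toy data set the football document has no sufficiently similar neighbours, so its seed is discarded, which mirrors the behavior described for seeds that do not reach the required size.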
For details of the clustering actions, and the settings you can make to generate the clusters from your
data, please refer to the IDOL server online help (see Displaying online help on page 61).
Configuration settings
The ideal values for the parameters that affect clustering depend on the nature and amount of data in
your IDOL server. It is possible to make some general recommendations about how to change these
parameters according to your data. Parameters are closely interdependent, so you should make these
changes in combination with each other (rather than just changing one of the settings), and change
values in small increments or decrements.
Although you can make many changes to clustering, the number and size of clusters that IDOL server
can identify depends ultimately on the data content it contains:
MinClusterDocs
StartingSuggestOverrideFactor
SeedBindLevel
Clustering a large amount of data
If your IDOL server has a large amount of data, you will probably not need to edit any clustering
settings - since this is the situation in which clustering is most successful. In some cases (for example,
if your IDOL server contains more than 1 million documents), it may be beneficial to alter the following
settings:
StartingSuggestOverrideFactor
SeedBindLevel
BindLevel
Clustering very different data
If the documents in your IDOL server contain a wide variety of concepts, there may not be enough
similar documents for IDOL server to create seeds or clusters that characterize the data it stores. You
can lower the similarity requirement with the following settings:
SeedBindLevel
BindLevel
MinClusterDocs
BindLevel
Setting up schedules
You can set up as many as 1024 schedules, which allow you to run the following actions at regular intervals:
ClusterSnapshot
ClusterCluster
ClusterSGDataGen
TaxonomyGenerate
For details on the settings that each [AnalysisSchedule] section can contain and on how you can
configure them, please refer to IDOL server's online help (see Displaying help on configuration
settings on page 389).
To set up schedules:
1.
2. Create an [AnalysisSchedule<N>] section for each schedule that you want to run. Start the
numbering of the [AnalysisSchedule<N>] sections from 0 (so that the first schedule section is
called [AnalysisSchedule0]).
For example:
[AnalysisSchedule0]
[AnalysisSchedule1]
[AnalysisSchedule2]
[AnalysisSchedule3]
[AnalysisSchedule4]
[AnalysisSchedule5]
In this example, six schedules have been created. Note that the schedules are listed in consecutive
order, starting from 0.
3.
Specify the settings that you want to apply to each schedule in the appropriate schedule's section.
You can specify the action that should be scheduled, the interval in which each schedule should
be executed, the number of times each schedule should be executed and so on.
For example:
[AnalysisSchedule0]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=CLUSTERSNAPSHOT
targetjobname=myjob
[AnalysisSchedule1]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=CLUSTERCLUSTER
sourcejobname=myjob
targetjobname=myjob_clusters
domapping=true
[AnalysisSchedule2]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=CLUSTERCLUSTER
sourcejobname=myjob
targetjobname=myjob_clusters_new
whatsnew=true
interval=86400
[AnalysisSchedule3]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=CLUSTERSGDATAGEN
interval=604800
sourcejobname=myjob
targetjobname=myjob_sg
[AnalysisSchedule4]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=CLUSTERSGDATAGEN
interval=86400
sourcejobname=myjob_content
targetjobname=compare_snapshots_sg
[AnalysisSchedule5]
schedulestarttime=now
scheduleinterval=1 day
schedulecycles=1
scheduleaction=TAXONOMYGENERATE
cluster=0,1,2,3,4,5,6,7,8,9
sourcejobname=myjob_clusters
targetjobname=myjob_taxonomy
writetaxonomy=true
numresults=25
4.
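The numbering rule described above (consecutive [AnalysisSchedule<N>] sections, starting from 0, with at most 1024 schedules) can be sketched in Python. This is a minimal illustration and not part of IDOL server: the render_schedules helper is hypothetical, and only the section names and setting names are taken from the example in this guide.

```python
def render_schedules(schedules):
    """Render setting dictionaries as consecutively numbered
    [AnalysisSchedule<N>] sections, starting from 0."""
    if len(schedules) > 1024:
        raise ValueError("at most 1024 schedules can be set up")
    lines = []
    for number, settings in enumerate(schedules):
        lines.append(f"[AnalysisSchedule{number}]")
        for key, value in settings.items():
            lines.append(f"{key}={value}")
    return "\n".join(lines)


# Settings shared by every schedule in the example above.
common = {
    "schedulestarttime": "now",
    "scheduleinterval": "1 day",
    "schedulecycles": 1,
}

snapshot = {**common, "scheduleaction": "CLUSTERSNAPSHOT", "targetjobname": "myjob"}
cluster = {
    **common,
    "scheduleaction": "CLUSTERCLUSTER",
    "sourcejobname": "myjob",
    "targetjobname": "myjob_clusters",
    "domapping": "true",
}

config_text = render_schedules([snapshot, cluster])
print(config_text)
```

Generating the sections from a list makes it hard to skip a number by accident, because the section index always follows the position in the list.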
16. Collaboration
IDOL server automatically matches users with common explicit interest agents or similar implicit
profiles. This information can be used to create virtual expert knowledge groups.
Dynamic Thesaurus
18. Eduction
IDOL server's eduction feature allows you to extract information that is embedded in unstructured data
and store it in fields. The information that you can extract comprises:
addresses
personal names
company names
dates
telephone numbers
other numbers
To extract built-in data types, you need to set up an eduction task in IDOL server's
configuration file before you start storing content in your IDOL server (see Processing data before
indexing it on page 73).
For example:
[EductionTask]
Module=Educe
NextTask=MyIndexTask
If you set up a task that extracts built-in data types, the eduction feature creates a field for each built-in data type that it finds in content that is indexed into IDOL server. Content is stored in fields as follows:
field name       content
EDUCE_NAME       a person's name
EDUCE_ADDRESS    an address
EDUCE_DATE       a date
EDUCE_STREET     a street
EDUCE_CODE
EDUCE_PERCENT
EDUCE_INTEGER
EDUCE_FLOAT
EDUCE_PHONE      a phone number
EDUCE_COMPANY    a company name
Note: you can improve the automatic detection of names that are stored in the EDUCE_NAME field by
creating phrase set files that train IDOL server to recognize what good names are (names that it
should extract) and what bad names are (names that it should not extract):
1.
Create a file that lists examples of good names and a file that lists examples of bad names. List
each name on a separate line.
For example:
In the good name file:
Tom
Kate
Richard
Harry
Fred
Anna
In the bad name file:
The
Her
With
Which
That
His
This
Note that the more training you specify, the better the name recognition will work.
2.
3.
Add the PhraseSets parameter to your eduction task section, and use it to list both name types.
For example:
[EductionTask]
Module=Educe
NextTask=MyIndexTask
PhraseSets=GoodNames,BadNames
4.
Create a configuration section for each of the name types you have listed. Use the
PhraseFileNames parameter to specify the location of the phrase list .dat file that contains the
training and the ListType parameter to identify the training type of the file. Set the TagName
parameter to None to indicate that the content of the phrase lists is used for training and not for
exact matching.
For example:
[GoodNames]
PhraseFileNames=Good.dat
ListType=PosNames
TagName=None
[BadNames]
PhraseFileNames=Bad.dat
ListType=NegNames
TagName=None
5.
Save the configuration file. You can now start indexing data into IDOL server.
Every time data is indexed that contains names, IDOL server extracts these names and stores
them in EDUCE_NAME fields. Note that only names that start with capital letters are extracted.
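Preparing the two training files can be sketched in Python as follows. This is only an illustration under assumptions: the write_phrase_file helper is hypothetical, and the UTF-8 encoding is a guess that you should replace with whatever your IDOL server installation expects. The file names and the example names come from the steps above.

```python
import tempfile
from pathlib import Path


def write_phrase_file(path, names):
    """Write one name per line, skipping blank entries and duplicates."""
    unique = []
    for name in names:
        name = name.strip()
        if name and name not in unique:
            unique.append(name)
    # UTF-8 is an assumption; use the encoding your IDOL server expects.
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique


good = ["Tom", "Kate", "Richard", "Harry", "Fred", "Anna"]
bad = ["The", "Her", "With", "Which", "That", "His", "This"]

# Write into a temporary directory so that the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    write_phrase_file(Path(folder) / "Good.dat", good)
    write_phrase_file(Path(folder) / "Bad.dat", bad)
    good_text = (Path(folder) / "Good.dat").read_text(encoding="utf-8")
    bad_text = (Path(folder) / "Bad.dat").read_text(encoding="utf-8")

print(good_text)
```

In a real installation you would write the files to the location that the PhraseFileNames parameter points to instead of a temporary directory.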
Create a phrase list for each data type that you want to extract, and save it with a .dat extension.
In each file you need to list all words that belong to the file's data type. List each word on a
separate line.
For example, if you want to extract country names, you could create a Countries.dat file, and list all
words that you want IDOL server to recognize as country data:
united kingdom
u.k.
uk
united states
u.s.a.
usa
u s a
2.
Create an eduction task in IDOL server's configuration file and use the PhraseSets parameter to
list all your user-defined data types. If you have already set up an eduction task to extract built-in
data types, you can add the PhraseSets parameter to this section.
For example:
[EductionTask]
Module=Educe
NextTask=MyIndexTask
PhraseSets=CountryNames
3.
Create a configuration section for each of the data types you have listed and use the
PhraseFileNames and TagName parameters to specify the location of the phrase list .dat file that
defines which data should be extracted, and in which field this data should be stored.
For example:
[CountryNames]
PhraseFileNames=Countries.dat
TagName=Country
4.
Save the configuration file. You can now start indexing data into IDOL server.
Every time data is indexed that contains a word that matches one of the words listed in one of the
phrase list .dat files, IDOL server extracts this word and stores it in the appropriate field. Note that
the matching of phrase list words is not case sensitive.
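The effect of case-insensitive phrase list matching can be illustrated with a short Python sketch. This is not how IDOL server is implemented: the find_phrases function and its whole-word rule are assumptions made purely for illustration, using entries from the country list above.

```python
import re


def find_phrases(text, phrases):
    """Return the phrases that occur in text as whole words, ignoring case."""
    found = []
    for phrase in phrases:
        # The lookarounds stop "uk" from matching inside a word such as "duke".
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found


countries = ["united kingdom", "u.k.", "uk", "united states", "usa"]
matches = find_phrases("She moved from the United Kingdom to the USA.", countries)
print(matches)
```

The phrase list entries are lowercase, but they still match the capitalized text, which is the behavior that the note above describes.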
19. Expertise
IDOL server accepts a natural language or Boolean search string and returns users who own matching
agents or profiles. This allows instant identification of experts in any subject at hand, eliminating
time-consuming searches for specialists and unnecessary research into subjects for which expert
knowledge is already available.
20. Hyperlinking
When IDOL server returns results, it automatically generates hyperlinks in real time. These point to
contextually similar content and can be used to recommend related articles, documents, affinity
products or services, or media content that relates to textual content.
Because links are inserted automatically at the time a document is retrieved, they can refer to
documents and articles that were written long before, and archived material can link
to the latest news or material on that subject.
For example:
New Media
When a user views an article on a new media internet site, Autonomy can be used to dynamically
link to contextually similar content and recommend related articles in real time.
Corporate
Within a corporate environment, as an employee is reading or writing a document, contextually
similar documents from various sources can be suggested to the person through dynamic
hyperlink creation, enabling the user to immediately view documents, multimedia content and
related e-mails.
E-Commerce
Through contextual association, e-commerce vendors can increase customer retention
on their site by cross-selling and pushing other relevant content or products as customers
browse product catalogues or content.
Legal
Typically within the legal arena, Autonomy facilitates the ability to suggest contextually relevant
legal content pertinent to the legal issues being researched. Through automatic Hyperlinking,
Autonomy significantly reduces the time taken to navigate to the right information, identify
previous precedents and facilitate reuse of existing material.
CRM
As a customer service representative attends to a customer's enquiry, answers to frequently
asked questions and related e-mails are presented in the form of dynamic hyperlinks, enabling
the organization to raise its level of customer service, reduce the requirement for expertise in
the front line and ensure all issues are dealt with in shorter cycle times.
Implementing hyperlinking
If you are connecting IDOL server to an Autonomy interface application (for example, Retina),
hyperlinks are automatically generated, for example, when query results are returned or when a user
refines a query.
If you are connecting IDOL server to a third party interface application, you can implement automatic
hyperlink generation by executing a Suggest action when query results are returned. Please refer to
your Autonomy ACI API documentation for details on the functions you require for this.
21. Mailing
IDOL server matches users' agents and subscription channels against its document content at regular
intervals, and automatically sends users email to notify them of documents that match their agents and
the channels that they are subscribed to.
The format of email that the IDOL server sends is determined by templates (see Mailer templates on
page 180 for details of these templates).
Open IDOL server's configuration file in a text editor, and find the [UserCustom] section. This
section lists all the custom processes that IDOL server executes.
2.
Check if the [UserCustom] section lists a section for emailing. If it doesn't, you need to add one.
For example:
[UserCustom]
0=Email
3.
Create a configuration file section for the emailing process you have listed.
For example:
[Email]
4.
5.
Specify a TestUser. While you are configuring mailing, all mail is sent to the TestUser email
address until you are ready to start mailing properly.
6.
If you are using a proxy server, specify the ProxyHost, ProxyPort, your ProxyUsername and
your ProxyPassword.
7.
Use SMTPHost and SMTPPort to specify the details of your mail server.
8.
Use Cycles and Interval to determine how many times the mailing operation should run and the
time span that you want to elapse between the sending of email. Set StartTime to now, so you
can test the mailing operation immediately when you start IDOL server.
9.
Set Retries to the number of times that IDOL server attempts to connect to its Agent index before
it times out, and use TimeoutMS to specify how long each of these attempts can take.
10. Use From, FromHost and FromName to set the details that are displayed for the sender of email
that the mailing operation sends. Specify the DefaultSubject that is displayed as the mail's
subject line.
11. Use XSLTemplate to specify which template you want to use for the email. The
DefaultEmailFormat and DefaultEmailResultsType settings allow you to specify the email
format and whether results are sent individually or in sets.
12. Use DefaultAddSetToReadDocuments and DefaultExcludeReadDocuments to determine if a
list of the results that a user has already viewed should be created, and if results that are
contained in this list should be excluded from mail that the IDOL server sends to users (so each
result can only be sent to them once). Set DreTemplateReferenceStart and
DreTemplateReferenceEnd to ensure that IDOL server can extract the reference of documents
and determine if they have been viewed.
13. If you want to include channel results in the email that the mailing operation sends, you need to
configure the following settings:
ClassificationServerXSLTemplate
The template that you want to use to display channel results.
ClassificationServerNumResults
The maximum number of channel results to include in the email.
ClassificationServerThreshold
The quality of channel results to include in the email.
ClassificationServerParams
Parameters that should be included in the channels query that the mailing operation sends to
IDOL server's Category index.
ClassificationServerValues
The values of the specified ClassificationServerParams parameters.
ClassificationServerRetries
The number of times that the mailing operation attempts to connect to IDOL server's
Category index.
ClassificationServerTimeout
Specifies how long each of the ClassificationServerRetries can take, before the mailing
operation times out.
Note:
users will only receive channel results for categories that you have subscribed them to. You
can subscribe a user to one or more categories by sending a UserEdit action to IDOL server.
Use the CategorySubscribe action parameter to specify the categories whose results you
want to be mailed to the user. (You can unsubscribe a user by issuing a UserEdit action with
the CategoryUnsubscribe action parameter set to the categories whose results should no
longer be mailed to the user).
if you want to include channel results from another IDOL server installation in the emails that
the mailing operation sends, you need to use ClassificationServerHost and
ClassificationServerPort to specify the location of that IDOL server.
14. If you need to minimize the impact that the mailing operation has on your system resources, you
can set SleepBetweenRequests and MaxEmailsPerUser to values that are appropriate for your
environment.
15. Save IDOL server's configuration file and restart IDOL server. The mailing operation will start
immediately because you have set StartTime to now, so mail should be sent to the TestUser
address you have specified. Check that the mail process is working smoothly.
16. Make any adjustments to your settings that you need, then save the configuration file again and
restart IDOL server. Note that you can enable VerboseLogging if you experience problems with
the mailing operation.
Open IDOL server's configuration file in a text editor, and find the [UserCustom] section.
2.
Delete the email address you have specified for TestUser and set StartTime to the time when you
want the mailing operation to start.
3.
Open IDOL server's configuration file in a text editor, and find the [UserCustom] section.
If you have already added a custom section in order to automatically email results to users (see
Automatically emailing agent and channel results on page 176), the same settings enable the
sending of custom emails. If you are using this existing section, ensure that you specify the
template to use for custom emails with EmailActionXSLTemplate. Continue with step 7.
If you want IDOL server to send custom emails without enabling automatic agent and channel
results emailing, specify a new custom section. Continue with step 2.
Note: for details of configuration parameters, please refer to IDOL server's online help (see
Displaying help on configuration settings on page 389).
2.
Add a section to the configuration file with the name that you specified in the [UserCustom]
section.
3.
4.
If you are using a proxy server, specify the ProxyHost, ProxyPort, your ProxyUsername and
your ProxyPassword.
5.
Use SMTPHost and SMTPPort to specify the details of your mail server.
6.
7.
8.
Send a Custom action to IDOL server, with Function set to email and Library set with the name
of the custom section in the IDOL server configuration file that sets up the mailing operation.
Please refer to the online help for details on the Custom action (see Displaying online help on
page 61).
9.
The mailing operation uses the template you specified with EmailActionXSLTemplate to create
the email that it sends to the specified user.
Mailer templates
The IDOL server installation comprises the following XSL templates for the mailing operation:
email.xss
Main template that the user_email library uses for results emails.
email.xss specifies the overall structure of emails and includes
specific instructions for displaying individual agent results.
channels.xss
Template that the user_email library uses for formatting the channel
results the email.xss template includes.
ondemand.xss
Template that specifies how to display the emails that IDOL server
sends in response to a Custom action command.
Editing templates
The XSL templates use XPath and XSLT to identify fields to sort and display from the XML returned in
response to action commands sent to IDOL server.
The XML fields that the template uses to create emails are identified by the select attribute in the
template's XSL tags. To identify the XML fields that a template can use, use a web browser to send the
HTTP action command for which IDOL server uses the template to display results. You can then
determine the available field names from the autn tags in the XML that is returned. The action command
to send depends on the template you are editing:
template         action command
email.xss        AgentGetResults
channels.xss     CategoryQuery
ondemand.xss
For details of how to send these action commands, please refer to IDOL server's online help (see
Displaying online help on page 61).
For example, if you send an AgentGetResults action command to IDOL server, the following XML
could be returned:
<?xml version='1.0' encoding='UTF-8' ?>
<autnresponse xmlns:autn='http://schemas.autonomy.com/aci/'>
<action>AGENTGETRESULTS</action>
<response>SUCCESS</response>
<responsedata>
<autn:agent>
<autn:aid>2-A2</autn:aid>
<autn:training />
<autn:parent>2</autn:parent>
<autn:agentname>agent21</autn:agentname>
<autn:fields>
<retrained>true</retrained>
<private>false</private>
<fromdocument>true</fromdocument>
</autn:fields>
<autn:results>
<autn:numhits>1</autn:numhits>
<autn:hit>
<autn:reference>http://193.115.251.40/ArchiveData/encarta\38000\msdata39439.htm</autn:reference>
<autn:id>1254</autn:id>
<autn:section>0</autn:section>
<autn:weight>70.77</autn:weight>
<autn:links>TAPESTRI,REVIV,WEAV,REACH,EUROPEAN,OCCUR,PRACTIC,TRADIT,REMAIN,EUROP,EXAMPL,WESTERN,ALTHOUGH,EAR</autn:links>
<autn:database>News</autn:database>
<autn:title>Tapestry Tapestry weaving may have been practiced in Europe as ...</autn:title>
<autn:summary>Tapestry Tapestry weaving may have been practiced in Europe as
... . Tapestry Tapestry weaving may have been practiced in Europe as early as the
8th century, although no examples remain. Western European tapestry reached its
greatest development between the 14th and 18th centuries. During the 19th and
20th centuries, however, revivals of the tapestry tradition occurred. . </autn:summary>
<autn:content>
<DOCUMENT>
<DREREFERENCE>http://193.115.251.40/ArchiveData/encarta\38000\msdata39439.htm</DREREFERENCE>
<DRETITLE>Tapestry Tapestry weaving may have been practiced in Europe
as ... </DRETITLE>
<BLANK />
<IMAGE>archiv</IMAGE>
<PAPER />
<SUMMARY>Tapestry Tapestry weaving may have been practiced in
Europe as early as the 8th century, although no examples remain. Western
European tapestry reached its greatest development between the 14th and
18th centuries</SUMMARY>
<DOCTYPE>ARCHIVE</DOCTYPE>
<DREDATE>907347778</DREDATE>
<DREDBNAME>ARCHIVE</DREDBNAME>
<DRECONTENT>Tapestry Tapestry weaving may have been practiced in
Europe as early as the 8th century, although no examples remain. Western
European tapestry reached its greatest development between the 14th and
18th centuries. During the 19th and 20th centuries, however, revivals of the
tapestry tradition occurred. </DRECONTENT>
</DOCUMENT>
</autn:content>
</autn:hit>
</autn:results>
</autn:agent>
</responsedata>
</autnresponse>
In this example, you can see from the XML that IDOL server returns that the following fields are
available as values for the select attribute:
agent        private        section
aid          fromdocument   weight
training     results        links
parent       numhits        database
agentname    hit            title
fields       reference      summary
retrained    id             content
You can include these fields as values in the XSL tags. For example, to display the value of the
<autn:title> tag for each result document, include the following lines in your template:
<xsl:for-each select="responsedata/hit">
<xsl:value-of select="title"/>
</xsl:for-each>
Note that you should remove the autn: part of the tag from the XSL tag that you specify. For example,
if the XML that IDOL server returns contains a tag called autn:title you should specify the tag as title
(as in select="title", in the example here).
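If you prefer not to read the field names off the returned XML by eye, the lookup can be sketched in Python with the standard library. This is only an illustration: the field_names helper is hypothetical, and SAMPLE is a shortened copy of the AgentGetResults response shown earlier in this chapter.

```python
import xml.etree.ElementTree as ET

AUTN = "http://schemas.autonomy.com/aci/"

# Shortened copy of the AgentGetResults response shown in this chapter.
SAMPLE = """<autnresponse xmlns:autn='http://schemas.autonomy.com/aci/'>
<action>AGENTGETRESULTS</action>
<response>SUCCESS</response>
<responsedata>
<autn:agent>
<autn:aid>2-A2</autn:aid>
<autn:agentname>agent21</autn:agentname>
<autn:results>
<autn:numhits>1</autn:numhits>
<autn:hit>
<autn:id>1254</autn:id>
<autn:title>Tapestry</autn:title>
</autn:hit>
</autn:results>
</autn:agent>
</responsedata>
</autnresponse>"""


def field_names(xml_text):
    """List the tag names below <responsedata>, without the autn: part."""
    data = ET.fromstring(xml_text).find("responsedata")
    names = []
    for element in data.iter():
        if element is data:
            continue
        # ElementTree stores namespaced tags as "{namespace}name".
        local = element.tag.rsplit("}", 1)[-1]
        if local not in names:
            names.append(local)
    return names


root = ET.fromstring(SAMPLE)
titles = [hit.findtext("autn:title", namespaces={"autn": AUTN})
          for hit in root.iter("{%s}hit" % AUTN)]

print(field_names(SAMPLE))
print(titles)
```

The names that field_names returns are the ones you can try as values for the select attribute, with the autn: part already removed.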
22. Profiling
IDOL server automatically creates profiles for users, in real time. You can configure IDOL server to
create up to four different profile types. By default it creates an interest and an expertise profile for
each user.
An interest profile is created by tracking the content that a user views and extracting a conceptual
understanding of it. IDOL server then uses this understanding to keep the user's interest profile up to date. Interest profiles can be used to target information at users, recommend content to users, alert
users to the existence of content and to put users in touch with other users who have similar interests.
An expertise profile is created by tracking the content that a user creates and extracting a conceptual
understanding of it. IDOL server then uses this understanding to keep the user's expertise profile up to date. Expertise profiles can be used to trace users who are experts in particular subject areas.
Profiling a user
You can profile a user using the ProfileUser action command. For details on this action, please refer
to the IDOL server online help (see Displaying online help on page 61).
To create an expertise profile for a user:
Execute the ProfileUser action when a user creates text (for example, a document in IDOL server that
was authored by a user or text that a user enters in a helpdesk environment). IDOL server analyzes
the text the user has created and determines if it is similar to any concepts in the user's existing
expertise profile (using MatchThreshold).
If the created text is similar to an existing expertise profile concept, IDOL server updates
the existing concept with the text (if several concepts are similar, only the most similar one is updated).
If the text is not similar to an existing expertise profile concept, IDOL server creates a new concept in
the expertise profile.
Note: IDOL server only uses the five strongest concepts in a user's expertise profile for expertise
matching.
For example:
http://12.3.4.56:4000/action=ProfileUser&UserName=Administrator&Document=The
chemical structure of everyone's DNA is the same. The only difference between people (or
any animal) is the order of the base pairs&MatchThreshold=60&NamedArea=Expertise
This command instructs IDOL server to analyze the specified text. If it has a conceptual relevance of at
least 60% to any concept in the Administrator user's expertise profile, IDOL server uses it to update
the matching expertise profile concept (if several concepts are similar, only the most similar one is
updated). If the text does not have a conceptual relevance of at least 60% to an existing expertise
profile concept, IDOL server creates a new expertise profile concept from it.
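When you build a ProfileUser command in code rather than typing it into a browser, the free text in the Document parameter usually needs to be percent-encoded. The following Python sketch shows one way to do this. The action_url helper is hypothetical, and it assumes that your IDOL server decodes standard URL encoding; the host, port and parameter names are taken from the example above.

```python
from urllib.parse import quote, urlencode


def action_url(host, port, action, params):
    """Build an action command URL in the form used in this guide,
    percent-encoding every parameter value."""
    query = urlencode(params, quote_via=quote)
    return f"http://{host}:{port}/action={action}&{query}"


url = action_url(
    "12.3.4.56",
    4000,
    "ProfileUser",
    {
        "UserName": "Administrator",
        "Document": "DNA base pairs",
        "MatchThreshold": 60,
        "NamedArea": "Expertise",
    },
)
print(url)
```

Encoding the values also prevents an ampersand or equals sign inside the document text from being read as the start of a new parameter.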
Editing a profile
IDOL server stores interest and expertise profiles in the form of terms and weights. You can edit a
profile's terms and weights using the ProfileEdit action command. For details on this action, please
refer to the IDOL server online help (see Displaying online help on page 61).
For example:
http://12.3.4.56:4000/action=ProfileEdit&PID=1-P2.3&TermCOLOR=2322
This command changes the weight of the 1-P2.3 profile's COLOR term to 2322.
Deleting a profile
You can delete a profile from IDOL server's Profile index using the ProfileClear action command. For
details on this action, please refer to the IDOL server online help (see Displaying online help on
page 61).
For example:
http://12.3.4.56:4000/action=ProfileClear&UserName=Administrator&PID=450-P0.1
This command deletes the Administrator user's 450-P0.1 profile from IDOL server.
23. Retrieval
You can query IDOL server with action commands using a web browser, an Autonomy interface
application (for example, Retina) or a third party portal that uses Autonomy portlets.
Action commands
IDOL server is queried via action commands. The following action commands are available to all clients that have permission to query IDOL server (set by QueryClients in the IDOL server configuration
file's [Server] section):
GetContent
GetQueryTagValues
GetTagNames
GetTagValues
Highlight
Query
Suggest
SuggestOnText
Summarize
TermGetBest
TermGetInfo
Allows you to return the weight and other available information for
specified terms.
In addition, the following actions are available to administrative clients of IDOL server (set by
AdminClients in the IDOL server configuration file's [Server] section):
DetectLanguage
GetStatus
IndexerGetStatus
List
Allows you to list all documents that are stored in IDOL server or
any of its databases.
TermGetAll
Allows you to list all terms that are stored in IDOL server.
Note:
for further details on action commands, please refer to the online help (see Displaying online
help on page 61).
for details on action command syntax, please see Action command syntax on page 62.
Conceptual matching
You can use Query, Suggest and SuggestOnText action commands to perform conceptual matching.
IDOL server uses advanced pattern-matching technology to conceptually match the data with which it
is queried (via action commands) against the content it holds.
Content matching
You can submit natural language text or a piece of content to IDOL server, for which it returns
references to conceptually related documents ranked by relevance, or contextual distance.
Natural language queries make it possible for users to find the results they are looking for without
having to be familiar with search algorithms or syntax. Online shoppers, for example, can find specific
items without knowing the exact product or brand name.
Active matching
If you are using the Autonomy Desktop Suite (or one of the Active products that the Desktop Suite
comprises), IDOL server conceptually matches natural language text content in whichever application
a user is currently using, and returns a list of documents ordered by contextual relevance to the active
text.
Community matching
You can create agents from natural language and then match them conceptually. Profiles or natural
language text can also be submitted to IDOL server, for which it returns agents ranked by conceptual
similarity. This determines which users have similar interests (thus promoting collaboration) and
identifies experts in a field.
Category matching
You can submit a piece of content to IDOL server, for which it returns categories ranked by conceptual
similarity. This determines the categories for which the piece of content is most appropriate, so that the
piece of content can subsequently be tagged, routed or filed accordingly.
Clustering
You can use IDOL server to organize large volumes of content or large numbers of profiles into self-consistent clusters. Clustering is an automatic agglomerative technique which allows IDOL server to
partition a corpus by grouping together information that contains similar concepts.
Example queries
Agent or category query:
http://localhost:5552/action=Query&Text=MAMMALIAN~[254] MICROBIOLOGI~[112]
GENOM~[103] GENET~[100] MOLECULAR~[75] BIOTECHNOLOGI~[71] BIOLOGI~[69]
GENE~[59] BIOLOG~[43] CELL~[37]
In this example, an agent (or category) query is sent to IDOL server. The query contains the terms that
the agent's training comprises and the weight of each of the terms. IDOL server can return agents,
profiles, categories or documents that conceptually match the terms of the query.
Profile query:
http://localhost:5552/action=Query&Text=CHAMPIONLEAGU~[551] EVERTON~[407]
BAYERN~[402] UEFA~[391] PREMIERSHIP~[383] FIFA~[257] STRIKER~[226]
WORLDCUP~[215] EURO~[124] SOCCER~[114] CUP~[66]
In this example, a profile query is sent to IDOL server. The query contains the terms that the profile's
training comprises and the weight of each of the terms. IDOL server can return agents, profiles,
categories or documents that conceptually match the terms of the query.
Text query:
http://localhost:5552/action=Query&Text=Gene analysis discovered methods to determine
the exact sequence of nucleotides that compose a specific gene
In this example, a text query is sent to IDOL server. IDOL server can return agents, profiles, categories
or documents that conceptually match the query text.
Suggest query:
http://localhost:5552/action=Suggest&ID=10
In this example, a Suggest query is sent to IDOL server. IDOL server can return agents, profiles,
categories or documents that conceptually match the specified document (that is, the document with
the ID 10).
SuggestOnText query:
http://localhost:5552/action=SuggestOnText&Text=Gene analysis discovered methods to
determine the exact sequence of nucleotides that compose a specific gene
In this example, a SuggestOnText query is sent to IDOL server. IDOL server can return agents,
profiles, categories or documents that conceptually match the terms with the highest weighting in the
query text.
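The TERM~[weight] notation used in the agent and profile queries above can be produced and read back with a few lines of Python. This is a sketch only: both helper functions are hypothetical, and the notation is inferred from the example queries rather than from a formal syntax definition.

```python
import re


def weighted_text(terms):
    """Join (term, weight) pairs into the TERM~[weight] notation."""
    return " ".join(f"{term}~[{weight}]" for term, weight in terms)


def parse_weighted_text(text):
    """Split TERM~[weight] notation back into (term, weight) pairs."""
    pairs = re.findall(r"(\S+?)~\[(\d+)\]", text)
    return [(term, int(weight)) for term, weight in pairs]


agent_terms = [("MAMMALIAN", 254), ("MICROBIOLOGI", 112), ("GENOM", 103)]
text = weighted_text(agent_terms)
print(text)
print(parse_weighted_text(text))
```

The resulting string is what the example queries pass in the Text parameter.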
Note that if you enable advanced keyword searches, you can still execute a conceptual phrase search
that uses stemming by using DNEAR1.
For example:
http://<host>:<port>/action=Query&Text=Tony DNEAR1 Browne
This query returns documents that contain "Tony Brown", "Toni Browning" and so on.
2.
In the [Server] section, set the AdvancedSearch parameter to true. (If the [Server] section
doesn't contain the AdvancedSearch parameter, you should add it). Note that if you are enabling
AdvancedSearch, it is recommended that you set ProperNames to 0 or 7 in the appropriate
language type sections of the configuration file.
3.
4.
Index documents into IDOL server. Once you have finished indexing, you can execute advanced
keyword searches using the Query action command.
AND
Binary operator. Ensures that both terms are matched in every document that is
returned.
For example:
action=Query&Text=cat+AND+dog
This query only returns documents that contain both cat and dog.
NOT
Unary operator. Ensures that the term following NOT is excluded from any of the
returned documents.
For example:
action=Query&Text=cat+NOT+dog
This query only returns documents that contain "cat" but not "dog".
Note: if you want to use NOT to exclude multiple terms, you need to use brackets,
otherwise NOT only applies to the term that immediately follows it. If you want to use
NOT to exclude a phrase, you need to put the phrase in quotation marks and in
brackets.
For example:
Doc 1: I went to the city for the New Year
Doc 2: I went to New York City for the New Year
This query would match neither of the above documents:
action=Query&Text=city NOT (New York)
This query matches the first document but not the second:
action=Query&Text=city NOT ("New York")
OR
Binary operator. One or both terms must appear for the document to be returned. This
is the default behavior if no explicit operator is given between two terms.
For example:
action=Query&Text=cat+OR+dog
This query only returns documents that contain either cat, dog or both terms.
EOR or XOR
Binary operator. Logical exclusive OR. Only one of the terms is permitted to appear for
the document to be returned. This is a rarely used operator.
For example:
action=Query&Text=cat+XOR+dog
This query only returns documents that contain either the term cat or the term dog.
Documents that contain both "cat" and "dog" are not returned.
()
Bracketed expressions. These are evaluated left to right and can be nested. They
dictate the precedence and behavior of combined operator statements.
For example:
action=Query&Text=(fish EOR pie) AND (chips EOR mash)
This query only returns documents that contain one of the following:
fish and chips
fish and mash
pie and chips
pie and mash
Highest precedence:
NOT
NEAR; DNEAR
AND; BEFORE; AFTER
Lowest precedence:
Operators that have the same level of precedence have neither left nor right associativity. You can use
brackets to bind terms together as appropriate (note that Proximity operators must have terms on
either side and cannot be adjacent to brackets).
quotation marks
Phrases are stemmed and then matched by IDOL server. As with all query text, any
stopwords that the phrases contain are removed before matching and any punctuation that a
phrase contains is ignored.
Quotation marks
You can use the Query action command's Text parameter to match a phrase by putting quotation
marks around it. Note that the phrase is stemmed and that any stopwords that the phrase contains are
removed before it is matched. IDOL server ignores any punctuation that the phrase contains.
If you want to specify multiple phrases, IDOL server matches any one of them. You must separate the
individual phrases with plus symbols or spaces and put quotation marks around each phrase.
Examples:
http://<host>:<port>/action=Query&Text="world market"
This query returns documents that contain, for example, world market, all over the world. This
market, in world markets and so on.
http://<host>:<port>/action=Query&Text="Bank of England"
This query returns documents that contain, for example, Bank of England, banking in
England, the river bank. On England's shores and so on.
http://<host>:<port>/action=Query&Text="bird watching" "birds of prey"
This query returns documents that contain, for example, bird watching, the bird was watching
the worm, birds of prey, bird of prey, the bird is the prey and so on.
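When a phrase query is sent from a script, the quotation marks and spaces need to survive the trip through the URL. The following sketch builds such a URL in Python. The function name phrase_query_url, the host localhost and the port 9000 are placeholders chosen for illustration, and it is an assumption that your IDOL server accepts standard percent-encoding in the Text parameter.

```python
from urllib.parse import quote

def phrase_query_url(host, port, phrases):
    """Build a Query URL in which each phrase is wrapped in quotation marks."""
    # Separate the quoted phrases with spaces, as described above.
    text = " ".join(f'"{p}"' for p in phrases)
    # Percent-encode the quotation marks and spaces for use in a URL.
    return f"http://{host}:{port}/action=Query&Text={quote(text)}"

url = phrase_query_url("localhost", 9000, ["bird watching", "birds of prey"])
print(url)
```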
Field restrictions
You can use simple field restrictions within a Query action's Text parameter in order to return results
that contain specific values in specific fields or, if you combine query text with a field restriction,
increase the relevance of results that contain specific values in specific fields. Note that these fields
must have been stored as Index fields in IDOL server (see Setting up field indexing on page 67).
You can use wildcards, but you cannot match more than one value or a value that contains spaces or
punctuation. You cannot use field restrictions on terms in brackets.
Example queries
http://<host>:<port>/action=Query&Text=cat:DRETITLE
This query only returns documents that contain the value cat in their DRETITLE field.
http://<host>:<port>/action=Query&Text=cat dog:DRETITLE
This query returns documents that contain the term cat in any field and the term dog in their
DRETITLE field. Documents that contain either cat (in any field) or dog in their DRETITLE field
are also returned, but with a lower relevance.
http://<host>:<port>/action=Query&Text=cat:CREATURE:FAUNA dog:ANIMAL
This query returns documents that contain the value cat in their CREATURE or FAUNA
field and the value dog in their ANIMAL field. Documents that contain either cat in their
CREATURE or FAUNA field or dog in their ANIMAL field are also returned, but with a lower
relevance.
http://<host>:<port>/action=Query&Text=engin*:Title
This query only returns documents whose title field contains the specified string (for example,
"engineer", "engineering" and so on). Note that wildcard matching is carried out after stemming
has taken place.
Note:
When identifying fields you should use the format /FieldName to match root-level fields,
FieldName to match all fields except root-level or /Path/FieldName to match fields that the
specified path points to. To identify XML attributes, use the format <tag_name>/
_ATTR_<attribute_name>, for example, FARM/_ATTR_ANIMAL. You can also use Wildcards
when identifying fields, for example, /Fi*d1, /Field* and so on.
All string matching is case insensitive, unless the parameter CaseSensitive=true is used.
MATCH
a number
a date
WILD
Examples:
FieldText=MATCH{Archive,Web,docs}:DB:DATABASE
A document's DB or DATABASE field must have the value Archive, Web or docs for this
document to be returned as a result.
FieldText=MATCH{Premier league}:DB
A document's DB field must have the value Premier League for this document to be
returned as a result.
FieldText=MATCH{0-226-10389-7}:ISBN
A document's ISBN field must have the value 0-226-10389-7 for this document to be
returned as a result.
EQUAL
The EQUAL field specifier (case sensitive) allows you to find documents in which a specified field
contains a number that matches one of the numbers specified by you.
Format:
Examples:
FieldText=EQUAL{1234567890123}:ACCOUNT:KONTO
A document's ACCOUNT or KONTO field must contain the number 1234567890123 for this
document to be returned.
FieldText=EQUAL{3.9,4.9,7}:ID
A document's ID field must contain the number 3.9, 3.90, 4.9, 4.90, 7 or 7.0 for this
document to be returned.
GREATER
The GREATER field specifier (case sensitive) allows you to find documents in which a specified field
contains a number that is greater than a number specified by you.
Format:
Examples:
FieldText=GREATER{66}:ID
A document's ID field must contain a number greater than 66 for this document to be
returned.
FieldText=GREATER{5.59}:PRICE:PREIS
A document's PRICE or PREIS field must contain a number greater than 5.59 for this
document to be returned.
LESS
The LESS field specifier (case sensitive) allows you to find documents in which a specified field
contains a number that is smaller than a number specified by you.
Format:
Examples:
FieldText=LESS{66}:ID
A document's ID field must contain a smaller number than 66 for this document to be
returned.
FieldText=LESS{5.59}:PRICE:PREIS
A document's PRICE or PREIS field must contain a smaller number than 5.59 for this
document to be returned.
NOTEQUAL
The NOTEQUAL field specifier (case sensitive) allows you to find documents in which a specified field
contains a number that does not match a number specified by you.
Format:
Examples:
FieldText=NOTEQUAL{1234567890123}:ACCOUNT:KONTO
A document's ACCOUNT or KONTO field must not contain the number 1234567890123 for
this document to be returned.
FieldText=NOTEQUAL{3.9}:ID
A document's ID field must not contain the number 3.9 for this document to be returned.
NRANGE
The NRANGE field specifier (case sensitive) allows you to find documents in which a specified field
contains a number that falls within the inclusive range of two numbers specified by you.
Format:
Examples:
FieldText=NRANGE{1,99}:CODE
A document's CODE field must contain a number between 1 and 99 (inclusive) for this
document to be returned.
FieldText=NRANGE{1234567890123,2345678901234}:ACCOUNT:KONTO
A document's ACCOUNT or KONTO field must contain a number between
1234567890123 and 2345678901234 (inclusive) for this document to be returned.
FieldText=NRANGE{36.5,42.3}:CODE
A document's CODE field must contain a number between 36.5 and 42.3 (inclusive) for this
document to be returned.
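The numeric field specifiers compare numbers rather than strings, which is why EQUAL{3.9,4.9,7} also matches 3.90 and 7.0. The sketch below mimics that behavior in Python. The function names are hypothetical, and the use of Decimal is an assumption made for the illustration; it says nothing about how IDOL server stores or compares numeric fields internally.

```python
from decimal import Decimal

def equal(value, *numbers):
    """EQUAL: the field value matches any one of the given numbers."""
    return any(Decimal(value) == Decimal(n) for n in numbers)

def greater(value, number):
    """GREATER: the field value is strictly greater than the given number."""
    return Decimal(value) > Decimal(number)

def less(value, number):
    """LESS: the field value is strictly smaller than the given number."""
    return Decimal(value) < Decimal(number)

def nrange(value, low, high):
    """NRANGE: the field value lies in the inclusive range low..high."""
    return Decimal(low) <= Decimal(value) <= Decimal(high)

print(equal("3.90", "3.9", "4.9", "7"))   # numeric, not textual, comparison
print(nrange("99", "1", "99"))            # the range boundaries are included
```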
GTNOW
The GTNOW field specifier (case sensitive) allows you to find documents in which a specified field
contains a date that is greater than the current number of seconds since 1st January 1970 (or since
170 AD, if you have set ExtendedDateRange to true in IDOL server's configuration file).
Format:
FieldText=GTNOW{}:<your fields>
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a date that is greater than the current number of seconds since 1st
January 1970 (or since 170 AD, if you have set ExtendedDateRange to true in IDOL
server's configuration file).
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
Examples:
FieldText=GTNOW{}:TIME
A document's TIME field must contain a date that is greater than the current number of
seconds since 1970 (that is all documents that were indexed with dates after the current
time) for this document to be returned.
FieldText=GTNOW{}:TIME:DATE
A document's TIME or DATE field must contain a date that is greater than the current
number of seconds since 1970 (that is all documents that were indexed with dates after the
current time) for this document to be returned.
LTNOW
The LTNOW field specifier (case sensitive) allows you to find documents in which a specified field
contains a date that is smaller than the current number of seconds since 1st January 1970 (or since
170 AD, if you have set ExtendedDateRange to true in IDOL server's configuration file).
Format:
FieldText=LTNOW{}:<your fields>
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a date that is smaller than the current number of seconds since 1st
January 1970 (or since 170 AD, if you have set ExtendedDateRange to true in IDOL
server's configuration file).
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
Examples:
FieldText=LTNOW{}:*/TIME
A document's TIME field must contain a date that is smaller than the current number of
seconds since 1970 (that is all documents that were indexed with dates before the current
time) for this document to be returned.
FieldText=LTNOW{}:TIME:DATE
A document's TIME or DATE field must contain a date that is smaller than the current
number of seconds since 1970 (that is all documents that were indexed with dates before
the current time) for this document to be returned.
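GTNOW and LTNOW both compare a stored date with the current time, expressed as seconds since 1st January 1970. A minimal Python sketch of that comparison follows. The function names and the optional now argument are hypothetical, and the sketch assumes the default epoch (it does not model ExtendedDateRange).

```python
import time

def gtnow(field_epoch_seconds, now=None):
    """GTNOW: the field's date lies after the current time."""
    now = time.time() if now is None else now
    return field_epoch_seconds > now

def ltnow(field_epoch_seconds, now=None):
    """LTNOW: the field's date lies before the current time."""
    now = time.time() if now is None else now
    return field_epoch_seconds < now

one_day = 24 * 60 * 60
print(gtnow(time.time() + one_day))   # a date one day in the future
print(ltnow(time.time() - one_day))   # a date one day in the past
```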
RANGE
The RANGE field specifier (case sensitive) allows you to find documents in which a specified field
contains a date that falls within the inclusive range of two dates specified by you.
Format:
A date.
For example, 1/3/05, 23/12/99 or 10/07/40.
If the year is a number less than 40, it is read as a year in the
2000s. If the year is a number between 40 and 99, it is read as a
year in the 1900s. For example, 1/02/1 is read as February 1st
2001, while 01/3/40 is read as March 1st 1940.
DD/MM/YYYY
A date.
For example, 1/3/2005, 23/12/1999 or 10/07/1940.
<N>
<N>s
<N>e
No restriction.
If you enter a full stop for the first point in time you are specifying, the beginning of the period is unrestricted (so the period
ranges up to the specified date, including any date before the
specified date).
If you enter a full stop for the second point in time you are
specifying, the end of the period is unrestricted (so the period
ranges from the specified date, including any date after the
specified date).
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a date that falls within the inclusive range of <your dates>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
Examples:
FieldText=RANGE{01/01/90,1/1/01}:DATE
A document's DATE field must contain a date between 01/01/1990 and 1/1/2001 for this
document to be returned.
FieldText=RANGE{01/01/02,01/01/2003}:DATE:DATUM
A document's DATE or DATUM field must contain a date between 01/01/2002 and
01/01/2003 for this document to be returned.
FieldText=RANGE{-14,-7}:DATE
A document's DATE field must contain a date 14 to 7 days before the current date for this
document to be returned.
FieldText=RANGE{0,1}:DATE
A document's DATE field must contain today's or tomorrow's date (which is possible, for
example, if the document originates from a different time zone or if the field contains an
expiry date) for this document to be returned.
FieldText=RANGE{01/01/99,.}:DATE:FECHA
A document's DATE or FECHA field can contain any date after 01/01/1999 for this
document to be returned.
FieldText=RANGE{.,10/10/04}:DATE
A document's DATE field can contain any date before 10/10/2004 for this document to be
returned.
FieldText=RANGE{-172800s,-1}:DATE
A document's DATE field must contain a time between 48 and 24 hours ago.
FieldText=RANGE{198765e,.}:DATE
A document's DATE field must contain a date between 198765 seconds after the epoch and
the current time.
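Two pieces of arithmetic in the RANGE description are easy to get wrong: the expansion of two-digit years and the conversion of second offsets. The following Python sketch works through both. The function name expand_year is hypothetical, and the sketch only restates the rule given above (years below 40 fall in the 2000s, years from 40 to 99 in the 1900s).

```python
def expand_year(year):
    """Expand a two-digit year using the rule described for RANGE dates."""
    if 0 <= year < 40:
        return 2000 + year      # 0-39 are read as 2000-2039
    if 40 <= year <= 99:
        return 1900 + year      # 40-99 are read as 1940-1999
    return year                 # four-digit years are left unchanged

SECONDS_PER_HOUR = 60 * 60

print(expand_year(1), expand_year(40), expand_year(99))
# The offset -172800s used in the example above is 48 hours before the current time.
print(172800 // SECONDS_PER_HOUR)
```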
Examples:
FieldText=WILD{*.html,*.htm}:URL
A document's URL field value must end with .html or .htm for this document to be returned
as a result.
FieldText=WILD{passi*incarnata}:Climbers:Plants
A document's Climbers or Plants field value must contain a phrase that begins with passi
and ends with incarnata (for example, passionflower incarnata or passiflora incarnata)
for this document to be returned as a result.
FieldText=WILD{*www.autonomy.com*.txt}:PATH
A document's PATH field value must contain a path that contains www.autonomy.com and
ends with .txt (for example, http://www.autonomy.com/files/doc.txt) for this document to
be returned as a result.
FieldText=WILD{wom?n }:Clothes
A document's Clothes field value must contain a word that matches the specified wildcard
string (for example, woman or women) for this document to be returned as a result.
Note:
You can also use the WILD field specifier to find documents in which one of the following meta fields
(see Meta fields on page 301) contains a string that matches a wildcarded string specified by you:
autn_database
autn_langtype
Examples:
FieldText=WILD{Ira*}:autn_database
A document's autn_database field must contain a value that starts with Ira (for example,
Irak or Iran) for this document to be returned as a result.
FieldText=WILD{eng*}:autn_langtype
A document's autn_langtype meta field must contain a value that starts with eng (for
example, englishASCII or English_UTF8) for this document to be returned as a result.
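The WILD examples can be tried out locally with Python's fnmatch module, which also uses * for any run of characters and ? for a single character. This is a stand-in for illustration only: fnmatch additionally treats square brackets as special, and nothing here claims that IDOL server matches wildcards the same way in every case. The function name wild is hypothetical, and case-insensitive matching is assumed.

```python
from fnmatch import fnmatchcase

def wild(value, *patterns):
    """Return True if the whole field value matches any of the wildcard patterns."""
    lowered = value.lower()
    return any(fnmatchcase(lowered, p.lower()) for p in patterns)

print(wild("index.html", "*.html", "*.htm"))
print(wild("women", "wom?n"))
print(wild("http://www.autonomy.com/files/doc.txt", "*www.autonomy.com*.txt"))
print(wild("englishASCII", "eng*"))
```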
ARANGE
BITAND, BITANDHEX or
BITANDOFFHEX
BOOLEANFIELD
EMPTY
EXISTS
FUZZY
STRING, STRINGALL or
SUBSTRING
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains a term that falls within the inclusive alphabetical range of <your terms>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
Examples:
FieldText=ARANGE{aardvark,alligator}:ANIMAL
A document's ANIMAL field must contain a value that alphabetically falls between aardvark
and alligator. If a document's ANIMAL field contains the value aardvark, ant, anteater,
antelope or alligator, the document is returned. If a document's ANIMAL field contains the
value armadillo, it is not returned.
FieldText=ARANGE{bear,buffalo}:ANIMAL:TIER
A document's ANIMAL or TIER field must contain a value that alphabetically falls between
bear and buffalo. If a document's ANIMAL or TIER field contains the value bear, bee,
Biene, bird or buffalo, the document is returned. If a document's ANIMAL field contains
the value Büffel or chipmunk, it is not returned.
BITAND
The BITAND field specifier (case sensitive) allows you to find documents with a field whose integer value
does not result in 0 when a bitwise AND operation is carried out between this value and an integer
value specified by you.
Format:
<your bit fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains an integer that results in a non-zero value when a bitwise AND operation
is carried out between it and <your integer>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
For example:
FieldText=BITAND{128}:BitField
The binary representation of the integer value 128 is compared with the binary
representations of the integer values that BitField fields in IDOL server contain. Only
documents whose BitField values result in a non-zero value when they are compared to
the binary representation of 128 are returned.
If a document's BitField, for example, contains the integer value 129, it is returned, while a
document whose BitField contains the value 127 is not returned.
Field value comparison:
Integer    Binary
128        1000 0000
129        1000 0001
AND        1000 0000

Integer    Binary
128        1000 0000
127        0111 1111
AND        0000 0000
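The comparison above is an ordinary bitwise AND, so it can be reproduced in a few lines of Python. The function name bitand is hypothetical; the sketch only restates the rule that a document is returned when the result is non-zero.

```python
def bitand(field_value, query_value):
    """BITAND: return True when the bitwise AND of the two integers is non-zero."""
    return (field_value & query_value) != 0

# Show the AND results as 8-bit binary strings.
print(format(129 & 128, "08b"), bitand(129, 128))
print(format(127 & 128, "08b"), bitand(127, 128))
```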
BITANDHEX
The BITANDHEX field specifier (case sensitive) allows you to find documents with a field whose
hexadecimal string value does not result in 0 when a bitwise AND operation is carried out between this
value and a hexadecimal string specified by you.
Format:
For example:
FieldText=BITANDHEX{7F}:BitField
The binary representation of the hexadecimal value 7F is compared with the binary
representations of the hexadecimal values that BitField fields in IDOL server contain. Only
documents whose BitField values result in a non-zero value when they are compared to
the binary representation of 7F are returned.
If a document's BitField, for example, contains the hexadecimal value C0, it is returned,
while a document whose BitField contains the hexadecimal value 80 is not returned.
Field value comparison:
Hex    Binary
7F     0111 1111
C0     1100 0000
AND    0100 0000

Hex    Binary
7F     0111 1111
80     1000 0000
AND    0000 0000
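The same check works for hexadecimal strings once they are converted to integers. The following Python sketch reproduces the 7F example; the function name bitandhex is hypothetical.

```python
def bitandhex(field_hex, query_hex):
    """BITANDHEX: return True when the bitwise AND of two hex strings is non-zero."""
    return (int(field_hex, 16) & int(query_hex, 16)) != 0

# Show the AND results as two-digit hexadecimal values.
print(format(int("C0", 16) & int("7F", 16), "02X"), bitandhex("C0", "7F"))
print(format(int("80", 16) & int("7F", 16), "02X"), bitandhex("80", "7F"))
```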
BITANDOFFHEX
The BITANDOFFHEX field specifier (case sensitive) allows you to find documents with a field whose
hexadecimal string value does not result in 0 when a bitwise AND operation is carried out between this
value and a hexadecimal string specified by you.
Format:
For example:
FieldText=BITANDOFFHEX{01,0a001}:BitOffField
The binary representation of the hexadecimal value 01,0a001 is compared with the binary
representations of the hexadecimal values that BitOffField fields in IDOL server contain.
Only documents whose BitOffField values result in a non-zero value when they are
compared to the binary representation of 01,0a001 (after they have been left shifted by one
16 bit chunk) are returned.
If a document's BitOffField, for example, contains the value 1,bc01, it is returned, while a
document whose BitOffField contains the value 0,5ffeffff is not returned.
Field value comparison:
nn,hexstring
Hex
Binary
01,0a001
A0010000
1,bc01
BC010000
nn,hexstring
Hex
Binary
01,0a001
A0010000
0,5ffeffff
5FFEFFFF
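The offset form can be worked through in Python as well. This sketch assumes, based only on the example values above, that the number before the comma counts 16-bit chunks by which the hexadecimal string is shifted to the left; the function names are hypothetical and the interpretation should be checked against your IDOL server version.

```python
def parse_offset_hex(spec):
    """Turn 'nn,hexstring' into an integer, shifting left by nn 16-bit chunks."""
    offset, hex_string = spec.split(",", 1)
    return int(hex_string, 16) << (16 * int(offset))

def bitandoffhex(field_spec, query_spec):
    """Return True when the bitwise AND of the two shifted values is non-zero."""
    return (parse_offset_hex(field_spec) & parse_offset_hex(query_spec)) != 0

print(format(parse_offset_hex("01,0a001"), "08X"))
print(bitandoffhex("1,bc01", "01,0a001"))
print(bitandoffhex("0,5ffeffff", "01,0a001"))
```

Under this reading, 1,bc01 shares set bits with 01,0a001 and is returned, while 0,5ffeffff shares none and is not.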
For example:
BOOLEANFIELD{The cat sat on the mat}:MyFirstBooleanField:MySecondBooleanField
Any document that has a MyFirstBooleanField or MySecondBooleanField field which
contains a Boolean or Proximity expression that matches the specified text is returned. For
example, the Boolean/Proximity expressions cat AND mat, cat OR mat, cat BEFORE mat
and cat DNEAR1 sat could match The cat sat on the mat, therefore documents that
contain any of these Boolean/Proximity expressions would be returned.
Documents whose MyFirstBooleanField or MySecondBooleanField fields contain, for
example, cat AND mat AND dog or mat BEFORE cat would not be returned.
FieldText=EMPTY{}:<your fields>
<your fields>
Enter one or more fields. A document is only returned if it doesn't contain any of these fields
or if these fields are empty.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
Examples:
FieldText=EMPTY{}:ID
A document must either contain no ID field or hold no value in its ID field to be returned.
FieldText=EMPTY{}:ID:Name
A document must either contain no ID or Name field, or hold no value in its ID or Name field, to
be returned.
FieldText=EXISTS{}:<your fields>
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields (even
if the field is empty).
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
Examples:
FieldText=EXISTS{}:ID
A document must contain an ID field to be returned.
FieldText=EXISTS{}:ID:NAME
A document must contain an ID or NAME field (or both) to be returned.
For example:
FieldText=FUZZY{Bisiness News,Arkive}:DRETITLE
A document's DRETITLE field value must be similar to the term Bisiness News or Arkive
for this document to be returned. (A document whose DRETITLE field contains Business
News would be returned, while a document whose DRETITLE field contains Document
Arkive would not).
Examples:
FieldText=STRING{cat,dog}:ANIMAL
A document's ANIMAL field value must contain the substring cat or dog for this document
to be returned. If a document's ANIMAL field, for example, has the value scattering, this
document will be returned.
FieldText=STRING{old cat}:ANIMAL:TOPIC
A document's ANIMAL or TOPIC field value must contain the substring old cat for this
document to be returned. If a document's ANIMAL field, for example, has the value old cat,
old caterpillar or bold cats, this document will be returned.
FieldText=STRING{autonomy.com}:COMPANY
A document's COMPANY field value must contain the substring autonomy.com for this
document to be returned. If a document's COMPANY field, for example, has the value
autonomy.com or http://www.autonomy.com/content/home, this document will be
returned.
FieldText=STRING{a\,b}:MISC
A document's MISC field value must contain the substring a,b for this document to be
returned. If a document's MISC field, for example, has the value a,b or a,b,c, this document
will be returned.
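If you generate FieldText expressions from a script, the comma escaping shown in the last example is easy to forget. The following Python sketch assembles a specifier expression and escapes commas inside values with a backslash. The helper name field_text is hypothetical; it only produces the text of the expression and does not send anything to IDOL server.

```python
def field_text(specifier, values, fields):
    """Assemble SPECIFIER{value1,value2}:Field1:Field2 with escaped commas."""
    # A comma inside a value would otherwise be read as a value separator.
    escaped = [str(v).replace(",", "\\,") for v in values]
    return specifier + "{" + ",".join(escaped) + "}:" + ":".join(fields)

print(field_text("STRING", ["a,b"], ["MISC"]))
print(field_text("STRING", ["old cat"], ["ANIMAL", "TOPIC"]))
```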
Page 218
Retrieval
Note:
You can also use the STRING field specifier to find documents in which one of the following meta fields
(see Meta fields on page 301) contains a substring specified by you:
autn_database
autn_langtype
Examples:
FieldText=STRING{Archiv}:autn_database
A document's autn_database field must contain the substring Archiv for this document to
be returned. If a document's autn_database meta field, for example, has the value Archive
or Archives, this document will be returned.
FieldText=STRING{english}:autn_langtype
A document's autn_langtype meta field must contain the substring english for this
document to be returned. If a document's autn_langtype meta field, for example, has the
value englishASCII or English_UTF8, this document will be returned.
STRINGALL
The STRINGALL field specifier (case sensitive) allows you to specify one or more strings, all of which
must occur as substrings in a specified field.
Format:
Examples:
FieldText=STRINGALL{cat,dog}:ANIMAL
A document's ANIMAL field value must contain the substrings cat and dog for this
document to be returned. If a document's ANIMAL field, for example, has the value
grooming cats and dogs or doggedly scattering seeds, this document will be returned.
FieldText=STRINGALL{old cat,young dog}:ANIMAL:TOPIC
A document's ANIMAL or TOPIC field value must contain the substrings old cat and young
dog for this document to be returned. If a document's ANIMAL field, for example, has the
value old cat chases young dog, or young doggedly chasing bold cats, this document
will be returned.
FieldText=STRINGALL{a\,b,e\,f}:MISC
A document's MISC field value must contain the substrings a,b and e,f for this document to
be returned. If a document's MISC field, for example, has the value a,b,c,d,e,f or 0=e,fx
1=da,ba, this document will be returned.
SUBSTRING
The SUBSTRING field specifier (case sensitive) allows you to return documents whose field value is a
substring of a specified string (or equal to a specified string).
Format:
Examples:
FieldText=SUBSTRING{Telecommunications,Technology}:SECTOR
A document's SECTOR field must contain a string that is a substring of
Telecommunications or Technology. If a document's SECTOR field, for example, has the
value Telecom or Technology, the document will be returned. If a document's SECTOR
field has the value Latest Technology, the document will not be returned.
FieldText=SUBSTRING{scattering,doggedly}:ANIMAL
A document's ANIMAL field value must be a substring of scattering or doggedly for
this document to be returned. If a document's ANIMAL field, for example, has the value cat
or dog, this document will be returned.
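STRING, STRINGALL and SUBSTRING differ only in the direction and the quantifier of the substring test, which the following Python sketch makes explicit. The function names are hypothetical, and matching is assumed to be case insensitive, as it is by default for string matching in IDOL server.

```python
def string_any(value, *substrings):
    """STRING: the field value contains at least one of the substrings."""
    v = value.lower()
    return any(s.lower() in v for s in substrings)

def string_all(value, *substrings):
    """STRINGALL: the field value contains every one of the substrings."""
    v = value.lower()
    return all(s.lower() in v for s in substrings)

def substring(value, *strings):
    """SUBSTRING: the field value is itself a substring of one of the strings."""
    v = value.lower()
    return any(v in s.lower() for s in strings)

print(string_any("scattering", "cat", "dog"))
print(string_all("doggedly scattering seeds", "cat", "dog"))
print(substring("Telecom", "Telecommunications", "Technology"))
print(substring("Latest Technology", "Telecommunications", "Technology"))
```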
TERM
The TERM field specifier (case sensitive) allows you to find documents with a specified field whose
value contains a conceptual match of one or more terms specified by you. A conceptual match exists if
a term you specify matches a term in a specified field after it has been stemmed.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).
Format:
Examples:
FieldText=TERM{shopping,centers}:DRETITLE
A document's DRETITLE field must contain a term that conceptually matches shopping or
centers for this document to be returned. If a document's DRETITLE field, for example, has
the value shop, this document will be returned, while if it has the value bookshopping, it
will not be returned.
FieldText=TERM{training,football}:ITEM:PRODUCT
A document's ITEM or PRODUCT field must contain a term that conceptually matches
training or football for this document to be returned. If a document's ITEM or PRODUCT
field, for example, has the value train or footballers, this document will be returned, while if
it has the value trainer or soccer, it will not be returned.
TERMALL
The TERMALL field specifier (case sensitive) allows you to find documents with a specified field
whose value contains conceptual matches of several terms specified by you. A conceptual match
exists if the terms you specify match terms in a specified field after they have been stemmed.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).
Format:
Examples:
FieldText=TERMALL{shopping,centers}:DRETITLE
A document's DRETITLE field value must contain terms that conceptually match both
shopping and centers for this document to be returned. If a document's DRETITLE field, for
example, has the value town center shop, this document will be returned.
FieldText=TERMALL{walk,climb}:DRETITLE:TITLE
A document's DRETITLE or TITLE field value must contain terms that conceptually
match both walk and climb for this document to be returned. If a document's DRETITLE
or TITLE field, for example, has the value hill walking and rock climbing, this document
will be returned.
TERMEXACT
The TERMEXACT field specifier (case sensitive) allows you to find documents with a specified field
that contains an exact match of any of the terms specified by you.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).
Format:
Examples:
FieldText=TERMEXACT{help,helped}:DRETITLE
A document's DRETITLE field value must contain the term help or helped for this
document to be returned. If a document's DRETITLE field, for example, has the value helps
or helping, the document will not be returned.
FieldText=TERMEXACT{Word,Excel}:FILE:DATEI
A document's FILE or DATEI field value must contain the term Word or Excel for this
document to be returned. If a document's FILE or DATEI field, for example, has the value
WordPerfect, the document will not be returned.
TERMEXACTALL
The TERMEXACTALL field specifier (case sensitive) allows you to find documents with a specified
field that contains an exact match of all terms specified by you.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).
Format:
<your fields>
Enter one or more fields. A document is only returned if it contains one of these fields, and if
this field contains an exact match of all <your terms>.
If you want to specify multiple fields, you must separate them with colons (there must be no
space before or after a colon).
Examples:
FieldText=TERMEXACTALL{rabbits,eating,carrots}:DRETITLE
This query returns only documents whose DRETITLE field contains all the specified terms
(in their specified form). For example, a document whose DRETITLE field has the value
Rabbits like eating carrots or The carrots were there but the rabbits were eating all the
cabbage will be returned as a result, while a document with a field that contains Rabbits
like to eat a carrot each day will not be returned.
FieldText=TERMEXACTALL{flour,milk,eggs}:DRETITLE:TITLE
This query returns only documents whose DRETITLE or TITLE field contains all the
specified terms (in their specified form). For example, a document whose DRETITLE or
TITLE field has the value Most cake recipes include milk, eggs and flour will be
returned as a result, while a document with a field that contains Use a cup of milk, two
cups of flour and one egg will not be returned.
TERMEXACTPHRASE
The TERMEXACTPHRASE field specifier (case sensitive) allows you to return documents in which a
specified field contains an exact match of a phrase specified by you. Your phrase is matched before
stemming is applied (stopwords are not removed). Any punctuation in the specifier or field is ignored.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).
Format:
Examples:
FieldText=TERMEXACTPHRASE{Batman! and Robins}:FILM
A document whose FILM field contains Showing now, Batman and Robin's film, will be
returned as a result, while a document whose FILM field contains Showing now, 'Batman
and Robin' the movie will not be returned.
FieldText=TERMEXACTPHRASE{gift horse }:DRETITLE:TITLE
A document whose DRETITLE or TITLE field contains looking a gift horse in the mouth,
will be returned as a result, while a document whose DRETITLE or TITLE field contains the
gift horse's mouth had rotting teeth will not be returned.
TERMPHRASE
The TERMPHRASE field specifier (case sensitive) allows you to return documents in which a specified
field contains a conceptual match of a phrase specified by you. Your phrase is matched after stemming
is applied (stopwords are not removed). Any punctuation in the specifier or field is ignored.
Note: if the language that you are using does not match the DefaultLanguageType that you have
specified in IDOL server's configuration file, you must add the LanguageType parameter to your query
command (see Specifying the language type of your query on page 321).
Format:
Examples:
FieldText=TERMPHRASE{Batman! and Robins}:FILM
A document whose FILM field contains Showing now: 'Batman and Robin', will be
returned as a result.
FieldText=TERMPHRASE{gift horse }:DRETITLE:TITLE
A document whose DRETITLE or TITLE field contains the gift horse's mouth had rotting
teeth will be returned.
Fuzzy queries
If you are not sure how to spell some of the words that you want to query for, you can use the
Query action command to submit a fuzzy query to IDOL server. A fuzzy query returns results that
contain words that are similar to the entered string.
If you want to submit a fuzzy query, you have to specify the Query action's Text parameter using one
of the following formats:
Text=<my_query_text>DREFUZZY(fuzzy_query_text)
For example:
http://<host>:<port>/action=Query&Text=best selling author DREFUZZY(Rowlling)
Text=DREFUZZY(fuzzy_query_text)
For example:
http://<host>:<port>/action=Query&Text=DREFUZZY(Caroll Jabberwalky)
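The two formats above can be produced by one small helper when queries are built in a script. The function name fuzzy_text is hypothetical; the sketch only assembles the value of the Text parameter and leaves URL encoding and sending the request to the caller.

```python
def fuzzy_text(fuzzy_terms, plain_text=""):
    """Build a Text value with an optional plain part followed by DREFUZZY(...)."""
    fuzzy = f"DREFUZZY({fuzzy_terms})"
    # With plain query text, the fuzzy part follows it; otherwise it stands alone.
    return f"{plain_text} {fuzzy}" if plain_text else fuzzy

print(fuzzy_text("Rowlling", "best selling author"))
print(fuzzy_text("Caroll Jabberwalky"))
```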
Parametric searches
The GetTagValues and GetQueryTagValues action commands allow you to execute parametric
searches.
A parametric search allows you to search for items by their characteristics (values in certain fields).
When you provide fixed values in parametric fields, the parametric search returns consistent values in
the non-fixed parametric fields. For example, you can search an IDOL server wine database for
specific wine varieties from a specific region by specifying which fields must match these
characteristics, so that only wines that are of the specified variety and from the specified region are
returned.
Before you can execute parametric searches, you need to configure IDOL server to recognize
parametric fields.
2.
In the [Server] section, set the ParametricRefinement parameter to true (if the section doesn't
contain this parameter, you have to add it).
3.
4.
Create a section for each field process that you have listed, in which you create a property for the
process (a property is later defined by one or more applicable configuration parameters). Identify
the fields that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyProperty
PropertyFieldCSVs=*/MyField,*/MyOtherField
[ParametricFields]
Property=Parametric
PropertyFieldCSVs=*/Grape,*/Color,*/Region,*/Price
5.
List the properties that you have created in the [Properties] section.
For example:
[Properties]
0=MyProperty
1=Parametric
6.
Create a section for the parametric property in which you set the ParametricType parameter to
true. This enables IDOL server to recognize the associated PropertyFieldCSVs fields as
parametric fields.
For example:
[Parametric]
ParametricType=true
7.
Save and close the configuration file. You can now index your data into IDOL server.
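Before indexing, it can help to check that a field process, its property and the property's ParametricType setting are linked as described in the steps above. The following Python sketch is one possible check, not an IDOL server tool: the function name parametric_fields and the SAMPLE text are illustrative, and the sketch inspects only the sections shown in the examples above.

```python
import configparser

# Illustrative configuration fragment based on the examples above.
SAMPLE = """
[ParametricFields]
Property=Parametric
PropertyFieldCSVs=*/Grape,*/Color,*/Region,*/Price

[Properties]
0=Parametric

[Parametric]
ParametricType=true
"""


def parametric_fields(config_text, process_section):
    """Return the fields of a process if its property is parametric, else []."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(config_text)

    process = parser[process_section]
    prop = process["Property"]
    # The property must be listed in [Properties] ...
    listed = prop in parser["Properties"].values()
    # ... and its own section must set ParametricType to true.
    is_parametric = parser.has_section(prop) and parser[prop].getboolean(
        "ParametricType", fallback=False
    )
    if not (listed and is_parametric):
        return []
    return [name.strip() for name in process["PropertyFieldCSVs"].split(",")]


if __name__ == "__main__":
    print(parametric_fields(SAMPLE, "ParametricFields"))
```

An empty list indicates that one of the links is missing, for example a property that is not listed in the [Properties] section.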
GetTagValues
This action allows you to specify one or more parametric fields and return all values that are
stored within these fields in IDOL server. This includes values in documents that you don't have
access to and values in documents that have been deleted (unless you have compacted IDOL
server's Data index since they were deleted).
For example:
http://localhost:5552/action=GetTagValues&FieldName=Grape
This action command requests the different values that are stored in IDOL server's Grape
fields. This allows you to return a list of all grape varieties stored in an IDOL server wine
database, for example.
You can also restrict the command, so it only returns Grape field values if they are contained in
a document that also contains other specific fields that have specific values.
For example:
http://localhost:5552/action=GetTagValues&FieldName=Grape&Restriction=MATCH{Barossa Valley}:Region+MATCH{Red}:Color
This action command returns only Grape field values if they are contained in a document that
also contains a Region field that has the value Barossa Valley and a Color field that has the
value Red.
GetQueryTagValues
This action allows you to combine query text with one or more parametric fields. When IDOL
server executes the query, it finds documents that match the specified query text, and returns
the values of the specified parametric fields for these documents. Unlike the GetTagValues
action, the GetQueryTagValues action does not return field values that are contained in
documents that you don't have access to or that have been deleted.
For example:
http://localhost:5552/action=GetQueryTagValues&FieldName=GRAPE,COUNTRY&Text=A smooth red wine that complements game
This action command requests the different values that are stored in the GRAPE and
COUNTRY fields of documents that are conceptually similar to the specified Text.
You can also restrict the command by combining it with various action parameters.
For example:
http://localhost:5552/action=GetQueryTagValues&FieldName=GRAPE,COUNTRY&Text=A smooth red wine that complements game&MaxValues=10&Sort=Alphabetical
This action command requests the top 10 values that are stored in the GRAPE and COUNTRY
fields of documents that are conceptually similar to the specified Text. IDOL server displays the
values in alphabetical order when it returns them.
http://localhost:5552/action=GetQueryTagValues&FieldName=GRAPE,COUNTRY&Text=A smooth red wine that complements game&DocumentCount=true
This action command requests the different values that are stored in the GRAPE and
COUNTRY fields of documents that are conceptually similar to the specified Text. The
DocumentCount parameter instructs IDOL server to return the number of documents that
contain each value.
http://localhost:5552/action=GetQueryTagValues&FieldName=GRAPE,COUNTRY&Text=A smooth red wine that complements game&FieldDependence=true
This action command requests the different values that are stored in the GRAPE and
COUNTRY fields of documents that are conceptually similar to the specified Text. The
FieldDependence parameter instructs IDOL server to find sets of values that occur together. If
IDOL server finds documents that contain the first parametric field listed, it checks if they also
contain the subsequently listed parametric fields.
For further details on available parameters for the GetTagValues and GetQueryTagValues actions,
please refer to the online help (see Displaying online help on page 61).
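The GetTagValues and GetQueryTagValues requests shown in this section all follow the same pattern, so they can be generated by one small helper. The Python sketch below is illustrative: build_action_url is a hypothetical name, and the set of characters that is left unencoded is an assumption chosen so that field specifier syntax such as MATCH{...}:Region stays readable.

```python
from urllib.parse import quote


def build_action_url(host, port, action, **params):
    """Build an action URL in the style of the examples in this section."""
    parts = [f"action={action}"]
    for name, value in params.items():
        # Encode spaces and '&', but keep the characters that the
        # field specifier syntax relies on.
        parts.append(f"{name}={quote(str(value), safe='{}:+,*/')}")
    return f"http://{host}:{port}/" + "&".join(parts)


if __name__ == "__main__":
    print(build_action_url(
        "localhost", 5552, "GetTagValues",
        FieldName="Grape",
        Restriction="MATCH{Barossa Valley}:Region+MATCH{Red}:Color",
    ))
    print(build_action_url(
        "localhost", 5552, "GetQueryTagValues",
        FieldName="GRAPE,COUNTRY",
        Text="A smooth red wine that complements game",
        MaxValues=10,
        Sort="Alphabetical",
    ))
```

Parameters are emitted in the order in which they are passed, which keeps the generated URLs easy to compare with the examples above.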
2.
Before content is stored in IDOL server, individual terms are always stemmed and individual
stopwords are always discarded. If you want to store Proper Name terms (adjacent terms that
begin with a capital letter) in addition to the normal content, you can set the ProperNames
parameter in the [LanguageTypes] section to one of the following.
0
The following ProperNames options are only required if you need to be able to query for Proper
Names that contain stopwords (for example, "The Who" or "The Queen"):
3
Stopwords* that are adjacent to terms* are compounded with these, then
stemmed and indexed as a unit.
Adjacent stopwords* are compounded, then stemmed and indexed as a unit.
Adjacent terms* are compounded, then stemmed and indexed as a unit.
Stopwords* that are adjacent to terms* are compounded with these, and indexed
unstemmed as a unit.
Adjacent stopwords* are compounded and indexed unstemmed as a unit.
Adjacent terms* are compounded and indexed unstemmed as a unit.
Stopwords* that are adjacent to terms* are compounded with these, and indexed
unstemmed as a unit.
Adjacent stopwords* are compounded and indexed unstemmed as a unit.
Note: it is recommended that you use this setting, if you have set
AdvancedSearch to true in the IDOL server configuration file's [Server] section.
* these must begin with a capital letter (followed by lower case).
Note that you need to set this parameter for each of the languages that you want to enable name
recognition for (if a language's settings don't include the ProperNames parameter, you should
add it).
For example:
[LanguageTypes]
DefaultLanguageType=English
LanguageDirectory=C:\IDOLserver\IDOL\langfiles
0=English
1=Deutsch
2=Francais
[English]
LanguageCode=1
Language=ENGLISH
Encoding=ASCII
ProperNames=1
[Deutsch]
LanguageCode=2
Language=GERMAN
Encoding=ASCII
ProperNames=1
[Francais]
LanguageCode=2
Language=FRENCH
Encoding=ASCII
ProperNames=1
3.
4.
Index documents into IDOL server. Once you have finished indexing, any Query action command
is automatically treated by IDOL server as a Proper Name query.
Example
Depending on the ProperNames setting, IDOL server stores the following terms for the sentence Tom
Jones And His greatest hits:
ProperNames   Terms stored
0             TOM, JONE, GREAT, HIT
1             TOM, TOMJON, JONE, GREAT, HIT
2             TOM, TOMJON, JONE, GREAT, GREATESTHIT, HIT
3             TOM, TOMJON, JONE, ANDHI, GREAT, HIT
4             TOM, TOMJON, JONE, ANDHI, GREAT, HIT
5             TOM, TOMJONES, JONE, ANDHIS, GREAT, HIT, JONESAND
6             TOM, TOMJONES, JONE, ANDHIS, GREAT, HIT, JONESAND
7             TOM, JONE, ANDHIS, GREAT, HIT, JONESAND
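The unstemmed compounds in this example (TOMJONES, JONESAND and ANDHIS) can be reproduced with a few lines of Python. The sketch below only illustrates the idea of joining adjacent capitalized words. It is not IDOL server's implementation: it performs no stemming, it does not distinguish stopwords from terms, and it does not correspond to any particular ProperNames setting. The function name compound_proper_names is hypothetical.

```python
import re


def compound_proper_names(sentence):
    """Join each pair of adjacent capitalized words into one upper-case unit."""
    words = re.findall(r"[A-Za-z]+", sentence)
    compounds = []
    for first, second in zip(words, words[1:]):
        # A candidate begins with a capital letter followed by lower case.
        if first.istitle() and second.istitle():
            compounds.append((first + second).upper())
    return compounds


if __name__ == "__main__":
    print(compound_proper_names("Tom Jones And His greatest hits"))
```

The words "greatest" and "hits" produce no compound because they do not begin with a capital letter.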
If IDOL server contains the following documents, the queries below produce different results according
to what ProperNames has been set to:
Doc 1:
Doc 2:
action=Query&Text=Tom Jones
If ProperNames has been set to 0 or 7, both documents are returned with the same
relevance (in both cases, IDOL server is queried with the terms TOM and JONE which are
matched by both documents).
If ProperNames has been set to 1, 2, 3, 4, 5 or 6, Doc 2 is returned with a higher relevance
than Doc 1 (because it matches not just the terms TOM and JONE but also TOMJON or
TOMJONES).
action=Query&Text=tom jones
If ProperNames has been set to 0, 1, 3, 4, 5, 6 or 7, both documents are returned with the
same relevance (in both cases, IDOL server is queried with the terms TOM and JONE which
are matched by both documents).
If ProperNames has been set to 2, Doc 2 is returned with a higher relevance than Doc 1
(because it matches not just the terms TOM and JONE but also TOMJON).
action=Query&Text=The The
If ProperNames has been set to 0, 1 or 2, the query returns no results (because both
instances of the word "The" are discarded as stopwords).
If ProperNames has been set to 3, 4, 5, 6 or 7, only Doc 1 is returned (because in all these
cases IDOL server is queried with the term THETH or THETHE which are only matched by
Doc 1).
action=Query&Text=the the
If ProperNames has been set to 0, 1, 2, 3, 4, 5, 6 or 7, no results are returned (because both
instances of the word "the" are discarded as stopwords).
Proximity searches
You can use the Query action command to submit proximity searches which allow you to give words
that appear close together in the search string a higher weighting.
You apply the following operators to words, exact phrases or Boolean expressions in order to execute
a Proximity search. Note that APCM (Adaptive Probabilistic Concept Modeling) is used to rank the
results that match the Boolean query.
NEAR<N>
Only returns documents in which the second term is within <N> words of the first
term. If you don't specify <N>, NEAR defaults to 6.
For example:
action=Query&Text=cat+NEAR1+dog
This query only returns documents in which the term "cat" is no more than 1 word
away from "dog". This means that documents that contain "cats and dogs" and
documents that contain "dogs and cats" are returned, while documents that contain
"cats do not like dogs" are not returned (as the terms are not close enough to each
other).
DNEAR<N>
Directed NEAR. Only returns documents in which the second term is within <N>
words of the first term, in the specified order. If you don't specify <N>, DNEAR
defaults to 6.
For example:
action=Query&Text=cat+DNEAR1+dog
This query only returns documents in which the term "dog" follows the term "cat", but
is no more than 1 word away from the term "cat". This means that documents that
contain "cats and dogs" are returned, while documents that contain "dogs and cats"
or "cats do not like dogs" are not returned.
WNEAR<N>
Weighted NEAR. Proximity operator that promotes relevance when term spacing is
less than the specified <N> word distance (closer together implies higher relevance).
If you don't specify <N>, WNEAR defaults to 6.
For example:
action=Query&Text=dog+WNEAR7+cat
In this query extra relevance is given to documents in which "cat" and "dog" appear
within 7 words of each other in a piece of text. This weight increases as the terms
get closer to each other.
BEFORE
Only returns documents in which the first term precedes the second one.
For example:
action=Query&Text=cat+BEFORE+dog
This query only returns documents in which the term "dog" appears later than the
term "cat".
AFTER
Only returns documents in which the first term appears later than the second one.
For example:
action=Query&Text=cat+AFTER+dog
This query only returns documents in which the term "cat" appears later than the
term "dog".
Highest precedence:
NOT
NEAR; DNEAR
AND; BEFORE; AFTER
Lowest precedence:
Operators that have the same level of precedence have neither left nor right associativity. You can use
brackets to bind terms together as appropriate (note that Proximity operators must have terms on
either side and cannot be adjacent to brackets).
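The behaviour of NEAR, DNEAR, BEFORE and AFTER in the examples above can be modelled on a single piece of text. The Python sketch below is a simplified model, not IDOL server's matching code. It assumes that <N> is the number of words allowed between the two terms (an interpretation of the NEAR1 and DNEAR1 examples), and it replaces stemming with a naive rule that strips a trailing "s". All function names are hypothetical.

```python
import re


def _stem(word):
    """Naive stand-in for stemming: drop a trailing plural 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def _positions(text, term):
    """Return the word positions at which term occurs in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [i for i, word in enumerate(words) if _stem(word) == _stem(term.lower())]


def near(text, first, second, n=6, directed=False):
    """NEAR<n>, or DNEAR<n> when directed is True."""
    for i in _positions(text, first):
        for j in _positions(text, second):
            gap = j - i if directed else abs(j - i)
            # Assumption: n counts the words allowed between the terms.
            if 0 < gap <= n + 1:
                return True
    return False


def before(text, first, second):
    """BEFORE: some occurrence of first precedes some occurrence of second."""
    a, b = _positions(text, first), _positions(text, second)
    return bool(a and b) and min(a) < max(b)


def after(text, first, second):
    """AFTER: the first term appears later than the second one."""
    return before(text, second, first)


if __name__ == "__main__":
    for sample in ("cats and dogs", "dogs and cats", "cats do not like dogs"):
        print(sample, near(sample, "cat", "dog", 1), near(sample, "cat", "dog", 1, directed=True))
```

With these assumptions the model returns the same decisions as the NEAR1 and DNEAR1 examples above.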
2.
In the [Server] section, set the Soundex parameter to 1 (if the [Server] section doesn't contain
the Soundex parameter, you should add it).
3.
4.
Index documents into IDOL server. Once you have finished indexing, you can perform Soundex
keyword searches using the Query action command.
Synonym queries
A synonym query returns results that are conceptually similar to the terms in a Query action's Text
parameter and/or conceptually similar to the synonyms that are available for the Text terms.
To be able to send synonym queries to IDOL server, you need to:
1.
2.
For details on the settings that the [Synonym] sections can contain and on how you can configure
them, please refer to IDOL server's online help (see Displaying help on configuration settings on
page 389).
Create a text file and save it in IDOL server's installation directory using the File name you have
specified in the IDOL server configuration file's [<Synonym_type>] section.
2.
Create sections for each language type that you have defined in IDOL server's configuration file.
For example:
[English_ASCII]
[German_UTF8]
3.
In each section create a line for each word that you want to list synonyms for (using the same
encoding that you are using for the associated language type).
For example:
[English_ASCII]
cat
dog
[German_UTF8]
Katze
Hund
4.
List synonym strings next to each word and save the file. Note that you must separate the word
and each string with commas and that there must be no space before or after a comma. The
individual terms can contain spaces but must not contain any punctuation.
Note: the synonym file should not comprise more than 100 lines.
For example:
[English_ASCII]
cat,feline,grimalkin,moggy,mouser,puss,pussy,tabby
dog,bitch,cur,hound,mans best friend,mongrel,mutt,pooch,puppy
[German_UTF8]
Katze,Mietze,Mietzekatze,Mietzekater,Kater,Mulle,Kätzchen
Hund,Wau Wau,Hündin,Töle,Kläffer,Hündchen,Welpe
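A synonym file in this layout is simple to check before you deploy it. The following Python sketch parses the sections and enforces two of the rules stated above (no space before or after a comma, and no more than 100 lines). It is an illustration only: the function name parse_synonym_file is hypothetical, and whether section headers count towards the 100-line limit is an assumption.

```python
def parse_synonym_file(text, max_lines=100):
    """Map each section name to {word: [synonym strings]}."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumption: every non-empty line, including headers, counts.
    if len(lines) > max_lines:
        raise ValueError(f"synonym file has more than {max_lines} lines")

    sections = {}
    current = None
    for line in lines:
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None:
            raise ValueError(f"entry outside a section: {line!r}")
        if ", " in line or " ," in line:
            raise ValueError(f"space next to a comma: {line!r}")
        word, *synonyms = line.split(",")
        current[word] = synonyms
    return sections


if __name__ == "__main__":
    sample = "[English_ASCII]\ncat,feline,moggy\ndog,hound,mans best friend\n"
    print(parse_synonym_file(sample))
```

Terms that contain spaces, such as "mans best friend", are kept intact because only commas act as separators.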
2.
3.
Create a section for the synonym process that you have listed, in which you create a property for
the process (synonym properties always point to a defined synonym job). Identify the fields that
you want to associate with the process (when identifying the fields that IDOL server uses for
synonym matching you should use the format /FieldName to match root-level fields, */FieldName
to match all fields except root-level or /Path/FieldName to match fields that the specified path
points to).
Note: the properties that you create must not have the same name as processes.
For example:
[SynonymMatch]
Property=ApplySynonymMatch
PropertyFieldCSVs=*/DRETITLE,*/DRECONTENT
In this example, IDOL server only returns documents for synonym queries if their DRETITLE or
DRECONTENT field values match the query.
4.
5.
Create a section for the property in which you set the SynonymType parameter to the name of
the synonym job that specifies which settings IDOL server should apply to synonym queries.
[ApplySynonymMatch]
SynonymType=Synonym_job
6.
7.
Define a section for your synonym job (the section must have the same name as the synonym job)
in which you specify the settings that you want to apply to synonym queries.
For example:
[Synonym_job]
File=animals.txt
MaxExpandLevel=1
8.
Examples:
http://<host>:<port>/action=Query&Text=rollersk*
Wildcard matching is carried out after stemming has taken place. The term "rollerskating", for
example, is stemmed to rollersk when it is indexed into IDOL server.
This means that the query above returns documents that contain any terms that have been
stemmed to rollersk, for example, "rollerskating", "rollerskater", "rollerskate", "rollerskates".
The query http://<host>:<port>/action=Query&Text=rollerskat*, however, would not return
any results.
http://<host>:<port>/action=Query&Text=Mi?rotech
This query returns documents that contain the term "Mikrotech" or "Microtech".
http://<host>:<port>/action=Query&Text="Co*ins":Name:Author+Arm?dale:Title
This query returns documents that contain a Name or Author field whose value matches the
wildcard string Co*ins (for example, "Collins") and documents that contain a Title field whose
value matches the wildcard string Arm?dale (for example "Armadale").
When identifying fields you should use the format /FieldName to match root-level fields,
*/FieldName to match all fields except root-level or /Path/FieldName to match fields that
the specified path points to.
Strings can contain punctuation (except curly brackets), which means that if you want to match a string
that contains HTML with IDOL server content, you may need to escape the HTML to avoid confusion with
"&" and so on.
If you want to match a string that contains a comma, you need to escape the comma with a backslash,
otherwise IDOL server reads it as a separator.
You can match multiple fields simultaneously by separating them with colons.
Examples:
http://<host>:<port>/action=Query&FieldText=WILD{wom?n }:Clothes
A document's Clothes field must contain a word that matches the specified wildcard string (for
example, "woman" or "women") for this document to be returned as a result.
http://<host>:<port>/action=Query&FieldText=WILD{Glory is fleeting\, but * is forever}:QuotesNapoleon
A document's QuotesNapoleon field must contain a string that matches the specified wildcard
string (for example, "Glory is fleeting, but obscurity is forever") for this document to be returned
as a result.
http://<host>:<port>/action=Query&FieldText=WILD{*.html,*.htm}:URL
A document's URL field value must end with html or htm for this document to be returned as a
result.
http://<host>:<port>/action=Query&FieldText=WILD{passi*incarnata}:Climbers
A document's Climbers field must contain a phrase that begins with passi and ends with
incarnata (for example, "passionflower incarnata" or "passiflora incarnata") for this document to
be returned as a result.
http://<host>:<port>/action=Query&FieldText=WILD{passi*incarnata,passi*alata*}:Climbers
A document's Climbers field must contain a string that matches one of the specified wildcard
strings (for example, "passionflower incarnata", "passiflora incarnata", "passionflower alata",
"passiflora alata", "passionflower alata shannon" or "passiflora alata shannon") for this
document to be returned as a result.
http://<host>:<port>/action=Query&FieldText=WILD{*www.autonomy.com*.txt,*www.autonomy.com*.pdf}:PATH:URL
A document's PATH or URL field must contain a path that contains www.autonomy.com and
ends with .txt or .pdf (for example, "http://www.autonomy.com/files/doc.txt" or
"http://www.autonomy.com/fields/technicalbrief.pdf") for this document to be returned as a result.
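The comma and backslash rules for WILD strings can be illustrated with a short Python sketch. The sketch splits the text inside the curly brackets on unescaped commas and tests each wildcard string against a field value. It is a simplified model rather than IDOL server's matcher: it assumes that * matches any run of characters, that ? matches exactly one character, and that a wildcard string is compared with the whole field value. The function names are hypothetical.

```python
import re


def split_wild_strings(spec):
    """Split the text inside WILD{...} on commas not escaped with a backslash."""
    parts = re.split(r"(?<!\\),", spec)
    return [part.replace("\\,", ",") for part in parts]


def wild_to_regex(pattern):
    """Translate * and ? into regular expression syntax."""
    pieces = []
    for char in pattern:
        if char == "*":
            pieces.append(".*")  # assumption: any run of characters
        elif char == "?":
            pieces.append(".")   # assumption: exactly one character
        else:
            pieces.append(re.escape(char))
    return re.compile("".join(pieces))


def wild_matches(spec, field_value):
    """True if the whole field value matches any of the wildcard strings."""
    return any(
        wild_to_regex(pattern).fullmatch(field_value)
        for pattern in split_wild_strings(spec)
    )


if __name__ == "__main__":
    print(split_wild_strings(r"Glory is fleeting\, but * is forever"))
    print(wild_matches("*.html,*.htm", "http://www.autonomy.com/index.htm"))
```

Because the escaped comma is restored only after the split, a string such as "Glory is fleeting\, but * is forever" is treated as a single wildcard string.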
Text
Specify your entire query string including the non-alphanumeric characters.
Note:
If the string you are searching for contains an ampersand (&), you must escape it (since it is a
special character used by the query syntax).
If any of the following characters occurs in the middle of the string you are searching for, you must
replace them with a space, unless you have explicitly removed them from the list of characters
that IDOL server uses as separators (using the DiminishSeparators parameter in IDOL server's
configuration file):
~[]*?:()"
Alternatively, you can set the IgnoreSpecials action parameter (which you can set for the Query
and GetQueryTagValues actions) to true to instruct IDOL server to interpret the following
characters as normal characters in query syntax:
*?":() and Boolean / Proximity operators AND, NOT, OR, EOR, XOR, NEAR, DNEAR,
WNEAR, BEFORE, AFTER
This disables wildcarding, phrase queries, field restriction and Boolean operations.
FieldText
Use the STRING field specifier to search for your entire query string including the non-alphanumeric
characters in the appropriate field in IDOL server.
Note:
If the string you are searching for contains an ampersand (&), you must escape it (since it is a
special character used by the query syntax).
If the string you are searching for contains a comma, you must escape it by prefixing it with a
backslash (\).
Examples:
To search for "Auto*":
http://<host>:<port>/action=query&text=Auto&FieldText=STRING{Auto*}:DRECONTENT
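The escaping rules for the STRING field specifier can be applied mechanically. In the Python sketch below, commas are prefixed with a backslash as described above. The ampersand is percent-encoded as %26, which is an assumption, because this section does not state which escape form IDOL server expects. The function name string_field_text and the Company field are hypothetical.

```python
from urllib.parse import quote


def string_field_text(value, field):
    """Return a FieldText value that uses the STRING field specifier."""
    # Commas would otherwise be read as separators.
    escaped = value.replace(",", "\\,")
    spec = f"STRING{{{escaped}}}:{field}"
    # Assumption: '&' and spaces are percent-encoded for the URL.
    return quote(spec, safe="{}:*\\,")


if __name__ == "__main__":
    print("FieldText=" + string_field_text("Auto*", "DRECONTENT"))
    print("FieldText=" + string_field_text("R&D, Ltd", "Company"))
```

The asterisk is left untouched because STRING searches for it as a literal character.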
Query syntaxes
A query's processing time depends on the syntax that the query uses. While different syntaxes are
available, some of them require fields to have been created in a specific way.
Fastest
Syntax:
Requires:
Example:
Syntax:
action=Query&Text=<text>&FieldText=MATCH{<attribute>}:<field name>
Requires:
Example:
England
France
Germany
USA
In documents a field is created for each category that they belong to. For
example, if a document belongs to the categories France and USA:
#DREFIELD Cat1=France
#DREFIELD Cat2=USA
The following query returns documents that match the specified Text and
contain the value France in one of their Cat fields (for example, Cat1).
action=Query&Text=presidential election&FieldText=MATCH{France}:Cat*
Syntax:
Requires:
that you create a numeric field for all attributes (if you have more than
32 bits, you need more fields)
Example:
Slowest
Syntax:
action=Query&Text=<text>&FieldText=STRING{<attribute>}:<field name>
Requires:
Example:
England
France
Germany
USA
In documents a field is created that contains a CSV of the categories that
the documents belong to. For example, if a document belongs to the
categories France and USA:
#DREFIELD Cat=France,USA
The following query returns documents that match the specified Text and
contain the value France in their Cat field.
action=Query&Text=presidential election&FieldText=STRING{France}:Cat
In the IDOL server configuration file, configure the following settings in the [Server] section:
SpellCheckCorrectMinDocOccs
The minimum number of documents that a term has to appear in before IDOL server can use
it as a spell check suggestion.
SpellCheckIncorrectMaxDocOccs
The maximum number of documents that a term can appear in for IDOL server to search for a
spelling correction for it.
SpellCheckMaxCheckTerms
IDOL server's spelling correction has no effect on queries that comprise more than the
specified number of non-stopword terms (ProperName and hyphenated terms are also
ignored).
2.
Spelling correction
25. Summarization
IDOL server can automatically generate one of the following summary types for the results it produces.
All summaries are generated in real time.
Concept
A conceptual summary of each result document. A concept summary comprises sentences that
are typical of the result's content (these sentences can be from different parts of the result
document).
Context
A conceptual summary of each result document that is biased by the terms in the query's Text
and/or FieldText. A context summary comprises sentences that are particularly relevant to the
terms in the query (these sentences can be from different parts of the result document).
Quick
A brief summary of each result document. A quick summary comprises the first few sentences
of the result document.
ParagraphConcept
A conceptual summary of each result document which comprises the paragraphs that are most
typical of the result's content (these paragraphs can be from different parts of the result
document).
ParagraphContext
A conceptual summary of each result document that is biased by the terms in the query's Text
and/or FieldText. This summary comprises paragraphs that are particularly relevant to the
terms in the query.
Send a Query, Suggest or SuggestOnText action to IDOL server that includes the Summary
parameter. Set the Summary parameter to the type of summary that you want to return for results
(Concept, Context, Quick, ParagraphConcept or ParagraphContext).
For example:
http://<host>:<port>/action=Query&Text=Undulant fever&Summary=Concept
Each result of this query is returned with a conceptual summary.
2.
You can optionally configure the following settings in the IDOL server configuration file, depending on
which type of summary you want to generate:
For Concept, Context or Quick summaries
In the [Summary] section, use the SourceFields parameter to specify the fields from which the
summary should be generated.
For Concept or Context summaries
In the [Summary] section, set the MinWordsPerSentence parameter to the minimum number
of words that a sentence must comprise in order to be considered as a sentence that can be
used in the summary.
For Context summaries
In the [Server] section, set the ContextSummaryQueryTermWeight parameter to the weight
that should be used for the terms in the user's query. The context summary will give this weight
to sentences that contain terms in common with the query text. The other terms will be given
their APCM weight.
3.
Save the IDOL server configuration file and restart IDOL server for your configuration changes to
take effect.
Generating taxonomies
The TaxonomyGenerate action allows you to generate a hierarchical taxonomy from one or more
clusters (see Clustering on page 151 for details on how to generate clusters) or query results.
The taxonomy generator adapts Bayesian and information-theoretic methods to concept selection.
Bayesian algorithms are applied to identify statistical relationships between concepts and sets of
concepts (at the document and document set level), which are then filtered to form the hierarchical
structure of the final taxonomy.
You can write the taxonomy to disk as a directory structure, or import the taxonomy into the category
hierarchy.
Note that before you create a taxonomy from an IDOL server, you must make sure that IDOL server
does not contain duplicate documents or text that is repeated in multiple documents (for example,
document headers). Ensure that these are stripped out at the import stage in order to gain optimal
results.
You can set up a schedule that executes the TaxonomyGenerate action at regular intervals.
Taxonomy generation
28. Results
Relevance ranking
In evaluating all types of queries, IDOL server employs complex algorithms based on a combination of
Information Theory and Bayesian methods to weight and rank the documents that it returns by statistical
relevance. In doing so it makes use of information-theoretic values calculated dynamically for all
concepts on indexing, allowing relevance to be evaluated both as a percentage and, in the case of
agents, as absolute values.
In practice, the relevance can be seen as a measure of the conceptual overlap between the query text
and the text within a document. This can be affected in several ways; certain fields can be given extra
weight by associating a weighting factor with them at indexing time. For example, extra weight can be
given when query terms appear in a document's title as opposed to the body of the text.
using BIAS
You can use the BIAS field specifier at query time to boost the percentage relevance of a
query's results according to the numerical proximity of a specified field to a given value.
using multipliers
You can multiply the weight of individual query terms in order to boost the relevance of
results that match these terms accordingly.
For each field whose content you want to use to determine if a result's weight is boosted, list
a process that indexes the field and manipulates its term weights in the [FieldProcessing]
section. Note that if you want to boost terms in several fields by the same factor, you only need to
create one process for this.
For example:
[FieldProcessing]
Number=2
0=IndexAndWeightHigher1
1=IndexAndWeightHigher2
2.
Create a section for each of the processes that you have listed, in which you create a property for
the process (a property is later defined by one or more applicable configuration parameters).
Identify the fields that you want to associate with the processes.
Note: the properties that you create must not have the same name as processes.
For example:
[IndexAndWeightHigher1]
Property=IndexHigherWeight1
PropertyFieldCSVs=*/DRETITLE
[IndexAndWeightHigher2]
Property=IndexHigherWeight2
PropertyFieldCSVs=*/SUMMARY
3.
List all the properties that you have created in a [Properties] section.
For example:
[Properties]
0=IndexHigherWeight1
1=IndexHigherWeight2
Create a section for each of the properties and specify configuration settings for each. The Index
parameter ensures that the fields that are associated with the field process are indexed, while the
Weight parameter determines the factor by which terms in the associated PropertyFieldCSVs
fields are boosted if they match query terms.
For example:
[IndexHigherWeight1]
Index=true
Weight=4
[IndexHigherWeight2]
Index=true
Weight=2
Save the configuration file and restart IDOL server. When you send a query to IDOL server, the
percentages that indicate the results' conceptual similarity to the query will now be affected by
how many times a result's SUMMARY and DRETITLE field terms match the query's terms.
For example:
If you send the following query to IDOL server, results whose SUMMARY and DRETITLE fields
match the query's terms "cat" and "dog" are boosted:
http://<IP_address>:<port>/action=query&text=cat and dog
This means that the following results would be returned in the following order:
Result 1
Title = Cats & Dogs
Summary = Cats and dogs duke it out in this live action feature about a professor on the brink
of discovering a cure for dog allergies. The dogs assign an agent to protect the professor and
his family from a feline invasion
Content = Unbeknownst to humans, dogs have fought for thousands of years to keep mankind
from falling under the rule of cats. Using combinations of live animals, animatronic puppets,
and digital wizardry, this film has just enough imagination to match its effects, climaxing with a
feline global-domination scheme involving mice sprayed with chemicals that will make all
humans allergic to their canine friends.
Result 2
Title = Garfield
Summary = Garfield comes to life in an all new live action major motion picture.
Content = Garfield is a fat cat. A cat that eats lots of Lasagne. A cat that is lazy and sleeps as
much as possible. Nevertheless, Garfield is a clever cat, always able to outwit his owner, Jon
and the neighbor's dog, Odie. Garfield is a cool and sarcastic cat but he is also a cat with a
heart as is shown when he comes to the rescue of Odie the dog, in the movie that is coming
out this year. The hapless pup disappears and is kidnapped by a nasty dog trainer, and
Garfield feels responsible. Pulling himself away from the TV, Garfield springs into action.
Maybe it's friendship for cat and dog after all.
Result 3
Title = Tom and Jerry : The movie
Summary = The celebrated cat and mouse team meets a young run-away who desperately
needs their help to find her missing father. Along the way they run into her evil Aunt who tosses
them into a pet prison. Bonding together, Tom & Jerry outwit the Aunt and mastermind a great
escape to set off on the wildest adventures of their cat and mouse careers.
Content = The popular animated duo team up again to appear this time on the big screen.
Homeless, the 'toons end up helping out a young girl who stays with a nasty auntie while she is
separated from her father. Will the young Robyn be reunited with her loving father? Will the odd
pair make it on the streets? Will they find a home? Those are some of the burning questions
that may plague the minds of young viewers of this fun adventure.
If the weight of the SUMMARY and DRETITLE fields had not been boosted, Result 2 would have been
the top result, with Result 1 following in second place. Note that Result 3 is not ranked higher than
Result 2. Although its weight is slightly boosted because its SUMMARY field contains one of the query
terms, this boost is not sufficient to outrank Result 2, whose SUMMARY and DRETITLE fields do not
contain any of the query's terms (the conjunction "and" in the query text is stripped before matching).
BIAS{<optimum>,<range>,<percentage>}
<optimum>
The value that the specified field must contain to increase or decrease the result's weight by the
maximum percentage.
<range>
A positive number that determines the range of the specified optimum. If the specified field
contains a value that is in the range of (optimum - range) to (optimum + range), the result's
weight is increased or decreased according to the specified percentage.
<percentage>
A percentage in the range -100 to 100. If the value of the specified field is within the specified
range, the score of the result is increased or decreased according to how close the value is to
the specified optimum.
For example:
http://<IP_address>:<port>/action=Query&FieldText=BIAS{100,50,10}:*/PRICE
A document whose PRICE field value is within 50 either side of 100 has its weight increased
on a linear scale, from 10% if the price is 100 to 0% if the price is 50 or 150.
http://<IP_address>:<port>/action=Query&FieldText=BIAS{100,50,-10}:*/PRICE
A document whose PRICE field value is within 50 either side of 100 has its weight decreased
on a linear scale, from 10% if the price is 100 to 0% if the price is 50 or 150.
Note:
You can also use the BIAS field specifier to bias the score of results according to the numerical
proximity in their autn_date meta field (see Meta fields on page 301) to a given value.
For example:
FieldText=BIAS{1103918400,259200,25}:autn_date
A document whose autn_date field value is within 259200 either side of 1103918400
has its weight increased on a linear scale, from 25% if the date is 1103918400 to 0% if
the date is 1103659200 or 1104177600.
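The linear scaling described above can be sketched in a few lines of Python. This only illustrates the arithmetic and is not IDOL server's implementation. The function name bias_percent is hypothetical, and the sketch assumes that values at or beyond the edge of the range receive no adjustment.

```python
def bias_percent(value, optimum, rng, percentage):
    """Return the percentage adjustment that BIAS{optimum,rng,percentage}
    would apply to a field value, assuming a linear fall-off to 0 at the
    edges of the range."""
    distance = abs(value - optimum)
    if distance >= rng:
        # Assumption: values at or beyond the range edge are not adjusted.
        return 0.0
    return percentage * (1 - distance / rng)


# BIAS{100,50,10}:*/PRICE
print(bias_percent(100, 100, 50, 10))  # 10.0 (full boost at the optimum)
print(bias_percent(125, 100, 50, 10))  # 5.0  (halfway to the edge)
print(bias_percent(150, 100, 50, 10))  # 0.0  (edge of the range)
```

A negative percentage, as in BIAS{100,50,-10}, yields the same values with the opposite sign.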
For example:
http://<host>:<ACI_port>/action=Query&Text=bread[*2.5]+brown+loaf
In this example, the weight of the query term bread is multiplied by 2.5 while the weight of the
query terms brown and loaf does not change.
When results are returned for the query, the relevance of documents that contain the term
bread is boosted relative to those that do not.
http://<host>:<ACI_port>/action=Query&Text=SOUNDEX(bred)+bred[*4]
In this example, a supermarket wants to ensure that a customer's online search for bread
returns appropriate results. The supermarket has found that customers tend to misspell "bread"
as bred. If a customer queries for "bread", appropriate results are returned as usual. If a
customer queries for bred, the term is submitted twice: once as a Soundex keyword search
(see Soundex keyword searches on page 237) and once with a multiplier. This ensures that if
results exist that match bred exactly (for example, a new CD by a band called bred), they are
returned with a higher relevance than results that match bred only phonetically.
Similarly, multipliers can be used to reduce the influence of individual query terms.
For example:
http://<host>:<ACI_port>/action=Query&Text=cat[*0.5]+dog
In this example, the weight of the query term cat is halved by multiplying it by 0.5 while the
weight of the query term dog does not change.
When results are returned for the query, the relevance of documents that contain the term cat is
reduced relative to those that do not.
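If you assemble such queries in application code, the Text value can be built from a list of terms and optional multipliers. The helper below is a minimal sketch; the name weighted_text and the convention that None means "no multiplier" are assumptions made for illustration.

```python
def weighted_text(terms):
    """Build a Text parameter value from (term, multiplier) pairs.
    A multiplier of None leaves the term's weight unchanged."""
    parts = []
    for term, multiplier in terms:
        if multiplier is None:
            parts.append(term)
        else:
            # Append the [*n] multiplier syntax shown in the examples above.
            parts.append(f"{term}[*{multiplier}]")
    return "+".join(parts)


print(weighted_text([("bread", 2.5), ("brown", None), ("loaf", None)]))
print(weighted_text([("cat", 0.5), ("dog", None)]))
```

The two calls reproduce the Text values used in the bread and cat examples above.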
MatchReference
The MatchReference action parameter allows you to specify one or more references that a
document's Reference field must match for the document to be returned as a result.
For example:
http://<host>:<port>/action=Query&Text=Bayes&MatchReference=http://www.autonomy.com/Content/Technology.html
This query only returns documents that have a Reference field with the value
http://www.autonomy.com/Content/Technology.html.
DontMatchReference
The DontMatchReference action parameter allows you to specify one or more references that a
document's Reference field must not match for the document to be returned as a result.
For example:
http://<host>:<port>/action=Query&Text=Bayes&DontMatchReference=http://www.autonomy.com/Content/Technology.html
This query only returns documents if they don't have a Reference field with the value
http://www.autonomy.com/Content/Technology.html.
Combine
The Combine action parameter allows you to ensure that, if several results derive from the same
document or contain the same content or the same value in a specific Reference field, only one of
these results is displayed. By default this is the result with the highest relevance; however, you can
use the Sort action parameter to set alternative sorting methods.
You can set Combine to one of the following:
Simple
This is the recommended Combine option.
When very long texts are indexed into IDOL server, they are by default broken up into sections and
then indexed as individual documents (each document has its own ID but they all have the same
document reference). This makes the indexing process more stable and ensures that when you
query IDOL server, the most relevant section of a text is returned (rather than, for example, an
entire book). However, if several sections are relevant to the query, each of them is returned as a
result. This means that a query can return multiple results that have the same document reference
and belong to the same text, for example, different pages that belong to the same book (if you
displayed each of these results using Print=AllSections you would receive the same text every
time).
You can prevent IDOL server from returning different sections of the same source text by adding
Combine=Simple to the query. IDOL server will only display the section that has the highest
conceptual similarity to the query (unless you add Print=AllSections to the query, in which case
the entire source text would be displayed). If multiple sections have the same conceptual
relevance, IDOL server returns the one with the lowest section number.
For example:
http://<host>:<port>/action=Query&Text=The Moonstone&Combine=Simple
In this example, if several results derive from the same source text, only the result that has the
highest relevance to the query's text is displayed.
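The selection rule that Combine=Simple applies can be illustrated with a short Python sketch that works on an in-memory list of results. It is not IDOL server code. The dictionary keys reference, section and weight, and the sample references, are hypothetical names chosen for the illustration.

```python
def combine_simple(results):
    """Keep one section per document reference: the highest weight wins,
    and ties go to the lowest section number."""
    best = {}
    for result in results:
        current = best.get(result["reference"])
        key = (-result["weight"], result["section"])
        if current is None or key < (-current["weight"], current["section"]):
            best[result["reference"]] = result
    # Return the surviving results ordered by relevance, highest first.
    return sorted(best.values(), key=lambda r: -r["weight"])


results = [
    {"reference": "moonstone.txt", "section": 0, "weight": 81.0},
    {"reference": "moonstone.txt", "section": 3, "weight": 92.5},
    {"reference": "moonstone.txt", "section": 7, "weight": 92.5},
    {"reference": "woman-in-white.txt", "section": 1, "weight": 70.0},
]
for r in combine_simple(results):
    print(r["reference"], r["section"], r["weight"])
```

Of the three sections of moonstone.txt, only section 3 survives: it shares the highest weight with section 7 and has the lower section number.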
FieldCheck
Results are combined based on the hash value of their FieldCheckType field (see
FieldCheckType fields on page 291). The FieldCheckType field holds a value that is frequently
used to restrict results (for example, a field that stores category names). When a FieldCheckType
field is indexed, IDOL server stores it in a fast-look-up table in memory, so it can be returned
quickly.
Note: if you set URLAnalysis to true in the [Server] section of your IDOL server configuration file, you
cannot identify a field as a FieldCheckType field, as IDOL server automatically uses the domain of
the URL it finds in the documents' Reference fields as the FieldCheck value.
<reference_fields>
A plus, space or comma separated list of Reference fields. If a query produces several results that
contain the same value in one or more of the specified Reference fields, IDOL server only returns
the most relevant result. If several results have the same relevance, the result with the highest
DOCID is returned (unless a Sort option has been enabled that overrides this).
For example:
http://<host>:<port>/action=Query&Text=The Moonstone&Combine=DRETITLE
In this example, if several results contain the same value in the DRETITLE field, only the result that
has the highest relevance to the query's text is displayed.
Note:
When you instruct IDOL server to combine using a specific Reference field, it automatically uses
any field that is listed for PropertyFieldCSVs alongside this Reference field in IDOL server's
configuration file to combine as well. To ensure that IDOL server only combines using the specified
field, you can set up an individual process (see Processing fields and documents that contain
specific fields on page 281) that identifies this field as a Reference field. If you want to combine
using multiple Reference fields, it can be useful to set up a separate process for each of these
fields that identifies it as a Reference field.
For example:
[SetupReferenceFields]
Property=ReferenceFields
PropertyFieldCSVs=*/DREREFERENCE,*/url
[CombineField1]
Property=ReferenceFields
PropertyFieldCSVs=*/DRETITLE
[CombineField2]
Property=ReferenceFields
PropertyFieldCSVs=*/CombineField
If you instructed IDOL server to combine using the DRETITLE and CombineField fields and they
were listed alongside the DREREFERENCE and url fields in the [SetupReferenceFields] section,
IDOL server would automatically use the DREREFERENCE and url fields to combine as well.
Note:
You can combine the Simple and FieldCheck options, in which case you must specify Simple first.
For example:
Combine=Simple+FieldCheck
If you set Combine to <reference_fields>, you cannot combine the fields with another Combine
option.
2.
3.
Create a section for the print fields process that you have listed, in which you create a property for
the process (a property is later defined by one or more applicable configuration parameters).
Identify the fields that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[PrintFields]
Property=Print
PropertyFieldCSVs=*/AUTHOR,*/TITLE,*/ISBN
4.
List the property that you have created in the [Properties] section.
For example:
[Properties]
0=MyFirstProperty
1=Print
5.
Create a section for the property in which you set the PrintType parameter to true. This displays
the associated PropertyFieldCSVs fields for query results.
For example:
[Print]
PrintType=true
6.
Save and close IDOL server's configuration file, and restart IDOL server to execute your changes.
30. Fields
Data is passed to IDOL server (for example, from Autonomy Connectors) in the form of IDX or XML
fields. IDOL server stores all the fields that it receives, so that you can search any of the fields using
field text queries. However, in order to make sure that IDOL server's performance is optimized, you
need to determine how it should process and store the fields it receives (see Setting up field
indexing on page 67).
This is done through IDOL server's configuration file, where you can associate some fields with special
properties, for example, in order to instruct IDOL server to treat these fields (or documents that contain
them) in a specific way or read specific information from them. Note that you can associate a field with
more than one property, provided the properties don't clash.
You can associate fields with the following properties:
ACLType
DatabaseType
DateType
DocumentTrackingType
ExpireDateType
FieldCheckType
FlattenIndexType
HiddenType
HighlightType
Index
InvertedAgentType
LanguageType
NumericDateType
NumericType
ParametricType
PrintType
ReferenceType
SectionBreakType
SecurityType
SourceType
SynonymType
Fields that hold the name of the synonym job whose settings
apply to documents that contain associated fields.
TitleType
TrimSpaces
Weight
Please refer to the IDOL server online help (see Displaying online help on page 61) for further
details on the property settings that identify the field types.
For details on how to associate properties with fields, please refer to Processing fields and
documents that contain specific fields on page 281.
List the processes that you want to apply to fields in the [FieldProcessing] section.
For example:
[FieldProcessing]
Number=4
0=MyFirstProcess
1=IndexFields
2=MyCombinedProcess
3=IndexAndWeightHigher
2.
Create a section for each of the processes that you have listed, in which you create a property for
the process (a property is later defined by one or more applicable configuration parameters).
Identify the fields that you want to associate with the processes.
You can use the PropertyMatch parameter to identify a specific value that fields must have in
order to be processed (this is useful if you are setting up a process that identifies security or
language fields).
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField
PropertyMatch=*myString*
[IndexFields]
Property=MySecondProperty
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE
[MyCombinedProcess]
Property=MyCombinedProperty
PropertyFieldCSVs=*/MyDateField,*/MyIndexField
[IndexAndWeightHigher]
Property=IndexHigherWeight
PropertyFieldCSVs=*/SUMMARIES
3.
List all the properties that you have created in a [Properties] section.
For example:
[Properties]
0=MyFirstProperty
1=MySecondProperty
2=MyCombinedProperty
3=IndexHigherWeight
4.
Create a section for each of the properties and specify appropriate configuration settings for each.
These configuration parameters define the processes that are applied to all the fields (or all
documents that contain the fields) that you have previously associated with the processes.
For example:
[MyFirstProperty]
HiddenType=true
[MySecondProperty]
Index=true
[MyCombinedProperty]
DateType=true
Index=true
[IndexHigherWeight]
Index=true
Weight=2
Note: for details on available configuration settings please refer to IDOL server's configuration online
help (See Displaying help on configuration settings on page 389).
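Because the four steps above must stay consistent with one another, it can help to check a configuration file before restarting IDOL server. The following Python sketch is an illustration only and is not an Autonomy tool. It uses the standard configparser module to read a configuration fragment and reports processes without a section, properties that are not listed in [Properties] or have no section of their own, and properties that share a name with a process. The function name check_field_processing and the sample configuration are assumptions made for the example.

```python
import configparser

SAMPLE = """\
[FieldProcessing]
Number=2
0=IndexFields
1=DateFields

[IndexFields]
Property=Index
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE

[DateFields]
Property=Date
PropertyFieldCSVs=*/DREDATE

[Properties]
0=Index
1=Date

[Index]
Index=TRUE

[Date]
DateType=TRUE
"""


def check_field_processing(text):
    """Return a list of problems found in the field-processing setup."""
    parser = configparser.ConfigParser(
        interpolation=None, comment_prefixes=("//", "#", ";"), strict=False
    )
    parser.optionxform = str  # keep parameter names case-sensitive
    parser.read_string(text)

    if not parser.has_section("FieldProcessing"):
        return ["missing [FieldProcessing] section"]
    listed = parser["FieldProcessing"]
    processes = [value for key, value in listed.items() if key.isdigit()]
    problems = []
    if "Number" in listed and int(listed["Number"]) != len(processes):
        problems.append("Number does not match the number of listed processes")

    properties = []
    if parser.has_section("Properties"):
        properties = [v for k, v in parser["Properties"].items() if k.isdigit()]

    for process in processes:
        if not parser.has_section(process):
            problems.append(f"no [{process}] section")
            continue
        prop = parser[process].get("Property")
        if prop is None:
            problems.append(f"[{process}] has no Property")
        elif prop in processes:
            problems.append(f"property {prop} has the same name as a process")
        elif prop not in properties:
            problems.append(f"property {prop} is not listed in [Properties]")
        elif not parser.has_section(prop):
            problems.append(f"no [{prop}] section")
    return problems


print(check_field_processing(SAMPLE))
```

An empty list means that none of the checked inconsistencies were found. The sketch does not validate the parameter names themselves.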
Example:
[FieldProcessing]
Number=6
0=IndexFields
1=IndexAndWeightHigher
2=SectionBreakFields
3=DateFields
4=DatabaseFields
5=SetReferenceFields
[IndexFields]
// Controls which fields are indexed
Property=Index
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE
[IndexAndWeightHigher]
// Fields which are indexed with a weight
Property=IndexWeight
PropertyFieldCSVs=*/SUMMARIES
[SectionBreakFields]
// Field containing document section number
Property=Section
PropertyFieldCSVs=*/DRESECTION
[DateFields]
// Fields containing the document date
Property=Date
PropertyFieldCSVs=*/DREDATE,*/harvest_time
[DatabaseFields]
// CSV of field names that define the document's database
Property=Database
PropertyFieldCSVs=*/DREDBNAME
[SetReferenceFields]
//CSV of fields that define the document's URL
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE,*/DRETITLE
//---------------------------Properties----------------------//
[Properties]
0=Index
1=IndexWeight
2=Section
3=Date
4=Database
5=Reference
[Index]
Index=TRUE
[IndexWeight]
Index=TRUE
Weight=2
[Section]
SectionBreakType=TRUE
[Date]
DateType=TRUE
[Database]
DatabaseType=TRUE
[Reference]
ReferenceType=TRUE
TrimSpaces=TRUE
Index fields
You should store fields that contain text which you want to query frequently as Index fields. Index fields
are processed linguistically when they are stored in IDOL server. This means that stemming and
stoplists are applied to the text in Index fields before it is stored, which allows IDOL server to process
queries for these fields more quickly (typically DRETITLE and DRECONTENT are fields that should be
set up as Index fields).
You should not store URLs, or content that you are unlikely to use, in Index fields. You should also not
store fields as Index fields if they will be queried frequently but their values are only ever going to be
queried in their entirety. It is more efficient to query such values using a field specifier (for example,
MATCH).
Also, you should not store fields that contain numeric values or dates as Index fields. Instead store
these fields as numerical fields and numeric date type fields (see Numerical fields on page 289 and
NumericDateType fields on page 287).
2.
3.
Create a section for the indexing process, in which you create a property for the process (a
property is later defined by one or more applicable configuration parameters). Identify the fields
that you want to associate with the process.
You can use the PropertyMatch parameter to identify a specific value that fields must have in
order to be processed.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField
PropertyMatch=*myString*
[MySecondProcess]
Property=MySecondProperty
PropertyFieldCSVs=*/MyOtherField,*/MyOtherSecondField
[IndexingFields]
Property=IndexFields
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE
4.
5.
Create a section for your indexing property in which you set the Index parameter to true.
For example:
[MyFirstProperty]
HiddenType=true
[MySecondProperty]
Index=true
[IndexFields]
Index=true
6.
Save IDOL server's configuration file and restart your IDOL server in order to execute your
changes.
NumericDateType fields
You can configure IDOL server to identify fields that contain dates. When these fields are indexed,
IDOL server stores them in a fast lookup table in memory, so it can quickly return the fields.
IDOL server converts dates to numerical values (epoch seconds) and identifies the fields that contain
the numerical date values.
2.
List a process that identifies numerical date fields in the [FieldProcessing] section.
For example:
[FieldProcessing]
Number=2
0=MyFirstProcess
1=NumericDateFields
3.
Create a section for each process that you have listed, in which you create a property for it (a
property is later defined by one or more applicable configuration parameters). Identify the fields
that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyProperty
PropertyFieldCSVs=*/MyField,*/MyOtherField
[NumericDateFields]
Property=NumDate
PropertyFieldCSVs=*/BIRTHDAY,*/STARTDATE
4.
List the property that you have created in the [Properties] section.
For example:
[Properties]
0=MyProperty
1=NumDate
5.
Create a section for the property in which you set the NumericDateType parameter to true. This
enables IDOL server to memory map the associated PropertyFieldCSVs fields, and identify them
as fields that contain date values.
For example:
[NumDate]
NumericDateType=true
6.
Save IDOL server's configuration file and restart IDOL server to execute your changes.
If you now send a query for a specific value that is stored in the BIRTHDAY field, IDOL server will
memory map the range that this value is in, so it can return results more quickly next time a value that
lies in this range is queried.
Example:
http://12.3.4.56:4000/action=Query&FieldText=RANGE{01/01/1980,31/12/1980}:BIRTHDAY
A document's BIRTHDAY field must contain a numerical date value that is between 01/01/1980 and
31/12/1980 for this document to be returned.
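To see what the numerical (epoch seconds) form of such dates looks like, the conversion can be sketched in Python. The sketch assumes the DD/MM/YYYY format used in the example above and midnight UTC. How IDOL server itself treats time zones is not stated here, so the values are illustrative only. The function name to_epoch_seconds is hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_seconds(date_text):
    """Convert a DD/MM/YYYY date to epoch seconds at midnight UTC."""
    parsed = datetime.strptime(date_text, "%d/%m/%Y")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


print(to_epoch_seconds("01/01/1980"))  # 315532800
print(to_epoch_seconds("31/12/1980"))  # 347068800
```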
Numerical fields
You can configure IDOL server to identify fields that contain numerical values. When these fields are
indexed, IDOL server stores them in a fast-look-up table in memory, so it can quickly return the field.
Note that a numerical field can contain a comma-separated list of numbers, each of which will be
stored as a numeric value for this field, for this document.
2.
3.
Create a section for each process that you have listed, in which you create a property for it (a
property is later defined by one or more applicable configuration parameters). Identify the fields
that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyProperty
PropertyFieldCSVs=*/MyField,*/MyOtherField
[PriceFields]
Property=Price
PropertyFieldCSVs=*/PRICE
4.
List the property that you have created in the [Properties] section.
For example:
[Properties]
0=MyProperty
1=Price
5.
Create a section for the property in which you set the NumericType parameter to true. This
enables IDOL server to memory map the associated PropertyFieldCSVs fields.
For example:
[Price]
NumericType=true
6.
Save IDOL server's configuration file and restart IDOL server to execute your changes.
If you now send a query for a specific value that is stored in the PRICE field, IDOL server will memory
map the range that this value is in, so it can return results more quickly next time a value that lies in
this range is queried.
Examples:
http://12.3.4.56:4000/action=Query&FieldText=NRANGE{50,100}:PRICE
A document's PRICE field must contain a number between 50 and 100 (including decimal numbers)
for this document to be returned.
http://12.3.4.56:4000/action=Query&Text=computer&Sort=PRICE:numberincreasing
The results that IDOL server returns for the query are sorted according to the values that their PRICE
fields contain. The result whose PRICE field contains the smallest value is listed first, followed by
results with increasing values in the PRICE field.
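The effect of the two queries above can be mimicked on an in-memory list of documents. The Python sketch below is an illustration and not IDOL server's implementation. It assumes that the range bounds are inclusive, that a document with a comma-separated list of numbers matches if any one of its values lies in the range, and that such a document is sorted by its smallest value. All function names and the sample documents are hypothetical.

```python
def parse_numeric_field(raw):
    """Split a comma-separated numeric field into a list of floats."""
    return [float(part) for part in raw.split(",") if part.strip()]


def nrange(documents, field, low, high):
    """Keep documents with at least one value of `field` in [low, high]."""
    return [
        doc for doc in documents
        if any(low <= value <= high for value in parse_numeric_field(doc[field]))
    ]


def sort_number_increasing(documents, field):
    """Order documents by the smallest numeric value of `field`."""
    return sorted(documents, key=lambda doc: min(parse_numeric_field(doc[field])))


documents = [
    {"title": "A", "PRICE": "120"},
    {"title": "B", "PRICE": "49.99,75"},
    {"title": "C", "PRICE": "99.5"},
]
print([d["title"] for d in nrange(documents, "PRICE", 50, 100)])
print([d["title"] for d in sort_number_increasing(documents, "PRICE")])
```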
FieldCheckType fields
You can configure IDOL server to identify a field contained in a large number of documents whose
entire value is frequently used to restrict results (for example, a field that stores category names).
When this field is indexed, IDOL server stores it in a fast-look-up table in memory, so it can quickly
return the field.
Note: if you set URLAnalysis to true in the [Server] section of your IDOL server configuration file, you
cannot identify a field as a FieldCheckType field, as IDOL server automatically uses the domain it
finds in documents' Reference fields as the FieldCheck value.
2.
3.
Create a section for each process that you have listed, in which you create a property for it (a
property is later defined by one or more applicable configuration parameters). Identify the fields
that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyProperty
PropertyFieldCSVs=*/MyField,*/MyOtherField
[FieldCheckTypeIdentification]
Property=FieldCheck
PropertyFieldCSVs=*/CATEGORY
4.
List the property that you have created in the [Properties] section.
For example:
[Properties]
0=MyProperty
1=FieldCheck
5.
Create a section for the property in which you set the FieldCheckType parameter to true. This
enables IDOL server to memory map the associated PropertyFieldCSVs fields.
For example:
[FieldCheck]
FieldCheckType=true
6.
Save IDOL server's configuration file and restart IDOL server to execute your changes.
When you now use a Query, Suggest or SuggestOnText action to query for results, you can:
use the Combine action parameter to restrict the result output to the most relevant result for
each available FieldCheckType field value (by setting it to FieldCheck).
use the FieldCheck action parameter to restrict the result output to documents whose
FieldCheckType field matches a specific value (this is also available for the
GetQueryTagValues action).
the most relevant of the documents whose Category contains the value Sport
the most relevant of the documents whose Category contains the value Gardening
Reference fields
Reference fields are used to identify documents. Before a document is indexed into IDOL server, you
have to set up a field process that determines which of the fields in a document will be used as its
Reference field (note that a document can have multiple Reference fields).
At index time Reference fields can be used to eliminate duplicate copies of documents (see Using
Reference fields to eliminate duplicate copies of documents during indexing on page 105). At
query time Reference fields can be used to filter results (for example, by using the Combine action
parameter or by specifying references that results must or mustn't match, see Using Reference fields
to filter results at query time on page 272).
Note that if you want to eliminate duplicate document copies and use the Combine action parameter,
you should set up separate Reference fields for these processes (see Simultaneously using
KillDuplicates and Combine on Reference fields on page 295).
2.
3.
Create a section for the process that you have added, in which you create a property for the
process (a property is later defined by one or more applicable configuration parameters). Identify
the fields that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField
[MySecondProcess]
Property=MySecondProperty
PropertyFieldCSVs=*/MyThirdField
[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE,*/URL
4.
List all the properties that you have created in a [Properties] section.
For example:
[Properties]
0=MyFirstProperty
1=MySecondProperty
2=Reference
5.
Create a section for each of the properties and specify appropriate configuration settings for each.
These configuration parameters define the processes that are applied to all the fields (or all
documents that contain the fields) that you have previously associated with the processes.
For example:
[MyFirstProperty]
HiddenType=true
[MySecondProperty]
Index=true
[Reference]
ReferenceType=TRUE
TrimSpaces=TRUE
6.
Save IDOL server's configuration file and start IDOL server. You can now index documents into
IDOL server.
Note:
If you don't set up a field process that identifies Reference fields, IDOL server automatically allocates a
unique number to each document that is indexed. This number will be used as the document's
reference.
2.
In the [FieldProcessing] section add two processes that identify Reference fields (note that you
must set up a field process to identify Reference fields before you start indexing documents into
IDOL server). One of them will be used to eliminate duplicate copies of documents and the other
one will be used for the Combine operation.
For example:
[FieldProcessing]
Number=4
0=MyFirstProcess
1=MySecondProcess
2=SetUpReferenceFields
3=SetUpMoreReferenceFields
3.
Create a section for the processes that you have added, in each of which you create a property for
the respective process (a property is later defined by one or more applicable configuration
parameters). Identify the fields that you want to associate with each process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyFirstProperty
PropertyFieldCSVs=*/MyField,*/MySecondField
[MySecondProcess]
Property=MySecondProperty
PropertyFieldCSVs=*/MyThirdField
[SetUpReferenceFields]
Property=ReferenceFields
PropertyFieldCSVs=*/DREREFERENCE,*/URL
[SetUpMoreReferenceFields]
Property=MoreReferenceFields
PropertyFieldCSVs=*/DRETITLE
4.
List all the properties that you have created in a [Properties] section.
For example:
[Properties]
0=MyFirstProperty
1=MySecondProperty
2=ReferenceFields
3=MoreReferenceFields
5.
Create a section for each of the properties and specify appropriate configuration settings for each.
These configuration parameters define the processes that are applied to all the fields (or all
documents that contain the fields) that you have previously associated with the processes.
For example:
[MyFirstProperty]
HiddenType=true
[MySecondProperty]
Index=true
[ReferenceFields]
ReferenceType=TRUE
TrimSpaces=TRUE
[MoreReferenceFields]
ReferenceType=TRUE
TrimSpaces=TRUE
6.
Once you have indexed documents into IDOL server, you can use, for example, the
*/DREREFERENCE field to eliminate duplicate copies of documents. (IDOL server then automatically
also uses the */URL field for deduplication because it is listed alongside */DREREFERENCE for
PropertyFieldCSVs.) This leaves you free to use the */DRETITLE field for the Combine operation.
Highlight fields
When you execute a Query, Suggest or SuggestOnText action command, you can highlight
sentences or words in the results that are related to the terms in the query (or the terms in the text or
document that you are suggesting on).
IDOL server checks which fields highlighting applies to and then highlights all sentences or words that
are based on the terms in the results that it returns.
2.
3.
Create a section for each process that you have listed, in which you create a property for the
process (a property is later defined by one or more applicable configuration parameters). Identify
the fields that you want to associate with the process.
Note: the properties that you create must not have the same name as processes.
For example:
[MyFirstProcess]
Property=MyProperty
PropertyFieldCSVs=*/MyField,*/MyOtherField
[HighlightFields]
Property=Highlight
PropertyFieldCSVs=*/DRETITLE,*/DRECONTENT
4.
List the property that you have created in the [Properties] section.
For example:
[Properties]
0=MyProperty
1=Highlight
5.
Create a section for the property in which you set the HighlightType parameter to true. This
enables the highlighting of all matched terms that are contained in the associated
PropertyFieldCSVs fields.
For example:
[Highlight]
HighlightType=true
6.
Save and close IDOL server's configuration file and restart IDOL server to execute your changes.
Agentboolean fields
If you are upgrading to IDOL server from legacy technologies that use Boolean agents (a Boolean or
Proximity expression) to categorize documents, you can store these agents in agentboolean IDOL
server fields. You can then query IDOL server with text and an agentboolean field to return categories
that this text matches.
categories that do not contain a MyABField field but match the query text in an Index
field (for example a DRECONTENT field).
categories that match the query text in an Index field (for example a DRECONTENT
field), and have a MyABField field which contains a Boolean or Proximity expression
that matches The cat sat on the mat (for example, cat AND mat, cat OR mat, cat
BEFORE mat and cat DNEAR1 sat could return The cat sat on the mat, therefore
categories that contain any of these Boolean/Proximity expressions in a MyABField field
would be returned).
Categories whose MyABField fields contain, for example, cat AND mat AND dog or
mat BEFORE dog would not be returned.
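The matching behaviour described above can be illustrated with a deliberately simplified Python sketch. It is not IDOL server's expression parser: it handles only chains of AND, chains of OR, and a single BEFORE or DNEARn between two terms, and it ignores stemming and stoplists. It also assumes that A DNEARn B means that B occurs within n words after A. The function name matches is hypothetical.

```python
import re


def matches(expression, text):
    """Return True if a simple Boolean/Proximity expression matches the text."""
    words = re.findall(r"\w+", text.lower())
    positions = {}
    for index, word in enumerate(words):
        positions.setdefault(word, []).append(index)

    tokens = expression.split()
    terms = [token.lower() for token in tokens[0::2]]
    operators = [token.upper() for token in tokens[1::2]]

    if all(op == "AND" for op in operators):
        return all(term in positions for term in terms)
    if all(op == "OR" for op in operators):
        return any(term in positions for term in terms)
    if len(operators) == 1 and len(terms) == 2:
        first, second = terms
        pairs = [(i, j) for i in positions.get(first, [])
                 for j in positions.get(second, [])]
        if operators[0] == "BEFORE":
            return any(i < j for i, j in pairs)
        near = re.fullmatch(r"DNEAR(\d+)", operators[0])
        if near:
            # Assumption: the second term must follow within n words.
            return any(0 < j - i <= int(near.group(1)) for i, j in pairs)
    raise ValueError(f"unsupported expression: {expression}")


text = "The cat sat on the mat"
for expr in ["cat AND mat", "cat OR mat", "cat BEFORE mat", "cat DNEAR1 sat",
             "cat AND mat AND dog", "mat BEFORE dog"]:
    print(expr, "->", matches(expr, text))
```

The first four expressions match the sample text and the last two do not, which mirrors the examples given above.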
Note: if you are always storing the Boolean agents in the same field, you can use the
AgentBooleanCacheField configuration parameter to load this field into memory, so that
agentboolean queries which use this field can be executed more quickly.
Tip:
You can use ACI, Alert and Cat tasks to automatically match documents that IDOL server receives
against agentboolean categories, automatically alert users to documents that match specific
categories and automatically categorize documents (see Processing data before indexing it on
page 73).
Meta fields
Meta fields are fields that IDOL server creates for documents at index time in order to display
information about the documents when they are returned as results for a query. Some of a document's
meta fields are always displayed when IDOL server returns this document as a query result. You can
display all of a document's meta fields by adding XMLMeta=true to your query.
The following meta fields are displayed for results:
<autn:baseid>
If the document has multiple sections, this is the ID of the document's first section. If the
document is not sectioned, this is the same as the document's ID.
<autn:content>
The document's text content.
<autn:database>
The IDOL server database in which the document is stored.
<autn:date>
The date (in epoch seconds) the document was created. This date is read from the field
that has been identified by the DateType parameter in IDOL server's configuration file. If
no field has been identified, the date the document was indexed is used instead.
<autn:expiredate>
The date (in epoch seconds) the document will expire. This date is read from the field
that has been identified by the ExpireDateType parameter in IDOL server's
configuration file. When a document expires, it is deleted from IDOL server or moved to a
different database (depending on what ExpireIntoDatabase has been set to in IDOL
server's configuration file).
<autn:id>
The document's ID. A document's ID is assigned to it at index time. If IDOL server is
compacted, the IDs of documents change.
<autn:language>
<autn:languageencoding>
<autn:languagetype>
The language, encoding and language type associated with the document. The
document's language type is read from the field that has been identified by the
LanguageType parameter in IDOL server's configuration file. The language and
encoding of the document are read from the Language and Encoding parameters that
have been set for this language type in the configuration file.
If no field from which the language type can be read has been identified, the
DefaultLanguageType that has been set in the configuration file is used instead,
unless Automatic Language Detection is enabled, or the document has been submitted
to IDOL server with an index command that sets a specific language type for the
document.
<autn:links>
A list of stemmed terms that are contained both in the query and in the result document.
<autn:reference>
The document's reference. This is read from the field that has been identified by the
ReferenceType parameter in IDOL server's configuration file. If no field has been
identified, IDOL server automatically generates a reference for the document at index
time.
<autn:section>
The number of sections the document has been split up into at index time.
<autn:title>
The document's title. This is read from the field that has been identified by the TitleType
parameter in IDOL server's configuration file. If no field has been identified, the
document is not given a title.
<autn:weight>
The percentage relevance that the document has to the query.
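The meta fields listed above can be read from a result with any XML parser. The following is a minimal sketch in Python, not a description of IDOL server's client libraries. The sample fragment, the <autn:hit> wrapper element, the field values and the namespace URI bound to the autn prefix are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: the namespace URI that the "autn" prefix is bound to in the response.
AUTN = {"autn": "http://schemas.autonomy.com/aci/"}

# Hypothetical result fragment, invented for this example.
SAMPLE = """<autn:hit xmlns:autn="http://schemas.autonomy.com/aci/">
  <autn:reference>http://news.newssite.com/index.html</autn:reference>
  <autn:id>9016</autn:id>
  <autn:section>0</autn:section>
  <autn:weight>87.50</autn:weight>
  <autn:database>News</autn:database>
  <autn:date>1104537600</autn:date>
</autn:hit>"""


def read_meta_fields(hit_xml):
    """Return a dict with a few meta fields of one result."""
    hit = ET.fromstring(hit_xml)

    def text(tag):
        node = hit.find("autn:" + tag, AUTN)
        return node.text if node is not None else None

    date = text("date")
    return {
        "reference": text("reference"),
        "id": int(text("id")),
        "weight": float(text("weight")),  # percentage relevance to the query
        "database": text("database"),
        # <autn:date> is given in epoch seconds.
        "date": datetime.fromtimestamp(int(date), tz=timezone.utc) if date else None,
    }


meta = read_meta_fields(SAMPLE)
print(meta["reference"], meta["weight"], meta["date"].isoformat())
```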
32. Languages
IDOL server is based on probabilistic modeling and therefore does not require any form of language
dependent parsing, dictionaries or translation modules.
Treating words as abstract symbols of meaning allows Autonomy's technology to derive understanding
through the context in which symbols occur rather than a rigid definition of grammar. Slang and other
variations in language do not confuse the software.
Because it builds up a statistical understanding of the patterns that occur in content, IDOL server
can be trained on any language. The more information IDOL server is given about a particular type of
information (for example, legal terms, pharmaceutical developments, technology and so on), the more
understanding it gains of those topics.
A new language can be thought of as simply another type of information, for which IDOL server needs
enough material to learn from. Therefore, it is possible to mix more than one language in IDOL server
as long as the amounts for each language are sufficient to build its understanding.
The choice of language does not compromise the accuracy of the concepts extracted by IDOL server.
The underlying algorithm is the same regardless of the language used.
Cross-lingual systems
IDOL server can be used to set up cross-lingual systems. This allows you to produce
multilingual results for queries or to restrict results to documents in a specific language or
encoding. For example, an English query may return information both in English and Spanish.
While Autonomy's technology is language independent, it can be beneficial to use language
dependent features in order to optimize IDOL server's ability to match concepts irrespective of their
appearance in text. Autonomy therefore provides the following features:
Stemming
In many languages, some words share a common morphological root. Autonomy provides stemming
algorithms that reduce words to this form. This is useful because it allows concepts to be
matched regardless of the grammatical use of words. In English for example, the words "help",
"helpful", "helping" and "helped" can all be stripped down to their stem "help" without significant
loss of meaning.
Autonomy provides as standard a set of stemming algorithms for the most commonly used
languages. Stemming is applied after stopwords have been discarded both at index time (when
content is stored in IDOL server) and at query time (query text is stopped and stemmed before
it is matched).
Stoplists
Every language has words that do not carry much significant meaning. In grammatical terms
these are normally prepositions, conjunctions, auxiliary verbs and so on (for example, words
such as "the", "a", "and", "to" in English). These words can be safely ignored when processing
content.
Autonomy provides as standard a set of stoplists for the most commonly used languages.
Multiple encodings
Autonomy supports multiple encodings for languages such as Greek and Russian. Different
encodings can be used interchangeably which means that it does not matter which encoding a
language is given in. This makes it, for example, possible to query in one recognized encoding
for a language and receive results that are in other encodings.
Transliteration schemes
Transliteration is the ability to represent letters that do not belong to the Latin alphabet or words
that comprise accented letters with the corresponding characters of another alphabet. This
makes familiarity with the accents and special characters of different languages unnecessary.
Canonicalization of characters
Some encodings have more than one way of representing a character. The Japanese katakana
script, for example, can be written in full width or half width characters. Regardless of its width
the character in itself carries the same meaning.
Autonomy's software infrastructure uses canonicalization to ensure that all character forms are
treated equally through automatic conversion to an internationally recognized canonical form.
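The order of these steps (canonicalize characters, discard stopwords, then stem) can be sketched as follows. This is a toy illustration only: the suffix rules cover just the "help" example from the text and are not Autonomy's stemming algorithm, the stopword set holds only the four English words quoted above, and the use of Unicode NFKC as the canonical form is an assumption, since the guide does not name the form that IDOL server uses.

```python
import unicodedata

STOPWORDS = {"the", "a", "and", "to"}  # the English examples quoted in the text
SUFFIXES = ("ful", "ing", "ed")        # toy rules for "helpful", "helping", "helped"


def canonicalize(text):
    # Assumption: NFKC stands in for the canonical form; it folds full-width
    # and half-width character variants into a single representation.
    return unicodedata.normalize("NFKC", text)


def stem(word):
    # Strip the first matching suffix, keeping at least three characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process(text):
    words = canonicalize(text).lower().split()
    kept = [w for w in words if w not in STOPWORDS]  # stopwords are discarded first
    return [stem(w) for w in kept]                   # stemming is applied afterwards


print(process("The helpful and helping helped help"))
```

The same function would be applied both to content at index time and to query text at query time, so that both sides are reduced to the same terms before matching.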
1.
Check that the IDOL server configuration file contains the languages you want to use (see
Checking which languages are set up in IDOL server on page 311).
2.
If the configuration file does not contain all the languages you want to use, you need to add the
missing languages (see Defining language types in IDOL server's configuration file on
page 312), and set up a field process that enables IDOL server to associate these languages with
documents (see Configuring IDOL server to associate language types with documents on
page 314).
3.
Check the documents that you want to index into IDOL server:
If your documents (or some of your documents) do not contain fields from which IDOL server
can read their language type (see Adding language type fields to documents on
page 318), IDOL server assumes that the default language type applies to the documents
(see Defining a default language type in IDOL server's configuration file on page 319).
If you don't want the default language type to be associated with your documents, you need
to enable automatic language detection (see Enabling Automatic Language Detection on
page 320).
Alternatively, you can manually index your documents into IDOL server, adding the language
type of the documents to each index command (see Index commands on page 84). In this
case you have to index your documents in batches, where each batch must have the same
language type (that is, language and encoding).
Documents that contain fields from which IDOL server can read their language type are
automatically processed correctly (provided you have added any missing languages to IDOL
server's configuration file in step 2).
If the language type of a query (that is, the query's language and encoding) is not the default
language type that you have defined in IDOL server's configuration file, you need to include the
query's language type in the query string (see Specifying the language type of your query on
page 321).
If you want to return results in a specific encoding for your query, you need to include the
OutputEncoding parameter in your query (see Converting results to a specific encoding on
page 322). Note that you can only return encodings that are compatible with the query's language.
If you want to return documents in multiple languages for your query, you need to include the
AnyLanguage parameter in your query (see Returning documents in multiple languages for
your query on page 323).
If you want to return documents in a specific language for your query, you need to include the
AnyLanguage and MatchLanguage parameters in your query (see Returning documents in a
specific language for your query on page 324).
2.
Find the [LanguageTypes] section and list the language types that you want IDOL server to be
able to process (note that you must use ASCII characters when specifying a language type).
For example:
[LanguageTypes]
LanguageDirectory=C:\IDOLserver\IDOL\langfiles
0=englishASCII
1=englishUTF8
2=afrikaansASCII
3=afrikaansUTF8
4=albanianASCII
5=albanianUTF8
Note: resource files (for example, stoplists) that IDOL server uses when processing languages
are stored in the specified LanguageDirectory.
3.
For each of the language types that you have listed, create a section with the same name. In this
section, specify appropriate settings that determine how IDOL server handles this language type.
For details on the configuration settings that you can use, please refer to IDOL server's online help
(see Displaying help on configuration settings on page 389).
For example:
[englishASCII]
LanguageCode=1
Language=ENGLISH
Encoding=ASCII
Stoplist=english.dat
IndexNumbers=1
[englishUTF8]
LanguageCode=2
Language=ENGLISH
Encoding=UTF8
Stoplist=english.dat
IndexNumbers=1
[afrikaansASCII]
LanguageCode=3
Language=AFRIKAANS
Encoding=ASCII
IndexNumbers=1
[afrikaansUTF8]
LanguageCode=4
Language=AFRIKAANS
Encoding=UTF8
IndexNumbers=1
[albanianASCII]
LanguageCode=5
Language=ALBANIAN
Encoding=ASCII
IndexNumbers=0
Note: in IDOL server the StripLanguage and CharConv settings have been deprecated (the
functionality has been automated according to language and encoding).
4.
5.
You can now configure IDOL server to associate the language types you have defined with
documents (see Configuring IDOL server to associate language types with documents on
page 314).
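Steps 2 and 3 require every language type listed in [LanguageTypes] to have a section of the same name. The following Python sketch checks a configuration for listed language types that lack a section. It is an illustration under assumptions: it uses Python's configparser rather than IDOL server's own parser, compares section names case-sensitively, and the sample text is a shortened copy of the example above with one section deliberately left out.

```python
import configparser

# Shortened sample based on the example above; [afrikaansASCII] is deliberately missing.
SAMPLE = r"""
[LanguageTypes]
LanguageDirectory=C:\IDOLserver\IDOL\langfiles
0=englishASCII
1=englishUTF8
2=afrikaansASCII

[englishASCII]
LanguageCode=1
Language=ENGLISH
Encoding=ASCII

[englishUTF8]
LanguageCode=2
Language=ENGLISH
Encoding=UTF8
"""


def missing_language_sections(config_text):
    """Return listed language types that have no section of the same name."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(config_text)
    # Numbered entries (0=, 1=, ...) are the language types; other keys are settings.
    listed = [value for key, value in parser.items("LanguageTypes") if key.isdigit()]
    return [name for name in listed if not parser.has_section(name)]


print(missing_language_sections(SAMPLE))
```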
If all the documents that you want to index into IDOL server contain a field that contains the
language type, you can configure your IDOL server as follows:
1.
2.
Create a section for the process, in which you create a Property for the process and
identify the field that you want to apply the process to.
For example:
[LookForLanguage]
Property=SetLanguage
PropertyFieldCSVs=*/DRELANGUAGE,*/myLanguageType
3.
List the Property that you have created in the [Properties] section.
For example:
[Properties]
0=SetLanguage
4.
Create a section for this property, in which you set the LanguageType parameter to true
to map the values of the */DRELANGUAGE fields to the equivalent language type in the
[LanguageTypes] section.
For example:
[SetLanguage]
LanguageType=true
[LanguageTypes]
0=russianISO
1=russianKOI8
2=russianUTF8
[russianISO]
LanguageCode=1
Language=Russian
Encoding=CYRILLIC_ISO
[russianKOI8]
LanguageCode=2
Language=Russian
Encoding=CYRILLIC_KOI8
[russianUTF8]
LanguageCode=3
Language=Russian
Encoding=UTF8
5.
6.
You can now index documents into IDOL server (see Index commands on page 84).
If all the documents that you want to index into IDOL server contain a field that contains data that
can be used to identify the language type, you can configure your IDOL server as follows:
1.
Use the [FieldProcessing] section of IDOL server's configuration file to define each
language property that you want IDOL server to be able to detect.
For example:
[FieldProcessing]
Number=6
0=DetectArabic
1=DetectArabicISO
2=DetectEnglish
3=DetectChSimplified
4=DetectChTraditional
5=DetectFrench
2.
For each of the languages that you have defined in the [FieldProcessing] section, you
need to define a section with the name of the respective language type. In this section
you can then specify the fields that IDOL server should look for and the values that those
fields must have in order for the document to be recognized as a particular language
type.
For example:
[DetectArabic]
Property=SetArabicProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=arabic
[DetectArabicISO]
Property=SetArabicISOProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=arabicISO,ISOarab*
[DetectEnglish]
Property=SetEnglishProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=*eng*,uk,*british
[DetectChSimplified]
Property=SetChSimplifiedProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=*ChSimp*,ChineseSimp*
[DetectChTraditional]
Property=SetChTraditionalProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=*ChTrad*,ChineseTrad*
[DetectFrench]
Property=SetFrenchProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=*fre*,fran*
3.
For each Property that you have defined in the [FieldProcessing] subsections, you
need to define a section whose name is the value of that Property. In this section
you can then specify the language type (which you also need to list in IDOL server's
[LanguageTypes] section, where you define how you want IDOL server to handle the
individual languages).
For example:
[SetArabicProperty]
LanguageType=Arabic
HiddenType=TRUE
[SetArabicISOProperty]
LanguageType=ArabicISO
HiddenType=TRUE
[SetEnglishProperty]
LanguageType=English
HiddenType=TRUE
[SetChSimplifiedProperty]
LanguageType=ChSimplified
HiddenType=TRUE
[SetChTraditionalProperty]
LanguageType=ChTraditional
HiddenType=TRUE
[SetFrenchProperty]
LanguageType=French
HiddenType=TRUE
4.
5.
You can now index documents into IDOL server (see Index commands on page 84).
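To see which field values the PropertyMatch patterns in step 2 would catch, the matching can be approximated outside IDOL server. The Python sketch below copies the patterns from the example and tests a value against them in the order of the [FieldProcessing] list. Two points are assumptions rather than documented behavior: that matching ignores case, and that the first matching process wins.

```python
from fnmatch import fnmatchcase

# Language type -> PropertyMatch patterns, copied from the example in step 2,
# in the order in which the processes are listed in [FieldProcessing].
RULES = [
    ("Arabic", ["arabic"]),
    ("ArabicISO", ["arabicISO", "ISOarab*"]),
    ("English", ["*eng*", "uk", "*british"]),
    ("ChSimplified", ["*ChSimp*", "ChineseSimp*"]),
    ("ChTraditional", ["*ChTrad*", "ChineseTrad*"]),
    ("French", ["*fre*", "fran*"]),
]


def detect_language_type(field_value):
    """Return the first language type whose pattern matches, or None."""
    value = field_value.lower()  # assumption: matching is case-insensitive
    for language_type, patterns in RULES:
        if any(fnmatchcase(value, pattern.lower()) for pattern in patterns):
            return language_type  # assumption: the first match wins
    return None


for sample in ("arabicISO", "British English", "francais", "klingon"):
    print(sample, "->", detect_language_type(sample))
```

A value that matches no pattern yields None here; in IDOL server such a document would fall back to the default language type or to automatic language detection, as described elsewhere in this chapter.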
2.
Use the FixedField<N> and FixedFieldValue<N> settings to specify the name and the value of
the field that you want to add to documents the connector retrieves.
For example:
FixedField0=DRELanguage
FixedFieldValue0=englishASCII
Note: if you add these settings to a connector's fetch job section, they only apply to the fetch job
defined in that section. If you add the settings to a connector's default section, they apply to all
fetch jobs.
3.
2.
Find the [LanguageTypes] section and list the language type that you want IDOL server to
associate with any document that doesn't contain a language type field (note that if you are using
automatic language detection, IDOL server uses this to determine the language type of
documents and not the default language type).
For example:
[LanguageTypes]
DefaultLanguageType=englishASCII
LanguageDirectory=C:\IDOLserver\IDOL\langfiles
Note: resource files (for example, stoplists) that IDOL server uses when processing languages
are stored in the specified LanguageDirectory.
3.
For the default language type that you have listed, create a section with the same name. In this
section, specify appropriate settings that determine how IDOL server handles this language type.
For details on the configuration settings that you can use, please refer to IDOL server's online help
(see Displaying help on configuration settings on page 389).
For example:
[englishASCII]
LanguageCode=1
Language=ENGLISH
Encoding=ASCII
Stoplist=english.dat
IndexNumbers=1
Note: in IDOL server the StripLanguage and CharConv settings have been deprecated (the
functionality has been automated according to language and encoding).
4.
5.
2.
Find the [Server] section and add the following setting to it:
AutoDetectLanguagesAtIndex=true
3.
4.
5.
Note: if you have Automatic Language Detection enabled and a field process set up that reads a
document's language from one of its fields, IDOL server uses the field process rather than autodetection to determine the document's language and encoding.
For example:
This query uses the language and encoding that has been specified for the DefaultLanguageType, so
you can send it to IDOL server without adding the LanguageType parameter:
http://12.3.4.56:4000/action=Query&Text=The Bayes theory of probability
This query uses the language and encoding that has been specified for the GermanASCII language
type:
http://12.3.4.56:4000/action=Query&Text=Einsteins Relativitätstheorie&LanguageType=GermanASCII
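When queries like these are built by a program, the query text has to be escaped, and it has to be sent in the encoding of the language type that the query names. The following Python sketch shows one way to do that. The function name and the text_codec parameter are hypothetical helpers for this example; the codec passed in is a Python codec name chosen by the caller to match the language type (cp1252 is used for GermanASCII here as an assumption based on the encoding list later in this chapter).

```python
from urllib.parse import quote


def build_query_url(host, port, text, language_type=None, text_codec="utf-8"):
    """Build a Query action URL in the form used by the examples above."""
    # Percent-escape the query text using the byte encoding of the query's language type.
    params = [("action", "Query"), ("Text", quote(text, safe="", encoding=text_codec))]
    if language_type:
        # Omit LanguageType to let IDOL server use the DefaultLanguageType.
        params.append(("LanguageType", quote(language_type, safe="")))
    return "http://{}:{}/{}".format(host, port, "&".join(k + "=" + v for k, v in params))


default_url = build_query_url("12.3.4.56", 4000, "The Bayes theory of probability")
german_url = build_query_url(
    "12.3.4.56", 4000, "Einsteins Relativitätstheorie",
    language_type="GermanASCII", text_codec="cp1252",
)
print(default_url)
print(german_url)
```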
Text queries
Queries that contain some form of query text (for example, Query, SuggestOnText,
Summarize and so on).
Text-free queries
Queries that do not contain any query text (for example, Suggest, List, GetContent and so
on).
Text queries
When you send a query action to IDOL server, it returns by default results that use the same language
and encoding as the query text (that is the language that has been specified for the LanguageType
that is sent with the query, or for the DefaultLanguageType if no LanguageType is sent with the
query).
If you want a query action to return results in a specific encoding, you must add the OutputEncoding
parameter to your query. This parameter allows you to convert the results of a query to any encoding that
is compatible with the query's language (if you specify an encoding that is not compatible with the
query's language, IDOL server indicates this in the results).
For example:
http://12.3.4.56:4000/action=Query&Text=Neurologia i Neurochirurgia&LanguageType=PolishEASTERNEUROPEAN&OutputEncoding=EASTERNEUROPEAN_ISO
In this example, IDOL server will convert all query results to EASTERNEUROPEAN_ISO.
Text-free queries
Query actions that do not contain any query text by default return results in the OutputEncoding that
has been specified for the DefaultLanguageType. If any of the query's results is not compatible with
this encoding, IDOL server indicates this in the results.
If you want a query action to return results in a specific encoding, you can add the OutputEncoding
parameter to your query. IDOL server converts all results to this encoding, provided they are compatible
with it. If any of the query's results is not compatible with this encoding, IDOL server returns an
appropriate message.
For example:
http://12.3.4.56:4000/action=Suggest&ID=9016&OutputEncoding=EASTERNEUROPEAN_ISO
In this example, IDOL server will convert all query results to EASTERNEUROPEAN_ISO.
Note that the query will only return documents in multiple languages if they contain terms that match
terms in the query (for example, query text that contains the term "Baghdad" might return documents in
English, French, German and so on).
All IDOL server Encoding settings can alternatively be set to UTF8 or UCS2.
Language             Script       Set Language     For encoding                 Set Encoding
                                  parameter to                                  parameter to
Afrikaans            Latin        AFRIKAANS        windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Albanian             Latin        ALBANIAN         windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Arabic               Arabic       ARABIC           windows-CP1256               ARABIC
                                                   iso-8859-6                   ARABIC_ISO
                                                   UTF-8                        UTF8
Azeri                Cyrillic     AZERI            windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Basque               Latin        BASQUE           windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Belarussian          Cyrillic     BELARUSSIAN      windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Breton               Latin        BRETON           windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Bulgarian            Cyrillic     BULGARIAN        windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Catalan              Latin        CATALAN          windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Chinese Traditional  Big-5        CHINESE          Big-5                        CHINESETRADITIONAL
                                                   UTF-8                        UTF8
Chinese Simplified   GB2312-80    CHINESE          gb2312                       CHINESESIMPLIFIED
                                                   UTF-8                        UTF8
Croatian             Latin        CROATIAN         windows-CP1250               EASTERNEUROPEAN
                                                   iso-8859-2                   EASTERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Czech                Latin        CZECH            windows-CP1250               EASTERNEUROPEAN
                                                   iso-8859-2                   EASTERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Danish*              Latin        DANISH           windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Dutch*               Latin        DUTCH            windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
English*             Latin        ENGLISH          windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Estonian             Latin        ESTONIAN         windows-CP1257               NORTHERNEUROPEAN
                                                   iso-8859-4                   NORTHERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Faroese              Latin        FAROESE          windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Finnish              Latin        FINNISH          windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
French*              Latin        FRENCH           windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Gaelic               Latin        GAELIC           windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Galician*            Latin        GALICIAN         windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
German*              Latin        GERMAN           windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Greek*               Greek        GREEK            windows-CP1253               GREEK
                                                   iso-8859-7                   GREEK_ISO
                                                   UTF-8                        UTF8
Greenlandic          Latin        GREENLANDIC      windows-CP1257               NORTHERNEUROPEAN
                                                   iso-8859-4                   NORTHERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Hebrew               Hebrew       HEBREW           windows-CP1255               HEBREW
                                                   iso-8859-8                   HEBREW_ISO
                                                   UTF-8                        UTF8
Hindi                UTF8         HINDI            UTF-8                        UTF8
Hungarian            Latin        HUNGARIAN        windows-CP1250               EASTERNEUROPEAN
                                                   iso-8859-2                   EASTERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Icelandic            Latin        ICELANDIC        windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Indonesian           Latin        INDONESIAN       windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Italian*             Latin        ITALIAN          windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Japanese**           Japanese     JAPANESE         Shift-JIS                    SHIFTJIS
                                                   EUC                          EUC
                                                   JIS                          JIS
                                                   UTF-8                        UTF8
Kazakh               Cyrillic     KAZAKH           windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Korean**             Hangul       KOREAN           KS C 5601-1987               KOREAN
                                                   KS C 5601-1992               KOREAN
                                                   UTF-8                        UTF8
Kurdish              Latin        KURDISH          windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Kyrgyz               Cyrillic     KYRGYZ           windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Lappish              Latin        LAPPISH          windows-CP1257               NORTHERNEUROPEAN
                                                   iso-8859-4                   NORTHERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Latin                Latin        LATIN            windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Latvian              Latin        LATVIAN          windows-CP1257               NORTHERNEUROPEAN
                                                   iso-8859-4                   NORTHERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Lithuanian           Latin        LITHUANIAN       windows-CP1257               NORTHERNEUROPEAN
                                                   iso-8859-4                   NORTHERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Luxembourgish        Latin        LUXEMBOURGISH    windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Macedonian           Cyrillic     MACEDONIAN       windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Malay                Latin        MALAY            windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Maltese              UTF8         MALTESE          UTF-8                        UTF8
Maori                Latin1       MAORI            windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Mongolian            Cyrillic     MONGOLIAN        windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Norwegian*           Latin        NORWEGIAN        windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Persian              UTF8         PERSIAN          UTF-8                        UTF8
Polish               Latin        POLISH           windows-CP1250               EASTERNEUROPEAN
                                                   iso-8859-2                   EASTERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Portuguese*          Latin        PORTUGUESE       windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Romanian             Latin        ROMANIAN         windows-CP1250               EASTERNEUROPEAN
                                                   iso-8859-2                   EASTERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Russian*             Cyrillic     RUSSIAN          windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Serbian              Cyrillic     SERBIAN          windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Slovak               Latin        SLOVAK           windows-CP1250               EASTERNEUROPEAN
                                                   iso-8859-2                   EASTERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Slovenian            Latin        SLOVENIAN        windows-CP1250               EASTERNEUROPEAN
                                                   iso-8859-2                   EASTERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Somali               Latin        SOMALI           windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Sorbian              Latin        SORBIAN          windows-CP1250               EASTERNEUROPEAN
                                                   iso-8859-2                   EASTERNEUROPEAN_ISO
                                                   UTF-8                        UTF8
Spanish*             Latin        SPANISH          windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Swahili              Latin        SWAHILI          windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Swedish*             Latin        SWEDISH          windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Tagalog              Latin        TAGALOG          windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Tatar                Cyrillic     TATAR            windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Thai                 Thai         THAI             windows-CP874 / iso-8859-11  THAI
                                                   UTF-8                        UTF8
Turkish              Latin        TURKISH          windows-CP1254 / iso-8859-9  TURKISH
                                                   UTF-8                        UTF8
Ukrainian            Cyrillic     UKRAINIAN        windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Urdu                 UTF8         URDU             UTF-8                        UTF8
Uzbek                Cyrillic     UZBEK            windows-CP1251               CYRILLIC
                                                   KOI8-R                       CYRILLIC_KOI8
                                                   iso-8859-5                   CYRILLIC_ISO
                                                   UTF-8                        UTF8
Valencian            Latin        VALENCIAN        windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
Vietnamese           Vietnamese   VIETNAMESE       windows-CP1258               VIETNAMESE
                                                   UTF-8                        UTF8
Welsh                Latin        WELSH            windows-CP1252 / iso-8859-1  ASCII
                                                   UTF-8                        UTF8
* A stemming algorithm is available for this language and is applied by default. If you do not want to
apply stemming to this language, set Stemming to false for this language in the configuration file.
** The language has stemming embedded in sentence breaking.
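The relationship between the Encoding settings and the underlying character sets can be tried out with any library that supports those character sets. The Python sketch below maps a few of the settings listed above to Python codec names and re-encodes a Russian word from KOI8-R to iso-8859-5. The mapping and the convert function are assumptions for illustration; they show what converting between compatible encodings means, not how IDOL server implements it.

```python
# Encoding setting -> Python codec name. The codec names are Python's, not IDOL server's.
CODECS = {
    "ASCII": "cp1252",
    "CYRILLIC": "cp1251",
    "CYRILLIC_KOI8": "koi8_r",
    "CYRILLIC_ISO": "iso8859_5",
    "EASTERNEUROPEAN": "cp1250",
    "EASTERNEUROPEAN_ISO": "iso8859_2",
    "GREEK": "cp1253",
    "GREEK_ISO": "iso8859_7",
    "UTF8": "utf-8",
}


def convert(data, source, target):
    """Re-encode bytes from one Encoding setting to another."""
    text = data.decode(CODECS[source])
    # Raises UnicodeEncodeError if the target encoding cannot represent the text,
    # which corresponds to an encoding that is not compatible with the language.
    return text.encode(CODECS[target])


word = "привет"
koi8_bytes = word.encode("koi8_r")
iso_bytes = convert(koi8_bytes, "CYRILLIC_KOI8", "CYRILLIC_ISO")
print(koi8_bytes != iso_bytes, iso_bytes.decode("iso8859_5") == word)
```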
30
Arabic
30
Chinese
30
Hebrew
30
Korean
30
Japanese
30
Thai
30
German
40
Greek
Transliteration:
Japanese
Chinese
Full width 0-9, A-Z, a-z to single byte 0-9, A-Z, a-z
Greek
Spanish
Portuguese
Transliteration:
Western
European
=a
=aa
=c
=e
=i
=o
=oe
=u
(oe)=oe
=ae
=ss
=nh
=y
=d
=th
German
Scandinavian
=ue
Russian
=oe
=oe
=ue
Japanese
Required files:
NT                       UNIX
japanesebreaking.dll     japanesebreaking.so
\dic\jtag.attr           /dic/system/jtag.attr
dic\JTAG.hash            /dic/system/jtag.hash
dic\jtag.id              /dic/system/jtag.id
\dic\jtag.mrph           /dic/system/jtag.mrph
dic\JTAG.offset          /dic/system/jtag.offset
\dic\jtag.table          /dic/system/jtag.table
dic\JTAG.trie            /dic/system/jtag.trie
jtag.ini                 jtag.ini
jtag.dll                 libcodeconv.so
jtag_at.dll
Traditional Chinese
Required files:
NT                       UNIX
chinesebreaking.dll      chinesebreaking.so
big5togb.txt             big5togb.txt
wordlist.txt             wordlist.txt.so
Simplified Chinese
Required files:
NT                       UNIX
chinesebreaking.dll      chinesebreaking.so
big5togb.txt             big5togb.txt
wordlist.txt             wordlist.txt
Thai
Required files:
NT                       UNIX
thaibreaking.dll         thaibreaking.so
thaidict.txt             thaidict.txt
Korean
Required files:
NT                       UNIX
koreanbreaking.dll       koreanbreaking.so
main.dat                 main.dat
prob.dat                 prob.dat
main.fst                 main.fst
prob.fst                 prob.fst
pos.nam                  pos.nam
tag.nam                  tag.nam
tagout.nam               tagout.nam
connection.txt           connection.txt
stopposnam.txt           stopposnam.txt
tagname.txt              tagname.txt
For all operations, IDOL server will recognize words as stopwords irrespective of the encoding they
are given in. For example, in Russian you could list a stopword in the KOI8 encoding in the stoplist file
and it would be recognized if it occurred in a document in UTF8.
Note:
For each encoding that you want to use, you must create a section in your stoplist file. Name the
section after the encoding setting that you are using (the settings are listed in the "Set encoding
parameter to" column of the "Encoding settings for supported languages" list). Words can be in
upper or lower case, and can be separated by spaces or new lines.
For example:
[cyrillic_koi8]
[cyrillic_iso]
s
In this example, a Russian stoplist contains ten words, of which five are in the CYRILLIC_KOI8 encoding
and five are in the CYRILLIC_ISO encoding.
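The layout of such a stoplist file can be illustrated with a short parser. The Python sketch below reads a file whose sections are named after encoding settings, decodes each section with the matching character set, and collects the words in lower case, so that a word is recognized whichever section it was listed in. The parser, the codec mapping and the five sample words are assumptions for illustration and are not taken from a stoplist that ships with IDOL server.

```python
import re

# Section name -> Python codec name (an assumption; only two settings are covered).
CODECS = {"cyrillic_koi8": "koi8_r", "cyrillic_iso": "iso8859_5"}


def parse_stoplist(raw):
    """Return the set of stopwords from raw stoplist bytes, decoded per section."""
    words = set()
    codec = None
    for line in raw.split(b"\n"):
        line = line.strip()
        header = re.fullmatch(rb"\[(\w+)\]", line)
        if header:
            # A section header selects the encoding of the lines that follow.
            codec = CODECS[header.group(1).decode("ascii").lower()]
        elif line and codec:
            # Words may be upper or lower case, separated by spaces or new lines.
            words.update(word.lower() for word in line.decode(codec).split())
    return words


# Sample file: three words stored in KOI8-R and two in iso-8859-5.
raw_file = (
    b"[cyrillic_koi8]\n" + "и в НЕ".encode("koi8_r") + b"\n"
    + b"[cyrillic_iso]\n" + "на\nпо".encode("iso8859_5") + b"\n"
)
stopwords = parse_stoplist(raw_file)
print(sorted(stopwords))
```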
Administration
expire documents
For UNIX:
Use the Stop.sh stop script to stop IDOL server and then start it again using the Start.sh script
(the scripts are supplied in the IDOL server installer).
http://<host>:<port>/DREDELETEREF?docs=<document references>&field=<fields>&DREdbname=<database name>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the [Server] section of IDOL server's
configuration file.
<document references>
Enter the escaped references of the documents that you want to delete. If you want to specify
multiple references, you must separate them with plus symbols (there must be no space before or
after a plus symbol).
<fields>
This parameter is optional.
Allows you to restrict which documents are deleted by specifying one or more fields that a
document must contain in order to be deleted. Only documents that have one of the specified
references and at least one of the specified fields are deleted.
If you want to specify multiple fields, you must separate them with commas or spaces (there must
be no space before or after a comma).
<database name>
This parameter is optional.
Allows you to specify the name of the database that contains the documents that you want to
delete. If you don't specify a database and the specified document is contained in several
databases, it is deleted from all of them.
For example:
http://12.3.4.56:4001/DREDELETEREF?docs=http%3A%2F%2Fnews%2Enewssite%2Ecom%2Findex%2Ehtml+http%3A%2F%2Fnews%2Enewssite%2Ecom%2Fcoverstory%2Ehtml
This command uses port 4001 to delete the documents with the specified URLs from IDOL server
which is located on a machine with the IP address 12.3.4.56.
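The following Python sketch shows one way a client script might assemble a DREDELETEREF URL from unescaped references. The function name and its parameters are hypothetical and are not part of IDOL server. Note that Python's quote function leaves the period unescaped, whereas the example above escapes it as %2E; both forms decode to the same reference.

```python
from urllib.parse import quote


def build_delete_ref_url(host, index_port, references, fields=None, database=None):
    """Assemble a DREDELETEREF URL (hypothetical helper, not an IDOL API)."""
    # Escape every reference, then join the escaped references with plus symbols.
    docs = "+".join(quote(ref, safe="") for ref in references)
    url = f"http://{host}:{index_port}/DREDELETEREF?docs={docs}"
    if fields:
        # Multiple fields are separated by commas, without surrounding spaces.
        url += "&field=" + ",".join(fields)
    if database:
        url += "&DREdbname=" + quote(database, safe="")
    return url


print(build_delete_ref_url(
    "12.3.4.56", 4001,
    ["http://news.newssite.com/index.html",
     "http://news.newssite.com/coverstory.html"],
))
```

The optional field and DREdbname parameters are only appended when they are supplied, which mirrors their optional status in the command.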
http://<host>:<port>/DREDELETEDOC?docs=<doc IDs>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
<doc IDs>
Specify one or more individual documents and/or a range of documents that you want to delete.
Use one or a combination of the following formats to do this. If you want to combine the two formats,
you must separate them with plus symbols (there must be no space before or after a plus symbol):
doc ID
Specify the IDs of one or more documents. If you want to specify multiple document IDs,
you must separate them with plus symbols (there must be no space before or after a plus
symbol).
range=[<min doc ID>,<max doc ID>]
Enter the document ID of the first and last document in a range of documents that you want
to delete. You can delete up to 5000 documents at a time.
For example:
http://12.3.4.56:4001/DREDELETEDOC?docs=3+5+range=[7,10]
This command uses port 4001 to delete the documents with the document IDs 3, 5, 7, 8, 9 and 10 from
IDOL server, which is located on a machine with the IP address 12.3.4.56.
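As an illustration of how the docs value in the example above maps to individual document IDs, the following Python sketch expands a value that mixes single IDs and a range. The expand_docs function is a hypothetical client-side helper written for this guide, not something IDOL server provides.

```python
import re


def expand_docs(spec):
    """Expand a docs value such as '3+5+range=[7,10]' into a list of IDs."""
    ids = []
    for part in spec.split("+"):
        match = re.fullmatch(r"range=\[(\d+),(\d+)\]", part)
        if match:
            low, high = int(match.group(1)), int(match.group(2))
            # The range includes both the first and the last document ID.
            ids.extend(range(low, high + 1))
        else:
            ids.append(int(part))
    return ids


print(expand_docs("3+5+range=[7,10]"))  # [3, 5, 7, 8, 9, 10]
```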
http://<host>:<port>/DREUNDELETEDOC?docs=<doc IDs>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
<doc IDs>
Specify one or more individual documents and/or a range of deleted documents that you want to
restore. Use one or a combination of the following formats to do this. If you want to combine the
two formats, you must separate them with plus symbols (there must be no space before or after a
plus symbol):
doc ID
Specify the IDs of one or more deleted documents. If you want to specify multiple document
IDs, you must separate them with plus symbols (there must be no space before or after a
plus symbol).
range=[<min doc ID>,<max doc ID>]
Enter the document ID of the first and last document in a range of deleted documents that
you want to restore. You can restore up to 5000 documents at a time.
For example:
http://12.3.4.56:4001/DREUNDELETEDOC?docs=3+5+range=[7,10]
This command uses port 4001 to restore the documents with the document IDs 3, 5, 7, 8, 9 and 10 to
IDOL server, which is located on a machine with the IP address 12.3.4.56.
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
<database name>
Enter the name of the database that you want to create in IDOL server.
For example:
http://12.3.4.56:4001/DRECREATEDBASE?DREdbname=Archive
This command uses port 4001 to create a new Archive database in IDOL server which is
located on a machine with the IP address 12.3.4.56.
Open IDOL server's configuration file in a text editor and find the [Databases] section. This
section contains the NumDBs setting, which indicates how many databases IDOL server
currently contains. It also contains a section for each of these databases with settings that
apply to these databases. Note that the names of the individual database sections use the
format Database<N>, where <N> numbers the databases in consecutive order, starting from
0.
For example:
[Databases]
NumDBs=2
[Database0]
Name=News
[Database1]
Name=Archive
2.
3.
Create a new section for the database that you want to add. Note that the name of the section
must use the format Database<N>, where <N> numbers the databases in consecutive order,
starting from 0.
Use the Name setting to specify a name for your new database. Please refer to the IDOL
server online help for details on which other settings are available for databases (see
Displaying help on configuration settings on page 389).
For example:
[Databases]
NumDBs=3
[Database0]
Name=News
[Database1]
Name=Archive
[Database2]
Name=myNewDatabase
4.
5.
http://<host>:<port>/DREREMOVEDBASE?DREdbname=<database name>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
<database name>
Enter the name of the database that you want to delete from IDOL server. The documents in this
database are deleted from IDOL server as well.
For example:
http://12.3.4.56:4001/DREREMOVEDBASE?DREdbname=Archive
This command uses port 4001 to delete the Archive database and all documents that this database
contains from IDOL server which is located on a machine with the IP address 12.3.4.56.
http://<host>:<port>/DREDELDBASE?DREdbname=<database name>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
<database name>
Enter the name of the database from which you want to delete all documents.
For example:
http://12.3.4.56:4001/DREDELDBASE?DREdbname=Archive
This command uses port 4001 to delete all documents from IDOL server's Archive database. IDOL
server is located on a machine with the IP address 12.3.4.56.
Expiring documents
In order to ensure that the documents in your IDOL server are up to date, you can execute an Expiry
operation which deletes or archives documents that have reached a specific age. You can expire
documents:
By default documents are deleted when they expire. If you want to archive them instead, enter the
name of the database that you want to use for archiving for the ExpireIntoDatabase setting in each of
the IDOL server configuration file's database sections.
The date that determines whether a document should be expired can be read from a field in the
document or from the expiry time that has been set for the database that contains the document. If
IDOL server is unable to determine whether a document should be expired (because the document
does not contain a field that sets its expiry date and the document's database has no expiry time set),
IDOL server does not expire the document.
1. Open IDOL server's configuration file in a text editor and find the [FieldProcessing]
section.
2. Add a new field process to the list of field processes that the [FieldProcessing] section
contains and increase the Number setting by one.
For example:
[FieldProcessing]
Number=2
0=IndexFields
1=IndexAndWeightHigher
The above [FieldProcessing] section lists two field processes. To add a new field
process, you need to add a new line to the list:
[FieldProcessing]
Number=3
0=IndexFields
1=IndexAndWeightHigher
2=ExpireDateFields
Note that the listed field processes are numbered in consecutive order, starting from 0.
3. Create a section for your new field process in the configuration file. Create a property for
the new process and use the PropertyFieldCSVs setting to identify the document fields
that determine whether documents should be expired (a document expires once
the time in this field has elapsed).
For example:
[ExpireDateFields]
Property=SetExpireDate
PropertyFieldCSVs=*/DREEXPIRE,*/valid_time
4. Find the [Properties] section and add your new property to the list of properties that the
[Properties] section contains.
For example:
[Properties]
0=Index
1=IndexWeight
2=SetExpireDate
5. Create a section for your new property in the configuration file and set
ExpireDateType to true in order to indicate that the associated PropertyFieldCSVs
fields hold the document expiry date.
For example:
[SetExpireDate]
ExpireDateType=TRUE
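Because the entries in the [FieldProcessing] and [Properties] lists must be numbered in consecutive order starting from 0, it can be useful to check the numbering after you edit the file. The following Python sketch is one possible check. The check_numbered_list function is hypothetical and only inspects the lines that you pass to it.

```python
def check_numbered_list(lines):
    """Return the entry names if the N=name lines run 0, 1, 2, ...; raise otherwise."""
    names = []
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep or not key.strip().isdigit():
            # Skip section headings and settings such as Number=3.
            continue
        if int(key) != len(names):
            raise ValueError(f"expected entry {len(names)}, found {key.strip()}")
        names.append(value.strip())
    return names


section = [
    "[FieldProcessing]",
    "Number=3",
    "0=IndexFields",
    "1=IndexAndWeightHigher",
    "2=ExpireDateFields",
]
print(check_numbered_list(section))
```

A list that skips a number, for example 0, 1, 3, raises a ValueError that names the entry that was expected.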
Open IDOL server's configuration file in a text editor and find the [Schedule] section. If
the configuration file does not contain a [Schedule] section, you can add one.
2.
3.
Set the following setting in the individual database sections to specify where a
database's documents are archived when they expire:
ExpireIntoDatabase
Enter the name of the database that you want to use to archive expired documents.
If you want documents to be deleted when they expire, don't specify this setting.
For example:
[News]
ExpireIntoDatabase=Archive
4.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
<file name>
The path to the directory where the exported IDX files will be stored. The path must include
a basic file name, which IDOL server will postfix with incremental numbers and an appropriate
extension. If you don't specify a file name, the files are exported to the current working directory
(IDOLserver\IDOL\content), and IDOL server creates a file name in the format AUTN-IDX-EXPORT-<date>-<time>-<incremental number>.<extension>.
<true / false>
Enter true if you want to compress the exported files (this is the default). Enter false if you don't
want to compress the files.
<database CSV>
If you don't want to export documents from all IDOL server databases, enter one or more
databases to which you want to restrict the export. If you want to specify multiple databases, you
must separate them with plus symbols, commas or spaces (there must be no space before or after
plus symbols or commas).
<size>
The number of document sections that you want to export to one IDX file. By default this is 100,000
sections.
<max date>
The latest creation date or time that a document can have in order to be exported.
Examples:
http://12.3.4.56:4001/DREEXPORTIDX?FileName=/export/data/backup/output&Compress=true&DatabaseMatch=News,Archive&BatchSize=1000&mindate=01/01/2003&maxdate=01/01/2004
In this example, all IDX documents that have dates between the 1st of January 2003 and the 1st of
January 2004 are exported from the News and Archive databases to a series of compressed files
in the /export/data/backup directory. The files that are created in this directory will be called
output-0.idx.gz, output-1.idx.gz and so on.
http://12.3.4.56:4001/DREEXPORTIDX?
In this example, all IDX documents in IDOL server are exported to a series of compressed files in
the IDOL server's current working directory (IDOLserver\IDOL\content). The files that are created
in this directory will be called AUTN-IDX-EXPORT-12.04.2005-02.15.41-0.idx.gz, AUTN-IDX-EXPORT-12.04.2005-02.15.41-1.idx.gz and so on.
Note:
Multisection documents are not split across chunks, so the specified BatchSize is not used
exactly if this would require a multisection document to be split.
You don't need to uncompress compressed IDX files before indexing them. For example, the
command DREADD?output-0.idx.gz indexes the output-0.idx.gz file correctly without you
having to uncompress the file first.
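To anticipate which files an export produces, the naming rule from the first example (output-0.idx.gz, output-1.idx.gz and so on) can be sketched in Python as follows. Both functions are hypothetical helpers. The sketch assumes that uncompressed files simply omit the .gz suffix, and the file count is only a rough estimate, because multisection documents are never split across files.

```python
import math


def export_file_names(base_name, file_count, extension="idx", compress=True):
    """List the file names an export is expected to create."""
    # Assumption: uncompressed exports drop the .gz suffix.
    suffix = f".{extension}.gz" if compress else f".{extension}"
    return [f"{base_name}-{n}{suffix}" for n in range(file_count)]


def estimate_file_count(total_sections, batch_size=100_000):
    """Rough estimate only: BatchSize is not applied exactly."""
    return max(1, math.ceil(total_sections / batch_size))


print(export_file_names("output", estimate_file_count(1500, batch_size=1000)))
# ['output-0.idx.gz', 'output-1.idx.gz']
```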
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
<file name>
The path to the directory where the exported XML files will be stored. The path must
include a basic file name, which IDOL server will postfix with incremental numbers and an
appropriate extension. If you don't specify a file name, the files are exported to the current working
directory (IDOLserver\IDOL\content), and IDOL server creates a file name in the format AUTN-XML-EXPORT-<date>-<time>-<incremental number>.<extension>.
<true / false>
Enter true if you want to compress the exported files (this is the default). Enter false if you don't
want to compress the files.
<database CSV>
If you don't want to export documents from all IDOL server databases, enter one or more
databases to which you want to restrict the export. If you want to specify multiple databases, you
must separate them with plus symbols, commas or spaces (there must be no space before or after
plus symbols or commas).
<size>
The number of document sections that you want to export to one XML file. By default this is
100,000 sections.
<max date>
The latest creation date or time that a document can have in order to be exported.
Examples:
http://12.3.4.56:4001/DREEXPORTXML?FileName=/export/data/backup/output&Compress=true&DatabaseMatch=News,Archive&BatchSize=1000&mindate=01/01/2003&maxdate=01/01/2004
In this example, all XML documents that have dates between the 1st of January 2003 and the 1st
of January 2004 are exported from the News and Archive databases to a series of compressed
files in the /export/data/backup directory. The files that are created in this directory will be called
output-0.xml.gz, output-1.xml.gz and so on.
http://12.3.4.56:4001/DREEXPORTXML?
In this example, all XML documents in IDOL server are exported to a series of compressed files in
the IDOL server's current working directory (IDOLserver\IDOL\content). The files that are created
in this directory will be called AUTN-XML-EXPORT-12.04.2005-02.15.41-0.xml.gz, AUTN-XML-EXPORT-12.04.2005-02.15.41-1.xml.gz and so on.
Note:
Multisection documents are not split across chunks, so the specified BatchSize is not used
exactly if this would require a multisection document to be split.
You don't need to uncompress compressed XML files before indexing them. For example, the
command DREADD?output-0.xml.gz indexes the output-0.xml.gz file correctly without you
having to uncompress the file first.
http://<host>:<port>/DRECHANGEMETA?Type=<type>&Refs=<doc refs>&Docs=<doc IDs>&NewValue=<value>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
<type>
Enter one of the following to specify which value type you want to set to the specified new <value>:
date
The index date of the documents. Note that the date you specify must be in a format that you
have set for DateFormatCSVs in IDOL server's configuration file.
expiredate
The expire date of the documents. Note that the date you specify must be in a format that you
have set for DateFormatCSVs in IDOL server's configuration file.
database
The database of the documents.
<doc refs>
Enter the references of the documents whose index date, expire date or database values you want
to change (you must escape the references). If you want to specify multiple references, you must
separate them with plus symbols (there must be no space before or after a plus symbol).
For example:
http://12.3.4.56:4001/DRECHANGEMETA?Type=database&Docs=3+5+range=[7,10]&NewValue=Archive
This command uses port 4001 to change the database that stores the documents with the IDs 3, 5, 7, 8,
9 and 10 to the Archive database. IDOL server is located on a machine with the IP address 12.3.4.56.
<data>
The fields that you want to replace in IDOL server. You need to specify each field as follows:
#DREDOCID <N> or #DREDOCREF <N>
#DREFIELDNAME <X>
#DREFIELDVALUE <Y>
<N>
The DocID or reference (URL) of the document that contains the field that you want to
replace.
<X>
The name of the field whose value you want to change.
<Y>
The value that you want field <X> to change to. For example:
#DREDOCID 1
#DREFIELDNAME Price
#DREFIELDVALUE 10
#DREDOCREF http://www.autonomy.com/autonomy/dynamic/autopage442.shtml
#DREFIELDNAME Country
#DREFIELDVALUE UK
#DREENDDATA
In this example, the value of the Price field in the document with the DocID 1 is changed to
10. The value of the Country field in the document with the reference
http://www.autonomy.com/autonomy/dynamic/autopage442.shtml is changed to UK.
If the fields whose values you are changing are Index or ACL fields (see page 279 and page 285),
IDOL server needs to reindex the documents in which you are making the changes. If you are
changing numerical fields, numerical date fields (see page 287 and page 289) or fields that you have
assigned another property to, IDOL server can execute your changes without reindexing, so that these
changes are made very quickly.
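The field replacement data described above follows a regular pattern, so a script can generate it from a list of changes. The following Python sketch is one way to do this. The build_replace_data function and the dictionary keys that it expects (docid, docref, field and value) are assumptions made for this illustration only.

```python
def build_replace_data(changes):
    """Render field changes in the #DREDOCID / #DREFIELDNAME / #DREFIELDVALUE format."""
    lines = []
    for change in changes:
        # Each change identifies its document by DocID or by reference.
        if "docid" in change:
            lines.append(f"#DREDOCID {change['docid']}")
        else:
            lines.append(f"#DREDOCREF {change['docref']}")
        lines.append(f"#DREFIELDNAME {change['field']}")
        lines.append(f"#DREFIELDVALUE {change['value']}")
    # The data block is closed by a single #DREENDDATA line.
    lines.append("#DREENDDATA")
    return "\n".join(lines)


print(build_replace_data([
    {"docid": 1, "field": "Price", "value": 10},
    {"docref": "http://www.autonomy.com/autonomy/dynamic/autopage442.shtml",
     "field": "Country", "value": "UK"},
]))
```

Run as written, the sketch prints the same data block as the example above.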
Note: you can automatically back up IDOL server's Data index whenever a DRECOMPACT
command is issued (see To back up the Data index automatically whenever a DRECOMPACT
command is issued: on page 380). It is good practice to back up IDOL server (see Backing up IDOL
server's Data index on page 378) before compacting it.
http://<host>:<port>/DRECOMPACT
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
For example:
http://12.3.4.56:4001/DRECOMPACT
This command uses port 4001 to compact the data content of an IDOL server that is located on
a machine with the IP address 12.3.4.56.
Open IDOL server's configuration file in a text editor and find the [Schedule] section. If
the configuration file does not contain a [Schedule] section, you can add one.
2.
For example:
[Schedule]
Compact=true
CompactTime=00:00
CompactInterval=24
Issue a DREBACKUP command (case sensitive) from your web browser to copy all the
IDOL server Data index's *.DB files to a new location:
http://<host>:<port>/DREBACKUP?<path>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's
[Server] section.
<path>
Enter the path to the location where you want to create IDOL server's backup.
For example:
http://12.3.4.56:4001/DREBACKUP?E:\Backup
This command uses port 4001 to create a backup of IDOL server's Data index on
E:\Backup. The IDOL server whose Data index is backed up is located on a machine
with the IP address 12.3.4.56.
4.
Issue a DREINITIAL command (case sensitive) from your web browser in order to
restore the files to an IDOL server:
http://<host>:<port>/DREINITIAL?<path>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
Alternatively, if you are using multiple IDOL servers, enter the IP address (or name)
of the machine on which your DIH is installed.
For example:
http://12.3.4.56:4001/DREINITIAL?E:\DataIndex_Backup
This command uses port 4001 to restore the files backed up on E:\DataIndex_Backup
to an IDOL server that is located on a machine with the IP address 12.3.4.56.
Open IDOL server's configuration file in a text editor and find the [Schedule] section. If
the configuration file does not contain a [Schedule] section, you need to add one.
2.
http://<host>:<port>/DREINITIAL?
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the IndexPort that you have specified in the IDOL server configuration file's [Server]
section.
For example:
http://12.3.4.56:4001/DREINITIAL?
This command uses port 4001 to reset the Data index of an IDOL server that is located on a machine
with the IP address 12.3.4.56 to its original state.
http://<host>:<port>/action=Export&FileName=<file name>
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the Port that you have specified in the IDOL server configuration file's [Server] section.
<file name>
Enter the name of the XML file to which you want to export IDOL server's users, roles, agents and
profiles. If the XML file is not stored in the same directory as IDOL server, you must specify the
path to the file as well.
For example:
http://12.3.4.56:4000/action=Export&FileName=MyFile.xml
This command uses port 4000 to export IDOL server's users, roles, agents and profiles to the
MyFile.xml file.
For example:
http://12.3.4.56:4000/action=Import&FileName=MyFile.xml
This command uses port 4000 to import IDOL server's users, roles, agents and profiles from the
MyFile.xml file.
2. Find the [Logging] section. (If the configuration file does not contain a [Logging] section, you
need to create one.)
3. Under the [Logging] section's heading, create a list of the log streams that you want to set up
using the format <N>=<log stream name>.
For example:
[Logging]
0=INDEX_LOG_STREAM
1=QUERY_LOG_STREAM
2=APP_LOG_STREAM
In this example 3 log streams have been defined, which log index, query and application
occurrences. Note that the log streams are listed in consecutive order, starting from 0.
4. Create a new section for each of the log streams that you have defined. Each section must have
the same name as the log stream.
For example:
[INDEX_LOG_STREAM]
[QUERY_LOG_STREAM]
[APP_LOG_STREAM]
5. Specify the settings that you want to apply to each log stream in the appropriate log stream's
section. You can specify the type of logging that should be performed (for example, full logging), whether
log messages should be displayed on the console, the maximum size of log files and so on.
For example:
[INDEX_LOG_STREAM]
logfile=logs/index.log
loghistorysize=50
logtime=true
logecho=false
maxlogsizekbs=1024
logtypecsvs=index
loglevel=full
7.
Appendices
<host>
Enter the IP address (or name) of the machine on which IDOL server is installed.
<port>
Enter the port number that client machines use to communicate with IDOL server (this is
specified by the Port setting in the IDOL server configuration file's [Server] section).
2.
Click on the config help link in the top right-hand corner to display the configuration parameter
help (by default the action command help is displayed).
Note: the configuration file sections that each configuration parameter can be used in are listed
under Allowed in Sections.
Note:
You can also generate configuration help without starting IDOL server. Issue the following command
from the command line to generate html files in your installation directory:
<IDOLserver_installation_directory_path><IDOLserver_installation_name>.exe -help
If you want to enter a comma separated list of strings for a parameter, and one of the strings contains
a comma, you must indicate the start and the end of this string with quotation marks.
For example:
ParameterName=cat,dog,bird,"wing,beak",turtle
If any string within a comma separated list contains quotation marks, you must put this string into
quotation marks and escape the quotation marks in the string by putting a backslash in front of them.
For example:
ParameterName="<font face=\"arial\"size=\"+1\"><b>",dog,bird,"wing,beak",turtle
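The quoting rules above can be illustrated with a small Python parser that splits a parameter value into its individual strings. This is only a sketch of the rules as they are described here. The split_csv_setting function is hypothetical and does not claim to reproduce how IDOL server itself reads the configuration file.

```python
def split_csv_setting(value):
    """Split a comma separated setting, honouring quotation marks and \\" escapes."""
    items, current, in_quotes, i = [], [], False, 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and in_quotes and value[i + 1:i + 2] == '"':
            # An escaped quotation mark inside a quoted string is kept literally.
            current.append('"')
            i += 2
            continue
        if ch == '"':
            # An unescaped quotation mark opens or closes a quoted string.
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            # A comma outside quotation marks ends the current string.
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    items.append("".join(current))
    return items


print(split_csv_setting('cat,dog,bird,"wing,beak",turtle'))
# ['cat', 'dog', 'bird', 'wing,beak', 'turtle']
```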
[License]
[Service]
[Server]
[TermCache]
[IndexCache]
[SectionBreaking]
[Paths]
[Databases]
[Schedule]
[Summary]
[FieldProcessing]
[Properties]
[Security]
[User]
[UserSecurityFields]
[UserSecurity]
[Role]
[Agent]
[Profile]
[ProfileNamedAreas]
[Community]
[UserCustom]
[UserStructure]
[DRE]
[DataDRE]
[Cluster]
[Taxonomy]
[AnalysisSchedules]
[IndexTasks]
[DocumentTracking]
[Synonym]
[Templates]
[Logging]
[LanguageTypes]
Note: which of the above listed configuration segments you require depends on which operations you
want your IDOL server to carry out.
[License] section
The [License] section contains licensing details which you should not change.
For example:
LicenseServerHost=127.0.0.1
LicenseServerACIPort=20000
LicenseServerTimeout=600000
LicenseServerRetries=1
[Service] section
The [Service] section contains settings that determine which machines are permitted to use
and control the IDOL server service.
For example:
[Service]
ServicePort=40010
ServiceControlClients=127.0.0.1
ServiceStatusClients=127.0.0.1
[Server] section
The [Server] section contains general settings.
For example:
IndexClients=*.*.*.*
AdminClients=*.*.*.*
IndexPort=9001
Port=9000
Threads=4
MaxInputString=16000
DelayedSync=TRUE
AutoDetectLanguagesAtIndex=TRUE
XSLTemplates=FALSE
DateFormatCSVs=SHORTMONTH#SD+#SYYYY,DD/MM/YYYY,YYYY/MM/DD,YYYY-MM-DD,EPOCHSECONDS
KillDuplicates=*/DREREFERENCE
DocumentDelimiterCSVs=*/DOCUMENT
CantHaveFieldCSVs=*/DRESTORECONTENT,*/CHECKSUM,*/DREWORDCOUNT,*/DRETYPE,*/IMPORTBODYLEN,*/IMPORTMETALEN,*/IMPORTLINKLEN,*/IMPORTTITLELEN,*/IMPORTQUALITY,*/DREPAGE,*/DREFILENAME,*/dredoctype
InactiveSchedules=all
[TermCache] section
The [TermCache] section contains settings that determine how much memory IDOL server
uses to cache query terms.
For example:
[TermCache]
TermCacheMaxSize=102400
[IndexCache] section
The [IndexCache] section contains settings that determine how much memory IDOL server
uses to cache data for indexing.
For example:
[IndexCache]
IndexCacheMaxSize=102400
[SectionBreaking] section
The [SectionBreaking] section contains settings that determine the size of the sections that
documents are broken up into before they are indexed.
For example:
[SectionBreaking]
MinFieldLength=80
MaxSectionLength=2000
[Paths] section
The [Paths] section contains settings that allow you to split the database into multiple partitions
and settings that indicate the location of files that IDOL server uses.
For example:
[Paths]
DyntermPath=./dynterm
NodetablePath=./nodetable
RefIndexPath=./refindex
MainPath=./main
StatusPath=./status
Main=./extendedindex
UserPath=./users
Modules=./res/modules
ClusterDirectory=./cluster
TaxonomyDirectory=./taxonomy
CategoryDirectory=./category
ImExDirectory=./imex
TemplateDirectory=./templates
[Databases] section
The [Databases] section lists the databases in which IDOL server stores its data and contains
a subsection for each of the databases, in which you specify settings that only apply to this
database. Note that if you are indexing documents in multiple languages, you don't need to
create a database for each of the languages.
For example:
[Databases]
NumDBs=2
[Database0]
Name=News
[Database1]
Name=Archive
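The relationship between NumDBs and the Database<N> sections can be checked with a short script. The following Python sketch uses the standard configparser module on a copy of the example above. The list_databases function is hypothetical, and a complete IDOL server configuration file may contain constructs that configparser does not accept, so treat this as an illustration of the numbering rule rather than a general-purpose reader.

```python
import configparser

SAMPLE = """\
[Databases]
NumDBs=2
[Database0]
Name=News
[Database1]
Name=Archive
"""


def list_databases(config_text):
    """Return the database names, checking that NumDBs matches the sections."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    count = parser.getint("Databases", "NumDBs")
    names = []
    for n in range(count):
        # Database sections are numbered consecutively, starting from 0.
        section = f"Database{n}"
        if not parser.has_section(section):
            raise ValueError(f"NumDBs={count} but [{section}] is missing")
        names.append(parser.get(section, "Name"))
    return names


print(list_databases(SAMPLE))  # ['News', 'Archive']
```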
[Schedule] section
The [Schedule] section contains settings that allow you to schedule when IDOL server is
compacted and when documents are expired from databases.
For example:
[Schedule]
Compact=true
Expire=true
CompactTime=00:00
CompactInterval=672
ExpireTime=00:00
ExpireInterval=24
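To get a feel for how the time and interval settings combine, the following Python sketch lists upcoming run times. It rests on two assumptions that this guide does not state: that the interval is given in hours, and that runs are counted from the configured time on the current day. The next_runs function is hypothetical. Check the configuration parameter help for the authoritative meaning of these settings.

```python
from datetime import datetime, timedelta


def next_runs(start_time, interval_hours, after, count=3):
    """List the next run times after a given moment (assumes hourly intervals)."""
    hour, minute = (int(part) for part in start_time.split(":"))
    # Assumption: the schedule is anchored at start_time on the current day.
    run = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    step = timedelta(hours=interval_hours)
    while run <= after:
        run += step
    return [run + i * step for i in range(count)]


# ExpireTime=00:00 with ExpireInterval=24, evaluated on 12 April 2005 at 10:30.
for run in next_runs("00:00", 24, datetime(2005, 4, 12, 10, 30), count=2):
    print(run.isoformat())
```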
[Summary] section
The [Summary] section contains settings that determine summary details.
For example:
[Summary]
MinWordsPerSentence=10
[FieldProcessing] section
The [FieldProcessing] section lists the processes that you want to apply to fields, and
contains a subsection for each of the processes, in which you define the process.
For example:
[FieldProcessing]
Number=15
0=SetIndexFields
1=SetIndexAndWeightHigher
2=SetSectionBreakFields
3=SetDateFields
4=SetDatabaseFields
5=SetReferenceFields
6=SetTitleFields
7=SetHighlightFields
8=SetSourceFields
9=DetectNT_V4Security
10=DetectNotes_V4Security
11=DetectNetware_V4Security
12=DetectExchange_V4Security
13=DetectDocumentum_V4Security
14=HideAutonomyMetaDataField
[SetIndexFields]
Property=IndexFields
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE
[SetIndexAndWeightHigher]
Property=IndexWeightFields
PropertyFieldCSVs=*/SUMMARIES
[SetSectionBreakFields]
Property=SectionFields
PropertyFieldCSVs=*/DRESECTION
[SetDateFields]
Property=DateFields
PropertyFieldCSVs=*/DREDATE,*/DATE
[SetDatabaseFields]
Property=DatabaseFields
PropertyFieldCSVs=*/DREDBNAME,*/DATABASE
[SetReferenceFields]
Property=ReferenceFields
PropertyFieldCSVs=*/DREREFERENCE,*/REFERENCE
[SetTitleFields]
Property=TitleFields
PropertyFieldCSVs=*/DRETITLE,*/TITLE
[SetHighlightFields]
Property=HighlightFields
PropertyFieldCSVs=*/DRETITLE,*/DRECONTENT
[SetSourceFields]
Property=SourceFields
PropertyFieldCSVs=*/DRETITLE,*/DRECONTENT
[DetectNT_V4Security]
Property=SecurityNT_V4
PropertyFieldCSVs=*/SECURITYTYPE
PropertyMatch=nt
[DetectNotes_V4Security]
Property=SecurityNotes_V4
PropertyFieldCSVs=*/SECURITYTYPE
PropertyMatch=*notes_v4
[DetectNetware_V4Security]
Property=SecurityNetware_V4
PropertyFieldCSVs=*/SECURITYTYPE
PropertyMatch=*netware_v4
[DetectExchange_V4Security]
Property=SecurityExchange_V4
PropertyFieldCSVs=*/SECURITYTYPE
PropertyMatch=*exchange_v4
[DetectDocumentum_V4Security]
Property=SecurityDocumentum_V4
PropertyFieldCSVs=*/SECURITYTYPE
PropertyMatch=*documentum
[HideAutonomyMetaDataField]
Property=HideMetaDataFields
PropertyFieldCSVs=*/AUTONOMYMETADATA
[Properties] section
The [Properties] section lists the properties that you have created for the processes that you
have listed in the [FieldProcessing] section, and contains a subsection for each of the
properties, in which you set configuration parameters that are applied to associated fields.
For example:
[Properties]
0=IndexFields
1=IndexWeightFields
2=SectionFields
3=DateFields
4=DatabaseFields
5=ReferenceFields
6=TitleFields
7=HighlightFields
8=SourceFields
9=SecurityNT_V4
10=SecurityNotes_V4
11=SecurityNetware_V4
12=SecurityExchange_V4
13=SecurityDocumentum_V4
14=HideMetaDataFields
[IndexFields]
Index=TRUE
[IndexWeightFields]
Index=TRUE
Weight=2
[SectionFields]
SectionBreakType=TRUE
[DateFields]
DateType=TRUE
[DatabaseFields]
DatabaseType=TRUE
[ReferenceFields]
ReferenceType=TRUE
TrimSpaces=TRUE
[TitleFields]
TitleType=TRUE
[HighlightFields]
HighlightType=TRUE
[SourceFields]
SourceType=TRUE
[SecurityNT_V4]
SecurityType=NT_V4
[SecurityNotes_V4]
SecurityType=Notes_V4
[SecurityNetware_V4]
SecurityType=Netware_V4
[SecurityExchange_V4]
SecurityType=Exchange_V4
[SecurityDocumentum_V4]
SecurityType=Documentum_V4
[HideMetaDataFields]
HiddenType=TRUE
ACLType=TRUE
[Security] section
The [Security] section lists the security modules that you are using, and contains a subsection
for each of the security modules, in which you can specify the settings that you want to apply to
each module.
For example:
[Security]
SecurityInfoKeys=123,144,564,231
0=NT_V4
1=Netware_V4
2=Notes_V4
3=Exchange_V4
4=Documentum_V4
[NT_V4]
SecurityCode=1
Library=C:\IDOLserver\IDOL\modules\mapped_security
Type=AUTONOMY_SECURITY_V4_NT_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Netware_V4]
SecurityCode=2
Library=C:\IDOLserver\IDOL\modules\mapped_security
Type=AUTONOMY_SECURITY_V4_NETWARE_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Notes_V4]
SecurityCode=3
Library=C:\IDOLserver\IDOL\modules\mapped_security
Type=AUTONOMY_SECURITY_V4_NOTES_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Exchange_V4]
SecurityCode=4
Library=C:\IDOLserver\IDOL\modules\mapped_security
Type=AUTONOMY_SECURITY_V4_EXCHANGE_GRPS_MAPPED
ReferenceField=*/AUTONOMYMETADATA
[Documentum_V4]
SecurityCode=5
Library=C:\IDOLserver\IDOL\modules\mapped_security
Type=AUTONOMY_SECURITY_V4_DOCUMENTUM_MAPPED
ReferenceField=*/AUTONOMYMETADATA
Note:
Use the [FieldProcessing] and [Properties] sections to identify fields that determine the
security type of documents and the processes that should be applied to these fields or
documents.
If you are running your IDOL server on a UNIX platform, you need to specify the
LD_LIBRARY_PATH to ensure that IDOL server can find the shared objects that it
requires in order to implement security.
[User] section
The [User] section contains settings that determine how many agents each user can have and
which fields belong to these agents.
For example:
[User]
XmlTempDirectory=C:\IDOLserver\IDOL\community\temp\userxml
MaxAgents=10
IndexFieldCSVs=drelanguagetype
[UserSecurityFields] section
The [UserSecurityFields] section lists the security fields.
For example:
[UserSecurityFields]
0=username
1=password
2=group
3=domain
[UserSecurity] section
The [UserSecurity] section lists your security repositories, specifies generic settings for them,
and contains a subsection for each of the listed security repositories, in which you can specify
the settings that you want to apply to this security repository.
Note: you can list up to 8 security types.
For example:
[UserSecurity]
DefaultSecurityType=0
DocumentSecurity=TRUE
SyncRolesFromGroups=FALSE
SecurityUsernameDefaultToLoginUsername=FALSE
0=Autonomy
1=NT
2=Notes
3=LDAP
4=Documentum
5=Exchange
6=Netware
[Autonomy]
Library=C:\IDOLserver\IDOL\modules\user_autnsecurity
EnableLogging=FALSE
DocumentSecurity=FALSE
SecurityFieldCSVs=none
[NT]
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE
Library=C:\IDOLserver\IDOL\modules\user_ntsecurity
EnableLogging=FALSE
DocumentSecurity=TRUE
V4=TRUE
SecurityFieldCSVs=username,domain
Domain=DOMAIN
DocumentSecurityType=NT_V4
[Notes]
Library=C:\IDOLserver\IDOL\modules\user_notessecurity
EnableLogging=FALSE
NotesAuthURL=http://notesserver/names.nsf
DocumentSecurity=TRUE
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE
SecurityFieldCSVs=username
DocumentSecurityType=Notes_V4
[LDAP]
Library=C:\IDOLserver\IDOL\modules\user_ldapsecurity
EnableLogging=FALSE
RDNAttribute=CN
Group=OU=Users,O=Company
LDAPServer=127.0.0.1
LDAPPort=389
FieldCSVs=email,emailaddress,telephone
LDAPAllAttributeValues=TRUE
LDAPAttributeValueSeparatorChar=,
SecurityFieldCSVs=none
DocumentSecurity=FALSE
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE
[Documentum]
DocumentSecurity=TRUE
SecurityFieldCSVs=username
DocumentSecurityType=Documentum_V4
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE
[Exchange]
DocumentSecurity=TRUE
V4=FALSE
SecurityFieldCSVs=username,domain
DocumentSecurityType=Exchange_V4
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE
[Netware]
DocumentSecurity=TRUE
DocumentSecurityType=Netware_V4
SecurityFieldCSVs=username
CaseSensitiveUserNames=FALSE
CaseSensitiveGroupNames=FALSE
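The relationship between the numbered list in [UserSecurity] and its subsections can be checked mechanically. The following is a minimal Python sketch, not part of IDOL server: it assumes the configuration can be read as INI-style text, uses a trimmed-down copy of the example above, and the function name check_user_security is hypothetical.

```python
import configparser

# Trimmed-down copy of the example above; values are illustrative only.
SAMPLE = """
[UserSecurity]
DefaultSecurityType=0
DocumentSecurity=TRUE
0=Autonomy
1=NT

[Autonomy]
DocumentSecurity=FALSE

[NT]
DocumentSecurity=TRUE
DocumentSecurityType=NT_V4
"""

MAX_SECURITY_TYPES = 8  # limit stated in the note for this section


def check_user_security(text):
    """Return a list of problems found in the [UserSecurity] listing."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(text)
    # Numbered keys (0=, 1=, ...) name the security repositories.
    listed = [value for key, value in parser.items("UserSecurity") if key.isdigit()]
    problems = []
    if len(listed) > MAX_SECURITY_TYPES:
        problems.append("more than %d security types listed" % MAX_SECURITY_TYPES)
    for name in listed:
        if not parser.has_section(name):
            problems.append("no [%s] subsection" % name)
    return problems


problems = check_user_security(SAMPLE)
print(problems or "listing is consistent")
```

Note that Python's configparser rejects a file in which the same section name appears twice, so this sketch also surfaces accidentally duplicated subsections as an error.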
[Role] section
The [Role] section contains role details.
For example:
[Role]
DefaultRolename=everyone
AutoSetDatabases=TRUE
DatabasePrivilege=databases
[Agent] section
The [Agent] section determines how agents are going to operate.
For example:
[Agent]
DynamicAgentFields=TRUE
DreCombine=Simple
DreSentences=3
DreCharacters=300
DrePrint=All
DreSummary=Context
DontCopyAgentFields=emailaddress
ResultsCacheDuration=60
AgentResultsCacheDuration=60
AgentIndexFieldCSVs=drelanguagetype
[Profile] section
The [Profile] section contains settings that apply to profiles.
For example:
[Profile]
DreCombine=Simple
DreSentences=3
DreCharacters=300
DrePrint=All
DreSummary=Context
ResultsCacheDuration=60
AgentResultsCacheDuration=60
DreMaxQueryTerms=20
[ProfileNamedAreas] section
The [ProfileNamedAreas] section determines the names of the areas that contain the profiles
that are created when users read or write documents.
For example:
[ProfileNamedAreas]
0=default
1=authored
[Community] section
The [Community] section determines how community queries operate.
For example:
[Community]
DreMinScore=20
DreWeighFieldText=FALSE
ExpandQuery=FALSE
ExpandQueryLog=FALSE
ExpandQueryMinScore=60
ExpandQueryMaxResults=30
ExpandQueryMaxScore=80
[UserCustom] section
The [UserCustom] section allows you to add custom functionality to IDOL server. It lists the
functionality that you are adding and contains a subsection for each functionality in which you
can specify the settings that apply to this functionality (for example which shared library it uses).
For example:
[UserCustom]
0=Email
[Email]
Library=C:\IDOLserver\IDOL\modules\user_email
FromHost=127.0.0.1
SmtpHost=smtp.company.com
SMTPPort=25
DrePrint=all
XSLTemplate=C:\IDOLserver\IDOL\templates/email.xss
EmailActionXSLTemplate=C:\IDOLserver\IDOL\templates/ondemand.xss
ClassificationServerXSLTemplate=C:\IDOLserver\IDOL\templates/channels.xss
RunMailer=FALSE
Retries=2
TimeoutMS=15000
StartTime=9:00
Interval=1 day
Cycles=-1
FromName=IdolMailer
DefaultSendEmail=TRUE
DefaultEmailFormat=text/html
DefaultExcludeReadDocuments=TRUE
DefaultAddSetToReadDocuments=TRUE
DefaultSubject=USERNAME's Results
MaxEmailsPerUser=20
From=user@company.com
[UserStructure] section
The [UserStructure] section comprises settings that determine the structure of the binary data
files that are stored on disk. You cannot change these settings once you have finished setting
up IDOL server.
For example:
[UserStructure]
MaxAgents=20
AgentTrainingLength=512
TermSize=20
SecurityFieldLength=64
AgentFixedFieldLength=1
[DRE] section
The [DRE] section allows you to list Query, TermGetBest and TermGetInfo action
parameters that you want to be available for Agent and Profile queries (by setting them for the
DRE<QueryParameter> parameter in the [Agent] or [Profile] section).
For example:
[DRE]
AdditionalDREQueryParameters=Characters,MaxDate
AdditionalDRETermGetBestParameters=Weights
AdditionalDRETermGetInfoParameters=OnlyExisting
[DataDRE] section
If you are distributing your IDOL server across multiple machines, the [DataDRE] section
allows you to specify Data index settings.
For example:
[DataDRE]
Host=7.89.01.2
AciPort=6002
Timeout=5000
[Cluster] section
The [Cluster] section contains the details for clustering.
For example:
[Cluster]
ResultExpiryDays=30
SnapshotExpiryDays=30
SGExpiryDays=30
DownloadDocAction=drecontents
TitleFromSummary=TRUE
SummaryField=autn:summary
[Taxonomy] section
The [Taxonomy] section contains the taxonomy details.
For example:
[Taxonomy]
MaxConcepts=100
RelevanceThreshold=20
DistributionThreshold=10
ConceptThreshold=400
MinConceptOccs=15
CompoundRelevance=40
SiblingStrength=20
MinChildren=1
OnlyMatchSubset=0
MaxQNum=5000
[AnalysisSchedules] section
The [AnalysisSchedules] section specifies the number of classification schedules that you
want to execute, and contains a subsection for each of these schedules in which you can
specify the details for this schedule. You can schedule the following actions:
ClusterSnapshot
ClusterCluster
ClusterSGDataGen
TaxonomyGenerate
For example:
[AnalysisSchedules]
Number=5
[AnalysisSchedule0]
ScheduleStartTime=now
ScheduleInterval=1 day
ScheduleCycles=-1
ScheduleAction=CLUSTERSNAPSHOT
TargetJobname=myjob
[AnalysisSchedule1]
ScheduleStartTime=now
ScheduleInterval=1 day
ScheduleCycles=-1
ScheduleAction=CLUSTERCLUSTER
SourceJobName=myjob
TargetJobName=myjob_clusters
DoMapping=TRUE
[AnalysisSchedule2]
ScheduleStartTime=now
ScheduleInterval=1 day
ScheduleCycles=-1
ScheduleAction=CLUSTERCLUSTER
SourceJobName=myjob
TargetJobName=myjob_clusters_new
WhatsNew=TRUE
Interval=86400
[AnalysisSchedule3]
ScheduleStartTime=now
ScheduleInterval=1 day
ScheduleCycles=-1
ScheduleAction=CLUSTERSGDATAGEN
Interval=604800
SourceJobName=myjob
TargetJobName=myjob_sg
[AnalysisSchedule4]
ScheduleStartTime=now
ScheduleInterval=1 day
ScheduleCycles=-1
ScheduleAction=TAXONOMYGENERATE
Cluster=0,1,2,3,4,5,6,7,8,9
SourceJobName=myjob_clusters
TargetJobName=myjob_taxonomy
NumResults=25
[IndexTasks] section
The [IndexTasks] section determines which tasks IDOL server performs on data before
indexing. It includes the StartTask setting, which identifies which task should be executed first,
and a subsection for each of the tasks in which you can specify details for each task.
For example:
[IndexTasks]
StartTask=CatTask
[AlertTask]
Module=Alert
IdolServer=localhost:9000
NextTask=IndexTask
SMTPServer=1.23.45.6
SMTPPort=25
SMTPSubject="Alert from IDOLServer"
SMTPSendFrom=postmaster
Template=res/templates/alertTemplate.html
AttachmentTemplate=res/templates/alertTemplate.html
Fields=DRECONTENT
FieldMappings=text
Queryparameters=MinScore=80
ContentType=text/html
[CatTask]
Module=Cat
IdolServer=localhost:9000
NextTask=AlertTask
TextFields=DRECONTENT
TagField=CategoryTag
[IndexTask]
Module=Index
IdolServer=127.0.0.1:9001
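In the example above, StartTask names the first task and each task's NextTask names its successor. The order this produces can be traced with a short Python sketch. The sketch is illustrative only: it assumes INI-style parsing, treats a task without NextTask as the end of the chain (an assumption drawn from the example, in which [IndexTask] has no NextTask), and the function name task_order is hypothetical.

```python
import configparser

# Reduced copy of the example above; only the chaining settings are kept.
SAMPLE = """
[IndexTasks]
StartTask=CatTask

[AlertTask]
Module=Alert
NextTask=IndexTask

[CatTask]
Module=Cat
NextTask=AlertTask

[IndexTask]
Module=Index
"""


def task_order(text):
    """Follow StartTask and each NextTask to list tasks in execution order."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    order = []
    current = parser.get("IndexTasks", "StartTask")
    while current:
        if current in order:
            raise ValueError("task chain loops back to %s" % current)
        if not parser.has_section(current):
            raise ValueError("no [%s] subsection" % current)
        order.append(current)
        # Assumption: a task without NextTask ends the chain.
        current = parser.get(current, "NextTask", fallback=None)
    return order


print(" -> ".join(task_order(SAMPLE)))
```

For the example configuration this traces the chain CatTask, AlertTask, IndexTask, regardless of the order in which the subsections appear in the file.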
[DocumentTracking] section
The [DocumentTracking] section contains settings that enable the tracking of documents
through import and indexing using an Autonomy Service Dashboard.
For example:
[DocumentTracking]
DiSHACIPort=7002
DiSHHost=1.23.45.6
DiSHRetries=4
DiSHTimeout=120000
DocumentTrackingActive=true
[Synonym] section
The [Synonym] section lists the settings that determine how IDOL server handles synonym
queries. A synonym query returns results which are conceptually similar to the query's terms
and / or conceptually similar to the synonyms that are available for the query's terms.
For example:
[Synonym]
0=PC_Syn
[PC_Syn]
File=myfile.txt
MaxExpandLevel=1
Note: to be able to send synonym queries to IDOL server, you need to set up a synonym file
and add the Synonym action parameter to your query.
[Templates] section
The [Templates] section lists the templates that are required to output results in a manner that
is compatible with non-ACI compatible applications. It contains a section for each of the listed
templates, in which the templates' components are defined.
For example:
[TEMPLATES]
0=results_template
1=content_template
DefaultResultsTemplate=results_template
DefaultContentTemplate=content_template
[results_template]
TemplateHeader=templates/resultsheader.txt
TemplateBody=templates/resultsbody.txt
TemplateFooter=templates/resultsfooter.txt
TemplateMimeType=text/plain
[content_template]
TemplateHeader=templates/contentheader.txt
TemplateBody=templates/contentbody_limitedfields.txt
TemplateFooter=templates/contentfooter.txt
[Logging] section
The [Logging] section lists the logging streams that you want to set up in order to create
separate log files for different log message types (query, index and application). It also contains
a subsection for each of the listed logging streams, in which you can configure the settings that
determine how each stream is logged.
For example:
[Logging]
LogArchiveDirectory=C:\IDOLserver\IDOL\logs\archive
LogDirectory=C:\IDOLserver\IDOL\logs
// These values apply to all streams, override on an individual basis
LogTime=TRUE
LogEcho=TRUE
LogLevel=normal
OldLogFileAction=compress
LogOldAction=move
LogHistorySize=50
MaxLogSizeKbs=10240
//log streams
0=ApplicationLogStream
1=QueryLogStream
2=IndexLogStream
3=QueryTermsLogStream
4=UserLogStream
5=CategoryLogStream
6=ClusterLogStream
7=TaxonomyLogStream
8=ScheduleLogStream
9=CommunityTermLogStream
[ApplicationLogStream]
LogFile=application.log
LogTypeCSVs=application
[QueryLogStream]
LogFile=query.log
LogTypeCSVs=query
[IndexLogStream]
LogFile=index.log
LogTypeCSVs=index
[QueryTermsLogStream]
LogFile=queryterms.log
LogTypeCSVs=queryterms
[UserLogStream]
LogFile=user.log
LogTypeCSVs=user
[CategoryLogStream]
LogFile=category.log
LogTypeCSVs=category
[ClusterLogStream]
LogFile=cluster.log
LogTypeCSVs=cluster
[TaxonomyLogStream]
LogFile=taxonomy.log
LogTypeCSVs=taxonomy
[ScheduleLogStream]
LogFile=schedule.log
LogTypeCSVs=schedule
[CommunityTermLogStream]
LogFile=term.log
LogTypeCSVs=term
Note: all queries are truncated to 4000 characters in query logs.
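The example states that values in [Logging] apply to all streams and can be overridden on an individual basis. A minimal Python sketch of that resolution rule follows. It is not IDOL server's implementation: it assumes INI-style parsing, the MaxLogSizeKbs override in [QueryLogStream] is invented for illustration, and the function name effective_settings is hypothetical.

```python
import configparser

# Reduced example; the override in [QueryLogStream] is illustrative only.
SAMPLE = """
[Logging]
LogLevel=normal
MaxLogSizeKbs=10240
0=ApplicationLogStream
1=QueryLogStream

[ApplicationLogStream]
LogFile=application.log
LogTypeCSVs=application

[QueryLogStream]
LogFile=query.log
LogTypeCSVs=query
MaxLogSizeKbs=2048
"""


def effective_settings(text):
    """Return {stream name: settings}, with stream values overriding shared ones."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(text)
    shared = {k: v for k, v in parser.items("Logging") if not k.isdigit()}
    streams = [v for k, v in parser.items("Logging") if k.isdigit()]
    result = {}
    for name in streams:
        merged = dict(shared)              # start from the shared values
        merged.update(parser.items(name))  # stream-level values win
        result[name] = merged
    return result


for stream, settings in effective_settings(SAMPLE).items():
    print(stream, settings["LogFile"], settings["MaxLogSizeKbs"])
```

Here ApplicationLogStream inherits the shared MaxLogSizeKbs value, while QueryLogStream uses its own.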
[LanguageTypes] section
The [LanguageTypes] section lists the language types that you want to use. It contains a
section for each of the listed language types, in which you configure the settings that determine
how each language type is handled.
For example:
[LanguageTypes]
DefaultLanguageType=englishASCII
DefaultEncoding=UTF8
LanguageDirectory=C:\IDOLserver\IDOL\langfiles
0=englishASCII
1=englishUTF8
2=chineseCHINESESIMPLIFIED
3=chineseCHINESETRADITIONAL
4=chineseUTF8
5=frenchASCII
6=frenchUTF8
7=germanASCII
8=germanUTF8
[englishASCII]
LanguageCode=1
Language=ENGLISH
Encoding=ASCII
Stoplist=english.dat
IndexNumbers=1
[englishUTF8]
LanguageCode=2
Language=ENGLISH
Encoding=UTF8
Stoplist=english.dat
IndexNumbers=1
[chineseCHINESESIMPLIFIED]
LanguageCode=21
Language=CHINESE
Encoding=CHINESESIMPLIFIED
SentenceBreaking=chinesebreaking
IndexNumbers=1
[chineseCHINESETRADITIONAL]
LanguageCode=22
Language=CHINESE
Encoding=CHINESETRADITIONAL
SentenceBreaking=chinesebreaking
IndexNumbers=1
[chineseUTF8]
LanguageCode=23
Language=CHINESE
Encoding=UTF8
SentenceBreaking=chinesebreaking
IndexNumbers=1
[frenchASCII]
LanguageCode=38
Language=FRENCH
Encoding=ASCII
Stoplist=french.dat
IndexNumbers=1
[frenchUTF8]
LanguageCode=39
Language=FRENCH
Encoding=UTF8
Stoplist=french.dat
IndexNumbers=1
[germanASCII]
LanguageCode=42
Language=GERMAN
Encoding=ASCII
Stoplist=german.dat
IndexNumbers=1
[germanUTF8]
LanguageCode=43
Language=GERMAN
Encoding=UTF8
Stoplist=german.dat
IndexNumbers=1
Error messages
VQL conversion error messages
The following error messages are produced by a Legacy Profile task if it encounters ill-formed or
unexpected content during the VQL conversion that it performs in order to import a VQL category into
IDOL server.
If you have set up a log stream in IDOL server's configuration file (see Setting up log streams on
page 384) that has LogTypeCSVs set to ExtendedIndex, VQL conversion error messages are written
to log files under the following line of text:
[LP - VQLTask] VQL conversion failed
Note: while corrupt VQL may produce several errors, only a single error will be reported.
General errors
Unknown operator <operator_name>
The category that the Legacy Profile task is importing includes an element in angled brackets
that is not one of the operators which the task can convert.
For example:
<cat>
Proximity errors
NEAR requires at least two operands or
Problematic proximity operator - at least two operands required
The category that the Legacy Profile task is importing contains an expression in which a
proximity operator occurs with only one operand.
The proximity operators NEAR, PARAGRAPH and SENTENCE require two or more words or
phrases to be within a specified distance from each other.
Unrecognized
The category the Legacy Profile task is importing contains an expression in which it does not
recognize the usage of the NEAR operator. This can be for one of the following reasons:
the usage of NEAR is unsupported. For example, the Legacy Profile task does not
convert nested NEAR statements such as:
dog <NEAR/10> (kennel <NEAR/10> bone)
Unmatched quotes
The category that the Legacy Profile task is importing contains an odd number of double quote
marks.
Expression errors
VQL is not in conjunctive IN format
The conjunctive IN format requires that each line of VQL consists of one or more component
expressions that are connected with the AND operator.
Each component must exclude the IN operator, or be of the form expression <IN> field, where
expression excludes the IN operator.
For example:
expression A
In this example, expression A does not use the IN operator.
GetConfig
The GetConfig command returns the service's configuration file settings.
http://<host>:<port>/action=GetConfig
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.
GetLogStream
The GetLogStream command returns a specific log stream for the service.
http://<host>:<port>/action=GetLogStream&Name=<name>&FromDisk=<true/false>&Tail=<number>
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.
<name>
Enter the name of the log stream that you want to return.
<true/false>
Enter true if you want the log stream to be read from disk rather than from memory. By default this is
false.
<number>
Enter the number of lines that you want to return from the log stream. The lines are read from the top
(that is, the most recent lines are returned). Enter -1 to return all entries (this is the default).
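A GetLogStream request can be assembled from its parameters as in the following Python sketch. The host, port and stream name in the usage line are placeholders rather than values from this guide, and the function name get_log_stream_url is hypothetical; the defaults mirror those described above (FromDisk false, Tail -1).

```python
from urllib.parse import urlencode


def get_log_stream_url(host, port, name, from_disk=False, tail=-1):
    """Build a GetLogStream request URL in the form shown above."""
    query = urlencode({
        "action": "GetLogStream",
        "Name": name,
        "FromDisk": "true" if from_disk else "false",
        "Tail": tail,
    })
    # The guide writes the action straight after the slash, so no "?" is added.
    return "http://%s:%s/%s" % (host, port, query)


# Placeholder host, port and stream name.
url = get_log_stream_url("localhost", 9002, "ApplicationLogStream", tail=50)
print(url)
```

Using urlencode ensures that a stream name containing spaces or other reserved characters is escaped before it is placed in the URL.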
GetLogStreamNames
The GetLogStreamNames command returns the names of the log streams that have been set up for
the service.
http://<host>:<port>/action=GetLogStreamNames
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.
GetStatistics
The GetStatistics command returns statistics for the service.
http://<host>:<port>/action=GetStatistics
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.
GetStatus
The GetStatus command returns the service's status (running or stopped).
http://<host>:<port>/action=GetStatus
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.
GetStatusInfo
The GetStatusInfo command returns status information for the service (for example, the service's
product name, version number and so on).
http://<host>:<port>/action=GetStatusInfo
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.
MergeConfig
The MergeConfig command allows you to merge IDOL server's configuration file with one or more
configuration file sections. Alternatively, you can use it to set or delete individual configuration
parameters.
Using MergeConfig to merge a configuration file with one or more configuration file sections
If IDOL server's configuration file already contains a section that has the same name as the section
with which it is going to be merged, any settings that only the new section contains are added to the
existing section. If the new section contains settings that are already present in the existing section,
the new section's settings overwrite the settings of the old section.
Note: This command requires a POST request method
action=MergeConfig&Config=<configuration_file_content>
<configuration_file_content>
Enter the configuration file content that you want to merge with the content of IDOL server's
configuration file.
Note that you must escape the configuration file content.
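The following Python sketch shows one way to prepare such a POST request. Several points are assumptions rather than statements from this guide: that "escape" means URL percent-encoding, that the request is posted to the root path of the service port, and the host and port themselves. The function name build_merge_config_request is hypothetical, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_merge_config_request(host, port, config_text):
    """Build (but do not send) a POST request carrying escaped config content."""
    # Assumption: escaping means URL percent-encoding of the Config value.
    body = urlencode({"action": "MergeConfig", "Config": config_text})
    # Assumption: the body is posted to the root path of the service port.
    return Request(
        "http://%s:%s/" % (host, port),
        data=body.encode("ascii"),
        method="POST",
    )


# Placeholder host and port; the section content is illustrative.
request = build_merge_config_request(
    "localhost", 10000, "[Server]\nDeleteAfterAdd=true"
)
print(request.get_method(), request.data.decode("ascii"))
```

The brackets, line break and equals sign in the section text are percent-encoded, so the configuration content cannot be mistaken for additional request parameters.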
<value>
The value that you want to set for the corresponding <param>.
For example:
http://1.23.45.6:10000/action=MergeConfig&Key0=Server/DeleteAfterAdd&Value0=true&Key1=UserEmail/RunMailer&Value1=true
In this example, the MergeConfig command is used to set the value of the DeleteAfterAdd parameter
in the configuration file's [Server] section to true, and to set the value of the RunMailer parameter in
the configuration file's [UserEmail] section to true.
For example:
http://1.23.45.6:10000/action=MergeConfig&Key0=Server/DeleteAfterAdd&Key1=UserEmail/RunMailer
In this example, the MergeConfig command is used to delete the DeleteAfterAdd parameter from the
configuration file's [Server] section, and to delete the RunMailer parameter from the configuration
file's [UserEmail] section.
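The two examples above follow one pattern: each parameter is identified by a numbered Key in the form Section/Parameter, and supplying the matching numbered Value sets the parameter while omitting it deletes the parameter. A minimal Python sketch of that pattern follows; the function name merge_config_url is hypothetical, and using None to mean "delete" is a convention of the sketch, not of IDOL server.

```python
from urllib.parse import urlencode


def merge_config_url(host, port, settings):
    """settings maps "Section/Parameter" to a value, or to None to delete it."""
    pairs = [("action", "MergeConfig")]
    for index, (key, value) in enumerate(settings.items()):
        pairs.append(("Key%d" % index, key))
        if value is not None:
            # Supplying a ValueN sets the parameter; omitting it deletes it.
            pairs.append(("Value%d" % index, value))
    # Keep "/" unescaped so keys read as in the examples above.
    return "http://%s:%s/%s" % (host, port, urlencode(pairs, safe="/"))


set_url = merge_config_url("1.23.45.6", 10000, {
    "Server/DeleteAfterAdd": "true",
    "UserEmail/RunMailer": "true",
})
delete_url = merge_config_url("1.23.45.6", 10000, {
    "Server/DeleteAfterAdd": None,
    "UserEmail/RunMailer": None,
})
print(set_url)
print(delete_url)
```

Called with the values from the examples, the sketch reproduces both example URLs.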
SetConfig
The SetConfig command allows you to set IDOL server's configuration file.
Note: this command requires a POST request method
action=SetConfig&Config=<configuration_file_content>
<configuration_file_content>
Enter the configuration file content with which you want to overwrite the current content of the IDOL
server configuration file.
Note that you must escape the configuration file content.
Stop
The Stop command stops the service.
http://<host>:<port>/action=Stop
<host>
The IP address (or name) of the machine that hosts the service.
<port>
Enter the ServicePort that you have specified in the IDOL server configuration file's [Service] section.
#DRETITLE
Enter the title of the document. You can enter multiple lines.
#DRECONTENT
Enter the content of the document. You can enter multiple lines.
(This parameter is optional. However, if you don't enter
#DRECONTENT, you should specify one or more #DREFIELD
<Name><N>= as otherwise the document will not have any
content).
#DREFIELD
<Name><N>
Specify the name of each DREFIELD that you are defining, and
enter an appropriate value for it. For example, if you want to index
customer details:
#DREFIELD surname1="Smith"
#DREFIELD forename1="Peter"
#DREFIELD title1="Mr."
#DREFIELD surname2="Miller"
#DREFIELD forename2="Susan"
#DREFIELD title2="Dr."
#DREDATE
Enter the creation date of the document using the format that you
have specified for DateFormat in IDOL server's configuration file.
By default this is yyyy/mm/dd.
#DREDBNAME
Enter the name of the database into which you want to index the
document.
#DRESTORECONTENT
#DRESECTION <N>
#DREENDDOC
Indicates the end of the document. You must enter this delimiter.
Note: The text file must start with #DREREFERENCE and end with #DREENDDOC.
Example text file
The following is an example of a text file that can be indexed into IDOL server:
#DREREFERENCE 392348A0
#DREFIELD authorname1="Brown"
#DREFIELD authorname2="Edgar"
#DREFIELD title="Dr."
#DREDATE 1998/08/06
#DRETITLE
Jurassic Molecules
#DRECONTENT
Scientists announced last week the successful reproduction of a
possible precursor to all life on Earth. The molecules consist of a
part of DNA and the molecular "scissors" responsible for destroying
messenger RNA in humans.
Using a technique called test tube evolution, scientists created a
nucleic acid enzyme, the first known enzyme that uses an amino acid to
start chemical activity. Scientists hope that the creation of this
molecule will lead to the elusive precursor. The precursor, by
definition, will have to contain both the genetic code for replication
and an enzyme to trigger self replication.
#DRETYPE text
#DREDBNAME Science
#DRESTORECONTENT y
#DREENDDOC
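A text file in this layout can be generated programmatically. The Python sketch below is illustrative only: the function name build_idx_document and its parameters are hypothetical, the field order simply follows the example above, and values are written as given (for instance, double quotes inside a field value are not escaped).

```python
def build_idx_document(reference, title, content, fields=None, date=None,
                       database=None, store_content=None):
    """Return one document in the layout of the example text file above."""
    lines = ["#DREREFERENCE %s" % reference]  # the file must start with this
    for name, value in (fields or {}).items():
        lines.append('#DREFIELD %s="%s"' % (name, value))
    if date:
        lines.append("#DREDATE %s" % date)
    lines.append("#DRETITLE")
    lines.append(title)
    lines.append("#DRECONTENT")
    lines.append(content)
    if database:
        lines.append("#DREDBNAME %s" % database)
    if store_content:
        lines.append("#DRESTORECONTENT %s" % store_content)
    lines.append("#DREENDDOC")  # mandatory end-of-document delimiter
    return "\n".join(lines) + "\n"


document = build_idx_document(
    reference="392348A0",
    title="Jurassic Molecules",
    content="Scientists announced last week the successful reproduction of a\n"
            "possible precursor to all life on Earth.",
    fields={"authorname1": "Brown", "authorname2": "Edgar", "title": "Dr."},
    date="1998/08/06",
    database="Science",
    store_content="y",
)
print(document)
```

The date string must already be in the format configured for DateFormat; the sketch does not convert it.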
Sectioning a document
If a document that you want to index contains more than 500 words, you should split it up into sections
in order to make it more manageable for IDOL server. If you want to index XML rather than IDX, you
don't need to section your data as IDOL server automatically applies sectioning to it.
Declare a separate document for each of the sections into which you are splitting the original
document, and give each section a #DRESECTION number. Note that if you split up a document into
sections:
you must put the content of each section into the #DRECONTENT field
apart from the #DRESECTION number and the #DRECONTENT each section must contain
the same DRE field values.
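The splitting step can be sketched in Python as follows. The sketch only divides the content into chunks of at most 500 words and assigns each a section number; each chunk would then be written as a separate document that repeats the other DRE field values. The function name split_into_sections is hypothetical, numbering from 0 is an assumption, and the sketch collapses line breaks within the content.

```python
def split_into_sections(content, max_words=500, first_section=0):
    """Split content into (section number, text) pairs of at most max_words words."""
    words = content.split()
    sections = []
    for offset in range(0, len(words), max_words):
        # Assumption: section numbers count up from first_section.
        number = first_section + offset // max_words
        sections.append((number, " ".join(words[offset:offset + max_words])))
    return sections


# Illustrative input: 1200 placeholder words.
text = " ".join("word%d" % i for i in range(1200))
sections = split_into_sections(text)
for number, section_text in sections:
    print("#DRESECTION %d (%d words)" % (number, len(section_text.split())))
```

A content-aware splitter that breaks at paragraph boundaries would usually give better sections than a fixed word count; the fixed count is used here only to keep the sketch short.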
#DREDBNAME Science
#DRESTORECONTENT y
#DREENDDOC
#DREREFERENCE 392348A0
#DREFIELD authorname1="Brown"
#DREFIELD authorname2="Edgar"
#DREFIELD title="Dr."
#DREDATE 1998/08/06
#DRETITLE
Jurassic Molecules
#DRESECTION 1
#DRECONTENT
Scientists have known for some time that the key ingredients for life
are DNA, RNA, and proteins. An interesting chicken-egg dilemma has
developed: which came first, RNA, DNA, or proteins? Many believe that
a replicating RNA molecule is the likely precursor to all life on
Earth.
RNA serves as both a genetic molecule and an enzyme in the body, which
scientists believe strongly suggests the likelihood of an RNA
precursor to all life. They speculate that RNA was first, followed by
DNA, the much more stable of the two. It would serve as an efficient
storehouse for the genetic code. Proteins, better catalysts than RNA,
likely evolved later as well. At some point, the current three-based
system developed from the initial one-based system of RNA.
Scientists hope that these scissors molecules may also have practical
uses in medicine, since the molecules can efficiently shred specific
DNA. Theoretically, it may be possible to tailor such a molecule to
attack and shred harmful DNA from pathogenic organisms. These
molecules could be made to be activated only in specific
circumstances.
#DRETYPE text
#DREDBNAME Science
#DRESTORECONTENT y
#DREENDDOC
Glossary
ACI (Autonomy Content Infrastructure)
The Autonomy Content Infrastructure is a technology layer that automates operations on unstructured
information for cross enterprise applications, thus enabling an automated and compatible business-to-business, peer-to-peer infrastructure.
The ACI allows enterprise applications to understand and process content that exists in unstructured
formats, such as e-mail, Web pages, office documents, and Lotus Notes.
Agent index
IDOL server stores agents and profiles in its Agent index. By default the Agent index comprises the
Agent and Profile databases. The Agent index is configured automatically and should not be modified.
Agentboolean fields
IDOL server can store Boolean agents (a Boolean or Proximity expression that legacy technologies
use to categorize documents) in agentboolean fields. You can then query IDOL server with text and an
agentboolean field to return categories whose Boolean agent matches this text.
Agents
An agent searches for information about a specific topic. An administrator can create agents for users
or allow users to create their own agents.
Category index
IDOL server stores categories in its Category index. By default the Category index comprises the
Activated and Deactivated databases. The Category index is configured automatically and should not
be modified.
Clusters
Cluster information is hierarchically agglomerated data that has been extracted from snapshots (this
does not require the setup of an initial taxonomy). Each cluster represents a concept area that contains
a set of items, which share common properties. Clustering data allows you to make trends and
developments in data visible.
Community
The community comprises all people in a user's network neighborhood. It allows a user to find other
people in the community who have been looking at similar documents or have agents that are similar
to the user's agents.
Concept summary
A brief summary of each result document that is returned for a query. The concept summary displays a
few sentences that are typical of the result's content (these sentences can be from different parts of the
result document).
Connector
A connector is an Autonomy fetching solution (for example HTTPFetch, Oracle Fetch, File System
Fetch and so on) that allows you to retrieve information from any type of local or remote repository (for
example, a database or a web site). It imports the fetched documents into IDX or XML file format and
indexes them into IDOL server from where you can retrieve them (for example by sending queries to
IDOL server).
Context summary
Returns a conceptual summary of the result document that is biased by the terms in the query. A
context summary comprises sentences that are particularly relevant to the terms in the query (these
sentences can be from different parts of the result document).
Data index
IDOL server stores content data in its Data index. By default the Data index comprises the News and
Archive databases. You can customize how data is stored in the Data index by configuring appropriate
settings in the IDOL server configuration file.
Database
An Autonomy database is an IDOL server data pool that stores indexed information. The administrator
can set up one or more databases, and specifies how data is fed to the databases. By default IDOL
server contains the databases Profile, Agent, Activated, Deactivated, News and Archive.
Default user
By default IDOL server gives users the default user role. That means that the user only has the
privileges that have been allocated to this role.
IDOL server
Using Autonomy connectors, Autonomy's Intelligent Data Operating Layer (IDOL) server integrates
unstructured, semi-structured and structured information from multiple repositories through an
understanding of the content, delivering a real time environment in which operations across
applications and content are automated, removing all the manual processes involved in getting the
right information to the right people at the right time.
IDX
Apart from XML files only files that are in IDX format can be indexed into IDOL server. You can use a
connector to import files into this format or manually create IDX files (see Storing content in IDOL
server on page 83).
Indexing
The process of storing data in IDOL server. Data can be stored in different field types (index, numeric
and ordinary fields) or prevented from being stored. It is important to store data in appropriate field
types in order to ensure optimized performance. IDOL server can return any fields it stores for queries,
however, you can only query for terms in Index fields.
Index fields
You should store fields that contain text which you want to query frequently as Index fields. Index fields
are processed linguistically when they are stored in IDOL server. This means that stemming and
stoplists are applied to text in Index fields before it is stored, which allows IDOL server to process
queries for these fields more quickly (typically DRETITLE and DRECONTENT are fields that should be
set up as Index fields).
Link term
Link terms (also referred to as "Links") are terms in query text that are also contained in the result
documents that IDOL server returns for this query.
Privilege
The privileges of a user depend on the roles that have been allocated to him within IDOL server.
Privileges determine, for example, whether a user is allowed to access specific data.
Profiles
A user's profile is based on the concepts of the documents that the user reads. Every time a user
opens a document, the user's profile is updated. This allows the administrator to bring new documents
that match the interests recorded in the profile to the user's attention.
Query
You can submit a natural language query to IDOL server which analyzes the concept of the query and
returns documents that are conceptually similar to the query. You can also submit other query and
search types to IDOL server, for example, Boolean, bracketed Boolean and keyword searches.
Quick summary
A brief summary of each result document that is returned for a query. The quick summary displays the
first few sentences of the result document.
Reference fields
Reference fields are used to identify documents. At index time Reference fields can be used to
eliminate duplicate copies of documents. At query time Reference fields can be used to filter results.
Retrain
You can retrain agents by indicating which of the results that have been returned to you are most
relevant to your query. The retrained agent will then return more relevant results.
Role
Each user is allocated one or more roles within IDOL server by the administrator. The roles that a user
has determine which privileges he has.
Section breaking
IDOL server indexes documents in sections (the number of sections that a document is split up into
increases proportionally with the size of the document). This ensures that when you, for example,
query for text that is relevant to a specific part of a book, IDOL server can find the appropriate section
and return it to you (if the book was not indexed in sections, IDOL server might not be able to find the
text you are looking for, as it may not be conceptually relevant to the entire book).
Snapshot
Internal raw data from which you can extract clusters. You can thus generate cluster information and
spectrographs.
Stemming
In languages some words have a common morphological root. Autonomy provides stemming
algorithms that reduce words to this form. This is useful because it allows concepts to be matched
regardless of the grammatical use of words. In English for example, the words "help", "helpful",
"helping" and "helped" can all be stripped down to their stem "help" without significant loss of meaning.
Autonomy provides as standard a set of stemming algorithms for the most commonly used languages.
Stemming is applied after stopwords have been discarded both at index time (when content is stored in
IDOL server) and at query time (query text is stopped and stemmed before it is matched).
Stoplist
Each language that is supported needs a stoplist (located in IDOL server's langfiles directory) which
contains a list of common words that are not stored in IDOL server. Words such as "the" or "a"
are used too frequently to carry any significance and IDOL server does not require them to understand
the concept of text.
Stopword
A common word that is used too frequently to carry any significance (for example, "the" or "a").
Stoplists list the stopwords for the languages that IDOL server supports. Stopwords are not stored in
IDOL server.
Synonym file
A synonym file allows IDOL server to handle synonym queries (IDOL server must also be
configured to enable synonym queries). A synonym query returns results that are conceptually
similar to the query's terms, to the synonyms that are available for the query's terms, or to both.
A synonym file contains comma-separated lists of synonym strings for words. Within this file, you can
specify lists for each language type that you have set up in IDOL server.
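As a rough sketch of how comma-separated synonym lists can be used to widen a query, the example below reads each list as one group of interchangeable words. The one-group-per-line layout, the function names, and the sample words are assumptions for illustration; the exact synonym file syntax (including how lists are assigned to language types) is not shown here.

```python
def parse_synonym_lines(lines):
    """Map each word to the full set of words in its comma-separated list."""
    synonyms = {}
    for line in lines:
        group = [w.strip().lower() for w in line.split(",") if w.strip()]
        for word in group:
            synonyms.setdefault(word, set()).update(group)
    return synonyms


def expand_query(terms, synonyms):
    """Return the query terms plus any synonyms, without duplicates."""
    expanded = []
    for term in terms:
        key = term.lower()
        for candidate in sorted(synonyms.get(key, {key})):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical synonym lists, one comma-separated list per line.
table = parse_synonym_lines(["car, automobile, motorcar", "fast, quick"])

print(expand_query(["fast", "car"], table))
# ['fast', 'quick', 'automobile', 'car', 'motorcar']
```

A term with no synonym list is passed through unchanged, so the expansion never removes anything from the original query.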
Taxonomy
IDOL server's taxonomy generation feature allows you to automatically create hierarchical contextual
taxonomies of clusters or other information. This provides you with an overview of the information
landscape and an insight into specific areas of the information.
Term
The basic entity that is indexed into IDOL server (for example, a word in a document after stemming
has been applied to it).
Index
A
ACI 435
API 14
Task 73
ACLType (configuration setting) 115, 279
Action commands 189
AdminRevokeLicense 43
AgentAdd 121
AgentCopy 123
AgentDelete 123
AgentEdit 122
AgentGetResults 122, 181
AgentRead 123
AgentRetrain 122
CategoryActivate 143
CategoryBuild 144
CategoryCopy 132
CategoryCreate 130
CategoryDelete 144
CategoryDeleteTraining 144
CategoryExportToXML 145
CategoryGetDetails 141
CategoryGetHierDetails 141
CategoryGetTNW 142
CategoryGetTraining 142
CategoryImportFromCluster 131
CategoryImportFromTopic 131
CategoryImportFromXML 132
CategoryMove 140
CategoryQuery 148, 181
CategoryReplace 143
CategorySetDetails 142
CategorySetTNW 143
CategorySetTraining 140
CategorySuggestFromCategory 147
CategorySuggestFromDocument 147
CategorySuggestFromText 147
CategorySyncCatDRE 145
CategorySyncCatDre 56
ClusterCluster 29, 36, 154, 155, 156, 160,
407
ClusterResults 155
ClusterServe2DMap 29, 36, 155
TaxonomyGenerate 132, 160, 261, 262, 407
TermGetAll 190
TermGetBest 189
TermGetInfo 189
UserAdd 110
Activating or deactivating categories 143
Adaptive Probabilistic Concept Modeling 435
Adding language type fields to documents 318
AdminClients (configuration setting) 190
Administering
Categories 141
IDOL server 357
Administration 4
AdminRevokeLicense action 43
Advanced keyword search 193, 232
AdvancedSearch (configuration setting) 193, 232
Afrikaans encoding settings 325
AFTER operator 195, 236
Agent index 435
[Agent] configuration file section 403
AgentAdd action 121
Agentboolean
Categories 300
Fields 299
Agentboolean fields 435
Storing Boolean agents 299
AgentCopy action 123
AgentDelete action 123
AgentEdit action 122
AgentGetResults action 122, 181
AgentRead action 123
AgentRetrain action 122
Agents 8, 109, 121, 435
AgentAdd action 121
AgentCopy action 123
AgentDelete action 123
AgentEdit action 122
AgentGetResults action 122, 181
AgentRead action 123
AgentRetrain action 122
Copying 123
Creating an agent 121
Deleting 123
Editing 122
Emailing agent results to users 176
Exporting 382
Importing 383
Querying with an agent 122
Retraining 122
Training 121
Viewing an agent's details 123
Albanian encoding settings 325
Alert task 73, 125
Alerting 8, 109, 125
Email templates 127
Users to new content 125
Users to new documents 73
alertTemplate.html 127
[AnalysisSchedules>] configuration file section
407
AND operator 194, 195, 236
APCM 435
Application server 39
Arabic encoding settings 326
ASCII 390
AttachmentTemplate (configuration setting) 127
Attributes
XML 69
AutoDetectLanguagesAtIndex (configuration
setting) 71, 320
Automater iii
Automatic Language Detection
Enabling 320
Automatic language detection 307
Autonomy
Content Infrastructure 435
Data flow and security 5
Infrastructure 1
Autonomy Service Dashboard 118, 410
Azeri encoding settings 326
B
Backing up IDOL server's Data index 378
Backup (configuration setting) 379
BackupCompression (configuration setting) 379
BackupDir<N> (configuration setting) 379
BackupInterval (configuration setting) 379
BackupMaintainDirStructure (configuration
setting) 379
BackupTime (configuration setting) 379
Basque encoding settings 326
Before indexing
Data processing 73
BEFORE operator 195, 236
Before storing content in IDOL server 63
Belarussian encoding settings 327
BIAS field specifier 266, 269
BIF files 63, 81
BindLevel (configuration setting) 158, 159
Boolean
Agents 299
Operators 194
AND 194, 195, 236
EOR 195, 236
NOT 194, 195, 236
OR 194, 195, 236
Precedence of Boolean and Proximity
operators 195
XOR 195, 236
Search 194
Boosting result relevance 266, 269, 271
Bracketed expressions 195
Breton encoding settings 327
Building categories 144
Bulgarian encoding settings 327
C
Canonicalization 308
CantHaveCSVs (configuration setting) 67
CantHaveFields (configuration setting) 67
Cat task 73
Catalan encoding settings 328
Categories
Activating 143
Administering 141
Building 144
Changing Fields 142
Changing term weights 143
Deactivating 143
Deleting 144
Deleting training 144
Exporting to XML 145
Matching 148
Moving 140
Replacing 143
Retraining 140
Suggesting 147
Synchronizing 145
Training 140
Viewing 141
Viewing terms and weights 142
Viewing training 142
Categorization 8, 129
CategoryActivate 143
CategoryBuild 144
CategoryCopy action 132
CategoryCreate action 130
CategoryDelete 144
CategoryDeleteTraining 144
CategoryExportToXML 145
CategoryGetDetails 141
CategoryGetHierDetails 141
CategoryGetTNW 142
CategoryGetTraining 142
CategoryImportFromCluster action 131
CategoryImportFromTopic action 131
CategoryImportFromXML 132
CategoryMove 140
CategoryQuery 148
CategoryReplace 143
CategorySetDetails 142
CategorySetTNW 143
CategorySetTraining 140
CategorySuggestFromCategory 147
CategorySuggestFromDocument 147
CategorySuggestFromText 147
CategorySyncCatDRE 145
Creating a hierarchical category structure
130
Categorizing
Data 146
Documents 73
Legacy profiles from BIF files 73
Category index 435
CategoryActivate action 143
CategoryBuild action 144
CategoryCopy action 132
CategoryCreate action 130
CategoryDelete action 144
CategoryDeleteTraining action 144
CategoryExportToXML action 145
CategoryGetDetails action 141
CategoryGetHierDetails action 141
CategoryGetTNW action 142
CategoryGetTraining action 142
CategoryImportFromCluster action 131
CategoryImportFromTopic action 131
CategoryImportFromXML action 132
CategoryMove action 140
CategoryQuery action 148, 181
CategoryReplace action 143
CategorySetDetails action 142
CategorySetTNW action 143
CategorySetTraining action 140
CategorySuggestFromCategory action 147
CategorySuggestFromDocument action 147
CategorySuggestFromText action 147
CategorySyncCatDRE action 145
CategorySyncCatDre action 56
Changing
Category fields 142
Category term weights 143
Field values in documents 375
Channels 9, 149
Emailing channel results to users 176
Setting up and using 149
channels.xss template 180, 181
CharConv (configuration setting) 313, 319
Checking
That IDOL server is running correctly 117
The indexing process 106
Chinese encoding settings 328
ClassificationServerHost (configuration setting)
177
ClassificationServerNumResults (configuration
setting) 177
ClassificationServerParams (configuration
setting) 177
ClassificationServerPort (configuration setting)
177
ClassificationServerRetries (configuration
setting) 177
ClassificationServerThreshold (configuration
setting) 177
ClassificationServerTimeout (configuration
setting) 177
ClassificationServerValues (configuration
setting) 177
ClassificationServerXSLTemplate (configuration
setting) 177
[Cluster] configuration file section 407
ClusterCluster action 29, 36, 154, 155, 156, 160,
407
Clustering 9, 151
A large amount of data 158
A small amount of data 157
Changing the data view 159
Changing the number and size of clusters
156
ClusterCluster action 29, 36, 154, 155, 156,
160
ClusterResults action 155
ClusterServe2DMap action 29, 36, 155
ClusterSGDataGen action 29, 36, 153, 156,
160
ClusterSGDataServe action 153
ClusterSGDocsServe action 153
ClusterSGPicServe action 29, 36, 153
ClusterSnapshot action 29, 36, 152, 153,
154, 156, 160
ClusterWriteToDisk action 29, 36
Configuring clustering 156
Generating snapshots 152
Generating WhatsNew and WhatsHot
information 154
Setting up schedules 160
Spectrograph data generation 153
Very different data 159
Very similar data 158
ClusterResults action 155
Clusters 436
ClusterServe2DMap action 29, 36, 155
ClusterSGDataGen action 29, 36, 153, 156, 160,
407
ClusterSGDataServe action 153
ClusterSGDocsServe action 153
ClusterSGPicServe action 29, 36, 153
ClusterSnapshot action 29, 36, 152, 153, 154,
156, 160, 407
ClusterWriteToDisk action 29, 36
Collaboration 9, 109, 163
Community action 163
Combine action 295
Combining different query types 241
Commands
Action 189
Index 84
Service 423
Community 436
Community action 163, 171
[Community] configuration file section 404
Compact (configuration setting) 377
Compacting IDOL server's Data index 376
CompactInterval (configuration setting) 377
CompactTime (configuration setting) 377
Concept summary 257, 436
Configuration
Executing changes 358
Configuration file
[Agent] section 403
[AnalysisSchedules>] section 407
ASCII versus UTF8 390
[Cluster] section 407
[Community] section 404
[Databases] section 394
[DataDRE] section 406
[DocumentTracking] section 410
[DRE] section 406
[FieldProcessing] section 395
[IndexCache] section 393
[IndexTasks] section 409
[LanguageTypes] section 413
[License] section 392
[Logging] section 411
Modifying parameter values 390
[Paths] section 394
[Profile] section 404
[ProfileNamedAreas] section 404
[Properties] section 397
[Role] section 403
[Schedule] section 395
[SectionBreaking] section 394
Sections 391
[Security] section 399
[Server] section 393
[Service] section 392
[Summary] section 395
[Synonym] section 410
[Taxonomy] section 407
[Templates] section 410
[TermCache] section 393
DiscardUnconfiguredLanguagesAtIndex
320
DiscardUnknownLanguagesAtIndex 320
DocumentTrackingActive 108
DocumentTrackingType 279
DreTemplateReferenceEnd 177
DreTemplateReferenceStart 177
EmailActionXSLTemplate 179
Encoding 325
Expire 368
ExpireDateType 279, 367
ExpireInterval 368
ExpireIntoDatabase 366, 368
ExpireTime 368
FieldCheckType 279, 292
FixedField<N> 318
FixedFieldValue<N> 318
FlattenIndexType 279
From 176
FromHost 176
FromName 176
HiddenType 279
HighlightingType 298
HighlightType 279
HyphenChars 104
IDOLserver 125
Index 70, 267, 279, 286
IndexPort 26, 33, 85, 359, 360, 361, 362,
364, 365, 367, 369, 371, 373, 376, 378,
379, 381
Interval 176
InvertedAgentType 279
KillDuplicates 105, 295
Language 325
LanguageDirectory 311, 312, 319
LanguageType 279, 314
Library 176, 179
LogTypeCSVs 416
MaxEmailsPerUser 177
MaxSyncDelay 72
MinClusterDocs 157, 159
MinWordsPerSentence 258
Module 74
Name 363
NextTask 74
NodeTableStoreContent 64
Number 366
NumberOfBackups 380
NumClusters 159
NumDBs 363
NumericDateType 279, 288
NumericType 279, 290
OnFailureTask 74
Online help 389
ParametricType 229, 280
Port 26, 33, 62, 106
PrintType 276, 280
ProperNames 231
Property 314, 316
PropertyFieldCSVs 229, 267, 276, 288, 290,
292, 295, 298, 367, 390
PropertyMatch 65, 114, 281, 285
ProxyHost 176, 179
ProxyPassword 176, 179
ProxyPort 176, 179
ProxyUsername 176, 179
QueryClients 189
ReferenceType 280
Retries 176
RunMailer 176
SectionBreakType 280
SecurityType 280
SeedBindLevel 157, 158, 159
SeedSize 157
SendToList 128
SentenceBreaking 351
ServicePort 26, 33, 43
SleepBetweenRequests 177
SMTPHost 176, 179
SMTPPort 176, 179
Soundex 237
SourceFields 258
SourceType 280
SpellCheckCorrectMinDocOccs 255
SpellCheckIncorrectMaxDocOccs 255
SpellCheckMaxCheckTerms 255
StartingSuggestOverrideFactor 157, 158
StartTask 74
StartTime 176, 178
StripLanguage 313, 319
SynonymType 240, 280
Template 127
TermSize 349
TestUser 176, 178
TimeoutMS 176
TitleType 280
Transliteration 350
TrimSpaces 280
VerboseLogging 178
Weight 267, 280
XSLTemplate 176
Configuring
Clustering 156
IDOL server 389
Connector 3, 436
Content
Indexing 83
Storing 83
Context summary 257, 436
ContextSummaryQueryTermWeight
(configuration setting) 258
Converting results to a specific encoding 322
Copying agents 123
Creating
A hierarchical category structure 130
A new database 363
Agents 121
Categories
By copying categories 132
By generating a taxonomy 132
From clusters 131
From legacy topic sets 131
From scratch 130
From XML 132
Databases 362
Users 110
Croatian encoding settings 329
Cross-lingual systems 307
Custom action 179, 180, 181
Custom emails 179
Cycles (configuration setting) 176
Czech encoding settings 329
D
Danish encoding settings 329
Data
Before indexing 63
Categorizing 146
Distributing across multiple disks 64
Indexing 83
Data index 436
Databases 437
Allocating files 65
Changing a document's database 373
Creating 362, 363
Deleting 364
Deleting all documents 365
Expiring documents 366
Exporting IDX documents 369
Exporting XML documents 371
[Databases] configuration file section 394
DatabaseType (configuration setting) 66, 279
[DataDRE] configuration file section 406
DateFormatCSVs (configuration setting) 373
Dates
Storing dates in fields 287
DateType (configuration setting) 279
Deduplication 105
Default
User 437
DefaultAddSetToReadDocuments (configuration
setting) 177
DefaultEmailFormat (configuration setting) 176
DefaultEmailResultsType (configuration setting)
176
DefaultExcludeReadDocuments (configuration
setting) 177
DefaultLanguageType (configuration setting)
311, 321, 322, 323, 324
DefaultSendEmail (configuration setting) 176
DefaultSubject (configuration setting) 176
DeferLogin (configuration setting) 111
Delayed synchronization 72
DelayedSync (configuration setting) 72
Deleting
Agents 123
Categories 144
Category training 144
Documents from IDOL server 359, 360, 365
IDOL server databases 364
Profiles 187
Deploying Retina to your application server 39
DetectLanguage action 190
DiscardUnconfiguredLanguagesAtIndex
(configuration setting) 320
DiscardUnknownLanguagesAtIndex
(configuration setting) 320
DiSH 40, 118
Displaying
Additional fields for individual queries 276
Additional fields with results 275
IDOL server license information 41
Online help 61
Distributed systems 3
Distributing IDOL server 46
Example 47
DNEAR<N> operator 195, 235, 236
Documents
Changing field values 375
Changing the index date, expire date or
database of documents 373
Deleting 359, 360, 364, 365
Expiring 366
Sectioning 433
Tracking documents through import and
indexing 108
Undeleting 361
[DocumentTracking] configuration file section
410
DocumentTrackingActive (configuration setting)
108
DocumentTrackingType (configuration setting)
279
[DRE] configuration file section 406
DREADD (index command) 84
DREADDDATA (index command) 94
DREBACKUP (index command) 378
DRECHANGEMETA (index command) 373
DRECOMPACT (index command) 376, 378, 380
DRECREATEDBASE (index command) 362
DREDELDBASE (index command) 365
DREDELETEDOC (index command) 360
DREDELETEREF (index command) 359
DREEXPIRE (index command) 366
DREEXPORTIDX (index command) 369, 371
DREINITIAL (index command) 381
DREREMOVEDBASE (index command) 364
DREREPLACE (index command) 375
DreTemplateReferenceEnd (configuration
setting) 177
DreTemplateReferenceStart (configuration
setting) 177
DREUNDELETEDOC (index command) 361
Greek 333
Greenlandic 333
Hebrew 333
Hindi 334
Hungarian 334
Icelandic 334
Indonesian 335
Italian 335
Japanese 335
Kazakh 336
Korean 336
Kurdish 336
Kyrgyz 337
Lappish 337
Latin 337
Latvian 338
Lithuanian 338
Luxembourgish 338
Macedonian 339
Malay 339
Maltese 339
Maori 340
Mongolian 340
Norwegian 340
Persian 341
Polish 341
Portuguese 341
Romanian 342
Russian 342
Serbian 342
Slovak 343
Slovenian 343
Somali 343
Sorbian 344
Spanish 344
Swahili 344
Swedish 345
Tagalog 345
Tatar 345
Thai 346
Turkish 346
Ukrainian 346
Urdu 347
Uzbek 347
Valencian 347
Vietnamese 348
Welsh 348
English encoding settings 330
Fields 279
Adding metadata to documents after
indexing 103
Agentboolean 435
Agentboolean fields 299
Associating properties with fields 281
Changing field values in documents 375
FieldCheckType fields 291
Highlight fields 297
Index fields 285, 438
Language type 318
Numerical fields 289
NumericDateType fields 287
Processing fields and documents that
contain specific fields 281
Properties 279
Reference fields 105, 272, 295, 438
Setting up
Highlight fields 297
Indexing 67
Speeding up numerical queries 289, 291
Files
Importing 83
FileWriter task 73
Finnish encoding settings 331
FixedField<N> (configuration setting) 318
FixedFieldValue<N> (configuration setting) 318
FlattenIndexType (configuration setting) 279
French encoding settings 331
From (configuration setting) 176
FromHost (configuration setting) 176
FromName (configuration setting) 176
Functionality matrix 18
Fuzzy queries 227
G
Gaelic encoding settings 332
Galician encoding settings 332
Generating
Snapshots 152
Taxonomies 261
WhatsNew and WhatsHot information 154
German encoding settings 332
I
IAS 437
Icelandic encoding settings 334
IDOL server 3, 437
Administration 357
Backing up the Data index 378
Before storing content 63
Changing the index date, expire date or
database of documents 373
Checking that IDOL server is running
correctly 117
Clustering 151
Compacting the Data index 376
Configuration 389
Configuration file 391
Creating a new database 362
Data flow and security 5
Database 437
Deleting
A database and all the documents it
contains 364
All documents from a database 365
Documents by reference 359
Individual documents and ranges of
documents 360
Directory structure 28, 35
Distributing 46
Executing configuration changes 358
Expiring documents 366
Exporting IDX documents 369
Exporting users, roles, agents and profiles
382
Exporting XML documents 371
Functionality matrix 18
IDOL server
Profiling 10, 109, 185
Importing users, roles, agents and profiles
383
Initializing IDOL server's Data index 381
Installation 23, 25, 32
Integrating with a third party user structure
111
Introduction 7
Licensing 40, 41, 42, 43, 44
Revoking a client license 42
Modifying configuration parameter values
390
Online help 61, 389
Operations 7
Agents 8, 109, 121
Alerting 8, 109, 125
Categorization 8, 129
Channels 9, 149
Clustering 9
Collaboration 9, 109, 163
Dynamic Thesaurus 9, 165
Eduction 9
Expertise 9, 109, 171
Hyperlinking 10, 173
Mailing 10, 109, 175
Retrieval 10, 189
Spelling Correction 12, 255
Summarization 12, 257
Taxonomy generation 13, 261
Profiling 10, 109, 185
Restoring deleted documents 361
Starting 59
Stopping 60
Storing
Content 83
Users 109
System
Architecture 14
Requirements 23
System architecture 5
Upgrading to 50
Using multiple languages 307
IDOLserver (configuration setting) 125
IDX files 437
Creating 431
Import action 383
Importing
Data
Tracking documents 108
Files 83
Legacy profiles from BIF files 73
Users, roles, agents and profiles from IDOL
server 383
Index (configuration setting) 70, 267, 279, 286
Index action 56
Index commands 84
DREADD 84
DREADDDATA 94
DREBACKUP 378
DRECHANGEMETA 373
DRECOMPACT 376, 378, 380
DRECREATEDBASE 362
DREDELDBASE 365
DREDELETEDOC 360
DREDELETEREF 359
DREEXPIRE 366
DREEXPORTIDX 369, 371
DREINITIAL 381
DREREMOVEDBASE 364
DREREPLACE 375
DREUNDELETEDOC 361
Index fields 285, 438
Setting up 285
Index task 73
[IndexCache] configuration file section 393
IndexerGetStatus action 55, 106, 190
Indexing 437
Considerations 63
Content 83
Data
Checking if the indexing process was
successful 106
Tracking documents 108
Data over a socket 94
Directly indexing IDX and XML files 84
Eliminate duplicate documents 105
Fields 67
Hyphenated terms 104
Optimizing 72
Process 72
Users 109
XML attributes 69
Indexing Delayed Synchronization 72
IndexPort (configuration setting) 26, 33, 85, 359,
360, 361, 362, 364, 365, 367, 369, 371, 373,
376, 378, 379, 381
[IndexTasks] configuration file section 409
Indonesian encoding settings 335
Initializing IDOL server's Data index 381
Installing IDOL server 23, 25, 32
Windows directory structure 28, 35
Integrating with a third party user structure 111
Intellectual Asset Protection System 437
Interfaces 3
Interval (configuration setting) 176
Introduction
IDOL server 7
InvertedAgentType (configuration setting) 279
Italian encoding settings 335
J
Japanese encoding settings 335
K
Kazakh encoding settings 336
KillDuplicates (configuration setting) 105, 295
KillDuplicates configuration parameter 295
Korean encoding settings 336
Kurdish encoding settings 336
Kyrgyz encoding settings 337
L
Language (configuration setting) 325
LanguageDirectory (configuration setting) 311,
312, 319
Languages 307, 309
Adding language type fields to documents
318
Automatic language detection 307
Canonicalization 308
Converting results to a specific encoding 322
Cross-lingual systems 307
Enabling Automatic Language Detection 320
Encoding settings 325
Encodings 308
Processing 71
Required files 325
Returning documents
In a specific language for your query
324
In multiple languages for your query 323
SentenceBreaking files 351
Settings 325
Specifying the language type of your query
321
Stemming 308
Stoplists 308, 353
TermSize setting 349
Transliteration
Schemes 308
Settings 350
LanguageType (configuration setting) 279, 314
[LanguageTypes] configuration file section 413
Lappish encoding settings 337
Latin encoding settings 337
Latvian encoding settings 338
Legacy profiles 73
Library (configuration setting) 176, 179
[License] configuration file section 392
LicenseInfo action 41
Licensing 40
Displaying information 41
Forcibly revoking licenses from inaccessible
clients 43
Licensing errors 44
Revoking a client license 42
Link term 438
List action 50, 52, 190
Lithuanian encoding settings 338
Logging
Setting up log streams 384
[Logging] configuration file section 411
LogTypeCSVs (configuration setting) 416
LP task 73
Luxembourgish encoding settings 338
M
Macedonian encoding settings 339
Mailing 10, 109, 175
Templates 180
Malay encoding settings 339
Maltese encoding settings 339
Mongolian encoding settings 340
Manipulating the relevance of query results 266,
269, 271
Manually creating IDX files 431
Maori encoding settings 340
MATCH field specifier 67, 285
Matching
Categories 148
Matching documents against agentboolean
categories 300
MaxEmailsPerUser (configuration setting) 177
MaxSyncDelay (configuration setting) 72
Memory mapping 289, 291
MergeConfig (service port command) 423, 427
Metadata 279
Adding metadata to documents after
indexing 103
MinClusterDocs (configuration setting) 157, 159
MinWordsPerSentence (configuration setting)
258
Modifying field content 73
Module (configuration setting) 74
Moving categories 140
Multipliers 271
N
Name (configuration setting) 363
NEAR<N> operator 195, 235, 236
NextTask (configuration setting) 74
NodeTableStoreContent (configuration setting)
64
Norwegian encoding settings 340
NOT operator 194, 195, 236
Number (configuration setting) 366
NumberOfBackups (configuration setting) 380
Numbers
Storing numbers in fields 289
NumClusters (configuration setting) 159
NumDBs (configuration setting) 363
Numeric fields 289, 291
Numerical fields 289
NumericDateType (configuration setting) 279,
288
NumericDateType fields 287
NumericType (configuration setting) 279, 290
O
OCR task 73
ondemand.xss template 180, 181
OnFailureTask (configuration setting) 74
Online help 61, 389
Operations 7
Agents 8, 109, 121
Alerting 8, 109, 125
Categorization 8, 129
Channels 9, 149
Clustering 9
Collaboration 9, 109, 163
Dynamic Thesaurus 9, 165
Eduction 9
Expertise 9, 109, 171
Hyperlinking 10, 173
Mailing 10, 109, 175
Retrieval 10, 189
Spelling Correction 12, 255
Summarization 12, 257
Taxonomy generation 13, 261
Operators
AFTER 195, 236
AND 194, 195, 236
BEFORE 195, 236
Boolean 194
DNEAR<N> 195, 235, 236
EOR 195, 236
NEAR<N> 195, 236
NOT 194, 195, 236
OR 194, 195, 236
Precedence of Boolean and Proximity
operators 195, 236
Proximity 235
WNEAR<N> 195, 235, 236
XOR 195, 236
Optimizing
Content storage 72
Indexing 72
OR operator 194, 195, 236
P
ParagraphConcept summary 257
ParagraphContext summary 257
Parametric search 228
ParametricType (configuration setting) 229, 280
[Paths] configuration file section 394
Persian encoding settings 341
PODS 4
Polish encoding settings 341
Port (configuration setting) 26, 33, 62, 106
Portuguese encoding settings 341
Precedence of Boolean and Proximity operators
195, 236
Preventing term stemming 122
PrintType (configuration setting) 276, 280
Privilege 438
Processing data before indexing it 73
Examples 75, 76, 78, 79, 81
ProxyHost (configuration setting) 176, 179
ProxyPassword (configuration setting) 176, 179
ProxyPort (configuration setting) 176, 179
ProxyUsername (configuration setting) 176, 179
Q
Queries 438
Specifying the language type of your query
321
Query 6
Query action 165, 189, 191, 193, 194, 196, 197,
198, 199, 227, 232, 235, 237, 238, 240, 246,
247, 258, 272, 276, 297
Query results
Converting to a specific encoding 322
Displaying additional fields 276
Displaying additional fields with results 275
Filtering 272
Manipulating relevance 266, 269, 271
Relevance ranking 265
Returning documents in a specific language
324
Returning multiple languages 323
Query types
Advanced keyword 193, 232
Boolean 194
Exact Phrase 196
Field search 198
Field Text query 199
Fuzzy 227
Parametric 228
Proper Names 231
Proximity 235
Soundex 237
Synonym 238
QueryClients (configuration setting) 189
Querying
Agents 122
BIAS field specifier 269
For non-alphanumeric characters 249
Numeric fields 289, 291
With profiles 187
Quick summary 257, 438
R
Reference fields 438
Eliminate duplicate documents during
indexing 105
Filtering results at query time 272
Simultaneously using KillDuplicates and
Combine 295
ReferenceType (configuration setting) 280
Relevance ranking 265
Manipulating result relevance 266, 269, 271
Replacing categories 143
Requesting support 57
Restoring deleted documents 361
Results
Converting to a specific encoding 322
Displaying additional fields 276
Displaying additional fields with results 275
Filtering 272
Manipulating relevance 266, 269, 271
Relevance ranking 265
Returning documents in a specific language
324
Returning multiple languages 323
Retina
Deploying Retina to your application server
39
Retraining 439
Agents 122
Categories 140
Retries (configuration setting) 176
Retrieval 10, 189, 191
Advanced keyword search 193, 232
Boolean search 194
Combining different query types 241
Conceptual matching 191
Custom action 179, 180
DetectLanguage action 190
Exact Phrase search 196
Field search 198
Field Text query 199
Fuzzy query 227
GetContent action 189, 276
GetQueryTagValues action 189, 228, 230
GetStatus action 190
GetTagNames action 189
GetTagValues action 189, 228, 229, 230
Highlight action 189
IndexerGetStatus action 190
List action 190
Parametric search 228
Precedence of Boolean and Proximity
operators 195, 236
Proper Names queries 231
Proper Names query 231
Proximity search 235
Query action 165, 189, 191, 193, 194, 196,
197, 198, 199, 227, 232, 235, 237, 238,
240, 246, 247, 272, 276
Querying for non-alphanumeric characters
249
Soundex keyword search 237
Suggest action 165, 174, 189, 191, 197, 199,
247, 272, 276
SuggestOnText action 165, 189, 191, 197,
199, 247, 276
Summarize action 189
Synonym query 238
TermGetAll action 190
TermGetBest action 189
TermGetInfo action 189
Using wildcards in queries 246
Returning documents
In a specific language for your query 324
In multiple languages for your query 323
Revoking client licenses 42, 43
[Role] configuration file section 403
RoleAdd action 110
RoleAddRoleToRole action 110
RoleAddUserToRole action 110
Roles 439
Exporting 382
Importing 383
Romanian encoding settings 342
root category 130
Route task 73
Routing documents to multiple tasks 73
RunMailer (configuration setting) 176
Setting up
Clustering schedules 160
Index fields 285
Log streams 384
Security 113
Tasks to process data before indexing 74
Sizing 46
SleepBetweenRequests (configuration setting)
177
Slovak encoding settings 343
Slovenian encoding settings 343
SMTPHost (configuration setting) 176, 179
SMTPPort (configuration setting) 176, 179
Snapshots 439
Generating 152
Somali encoding settings 343
Sorbian encoding settings 344
Soundex (configuration setting) 237
Soundex keyword search 237
SourceFields (configuration setting) 258
SourceType (configuration setting) 280
Spanish encoding settings 344
Specifying the language type of your query 321
Spectrograph data generation 153
SpellCheckCorrectMinDocOccs (configuration
setting) 255
SpellCheckIncorrectMaxDocOccs (configuration
setting) 255
SpellCheckMaxCheckTerms (configuration
setting) 255
Spelling Correction 12, 255
Starting IDOL server 59
StartingSuggestOverrideFactor (configuration
setting) 157, 158
StartTask (configuration setting) 74
StartTime (configuration setting) 176, 178
Stemming 308, 439
Tilde 122
Stop (service port command) 423, 429
Stoplists 308, 353, 439
Stopping IDOL server 60
Stopword 440
Storing fields 67
Storing Boolean agents in agentboolean fields
299
SynonymType (configuration setting) 240, 280
Syntax
Action commands 62
Index commands 84, 94
Service commands 424
System
Architecture 14
Requirements 23
T
Tagalog encoding settings 345
Tasks 73
ACI 73
Alert 73
Cat 73
Educe 73
Examples 75, 76, 78, 79, 81
FieldOp 73
FileWriter 73
HTTP 73
Index 73
LP 73
OCR 73
Processing data before indexing 74
Route 73
Tatar encoding settings 345
Taxonomy 440
Generation 13, 261
Scheduling 262
TaxonomyGenerate action 132, 160,
261, 262
[Taxonomy] configuration file section 407
TaxonomyGenerate action 132, 160, 261, 262,
407
Template (configuration setting) 127
Templates 180
alertTemplate.html 127
channels.xss 180, 181
Editing mailing operation templates 181
email.xss 180, 181
ondemand.xss 180, 181
Writing templates for alert emails 127
[Templates] configuration file section 410
Term 440
[TermCache] configuration file section 393
TERMEXACTPHRASE field specifier 197
Using 246
Multiple languages 309
Encoding settings for supported languages 325
UTF8 390
Uzbek encoding settings 347
V
Valencian encoding settings 347
VerboseLogging (configuration setting) 178
Vietnamese encoding settings 348
Viewing
Agent details 123
Categories 141
Category details 141
Category hierarchy details 141
Category terms and weights 142
Category training 142
Profile details 187
VQL conversion error messages 416
W
Weight (configuration setting) 267, 280
Welsh encoding settings 348
WhatsHot 154
WhatsNew 154
WILD field specifier 247
Wildcards 246
Searches in Japanese, Chinese, Korean and
Thai 248
Using 247
WNEAR<N> operator 195, 235, 236
Writing documents to disk 73
X
XML
Attributes 69
Importing 83
Indexing 83
XOR operator 195, 236
XSLTemplate (configuration setting) 176