Discover Best Practices
Prepared by:
Jason Normandin
Concord Technical Support
Copyright 2004 Concord Communications, Inc. eHealth, the Concord Logo, Live Health, Live Status, SystemEDGE, AdvantEDGE and/or other Concord marks or products referenced
herein are either registered trademarks or trademarks of Concord Communications, Inc. Other trademarks are the property of their respective owners.
I. INTRODUCTION
II. PREREQUISITES
III. OVERVIEW OF THE EHEALTH DISCOVER PROCESS
1. HOW DOES THE EHEALTH DISCOVER PROCESS WORK?
2. WHAT ARE THE DIFFERENCES BETWEEN AD-HOC AND SCHEDULED DISCOVERIES?
3. HOW DO DISCOVERIES IMPACT MY LICENSE CONSUMPTION?
4. EXPLANATION OF THE EHEALTH MERGE ALGORITHM
5. HOW DOES SELECTING THE MIB2 OPTION IMPACT MY DISCOVERY RESULTS?
IV. TROUBLESHOOTING COMMON DISCOVERY ISSUES
1. TROUBLESHOOTING NO RESPONSE TO SNMP OR NO RESPONSE TO PING ERRORS
2. TROUBLESHOOTING NO MIB SUPPORT FOR THIS AGENT ERRORS
3. TROUBLESHOOTING SYSTEMEDGE DISCOVERY ISSUES
4. RECONCILING AND AVOIDING DUPLICATE ELEMENT CREATION
V. GENERAL DISCOVERY BEST PRACTICES
1. AVOIDING DUPLICATE ELEMENTS BY IMPLEMENTING A STRONG CHANGE CONTROL PROCESS
2. MINIMIZING DATA LOSS THROUGH STATISTICS POLLER ERROR ANALYSIS AND NODBDATAFOR TOOLS
3. USING SEED FILES TO AUTOMATE INCREMENTAL CONFIGURATION UPDATES
4. SELF MONITORING THE EHEALTH SYSTEM USING PROCESS SET CREATION
5. EFFECTIVELY INTERFACING WITH CONCORD TECHNICAL SUPPORT TO RESOLVE DISCOVERY ISSUES
VI. CHANGES TO THE DISCOVERY PROCESS IN EHEALTH 5.6.X
VII. OTHER RESOURCES
The eHealth discover process is an integral piece of a successful eHealth implementation. The eHealth
discover mechanism is the main avenue to integrate existing network devices into the eHealth Fault and
Performance management environment. Through that process, network devices are added to the eHealth
configuration and provide the user with the data necessary to successfully manage their network infrastructure and
to maximize the benefit of the eHealth suite.
Although relatively simple, the eHealth discover process does require a hands-on approach to ensure success.
This document will provide the reader with the knowledge and tools necessary to ensure success managing that
process.
II. Prerequisites
This document is not intended as a replacement for the standard eHealth suite documentation such as the
Administration Guide or Users Guide. This document should be used in conjunction with the existing eHealth
manuals and Concord Knowledgebase. Additionally, the reader should possess a basic understanding of the
eHealth application, an understanding of their network infrastructure, and a basic understanding of SNMP and
device MIBs.
1. The finder process searches the network for everything it can find within preset limits. The preset
limits, for either a scheduled or an interactive discovery, are determined by the user and are defined by
three major categories. These include IP addresses to search, technology type and community string.
The finder is a program written in the Tcl ("Tool Command Language") scripting language. Tcl is an
interpreted language and, like any interpreted language, requires a runtime interpreter. The Tcl
interpreter and related libraries are included in the $NH_HOME/bin/sys directory.
Finder is composed of several logical pieces that operate sequentially to perform one primary mission: the
creation of poll records in the $NH_HOME/poller/poller.cfg file. Finder is never run directly; rather it is
called from other scripts or programs (depending on the operating system), which check environment, set
variables, etc.
First, finder queries the sysObjectID of the device-in-question (DiQ). The sysObjectID is an entry in the
MIB2 system table that identifies the vendor who wrote and/or implemented the MIB being queried.
Depending on the object class being discovered (i.e. LAN/WAN, Router, Probe, or Server) the finder will
go to the class's main table and iterate through the list of possible OIDs until it finds a match for the
sysObjectID retrieved from the DiQ's MIB.
If a match is found, the table will then tell finder where to go next. In the case of LAN/WAN, the main
table will point finder toward a vendor-specific (or perhaps IETF standard) algorithm and a vendor-specific
(or IETF standard) interface table to be used as input to that algorithm. In this way finder can
cover any situation where a device is supported by standard MIBs, such as an RMON probe, or vendor-specific MIBs.
Next, finder uses a collection of tables for interface types. These tables are used to choose the interface
types (ifType in MIB2) that will be added to the poller configuration. We generally do not want to use all
entries in the ifTable, as some entries are not relevant to eHealth. For example, if we are discovering an
RMON probe for Ethernet statistics, we most likely do not want to discover the out-of-band (OOB) 9600
bps SLIP port, which is also in the MIB2 ifTable. These tables therefore act as an inclusive
filter that passes only the types of interfaces we want downstream to the algorithm that will be used to
generate poll records.
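The inclusive filter described above can be pictured with a short sketch. It is an illustration in Python only, not the finder's actual Tcl code: the `ALLOWED_IF_TYPES` table and the `filter_interfaces` helper are hypothetical names, and the numeric values are the standard IANA ifType codes for ethernetCsmacd (6) and slip (28).

```python
# Hypothetical inclusive filter: only interface types listed here pass downstream.
ETHERNET_CSMACD = 6   # IANA ifType value for ethernetCsmacd
SLIP = 28             # IANA ifType value for slip

# Assumed table for an RMON probe discovery: Ethernet ports only.
ALLOWED_IF_TYPES = {ETHERNET_CSMACD}


def filter_interfaces(if_table, allowed=ALLOWED_IF_TYPES):
    """Return only the ifTable rows whose ifType is in the allowed set."""
    return [row for row in if_table if row["ifType"] in allowed]


# Example ifTable of a stand-alone probe: one Ethernet port and the OOB SLIP port.
probe_if_table = [
    {"ifIndex": 1, "ifDescr": "ethernet port 1", "ifType": ETHERNET_CSMACD},
    {"ifIndex": 2, "ifDescr": "OOB SLIP port", "ifType": SLIP},
]
passed = filter_interfaces(probe_if_table)
```

Here only the Ethernet row is passed on; the SLIP port is dropped because its type is not in the table.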
Finder also looks at the ifAdminStatus or the ifOperStatus to determine whether the device/interface is
up or down. In most cases the ifAdminStatus is used; in a few cases the ifOperStatus is used instead.
This is determined in finder.tcl.
ifOperStatus is the actual electrical state of the connection (plugged in or not plugged in).
ifAdminStatus is the desired state chosen by the administrator.
Both up = active and discoverable.
Both down = not active and not discoverable.
ifOperStatus up, ifAdminStatus down = active but not discoverable.
ifOperStatus down, ifAdminStatus up = not active, but the administrator does want it to be discovered.
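The four status combinations listed above can be written as a small lookup table. This is a sketch in Python for illustration only; the `STATUS_TABLE` name and the `classify_interface` helper are hypothetical, and the sketch covers the common case in which ifAdminStatus decides discoverability.

```python
# (ifOperStatus, ifAdminStatus) -> (active, discoverable), per the list above.
STATUS_TABLE = {
    ("up", "up"): (True, True),
    ("down", "down"): (False, False),
    ("up", "down"): (True, False),
    ("down", "up"): (False, True),
}


def classify_interface(oper_status, admin_status):
    """Return (active, discoverable) for one interface's status pair."""
    return STATUS_TABLE[(oper_status, admin_status)]
```

For example, `classify_interface("down", "up")` describes an interface that is not currently active but that the administrator still wants discovered.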
Once an interface has passed through this table it is passed off to an algorithm to generate the actual poll
record (poller entry). In some cases the algorithm will perform some additional exclusive interface-type
filtering. Sometimes only a single interface entry is sent, and we iterate through an array of entries, and
other times we go interface by interface and generate a poll record on each one. It depends on whether we
are doing standard support or enterprise-specific support and how many interfaces exist on the device.
Sometimes the enterprise support is extremely easy, using a simple table and the standard algorithm and
sometimes it is quite complex. It all depends on the complexity of the MIB implementation and the
statistics that the customer is requesting. If there is a requirement to cross reference variables from one
table to others, the algorithm can be quite intricate.
For example an RMON probe is quite simple. A stand-alone probe will typically populate its MIB2
ifTable with one or more ethernet ports and an OOB SLIP port. We filter out everything but the ethernet
ports and send them down to the standard ifTable algorithm, which generates poll records for RMON
"etherstats" elements.
An example of more complex support is the Bay 5000 chassis. Like most chassis designs the Bay 5000
can accept many types of blades - ethernet, token-ring, FDDI, ATM, management, etc. - each with
different capabilities and/or numbers of ports. The management software (i.e. Optivity) allows the user to
define logical groups of ports (virtual LANs if you will), which are either partitioned from the rest of the
network, or connected to other ports on another card or chassis. In order to provide utilization statistics
for all of these complex blade types and virtual LANs that the user can build, the vendor had to come up
with a very complex group of MIBs. Consequently, finder needs to sort through this jungle of options, and
its support for this chassis is rather involved.
Based on the above information, finder assigns an agent type to each element found on the device. The
agent type is associated with a MIB translation file in the $NH_HOME/poller directory. A list of these
associations may be found in $NH_HOME/poller/agent.types.
2. The newly created DCI file is passed through the eHealth Merge Algorithm.
For more information on the eHealth merge process, please see section 3.4, Explanation of the eHealth
Merge Algorithm.
3. The new elements or updates to existing elements are saved to the eHealth configuration.
For more information on the save process, please see section 3.2, What are the Differences Between Ad-
hoc and Scheduled Discoveries?
Scheduled and Ad-hoc discoveries perform the same duties with the exception of how/when the results
are saved to the eHealth configuration. The scheduled discovery can be configured to either save the
discovery results or simply log the results for review by the eHealth Administrator for a later discovery.
If the scheduled discover is configured to simply log the results, the eHealth administrator should review
the changes logged and re-run the discovery to actually save the results at a later time.
During the scheduled discovery where the job is configured to save the results, the merge process and the
save process take place at the same time in the config server. This is due to the fact that the scheduled
discovery does not allow the user to review the discovery information before saving it to the database.
In contrast, during an interactive discovery, eHealth gives the user the option to edit the findings before
saving. By selecting "Edit before Save", all new elements found are brought up in the poller configuration
editor. Here the user may modify the information found by the finder. A DCI file is generated from the
save process which contains the original discovery information along with the modifications made by the
user through the "Edit before Save". This DCI file is then sent to the config server to update the poller
configuration/database.
The interactive discovery has been engineered to be the more aggressive tool. As the user is allowed to
edit the findings before committing them to the database, the user has more control over what will be
polled on the network and how it is polled.
The eHealth Discover logfiles are an invaluable tool to better manage the Discovery process. The logfile
created by eHealth will vary depending on the type of discovery run.
discoverInteractive.mm.dd.yyyy.nnnnnn.log
If the adhoc/interactive results are not saved, a .unsaved will be appended to the log file name.
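When scripting around these logs, the naming convention above can be matched with a regular expression. The following Python sketch is illustrative only and rests on several assumptions that the text does not confirm: that mm and dd are two digits, that nnnnnn is a six-digit field (treated here as an opaque string), and that ".unsaved" is appended after ".log". The `parse_discover_log_name` helper is a hypothetical name.

```python
import re

# Assumed layout: discoverInteractive.mm.dd.yyyy.nnnnnn.log[.unsaved]
LOG_NAME = re.compile(
    r"^discoverInteractive\.(\d{2})\.(\d{2})\.(\d{4})\.(\d{6})\.log(\.unsaved)?$"
)


def parse_discover_log_name(name):
    """Split an interactive discover log name into its fields, or return None."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    month, day, year, serial, unsaved = match.groups()
    return {
        "month": int(month),
        "day": int(day),
        "year": int(year),
        "serial": serial,           # the nnnnnn field, kept as an opaque string
        "saved": unsaved is None,   # False when the .unsaved suffix is present
    }
```

A script could use the `saved` flag to list interactive discoveries whose results were never committed.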
A poller audit log will also be created in the $NH_HOME/log directory. This log contains a listing of all
of the changes made to the poller configuration when the results are saved. These log files have the
following format:
pollerAudit.date.time.log
For scheduled discoveries with the 'Save Results' option selected, a discover..log will be created in the
$NH_HOME/log directory which contains the information which would have been displayed in the
Discover UI if this discover was run interactively.
Like the adhoc discovery above, a discoverResults.log and pollerAudit log will be created containing the
same information as documented above.
The scheduled discovery process will also create a discoverScheduled log. This log file has the same
naming convention and contents as the discoverInteractive log described above.
For scheduled discoveries with the 'Report only' option selected, the same log files will be created
containing the same information as the scheduled discovery with the 'Save Results' option selected with
the exception of the pollerAudit log. Since no changes are being made to the configuration, this log is not
created as that operation is not performed.
If the LAN/WAN option was selected, or the 'Include in LAN/WAN reports' option selected within the
poller configuration UI then the interfaces would be actively polled to report individual statistics and
therefore a poller license would be consumed for each respective interface.
The same scenario exists during Server discoveries. Several disks, partitions, CPUs, etc. may be
discovered and actively polled, but once again these elements simply provide aggregate variables to the
parent Server element and therefore do not consume a poller license. The LAN/WAN elements of a server
would be subject to the same scenario as described in the above Router discovery example.
Turning off polling for an aggregated element will not impact the total available licenses, while disabling
polling for non-aggregated elements will impact the total available licenses. It must be noted however that
disabling polling for aggregate elements will impact the total statistics reported by the parent device.
Other element types such as RAS and Process Sets share similar parent-child relationships, and license
usage for those technologies will be similar to that described above.
In addition, weighted licensing of certain Technology Types such as Wireless Access Points and Mobile
Wireless devices will affect license consumption. Weighted licensing simply indicates that certain
element types will consume more than one license per element. For example, PDSN elements will use
1000 statistical licenses per element. This is due to the amount of information that one PDSN element
provides.
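The effect of weighted licensing on a license budget is simple arithmetic, sketched below in Python. Only the PDSN weight of 1000 comes from the text above; the `LICENSE_WEIGHTS` table, the type keys, and the default weight of 1 for every other license-consuming element type are assumptions made for illustration.

```python
# Licenses consumed per element, by element type. Only the PDSN value is from the text.
LICENSE_WEIGHTS = {"pdsn": 1000}
DEFAULT_WEIGHT = 1  # assumed weight for any type not listed above


def licenses_needed(element_counts):
    """Sum weight * count over all license-consuming element types."""
    return sum(
        LICENSE_WEIGHTS.get(elem_type, DEFAULT_WEIGHT) * count
        for elem_type, count in element_counts.items()
    )
```

Under these assumptions, two PDSN elements plus five ordinary license-consuming interfaces would need 2 x 1000 + 5 = 2005 licenses.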
However, if the element has a positive number next to it, then it does use a license. If the same number is
next to several elements, then all of those elements only use one license. The numbers will increment with
each license used, so the bottom number is the total licenses being used.
For example, the following is the output from the nhListElementLicenses command:
2 sysName-SH
2 sysName-SH-/
2 sysName-SH-/export
2 sysName-SH-/opt
2 sysName-SH-/tmp
2 sysName-SH-/var
2 sysName-SH-/var/run
2 sysName-SH-Cpu-1
2 sysName-SH-disk-dad0c0t0d0s0
2 sysName-SH-disk-dad1c0t2d0s0
2 sysName-SH-disk-sd0c0t1d0s0
3 sysName-SH-enet-port-2
In this case, one license is shared by all the elements except sysName-SH-enet-port-2. This element
uses its own license, as can be seen from the count incrementing to three. Each additional element
that uses its own license will increment the count by one.
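Output in the two-column form shown above can be summarized with a few lines of Python. This is a sketch under the assumption that each line holds a license number followed by an element name separated by whitespace; the `summarize_licenses` helper is a hypothetical name and is not part of eHealth.

```python
from collections import defaultdict


def summarize_licenses(output):
    """Group element names by license number and return (total_licenses, groups)."""
    groups = defaultdict(list)
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        try:
            number = int(parts[0])
        except ValueError:
            continue  # skip lines that do not start with a number
        if number > 0:  # only a positive number means a license is used
            groups[number].append(parts[1].strip())
    # The highest (bottom) number is the total number of licenses in use.
    total = max(groups) if groups else 0
    return total, dict(groups)
```

Elements that appear under the same number in the returned groups share a single license.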
The merge process is invoked after the discovery process is finished creating an incoming DCI file. The
following is the DCI attribute search order used to determine whether a discovered element is a "resolved
update", an "unresolved update" or a "new element":
1. nmsSource
o The default nmsSource is NH:DISCOVER, and it is hard coded in the discovery
process
o Integration modules and Application Response elements have a different nmsSource
o The matching search is limited to those elements having the
same nmsSource and a non-empty nmsId. If a match is found, move to item 2;
otherwise a "new element" is created.
o If nmsId is empty, move to item 3
2. uniqueDeviceId
o Unique attribute for each device in the network used to distinguish one from another
o Assigned by the finder upon discovery
o By default, it is set to the lowest MAC address found in the device
1. nmsSource
o The default nmsSource is NH:DISCOVER, and it is hard coded in the discovery process
o Integration modules and Application Response elements have a different nmsSource
o The matching search is limited to those elements having the same
nmsSource
o If a match is found, move to item 2, otherwise a "new element" is created
2. deviceHashKey
o New DCI field added in eHealth 5.6
o NOT visible from the GUI, only through DCI
o Assigned during the merge to uniquely identify each device within the configuration
o The matching search is limited to those elements having the same deviceHashKey
o If a match is found, move to item 3, otherwise a "new element" is created.
o The following attributes are used to determine the uniqueness of the device:
uniqueDeviceId
ipAddress
sysName
ifPhysicalAddress cloud (List of all the physical addresses in the device)
ifIpAddress cloud (List of all the ip addresses in the device)
3. UDP Port, SNMP enterprise ID, parent mtf (if any)
o Used to identify multiple SNMP agents running in the same host
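The eHealth 5.6 search order above can be pictured as successive narrowing of a candidate list. The Python sketch below is an illustration only and is not Concord's implementation: the dictionary keys `udpPort`, `enterpriseId` and `parentMtf` are hypothetical field names, the discovered element is assumed to already carry a `deviceHashKey`, and the text does not say how the final step distinguishes a resolved from an unresolved update, so the sketch simply returns the remaining candidates.

```python
# Hypothetical field names for the step 3 attributes.
AGENT_FIELDS = ("udpPort", "enterpriseId", "parentMtf")


def merge_candidates(discovered, existing):
    """Return ("new element", []) or ("existing device", candidates)."""
    # Step 1: limit the search to elements with the same nmsSource.
    same_source = [e for e in existing if e["nmsSource"] == discovered["nmsSource"]]
    if not same_source:
        return "new element", []
    # Step 2: limit the search to elements with the same deviceHashKey.
    same_device = [
        e for e in same_source if e["deviceHashKey"] == discovered["deviceHashKey"]
    ]
    if not same_device:
        return "new element", []
    # Step 3: tell apart multiple SNMP agents running on the same host.
    same_agent = [
        e for e in same_device
        if all(e.get(f) == discovered.get(f) for f in AGENT_FIELDS)
    ]
    return "existing device", same_agent
```

A caller would then decide how to treat the returned candidates; that decision is outside what this section describes.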
The merge algorithm was rewritten in eHealth release 5.6 to perform a more reliable comparison between
existing and new elements. This new algorithm greatly reduces the likelihood of duplicate elements or
unresolved new elements being created during the merge. To further reduce the likelihood of duplicate
element creation, please refer to section 4.4, Reconciling and Avoiding Duplicate Element Creation.
The Find MIB2 LAN option allows the finder to locate LAN interfaces which only contain basic MIB2
statistics such as In/Out/Total packets. This method of discovery is useful when a device has an
uncertified SNMP agent installed. When discovering this device, eHealth will generate a basic element
which will allow for reporting of availability and basic packet count information. This option is not
recommended for devices running a certified firmware version, as the vendor-specific interface will be
discovered, allowing for a more robust reporting solution.
1. The device is unable to respond to ping or responds to ping outside of the timeout threshold due to
network load.
Ping the device from the command line using the configured eHealth Ping packet size (default =
100 bytes). There are 3 steps that can be taken to resolve this issue:
1. Ensure the device is able to respond to ping and attempt to reduce the load by discovering
during off-peak hours.
2. Disable the discovery ping as described in section 4.1.2
3. Increase the timeout as described in section 4.1.3
2. The device is unable to respond to ping due to protocol restrictions placed on the device or the network
segment on which the device resides.
If a device is unable to respond to ping due to configuration restrictions, the discover ping can be
disabled via the NH_DISCOVER_DISABLE_PING variable. When this variable is set to yes,
3. Network latency or load on the target device caused the SNMP request to be dropped by
the device or received/transmitted outside the threshold of the discover timeout.
The NH_DISCOVER_TIMEOUT environment variable specifies the time in seconds that the
discover process waits for a ping response and an SNMP response from a device. Increasing the
value of this variable will allow eHealth to wait longer for device responses. The default value for
the NH_DISCOVER_TIMEOUT variable is equal to 1 second.
To determine the most appropriate 'timeout' value, perform a discovery from the command line:
*NOTE: Command line discovery results are output to the display (or a file) and not saved to the
poller configuration and database.
As the $NH_USER:
where:
mode = "lanWan", "router/Switch", "dialog", "server", "application", "modemPool",
"ras", "respelements"
Once the minimum timeout has been determined, modify the setting of the
NH_DISCOVER_TIMEOUT variable to that value.
Verify that the SNMP agent is properly configured by obtaining a MIB dump of the device using
the nhSnmpTool utility.
5. The port on which the SNMP agent on the target device is running is not configured in the following
eHealth variables:
NH_DISCOVER_PORTS
NH_DISCOVER_SERVER_PORTS
NH_DISCOVER_APPLICATION_PORTS
NH_DISCOVER_RESPONSE_PORTS
Determine the port on which the SNMP agent is running and add that port number to the
appropriate NH_DISCOVER_* variable(s).
This error message indicates that the finder process was unable to successfully match the agent in
question with the coded list of supported agents. Verify that the device in question is in fact certified via
the Concord Communications Device Certification matrix:
http://www.concord.com/devices/html/default.html
1. Submit a certification request to have the device agent reviewed for certification via:
http://license.concord.com/custserv/certification.htm
Additional information on the Concord Communications certification policy can be found at:
http://www.concord.com/devices/cert_policy.asp
2. Rediscover the device using the Find MIB2 Lans option to attempt discovery of any MIB2
LAN ports on the device. See section 3.5 for additional information regarding this option.
b. UNIX: examine the system log for errors relating to SystemEdge licensing.
a. UNIX: Use the 'ps -ef | grep sysedge' command to locate the process and port entry.
b. Windows: Sysedge runs as a sub-agent of the Windows master SNMP agent. This will
usually be port 161, but can be verified in the winnt/system32/drivers/etc/services file.
Example:
snmprecv timeout
b. Use the sysvariable command from the eHealth system using the SystemEdge system's
IP address.
c. Use the walktree command to verify valid SNMP communication from agent to
eHealth.
UNIX: examine the appropriate resource file for correct discover port entry
b. Determine if agent can be discovered using command line with forced set timeouts.
d. Res.force.log: DCI formatted output of timeout and retry increase server discovery
In most cases, the rediscovery of existing elements will result in resolved updates.
However, network environments are always changing, and this creates a chance of getting duplicate
elements when the merge algorithm fails to resolve an update because of differences between the original
and newly discovered element's attributes. A duplicate element is simply an element whose name, under the
eHealth element naming convention, duplicates an already existing element name. eHealth will attempt to
ensure uniqueness by appending -A (or -B, -C, etc.) to the newly found element's name.
In order to minimize this possibility, we recommend rediscovering the elements within the eHealth
configuration on a regular basis. This will limit the number of updates that occur at one time by minimizing
the time between updates.
In case of duplicate creation, examine the elements (original and duplicate) and determine whether eHealth
should have merged those elements into one. Once that assessment has been made, take note of the
following attributes of the duplicate element from the eHealth Discover UI:
Hardware ID (uniqueDeviceId)
Discover Key (nmsId)
System Name (sysName)
Agent Type (mibTranslationFile)
First delete the new element and update the original with these attributes, then rediscover to update all
other attributes.
It is strongly suggested that prior to a device change occurring, the eHealth elements associated with that
device be rediscovered. This will ensure the eHealth device configuration is current prior to the change
occurring. After the device change has been made, an additional rediscovery should be performed to
ensure that the new configuration is updated within the eHealth configuration.
This method will minimize the number of device changes detected by eHealth at one time thereby
minimizing the chance for duplicate element creation. For additional information on this topic, please
view sections 3.4 and 4.4.
The eHealth installation includes valuable tools such as the nhListElements command to assist in
configuration management. That utility includes a noDbDataFor flag which creates a list of all elements
that have not reported data in the configured amount of time.
This usually means that an element either has polling disabled or is experiencing polling errors which are
causing eHealth to not insert data into the database for that element. That list can be used as a to-do list
of elements which should be rediscovered or investigated further. The rediscovery should resolve any
conflicts which may be causing the polling errors in question.
nhListElements
The nhListElements command displays a simple list of eHealth element names using selected
criteria. You can use arguments to filter the list and create specific lists of elements. You can also
redirect the output of this command as input to other commands to modify your poller
configuration file, such as nhModifyElements, nhDeleteElements and nhPopulateGroup.
Syntax
The nhListElements command uses the following syntax:
nhListElements [-h] [-rev] [-showTypes] [-showDciFields]
nhListElements [-elements] [-outfile filename]
nhListElements -rebooted [-outfile filename]
nhListElements -where "whereClause" [-outfile filename]
nhListElements -elemType type [-outfile filename]
-noDbDataFor hours
Lists only those elements for which eHealth has not collected data and added it to the database for
the number of hours specified, and for which there is not any alarm data. This command allows
you to produce a list of elements that eHealth is not currently polling, or elements that it is
currently polling but that have poll errors. You cannot use this argument in combination with any
other nhListElements argument except for -outfile. You cannot use this argument on the central
site to return data in a remote polling environment. You must run it on the remote systems.
NOTE: In eHealth 5.0.2 and prior, only elements with polling turned on would be output from the
nhListElements command.
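The to-do list described above can be produced from a script. The following Python sketch is only an illustration: it uses the `-noDbDataFor` and `-outfile` arguments documented above, but it assumes that the command is on the PATH of the eHealth user and that it prints one element name per line, neither of which is confirmed here. The helper names are hypothetical.

```python
import subprocess


def build_no_db_data_command(hours, outfile=None):
    """Build the nhListElements argument list for elements with no data for `hours`."""
    cmd = ["nhListElements", "-noDbDataFor", str(hours)]
    if outfile is not None:
        # -outfile is the only argument that may be combined with -noDbDataFor.
        cmd += ["-outfile", outfile]
    return cmd


def parse_element_list(text):
    """Assumed output format: one element name per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def list_stale_elements(hours):
    """Run the command on an eHealth system and return the element names it reports."""
    result = subprocess.run(
        build_no_db_data_command(hours), capture_output=True, text=True, check=True
    )
    return parse_element_list(result.stdout)
```

In a remote polling environment such a script would have to run on each remote system, since the argument does not return remote data when used on the central site.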
The eHealth discover mechanism allows for the use of seed files during the discovery process. A
seed file is simply a text file containing a list of IP address and community string pairs. This
provides the eHealth Administrator with an easy way to discover groups of elements and to automate the
discover process.
Example:
# Server 1
10.100.10.32 private
# Server 2
10.100.10.33 public
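A seed file in the layout shown in this example can be checked before it is handed to a discovery. The Python sketch below is illustrative only; it assumes that fields are separated by whitespace and that lines beginning with "#" are comments, as in the example, and the `parse_seed_file` helper is a hypothetical name.

```python
def parse_seed_file(text):
    """Return a list of (ip_address, community_string) pairs from seed file text."""
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 'address community'")
        entries.append((fields[0], fields[1]))
    return entries
```

Rejecting malformed lines up front makes it easier to keep seed files for different technology types clean and separate.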
Seed files should also contain like technology types to ensure the discover is run against the correct
technology and there are no mismatches. For example, a router discovery of a server may actually
produce a router element as servers can act as a routing device.
The eHealth poller configuration file ($NH_HOME/poller/poller.cfg) can also be used as a rediscovery
seed file but this is only recommended for small configurations. Larger configurations should not utilize
this method as a rediscovery of the entire configuration causes a severe performance impact to the
eHealth server. It is recommended that the rediscovery target a portion of the configuration when using
seed files in large configurations.
Process Sets are created in the eHealth discover UI using the Find Processes > Define option. Two new
process sets should be created, one for eHealth and one for the Database, using the following processes and
parameters:
nhiArControl
nhiCfgServer
nhiConsole
nhiDbServer
nhiLiveExSvr
nhiMsgServer
nhiNotifierSvr
nhiPoller
Arguments: none
nhiPoller
Arguments: -live
nhiPoller
Arguments: -dlg
nhiPoller
Arguments: -import
nhiRespServer
nhiServer
nhiTrapServerCmu
o ora_arc0_EHEALTH
o ora_arc1_EHEALTH
o ora_ckpt_EHEALTH
o ora_dbw0_EHEALTH
o ora_lgwr_EHEALTH
o ora_pmon_EHEALTH
o ora_reco_EHEALTH
o dmfacp
o iigcn
o iidbms
Argument: recovery
o iidbms
Argument: dbms
Each process should have the create if found flag set, match full name set, and the appropriate Operating
System set.
Once the process set has been defined, the eHealth server should be discovered using the read-write
community string to create the appropriate MIB rows. Once discovered, the Record Detailed Data
option can be enabled to allow for individual process data to be reported along with aggregate process set
data.
When the situation arises that it is necessary to contact Concord Technical Support, it is important to
provide Support with the information necessary to troubleshoot the issue. When dealing with Discovery
issues, the following information is often vital to the troubleshooting process:
Although the above information and files may appear to be unrelated, in the majority of instances this
information is required during the troubleshooting process. Providing this information to the Technical
Support Engineer when initially contacting Concord Technical Support will dramatically reduce the time
taken to obtain all the information necessary to resolve the issue.
The main change to the Discover process in eHealth 5.6 is the rewritten merge algorithm.
eHealth no longer uses an element's discoverKey to determine uniqueness but now relies on a
deviceHashKey. This new key allows for a greater level of accuracy when determining whether changes to a
device constitute a new element or an update to an existing element. Additional information on the
discover algorithm changes can be found in section 3.4, Explanation of the eHealth Merge Algorithm.
In addition to this document, there are many other resources available to the eHealth Administrator to
assist in the management of the Discover process and the eHealth element configuration. These include,
but are not limited to:
PrimusTrain202
TS2906
TS15008
TS13051
TS11641
TS13242
PrimusTrain195
PrimusTrain90
TS11673
TS15008
TS4602
TS13577
TS14359