03 - 01 - RN33153EN20GLA0 - Alarms Handling PDF
Alarms Handling
RN33153EN20GLA0 1
RNC Maintenance and Fault Management in RNC
Objectives
• Create disaster recovery and escalation plans to enable the fast and
correct handling of problems.
• Create a plan for preventive maintenance, for example by using daily,
weekly and bi-annual routines as a basis.
• Define maintenance entities suitable for your network structure.
• Plan all maintenance activities well beforehand.
For more information on developing a maintenance strategy and organization, you can
refer to the following ITU-T recommendations for example:
Recommendation M.10 (10/92) - Scope and application of Recommendations
for maintenance of telecommunication networks and services
Recommendation M.20 (10/92) - Maintenance philosophy for telecommunication
networks
Recommendation M.21 (10/92) - Maintenance philosophy for telecommunication
services
The modular structure of network elements ensures not only fault tolerance but also ease
of maintenance. If a plug-in unit becomes faulty, you can replace it with a new plug-in
unit; the redundancy schemes ensure that the network element remains operational even
in fault situations.
The maintenance system consists of automatic functions that are integrated into the network
element and functions that are carried out by the operation and maintenance personnel.
Preventive maintenance applies to the computer units and peripheral devices of network
elements. In a digital network element, the need for preventive maintenance actions is
limited and mainly associated with peripherals.
The purpose of preventive maintenance is to prevent equipment from malfunctioning and
wearing out.
The alarm and diagnostic reports generated by the system provide you with information
on the network element's state.
Supervision System
[Figure: Failure, Supervision System (fault / disturbance observation, fault detection), Alarm System]
Supervision functions
Supervision is responsible for fault detection and uses both hardware and software for this
purpose.
• Hardware supervision
• Software supervision
• Time supervision
• Supervision of semipermanent connections
Alarm functions
[Figure: Alarm System, with the labels: alarm printouts, processing of alarm events, activation of recovery]
The system has the following alarm functions:
• Collection of alarm data
• Storage of alarms
• Output of alarms
• Control of alarm outputs
• Activation of recovery functions when a unit fails
• Informing about alarms
Alarm functions have a user interface with which you can set alarm parameters,
examine the alarm situation and alarm history, and define handling rules for new events.
Recovery functions
Recovery is responsible for eliminating the effects of faults on the operation of the
system. For this purpose, it uses unit switchovers and different kinds of restarts.
Individual program blocks can be restarted by a local unit state administration function
(part of the recovery) when the program blocks do not respond to supervision messages.
The recovery or the hardware watchdog timer can restart any computer unit,
without restarting the whole system.
Recovery includes the following functions:
• Restarting program blocks, computers, preprocessors, and the whole system
• Automatic recovery from faults
• A user interface, for example, for manual recovery
Diagnostic functions
Diagnostic programs are able to localise faults in the system to an accuracy of one plug-in
unit in 70% of all cases, and four plug-in units in 95% of the cases.
The system complies with the ITU-T requirement on the average active repair time of 30
minutes.
Recovery starts diagnostic programs automatically.
Diagnostic programs are executed on functional units which are in the test state (TE-EX).
The same programs can be activated with the commands of the UD MML command
group.
Plug-in units can normally be removed from and inserted in functional units which are in
the separated (SE-OU or SE-NH) state, without disconnecting the power.
The fault diagnosis relies on the hardware configuration data which is stored in the
configuration database, and nonconformity between the database and the actual hardware
situation usually leads to a diagnosis report. In this case, the fault localisation function
may not be able to complete its actual task properly.
Steps
1 Monitor the alarms daily as described in Monitoring alarms daily
2 Check the working states of computer units
For further information, see Working state change
3 Make fallback copies as described in Fallback copying the software build
4 Verify the fallback build as described in Verifying fallback build
5 Make backup copies of the fallback copy as described in Backup copying FB build to
optical disk
Steps
1 Monitor the signalling network
For more information, see Activating and deactivating SS7 signalling measurement
reporting, Activating SS7 signalling interval log reporting, Configuring MTP's traffic
measurement matrix, Interrogating SS7 signalling measurements with MML commands
2 Print and save the working states
For further information, see Working state change fails
3 Display blocked alarms as described in Displaying alarm blocking data weekly
4 Monitor the use of memory as described in Measuring long-period memory usage
• Diagnostics
– Run fault diagnostics
Steps
1 Set daylight saving time as described in Setting summer time on or off
2 Run fault diagnostics as described in Inspecting unit diagnostics and working
states
Further information
Clean the magneto-optical disks periodically to ensure satisfactory operation. Do this
according to the disk manufacturer's recommendations.
Proactive Maintenance
• Monitor Radio Network
– Configuration, counter, alarm data
– Critical resource, mobility action, connection quality
Supervision System by
Hardware Management System (HMS) IPA 2800
The Hardware Management System (HMS) provides a duplicated serial bus between the
master node (located in the OMU) and every plug-in unit in the system. The bus provides a
fault-tolerant message transfer facility between the plug-in units and the HMS master node.
The HMS is used for supporting auto-configuration, collecting fault data from plug-in units and
auxiliary equipment, collecting condition data external to the network element, and setting
hardware control signals such as restart and state control in plug-in units.
The hardware management system is robust. For example, it is independent of system timing and
it can read hardware alarms even from a plug-in unit without power. It therefore supports power
alarms and a remote power on/off switching function.
The hardware management system forms a hierarchical network (see above). The duplicated
master network connects the master node with the bridge node of each subrack. The subrack
level networks connect the bridge node with each plug-in unit in the subrack.
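The two-level hierarchy described above can be pictured with a small data model. This is only an illustrative sketch: the class names, the `path_to` helper and the unit labels are hypothetical and do not correspond to any HMS interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subrack:
    """One subrack: a bridge node plus the plug-in units behind it."""
    index: int
    plug_in_units: List[str] = field(default_factory=list)


@dataclass
class HmsNetwork:
    """Master node (in the OMU) connected to the bridge node of each subrack."""
    subracks: Dict[int, Subrack] = field(default_factory=dict)

    def path_to(self, subrack_index: int, unit: str) -> List[str]:
        """Return the hops a message takes from the master node to a unit."""
        subrack = self.subracks[subrack_index]
        if unit not in subrack.plug_in_units:
            raise KeyError(f"{unit} is not equipped in subrack {subrack_index}")
        return [
            "master node (OMU)",                      # master network level
            f"bridge node of subrack {subrack_index}",  # subrack level
            unit,
        ]


# Hypothetical equipment, for illustration only.
network = HmsNetwork({1: Subrack(1, ["UNIT-A", "UNIT-B"]), 2: Subrack(2, ["UNIT-C"])})
print(network.path_to(2, "UNIT-C"))
```

The model leaves out the duplication of the bus; it only shows that every plug-in unit is reached through exactly one bridge node.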
In system set-up:
• Collecting data on the existing racks and subracks and their indexes.
• Collecting the equipping data of every subrack: plug-in unit types, their variants and versions.
In normal use:
• Collecting fault data from plug-in units and auxiliary equipment, and condition data
external to the network element.
• Setting hardware control signals such as restart and state control in plug-in units.
• Reading and writing any configuration data stored in the non-volatile memory of a unit
computer or control computer, to check the hardware configuration and to set communication
channels between computers for ATM-based message passing via the ATM switch fabric.
• Offering a protected message link between the HMS master node and a unit that has no
connectivity via the ATM switch fabric.
• Controlling auxiliary equipment such as fan trays and AC/DC power supplies.
• Controlling equipment and space external to the network element, such as door locking,
temperature and other states, alarms and conditions of street-side cabinets, wall-mounted
cabinets and telecommunication sites or other installation spaces.
In maintenance operations:
• Detecting the extraction and insertion of a plug-in unit.
• Identifying an inserted plug-in unit.
• Automatically setting the configuration data of an inserted plug-in unit.
Under severe fault conditions:
• The HMS is able to read hardware alarms from a plug-in unit without power feed.
• The HMS performs all its tasks even when system timing is missing.
[Figure: alarm systems in the network element. Labels: Platform Alarm System, Application
Level Alarm System, Network Element Level Alarm System (NELAS), Mediating Function, NetAct]
RNC OMS
RNCs can generate large numbers of alarms. For alarms that are repeated or alarms
that follow certain patterns, you can reduce the number of alarms to a more manageable level
by adjusting the setting delay, cancelling delay, or alive time.
With the History Analysis Graph you can analyze alarm history data to find and set
optimum delay values for alarms. The analysis of alarm history data allows you to
forecast how much the alarm count is reduced the next time a similar fault situation
occurs. The history data is shown on the History Analysis Graph, and you can adjust the
setting delay, cancelling delay, or alive time to see how the forecast changes.
For more information about the setting delay, cancelling delay, and alive time alarm
parameters, see OMS alarm parameter values.
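The effect of the delay parameters on the alarm count can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the OMS algorithm: it assumes that a cancelling delay merges an alarm with a recurrence that starts before the delay has elapsed, and that a setting delay suppresses alarms that stay active for less than the delay. Alive time is not modelled, and the function name and sample data are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (set_time, cancel_time) in seconds


def reported_alarms(raw: List[Interval],
                    setting_delay: float,
                    cancelling_delay: float) -> List[Interval]:
    """Apply the assumed delay model to raw set/cancel intervals of one alarm."""
    merged: List[Interval] = []
    for start, end in sorted(raw):
        if merged and start - merged[-1][1] < cancelling_delay:
            # Assumption: a recurrence inside the cancelling delay is treated
            # as a continuation of the previous alarm.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Assumption: alarms shorter than the setting delay are never reported.
    return [(s, e) for s, e in merged if e - s >= setting_delay]


# Hypothetical history: three short bursts followed by one long fault.
history = [(0, 2), (5, 7), (10, 12), (100, 160)]
for setting, cancelling in [(0, 0), (0, 5), (20, 5)]:
    count = len(reported_alarms(history, setting, cancelling))
    print(f"setting={setting}s cancelling={cancelling}s -> {count} alarms")
```

Replaying the same history with different delay values is the kind of comparison that the forecast in the History Analysis Graph supports.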
** An alarm marked with two asterisks does not threaten the operation of
the system. However, if the fault occurs during working hours, it must
be corrected at once. If the fault occurs outside working hours, it can
be corrected the next day.
*** Alarms marked with three asterisks require immediate action from the
user. An alarm like this is set when the system has become faulty to
the extent that some essential functionality of the system has
stopped, or is in danger of stopping. The maintenance personnel must
take immediate action.
• Alarm outputs are used for indicating the overall alarm situation in the system
• By default alarm outputs #0...#5 are controlled according to the live alarm situation
• Alarm output #15 is by default controlled on the basis of all alarms of
type ***ALARM, **ALARM, and *ALARM

Alarm output                                      Controlled by
3=***QUAL                                         *** quality of service alarms
4=***ENVIR                                        *** environmental alarms
5=General                                         All two star (**) alarms
15=Alarm output controlled on alarm class basis   All *, ** and *** alarms
4. Recovery information
This field displays *RCY*, when the alarm system informs the recovery of the alarm. After
this, the recovery starts automatic recovery actions for the object of the alarm.
5. Date
The setting or cancellation date of the alarm. Only month and day are displayed.
6. Time
The setting or cancellation time of the alarm.
7. Urgency level
Alarms are classified according to their criticality from the user's point of view:
*** requires immediate actions
** requires actions during normal working hours
* normally no actions required
The urgency level is output in all alarm printouts except notices (NOTICE). The urgency
levels in alarm cancellation printouts are indicated by dots (.) instead of asterisks (*).
8. Alarm number
An unambiguous identifier of an alarm. The alarm number is also a search index for the
alarm description.
9. Alarm text
A short description of the event that caused the alarm.
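The urgency field described in item 7 can be decoded with a few lines of code. The sketch below handles only that field, taken on its own. The function name is hypothetical, and the code makes no assumption about where the field sits in the full printout.

```python
from typing import Tuple

# Required actions per urgency level, as listed in item 7.
ACTIONS = {
    3: "requires immediate actions",
    2: "requires actions during normal working hours",
    1: "normally no actions required",
}


def parse_urgency(urgency_field: str) -> Tuple[int, bool]:
    """Return (level, is_cancellation) for an urgency field.

    Asterisks mark an alarm being set; dots mark the cancellation printout.
    """
    token = urgency_field.strip()
    if token in ("*", "**", "***"):
        return len(token), False
    if token in (".", "..", "..."):
        return len(token), True
    raise ValueError(f"not an urgency field: {urgency_field!r}")


level, cancelled = parse_urgency("***")
print(level, cancelled, ACTIONS[level])
```

Notices carry no urgency level, so a caller would check the printout type before calling this helper.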
The location information displayed in the alarm printout does not necessarily contain all
the parts mentioned above. Its accuracy is determined on the basis of the hardware
component which contains the object of the alarm. For example, if the object of the alarm
consists of more than one plug-in unit, the PPA number is not displayed, or if an alarm
concerns the whole cabinet, only RI (or C) is displayed. Unknown location information
is printed as ?????-??-??
6. Alarm event type
Categorization of the alarm for the network management system (NMS). Five basic
categories are specified according to the CCITT Rec. X.733:
COMM communications alarm. Principally associated with the procedures
and/or processes required to convey information from point to point
QUAL quality of service alarm. Principally associated with a degradation
in the quality of service
PROCES processing error alarm. Principally associated with a software or
processing fault
EQUIPM equipment alarm. Principally associated with an equipment fault
ENVIR environmental alarm (= external alarm). Principally associated with
a condition relating to an enclosure in which the equipment resides
7. Recovery information
This field displays *RCY* when the alarm system informs the recovery of the alarm. After
this, the recovery starts automatic recovery actions for the object of the alarm.
8. Date
Setting or cancellation date of the alarm
9. Time
Setting or cancellation time of the alarm
The urgency level is output in all alarm printouts except notices (NOTICE). The urgency levels in
alarm cancellation printouts are indicated by dots (.) instead of asterisks (*).
11. Printout type:
ALARM fault situation
CANCEL fault terminated
DISTUR disturbance
NOTICE notice
In addition, the value of a certain field (for example, the index of a functional unit or the index of
a plug-in unit) can be displayed as two dots (..). In this situation, the field cannot be given a
single value according to its meaning.
If the amount of supplementary information data does not match with the formatting
information, a question mark (?) is printed at the end of the fields.
25. Supplementary text
A more detailed text printed out in some alarms. The left margin of the line is shifted left if the
text is longer than 76 characters.
Please note that this field may not be identical between the setting and cancellation printouts of
the alarm.
26. Alarm operating instructions
Operating instructions the user may have defined for an alarm. The left margin of the line is
shifted left if the text is longer than 76 characters
Alarm Reference
• Alarm manual is the main reference for alarm handling
• It is available on
– NED, as part of WCDMA RAN System documentation
– FM Application
– Alarm Monitor
• Alarm number
– Notices (0 – 999)
– Disturbances (1000 – 1999)
– Failure printout (2000 – 3999)
– Diagnosis reports (3700 – 3999)
– Base Station alarms (7000 – 7999)
– RNC OMS alarms (>70000)
32 © Nokia Siemens Networks RN33153EN20GLA0
According to the alarm type and the origin of the alarm, the numbering of the alarms
conforms with the following:
Alarm type               Alarm number
Notices                  1 - 999
Disturbance Printouts    1000 - 1999
Failure Printouts        2000 - 3999
External Alarms          4000 - 5999
Base Station Alarms      7000 - 7999
Diagnosis Reports are numbered with the numbers 3700 – 3999.
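The numbering scheme lends itself to a simple lookup. The following is a minimal sketch; the function name and the "unclassified" fallback are illustrative choices. Because the diagnosis report range lies inside the failure printout range, it is checked first. The ">70000" rule for RNC OMS alarms comes from the Alarm Reference slide, and 0 is left unclassified because the table starts notices at 1.

```python
def classify_alarm_number(number: int) -> str:
    """Map an alarm number to its printout class using the ranges above."""
    if 3700 <= number <= 3999:
        # Sub-range of the failure printouts, so it must be tested first.
        return "diagnosis report"
    ranges = [
        (1, 999, "notice"),
        (1000, 1999, "disturbance printout"),
        (2000, 3999, "failure printout"),
        (4000, 5999, "external alarm"),
        (7000, 7999, "base station alarm"),
    ]
    for low, high, name in ranges:
        if low <= number <= high:
            return name
    if number > 70000:
        return "RNC OMS alarm"
    return "unclassified"


for n in (690, 1500, 2500, 3750, 4100, 7500, 70001, 6500):
    print(n, classify_alarm_number(n))
```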
Alarm Reference
[Figure: an alarm description in the alarm manual, with callouts:]
• Alarm number and text
• Full description
• Additional information
• Instruction in handling the alarm
• Cancelling information
Each alarm event, alarm and its cancellation, not filtered by the alarm system, is saved in
a log file. This log data is called alarm history. Using the AH command group commands,
you can display the history data concerning the system's alarm situation.
You can either display the alarm history, or merely the active alarms, on the selected
output device.
An IPA 2800 alarm whose object unit is not in the normal working state is normally filtered
by the alarm system. When the alarm is filtered (by any means), it is neither printed out
nor stored in the alarm history. However, an alarm that is filtered purely on the basis of
the state of its object unit is printed out when displaying active alarms.
The alarm history can be useful in troubleshooting situations. By analyzing the alarm
history data you may be able to find trends or patterns on how and when the alarms
occur.
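As an illustration of this kind of trend analysis, the sketch below counts alarm events per alarm number and per hour of day. The record layout (timestamp, alarm number, object unit) and all sample values are hypothetical. In practice the records would have to be extracted from the alarm history output first.

```python
from collections import Counter
from datetime import datetime
from typing import List, Tuple

Record = Tuple[datetime, int, str]  # (time, alarm number, object unit)

# Hypothetical alarm history records, for illustration only.
history: List[Record] = [
    (datetime(2024, 1, 8, 2, 5), 2345, "UNIT-A"),
    (datetime(2024, 1, 8, 2, 40), 2345, "UNIT-A"),
    (datetime(2024, 1, 8, 9, 15), 1234, "UNIT-B"),
    (datetime(2024, 1, 9, 2, 10), 2345, "UNIT-A"),
]


def alarms_per_number(records: List[Record]) -> Counter:
    """How often each alarm number occurs in the history."""
    return Counter(number for _, number, _ in records)


def alarms_per_hour(records: List[Record]) -> Counter:
    """How many alarm events fall into each hour of the day."""
    return Counter(timestamp.hour for timestamp, _, _ in records)


print(alarms_per_number(history).most_common(1))
print(alarms_per_hour(history).most_common(1))
```

A count that concentrates on one alarm number or one time of day is a starting point for the pattern search described above, not a diagnosis in itself.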
With the command group commands, you can:
• block the alarm completely and remove the blocking
• prevent the alarm system from sending the alarm information to one or more alarm
indication instances, for example, the network management system NetAct
• allow the alarm system to send the alarm information to one or more alarm
indication instances, if it has been prevented
• display the blocking information of the alarms and alarm indication instances.
AAP::CLS=AL2;
AHO:OMU;
* More information in
“Alarm Handling” module
AFP:::;
AFP:::;
RNC IPA2800
BLOCKED ALARMS
COMMAND EXECUTED
3. Display all ** alarms and *** alarms for the OMUs (active and history)
_____________________________________________________
_____________________________________________________
4. Display ** alarms which have occurred after date ___ ___ ___.
_____________________________________________________
5. Look for all working state change notices (alarm number 690)
of the OMUs that have occurred since date ___ ___ ___.
_____________________________________________________
6. Show all the alarms that have appeared in the _______ today after 8 a.m.
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________