Alarm Trip 1-Out-Of-2 Voting With High Availability
The Problem
A customer has a Safety Instrumented Function (SIF) they need to implement
with two independent radar level transmitters, each wired to a separate
STA Functional Safety Trip Alarm. The process must trip if either level
reaches the set point, that is, 1 out of 2 (1oo2) voting.
However, if an alarm trip (STA) detects an input or unit fault, the customer
does not want the process to trip. Instead, they want to take the faulty unit
out of service and keep the process running on the remaining good level
transmitter and alarm trip. This effectively degrades the configuration from
1oo2 to 1oo1 (with the fault alarmed) until the fault is diagnosed and
repaired (normally within 72 hours).
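The customer's requirement can be written as a small decision function. This is a minimal sketch, not Moore Industries' implementation: `Channel`, `process_should_trip` and `effective_voting` are hypothetical names, levels are plain numbers in any consistent unit, and the case where both units are faulted is assumed to trip, matching the wiring described later in this note.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """One level transmitter + STA pair (hypothetical model)."""
    level: float   # measured level, any consistent unit
    faulted: bool  # True if the STA reports an input or unit fault


def process_should_trip(a: Channel, b: Channel, setpoint: float) -> bool:
    """Desired behavior:
    - both channels healthy: 1oo2, trip if either level reaches the set point
    - one channel faulted: take it out of service, 1oo1 on the healthy one
    - both channels faulted: no valid measurement is left, so trip
    """
    healthy = [ch for ch in (a, b) if not ch.faulted]
    if not healthy:
        return True
    return any(ch.level >= setpoint for ch in healthy)


def effective_voting(a: Channel, b: Channel) -> str:
    """Voting that is actually in effect for the current fault state."""
    healthy_count = sum(not ch.faulted for ch in (a, b))
    return {2: "1oo2", 1: "1oo1", 0: "trip"}[healthy_count]


# Channel A is faulted and reads high; channel B is healthy and below set point.
a, b = Channel(90.0, True), Channel(50.0, False)
print(effective_voting(a, b), process_should_trip(a, b, setpoint=80.0))
```

In the usage example the faulty channel is ignored even though it reads above the set point, so the process keeps running on channel B alone.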
Safety Architectures
Before outlining the solution, we should review the safety aspects of the
system. As defined in IEC 61508-6 Annex B, 1oo1 represents a minimum
system: it provides no fault tolerance and no failure mode protection,
see figure 1.
In 1oo2, the effect of a dangerous failure in one channel is minimized, since
either channel can trip the system to its safe state. The 1oo2 system offers a
low probability of failure on demand, but it increases the probability of a
"false trip".
Figure 1. Safety Architectures 1oo1 Simplex, 1oo2 High Integrity, and
2oo2 High Availability
Using 2oo2 voting reduces spurious trips but also increases the probability
of failure on demand. In older systems, 2oo3 voting was commonly used.
It provides both high integrity and high availability, but at a higher system
cost.
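The trade-off between 1oo2, 2oo2 and 2oo3 can be illustrated by counting trip votes. In this sketch (hypothetical function names), an MooN system trips when at least M of its N channels demand a trip. In each scenario one channel has failed: it is either stuck at "no trip" during a real demand (a dangerous failure) or stuck at "trip" when there is no demand (a safe failure that causes a false trip). Diagnostics and common-cause failures are deliberately ignored.

```python
def moon_trips(demands: list[bool], m: int) -> bool:
    """MooN trip voting: trip if at least m channels demand a trip."""
    return sum(demands) >= m


def scenario_results(m: int, n: int) -> tuple[bool, bool]:
    """Return (trips on a real demand, false trip) with channel 0 failed."""
    # Real demand, but channel 0 is stuck at "no trip" (dangerous failure).
    real_demand = [False] + [True] * (n - 1)
    # No demand, but channel 0 is stuck at "trip" (safe failure).
    no_demand = [True] + [False] * (n - 1)
    return moon_trips(real_demand, m), moon_trips(no_demand, m)


for name, m, n in [("1oo2", 1, 2), ("2oo2", 2, 2), ("2oo3", 2, 3)]:
    on_demand, false_trip = scenario_results(m, n)
    print(f"{name}: trips on demand={on_demand}, false trip={false_trip}")
```

With one failed channel, 1oo2 still trips on demand but also trips falsely, 2oo2 avoids the false trip but misses the demand, and 2oo3 handles both cases, at the cost of a third channel.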
These architectures do not use diagnostics as part of the automatic
system. Safety architectures have since been developed that incorporate
diagnostics to improve both integrity and availability at a lower cost.
The solution is wired as follows:
- The STA trip relays are wired in series for 1oo2 voting.
- The STA fault relays are wired into a safety repeater to provide additional
  fault contacts, such as Moore Industries’ SRM Safety Relay Module (SRM A
  and SRM B), see figure 4.
- The NC contacts of each SRM are wired in parallel with the corresponding
  STA trip relay, to bypass that trip relay if there is a diagnostic fault.
  This creates 1oo1 voting for the healthy STA in the case of a fault.
- The NO contacts of both SRMs are wired in parallel with each other and then
  in series with the process trip loop, so that the process trips if both
  units have a fault.
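The contact arrangement can be checked with Boolean logic, treating contacts in series as AND and contacts in parallel as OR, with an open loop meaning trip. This sketch assumes a de-energize-to-trip loop in which each STA trip contact is closed while no trip is demanded, and each SRM is energized while its STA is healthy, so its NC contact closes and its NO contact opens on a fault. These are modeling assumptions, not datasheet values, and the function names are hypothetical. The final loop compares the wiring against the required behavior for all 16 input combinations.

```python
from itertools import product


def loop_closed(trip_a: bool, trip_b: bool, fault_a: bool, fault_b: bool) -> bool:
    """Continuity of the trip loop; True means the process keeps running.

    Assumptions: de-energize-to-trip loop, STA trip contacts closed while no
    trip is demanded, SRM coils energized while the matching STA is healthy.
    """
    sta_a_closed = not trip_a       # STA A trip contact
    sta_b_closed = not trip_b       # STA B trip contact
    srm_a_nc_closed = fault_a       # SRM A NC contact closes on fault
    srm_b_nc_closed = fault_b       # SRM B NC contact closes on fault
    srm_a_no_closed = not fault_a   # SRM A NO contact closed while healthy
    srm_b_no_closed = not fault_b   # SRM B NO contact closed while healthy

    # Each STA trip contact in parallel (OR) with its SRM NC bypass contact.
    branch_a = sta_a_closed or srm_a_nc_closed
    branch_b = sta_b_closed or srm_b_nc_closed
    # Both SRM NO contacts in parallel (OR) with each other.
    fault_branch = srm_a_no_closed or srm_b_no_closed
    # All three branches in series (AND) in the process trip loop.
    return branch_a and branch_b and fault_branch


def wiring_trips(trip_a: bool, trip_b: bool, fault_a: bool, fault_b: bool) -> bool:
    """The process trips when the loop opens."""
    return not loop_closed(trip_a, trip_b, fault_a, fault_b)


def required_trip(trip_a: bool, trip_b: bool, fault_a: bool, fault_b: bool) -> bool:
    """Requirement: 1oo2 on healthy units, faulted units ignored, trip if both faulted."""
    return (fault_a and fault_b) or (trip_a and not fault_a) or (trip_b and not fault_b)


for combo in product([False, True], repeat=4):
    assert wiring_trips(*combo) == required_trip(*combo), combo
print("Wiring matches the requirement for all 16 combinations")
```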
We are often asked whether HFT (hardware fault tolerance) is the same as
redundancy. The answer is no, and in this blog post I will explain why. To do
that, we first need to understand three terms: redundancy, HFT and voting.
REDUNDANCY
In the technical world everybody seems to know the word redundancy, and yet
it can be very confusing, especially when you try to express it as a number,
i.e., how redundant is a design? So what is redundancy? Redundancy can be
defined as a system function that is designed in such a way that there are
multiple means (parts, components, devices, software, etc.) to carry out the
function, so that the function will not fail if one or more of these means
fails. Redundancy is not determined by the number of similar parts or devices
you see. Whether there is redundancy or not is determined solely by the
function that you carry out with these parts or devices. Take a look at the
following pictures. You see two valves. Is this redundancy or not?
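If the function is to stop the flow and both valves are open during normal
operation, then closing either valve stops the flow. If one valve is stuck
open (a dangerous failure), the other valve can still stop the flow. This is
redundancy, and the valves are in a so-called 1oo2 architecture design.
If, on the other hand, the function is to start the flow and both valves are
closed during normal operation, then both valves need to open in order to
start the flow. If one valve is stuck closed (in this case also a dangerous
failure), the function cannot be carried out, even if the other valve opens.
This is not redundancy, and the valves are in a so-called 2oo2 architecture
design.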
In the first case we have redundancy, but how much? Some cultures call it
redundant, others say it is two redundant, but the correct way to express it
is one redundant. The reason is that one valve is needed to stop the flow, and
there is one additional valve in case the first one fails.
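The two-valve example can be expressed in a few lines. This sketch assumes the two valves sit in series in one line, so flow passes only if every valve is open, and it models a stuck valve by fixing its position regardless of the command. The names `flow_passes`, `stop_flow_succeeds` and `start_flow_succeeds` are hypothetical.

```python
def flow_passes(valve_open: list[bool]) -> bool:
    """Valves in series: flow passes only if every valve is open."""
    return all(valve_open)


def actual_positions(command_open: bool, stuck: dict[int, bool], n: int) -> list[bool]:
    """Each valve follows the command unless stuck[i] fixes it (True = open)."""
    return [stuck.get(i, command_open) for i in range(n)]


def stop_flow_succeeds(stuck: dict[int, bool], n: int = 2) -> bool:
    # Command every valve closed; the function succeeds if no flow passes.
    return not flow_passes(actual_positions(False, stuck, n))


def start_flow_succeeds(stuck: dict[int, bool], n: int = 2) -> bool:
    # Command every valve open; the function succeeds if flow passes.
    return flow_passes(actual_positions(True, stuck, n))


# Stop flow with valve 0 stuck open: valve 1 still closes (1oo2, one redundant).
print(stop_flow_succeeds({0: True}))    # True
# Start flow with valve 0 stuck closed: no flow (2oo2, no redundancy).
print(start_flow_succeeds({0: False}))  # False
```

The same two valves tolerate one stuck valve for one function and none for the other, which is the point of the text: redundancy follows from the function, not from the number of valves.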
HFT AND VOTING
In the functional safety business we use the term HFT to express whether we
have redundancy or not. When a design has an HFT of X, it can tolerate X
dangerous failures and still work; with X+1 dangerous failures it no longer
works. HFT can easily be calculated if the architecture is known, i.e., 1oo1,
1oo2, 2oo3, etc. If the architecture is expressed as MooN, then the HFT is
calculated as N – M. For example, a 2oo4 architecture has an HFT of 2. This
means it can tolerate 2 failures and still work, and thus it is an
architecture with redundancy. But how redundant is it? Let's explore this.
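The N – M rule can be sketched as a small helper. It is a minimal illustration that only accepts plain MooN strings such as "2oo4"; architectures with a diagnostic suffix (for example 1oo2D) are outside its scope, and the name `hft` is hypothetical.

```python
import re


def hft(architecture: str) -> int:
    """Hardware fault tolerance of a plain MooN architecture: N - M."""
    match = re.fullmatch(r"(\d+)oo(\d+)", architecture)
    if match is None:
        raise ValueError(f"not an MooN architecture: {architecture!r}")
    m, n = int(match.group(1)), int(match.group(2))
    if not 1 <= m <= n:
        raise ValueError("MooN requires 1 <= M <= N")
    return n - m


for arch in ["1oo1", "1oo2", "2oo2", "2oo3", "2oo4", "3oo3"]:
    print(arch, hft(arch))
```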
A 1oo1 architecture has an HFT of 0, so it can tolerate 0 failures and has no
redundancy. A 2oo2 architecture also has an HFT of 0 and can tolerate 0
failures. It has no redundancy, yet it consists of two devices. The
explanation lies in voting. Voting is defined as the number of paths that must
work out of the total number of paths available. A 2oo2 has two paths
available, and both paths need to work. If one path fails, the function no
longer works, even if the other path is available. Hence a 2oo2 has no
redundancy. So just because you see two valves does not mean you have
redundancy; you need to know what voting is required.
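The definitions of HFT and voting can be checked against each other by brute force. This sketch (hypothetical names) treats an MooN architecture as N identical paths of which at least M must work, and searches for the largest number X of failed paths such that every possible choice of X failures still leaves the function working. The result agrees with N – M, and it shows that 2oo2 tolerates no failures even though it has two paths.

```python
from itertools import combinations


def function_works(working_paths: int, m: int) -> bool:
    """Voting: the function works if at least m paths work."""
    return working_paths >= m


def tolerated_failures(m: int, n: int) -> int:
    """Largest X such that ANY X failed paths still leave the function working."""
    tolerated = 0
    for x in range(n + 1):
        # Check every possible set of x failed paths out of n.
        if all(function_works(n - len(failed), m)
               for failed in combinations(range(n), x)):
            tolerated = x
        else:
            break
    return tolerated


for name, m, n in [("1oo1", 1, 1), ("2oo2", 2, 2), ("1oo2", 1, 2), ("2oo4", 2, 4)]:
    print(name, "tolerates", tolerated_failures(m, n), "failure(s)")
```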
So how does this work for the most popular architectures in the safety
industry? The table below gives an overview.
Architecture   Voting (M)   HFT (N – M)
1oo1           1            0
1oo2           1            1
2oo2           2            0
2oo3           2            1
2oo4           2            2
3oo3           3            0
Do you notice anything special? At first HFT looks like it is equal to
redundancy, but suddenly, with 2oo4, it goes wrong. This means that hardware
fault tolerance is not a measure of redundancy; the two are not the same. If
HFT is larger than zero you know you have redundancy, but you do not know how
much.