Accurate Microarchitecture-Level Fault Modeling for Studying Hardware Faults ∗

Man-Lap Li, Pradeep Ramachandran, Ulya R. Karpuzcu, Siva Kumar Sastry Hari, Sarita V. Adve
Department of Computer Science
University of Illinois at Urbana-Champaign
swat@cs.uiuc.edu

Abstract

Decreasing hardware reliability is expected to impede the exploitation of increasing integration projected by Moore's Law. There is much ongoing research on efficient fault tolerance mechanisms across all levels of the system stack, from the device level to the system level. High-level fault tolerance solutions, such as at the microarchitecture and system levels, are commonly evaluated using statistical fault injections with microarchitecture-level fault models. Since hardware faults actually manifest at a much lower level, it is unclear if such high-level fault models are acceptably accurate. On the other hand, lower-level models, such as at the gate level, may be more accurate, but their increased simulation times make it hard to track the system-level propagation of faults. Thus, an evaluation of high-level reliability solutions entails the classical tradeoff between speed and accuracy. This paper seeks to quantify and alleviate this tradeoff.

We make the following contributions: (1) We introduce SWAT-Sim, a novel fault injection infrastructure that uses hierarchical simulation to study the system-level manifestations of permanent (and transient) gate-level faults. For our experiments, SWAT-Sim incurs a small average performance overhead of under 3x, for the components we simulate, when compared to pure microarchitectural simulations. (2) We study system-level manifestations of faults injected under different microarchitecture-level and gate-level fault models and identify the reasons for the inability of microarchitecture-level faults to model gate-level faults in general. (3) Based on our analysis, we derive two probabilistic microarchitecture-level fault models to mimic gate-level stuck-at and delay faults. Our results show that these models are, in general, inaccurate as they do not capture the complex manifestation of gate-level faults. The inaccuracies in existing models and the lack of more accurate microarchitecture-level models motivate using infrastructures similar to SWAT-Sim to faithfully model the microarchitecture-level effects of gate-level faults.

∗ This work is supported in part by the Gigascale Systems Research Center (funded under FCRP, an SRC program), the National Science Foundation under Grants CCF 05-41383, CCF 08-11693, and CNS 07-20743, an OpenSPARC Center of Excellence at the University of Illinois at Urbana-Champaign supported by Sun Microsystems, and an equipment donation from AMD.

978-1-4244-2932-5/08/$25.00 ©2008 IEEE

1 Introduction

While technology scaling facilitates extended system integration, the scaled transistors are increasingly prone to failures for reasons such as infant mortality, wear-out, variation, etc., making them less reliable. The hardware reliability problem has, in the past, concerned only high-end niche systems where solutions that involve heavy amounts of redundancy in terms of space, time, or information are acceptable. In the future, however, the reliability problem is expected to pervade even the mainstream computing market where traditional solutions are too expensive to be applied.

To counter this reliability threat, researchers have proposed solutions at all levels of design, from the system level all the way down to the circuit and the device level. Examples include software-level symptom-based detection techniques such as SWAT that capture how hardware faults manifest at the system level [12, 22], end-to-end error detection and correction [23], microarchitecture-level (µarch-level) redundancy [27], and circuit-level BIST techniques [6].

To evaluate the efficacy of these solutions, it is essential to capture the expected behavior of the fault at the level at which the solution is implemented. For example, the manifestation of a gate-level floating-point (FP) unit fault needs to be accurately captured at the microarchitecture level to evaluate the efficacy of a proposed microarchitecture-level floating-point unit checker. This paper concerns accurate models of hardware faults at the microarchitecture level to evaluate fault-tolerant solutions at the microarchitecture and higher levels.

Recently, several µarch-level solutions that tolerate hardware failures have been proposed [2, 8, 12, 14, 25, 30]. The primary evaluation mode for these proposals has been through statistical fault injections in simulations either at the gate level [8, 14, 25] or at the microarchitectural state elements (e.g., output latch of an ALU) [2, 12, 30]. While gate-level fault injections can accurately capture lower-level faults, the long simulation time of these schemes prevents detailed evaluation of the propagation of gate-level faults through the hardware and into the software. On the other hand, the µarch-level injections are fast and allow observing faults propagated to the software level. However, while latch-level injections may be appropriate for array elements within the processor, it is unclear whether modeling faults in combinational logic at the latch level (e.g., injecting a fault
at the output latch of the FP unit to represent a fault in the logic) is accurate. While alternative FPGA-based emulations [11, 17, 21] offer higher speed and model gate-level faults with high fidelity, their limited observability and controllability give less flexibility than software simulations. Hence, this paper focuses on software simulation methods.

The lack of speed in the gate-level fault simulation paradigm and the possible lack of fault modeling fidelity in µarch-level fault simulation prompt a search for a solution that can achieve the best of both worlds. To address this classic tradeoff between speed and accuracy, past work has applied the paradigm of hierarchical simulation, where different parts of the system are simulated at different abstraction levels so that required details are modeled only in the parts of interest, thus incurring reasonable performance overheads [1, 5, 7, 10, 15, 18].

In the context of fault tolerance, hierarchical simulations have been used to study transient faults in the processor by using a hierarchy of RTL and lower-level simulators [7, 15]. Since these simulators were used to study transients, they invoke the lower-level simulator just once to capture the effect of the fault, after which simulation happens only at the higher level. Other work has used hierarchical simulations to generate fault dictionaries that capture the manifestations from the lower level "off-line" and use them to propagate fault effects during high-level simulations [10]. This idea of fault dictionaries has also been used to study gate-level stuck-at faults in small structures, such as an adder [5]. However, fault dictionaries are specific to the fault model for which they are generated and cannot be used to simulate arbitrary fault models (the dictionary would have to be regenerated off-line for every such fault model); timing faults present a particular challenge. Further, for faults in arbitrarily large structures, the growing sizes of inputs and faults make the dictionaries intractable and hence hard to use.

Our focus here is on the increasingly important permanent and intermittent faults [4, 31] and on solutions for modeling them at the microarchitecture level or higher. In particular, successful solutions must address the following three critical aspects of fault simulation that prior work does not address in unison.

1. Simulation must be fast enough to capture how software would be affected by hardware faults.

2. Unlike transients, where the fault effect can be captured once and propagated to the higher abstraction level, permanent and intermittent faults have the characteristic that one activation of a fault could corrupt the software execution, which influences future activations of the same fault. This feedback mechanism between the hardware fault and the software must be faithfully simulated.

3. The simulator must be flexible enough to model different types of faults.

1.1 SWAT-Sim

To meet the stated criteria, we propose a novel fault injection infrastructure, SWAT-Sim, that couples a microarchitecture-level simulator with a gate-level simulator and has the following properties.

1. To achieve speed close to a microarchitectural simulator and minimize overhead, SWAT-Sim only simulates the component of interest (in our case, the faulty component) at gate-level accuracy and invokes a gate-level simulation of the component on demand.

2. To accurately capture the interaction between the hardware fault and the software, SWAT-Sim invokes the gate-level simulation repeatedly during runtime (interspersed with µarch-level simulations); thus, if the software activates the gate-level fault, the software would be corrupted, which in turn affects future activations of the same fault.

3. To allow fault modeling flexibility, SWAT-Sim employs a gate-level timing simulator where different timing faults can be modeled by changing the delay information within the faulty module.

These design choices allow SWAT-Sim to study the impact of gate-level permanent faults on software at speeds comparable to µarch-level simulators. Further, since the fault simulation is performed while real-world software is executing, the effect of the fault is studied using functional vectors that represent realistic scenarios. SWAT-Sim thus has an advantage over other methods that use artificially generated test vectors (e.g., functional vectors collected from a fault-free execution) to study the fault effect, as such test vectors may not be representative of real-world faulty behavior.

1.2 Contributions

• We present SWAT-Sim, a novel fault injection infrastructure for studying system-level effects of gate-level permanent faults. To the best of our knowledge, SWAT-Sim is the first simulator that facilitates detailed understanding of permanent fault propagation from the gate level, through the microarchitecture, to the full system level, with real-world applications running on an operating system. SWAT-Sim is both fast (compared to gate-level simulators) and accurate (compared to µarch-level simulators), with a small average overhead of under 3x, for the components we simulate, over µarch-level simulators.

• With SWAT-Sim, we study the system-level manifestations of faults injected in a Decoder, an Arithmetic and Logic Unit (ALU), and an Address Generation Unit (AGEN) of a superscalar processor. We inject faults under µarch-level stuck-at, gate-level stuck-at, and gate-level delay fault models, and use the previously studied SWAT detection techniques to understand their system-level manifestation [12]. We show that, in
general, µarch-level stuck-at faults do not result in system-level fault manifestations similar to those of gate-level stuck-at or delay faults. We thus infer that more accurate models are needed to model gate-level faults at the µarch level.

• Based on an extensive analysis of the propagation of gate-level faults to the microarchitecture, we derive two probabilistic fault models, the P-model and the PD-model, for gate-level stuck-at and delay faults. Our analysis suggests that these models are, in general, inaccurate µarch-level models for gate-level faults because they fail to capture the complex manifestation of gate-level faults. However, we identify several reasons for the inaccuracies of the models that could aid in deriving better µarch-level models in the future.

Overall, this paper makes a first attempt towards understanding the differences in system-level effects between the µarch-level stuck-at fault models and gate-level stuck-at and delay fault models. Our extensive analysis and modeling showed that it is highly complex to capture the several factors that should be used for deriving µarch models that accurately represent the behavior of gate-level faults. Therefore, until there are further breakthroughs in µarch-level fault models, we believe that gate-level simulations are necessary to capture the behavior of gate-level faults. Hence, fast simulation methods, such as the proposed hierarchical simulator, SWAT-Sim, are essential for studying system-level effects of hardware faults.

2 The SWAT-Sim Infrastructure

SWAT-Sim is fundamentally a µarch-level simulator that simulates only the faulty µarch-level blocks, such as a faulty ALU or decoder, at the gate level. This greatly reduces the gate-level simulation overhead.

2.1 Interfacing the Simulators

SWAT-Sim couples a full-system µarch-level simulator with a gate-level simulator. A gate-level Verilog module of the faulty unit is simulated only when the unit is utilized by the µarch-level simulator. The inputs to the µarch-level unit are passed as stimuli to the gate-level simulator. When the gate-level simulation completes, the results are passed back to the µarch-level simulator, which then continues execution.

This communication between the two simulators is achieved using UNIX named pipes. In the µarch-level simulation, each time an instruction utilizing the faulty unit is encountered, the stimuli needed by the gate-level module are written to a dedicated stimuli pipe. After the gate-level simulation completes, the computed data is written to a dedicated response pipe, from which the µarch-level simulator reads the response.

While the µarch-level simulator can access the named pipes like files, the gate-level simulator is enhanced with two system tasks, implemented using the Verilog Procedural Interface (VPI) [9], that handle accesses to/from the pipes: one collects signals from the stimuli pipe and the other writes the results to the response pipe. The stimuli and response (arguments of the two tasks) are tailored to the µarch-level structures under fault injection.

Figure 1 compares how a single fault in a µarch-level structure X is simulated in a purely µarch-level simulator (Figure 1(a)) and in SWAT-Sim (Figure 1(b)).

In Figure 1(a), a single fault in X is modeled as a single-bit corruption at the output latch of X because the µarch-level simulator lacks the gate-level details of X.

On the other hand, at the gate level, a single fault in X is modeled as a fault in a specific gate or net. Figure 1(b) shows the steps by which the SWAT-Sim hierarchical simulator simulates the effect of this fault. (1) An instruction in the µarch-level simulator uses X. SWAT-Sim collects the relevant input vectors and sends them to the stimuli pipe. (2) The Verilog system task reads from the stimuli pipe and sends the stimuli to the gate-level simulator. (3) The gate-level simulator feeds the stimuli to the faulty module and obtains the output after gate-level simulation. (4) The Verilog system task transfers the result from the gate-level simulator to the response pipe. (5) The µarch-level simulator reads the result from the response pipe and continues simulation. In particular, the figure shows the effect of a single gate-level fault propagating into a multiple-bit corruption at the output latch. In contrast, the fault injected in pure µarch-level simulation only results in a single-bit corruption (Figure 1(a)).

2.2 Different µarch-level Structures

Given the wide variety of structures within a modern processor and the differences in the abstraction levels between a typical µarch-level simulator and its corresponding gate-level counterpart, several factors should be considered when performing such hierarchical simulations.

Simulating sequential logic: Simulating combinational logic with single- or multi-cycle latency in SWAT-Sim is straightforward. As long as the outputs are read after the stipulated latency, the outputs are guaranteed to be correct for each invocation. Sequential logic, however, requires state to be maintained across invocations. In SWAT-Sim, since the gate-level simulator is invoked (and thus clocked) only when the unit is utilized, state is maintained across multiple invocations, resulting in accurate simulation of sequential circuits.

Handling gate-level signals that are not modeled at the µarch level: In some cases, due to abstract modeling in the µarch simulators, not all signals modeled at the gate level appear at the µarch level. If the faulty component contains such signals, the µarch-level simulator can be enhanced with those signals to help propagate faults in these paths, improving its accuracy. Even in the absence of these enhancements, SWAT-Sim would present a more accurate fault model than existing µarch-level fault models.

Simulating large µarch-level components that may result in large overheads: Since the primary aim of SWAT-Sim is to study the propagation of gate-level faults to the system level, simulations must be carried out at reasonable speeds. The components we study in the paper present
overheads in simulation time of under 3x (discussed in Section 4.1) when compared to pure µarch-level simulations. However, if the overhead becomes exorbitant because the faulty module is too large, the module can be further partitioned so that only the faulty submodule is simulated at the gate level while the rest is simulated at the higher level. For example, [15] uses such an approach in a lower-level hierarchical simulator.

Overall, by effectively coupling the gate-level and µarch-level simulators, SWAT-Sim is capable of simulating gate-level faults in different µarch-level components, making it a useful tool for full-system fault propagation studies with gate-level accuracy.

Figure 1. Comparison of how a faulty µarch-level unit X is simulated by (a) a pure µarch-level simulator and (b) SWAT-Sim. [Figure panels: (a) µarch-Level Fault Simulation; (b) SWAT-Sim Fault Simulation]

3 Experimental Methodology

3.1 Simulation Infrastructure

Since permanent faults are persistent and can propagate through the µarch level to affect the OS and application state, SWAT-Sim requires a full-system simulator, a µarch-level simulator, and a gate-level timing simulator. Any set of such simulators may be interfaced for the purposes of fault propagation.

In our implementation, SWAT-Sim consists of three components: the Virtutech Simics full-system functional simulator [29], the Wisconsin GEMS processor and memory µarch-level timing models [13], and the Cadence NC-Verilog gate-level simulator. We interfaced the Cadence NC-Verilog simulator with GEMS using system tasks implemented in VPI, as described in Section 2.1.

For the gate-level modules, we obtained the RTL designs of the arithmetic and logic unit (ALU) and the address generation unit (AGEN) from the OpenSPARC T1 architecture [28] and built an RTL model of the SPARC V9 decoder based on the decoder in GEMS. The Decoder module decodes one 32-bit instruction word per cycle and generates the signals modeled by our µarch-level simulator. The ALU module is capable of executing arithmetic (add, sub), logical (and, or, not, xor, and mov), and shift (shift-left and shift-right) instructions. The AGEN module computes the effective virtual address given the operand values of the memory (load/store) instruction. Using Synopsys Design Compiler, we synthesized these modules at 1GHz with the UMC 0.13µm standard cell library. Further, this synthesis tool also generates the SDF (Standard Delay Format) file that contains the delay information of each gate and wire within the synthesized gate-level module. The Cadence NC-Verilog simulator then performs gate-level timing simulations with the information provided in this file. For delay faults (described in Section 3.2), we modify the post-synthesis SDF file to incorporate added delays.

This simulation setup allows us to inject permanent faults under different fault models into the ALU, the AGEN, and the Decoder, and to observe their impact on real workloads (6 SpecInt2000 and 4 SpecFP2000) running on the Sun Solaris-9 operating system. Both the application and the OS run on a simulated 4-wide out-of-order processor (Table 1) supporting the SPARC V9 ISA.

Table 1. Parameters of the simulated processor.

Base Processor Parameters
  Fetch/Decode/Execute/Retire rate    4 per cycle
  Functional Units                    2 Int add/mul, 1 Int div, 2 Load, 2 Store, 1 Branch,
                                      2 FP add, 1 FP mul, 1 FP div/sqrt
  Integer Unit latencies              1 add, 4 mul, 24 divide
  FP Unit latencies                   4 default, 7 mul, 12 divide
  Reorder Buffer size                 128
  Register File size                  256 integer, 256 FP
  Unified Load-Store Queue            64 entries
Base Memory Hierarchy Parameters
  Data L1/Instruction L1              16KB each
  L1 hit latency                      1 cycle
  L2 (Unified)                        1MB
  L2 hit/miss latency                 6/80 cycles

3.2 Fault Models

In our experiments, we injected faults according to the following fault models to study differences in system-level effects among faults injected at the µarch level and the gate level. In all cases, we inject single-bit (or single-wire) faults.

Gate-level stuck-at fault model: The gate-level stuck-at fault model is a standard fault model applied in manufacturing testing. We inject both stuck-at-0 and stuck-at-1 faults in randomly chosen wires in the circuit.

Gate-level timing fault model: It has been shown that aging-related faults result in timing errors in the faulty gate,
with increasing delay as the aging worsens [3]. Ideally, we would like to model this effect using transition fault models and path delay faults, with different amounts of delay. Here, we experiment with two delay fault models: (1) We inject a one-clock-cycle delay into the faulty gate such that timing violations occur along all paths containing the gate when a transition occurs. (2) The faulty gate is injected with a half-clock-cycle delay, potentially causing a subset of the gate's output cone to violate timing.

Microarchitecture-level stuck-at fault model: Due to the absence of more accurate fault models, stuck-at faults at the input/output latch of a faulty µarch-level unit have been used to estimate the effect of gate-level faults (both stuck-at and timing-related faults). We adopt this fault model, injecting both stuck-at-0 and stuck-at-1 faults in the input latch of the Decoder and the output latch of the ALU or AGEN.

3.3 Studying System-Level Effects

A key objective of this study is to understand the differences, if any, in system-level manifestations of µarch-level and gate-level faults within µarch-level structures. For this purpose, we use the SWAT symptom-based detection scheme because these detectors essentially capture how hardware faults manifest at the system level and in software [12].

We inject faults using the fault models described in Section 3.2 and rely on SWAT-Sim's full-system µarch-level simulator to propagate the fault effect to the software for the SWAT detectors to detect. Specifically, we use the following SWAT detectors: (1) FatalTraps, such as memory address misalignment, illegal instruction, etc., denoting an abnormal software execution, (2) Hangs of the application and the OS, identified using a hardware hang detector, and (3) HighOS, representing abnormal executions that have excessive contiguous OS instructions (30,000 contiguous instructions for our experiments) [12]. Following the methodology used in [12], we consider faults detected within 10 million instructions (after corruption of some architectural state) to be recoverable (e.g., using pure hardware [19, 26] or hybrid hardware/software recovery schemes). Therefore, for each fault injection run, SWAT-Sim performs a detailed timing simulation (both µarch and gate-level) for 10 million instructions after the first architectural state corruption.¹ If there is no architectural state corruption for 10 million instructions after the fault injection, the fault is assumed to be masked and the simulation is terminated.

Thus, at the end of the above 10 million instruction window, a fault results in one of the following outcomes: (1) µarch-Mask: the µarch-level state (output latches of ALU, AGEN, and Decoder) is never affected. (2) Arch-Mask: the architectural state is not corrupted. (3) Detected: a detection occurs. (4) Unknown: the fault is neither detected nor masked at the µarch and architecture level.

The faulty cases that result in Unknown are then simulated in functional mode and can have one of three outcomes: (1) Application-level masking (App-Mask): even though the fault corrupts the architectural state, the application output remains correct. (2) Detected>10M: the fault is detected later in the execution, but is deemed not recoverable. (3) Silent data corruption (SDC): the fault remains undetected and corrupts the application output.

Given the injection outcomes, we study the differences between the various fault models using two metrics, coverage and detection latency, as follows.

Coverage: We define coverage as the percentage of unmasked faults that are detected within 10 million instructions, calculated as

  $\text{Coverage} = \frac{\text{Total Detected}}{\text{Total Injected} - \text{Masked}} \times 100,$

where µarch-Mask, Arch-Mask, and App-Mask constitute masked injections.

Detection Latency: The latency of detection determines the recoverability of the fault. Faults with shorter detection latencies can be fully recovered using hardware techniques (e.g., [19, 26]) with little hardware buffering to handle input/output commit problems. On the other hand, while the memory state corrupted by faults with longer detection latencies can be recovered using hardware techniques, the support for handling the input/output commit problems would be more complex and may require software involvement. We measure the latency of detection from the instruction at which the architectural state (of either the application or the OS) is corrupted until the detection.

3.4 Parameters of the Fault Injection

Our fault injection campaign consists of several runs for each of our 10 applications, with 1 fault injected per run. For each combination of fault model (Section 3.2), faulty structure, and application, we inject a fault in one of 4 different randomly chosen points in the application and one of 50 different points in the faulty unit. For the gate-level stuck-at and delay fault models, the 50 points in a structure are chosen from the 1853, 2641, and 757 wires of the synthesized gate-level representation of the Decoder, ALU, and AGEN, respectively. For the µarch-level faults, these points are randomly chosen from the 32 bits of the input latch of the Decoder and from the 64 bits of the output latches of the ALU and AGEN. Further, since there are multiple decoders, ALUs, and AGEN units in our superscalar processor, one of them is chosen randomly for each injection. We also ensure that the samples are chosen so that gate-level stuck-at and delay faults are injected in the same set of wires to facilitate a fair comparison among the gate-level faults.

This gives us a total of 2000 simulations per fault model per structure (4 × 10 × 50). Each injection run whose fault is not masked is a Bernoulli trial for coverage (either detected or not). Further, since the injection experiments are independent of each other, this gives us a low maximum error of 1.1% for the reported coverage numbers, at a 95% confidence level.
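The outcome taxonomy and metrics of Sections 3.3 and 3.4 can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the authors' tooling: the function names (classify, coverage, worst_case_margin) are hypothetical, the label "uarch-Mask" stands in for µarch-Mask, checking a late detection before output correctness is an assumed precedence, and the error bound uses the standard normal approximation to a binomial proportion at its worst case p = 0.5, which is one common way to obtain such a bound (this section does not say which interval was used). The margin function takes the number of unmasked runs as an input, since that count is not reported here.

```python
import math
from collections import Counter

# Outcome labels follow Section 3.3 ("uarch" stands in for the Greek "µarch").
MASKED = ("uarch-Mask", "Arch-Mask", "App-Mask")


def classify(uarch_touched, arch_corrupted, detected_in_window,
             detected_later, output_correct):
    """Map one injection run to an outcome label (hypothetical helper).

    The first three checks correspond to the detailed 10M-instruction
    timing window; the rest resolve an "Unknown" run via functional
    simulation. Checking late detection before output correctness is an
    assumption of this sketch.
    """
    if not uarch_touched:
        return "uarch-Mask"
    if not arch_corrupted:
        return "Arch-Mask"
    if detected_in_window:
        return "Detected"
    # The run was "Unknown" at the end of the window.
    if detected_later:
        return "Detected>10M"
    if output_correct:
        return "App-Mask"
    return "SDC"


def coverage(outcomes):
    """Coverage (%) = Total Detected / (Total Injected - Masked) * 100."""
    counts = Counter(outcomes)
    injected = sum(counts.values())
    masked = sum(counts[label] for label in MASKED)
    unmasked = injected - masked
    if unmasked == 0:
        raise ValueError("coverage is undefined when every fault is masked")
    return 100.0 * counts["Detected"] / unmasked


def worst_case_margin(n_trials, z=1.96):
    """Half-width (%) of a normal-approximation interval for a Bernoulli
    proportion, maximized at p = 0.5; z = 1.96 corresponds to ~95%."""
    if n_trials <= 0:
        raise ValueError("need at least one trial")
    return 100.0 * z * math.sqrt(0.25 / n_trials)


if __name__ == "__main__":
    # Toy outcomes for illustration only; not data from the paper.
    runs = [
        classify(False, False, False, False, True),  # uarch-Mask
        classify(True, False, False, False, True),   # Arch-Mask
        classify(True, True, True, False, False),    # Detected
        classify(True, True, True, False, False),    # Detected
        classify(True, True, False, False, False),   # SDC
        classify(True, True, True, False, False),    # Detected
    ]
    print(coverage(runs))  # 3 detected / 4 unmasked -> 75.0
    print(round(worst_case_margin(10000), 2))  # -> 0.98
```

Keeping the classification step separate from the metric functions lets the same list of outcomes also feed per-outcome breakdowns, such as the fraction of SDCs, without re-running the classification.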
¹ Architectural state corruptions are determined by continuously comparing against a golden fault-free run (see [12]).

3.5 Limitations of the Evaluation

Here, we list some of the assumptions and limitations of our evaluation.
• SWAT-Sim assumes that a Verilog description of the module of interest is readily available for interfacing. This is true for the large fraction of the processor that is typically re-used from older tape-outs. However, for modules that are yet to be developed, neither SWAT-Sim nor pure gate-level simulators can be used to perform fault injection experiments. As these models start to become available, SWAT-Sim can be incrementally interfaced with them.

• Using SWAT-Sim, we study the propagation of gate-level faults in only three microarchitecture units (Decoder, ALU, and AGEN) as we could not find other Verilog modules close enough to the SPARC architecture modeled by the µarch-level simulator (we used the in-order UltraSPARC T1 as our Verilog source and the out-of-order GEMS as our µarch-level source).

• The timing information generated in the SDF file represents pre-layout timing, which does not accurately reflect post-layout timing for either gate delays or interconnect. By extracting this information using a place-and-route tool, the accuracy of our timing simulations, and thus our results, can be further improved.

• Although prior work has suggested other statistical delay models for timing faults (e.g., based on threshold voltage and temperature [16, 24]), we inject fixed and arbitrarily chosen delays that may or may not represent real-world failure modes. Integrating more accurate lower-level timing fault models in SWAT-Sim is a subject of our future work.

In spite of these assumptions and limitations, the results presented in this paper demonstrate the importance of using hierarchical simulators, such as SWAT-Sim, to accurately model gate-level faults at the µarch level.

4 Results

The hierarchical nature of SWAT-Sim allows us to achieve gate-level accuracy in fault modeling at speeds comparable with µarch-level simulators. We first summarize SWAT-Sim's performance when compared to both µarch-level simulation and pure gate-level simulation (Section 4.1). We then use SWAT-Sim to evaluate the accuracy of the previously used µarch-level stuck-at fault models for representing gate-level faults (Section 4.2). Subsequently, we extensively analyze the reasons why the manifestations of gate-level faults differ from those of µarch-level faults (Section 4.3). From this detailed analysis, we derive two candidate probabilistic µarch-level fault models for modeling gate-level stuck-at and delay faults (Section 4.4).

4.1 Performance Overhead of SWAT-Sim

To understand the performance overhead incurred by SWAT-Sim when compared with pure µarch-level simula-

and each fault model. We do not inject a fault in the desired faulty unit, but force the unit to be simulated at the gate level. To be conservative, we always use the most utilized unit for this purpose (e.g., ALU 0 for a faulty ALU). For delay faults, we simulate the chosen unit with SDF timing annotation. Table 2 shows the maximum and average slowdowns of SWAT-Sim compared to pure µarch-level simulation when simulating the ALU, the AGEN, and the Decoder across different fault models.

Table 2. Slowdowns of SWAT-Sim when compared to pure µarch-level simulation.

  Unit      Fault Model      Max    Avg
  ALU       Gate Stuck-At    2.20   1.56
  ALU       Gate Delay       2.65   1.93
  AGEN      Gate Stuck-At    1.59   1.26
  AGEN      Gate Delay       1.89   1.35
  Decoder   Gate Stuck-At    2.91   2.12
  Decoder   Gate Delay       5.10   2.91

Overall, the worst average-case slowdown of SWAT-Sim, compared to the µarch-level simulation, is under 3x, which is an acceptable overhead considering SWAT-Sim's ability to model gate-level faults. In particular, Table 2 shows that the Decoder incurs the most overhead, with average slowdowns of 2.12x and 2.91x for gate-level stuck-at and delay faults, respectively. The average slowdowns of the ALU and the AGEN are under 2x. The maximum slowdowns observed for the ALU and the AGEN are under 2.7x and 2x, respectively, while the overall maximum slowdown of 5.1x is measured for the Decoder. The Decoder incurs higher overhead than the other units because it sits at the processor front-end and is more utilized than the ALU and the AGEN.

As expected, the delay fault simulations always incur higher overhead than the stuck-at fault simulations because simulating delay faults requires timing information, which is more compute-intensive.

Since we do not have the corresponding gate-level model of the superscalar processor we simulate at the µarch level, we cannot directly determine the performance benefit of SWAT-Sim over pure gate-level simulation. Instead, we derive a rough, conservative estimate of the performance benefit as follows. Assume (conservatively) that we need to simulate a fault in a circuit that contains 4 times the number of gates and is utilized twice as often as the Decoder. Assume that the full superscalar processor we wish to simulate has 25 million gates. Assuming that SWAT-Sim's worst-case slowdown is linear in the utilization and the size of the gate-level module, and that the baseline µarch simulator simulates at the rate of 17k instr/sec (the measured average speed of our µarch-level simulator), it would take SWAT-Sim $\frac{10\text{M instr} \times 4 \times 2 \times 5.1}{17\text{k instr/sec}} \approx 6.7$ hr to simulate 10 million instructions in the worst case. On the other hand, conservatively assuming that the gate-level simulator simulates 25M gate-cycles/sec (more than 1300x the speed reported in [20]) and that the execution has an IPC of 1, it would take
25M gates
10M instr × 1 instr/cycle×25M gates−cycles/sec = 2778 hr
tion, we profile a set of 40 fault-free runs for each structure to simulate 10 million instructions. SWAT-Sim thus achieves
110
a 417x speedup over traditional gate-level simulation.

4.2 Accuracy of Microarchitecture-Level Fault Models

We next investigate the accuracy of µarch-level fault models. If these fault models were accurate enough, we could eliminate gate-level simulations entirely, and with them the need for SWAT-Sim and its overhead.

4.2.1 Comparison of Coverage

Figure 2 compares the efficacy of the SWAT scheme in detecting faults injected into the ALU, the AGEN, and the Decoder using different fault models. The bars represent the outcomes for the µarch-level stuck-at-1 (µarch s@1) and stuck-at-0 (µarch s@0) models, the gate-level stuck-at-1 and stuck-at-0 models (Gate s@1 and Gate s@0, respectively), and the gate-level 1-cycle-delay and 0.5-cycle-delay models (Delay 1cyc and Delay 0.5cyc, respectively). Each bar shows the fraction of fault injections that are microarchitecturally masked (µarch-Mask), architecturally masked (Arch-Mask), or application-masked (App-Mask); that are detected within 10M instructions (Detected) or only beyond 10M instructions and are hence not recoverable (Detected>10M); and that lead to silent data corruptions (SDC). The number on top of each bar represents the coverage.

[Figure 2: stacked bars showing the percentage of injected faults (y-axis) for µarch s@0, µarch s@1, Gate s@0, Gate s@1, Delay 1cyc, and Delay 0.5cyc in the INT ALU, AGEN, and Decoder; legend: µarch-Mask, Arch-Mask, App-Mask, Detected, Detected>10M, SDC.]
Figure 2. Efficacy of the SWAT fault detection scheme [12] under different fault models for the ALU, AGEN, and Decoder. Depending on the fault model and the structure, the µarch-level fault may or may not capture the system-level effects of gate-level faults accurately, as indicated by the differences in coverage.

Figure 2 shows that, depending on the structure and the fault model, the µarch-level fault model may or may not accurately capture the effect of gate-level faults, as indicated by the coverage. For the AGEN, the coverage of µarch stuck-at faults is similar to that of the gate-level stuck-at and 1-cycle delay fault models (between 94% and 97%). However, the coverage of 0.5-cycle delay AGEN faults is noticeably lower (90%). For the Decoder and the ALU, the coverage for the µarch-level stuck-at faults is near perfect (99+%), while the coverage of the gate-level stuck-at faults (94% for the ALU and between 96% and 98% for the Decoder) and of the Decoder delay faults (95%) is slightly lower. In contrast, the coverage of the ALU delay faults is significantly lower (89% and 85% for 1-cycle and 0.5-cycle delay faults, respectively).²

² We found that the coverage with SWAT-Sim improves significantly (from 89% to 94% for 0.5-cycle delay faults in the ALU) when the undetected cases are run for 50M instructions, showing that SWAT's detectors remain effective at this longer latency (which is still recoverable [19]).

The following analyzes in more detail the faults that are not detected.

Masking: A large source of discrepancy among the different fault models lies in the masking rate (µarch-level, architectural, and application masking). The µarch-level stuck-at fault models show very little masking of all three kinds (on average, 0.3% for the Decoder, 2% for the ALU, and under 9% for the AGEN), while the gate-level fault models show a much higher rate of masking (>30% for all structures, with 0.5-cycle delay faults in the AGEN having the highest masking rate, 54%).

The masking rates of µarch-level faults are low mainly because these faults are rarely µarch-masked when compared to gate-level faults. As µarch-level faults directly change the latch data, the only case in which such a fault does not cause a µarch corruption (i.e., is µarch-masked) is when the data does not activate the latch fault; e.g., a correct data value of 0 masks a stuck-at-0 fault. At the gate level, there are two scenarios: (1) the fault at the gate is not activated, and (2) the fault is activated but does not propagate due to other signals in the circuit. Thus, the gate-level faults see much higher µarch masking rates. Further, the µarch-level faults are hardly masked at the application and architecture levels, since they tend to perturb the data more severely and cause symptoms more easily than the gate-level faults.

Interestingly, gate-level faults injected into the 3 structures exhibit different masking behaviors. All structures have high µarch-level masking. However, architectural masking is significant only for the Decoder (25% to 31%), and application masking is substantial only for the ALU (35% to 42%).

Decoder faults are more likely to be masked at the architecture level than faults in the other structures. For these cases, we observe that the faults affect a subset of sparingly used instruction types and corrupt only wrong-path instructions. Thus, even though the gate-level faults become microarchitecturally visible, they are not activated again after the pipeline flush, and the fault becomes architecturally masked. For the ALU and AGEN, however, we see relatively few faults that get activated only by speculative instructions.

On the other hand, a significant number of ALU faults are masked by the application. This is likely due to the activated faults being logically masked. For example, suppose instruction r1 ← r2 + r3 uses the faulty ALU and the fault causes r1 to change from 1 to 2. If r1 is only used by the branch instruction beq r1, 0, L, the fault effect is masked by the application, since both values are non-zero and the branch outcome is unchanged. This type of masking is relatively rare in the other structures. Since Decoder faults are more likely to affect the program control flow, and AGEN faults to change the addresses of memory accesses, these faults, once activated,
usually lead to detectable symptoms (i.e., they are not masked).

SDC: Similar to the overall coverage, the SDC rates (percentage of total injections that result in SDC events) depend on the type of fault and the structure in which the fault is injected. While the SDC rate is higher for gate-level faults than for µarch-level faults in the ALU (1.8%–4.4% vs. 0%–0.5%, respectively) and the Decoder (0.4%–1.2% vs. 0.1%–0.2%, respectively), the SDC rates of the AGEN faults are nearly identical (1.6% for 0.5-cycle delay faults and 0.5%–0.8% for the others).

The SDC rates are high for the gate-level faults in the ALU because these faults are rarely activated and, once activated, perturb the data value only slightly. In contrast, the µarch-level stuck-at faults are easily activated and less likely to cause SDCs.

The above differences in manifestations are largely governed by how the fault at the gate level becomes visible to the microarchitecture (activation rate, which latch bits are corrupted, etc.), as analyzed further in Section 4.3.

4.2.2 Latency to Detection

We next discuss how the µarch- and gate-level fault models compare in terms of detection latency.

Figure 3 gives the total number of instructions executed after the architectural state is corrupted until the fault is detected, for each unit under each fault model. The detected faults are binned into different stacks of the bar based on their detection latencies (from 1,000 to 10 million instructions).

[Figure 3: stacked bars showing the percentage of detected faults (y-axis) for each fault model in the ALU, AGEN, and Decoder, binned by detection latency: <1k, <10k, <100k, <1M, and <10M instructions.]
Figure 3. Latency of fault detection in terms of number of instructions executed from architectural state corruption to detection. The differences in the models impact recovery, which is primarily governed by these latencies.

As mentioned previously (Section 3.3), the latency to detection has direct bearing on the recoverability of the application and the system. Hardware buffering, required for hardware recovery, can buffer thousands of instructions, but can be expanded through intelligent design to tolerate latencies of up to millions of instructions [19, 26].

From Figure 3, we see that the percentage of detected faults for which the software can be recovered using recovery techniques that tolerate short latencies of under 10K instructions differs across fault models for the three structures. While the µarch-level stuck-at-1 model shows a larger fraction of faults as recoverable at this latency than the gate-level stuck-at faults, the fraction of recoverable µarch-level stuck-at-0 faults is lower.

From these differences in system-level manifestations, we infer that µarch-level stuck-at faults do not, in general, accurately represent gate-level stuck-at or delay faults. This motivates either building more accurate µarch-level fault models or, in their absence, using the SWAT-Sim infrastructure to study the system-level effect of gate-level faults.

4.3 Differences Between Fault Models

Before we attempt to derive a µarch-level fault model that is more accurate than the existing ones, we investigate the fundamental reasons for the different behaviors of the µarch-level and gate-level fault models. In the following sections, we try to understand the differences by comparing the fault activation rates and the data corruption patterns at the microarchitectural state across different fault models.

4.3.1 Fault Activation Rates

The fault activation rate of a given faulty run is defined as the percentage of instructions that get corrupted by the injected fault among all instructions that utilize the faulty unit. We collect the activation rates for all faulty runs that are not µarch-masked, calculate the weighted arithmetic mean for each fault model, and present these numbers in Figure 4. Because different runs execute different numbers of instructions, we weight the activation rate of each run by the total number of instructions executed by the faulty unit when calculating the mean.

[Figure 4: mean fault activation rate (y-axis) for each fault model in the ALU, AGEN, and Decoder.]
Figure 4. Mean fault activation rate for the different fault models as a percentage of the number of instructions.

Figure 4 shows that the µarch-level stuck-at faults present a higher activation rate than faults injected at the gate level. For the ALU, the µarch-level faults have a >4% activation rate, while the activation rates of gate-level faults are at most 1.6%. For the AGEN, the corresponding numbers are >9% and <7%, respectively. The Decoder faults tend to have higher activation rates than faults in the other structures because decoders are utilized more; the Decoder µarch-level faults have activation rates >19%, while the rates of gate-level faults are <7%. The activation rate for gate-level faults
is lower because activating gate-level faults requires both excitation and propagation to the output latch, while the µarch-level fault is directly injected into the latch. Additionally, the µarch-level stuck-at-1 fault has a significantly higher activation rate than the other fault models (36%, 45%, and 47% for the ALU, the AGEN, and the Decoder, respectively). This high rate is caused by the bias of data values towards zero: a stuck-at-1 fault is excited whenever the faulty bit holds a 0.

Further, we notice a difference in the activation rates between the gate-level stuck-at and delay faults, with the delay fault models exhibiting lower rates of activation for all structures. Less than 2% of instructions activate the 1-cycle and 0.5-cycle delay faults in all 3 structures, with the lowest average activation rate being 0.6% for 0.5-cycle delay faults in the AGEN. The lower average activation rate can be explained by the different excitation conditions of the two models. A stuck-at-X fault is excited when the fault-free signal at the faulty net is the complement of X. Thus, if the probability of having a logic 1 at the faulty net is p, the probability of exciting a stuck-at-0 fault on that wire is p, and that of exciting a stuck-at-1 fault is (1-p). A delay fault, on the other hand, is active only if there is a transition on the faulty wire, and hence its excitation probability is p(1-p), which is always smaller than that of the stuck-at faults. This lower probability of excitation generally results in a lower average activation rate for gate-level delay faults. Further, while an activated 1-cycle delay fault causes all paths from the faulty net to the output latch to miss timing, a 0.5-cycle delay fault usually results in fewer errors observed at the output, since some paths from the faulty net to the output may not violate timing.

Although the higher activation rates (Figure 4) of µarch-level stuck-at faults result in higher coverage (Figure 2) for the ALU and Decoder, we do not find such a correlation for the AGEN. When comparing gate-level faults in the same structure, stuck-at faults have higher activation rates and result in slightly higher coverage than delay faults for the ALU and Decoder, but not for the AGEN. Higher activation rates, therefore, do not necessarily drive the coverage up. Additionally, we find no direct correlation between activation rate and latency of detection. Factors other than activation rate thus need to be investigated if we are to succeed in deriving better µarch-level fault models. We next look at how activated faults manifest at the output latches (i.e., at the µarch level).

4.3.2 Corruption Pattern at the Microarchitectural State

While an activated µarch-level fault corrupts only one bit in the microarchitectural state, an activated gate-level fault may corrupt multiple bits once it becomes visible in the microarchitectural state.

Table 3 shows the number of bits corrupted at the output latch (microarchitectural state) under the different fault models for a fault in the ALU, the AGEN, and the Decoder. For each fault model, it shows the percentage of instructions that have a given number of bits flipped at the output latch. The bits are binned on a log scale.

Table 3 shows that the corruption patterns of µarch-level faults for the ALU, AGEN, and Decoder are quite different from those of the gate-level faults. While µarch-level ALU and AGEN faults are injected in the output latches and corrupt at most one bit, the corresponding gate-level faults, though they usually corrupt one bit, can result in multi-bit corruptions (between 9% and 25% across the ALU and the AGEN). For µarch-level Decoder faults, which are injected at the input latch, the resulting multi-bit corruptions turn out to be too aggressive (22% of corruptions for µarch-level faults are 8+ bits, while the corresponding numbers for gate-level faults are less than 15%). This is because the output cone of the input (output) latch of the faulty unit is too large (small) compared to that of a gate-level fault, which leads to aggressive (conservative) bit corruptions at the output latch.

To better understand how the microarchitectural state gets corrupted by the injected faults, we collect the probability that bit i is flipped, given that an instruction activates the underlying fault. Figures 5(a) and (b) show the distribution of the probabilities of a given bit in the output latch (numbered from bit 0 to bit 63) being faulty under the µarch-level stuck-at-0, gate-level stuck-at-0, and gate-level 1-cycle delay models for the ALU and the AGEN, respectively. For brevity, we omit the µarch-level stuck-at-1, gate-level stuck-at-1, and 0.5-cycle delay models.

From the figures, we see that the bit-flip probabilities of the µarch-level model are vastly different from those of the gate-level models. Further, the probability of flipping lower-order bits is higher for µarch-level faults, as the applications we use predominantly perform computations on the lower-order 32 bits. This difference is another reason why the µarch-level model fails to represent gate-level faults.

When comparing the two gate-level fault models, interestingly, both have very similar corruption patterns even though they differ in terms of coverage, detection latency, activation rate, and number of bit-flips. To investigate this phenomenon, we studied the differences between the corruption patterns of the gate-level stuck-at and delay faults injected at the same net and made the following observation: delay faults generally yield more corruption patterns than stuck-at-0 faults because they can cause the same bit to be corrupted in both directions, instead of in a single direction as with stuck-at-0 faults. While this higher number of corruption patterns may make delay faults easier to detect, the average activation rate of delay faults is also lower than that of stuck-at faults, as explained in Section 4.3.1, making them harder to detect and causing longer detection latencies.

Overall, our analysis shows that the different activation rates and bit corruption patterns paint a clearer picture of the differences in coverage (Figure 2) and detection latency (Figure 3) between µarch-level and gate-level faults. We found that the higher activation rates of µarch-level stuck-at-1 faults typically cause higher coverage (and lower detection latencies) than gate-level faults, but the correlation is not perfect. In some cases, despite significant differences in activation rates, the coverage of gate-level and
µarch-level faults is quite close. This is because, once activated, gate-level faults cause different multi-bit corruption patterns. In some cases, these patterns are more intrusive than the µarch-level fault corruptions, boosting the coverage of the gate-level faults despite their lower rate of activation. In other cases, the higher intrusiveness of the multi-bit corruptions is not enough to compensate for the very low activation rates; this is specifically the case for gate-level delay faults, which see the lowest coverage numbers.

We see that such complex interactions have a push-and-pull effect in determining the system-level outcome of faults, and conclude that simple µarch-level stuck-at faults are inaccurate for modeling gate-level faults because they fail to (1) capture the system-level behavior, such as application-level masking, (2) reproduce the activation rates of gate-level faults, and (3) accurately model µarch-level multiple-bit corruption patterns. Therefore, any accurate µarch-level fault model for gate-level faults must account for all these factors.

             |                ALU                 |               AGEN                 |              Decoder
Bits flipped |      1      2      4      8     9+ |      1      2      4      8     9+ |      1      2      4      8     9+
-------------+------------------------------------+------------------------------------+-----------------------------------
µarch        | 100.0%   0.0%   0.0%   0.0%   0.0% | 100.0%   0.0%   0.0%   0.0%   0.0% |  72.5%   0.2%   4.8%   8.9%  13.4%
Gate s@1     |  91.1%   4.7%   1.2%   1.1%   1.9% |  87.1%   6.8%   5.0%   1.0%   0.1% |  66.1%  14.9%  10.5%   6.2%   2.3%
Gate s@0     |  84.4%   4.6%   2.8%   1.1%   7.1% |  75.5%   8.4%   8.6%   7.4%   0.0% |  60.8%  22.3%  12.2%   2.6%   2.2%
Delay 1cyc   |  90.4%   3.9%   1.4%   1.1%   3.2% |  90.5%   4.1%   3.7%   1.5%   0.2% |  71.7%  11.1%  12.5%   1.7%   2.9%
Delay 0.5cyc |  75.0%   5.8%   2.2%   3.9%  13.1% |  83.7%   7.9%   3.1%   2.4%   2.8% |  68.2%  12.8%   4.3%   2.7%  12.0%

Table 3. Percentage of bits incorrect at the output latch.

[Figure 5: probability of flip (y-axis) for each corrupted bit of the 64-bit output latch (x-axis) under µarch s@0, Gate s@0, and Delay 1cyc; panels (a) ALU and (b) AGEN.]
Figure 5. Probability of corrupting each bit of the output latch, under µarch-level s@0, gate level s@0, and gate level delay models.

4.4 Probabilistic µarch-Level Fault Models

Given the inaccuracy of the µarch-level stuck-at fault model, we investigate whether we can derive alternate µarch-level fault models based on our analysis of how gate-level faults (both stuck-at and delay) manifest at the microarchitecture. Such a model would be invaluable for accurately simulating the effect of the fault at the µarch level without invoking a gate-level simulator.

We investigated the behavior of the gate-level stuck-at and delay faults and found that each gate-level fault is activated differently and leads to different software-level outcomes. Hence, in our first-cut µarch-level fault model, we develop probabilistic models on a per-run basis, i.e., a different probabilistic model for each injected gate-level fault. In particular, we profile each SWAT-Sim run and collect the probabilities of the number of bits flipped at the output latch, the patterns of the flips, and the directions of the flips. Based on the collected information, we then derive two probabilistic µarch-level fault models, called the P-model and the PD-model.

In the P-model, when an instruction uses the faulty unit, we decide which bits to flip in the output latch based on the previously observed probabilities of the different numbers of bit-flips for this gate-level fault injection run (essentially using a table like Table 3, but built on a per-run basis). We then condition on this probability to decide on the pattern of the flip (similar to Figure 5 for different numbers of bit flips, but again on a per-run basis). All the bits indicated by this pattern are then flipped.

The PD-model refines the P-model by enforcing the direction of the bit-flips based on the profiling runs. That is, if the observed corruption pattern in the profiling run shows that bit 3 of the output latch has a 1-to-0 (0-to-1) corruption, in the PD-model this bit is corrupted only if it is a 1 (0).

We developed the P-model and the PD-model for both the gate-level stuck-at-0 and 1-cycle delay faults for the ALU and the AGEN. Figure 6 shows how well the P-model and the PD-model mimic the behavior of the corresponding gate-level fault models, evaluated using the coverage (similar to Figure 2). The number on top of each bar gives the coverage of the SWAT detectors for faults injected under that fault model. The results for gate-level stuck-at-1 and 0.5-cycle delay faults are not shown for clarity; they lead to conclusions similar to those for the other fault models.
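The P- and PD-model injection steps described above can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the profile layout (`RunProfile`), the helper names (`p_model`, `pd_model`), the 64-bit latch width, and the convention that k = 0 means the instruction does not activate the fault are all hypothetical. For each instruction that uses the faulty unit, it samples a number of flipped bits k from the per-run distribution, then a k-bit pattern conditioned on k; the PD variant additionally keeps a pattern bit only if its fault-free value matches the flip direction recorded during profiling.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field

LATCH_BITS = 64                      # assumed output-latch width (bits 0..63)
LATCH_MASK = (1 << LATCH_BITS) - 1


@dataclass
class RunProfile:
    """Hypothetical per-run statistics from one SWAT-Sim profiling run."""
    # P(k bits flipped) for an instruction that uses the faulty unit.
    # Assumption of this sketch: k = 0 means "fault not activated".
    num_flips: dict[int, float]
    # For each k >= 1: list of (k-bit flip mask, observed weight).
    patterns: dict[int, list[tuple[int, float]]]
    # PD-model only: bit -> fault-free value seen when that bit was corrupted
    # (1 = a 1-to-0 flip was observed, 0 = a 0-to-1 flip was observed).
    direction: dict[int, int] = field(default_factory=dict)


def _sample(rng: random.Random, weighted):
    """Draw one value from a list of (value, weight) pairs."""
    values = [v for v, _ in weighted]
    weights = [w for _, w in weighted]
    return rng.choices(values, weights=weights, k=1)[0]


def _draw_mask(prof: RunProfile, rng: random.Random) -> int:
    """Pick the number of flips k, then a k-bit pattern conditioned on k."""
    k = _sample(rng, list(prof.num_flips.items()))
    return 0 if k == 0 else _sample(rng, prof.patterns[k])


def p_model(value: int, prof: RunProfile, rng: random.Random) -> int:
    """P-model: flip every bit of the sampled pattern."""
    return (value ^ _draw_mask(prof, rng)) & LATCH_MASK


def pd_model(value: int, prof: RunProfile, rng: random.Random) -> int:
    """PD-model: flip a pattern bit only if its fault-free value matches the
    direction seen during profiling (bits with no recorded direction are
    flipped unconditionally, another assumption of this sketch)."""
    mask = _draw_mask(prof, rng)
    for bit in range(LATCH_BITS):
        if (mask >> bit) & 1:
            required = prof.direction.get(bit)
            if required is not None and ((value >> bit) & 1) != required:
                mask &= ~(1 << bit)  # wrong direction: leave this bit alone
    return (value ^ mask) & LATCH_MASK


# Usage: a profile that always flips bit 3, observed only as a 1-to-0 flip.
rng = random.Random(0)
prof = RunProfile(num_flips={1: 1.0}, patterns={1: [(1 << 3, 1.0)]},
                  direction={3: 1})
print(p_model(0b0000, prof, rng))   # 8: P-model flips bit 3 from 0 to 1
print(pd_model(0b0000, prof, rng))  # 0: bit 3 is 0, but only 1-to-0 was seen
print(pd_model(0b1000, prof, rng))  # 0: bit 3 is 1, so the 1-to-0 flip applies
```

Because `pd_model` drops pattern bits whose fault-free value does not match the recorded direction, fewer than k bits may actually flip. This is the same skew that arises because the models' patterns are not conditioned on the fault-free value, one of the limitations the paper lists for the P- and PD-models.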
[Figure 6: stacked bars showing the percentage of injections (y-axis) for µarch s@1, µarch s@0, and the Gate s@0 and Delay 1cyc faults as modeled by SWAT-Sim, the P-Model, and the PD-Model; panels (a) ALU and (b) AGEN; legend: µarch-Mask, Arch-Mask, App-Mask, Detected, Detected>10M, SDC.]
Figure 6. The accuracy of the derived P- and PD-models for modeling gate level faults in (a) ALU and (b) AGEN, evaluated using the coverage of the SWAT detectors. The models closely mimic the masking outcomes (µarch-Mask+Arch-Mask) of the gate-level faults but do not, in general, accurately model their system level effects, resulting in differences in coverage.
From the figures, we see that both the P-model and the PD-model follow the µarch-level masking effects of the gate-level faults more closely than the µarch-level stuck-at faults do. Nevertheless, the P- and PD-models for both gate-level stuck-at-0 and 1-cycle delay ALU faults are unable to capture the application-level masking effect, while the two models for gate-level stuck-at-0 AGEN faults overestimate the µarch-level masking effect.

In terms of coverage, the P- and PD-models do reasonably well for the gate-level ALU stuck-at-0 fault and the AGEN 1-cycle delay fault, with differences of less than 5%. However, for the other fault models, the P- and PD-models show coverage differences of 9% or more.

In spite of extensive analysis and modeling, the probabilistic models do not accurately capture the µarch-level behavior of gate-level faults, for the following reasons.

• The models are oblivious to temporal variation in the corruption rates; i.e., both models use the probabilities of injecting k-bit flips as an average rate across all instructions for injections on a given wire.

• The probabilities with which the models pick the number of bits to flip, the pattern of the bit-flips, and the direction of the bit-flips are not conditioned on the fault-free value to which the patterns are applied. For example, although the pattern may say that bit 1 should be flipped from a 1 to a 0, if the original value of the bit is 0, no flip occurs. Thus, there are fewer flips than the model expects, which skews the probabilities.

• The profiling runs consider the output value but overlook the input value that activates the fault in the circuit and produces the corrupted output.

As previously discussed, we derive a different model for each faulty run in SWAT-Sim, each of which simulates a different fault in the gate-level circuit. However, for an abstract evaluation and accurate prediction, a unifying model that generalizes the proposed per-run models must be built. By addressing the stated limitations of the P- and PD-models, an accurate unified µarch-level model for gate-level faults may be realizable. Nonetheless, until such a model is developed, SWAT-Sim remains an efficient platform for simulating and observing the system-level effects of gate-level faults.

5 Conclusions

With several µarch-level fault tolerance proposals emerging, models that accurately depict the µarch-level effect of gate-level faults become increasingly important.

This paper proposes SWAT-Sim, a hierarchical simulator that models only the faulty unit at gate-level accuracy, with the rest of the system modeled at the µarch level. The fast and accurate nature of SWAT-Sim makes it possible to observe the system-level effects of the gate-level fault models. Using SWAT-Sim, we evaluate the differences between the manifestations of µarch-level and gate-level faults at the system level. We found the simple µarch-level stuck-at fault models to be, in general, inaccurate for capturing the system-level effects of gate-level faults. Based on an analysis of the causes of these differences, we derive two probabilistic µarch-level fault models for gate-level faults. However, these models fail to capture the complex manifestations of gate-level faults, resulting in inaccuracies.

The inaccuracies in the existing µarch-level stuck-at fault models and the absence of more accurate models motivate using simulators like SWAT-Sim to accurately model the µarch-level effect of gate-level permanent faults.

Acknowledgments

We would like to thank Pradip Bose from IBM and Subhasish Mitra from Stanford University for initial discussions on this work, Tong Qi for an initial version of the decoder module, and Ting Dong for help with statistical analysis.

References

[1] T. Austin et al. Opportunities and Challenges for Better than Worst-Case Design. In ASP-DAC '05: Proceedings of the 2005 Conference on Asia South Pacific Design Automation, pages 2–7, New York, NY, USA, 2005. ACM.
[2] T. M. Austin. DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design. In Proceedings of International Symposium on Microarchitecture, 1998.
[3] J. Blome et al. Self-Calibrating Online Wearout Detection. In Proceedings of International Symposium on Microarchitecture, 2007.
[4] S. Borkar. Microarchitecture and Design Challenges for Gigascale Integration. In Proceedings of International Symposium on Microarchitecture, 2005. Keynote Address.
[5] F. A. Bower, D. Sorin, and S. Ozev. Online Diagnosis of Hard Faults in Microprocessors. ACM Transactions on Architecture and Code Optimization, 4(2), 2007.
[6] M. Bushnell and V. Agarwal. Essentials of Electronic Testing for Digital, Memory, and Mixed-Signal VLSI Circuits. Springer, 2000.
[7] H. Cha et al. A Gate-Level Simulation Environment for Alpha-Particle-Induced Transient Faults. IEEE Transactions on Computers, 45(11), 1996.
[8] K. Constantinides et al. Software-Based On-Line Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation. In Proceedings of International Symposium on Microarchitecture, 2007.
[9] C. Dawson, S. Pattanam, and D. Roberts. The Verilog Procedural Interface for the Verilog Hardware Description Language. In Verilog HDL Conference, 1996.
[10] Z. Kalbarczyk et al. Hierarchical Simulation Approach to Accurate Fault Modeling for System Dependability Evaluation. IEEE Transactions on Software Engineering, 25(5), 1999.
[11] G. Kanawati et al. FERRARI: A Flexible Software-Based Fault and Error Injection System. IEEE Computer, 44(2), 1995.
[12] M. Li et al. Understanding the Propagation of Hard Errors to Software and Implications for Resilient Systems Design. In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, 2008.
[13] M. Martin et al. Multifacet's General Execution-Driven Multiprocessor Simulator (GEMS) Toolset. Computer Architecture News, 33(4), 2005.
[14] A. Meixner, M. Bauer, and D. Sorin. Argus: Low-Cost, Comprehensive Error Detection in Simple Cores. In Proceedings
[16] B. C. Paul et al. Temporal Performance Degradation Under NBTI: Estimation and Design for Improved Reliability of Nanoscale Circuits. In DATE, 2006.
[17] A. Pellegrini et al. CrashTest: A Fast High-Fidelity FPGA-Based Resiliency Analysis Framework. In International Conference on Computer Design, 2008.
[18] M. Pirvu, L. Bhuyan, and R. Mahapatra. Hierarchical Simulation of a Multiprocessor Architecture. 2000.
[19] M. Prvulovic et al. ReVive: Cost-Effective Architecture Support for Rollback Recovery in Shared-Memory Multiprocessors. In Proceedings of International Symposium on Computer Architecture, 2002.
[20] R. Raghuraman. Simulation Requirements For Vectors in ATE Formats. In Proceedings of International Test Conference, 2004.
[21] P. Ramachandran et al. Statistical Fault Injection. In Proceedings of International Conference on Dependable Systems and Networks, 2008.
[22] S. Sahoo et al. Using Likely Program Invariants to Detect Hardware Errors. In Proceedings of International Conference on Dependable Systems and Networks, 2008.
[23] J. H. Saltzer et al. End-to-End Arguments in System Design. ACM Transactions on Computer Systems, 2(4), 1984.
[24] S. Sarangi et al. A Model for Timing Errors in Processors with Parameter Variation. In International Symposium on Quality Electronic Design, 2007.
[25] S. Shyam et al. Ultra Low-Cost Defect Protection for Microprocessor Pipelines. In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, 2006.
[26] D. Sorin et al. SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery. In Proceedings of International Symposium on Computer Architecture, 2002.
[27] J. Srinivasan et al. Exploiting Structural Duplication for Lifetime Reliability Enhancement. In Proceedings of International Symposium on Computer Architecture, 2005.
[28] Sun. OpenSPARC T1 Processor. Website, 2007. http://www.opensparc.net/.
[29] Virtutech. Simics Full System Simulator. Website, 2006. http://www.simics.net.
[30] N. Wang and S. Patel. ReStore: Symptom-Based Soft Error Detection in Microprocessors. IEEE Transactions on De-
of International Symposium on Microarchitecture, 2007. pendable and Secure Computing, 3(3), July-Sept 2006.
[15] S. Mirkhani, M. Lavasani, and Z. Navabi. Hierarchical Fault [31] D. Yen. Chip Multithreading Processors Enable Reliable
Simulation Using Behavioral and Gate Level Hardware Mod- High Throughput Computing. In Proceedings of Interna-
els. In 11th Asian Test Symposium, 2002. tional Reliability Physics Symposium, 2005. Keynote Ad-
dress.
