Debugging Low Test Coverage
Scan is a structured test approach in which the overall function of an integrated circuit (IC) is broken into
smaller structures and tested individually. Every state element (D flip-flop or latch) is replaced with a scan
cell that operates as an equivalent state element and is concatenated into long shift registers called “scan
chains” in scan mode. In this way, all the internal state elements become directly controllable and observable. This greatly reduces the complexity of testing an IC, because only the small combinational logic segments between scan cells need to be tested. Automatic test pattern generation (ATPG) tools take advantage of scan to produce high-coverage test patterns.
The combination of scan and ATPG tools has been shown to successfully detect the vast majority of
manufacturing defects. When you use an ATPG tool, your goal should be to achieve the highest possible defect coverage. Because high test coverage directly correlates with the quality of the parts shipped, many
companies demand that the coverage for single stuck-at faults be at least 99% and transition delay faults be
at least 90%.
When the coverage report falls short of these goals, your task is to figure out why the coverage is not high
enough and perform corrective actions where possible. Debugging low defect coverage historically requires
a significant amount of manual technique and intimate knowledge of the ATPG tool, as well as design expertise.
Automating more of the debug process during ATPG greatly simplifies this effort. I have seen some cases in
which automation saved hours, even days, of manual debugging effort, and other cases in which the tool provided answers where no feasible manual technique existed. Before exploring why you might be getting low coverage and why further automation is needed, I’ll explain how ATPG tools report their results in general.
The ATPG tool generates a “statistics report” that tells you what the tool has done and provides the fault
category information that you have to interpret to debug coverage problems. If you’re an expert at using an
ATPG tool, you’ll probably have little problem understanding the fault categories listed in the statistics
report. But if you’re not a design-for-test expert, this data may as well be written in hieroglyphics (Fig. 1).
Although the statistics report contains a lot of information, it can be difficult to interpret and rarely gives
enough useful information to determine the reasons for low coverage, even for an ATPG expert.
When debugging low coverage, you’ll need to understand some of the basic fault categories that are listed
in most typical ATPG statistics reports. The first and broadest category is what is sometimes referred to as
the “fault universe.” This is the total number of faults in a design. For example, when dealing with single
stuck-at faults, you have two faults for each instance/pin, stuck_at logic 1 and stuck_at logic 0, where the
instance is the full hierarchical path name to a library cell instantiated in the design netlist.
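To make that bookkeeping concrete, here is a minimal sketch in Python (with purely hypothetical instance and pin names) of how an uncollapsed single stuck-at fault universe is enumerated as two faults per instance/pin:

```python
# Sketch: enumerating an uncollapsed single stuck-at fault universe.
# The netlist, instance names, and pins below are hypothetical examples.
netlist = {
    "top/core/u_alu/U123": ["A", "B", "Y"],   # instance -> library-cell pins
    "top/core/u_alu/U124": ["A", "Y"],
}

fault_universe = [
    (instance, pin, stuck_value)
    for instance, pins in netlist.items()
    for pin in pins
    for stuck_value in ("SA0", "SA1")          # stuck-at-0 and stuck-at-1
]

print(len(fault_universe))  # 2 faults per instance/pin -> 10 faults here
```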
The total number of faults is really only important when comparing different ATPG tools against each other. The total can vary depending on whether “internal” faulting is turned on and whether or not “collapsed” faults
are used. Internal faulting extends the fault site down to the ATPG-model level, rather than limiting it to the
library-cell level. ATPG tools, for efficiency purposes, are designed to collapse equivalent faults whenever
possible. Typically, you’ll want to have the internal faults setting turned off and uncollapsed faults setting
turned on. These settings most closely match the faults represented in the design netlist.
Faults that cannot possibly be tested are reported as untestable or undetectable. This includes faults that
are typically referred to as unused, tied, blocked, and redundant. For example, a tied fault is one in which
the designer has purposely tied a pin to logic high or logic low. If a stuck-at-1 defect were to occur on a pin
that is tied high, you could not test for it because that would require the tool to be able to toggle the pin to
logic low. This cannot be done because of the design restriction, so the fault is categorized as “untestable.”
Untestable/undetectable faults are significant for two reasons. First, they distinguish “fault coverage” from
“test coverage,” both of which are reported by ATPG tools. When most tools calculate fault coverage, the untestable/undetectable faults remain in the denominator. Test coverage, by contrast, subtracts the untestable/undetectable faults from the total number of faults when calculating
coverage. For this reason, the reported number for test coverage is typically higher than fault coverage.
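A simplified sketch of the two calculations, assuming a single detected-fault count and ignoring the finer distinctions individual tools make:

```python
# Simplified coverage arithmetic (exact definitions vary by ATPG tool).
def fault_coverage(detected, total_faults):
    # Untestable/undetectable faults stay in the denominator.
    return 100.0 * detected / total_faults

def test_coverage(detected, total_faults, untestable):
    # Untestable/undetectable faults are removed from the denominator,
    # so test coverage is typically the higher of the two numbers.
    return 100.0 * detected / (total_faults - untestable)

# Hypothetical numbers for illustration only:
print(fault_coverage(90_000, 100_000))        # 90.0
print(test_coverage(90_000, 100_000, 5_000))  # ~94.7
```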
The second reason that untestable/undetectable faults are important is that nothing can be done to improve
the coverage of these faults; therefore, you should direct your debugging efforts elsewhere.
One last thing to be aware of regarding untestable/undetectable faults is that ATPG-tool vendors vary in
how they categorize these faults. These differences can result in coverage discrepancies when comparing results from different tools.
This raises the question of which is the more critical figure: test coverage or fault coverage? Most
engineers, but not all, rely on the higher test coverage number. The justification for ignoring
untestable/undetectable faults is that any defect that occurs at one of those fault locations will not cause
the device to functionally fail. For example, if a stuck-at 1 defect occurred on a pin that is tied high by
design, the part will not fail in functional operation. Others would argue that fault coverage is more
important because any defect, even an untestable defect, is significant because it represents a problem in
the manufacturing of the device. That debate won’t be explored here though.
Some faults are testable, meaning that a defect at these fault sites would result in a functional failure.
Unfortunately, ATPG tools cannot produce patterns to detect all of the testable faults. These testable but undetected faults are categorized as “ATPG_untestable” (AU).
Of all the fault categories listed in an ATPG statistics report, AU is the most significant category that
negatively affects test coverage and fault coverage. Determining the reasons why ATPG is unable to
produce a pattern to detect these faults, and coming up with a strategy to improve the coverage, is the heart of debugging low test coverage.
Here are some of the most common reasons why faults may be ATPG_untestable:
Pin constraints: At least one input signal (usually more than one) is required to be constrained to a constant
value to enable test mode. While this constraint makes testing possible, it also results in blocking the
propagation of some faults because the logic is held in a constant state. Unless you have special knowledge
to the contrary, these pin constraints must be adhered to, which means you cannot recover this coverage
loss.
Determining the effect on coverage loss is not as simple as counting the number of constrained faults on
the net. The effect on defect coverage also extends to all the logic gates that have an input tied and
whatever upstream faults are blocked by that constraint. Faults downstream from the tied logic have reduced controllability and may also go undetected.
Black-box models: When an ATPG model is not available for a module, a library cell, or more commonly a
memory, ATPG tools treat them as “black boxes,” whose outputs propagate a fixed or unknown (“X”) value. Faults in the “shadow” of these black boxes (i.e., faults whose control and observation are affected
by their proximity to the black box), will not be detected. This includes faults in the logic cone driving each
black-box input as well as the logic cones driven by the outputs. Obtaining an exact number of undetected
faults is complicated by the fact that some of those faults may also be in other overlapping cones
that are detected. The solution is to ensure that everything is modeled in the design.
Random access memory: In the absence of either bypass logic or the ability to write/read through RAMs,
faults in the shadow of the RAM may be undetected. Similar to black-box faults, it is difficult to determine
exactly which faults are not detected because of potentially overlapping cones of logic.
If you are able to make design changes, adding bypass logic may address this problem. Some ATPG tools can generate special “RAM-sequential” patterns that propagate faults through memories so long as the applicable
design rule checks (DRCs) are satisfied. This may be an option to get around having to modify the design to
improve coverage.
Cell constraints: Sometimes you need to constrain scan cells with regard to what values they are capable of
loading and capturing (usually for timing-related reasons). These constraints imposed on the ATPG tool will
prevent some faults from being detected. If the cell constraint is one that limits capturing, then to
determine the effect, you’ll need to look at the cone of logic that drives the scan cell and sift out faults that can be observed only at that cell. Fixing the underlying timing problem is what makes cell constraints unnecessary. However, this type of timing problem is often found too late in the
design cycle to be changed. Using cell constraints is a bandage approach to getting patterns to pass, and some coverage loss is the price you pay.
ATPG constraints: You may impose additional constraints on the ATPG tool to ensure that certain areas of
the design are held in a desired state. For example, let’s say you need to hold an internal bus driving in one
direction. As with all types of constraints, parts of the design will be prevented from toggling, which limits
test coverage. Similar to pin constraints, if the assumption is that these constraints are necessary for the test to work, then the associated coverage loss cannot be recovered.
False/multicycle paths: Some limitations to test coverage are specific to at-speed testing. False paths
cannot be tested at functional frequencies; therefore, ATPG must be prevented from doing so to avoid
failures on the automatic test equipment. Because transition-delay fault (TDF) patterns use only one at-
speed cycle to propagate faults, multicycle paths (which require more than one cycle) must also be masked
out. Determining which faults are not detected in false paths is complicated by the manner in which false paths are specified.
Delay-constraint files usually specify a path by designating “-from”, “-to” and possibly “-through” to
describe a start and end point of the path. In between those points, there can be a significant amount of
logic to trace and potentially multiple paths if you don’t use “-through” to specify the exact path.
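As a rough illustration of why this tracing is laborious, the sketch below (with a hypothetical connectivity graph and net names) finds every gate lying on some path between a “-from” and a “-to” point by intersecting the forward cone of the start point with the backward cone of the end point; a real analysis must also honor “-through” points and clocking:

```python
# Sketch: gates lying on any path between a "-from" and a "-to" point.
# The connectivity graph below is a hypothetical example.
from collections import deque

fanout = {          # gate -> gates it drives
    "regA/Q": ["U1"], "U1": ["U2", "U3"], "U2": ["regB/D"],
    "U3": ["U4"], "U4": ["regC/D"],
}
fanin = {}          # build the reverse graph
for src, dests in fanout.items():
    for d in dests:
        fanin.setdefault(d, []).append(src)

def reachable(start, graph):
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Gates between -from regA/Q and -to regB/D: forward cone intersected with backward cone.
on_path = reachable("regA/Q", fanout) & reachable("regB/D", fanin)
print(on_path)      # {'regA/Q', 'U1', 'U2', 'regB/D'}
```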
Debugging low test coverage therefore comes down to three things:
How you identify which coverage issues (as described above) exist,
How you determine the effect each issue has on the coverage, and
How you correct each issue where that is possible.
Typically, we have had to rely on a significant amount of design experience as well as ATPG tool proficiency
to manually determine and quantify the effects of design characteristics or ATPG settings that limit
coverage. The usual first step in manually debugging fault coverage is to report and examine the AU fault list. It is extremely difficult, if not impossible, to identify a single problem by looking at a list of AU faults. You have
to recognize trends in either the text listing of faults or graphical view of faults relative to the design
hierarchy. For example, a long list of faults that are obviously contained in the same design hierarchy points to a single, localized cause.
At some point, you’ll need to focus your analysis efforts on one fault at a time, so pick one you think might
represent a larger group of faults. You might zero in on design elements like registers or memories, but this
is usually based more on intuition than anything else. ATPG tools have different reporting capabilities that
can be used to report on the inherent controllability and observability of a fault location, which can help but
often provide limited information. Interpreting the reports at this level requires an in-depth knowledge of
the ATPG tool’s capabilities and a fair amount of instinct regarding where to focus efforts.
As is often the case, your success with debugging relies on having been through the process before and recognizing
similar situations. For example, if a significant number of boundary scan faults are listed as AU, this may be
an indication that the boundary-scan logic has been initialized to a certain desired state and must be held
in that state to operate properly. Making connections like this between the trends you identify in the list of AU faults and what you know about designs and design practices in general requires a fair amount of
experience.
Once an issue is identified, how you determine its significance will be different depending on the issue. As
previously described, you often need to keep track of backward and forward cones of logic fanning out from
a single constrained point to determine the potential group of affected faults. From there, you also need to
evaluate each of those potential faults to assess if it is possibly observed in another overlapping cone of
logic.
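A crude sketch of that bookkeeping, assuming you can extract simple gate-level connectivity from the netlist (the gate names and the “observable elsewhere” set below are hypothetical, and a real analysis must also account for fault equivalence):

```python
# Sketch: candidate fault sites affected by a constrained point, minus those
# still observable through another (overlapping) cone of logic.
def forward_cone(net, fanout):
    """All gates whose value depends on 'net' (observation may be blocked)."""
    cone, stack = set(), [net]
    while stack:
        for nxt in fanout.get(stack.pop(), []):
            if nxt not in cone:
                cone.add(nxt)
                stack.append(nxt)
    return cone

# Hypothetical connectivity and observation data:
fanout = {"test_en": ["U7"], "U7": ["U8", "U9"], "U8": ["reg1/D"], "U9": ["U10"]}
observable_elsewhere = {"U9", "U10"}   # gates also seen through an unconstrained cone

affected = forward_cone("test_en", fanout)
truly_lost = affected - observable_elsewhere
print(sorted(truly_lost))   # ['U7', 'U8', 'reg1/D']
```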
Other techniques can approximate the effect of certain issues. For pin constraints, it may be
possible to have the tool temporarily treat them like a tied-untestable fault so that coverage can be
recalculated and compared to the original coverage number. Whole design modules can be no-faulted (for
example, memory built-in self-test [MBIST] logic) to see the difference in coverage.
All of these approaches require a combination of special scripts to trace logic paths backward and/or
forward, multiple runs of the ATPG tool with different settings, and a high level of tool expertise. Even then, the results are often only rough approximations.
Recently, ATPG tools have been improved to automatically identify issues that affect test coverage and
quantify just how much each issue affects the coverage. The most common method to display this
information is through a modified version of the traditional statistics report that you can access in the
command line mode of ATPG tools. Mentor Graphics’ ATPG tools FastScan and TestKompress are used as
an example here to demonstrate what’s available for automated analysis of low test coverage.
Without any additional ATPG tool runs or any of the manual debug steps, the new statistics report
automatically provides details about coverage issues (Fig. 2). Note the list of the total number of
uncollapsed faults in the design, which is then broken down into various ATPG categories (Fig. 2, arrow
#1). The percentage listed within the parentheses is based on the total number of faults.
The next important area of the report is the test coverage achieved by the patterns generated (Fig. 2, arrow
#2). In this case, the coverage is 83.67%, which may not be acceptable. If that test coverage is
unacceptable, the next place to look is the line in the statistics report that indicates the number of
atpg_untestable or AU faults (Fig. 2, arrow #3). This line points out that 57,563 faults (or 14.56% of the total number of faults) could not be tested by the ATPG tool.
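As a quick arithmetic check, 57,563 divided by the 395,480 total faults reported in the same figure is approximately 14.56%, which matches the percentage shown in parentheses.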
Up to this point, the information is very typical of what you would find in a traditional report. Moving down
to the “Untested Faults” section (Fig. 2, arrow #4), you can now get a detailed breakdown of which AU
categories have a significant effect on test-coverage loss. The most significant category of test-coverage loss is TC, or tied cells (Fig. 2, arrow #5). This category of AU faults accounts for 4.46% of the
total number of faults. In this case, “tied cells” refers to registers that are tied to a particular state as a
result of the ATPG tool having performed DRCs and simulating an initialization or “test_setup” procedure.
The report also lists the most significant individual tied cells (as well as the state to which they are tied), so
that you may evaluate the severity of the effect on test coverage at a fine level of detail. A quick review of the
instance path names of these tied cells suggests that it’s all test-related logic (boundary scan and
MBIST). Although you must still perform additional manual analysis to determine if this category of AU
faults can be reduced, this report gives a clear indication of where to look in the design. If it is determined
that nothing can be corrected because the test mode requires this logic to be tied, then at least you will be able to account for that portion of the coverage loss.
The next significant category of AU faults is FP, or “false_path,” faults (Fig. 2, arrow #6). This transition-fault pattern set includes a definition of false paths, so the coverage will be lower. From this report, you can
see that 5.37% of the faults cannot be tested because of the false path definitions. Many test engineers
believe that test coverage should not be penalized as a result of false paths because they cannot be exercised at functional frequencies.
A relatively significant number of multicycle-path faults (1.01%) hurt the test coverage (Fig. 2, arrow #7).
Given this information, you may choose to address these faults by targeting them with another pattern set
using a clock cycle that will exercise them at a lower frequency. There is no guarantee that all of these
faults will be detected at a different frequency because other issues may prevent detection. What the report
tells you is that these faults definitely cannot be tested for the reason listed. This is true for all the
categories.
The SEQ (sequential_depth) category (Fig. 2, arrow #8) refers to faults that cannot be detected because
the sequential depth of the ATPG tool has not been set high enough. This implies that there may be some
non-scan logic or memories that require an increased sequential depth to propagate and detect faults. You
can affect this number by changing some of the settings during pattern generation.
Right after the SEQ category is another category called “Unclassified.” This is a group of faults that does
not fall into any of the pre-defined categories that the ATPG tool can determine. They are faults that
traditional statistics reports would normally indicate as AU—there’s just no additional detailed analysis
available to determine why they are AU. These faults will require manual analysis.
I previously mentioned that many test engineers do not believe false path faults should be included in the
calculation of test coverage while others do. To satisfy these differing requirements, a new column of
information called “total relevant” has been added to the statistics report (Fig. 2, arrow #9).
Faults that were not considered relevant were deleted, which resulted in the lower number of total faults
(374,238) as compared to the total number of faults in the neighboring column (395,480). How can you tell
which faults were detected from the relevant coverage calculation? If you trace down the “Total Relevant”
column of information, you will eventually see the word “deleted” corresponding to the false-path category.
This means that the 21,242 false-path faults were deleted from the total relevant faults, and the coverage
was recalculated. The relevant coverage was 88.46% as compared to 83.67% (Fig. 2, arrow #10). You can
see both coverage numbers side by side and determine which one should be used.
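The arithmetic behind the new column can be checked directly from the numbers in the report:

```python
total_faults   = 395_480   # total uncollapsed faults in the report
false_path_au  = 21_242    # false-path faults marked "deleted"
total_relevant = total_faults - false_path_au
print(total_relevant)      # 374,238, matching the "Total Relevant" column
```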
Another way to slice the coverage information is to view it with respect to the clock domains (Fig. 2, arrow
#11). The next column to the right indicates what percentage of the total number of faults belongs to that clock domain (e.g., 58.71% of the faults in the design are in the clk1 clock domain).
The next column indicates the test coverage of that clock domain’s fault population. In this case, 94.88% of
the clk1 faults were detected. The point in listing both the percentage of total faults and percentage
coverage of each clock domain is so that you can investigate low coverage for clock domains that represent
a significant percentage of the design. Additional reporting capability is available so that a detailed analysis
of the AU faults can be shown for the fault universe of each individual clock domain.
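Conceptually, the two per-domain numbers are computed like this (a sketch with hypothetical fault counts; the real figures come from the report in Fig. 2, and a full calculation would also exclude untestable faults from each denominator):

```python
# Sketch: per-clock-domain fault share and per-domain test coverage.
# All fault counts here are hypothetical placeholders.
domains = {
    # domain: (faults_in_domain, detected_in_domain)
    "clk1": (232_000, 220_000),
    "clk2": (100_000, 70_000),
}
total = sum(faults for faults, _ in domains.values())

for name, (faults, detected) in domains.items():
    share    = 100.0 * faults / total        # % of the total fault population
    coverage = 100.0 * detected / faults     # coverage within this domain
    print(f"{name}: {share:.2f}% of all faults, {coverage:.2f}% covered")
```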
Some tools provide more graphical means of viewing this information relative to design hierarchies as well
as the design’s clock domains. In addition to the traditional statistics report viewed on the tool’s command
line, you can look at the coverage analysis graphically. An example shows how the AU analysis categories
can be displayed relative to the design hierarchy (Fig. 3, top left panel). The bottom panel displays the
same statistics report as shown on the command line, but design instances are hyperlinked so that you can
bring up the schematic view of that instance (Fig. 3, top right panel). You can also overlay the fault
category information on the schematic view. The example shown here is the same one discussed earlier in
which boundary-scan logic is tied because of the initialization procedure, which resulted in a loss of 0.24%
test coverage.
The additional detail provided in statistics reports like this offers valuable insight into how
to identify and address potential test coverage issues. Debug automation in an ATPG tool means that the
most significant test-coverage issues are quickly highlighted along with the effect on coverage. In many
cases (such as pin constraints and tied cells), you will be able to immediately determine that nothing can be
done to fix the issue and you can easily determine what the test-coverage ceiling will be.
Further automation within the ATPG tool eliminates significant manual effort and debug time required to
sift through an otherwise nonsensical listing of untestable faults. As a result, you are freed to focus on the coverage issues that can actually be fixed.