STRUCTURAL DELAY TESTING OF LATCH-BASED HIGH-SPEED
CIRCUITS WITH TIME BORROWING
by
Kun Young Chung
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2008
Copyright 2008 Kun Young Chung
Table of Contents
List of Tables.........................................................................................................................................iv
List of Figures ........................................................................................................................................v
Abstract .................................................................................................................................................vi
CHAPTER 1. Introduction .....................................................................................................................1
1.1. Operation of latch-based circuits ................................................................................................2
1.2. Reference times and nominal delays...........................................................................................3
1.3. Types of time borrowing and their advantages...........................................................................3
1.4. Design choice between flip-flop-based and latch-based circuits................................................4
1.5. Testing of flip-flop-based versus latch-based circuits ................................................................5
CHAPTER 2. Background – Challenges in delay testing of latch-based circuits ..................................9
2.1. Key challenges in delay testing of latch-based circuits with time borrowing.............................9
2.2. Benefits of scan design-for-testability ......................................................................................10
2.3. The overall optimization problem.............................................................................................11
CHAPTER 3. A new delay testing approach .......................................................................................13
3.1. Basic assumptions.....................................................................................................................13
3.2. Key ideas behind the proposed approach .................................................................................14
3.2.1. r-r test: A set of sufficient conditions on block delays......................................................14
3.2.2. r-f test: A set of necessary conditions on block delays .....................................................15
3.2.3. Time borrowing detection.................................................................................................16
3.2.4. Basic delay testing strategies ............................................................................................17
3.3. Test of the first logic block (SCUT0) ........................................................................18
3.4. Configuring on-path latches .....................................................................................................19
3.4.1. On-path latch with time borrowing...................................................................................20
3.4.2. On-path latch with no time borrowing..............................................................................20
3.5. Configuring off-path latches.....................................................................................................22
3.6. A set of scan chain configurations necessary to maximize coverage .......................................27
3.7. Required tests for maximum coverage .....................................................................................28
3.8. Test generation under the optimal set of scan chain configurations .........................................29
3.8.1. Theoretical maximum delay fault coverage for latch-based circuits.................................29
3.8.2. Proposed test generation approach....................................................................................32
3.9. Test generation under limited scan chain configurations..........................................................33
3.9.1. Available scan chain configurations .................................................................................33
3.9.2. Test generation strategy ....................................................................................................34
3.9.3. Proposed test generation approach....................................................................................36
3.10. Experimental results and comparison .....................................................................................37
3.10.1. Test generation approaches.............................................................................................38
3.10.2. Trends in the experimental results ..................................................................................39
CHAPTER 4. Test application cost minimization under maximum coverage .....................................42
4.1. Motivation example ..................................................................................................................42
4.1.1. Test schedule 1..................................................................................................................46
4.1.2. Test schedule 2..................................................................................................................47
4.1.3. Test schedule 3..................................................................................................................48
4.1.4. The overall optimization problem.....................................................................................49
4.2. Unique characteristics of the optimization problem .................................................................50
4.2.1. Meaning of test results – Dependencies among SPUTs....................................................50
4.2.2. Benefits of r-r tests and r-f tests........................................................................................52
4.3. Framework for test scheduling to minimize test application cost.............................................55
4.3.1. An SPUT-based approach.................................................................................................55
4.3.2. Search space......................................................................................................................56
4.4. Building a search tree ...............................................................................................................58
4.4.1. Reduction rules .................................................................................................................58
4.4.2. Covering test sequences in a test schedule........................................................................60
4.5. A deterministic optimization approach.....................................................................................61
4.6. The complexity of the optimization problem............................................................................64
4.7. Proposed heuristic approaches..................................................................................................65
4.7.1. Key ideas and overview of the proposed heuristics..........................................................65
4.7.2. Heuristic 1 (H1): Relative benefit function.......................................................................67
4.7.3. Heuristic 2 (H2): Near-lower-bound function ..................................................................70
4.7.4. Experimental results..........................................................................................................72
4.7.5. Analysis of the heuristic approach ....................................................................................76
CHAPTER 5. Flip-flop-based vs. latch-based designs.............................................................................79
5.1. Flip-flop-based counterpart of latch-based circuit....................................................................79
5.2. Performance comparison ..........................................................................................................81
5.3. Yield comparison......................................................................................................................85
5.4. Delay fault coverage comparison .............................................................................................89
5.5. A summary of comparison results ............................................................................................89
CHAPTER 6. Future research tasks .....................................................................................................91
6.1. The overall test optimization ....................................................................................................91
6.1.1. Realistic test application cost............................................................................................92
6.1.2. Chip personality distribution.............................................................................................93
6.1.3. Inclusion of r-f tests ..........................................................................................................94
6.2. Other delay testing approaches.................................................................................................94
6.3. Scan DFT design and control ...................................................................................................95
CHAPTER 7. Conclusion.....................................................................................................................97
References ..........................................................................................................................................101
Appendix: Chip personality distribution from statistical timing information.....................................104
List of Tables
Table 1. Five-stage pipeline of array multiplier. ..................................................................................40
Table 2. The characteristics of chip instances under test based on the results for r-r tests...................45
Table 3. The implication of test results for a target SPUT from Lin to Lout ...........................................51
Table 4. A summary of benefit of r-r tests............................................................................................54
Table 5. A summary of benefit of r-f tests............................................................................................54
Table 6. A chip personality distribution for Figure 17. ........................................................................69
Table 7. Relative benefit function example..........................................................................................69
Table 8. Design of chip personality distribution. .................................................................................73
Table 9. Test application cost comparisons of proposed approaches...................................................74
Table 10. Contributions to overhead of H1 and H2. ............................................................................75
List of Figures
Figure 1. An example latch-based linear pipeline. .................................................................................1
Figure 2. Additional components for scan flip-flop and scan latch........................................................6
Figure 3. A three-stage linear pipeline. ................................................................................................18
Figure 4. A two-stage linear pipeline. ..................................................................................................19
Figure 5. Relationships among scan chain configurations. ..................................................................25
Figure 6. Property 2 helps improve robust coverage............................................................................26
Figure 7. A robust test for a multi-segment path can be obtained by combining the robust tests
of its single-segment subpaths................................................................................................31
Figure 8. Proposed ATPG algorithm....................................................................................................35
Figure 9. Test procedure – managing multiple SCUTs.........................................................................37
Figure 10. A two-stage pipeline example.............................................................................................44
Figure 11. Test schedule 1: Average cost = 23.84................................................................................46
Figure 12. Test schedule 2: Average cost = 25.4..................................................................................48
Figure 13. Test schedule 3: Average cost = 22.68................................................................................49
Figure 14. A generic search tree for optimal test scheduling. ..............................................................57
Figure 15. An example of the cost function computation. ...................................................................63
Figure 16. Overall approach that uses proposed heuristics. .................................................................66
Figure 17. Test scheduling illustration. ................................................................................................69
Figure 18. Test application cost comparisons of proposed approaches................................................74
Figure 19. Percentage over the lower-bound........................................................................................75
Figure 20. Design and test schemes for high-speed circuit. .................................................................80
Figure 21. Timing requirements for the two designs............................................................................81
Figure 22. Yield comparison (T = TFF).................................................................................................87
Figure 23. Statistical delay distribution across a latch and the probability of time borrowing. .........105
Figure 24. An example reconvergence fan-out. .................................................................................107
Abstract
Latch-based circuits are used in full custom designed high-speed chips, especially to implement
delay-critical parts, due to two benefits: higher performance and higher yield at the desired
performance. However, the unavailability of a delay test methodology that provides sufficiently high
coverage has hindered their widespread use.
In this dissertation, we show that the conventional delay testing approaches cannot be used for
delay testing of latch-based circuits with time borrowing, and show that it is necessary to use design-
for-test (DFT). We first focus on maximizing path delay fault coverage and propose the first path
delay testing approach and the associated DFT for such circuits. We prove that our latch-based delay
testing approach provides the theoretical maximum coverage (for any scan-based approach). We also
prove that this coverage is always greater than (or equal to) that for the latch-based circuit’s flip-flop-
based counterpart. Secondly, we focus on minimizing test application cost for delay testing latch-
based circuits under the constraint that maximum coverage is achieved. We show that conventional
test scheduling methods may not be applicable due to the unique characteristics of latch-based circuits
with time borrowing. We then formulate the minimization problem and propose a deterministic and
two heuristic approaches for test scheduling of such circuits.
The experimental results show that, for many example circuits, the proposed approaches achieve
dramatically higher coverage of path delay faults compared to the classical approach, and achieve test
application costs that are within 5% of the corresponding lower-bounds.
We then compare high-speed latch-based circuits with their flip-flop-based counterparts from
the viewpoint of path delay testing and present design guidelines for latch-based circuits that
guarantee that latch-based circuits also achieve higher yield and higher performance than their flip-
flop-based counterparts.
CHAPTER 1
Introduction
Pipelining of combinational logic is widely used in many parts of a circuit to improve
performance, where either flip-flops or latches are used. One of the advantages of flip-flop-based
pipelines over latch-based pipelines is that flip-flop-based pipelines are relatively easier to design and
supported by CAD tools. To ensure correct propagation of signals across flip-flop-based pipelines,
the delay of each stage (or logic block) must be smaller than the clock period. This timing
requirement is becoming increasingly difficult to satisfy especially in high-speed parts of circuits as
technology advances, since timing is becoming more significantly affected by process variations
and/or defects in fabrication. Hence, latch-based pipelining is used in many high-speed custom-
designed circuits, since it enhances performance and improves yield via intentional or unintentional
time borrowing that relaxes the timing requirement for each logic block. When time borrowing
occurs, a block may take longer than its nominal allocation before completing its computation
and providing the result to the next block.
[Figure content omitted: a latch-based linear pipeline with combinational blocks C0 and C1 separated by latches L1–L9, driven by complementary clocks; the accompanying waveform marks times t1–t4, with t3 – t2 the amount of time borrowed when the output of C0 stabilizes at t3.]
Figure 1. An example latch-based linear pipeline.
A simplified latch-based high-speed pipeline can be modeled as shown in Figure 1, in which the
Ci's are combinational logic blocks and the Lj's are latches. Every latch is assumed to be a positive D-
latch, i.e., it becomes transparent when the corresponding clock is high. For simplicity of description,
we describe the approach assuming that complementary clocks are used. However, our approach is
applicable to any type of clocks, including two-phase non-overlapping, four-phase non-overlapping,
and four-phase overlapping [17]. To simplify the discussion, we assume that latches are ideal, i.e., all
their delays as well as their setup and hold times are zero. However, our approach for test
development inherently takes into account the actual setup time, hold time, and clock-to-Q delay of
every latch. The characteristics of real latches are also explicitly considered during the detailed design
of the design-for-testability (DFT) circuitry.
1.1. Operation of latch-based circuits
Let us assume that C0 is the first combinational logic block of a high-speed pipeline. We
consider the inputs to the latches driving C0 as primary inputs and assume that a new combination of
values is applied at the inputs of block C0 at the rising edge of its driving clock, i.e., the clock
controlling the latches at its inputs. In Figure 1, this is the rising edge of clock φ at time t1. The values
at the outputs of C0 must stabilize some time before the subsequent falling edge of the block's
receiving clock, i.e., the clock controlling the latches at its outputs. For C0 in Figure 1, this is the
subsequent falling edge of clock φ at time t4. If the values at the outputs of C0 do not stabilize by this
time, correct values cannot propagate via the latches at the outputs of the block to the inputs of the next
block, C1, and a delay fault exists at the given clock frequency. On the other hand, if the values at the
outputs of C0 stabilize before the subsequent rising edge of its receiving clock, i.e., the subsequent
rising edge of φ at t2, then any change in values is passed to the inputs of the next block, C1, only
after the rising edge of the block's receiving clock, φ. If the values at the outputs of C0 stabilize after
the subsequent rising edge of its receiving clock, φ, but before the clock's falling edge, then any
change in values passes immediately via the latches, and thus to the inputs of the next block. Hence, a
new combination of values may be applied at the inputs of C1 as early as the subsequent rising edge
of its driving clock, φ, or as late as this clock's subsequent falling edge. The values at the outputs of
block C1 must stabilize some time before the subsequent falling edge of its receiving clock, φ, and so
on.
Hence, unlike in a flip-flop-based circuit, in a correctly functioning latch-based circuit the values
at the inputs of a block need not be applied at a specific time, nor do the corresponding response
values need to become available at its outputs at a specific time. This allows a block to borrow time
from others in its fan-in/fan-out.
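The three cases above can be summarized in a small sketch. This is our own illustrative model, not code from the dissertation; the function name and arguments are hypothetical:

```python
# Illustrative model only (function name and arguments are our own): where a
# block's outputs stabilize relative to its receiving clock determines when
# the next block sees new input values.

def arrival_at_next_block(t_stable, t_rise_recv, t_fall_recv):
    """Return when the next block sees new inputs, or None on a delay fault.

    t_stable    -- time at which the block's outputs stabilize
    t_rise_recv -- rising edge of the block's receiving clock (latch opens)
    t_fall_recv -- subsequent falling edge (latch closes)
    """
    if t_stable >= t_fall_recv:
        return None            # outputs too late: delay fault at this frequency
    if t_stable <= t_rise_recv:
        return t_rise_recv     # latch holds the value until it becomes transparent
    return t_stable            # latch already transparent: value passes immediately

# With a receiving clock that rises at t = 5 and falls at t = 10:
print(arrival_at_next_block(4.0, 5.0, 10.0))   # 5.0  (no time borrowing)
print(arrival_at_next_block(7.5, 5.0, 10.0))   # 7.5  (2.5 units borrowed)
print(arrival_at_next_block(11.0, 5.0, 10.0))  # None (delay fault)
```

The middle case is exactly the time-borrowing window: the next block's inputs change somewhere between the rising and falling edges of the clock.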
1.2. Reference times and nominal delays
Without any loss of generality, we define a reference time as the earliest time at which new
values may be applied at the inputs of a block, namely the rising edge of the block’s driving clock.
The corresponding reference time for the responses at the outputs of a block is the rising edge of the
block’s receiving clock. In general, the reference times at the inputs and the outputs of a block are the
clock edges at which the latches at its inputs and outputs, respectively, become transparent. The
nominal delay for a block is defined as the time difference between the reference times at its inputs
and outputs. For each block in Figure 1, the nominal delay is half of the clock period, i.e., T/2, where
T is the period of the clock. If transitions at inputs and outputs of each block of a circuit satisfy these
reference times, the circuit will operate at the desired clock frequency. Such a circuit can be viewed
as a nominal circuit where no time is being borrowed by any block.
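These definitions can be made concrete with a short sketch (our own illustration; the 0-indexed stage-numbering convention is an assumption, not the dissertation's):

```python
# With complementary clocks of period T, stage i's driving clock rises at
# i*T/2 and its receiving clock rises T/2 later, so every block's nominal
# delay is T/2, as stated above.

def reference_times(stage, T):
    """Input and output reference times for pipeline stage `stage` (0-indexed)."""
    t_in = stage * T / 2        # rising edge of the stage's driving clock
    t_out = t_in + T / 2        # rising edge of the stage's receiving clock
    return t_in, t_out

T = 10.0  # clock period, arbitrary units
for i in range(3):
    t_in, t_out = reference_times(i, T)
    print(f"stage {i}: input ref = {t_in}, output ref = {t_out}, "
          f"nominal delay = {t_out - t_in}")
```

If every block's transitions satisfy these reference times, no stage ever borrows time, matching the nominal-circuit view above.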
1.3. Types of time borrowing and their advantages
Now consider a scenario where the values at the inputs of C1 do not arrive before t2, the rising
edge of φ, and arrive at t3, as shown in Figure 1. In this case, we say that C0 is borrowing time from
C1. The time duration t3 – t2 in Figure 1 denotes the amount of time borrowed. Similarly, if the outputs
of C1 do not stabilize before the rising edge of φ (i.e., t4) but do so shortly thereafter, then C1 is said
to be borrowing time from the block in its fan-out. For our definition of the reference times, C1 may
borrow time from blocks in its fan-out to accommodate its own large delay and/or to compensate for
the time it lent to C0.
Time borrowing may be intentional in the sense that it may be planned during the design of a
circuit. Even when time borrowing is not planned during the design of a circuit, it may occur
unintentionally if variations and/or defects during fabrication cause such borrowing in some
fabricated copies of the circuit. Note that even when time borrowing occurs, intentionally or
unintentionally, the circuit is fault-free at the given clock frequency provided that the values at the
outputs of every time-borrowing logic block stabilize before the subsequent falling edge of the block's
receiving clock. Hence, latch-based circuits can enhance performance (i.e., increase clock frequency)
by enabling time borrowing, and improve yield via unintentional time borrowing.
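As a hedged illustration of these definitions (our own toy model, not code from the dissertation), the following sketch propagates stabilization times through a linear latch-based pipeline, reports how much time each block borrows, and flags a delay fault when a block's outputs miss the falling edge of its receiving clock:

```python
# Toy model (our own construction): block i's receiving clock rises at
# (i+1)*T/2 and falls at (i+2)*T/2, per the reference times defined earlier.

def simulate(block_delays, T):
    """Return (borrowed_per_block, fault_free) for a linear latch pipeline.

    block_delays[i] -- actual combinational delay of block Ci
    """
    t_in = 0.0               # inputs to C0 applied at the first rising edge
    borrowed = []
    for i, d in enumerate(block_delays):
        t_rise = (i + 1) * T / 2
        t_fall = (i + 2) * T / 2
        t_stable = t_in + d
        if t_stable >= t_fall:
            return borrowed, False          # delay fault at this frequency
        borrowed.append(max(0.0, t_stable - t_rise))
        t_in = max(t_stable, t_rise)        # when the next block's inputs arrive
    return borrowed, True

# T = 10, nominal delay per block = 5.  C0 takes 7 (borrows 2 from C1);
# C1 takes only 3, absorbing the borrowed time.
print(simulate([7.0, 3.0], T=10.0))   # ([2.0, 0.0], True)
print(simulate([7.0, 9.0], T=10.0))   # ([2.0], False) -- C1 misses its deadline
```

The first example is fault-free despite C0 exceeding its nominal delay, which is precisely the yield benefit of unintentional time borrowing described above.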
1.4. Design choice between flip-flop-based and latch-based circuits
Flip-flop-based pipelines are easy to design and verify using an extensive set of available tools
for synthesis and verification. Hence, most ASIC designers prefer flip-flop-based pipelines. On the
other hand, latch-based pipelines are more difficult and challenging to design and verify because
ensuring correct timing behavior is more difficult and tool support is limited [8]. However, latch-
based pipelines are used in full custom designed high-speed chips, especially in some of their delay-
critical parts, due to the abovementioned major benefits, namely, higher performance and higher yield
at the desired performance [8].
In a latch-based pipeline, if time borrowing is intentionally planned during design (intentional
time borrowing), this enables design of latch-based pipelines that operate at higher clock frequency,
because it is not necessary to carefully balance delays of combinational logic blocks to increase clock
frequency, and because latches are immune to clock skew to some degree. On the other hand, flip-
flop-based pipelines must carefully balance delays of logic blocks in order to achieve high
performance, since flip-flops present hard time boundaries between pipeline stages where no time
borrowing is permitted. Furthermore, clock skew must be budgeted for in the clock period for
flip-flop-based circuits. While retiming can balance flip-flop-based pipeline stages to reduce clock period,
in some cases this is not possible. For example, accessing cache memory takes a substantial portion of
the clock period, limiting the clock period of the pipeline stage. There also needs to be additional
speed may be to give the cache access an additional pipeline stage to complete if it is the critical path
limiting the clock period [18]. Experimental results in [8] show that latch-based designs are 5–19%
faster than corresponding flip-flop-based designs, for a small increase in area.
If unintentional time borrowing occurs in a latch-based circuit, i.e., borrowing that is not planned
during design but occurs in some fabricated copies of a design due to delay variations and/or (minor)
defects during fabrication, then such a fabricated copy of the circuit may operate at the desired clock
frequency when the amount of time borrowing is accommodated by subsequent block(s). In this
manner, unintentional time borrowing increases yield at the desired clock frequency when an
appropriate delay testing approach is used. On the other hand, a flip-flop-based pipeline without
sufficient timing margin will malfunction at the desired clock period when variations and defects
cause similar extra delay in the circuit, leading to a reduction in yield.
In summary, latch-based design enhances performance by enabling intentional time borrowing
and improves yield by allowing unintentional time borrowing, compared to flip-flop-based designs.
Importantly, as clock distribution is becoming increasingly difficult, the abovementioned performance
benefits are growing. For the above reasons, latch-based pipelines are used in full-custom designed
high-speed circuits, especially in highly delay-critical parts of the circuit.
However, we can realize these two advantages only if latch-based design is carried out to obtain
high-speed implementations and an appropriate delay test methodology is used. Otherwise, flip-flop-
based pipelines would prevail due to the ease of design, verification, and test.
1.5. Testing of flip-flop-based versus latch-based circuits
The approaches of static testing, such as stuck-at fault testing, are similar for both latch-based
and flip-flop-based circuits. The same automatic test pattern generator (ATPG) can be used to
generate tests for both architectures, since tests are generated based on the circuit structure without
considering the timing of circuits. The only difference is that latch-based pipelines entail higher cost
when scan DFT is used to target faults in each block individually, because replacing a latch with a
scan latch requires two additional latches in general, whereas replacing a flip-flop with a scan flip-
flop requires an additional multiplexer as shown in Figure 2 [23]. Hence, in this research we focus on
delay testing (path delay testing) of flip-flop-based and latch-based circuits.
[Figure content omitted: (a) a flip-flop; (b) a scan flip-flop, which adds a multiplexer and ScanIn/ScanOut connections; (c) a latch; (d) a scan latch, which adds two additional latches clocked by clkA and clkB.]
Figure 2. Additional components for scan flip-flop and scan latch.
In delay testing of flip-flop-based circuits, typically a divide-and-conquer approach using scan
DFT is used where each logic block is tested individually using scan. As soon as an erroneous
response is captured at any flip-flop in the circuit, the chip is identified as being faulty at the given
clock frequency. Such a chip is either discarded or “binned” to be sold at a slower rated clock
frequency. Also, since such an approach separately targets paths within individual blocks, it targets
shorter paths. Hence, higher delay fault coverage is obtained using a smaller number of tests,
compared to the approach without using scan DFT.
On the other hand, latch-based pipelines with time borrowing pose new challenges in delay
testing of latch-based circuits. Unlike flip-flop-based pipelines where each block of logic can be
tested (for delay faults) separately using scan DFT, none of the existing DFT techniques can be used
for delay testing of latch-based circuits with time borrowing, since time borrowing makes it necessary
to target multi-segment paths (a.k.a., multi-block paths), i.e., paths that span multiple blocks.
If the same divide-and-conquer approach is used for delay testing of latch-based pipelines, each
logic block will be tested independently using scan DFT. Using the nominal delay for each block, this
allows a maximum delay of T/2 (where T is the clock period) for each logic block, if complementary
clocks are used as shown in Figure 1. The delay fault coverage we compute from this approach is high
in most cases, since short paths are tested using scan. However, any fabricated copy of the chip that
has even one intentional/unintentional time borrowing site will fail the tests applied by such a divide-
and-conquer approach and be discarded. Since we are considering high-speed applications of latch-
based circuits, the circuit is likely to contain one or more time borrowing sites. Whenever this is the
case, such a divide-and-conquer approach will lead to zero yield for latch-based pipelines. In other
words, circuit designers cannot use intentional time borrowing, which suppresses the performance
benefits of latch-based designs. Also, there will be no yield benefit of using latch-based designs
because unintentional time borrowing is not allowed. Consequently, a simple divide-and-conquer
delay testing approach is not appropriate for delay testing of high-speed latch-based circuits, as it
erodes many of their benefits.
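A toy calculation (our own numbers, chosen only for illustration) makes the zero-yield problem concrete: a chip whose first block borrows time fails the block-by-block T/2 test even though the pipeline as a whole meets timing:

```python
# Illustrative numbers only: two-block pipeline with complementary clocks,
# T = 10, so each block's nominal allocation is T/2 = 5.
T = 10.0
delays = [7.0, 3.0]   # C0 borrows 2 units from C1; the pair together fits

# Divide-and-conquer scan test: each block must finish within its nominal T/2.
passes_block_test = all(d <= T / 2 for d in delays)

# Actual pipeline operation: C0 stabilizes at 7, before its receiving clock's
# falling edge at 10; C1 then stabilizes at 7 + 3 = 10, before its deadline at 15.
t0 = delays[0]
t1 = max(t0, T / 2) + delays[1]
functions_correctly = t0 < T and t1 < 1.5 * T

print(passes_block_test)     # False -- the block-level test discards this chip
print(functions_correctly)   # True  -- yet it meets timing with borrowing
```

Any delay testing approach for such circuits therefore has to account for multi-segment paths rather than testing each block in isolation.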
The objective of this research is to propose an optimal scan-based path delay testing approach for
latch-based circuits with time borrowing by optimizing the robust path delay fault (PDF) coverage as
well as the test application cost. We also suggest guidelines for latch-based designs such that the
performance and yield benefits of latch-based circuits are guaranteed under the optimal delay fault
coverage.
In Chapter 2, the key challenges in delay testing of latch-based circuits are discussed. This
motivates the need for developing new DFT designs and new approaches for DFT-based delay testing.
In Chapter 3, we focus on maximizing delay fault coverage and present a new path delay testing
approach that requires a very small number of scan chain configurations, while guaranteeing
maximum robust PDF coverage. This decreases the overheads due to DFT significantly. We then
prove that the proposed delay testing approach for latch-based circuits always achieves the theoretical
maximum PDF coverage regardless of the length of multi-segment paths targeted.
Furthermore, we propose a new test generation approach that works for any time borrowing
scenario even for cases where only a fraction of the abovementioned scan chain configurations (or
any other set of configurations) are available. This is especially useful since it allows us to avoid scan
chain configurations that significantly degrade circuit performance.
In Chapter 4, we propose a new test scheduling approach for latch-based circuits to minimize
the test application cost while achieving the maximum coverage that the test generation method
presented in Chapter 3 can achieve. First, we show that conventional test scheduling approaches may
not be applicable due to the unique characteristics of latch-based circuits with time borrowing. We
then present our new formulation of the test cost minimization problem for path delay testing of latch-
based circuits, and present a deterministic approach as well as two heuristic approaches.
In Chapter 5, we compare high-speed latch-based circuits with their flip-flop-based counterpart
designs from the viewpoint of path delay testing, and propose design guidelines for latch-based high-
speed circuits that guarantee that a latch-based circuit achieves higher performance and higher yield
than its flip-flop-based counterpart. We also prove that, for a latch-based circuit, the delay testing
approach proposed in Chapter 3 obtains the maximum path delay fault coverage, which is always greater
than (or equal to) the coverage of the corresponding flip-flop-based circuit.
In Chapter 6, we discuss future research directions, such as other delay testing approaches and the
hardware design and control issues related to the scan DFT.
CHAPTER 2
Background – Challenges in delay testing of latch-based
circuits
2.1. Key challenges in delay testing of latch-based circuits with time borrowing
In a flip-flop-based circuit, the transition at the output of a path in one block is latched into the
corresponding flip-flop at a specific clock edge before it begins to propagate via a path in the next
block. Hence, if the delay of the path in the first block is excessive, the transition misses the clock
edge and cannot be seen by the path in the next block. Also, if the delay of the path in the first block
is short, the transition at the input of a path in the next block still starts after the appropriate clock
edge. Thus, in delay testing of flip-flop-based circuits, PDFs in one combinational logic block can be
treated independently of the PDFs in the adjacent blocks.
In latch-based circuits, in contrast, test application at the inputs of a block may not always occur
at the rising edge of the driving clock due to time borrowing. If time borrowing never occurs at a
particular latch, transitions are always applied at the rising edge of the driving clock. However, if time
borrowing occurs at a latch, the exact time at which transition is applied to the next block depends on
the amount of time borrowing.
Hence, for delay testing of latch-based circuits, we first need to know the latches that are sites of
time borrowing. Latches that are sites of intentional time borrowing are known prior to DFT design,
test development, and test application. On the other hand, latches that are sites of unintentional time
borrowing may vary from one fabricated copy of the chip (called a chip instance in the following) to
another and hence are not known prior to test application. More importantly, the precise amount of
time borrowing at a site of intentional/unintentional time borrowing varies from one chip instance to
another; even in a particular chip instance it varies from one vector to another.
Hence, one of the biggest challenges in delay testing of latch-based circuits is that it is
impossible to use the scan mode to apply a test at a latch without knowing that the latch is not a site of
time borrowing. Furthermore, even when time borrowing is known to occur at a latch, it is practically
impossible to use scan to apply tests where bits are skewed to precisely replicate the arbitrary and
unknown amounts of time borrowing at outputs of various latches. It is important to recall that simply
applying tests at the inputs of a block and observing responses at its outputs at nominal times will
cause many fault-free chips to be unnecessarily discarded. In fact, in circuits where time borrowing
has been intentionally exploited, this can lead to zero yield at the given clock frequency. Hence, we
need to develop a novel delay testing approach that applies tests and captures responses at clock
edges only, while considering intentional/unintentional time borrowing.
Consequently, if a latch is a site of time borrowing, it is necessary to test multi-segment paths,
i.e., paths obtained by concatenating appropriate paths in successive logic blocks separated by latches.
Since many latch-based parts of circuits (e.g., data-paths) contain an astronomical number of such
multi-segment paths [1], the classical test approach, which targets the entire pipeline without DFT,
typically suffers from impractically high test generation complexity, high test application time, and –
for many circuits – meaninglessly low fault coverage. Hence, the use of some new type of DFT is
imperative to reduce significantly test generation and test application times while providing
meaningfully high values of delay fault coverage by targeting shorter paths. The next section explains
these major benefits of exploiting scan DFT.
2.2. Benefits of scan design-for-testability
There are two major benefits of delay testing using scan DFT. First, appropriate use of scan
DFT reduces the number of target PDFs. For example, consider paths via L5 in the two-stage pipeline
shown in Figure 1. Let x paths in C0 terminate at L5 and y paths in C1 originate from L5. Then there
exist xy physical paths in the above two blocks via L5. Since, for each physical path, two PDFs – one
with a rising transition at its input and another with a falling transition – must be targeted, a total of
2xy PDFs that pass via the latch (as well as the two blocks) must be targeted. If one can verify, during
testing of a particular chip instance, that no time borrowing, intentional or unintentional, occurs at
this latch L5, paths in C0 and C1 can be targeted separately. In such a case, the total number of PDFs
corresponding to the latch that must be targeted drops from 2xy to 2(x+y). Since x and y are typically
large, the use of scan reduces the total number of target PDFs. Note that even greater reductions occur
when one considers the above arguments for multi-segment paths that pass via a larger number of
blocks.
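The arithmetic behind this reduction can be sketched in a few lines; this is our own illustration of the count stated above (the function names and example path counts are hypothetical, not from the dissertation):

```python
# Illustrative sketch: PDF counts for paths through a single latch (e.g., L5).
# Each physical path contributes two PDFs: one rising, one falling launch.

def pdf_count_joint(x: int, y: int) -> int:
    """PDFs when all x*y multi-segment paths via the latch must be targeted."""
    return 2 * x * y

def pdf_count_separate(x: int, y: int) -> int:
    """PDFs when no time borrowing occurs at the latch, so the x paths ending
    at it and the y paths starting from it are targeted separately via scan."""
    return 2 * (x + y)

# With, say, 1000 paths terminating at the latch and 800 originating from it:
x, y = 1000, 800
print(pdf_count_joint(x, y))     # 1600000 PDFs via the latch
print(pdf_count_separate(x, y))  # 3600 PDFs when scan isolates the blocks
```

Since x and y grow with block size, the gap between 2xy and 2(x+y) widens quickly, which is the quantitative content of the first benefit.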
Second, the average length of a path targeted during test generation is shortened when the
proposed approach and DFT are used. It is not always possible to propagate a transition robustly
along a path, since sometimes conflicting logic values are required at side inputs of the path for robust
propagation. As the length of a target path increases, typically the possibility of a conflict between
values required at side inputs also increases. The use of scan at latches where no time borrowing
occurs reduces the average length of paths and hence, in many circuits, enhances PDF coverage.
2.3. The overall optimization problem
As noted in Sections 2.1 and 2.2, there are unique problems in test generation and in the design of
DFT circuits for applying tests and capturing responses, and it is imperative to develop new DFT designs
and a new delay testing technique that takes advantage of these new DFT designs. Developing this
type of new delay testing involves several sub-objectives that jointly define the overall optimization
problem. The sub-objectives include maximization of delay fault coverage, minimization of test
application cost, minimization of test generation time, design of optimal DFT circuitry and so on.
Among these, maximization of delay fault coverage and minimization of test application cost are
two major problems for test engineers, and typically there is a trade-off between the two. In other
words, in order to reduce test application cost, delay fault coverage might have to be compromised to
some extent. However, even if we are willing to incur a sufficiently high test application cost, we may
not be able to achieve the desired level of delay fault coverage without implementing an appropriate test
generation methodology.
Hence, we prioritize the above sub-objectives to define the overall optimization problem as
follows. First, in Chapter 3, we propose a new structural test generation approach that maximizes the
robust PDF coverage by exploiting scan DFT that applies tests and captures responses at clock edges
only. Second, in Chapter 4, we discuss a new test scheduling method to minimize test application cost,
under the condition that the maximum robust PDF coverage proposed in Chapter 3 is maintained.
CHAPTER 3
A new delay testing approach
3.1. Basic assumptions
In our approach, a latch may operate in the following four modes.
(1) Normal mode. The latch is transparent when the corresponding clock is high and holds its state
when the clock is low.
(2) Scan mode.
(2a) Scan-in mode. Vectors are loaded via scan-in and applied at the rising edge of the
corresponding clock. Concurrently, the values previously captured in the latches are scanned
out as described next.
(2b) r-capture scan-out mode. The latch captures response at the rising edge of the corresponding
clock for scan out.
(2c) f-capture scan-out mode. The latch captures response at the falling edge of the
corresponding clock for scan out.
It is assumed that time borrowing does not occur at the “primary” inputs and outputs of the
entire latch-based circuit. This is often true because high-speed latch-based pipelines are typically
embedded in a larger flip-flop-based system. We may test each block individually. Alternatively, we
may test any set of contiguous blocks together as a single entity. In either case, we use the term sub-
circuit under test (SCUT) to describe the block(s) under test during a particular phase of testing. Each
SCUT is characterized by what blocks are included and what scan chain configurations (i.e., operation
modes of latches) are used for the latches in the SCUT.
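The four latch operating modes and the notion of an SCUT configuration can be sketched as a small data structure; the enum and class names below are our own illustration, not part of the dissertation's DFT hardware:

```python
# Hypothetical sketch of the four latch operating modes of Section 3.1 and an
# SCUT description (blocks under test plus the mode of each latch).
from enum import Enum, auto
from dataclasses import dataclass
from typing import Dict, List

class LatchMode(Enum):
    NORMAL = auto()              # transparent while the driving clock is high
    SCAN_IN = auto()             # load via scan, apply at the rising edge
    R_CAPTURE_SCAN_OUT = auto()  # capture response at the rising edge
    F_CAPTURE_SCAN_OUT = auto()  # capture response at the falling edge

@dataclass
class SCUT:
    blocks: List[str]                   # contiguous blocks under test, e.g. ["C0"]
    latch_modes: Dict[str, LatchMode]   # operating mode of each latch in the SCUT

# Example: test C0 by itself, scanning in at level-0 and capturing at level-1.
scut0 = SCUT(
    blocks=["C0"],
    latch_modes={"L01": LatchMode.SCAN_IN,
                 "L11": LatchMode.R_CAPTURE_SCAN_OUT},
)
print(scut0.latch_modes["L11"].name)  # R_CAPTURE_SCAN_OUT
```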
The proposed test generation approach is based on robust testing of path delay faults.
Hence, the following property of robust tests is used throughout this dissertation.
Lemma 1: Any robust test for a target PDF invokes a delay equal to or greater than the delay of the
target path, independently of the presence or absence of other delay variations or faults in the circuit.
This is because a robust test for a PDF guarantees that a transition at an on-path line cannot
occur unless a transition occurs at the previous on-path line, independently of the presence or absence
of any other delay variations or faults [7]. This basic property attributed to robust tests is strictly true
under many commonly used delay models. More details can be found in [7]. In particular, the
propagation of a transition along the path may be affected by the existence of non-static values at off-
path inputs. However, the conditions for robust propagation only allow those values at off-path inputs
which may delay but cannot accelerate the on-path propagation.
3.2. Key ideas behind the proposed approach
3.2.1. r-r test: A set of sufficient conditions on block delays
Under the r-r test application, tests are applied to the inputs of the SCUT at the rising edge of the
driving clock (i.e., the clock that drives the latches at the input of the SCUT), and the responses are
captured at the outputs of the SCUT at the rising edge of the receiving clock (i.e., the clock that drives
the latches at the output of the SCUT). Let us assume that the SCUT is comprised of m consecutive
blocks of logic in a linear pipeline (Ch, Ch+1, Ch+2, ⋅⋅⋅, Ch+m−1), and that the latches at the inputs of
target paths (input latches) in the SCUT are free of time borrowing. Then, the time interval TA(r,r),
denoting the nominal time allocated to the SCUT, is mT/2, where T is the clock period. Hence, the
following is a sufficient (but not necessary) condition for the SCUT to be free of delay faults.

∑_{i=h}^{h+m−1} ∆Ci ≤ mT/2 ≡ TA(r,r),    (1)
where ∆Ci is the maximum delay of any multi-segment path in block Ci. If this condition is violated
for a latch (output latch) at the output of the SCUT, we discover that the SCUT borrows time via the
latch from the next block Ch+m, since a transition arrives at the latch after the output latch of Ch+m−1
becomes transparent. In particular, we have the following result. If every block of a CUT passes r-r
tests at clock period T, the CUT has no delay fault and no time borrowing at that clock frequency.
However, one or more SCUTs may fail r-r tests due to time borrowing even though the circuit has no
delay fault. The above observation is generalized to obtain Theorem 1.
Theorem 1. If a CUT can be partitioned, i.e., divided into disjoint SCUTs that collectively include
all blocks in the CUT, such that each SCUT passes its corresponding r-r tests at clock period T, then the
CUT is free of delay faults at that clock frequency [9].
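Condition (1) and Theorem 1 can be sketched in code; this is a minimal illustration under the chapter's notation, and the function names, clock period, and delay values are our own assumptions:

```python
# Sketch of the r-r sufficient condition (1): an SCUT of m consecutive blocks
# passes if the sum of the blocks' maximum delays is at most TA(r,r) = m*T/2.

def passes_rr(block_delays, T):
    m = len(block_delays)
    return sum(block_delays) <= m * T / 2

# Theorem 1: if the CUT can be partitioned into disjoint SCUTs that each pass
# their r-r tests at clock period T, the CUT is free of delay faults at T.
def cut_fault_free_by_partition(partition, T):
    return all(passes_rr(scut_delays, T) for scut_delays in partition)

T = 10.0  # clock period, arbitrary units
# Blocks with delays 4 and 6: the delay-6 block alone fails r-r (6 > T/2 = 5),
# but grouped into one SCUT the pair passes (4 + 6 <= 2*(T/2) = 10), i.e.,
# the time borrowing between them is legal.
print(passes_rr([6.0], T))                           # False
print(cut_fault_free_by_partition([[4.0, 6.0]], T))  # True
```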
3.2.2. r-f test: A set of necessary conditions on block delays
Under the r-f test application, tests are applied to the inputs of the SCUT at the rising edge of the
driving clock and the responses are captured at the outputs of the SCUT at the falling edge of the
receiving clock. Let us assume that the SCUT is comprised of m consecutive blocks of logic in a
linear pipeline (C
h
, C
h+1
, C
h+2
, ⋅⋅⋅, C
h+m–1
), and that the latches at the inputs of the SCUT are free of
time borrowing. Let the time interval TA(r,f) denote the maximum time allowable for any multi-
segment path in a given SCUT. Then, the following is a necessary (but not sufficient) condition for
the SCUT to be free of delay faults.
∑
− +
=
≡ + ≤ ∆
1
) , (
2
) 1 (
m h
h i
i
f r TA
T
m C , (2)
where ∆Ci is the maximum delay of any multi-segment path in block Ci. If this condition is violated
for even one SCUT, the entire CUT is proven to have a delay fault at the clock period T. Note that the
r-f test application allows the maximum time duration for transitions to propagate via each path in an
SCUT. Hence, it is necessary for every block to pass r-f tests.
Theorem 2. If any SCUT fails its r-f tests at clock period T, then the circuit has delay faults at that
clock frequency.
Proof: If an SCUT fails r-f tests, it means that the delay of the SCUT is longer than the maximum time
duration allowed for transitions. Since such additional delay cannot be accommodated via time borrowing,
the circuit has delay faults at the given clock frequency. ■
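The necessary condition (2) can be sketched alongside the sufficient condition of the previous subsection; the function name and numeric values below are illustrative assumptions:

```python
# Sketch of the r-f necessary condition (2): any multi-segment path in an SCUT
# of m blocks must fit within TA(r,f) = (m+1)*T/2. A violation proves a delay
# fault that no amount of time borrowing can absorb (Theorem 2).

def passes_rf(block_delays, T):
    m = len(block_delays)
    return sum(block_delays) <= (m + 1) * T / 2

T = 10.0  # clock period, arbitrary units
# A single block with delay 12 exceeds TA(r,f) = 2*(T/2) = 10: the excess
# cannot be borrowed from the next block, so the CUT has a delay fault.
print(passes_rf([12.0], T))  # False
# Delay 9 violates the r-r bound (T/2 = 5) but satisfies r-f, so it may
# still be accommodated via time borrowing.
print(passes_rf([9.0], T))   # True
```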
Consider a scenario where, based on the circuit design, we expect extreme time borrowing, i.e.,
where the delays of one or more single blocks and/or multi-block combinations are, in the presence of
delay variations or defects, likely to exceed the maximum allowable delay given
by the relation described for TA(r,f). In this case, we consider each single block or multi-block
combination where extreme time borrowing is deemed likely to occur as an SCUT and apply suitable
tests to the SCUT using the r-f test application. Obviously, if any of these SCUTs fails any of its r-f
tests, then the entire CUT is identified as having a delay fault at the desired clock period and delay
testing can be terminated immediately.
However, in much of this dissertation, for simplicity of discussion, it is assumed that a chip
under test has either skipped r-f tests (since extreme time borrowing was not expected) or passed all
the r-f tests applied (since extreme time borrowing does not exist).
3.2.3. Time borrowing detection
Each latch in the circuit, other than primary input and primary output latches, is classified
either as a time borrowing latch (TBL) or as a non-time borrowing latch (NTBL). A latch L is
identified as a TBL if at least one test for an SCUT that terminates at L fails r-r tests by the capture of
an erroneous value at the latch. In contrast, a latch L is identified as an NTBL if, for all SCUTs that
terminate at the latch, no r-r test fails by causing an error at that latch. The test procedure of the
proposed structural delay testing is based on identification of time borrowing status of latches within
the CUT. We now summarize the above observations.
Observation 1: A latch L is identified as an NTBL if, for all SCUTs that terminate at the latch, no r-r
test fails by causing an error at that latch.
Observation 2: A latch L is identified as a TBL if at least one test for an SCUT that terminates at L
fails r-r tests by the capture of an erroneous value at the latch.
Consider a latch L that is identified as a TBL via testing. In general, in such a situation a non-
empty subset of paths that terminate at the latch are time borrowing while the set of remaining paths
(which might be empty) are non time borrowing. While it is theoretically feasible to develop a
methodology that considers L as time borrowing only with respect to the former set of paths, such a
methodology is practically unimplementable since it requires execution of delay diagnosis on each
copy of the chip to identify the former set. Since typically the complexity of such diagnosis is extremely
high, we consider L as time borrowing with respect to all paths that terminate at L, as stated below.
Observation 3: If a latch L is identified as a TBL, all multi-segment paths that pass via L must be
targeted by subsequent testing, despite the fact that some of the paths in the fan-in of L may not be
borrowing time.
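The classification in Observations 1 and 2 can be sketched as a simple set computation; the data layout below (a map from SCUT names to the latches at which some r-r test captured an error) is a hypothetical illustration of ours:

```python
# Sketch of TBL/NTBL classification: a latch is a TBL if at least one r-r test
# of an SCUT terminating at it captured an erroneous value there (Observation 2);
# otherwise it is an NTBL (Observation 1).

def classify_latches(rr_failures, all_latches):
    """rr_failures maps each SCUT name to the set of latches at which some
    r-r test captured an error. Returns (TBLs, NTBLs)."""
    tbls = set()
    for erroneous_latches in rr_failures.values():
        tbls |= erroneous_latches
    return tbls, all_latches - tbls

level1 = {"L10", "L11", "L12"}
# Suppose the r-r tests of SCUT0 (= C0) capture an error only at L10.
tbls, ntbls = classify_latches({"SCUT0": {"L10"}}, level1)
print(sorted(tbls))   # ['L10']  -- multi-segment paths via L10 must be targeted
print(sorted(ntbls))  # ['L11', 'L12'] -- their fan-out sub-paths can use scan
```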
3.2.4. Basic delay testing strategies
The basic idea behind the proposed delay testing approach is to test the first logic block to
identify sites of time borrowing and adaptively add the subsequent logic blocks by deciding whether
to target multi-segment paths or single-segment paths. This process continues to the last logic block.
Consider the three-stage linear pipeline shown in Figure 3. Latches at the inputs of logic block Ck are
called level-k latches. Initially, r-r tests are applied to the first logic block (C0). If C0 passes all these r-r
tests, time borrowing does not occur at any of the level-1 latches and hence the next logic block (C1)
can be tested separately from C0. Likewise, if C1 also passes all r-r tests, time borrowing does not
occur at any of the level-2 latches and hence the next logic block (C2) can be tested separately. Since time
borrowing does not occur at any latch in this particular case, each logic block is individually tested,
which tends to decrease the number of target paths and significantly increase the coverage.
On the other hand, if C0 fails r-r tests, we target multi-segment paths that span C0 and C1,
denoted as C0 + C1. These multi-segment paths are targeted by using scan DFT at level-0 and level-2
latches and configuring all level-1 latches in normal mode. However, this scheme of simply
combining consecutive logic blocks to target multi-segment paths does not improve coverage
significantly, especially in cases where time borrowing occurs extensively across the pipeline. For
example, in the worst case where C0 + C1 also fails r-r tests, we will end up testing the entire three-
stage pipeline (C0 + C1 + C2) jointly, while obtaining the same coverage as the classical approach at
the additional cost of testing C0 and C0 + C1.
Hence, we developed an SCUT-based approach [9] that targets multi-segment paths via TBLs
only, and targets shorter paths that start at NTBLs. This is done by configuring NTBLs in scan mode
and TBLs in normal mode. For instance, suppose the testing of C0 identifies L10 only as a TBL. Then,
in the next test session for C0 + C1, we configure L11 and L12 in scan mode, and L10 in normal mode. In
this manner, the two-segment paths via L10 are tested and the single-segment paths starting at L11 and
L12 are tested. Similarly, the results of this testing of C0 + C1 will decide the time borrowing status at
level-2 latches, which will adaptively determine the configurations of level-2 latches in the next test
session that includes C2.
Although this approach [9] improved the coverage significantly in many cases, the experiments
show that the coverage is still low for cases where time borrowing occurs extensively. Next we
propose an advanced SCUT-based approach that further improves the coverage even when time
borrowing occurs extensively, and reduces the complexity of DFT circuitry and scan chain routing.
Figure 3. A three-stage linear pipeline (blocks C0, C1, C2 separated by latch levels 0 through 3, with latches Lk0, Lk1, Lk2 at each level k).
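The adaptive configuration step described above can be sketched as a small function; this is our own rendering of the strategy, with hypothetical latch names matching the example in the text:

```python
# Sketch of the adaptive SCUT-based strategy of Section 3.2.4: after testing up
# to level k, TBLs at level k remain in normal mode (so multi-segment paths
# through them are targeted in the next session) while NTBLs are put in scan
# mode (their fan-out sub-paths are targeted separately).

def next_session_modes(level_latches, tbls):
    """Choose the operating mode of each latch at a level for the next session."""
    return {latch: ("normal" if latch in tbls else "scan")
            for latch in level_latches}

# Example from the text: testing C0 identifies only L10 as a TBL, so in the
# session for C0 + C1 we keep L10 in normal mode and scan L11 and L12.
modes = next_session_modes({"L10", "L11", "L12"}, tbls={"L10"})
print(modes["L10"])  # normal
print(modes["L11"])  # scan
```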
3.3. Test of the first logic block (SCUT0)
In Sections 3.3 through 3.6, for simplicity of explanation, the discussion uses the two-stage
linear pipeline shown in Figure 4. However, the ideas proposed are applicable to general latch-based
networks. Assume that there are j latches at the inputs of the first combinational logic block C0
(level-0 latches), k latches between C0 and C1 (level-1 latches), and l latches at the outputs of C1 (level-2
latches).
Figure 4. A two-stage linear pipeline (blocks C0 and C1; level-0 latches L01–L0j, level-1 latches L11–L1k, level-2 latches L21–L2l; a multi-segment path p comprised of sub-paths α and β).
As in the approach described above [9], the paths in the first logic block C0 (= SCUT0) in Figure
4 are tested by themselves by operating level-0 latches in scan (scan-in) mode and level-1 latches in
scan (r-capture scan-out) mode. The purpose of testing SCUT0 is to identify TBLs among the level-1
latches. Assume that such testing detects time borrowing at a subset of level-1 latches denoted by the
set LTB, where
LTB = {L1i | L1i is a TBL, 1 ≤ i ≤ k}.
The remainder of the level-1 latches are not time borrowing sites and constitute a set LNTB, i.e.,
LNTB = {L1i | L1i is not a TBL, 1 ≤ i ≤ k}.
We are interested in testing multi-segment paths in the CUT that span C0 and C1.
When any particular multi-segment path is being tested, the target path passes via one level-1 latch, which
is referred to as the on-path latch for the path. All the other level-1 latches are called off-path latches
for the target path. Sections 3.4 and 3.5 describe how to test the multi-segment paths with high PDF
coverage using DFT by treating on-path and off-path latches differently and by considering LTB and
LNTB.
3.4. Configuring on-path latches
Consider a case where the objective of delay testing is to test an arbitrary multi-segment path p
comprised of a sub-path α in C0 and a sub-path β in C1 in the circuit shown in Figure 4. α and β are
connected by a level-1 latch L1i. In this case the latch L1i is the on-path latch and every other level-1
latch is an off-path latch.
3.4.1. On-path latch with time borrowing
First, consider the case where the on-path latch L1i is identified as a TBL during testing of
SCUT0 (i.e., L1i ∈ LTB). In order to test multi-segment paths that pass via L1i ∈ LTB, scan mode cannot
be used for L1i, since no known DFT circuitry can replicate an appropriately skewed test application
and response capture corresponding to the precise amount of time borrowing, which varies from
vector to vector and from one chip instance to another. Therefore, only normal mode can be used at
TBL L1i during testing of any multi-segment path that passes via the latch.
3.4.2. On-path latch with no time borrowing
Now consider the case where the on-path latch L1i is identified as an NTBL during testing of
SCUT0 (i.e., L1i ∈ LNTB). Theorems 3 and 4 identify the relationship between (i) testing α and β
individually as sub-paths, and (ii) testing α and β jointly (denoted as α + β) as a multi-segment path.
(That is, α + β stands for the testing of the path p by configuring L1i in normal mode in the SCUT
comprised of C0 and C1.)
Theorem 3 (Test quality). If any robust test for α passes when C0 is tested by itself and any robust test
for β passes when C1 is tested by itself, then the worst-case delay of multi-segment path p via L1i is
within the limit imposed by the given clocks.
Proof: We will prove this by contradiction. Let us start by assuming that the path p, i.e., α + β, fails
when the SCUT comprised of C0 and C1 is tested. In this case, the delay of p exceeds the sum
of the nominal delays of C0 and C1. Assuming that no time borrowing occurs at the inputs of C0, this can
occur only under the following two scenarios: (i) the delay of α is greater than the nominal delay of C0, or
(ii) the delay of β is greater than the nominal delay of C1. In the former case, any robust test for α when
C0 was tested by itself would have failed, and in the latter case any robust test for β when C1 was
tested by itself would have failed (or both). Hence, if α and β each pass robust tests when the respective
blocks are tested, the delay of p cannot exceed its nominal delay. ■
Theorem 4 (Coverage). If the multi-segment path α + β is robustly testable in the multi-segment
SCUT comprised of C0 and C1, then α in C0 by itself and β in C1 by itself are both individually robustly
testable.
Proof: When α + β is targeted in an SCUT comprised of C0 and C1, the conditions necessary (and
sufficient) for robust detection of p require that α be robustly sensitized within C0, as in the case
where α is tested in C0 by itself. Therefore, if α is not robustly testable in C0 by itself, then no robust
test exists for any multi-segment path that includes α. Similar reasoning also applies to β. ■
In summary, if we conclude that time borrowing does not occur at the latch after testing the
block(s) in the fan-in of the latch, we can separately test the sub-paths in the fan-out of the latch
instead of testing multi-segment paths that pass via the latch.
While testing a sub-path in the fan-out of latch L1i at which no time borrowing occurs, L1i may
be configured either in normal mode or in scan mode. Next, we compare these two approaches in
Sections 3.4.2.1 and 3.4.2.2.
3.4.2.1. Approach 1: The multi-segment path α + β is targeted by configuring L1i in normal mode.
In this approach, the on-path latch L1i is configured in normal mode although L1i is an NTBL.
This may be the case if the DFT design does not support the required scan mode operation. Even if the
normal mode is in use, we can attain higher test quality based on Theorem 4 and the fact that time
borrowing does not occur at L1i by exploiting the following property.
Property 1 (Use of test results for shorter sub-paths). In order to test the sub-path β when L1i is an
NTBL, the test generation procedure targets β as part of multi-segment paths that start at any level-0
latch and include β by configuring L1i in normal mode. Note that the sub-paths in the fan-in of L1i are
used only to produce a rising or a falling transition (as appropriate to test β) at the output of L1i, and
the logic values within C0 need not robustly propagate the transition along any particular path in the
fan-in of L1i. As long as a desired transition is initiated at the output of L1i, robust propagation of the
transition is required only for the sub-path β. By doing so, the number of target PDFs is also reduced
because only β is targeted instead of all multi-segment paths that include β.
In summary, it is shown in Theorem 4 that for a path in the fan-in of L1i (e.g., α), testing using
SCUT0 provides equal or higher robust coverage compared to testing using the multi-segment SCUT
comprised of C0 and C1. Also, for the sub-paths in the fan-out of L1i (e.g., β), testing using Property 1
is as good as testing multi-segment paths in an ordinary manner because Property 1 does not require
robust sensitization along any particular path within C0. Hence, Property 1 shows that robust test
coverage can be further improved even without using the scan mode at the on-path latch, provided
that time borrowing is known not to occur at the on-path latch.
3.4.2.2. Approach 2: The sub-path β is targeted by operating L1i in scan mode.
In this approach, we only test the sub-path β that originates at an NTBL L1i. As L1i is configured
in scan mode, the sub-path β of the original target path (α + β) will be tested separately, and the robust
coverage for sub-paths like β will be combined with the robust coverage for sub-paths like α in the
fan-in of the latch.
If L1i can be configured in scan mode, which is typically true in cases where L1i is connected to
the scan-out chain for time borrowing detection, Approach 2 is preferred to Approach 1 because
Approach 2 provides equal or higher coverage than Approach 1, as per Theorem 4.
3.5. Configuring off-path latches
Suppose a multi-segment path p, comprising α and β and passing via L1i in Figure 4, is targeted.
The configuration of the on-path latch L1i is determined as explained in Section 3.4 (i.e., normal mode
is used if L1i is a TBL; either normal or scan mode is used if L1i is an NTBL).
Note that in both cases, any off-path latch can be configured in scan mode, independently of
whether or not that off-path latch is a site of time borrowing. This is due to the following two reasons.
First, if a static value is applied via scan at an off-path latch, the time borrowing status of the off-path
latch has no impact on the on-path delay. Second, even if a rising or a falling transition is applied via
scan, a robust test for a target path like β does not require off-path transitions to satisfy any specific
timing requirement as per Lemma 1. (In particular, an early off-path transition cannot reduce the on-
path delay. In our scheme the off-path transition is never later than in the normal mode.) Hence, even
for a latch where time borrowing is proven to occur, scan mode operation does not violate the robust
delay test conditions, provided that the latch is off-path.
Now let us consider two alternatives, Alternative-1 and Alternative-2, that operate the on-path
latch in the same mode (that meets the above requirements) and differ only in the configuration of the
off-path latches. Let the set of the off-path latches at level-1 configured in scan mode in Alternative-1,
A1_scan, be a proper subset of the set of off-path latches configured in scan mode in Alternative-2, A2_scan
(i.e., A1_scan ⊂ A2_scan). Note that Alternative-1 includes the case where scan mode is used for none of
the off-path latches, i.e., A1_scan = φ. Hence, the classical approach where only normal mode is used for
every on-path and off-path latch is a special case of Alternative-1. By comparing the two alternatives,
we obtain the following results.
Theorem 5 (Test quality). Any robust test for the multi-segment path p using Alternative-1 or
Alternative-2 invokes a delay equal to or greater than the delay of p.
Proof: First, consider the case where the on-path latch L1i is configured in normal mode. In both
alternatives, scan mode is used for each level-0 latch and for each level-1 latch in A1_scan and A2_scan,
respectively. If two different test vectors are applied in the two alternatives, the arrival times at the
output of the on-path latch L1i may differ in the two cases. However, due to the characteristic of robust
tests given in Lemma 1, the delay invoked for α and via L1i is guaranteed to be equal to or greater than
the worst-case delay of sub-path α plus the delay via L1i. Next, for the propagation of this transition
along β, the values applied at off-path level-1 latches may differ between the two alternatives. However,
again by Lemma 1, the subsequent propagation along β will invoke an overall delay equal to or greater
than that of the target path p.
Second, in the case where the on-path latch L1i is configured in scan mode because L1i is a NTBL, the
transitions in both alternatives depart from L1i at the rising edge of the clock driving L1i. The
subsequent propagation along β will invoke an overall delay equal to or greater than that of the target
path p, again by Lemma 1. ■
Theorem 6 (Coverage). If a multi-segment path p is robustly testable in Alternative-1, then it is
robustly testable in Alternative-2.
Proof: Note that both alternatives require the same set of conditions on the values at on-path lines
and off-path inputs for robust detection of p. We can specify independent logic values (i) at every
level-1 latch that belongs to A1_scan in Alternative-1 and A2_scan in Alternative-2, respectively, as well as
(ii) at all level-0 latches in both alternatives. Since A1_scan ⊂ A2_scan, Alternative-2 provides a superset of
the possible value assignments with which to satisfy the same set of conditions for robust detection of p.
Consequently, if a robust test exists for p in Alternative-1, then one surely exists in Alternative-2. ■
Theorem 5 shows that the test quality obtained by any robust test applied using Alternative-2
equals the test quality obtained by any robust test applied using Alternative-1. Theorem 6 shows that
the robust delay fault coverage of Alternative-2 is always at least equal to, and may be superior to,
that of Alternative-1.
Hence, if we can use a scan chain configuration described in Alternative-2 to target a multi-
segment path p, then we need not use a scan chain configuration described in Alternative-1 to test the
path. The following result is a corollary to Theorems 5 and 6, assuming that the on-path latch L1i is a
time borrowing site.
Corollary 1. While testing a multi-segment path via a latch at which time borrowing is known to
occur, the best robust test quality and the best robust coverage can be obtained by operating the on-
path latch in normal mode and all off-path latches in scan mode (single-normal configuration),
provided that DFT circuitry and control signals allow such a combination of modes.
For example, suppose there are four latches at level-1 of Figure 4, testing of SCUT0 shows
that time borrowing occurs at L12, and multi-segment paths that pass via L12 are targeted. In this case,
normal mode is required at L12, since it is the on-path latch and a site of time borrowing. Depending
on the configuration of the off-path latches, 8 (= 2^3) configurations may be used, as shown in Figure 5.
Figure 5. Relationships among scan chain configurations for four level-1 latches (L11, (L12), L13, L14),
where L12 is the time borrowing site. The eight configurations range from (s, (n), s, s) to (n, (n), n, n);
n denotes normal mode, s denotes scan-in mode, and parentheses mark the required mode for the on-path
latch. An arrow config.A → config.B indicates that if a path via the on-path latch is robustly testable
using config.B, it is robustly testable using config.A.
The relationships among different configurations given by Theorem 6 are represented by the arrows
in Figure 5. If a path via the on-path latch (L12) is robustly testable using the configuration at the
destination of an arrow, the path is robustly testable using the configuration at the source of the
arrow.
The following result is a corollary to Theorems 5 and 6, this time assuming that the on-path
latch L1i is a NTBL.
Corollary 2. While testing a multi-segment path via a latch at which time borrowing does not occur,
the best robust test quality and robust coverage can be obtained by operating the on-path latch as well
as all off-path latches in scan mode (all-scan configuration), provided that such a configuration is
supported by the DFT hardware and control.
We can modify Figure 5 to include the remaining eight possible scan chain configurations, which
have L12 in scan mode, and add corresponding arrows to show the relationships between the scan
chain configurations for the case where L12 is a non-time-borrowing on-path latch.
If a target path p is tested using a configuration where one or more off-path latches are in normal
mode, then we can use the following property to modify the value applied at the output of any latch
that is configured in normal mode even though no time borrowing occurs at it.
Property 2 (Hazard-free values at NTBLs). The output of a NTBL is always hazard-free, because
data stabilizes at the latch input before the latch becomes transparent.
By considering both hazardous and hazard-free values at the input of each such latch, even when
a hazard-free value is desired at the latch's output, robust tests may be found for some paths for
which such tests could not otherwise be found. Figure 6 shows an example in which time borrowing
is detected only at L1, but both latches operate in normal mode to test a path via L1. The falling
transition is propagated via L1 to test the path shown in bold. Robust propagation of the falling
transition at the on-path input of G4 requires static-1 at its off-path input, which is the output of L2.
However, the output of G3 cannot carry a static-1 signal, because the values at the inputs of G3 are
already determined by the on-path values: a rising transition at one input and a falling transition at
the other. Hence, a conventional test generator will be unable to find a robust test for this path via L1.
On the other hand, our ATPG (automatic test pattern generator) exploits Property 2 and considers a
hazardous-1 signal as well as static-1 at the input of L2. Hence, by exploiting Property 2, our ATPG
can successfully generate a robust test for the target path and improve coverage.
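The use of Property 2 in the ATPG can be sketched as follows. This is a minimal illustration, not the actual implementation; the six-valued signal encoding and the function names are assumptions chosen for readability.

```python
# Hypothetical signal encoding common in robust path-delay-fault ATPG:
# "s0"/"s1" = static, "h0"/"h1" = hazardous (final value 0/1, may glitch),
# "r"/"f" = rising/falling transition.

def ntbl_output(input_value: str) -> str:
    """Output value of a non-time-borrowing latch (NTBL) in normal mode.

    Property 2: since data stabilizes at the latch input before the latch
    becomes transparent, the output is always hazard-free -- a hazardous
    final value at the input appears as a clean static value at the output.
    """
    return {"h1": "s1", "h0": "s0"}.get(input_value, input_value)

def off_path_choices(required_output: str, ntbl: bool) -> set:
    """Input values at a normal-mode off-path latch that yield the
    required hazard-free output value."""
    candidates = {"s0", "s1", "h0", "h1", "r", "f"}
    if ntbl:
        return {v for v in candidates if ntbl_output(v) == required_output}
    # At a time-borrowing latch the input value passes through while the
    # latch is transparent, so only an already hazard-free input works.
    return {required_output}

# Figure 6 scenario: G4 needs static-1 at the off-path input fed by L2, a
# NTBL in normal mode. A conventional ATPG accepts only s1 at L2's input;
# exploiting Property 2 also accepts h1, which the on-path assignments at
# G3 can actually produce.
print(sorted(off_path_choices("s1", ntbl=True)))   # ['h1', 's1']
print(sorted(off_path_choices("s1", ntbl=False)))  # ['s1']
```

The extra `h1` option is exactly what lets the ATPG close the example of Figure 6.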
Figure 6. Property 2 helps improve robust coverage. (Time borrowing occurs at L1 but not at L2; both
L1 and L2 operate in normal mode, and gates G1 through G4 form the circuit under test.)
3.6. A set of scan chain configurations necessary to maximize coverage
The above results provide a significant reduction in the number of scan chain configurations
required to maximize robust PDF coverage, even when time borrowing occurs at unexpected
latches (i.e., latches that are not sites of intentional time borrowing). In [9], the fully-adaptive
approach requires the DFT circuitry to support 2^k scan chain configurations at a level with a total of k
latches. However, Corollary 1 and Figure 5 show that when we detect time borrowing at the i-th latch
in the level, the configuration in which the i-th latch is in normal mode and all the other latches are
in scan mode, by itself, maximizes the coverage for all multi-segment paths that pass via the i-th latch.
Hence, no matter which and how many of the latches at the level are sites of time borrowing, the
robust PDF coverage can be maximized for the paths that pass via TBLs if the DFT supports the
following k single-normal configurations: (n, s, s, ···, s), (s, n, s, ···, s), (s, s, n, ···, s), ···, (s, s, s, ···, n),
where n denotes normal mode and s denotes scan mode. As per Corollary 2, the all-scan configuration
(s, s, s, ···, s) provides the maximum coverage for all paths that pass via latches where no time
borrowing occurs. Of course, we also need the all-normal configuration (n, n, n, ···, n) to support normal
circuit operation. We call these k + 2 configurations the optimal set of scan chain
configurations, meaning that the all-normal, all-scan, and every single-normal configuration are
available at every level of latches.
By decreasing the number of required scan chain configurations from 2^k to k + 2 at a level of k
latches, the proposed approach reduces the complexity of the DFT circuitry and scan chain routing. This
is a significant improvement over [9]. In general, we have the following result.
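The optimal set is easy to enumerate explicitly. A small sketch (the function name is an assumption, configurations are tuples of 'n'/'s' as in the text):

```python
def optimal_configurations(k: int) -> list:
    """The k + 2 scan chain configurations that suffice to maximize robust
    PDF coverage at a level of k latches ('n' = normal, 's' = scan):
    the all-normal, the all-scan, and the k single-normal configurations."""
    all_normal = ("n",) * k
    all_scan = ("s",) * k
    single_normals = [
        tuple("n" if j == i else "s" for j in range(k)) for i in range(k)
    ]
    return [all_normal, all_scan] + single_normals

configs = optimal_configurations(4)
print(len(configs))  # 6, versus 2**4 = 16 for the fully-adaptive DFT of [9]
```

For the four-latch level of Figure 5, this yields exactly the configurations named in the text, e.g. (s, n, s, s) for paths via L12.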
Theorem 7. Maximum possible robust PDF coverage can be attained for any latch-based circuit
independently of the time borrowing scenario, provided that latches at inputs of every combinational
block can be configured in (n, n, ···, n, n), (s, s, ···, s, s), (n, s, ···, s, s), ···, (s, s, ···, n, s), (s, s, ···, s, n),
i.e., the all-normal, the all-scan-in, and all possible single-normal configurations.
3.7. Required tests for maximum coverage
For a faulty chip instance, it is not necessary to cover the entire CUT, since a faulty chip instance
is discarded as soon as a delay fault is identified. In particular, a delay fault in a faulty chip instance
can be detected by any target path in an SCUT that terminates at a primary output and fails r-r tests
(under the assumption that r-f tests are not used), regardless of the time borrowing status at the latch
where the target path begins. In other words, the objective of delay testing for faulty chip instances is
not to compute the delay fault coverage but to identify a delay fault at minimum test application cost.
In contrast, for fault-free chip instances, we must test all target paths that are required
to cover the entire CUT in order to compute the delay fault coverage. Essentially, we are interested in
targeting every path of the CUT that starts at a primary input and terminates at a primary output. A
multi-segment path p that spans from a primary input to a primary output can be tested either by itself
as a single target path or by multiple sub-paths that partition p using scan, where every partition is
made only at a NTBL. For instance, suppose that p starts at a primary input L0, passes via L1, and
terminates at a primary output L2. Let us assume that L1 is known to be a NTBL and that r-r tests are applied
to the sub-path from L0 to L1 and to the sub-path from L1 to L2. If both sub-paths pass the r-r tests, we
can say p is robustly tested, as proven by Theorems 3 and 4.
In summary, for fault-free chip instances, a test generation algorithm must target
every path from a primary input to a primary output, either as a single target path or as multiple
disjoint sub-paths where every partition is made only at a NTBL, such that the coverage is maximized
by selecting the best scan chain configurations as per Theorems 3 through 6 and Corollaries 1 and 2. All
these required tests for robustly testable paths/sub-paths must be generated so that test generation
can conclude and fault coverage can be computed. Hence, we obtain the following result.
Theorem 8. The maximum robust PDF coverage is obtained if scan chain configurations are selected
as per Corollaries 1 and 2, assuming the optimal set of scan chain configurations is available, and
every robustly testable path from a primary input to a primary output is tested either by itself or by
multiple disjoint sub-paths that partition the path at NTBLs.
3.8. Test generation under the optimal set of scan chain configurations
If the optimal set of scan chain configurations (i.e., the all-normal, the all-scan, and every possible
single-normal configuration) is available, we prove in Section 3.8.1 that the proposed test generation
approach guarantees optimal robust PDF coverage. Hence, no other path delay testing approach for
latch-based circuits can obtain higher robust PDF coverage than our proposed approach.
In Section 3.8.2 we describe the test generation procedure under the optimal set of scan chain
configurations. We show that test generation for any single- or multi-segment path in the CUT is
significantly simplified due to the optimality of the coverage the proposed approach achieves.
3.8.1. Theoretical maximum delay fault coverage for latch-based circuits
In general, path delay testing of a structurally long path in a circuit is difficult because the test must
meet specific requirements at many off-path inputs to sensitize the path. As noted in Chapters 1 and 2,
multi-segment paths must be targeted in latch-based high-speed pipelines in case time borrowing
occurs, which is believed to increase the complexity of test generation. However, our proposed latch-
based delay testing approach always achieves the theoretical maximum delay fault coverage of a latch-
based circuit, regardless of the length of the multi-segment paths being targeted or the complexity of
test generation. This also significantly reduces the complexity of the test generation procedure.
We obtain the following new results (Theorems 9, 10, 11, and 12), from which we prove that our
latch-based delay testing approach achieves the theoretical maximum path delay fault coverage, provided
that the optimal set of scan chain configurations is available.
First, in Theorem 9, we identify paths for which it is structurally impossible to generate a robust
test using any scan-based path delay testing method.
Theorem 9. If a single-segment path q in block Ci is robustly untestable when Ci is tested by itself,
then any multi-segment path Q that includes q is also robustly untestable using any scan-based path
delay testing method that controls and observes values only at latches.
Proof: It is given that a robust test does not exist for the path q using any path delay testing method
in which values are applied at the latches at the inputs of Ci and responses are captured at the outputs
of Ci. Note that a robust test does not exist for q even when the latches at the inputs of Ci are
independently controlled. Hence, any multi-segment path Q that includes q cannot be robustly tested,
since no test for Q can robustly sensitize its sub-path q, regardless of the delay testing method used.
Theorem 10 (Theoretical maximum coverage). Suppose that the total number of paths from primary
inputs to primary outputs in a latch-based circuit is N. If m paths out of these N paths include at least
one single-segment sub-path that is robustly untestable, then the theoretical maximum path delay fault
coverage of the latch-based circuit is (N – m)/N for any scan-based method.
Proof: By Theorem 9, no scan-based path delay testing approach can generate a robust test for the m
paths, since each of them includes at least one robustly untestable single-segment sub-path. Hence, no
scan-based path delay testing approach can robustly test more than N – m paths in the circuit.
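The bound of Theorem 10 is simple arithmetic, sketched below (the function name is a hypothetical label for illustration):

```python
def max_robust_pdf_coverage(total_paths: int, untestable_paths: int) -> float:
    """Theoretical maximum robust PDF coverage (Theorem 10): if m of the N
    primary-input-to-primary-output paths contain a robustly untestable
    single-segment sub-path, no scan-based method can exceed (N - m)/N."""
    if not 0 <= untestable_paths <= total_paths or total_paths == 0:
        raise ValueError("require 0 <= m <= N and N > 0")
    return (total_paths - untestable_paths) / total_paths

# E.g., with N = 1000 paths of which m = 40 contain an untestable sub-path:
print(max_robust_pdf_coverage(1000, 40))  # 0.96
```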
Next, the following theorems prove that the proposed latch-based delay testing approach is
guaranteed to robustly cover the remaining N – m paths regardless of the time borrowing status inside
the circuit, attaining the theoretical maximum coverage.
Theorem 11. If every single-segment sub-path that is included in a k-segment path P in an n-stage
latch-based pipeline (1 ≤ k ≤ n) is robustly testable, then P is always robustly testable by the latch-
based path delay testing approach proposed in this chapter.
Proof: For simplicity, we first assume that P is a two-segment path in which two single-segment paths,
p in C0 and q in C1, are connected via latch LD. If p is robustly testable in C0 with a rising (falling)
transition arriving at LD, and q is robustly testable in C1 with a rising (falling) transition departing
from LD, then we prove that the multi-segment path P comprised of p and q is robustly testable in the
SCUT comprised of C0 and C1 by scanning all off-path latches between C0 and C1. As shown in
Figure 7, p is a path from LB to LD in C0, and q is a path from LD to LH in C1. Let a robust test Test_p for
p in C0 apply the vector (VA, VB, VC) at the input latches LA, LB, and LC, where VB is a transition that
propagates via p and terminates at LD with a rising (falling) transition. Let a robust test Test_q for q in
C1 apply the vector (VD, VE, VF) at the input latches LD, LE, and LF, where VD is a rising (falling) transition
that propagates via q. When we target the multi-segment path P comprised of p and q in an SCUT
comprised of C0 and C1 using the latch-based delay testing approach, we can scan LA, LB, and LC as
well as the two off-path latches LE and LF, regardless of the time borrowing status at LD, LE, and LF,
according to Section 3.5. Recall that the on-path latch LD can always be configured in normal mode,
regardless of time borrowing status, according to Section 3.4. A simple combination of Test_p and Test_q
then becomes a robust test vector (VA, VB, VC; VE, VF) for the two-segment path P, where the VD value is a
rising (falling) transition implied by the values of VA, VB, and VC. An example is shown in Figure 7. Note
that since Test_p and Test_q are generated by controlling all off-path latches completely independently, the
robust test (VA, VB, VC; VE, VF) for P is also generated by controlling all off-path latches completely
independently, under the condition that the same type of transition is implied at LD. We can easily
generalize the above result to C0 and C1 with an arbitrary number of inputs, and subsequently to paths
passing via an arbitrary number of blocks. Hence, the proposed latch-based delay testing approach is
guaranteed to generate a robust test for any multi-segment path that is comprised only of robustly
testable single-segment paths.
Figure 7. A robust test for a multi-segment path can be obtained by combining the robust tests
of its single-segment sub-paths: (a) a robust test for p in C0, (b) a robust test for q in C1, and
(c) a robust test for the two-segment path comprised of p and q.
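The construction in the proof of Theorem 11 can be sketched as follows. The vector representation (latch name → signal value) and the concrete values are hypothetical, chosen only to mirror the Figure 7 setup; they are not taken from the figure itself.

```python
# Sketch of Theorem 11's construction: a robust test for a two-segment
# path is built by combining the robust tests of its single-segment
# sub-paths, dropping the value at the on-path latch L_D, whose transition
# is implied by propagation through C0 rather than scanned in.

def combine_subpath_tests(test_p: dict, test_q: dict,
                          on_path_latch: str) -> dict:
    """Merge Test_p (applied at level-0 latches) with Test_q (applied at
    level-1 latches), excluding the on-path latch."""
    combined = dict(test_p)
    combined.update({latch: v for latch, v in test_q.items()
                     if latch != on_path_latch})
    return combined

# Hypothetical values in the spirit of Figure 7:
# Test_p = (VA, VB, VC), where VB is the transition propagating via p;
# Test_q = (VD, VE, VF), where VD is the transition propagating via q.
test_p = {"LA": "s0", "LB": "f", "LC": "s1"}
test_q = {"LD": "r", "LE": "s0", "LF": "s1"}
print(combine_subpath_tests(test_p, test_q, "LD"))
# {'LA': 's0', 'LB': 'f', 'LC': 's1', 'LE': 's0', 'LF': 's1'}
```

The combined vector (VA, VB, VC; VE, VF) contains no entry for LD, matching the proof: the rising transition at LD is implied by (VA, VB, VC).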
Theorems 9, 10, and 11 lead to the following result.
Theorem 12. The proposed latch-based path delay testing approach is guaranteed to achieve the
theoretical maximum delay fault coverage by configuring all off-path latches in scan mode.
Note that Theorem 12 implies that we can achieve the maximum robust PDF coverage even
when we target only the longest multi-segment paths, extending from a primary input to a primary
output, albeit at high test application cost. In other words, the proposed approach optimizes the
robust PDF coverage even when time borrowing occurs ubiquitously throughout the CUT, overcoming
one of the inherent delay testing problems of latch-based circuits with time borrowing. Also, the
maximum robust PDF coverage obtained by the proposed approach is the same as the robust PDF
coverage achievable by testing every block of the pipeline separately in a divide-and-conquer fashion.
3.8.2. Proposed test generation approach
As implied in Theorems 11 and 12, a robust test for a multi-segment path can be constructed
simply by combining tests for single-segment sub-paths of the target multi-segment path. Hence, the
proposed test generation approach initially generates and stores tests for all single-segment paths.
Similar to the basic delay testing strategy described in Section 3.2.4, test application starts from the
first stage of a pipeline and gradually expands/moves to subsequent stages. According to Corollaries
1 and 2, the all-scan configuration is used at a level of latches where the on-path latch is identified as
a NTBL and the single-normal configuration is used at a level where the on-path latch is identified as
a TBL. The test procedure is illustrated next.
First, the tests for single-segment paths in the first block C0 are applied, and the sites of time
borrowing and non-time borrowing are identified at the level-1 latches. For NTBLs at level-1, single-
segment paths in the fan-out of these latches are tested using the single-segment tests in C1 that are
already generated and stored, using the all-scan configuration at the level-1 latches. For TBLs at level-1,
a single-normal configuration is used that places the TBL in normal mode, and a test for each two-
segment path via the TBL is constructed by combining the tests for the corresponding two single-
segment sub-paths in C0 and C1, respectively, excluding only the value for the on-path latch that is
now configured in normal mode. In the same manner, single-segment paths in C2, or multi-segment
paths in C1 + C2 and/or C0 + C1 + C2, are tested based on the time borrowing status at the level-2 latches.
This procedure continues until the last block is considered.
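The per-level configuration choice in this procedure can be sketched as follows. This is a simplified illustration under the optimal set of configurations; the function name and latch labels are assumptions for readability.

```python
# Per-level choice of Section 3.8.2, following Corollaries 1 and 2:
# all-scan when the on-path latch is a NTBL, single-normal when it is a TBL.

def select_level_config(latches: list, tbls: set, on_path_latch: str) -> tuple:
    """Configuration for one level of latches when targeting paths via
    on_path_latch ('n' = normal, 's' = scan)."""
    if on_path_latch not in tbls:
        # Corollary 2: all-scan maximizes quality and coverage for NTBLs.
        return ("s",) * len(latches)
    # Corollary 1: single-normal -- only the time-borrowing on-path latch
    # stays in normal mode.
    return tuple("n" if lat == on_path_latch else "s" for lat in latches)

level1 = ["L11", "L12", "L13", "L14"]
print(select_level_config(level1, tbls={"L12"}, on_path_latch="L12"))
# ('s', 'n', 's', 's')  -- single-normal configuration
print(select_level_config(level1, tbls={"L12"}, on_path_latch="L13"))
# ('s', 's', 's', 's')  -- all-scan configuration
```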
The next section describes our approach for test generation under any set of available scan chain
configurations, including sets that do not include all of the above k + 2 configurations.
3.9. Test generation under limited scan chain configurations
Some scan chain configurations may not be allowed due to considerations such as performance
overheads associated with using scan at particular latches. Also, to further reduce DFT overheads, a
small number of scan chain configurations may be identified during circuit design using the
probability of time borrowing at each latch. Restrictions on scan chain configurations, however,
trigger the complication of not having the optimal configuration available to test a target path under
the particular time borrowing detected in a particular instance of the circuit under test. In this context,
we propose and demonstrate a new test generation approach that optimizes coverage by considering
the time borrowing status of a CUT in combination with the available scan chain configurations. We
demonstrate that this new approach exploits the properties and the theorems presented above for any
available set of scan chain configurations to provide high coverage.
3.9.1. Available scan chain configurations
We assume that the greater the flexibility in the operation of a latch, the higher the overall DFT
overheads. We also assume that the available configurations of latches at each level are determined
prior to test generation (and definitely before any chip instances are tested).
Of course, this test generation algorithm directly covers the case where all scan chain
configurations are available. Property 2 is exploited if a NTBL is operating in normal mode as an off-
path latch.
3.9.2. Test generation strategy
One of the most important parts of test generation under restrictions on the scan chain
configurations is the selection of the best available configuration(s) for each test session. This
selection process is based essentially on Corollaries 1 and 2 and Figure 5. For multi-segment paths
that pass via a latch where time borrowing occurs, the best configuration is the single-normal
configuration in which only the on-path latch is in normal mode. If this single-normal configuration is
not available, a configuration should be chosen, based on Theorems 5 and 6, such that the on-path
latch is in normal mode and as many off-path latches as possible are in scan mode.
In some cases, multiple configurations may be used to target a given set of paths. For example,
suppose that the circuit shown in Figure 4 has four level-1 latches L11, L12, L13, and L14. If
multi-segment paths passing via L12, which is identified as a TBL, are being targeted, the
configuration (s, n, s, s), which places L12 in normal mode, is optimal, as shown in Figure 5. However,
suppose only the following configurations are supported by the DFT: {(n, n, n, n), (s, s, s, s), (s, n, n, n),
(s, n, s, n), (s, n, n, s)}. Among these, we will use two, namely (s, n, s, n) and (s, n, n, s),
since (a) both of these configurations provide better coverage than (n, n, n, n) and (s, n, n, n) (see
Figure 5), and (b) each of these configurations may provide coverage for some paths that the
other may not cover (since there is no arrow from either of these configurations to the other in
Figure 5). We have developed an algorithm that identifies a minimal subset of the available
scan chain configurations to be used for testing any set of target paths, under any given scenario of
time borrowing.
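The dominance rule behind this selection can be sketched as follows (configurations as tuples of 'n'/'s'; the function name is an assumption, not the dissertation's implementation):

```python
# Selection rule of Section 3.9.2: keep only the available configurations
# that place the on-path latch in normal mode and are not dominated --
# per Theorems 5 and 6, configuration A dominates configuration B if A's
# set of latches in scan mode is a proper superset of B's.

def minimal_config_subset(available: list, on_path: int) -> list:
    candidates = [c for c in available if c[on_path] == "n"]
    scan_set = lambda c: {i for i, m in enumerate(c) if m == "s"}
    return [c for c in candidates
            if not any(scan_set(d) > scan_set(c) for d in candidates)]

# The example above: L12 (index 1) is the time-borrowing on-path latch.
available = [("n", "n", "n", "n"), ("s", "s", "s", "s"),
             ("s", "n", "n", "n"), ("s", "n", "s", "n"), ("s", "n", "n", "s")]
print(minimal_config_subset(available, on_path=1))
# [('s', 'n', 's', 'n'), ('s', 'n', 'n', 's')]
```

Both (n, n, n, n) and (s, n, n, n) are eliminated because their scan sets are proper subsets of the survivors', reproducing the choice made in the text.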
In general, multiple versions of SCUTs may exist in one test session (sessions differ in the last
stage of the target SCUTs), because the best scan chain configurations may differ for different target
paths (details are discussed in Section 3.9.3). Since multiple versions of SCUTs are tested, it is also
necessary to avoid testing the same path many times. Our proposed ATPG selects the best set of
SCUTs, manages multiple versions of SCUTs, avoids any unnecessary repetition in the testing of
paths, and properly computes the delay fault coverage for the entire pipeline. The algorithm that
identifies the subset of the available scan chain configurations to be used for testing any set of target
paths, under any given scenario of time borrowing, is shown in Figure 8.
Procedure: ATPG_MultiSCUTs( ) {
  Read the pipeline circuit file and available latch configurations;
  Initialize SCUT_list[level] for each level;
  For each level {
    /* select best configurations for the latches in the current level */
    For each latch of the current level {
      If (time borrowing = true & corresponding single-normal mode is available)
        Select the single-normal mode;
      Else if (time borrowing = false & all-scan mode is available)
        Select the all-scan mode;
      If (no configuration is selected above) {
        Select all compatible configuration modes µ1, µ2, ···, µr;
        For all pairs of configurations µj and µk {
          If µj has a superset of latches in scan mode relative to µk,
            Then eliminate µk;
        }
      }
    }
    /* construct SCUTs */
    For each configuration selected by at least one latch {
      If (selected configuration consists of scan modes only) {
        Construct an SCUT with a single stage;
        Add the new SCUT to SCUT_list[level];
      } Else {
        Combine the selected configuration of the current level with
          all entries of SCUT_list[level – 1];
        Add the new SCUT(s) to SCUT_list[level];
      }
    }
    For all latches of all levels within the longest SCUT {
      If (time borrowing = false)
        Initialize sub-path_list[ ] for this latch to trace tested paths;
    }
    For all stage inputs within the longest SCUT {
      Initialize sub-path_list[ ] for the stage input to trace tested paths;
    }
    For each SCUT in SCUT_list[level] {
      Based on the latch configurations,
        Remove/inactivate transitive fan-ins of latches in scan mode;
        Remove/inactivate transitive fan-ins of stage outputs except for
          those in the last stage;
        Determine the current primary inputs and primary output;
      /* Test of an SCUT */
      Call TestATPG procedure for the current SCUT {
        For each target path {
          Clear line values;
          PreProcessRobust( ) {
            Robustly sensitize the target path;
            If (any line is removed), skip the target;
            If (a latch is met that is in normal mode w/o time borrowing)
              Target the path starting at this latch;
          }
          ATPGprocedure( ) {
            Generate a test for the target path;
            Removed/inactivated parts are ignored;
            Implication( ) considers glitch-free signals at the
              outputs of latches w/o time borrowing;
            Write the test vector to a file;
            Accumulate PDF coverage information;
            A path is not tested more than once;
          }
        }
      }
    } /* end of testing all SCUTs of the current level */
    Get time borrowing results at the current output latches;
  }
}
Figure 8. Proposed ATPG algorithm.
3.9.3. Proposed test generation approach
The proposed test generation approach is illustrated using the three-stage linear pipeline shown in
Figure 9. Suppose that for the level-1 latches (LD, LE, LF), the DFT is designed to support three
configurations, {(n, n, n), (n, s, s), (s, s, s)}, and for the level-2 latches (LG, LH, LI), two
configurations, {(n, n, n), (s, s, s)}. It is assumed that time borrowing occurs at LD, LE, and LI in a
copy of the chip under test.
The SCUTs at each level are shown in gray in Figure 9. Bold solid lines represent the paths
targeted in each SCUT, and dotted lines represent the paths that are not targeted but are used to
apply values at off-path inputs. The fan-ins of latches operating in scan mode are omitted in
Figure 9, since they are not considered by the ATPG. As shown in the figure, the hazard-free property
described in Property 2 is exploited when the non-time-borrowing latches LF, LG, and LH operate in
normal mode as off-path inputs.
The test procedure is summarized as follows. First, C0 is tested (SCUT0) and time borrowing is
detected at LD and LE. To target the two-segment paths via LD, the configuration (n, s, s) is selected at
the level-1 latches to obtain SCUT10. To target the paths via LE, (s, n, s) is desired but not available;
therefore, (n, n, n) is selected for the level-1 latches to obtain SCUT11, and Property 2 is exploited
at the non-time-borrowing off-path latch LF. To target the sub-paths starting from LF, (s, s, s) is
selected for the level-1 latches to obtain SCUT12. During testing of SCUT10, SCUT11, and/or SCUT12, time
borrowing is detected at LI. For the sub-paths starting at LG and LH, the best configuration (s, s, s) is
used for the level-2 latches. Lastly, to test the multi-segment paths via LI, (n, n, n) is selected for the
level-2 latches. When combined with each of the three previous SCUTs, namely SCUT10, SCUT11, and
SCUT12, this gives rise to SCUT21, SCUT22, and SCUT23.
Figure 9. Test procedure – managing multiple SCUTs. The three-stage pipeline (blocks C0, C1, C2
between latch levels 0 through 3, with latches LA through LL) is tested as a sequence of SCUTs:
SCUT0 tests C0; SCUT10 with level-1 configuration (n, s, s) tests two-block paths via LD; SCUT11 with
(n, n, n) tests two-block paths via LE; SCUT12 with (s, s, s) tests sub-paths starting at LF; SCUT20 with
level-2 configuration (s, s, s) tests sub-paths starting at LG and LH; and SCUT21, SCUT22, and SCUT23
with level-2 configuration (n, n, n) test three-block paths via LD and LI, three-block paths via LE and
LI, and two-block sub-paths starting at LF and passing via LI, respectively. The figure marks time
borrowing sites, on-path and off-path lines within each SCUT, and the active (shaded) region of each
SCUT.
3.10. Experimental results and comparison
The proposed approach is applied to several circuits for various time borrowing scenarios under
a diverse set of available scan chain configurations. The circuits include a five-stage linear pipelined
array multiplier; five- and ten-stage versions of a linear pipeline that uses copies of the circuit C17
from the ISCAS '85 benchmark suite (the connections among the stages are based on the pipeline
used in [9]); five- and ten-stage versions of the pipeline MIN (a minimum vector selector from [9]);
and five- and ten-stage versions of a pipeline that uses copies of T1 from [1].
To verify the improved coverage provided by the proposed method, the robust PDF coverage
and the number of tests are compared for four different approaches under the given scan chain
configurations.
3.10.1. Test generation approaches
(1) Classical (Classical approach). The entire CUT is tested as a single SCUT, since classical
approaches cannot use DFT for delay testing of latch-based circuits with time borrowing.
(2) ITC03.ext (An extended version of our previous approach presented in [9]). The approach in
[9] is modified to deal with cases where not all scan chain configurations are available.
In particular, the approach in [9] does not consider any restriction on the scan chain
configurations, and configures the latches such that (a) normal mode is used for all latches that are
sites of time borrowing, and (b) scan mode is used for all latches that are not sites of time borrowing.
Note that condition (b) is not required, but is used to attain higher coverage. Hence, the extended version of the approach in [9] (ITC03.ext) is implemented such that it chooses, for a level of latches, a single configuration that satisfies condition (a) and has the most NTBLs in scan mode. If multiple configurations satisfy these two conditions and have the same number of latches in scan mode, ITC03.ext arbitrarily selects one of them.
(3) Proposed.v1 (The proposed approach without using Property 2). We claim that the approach we propose improves the test quality due to two new features:
(F1) Improvements due to Theorems 5 and 6 (and Corollaries 1 and 2), which suggest using scan mode for as many off-path latches as possible.
(F2) Improvements due to Property 2, which utilizes the hazard-free property at the outputs of latches that are not sites of time borrowing but are configured in normal mode.
The first version of the proposed method, Proposed.v1, implements F1 only.
(4) Proposed.v2 (The entire proposed approach). The final version of the proposed approach, Proposed.v2, implements both new features, i.e., F1 as well as F2.
The robust PDF coverage for ITC03.ext is always greater than or equal to that for the classical approach, regardless of what scan chain configurations are available. Proposed.v1 may improve test quality compared to ITC03.ext if Theorems 5 and 6 (i.e., F1) are applicable. In Proposed.v2, the test quality can be further improved compared to Proposed.v1 if Property 2 (i.e., F2) is applicable.
3.10.2. Trends in the experimental results
In practice, the latches at a particular level of a pipeline are connected by a scan chain to scan out the captured responses while testing the SCUTs that terminate at those latches. In this case, we can use the same scan chain to scan in vectors to test the SCUTs that start at these latches. Hence, we generally assume that the all-scan configuration is available by default at any level of latches where all latches can be scanned out. For this reason, we present in Table 1 the experimental results for cases where the all-scan configuration is assumed to be available at every level of latches.
We have performed an extensive set of experiments for all above circuits (a linear pipelined
multiplier and five- and ten-stage versions of linear pipelines using copies of C17 [9], MIN [9], and
T1 [1]), assuming arbitrary scan chain configurations, and under various time borrowing scenarios to
demonstrate the benefits of the proposed techniques. The complete results for all circuits for these
diverse scan chain configurations can be found in [11]. Here we present a small subset of results that
illustrate the typical trends.
Table 1 shows the test generation results for a five-stage pipelined multiplier for two different scan chain configurations, namely configuration-A and configuration-B, and six different time borrowing scenarios, namely S1 to S6. Scan chain configuration-A assumes that only the all-normal and all-scan configurations are available at every level of latches (see the third and fourth columns of Table 1). Scan chain configuration-B assumes that the all-normal, the all-scan, and all k single-normal configurations, i.e., the configurations {(n, s, s, ..., s), (s, n, s, ..., s), (s, s, n, ..., s), ..., (s, s, s, ..., n)}, are available at every level of k latches (see the fifth and sixth columns of Table 1). The time borrowing scenarios are listed starting with the scenario with the fewest time borrowing sites (S1: no time borrowing) and ending with the scenario with the most time borrowing sites (S6: time borrowing at every latch).
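The configuration-B set is easy to enumerate programmatically. Below is a minimal Python sketch (the function name is illustrative, not from the thesis) that builds the all-normal, all-scan, and k single-normal configurations for a level of k latches, k + 2 configurations in total:

```python
# Illustrative helper (not from the thesis): enumerate the scan chain
# configurations assumed available under configuration-B for a level of
# k latches, where 'n' denotes normal mode and 's' denotes scan mode.
def config_b_set(k):
    all_normal = ('n',) * k
    all_scan = ('s',) * k
    # the k single-normal configurations: exactly one latch in normal mode
    single_normal = [tuple('n' if i == j else 's' for i in range(k))
                     for j in range(k)]
    return [all_normal, all_scan] + single_normal

configs = config_b_set(4)
print(len(configs))  # k + 2 = 6 configurations
```

For scenario S6 discussed below, these k + 2 configurations per level suffice to reach 100% robust PDF coverage under configuration-B.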
Table 1. Five-stage pipeline of array multiplier.

                                    Available scan chain configurations at every level
                                    (A) all-normal, all-scan     (B) all-normal, all-scan,
                                                                     every single-normal
Time borrowing sites   Approach     No. of    Robust PDF         No. of    Robust PDF
                                    tests     coverage (%)       tests     coverage (%)
S1: None               Classical      640       9.54               640       9.54
                       ITC03.ext      478     100                  478     100
                       Proposed.v1    478     100                  478     100
                       Proposed.v2    478     100                  478     100
S2: L12, L14,          Classical      640       9.54               640       9.54
    L34, L36, L37      ITC03.ext      902      95.35               902      95.35
                       Proposed.v1    764      96.66+              786     100+
                       Proposed.v2    764      96.66+              786     100+
S3: L10, L14,          Classical      640       9.54               640       9.54
    L24, L28, L33,     ITC03.ext     1459       9.54              1443      36.61
    L39, L44           Proposed.v1    755      82.40+             1008     100+
                       Proposed.v2    782      85.16+°            1008     100+
S4: L21, L24,          Classical      640       9.54               640       9.54
    L27, L31, L33,     ITC03.ext     2158      18.65              2158      18.65
    L36, L42, L44      Proposed.v1   1174      61.74+             3178     100+
                       Proposed.v2   1280      63.13+°            3178     100+
S5: L14, L15,          Classical      640       9.54               640       9.54
    L21, L22, L27,     ITC03.ext     1459       9.54              1443      36.61
    L29, L30, L33,     Proposed.v1    855      77.36+             1166     100+
    L37, L44           Proposed.v2    858      77.68+°            1166     100+
S6: All latches        Classical      640       9.54               640       9.54
                       ITC03.ext     1459       9.54              1459       9.54
                       Proposed.v1   1459       9.54             10592     100+
                       Proposed.v2   1459       9.54             10592     100+
+: Theorems 5 and 6 (i.e., F1) improve the coverage from ITC03.ext
°: Property 2 (i.e., F2) improves the coverage from ITC03.ext
In Table 1 we also report the reasons behind the test quality improvements provided by the proposed approaches Proposed.v1 and Proposed.v2 compared to the test quality of ITC03.ext. Note that the number of tests may not directly quantify test application time, because the number of stages constituting each SCUT may vary depending on the configurations used and the time borrowing scenario.
In all scenarios except the one where time borrowing occurs at none of the latches, the proposed approach improves the robust PDF coverage significantly, sometimes while applying far fewer tests. For example, in scan chain configuration-A with time borrowing scenario S3, ITC03.ext requires more than twice the number of tests needed by the classical approach while obtaining the same low robust PDF coverage (9.54%). In contrast, the proposed approaches Proposed.v1 and Proposed.v2 achieve much higher coverages, namely 82.40% and 85.16%, respectively, while using far fewer tests, due to Theorems 5 and 6 (i.e., F1 in Section 3.10.1) and Property 2 (i.e., F2 in Section 3.10.1).
Even in the extreme time borrowing scenario S6, where time borrowing occurs at every latch, the proposed approach under scan chain configuration-B achieves 100% robust PDF coverage due to Theorems 5 and 6 (i.e., F1), using only k+2 scan chain configurations at a level with k latches, while ITC03.ext is unable to improve on the coverage of 9.54%.
Similar improvements are observed in the pipelines using copies of MIN, C17, and T1 as
demonstrated in [11]. Also as expected, it is confirmed that Property 2 improves the robust PDF
coverage in almost all scan chain configurations where the all-scan configuration is not available.
With regard to the effect of pipeline length (the number of stages) on the test quality, our results
for the five- and ten-stage pipelines using MIN, C17, and T1 in [11] illustrate that as the number of
stages increases, the coverage decreases for the classical approach and ITC03.ext, while the coverage
for the proposed approach does not decrease significantly. This shows that the proposed approach is
more efficient for delay testing of pipelines with more stages when compared to the classical and
ITC03.ext approaches.
The experiments demonstrate that our test approach, when applied under restricted scan chain configurations, sacrifices little of the benefit of the fully-adaptive approach while dramatically decreasing DFT overhead by using far fewer scan chain configurations. In the next chapter, we propose an approach to minimize test application time.
CHAPTER 4
Test application cost minimization under maximum coverage
4.1. Motivation example
In Chapter 3, we focused on the optimization of robust PDF coverage and limited ourselves to
approaches where test application starts from the first stage of a pipeline and gradually
expands/moves to subsequent stages. The following example demonstrates the potential benefits of
alternative test schedules that lead to reduction of test application cost. Note that all the assumptions
made in Chapter 3 also apply in this chapter.
Consider a simple two-stage pipeline as shown in Figure 10. Let us assume that scan chain configurations (n, n), (n, s), (s, n), and (s, s) are available for the level-1 latches (L1, L2). Hence, four SCUTs may be constructed for testing: SCUT0 (C0 by itself), SCUT10 (C1 by itself, using configuration (s, s) for the level-1 latches), SCUT11 (C0 + C1, using configuration (n, s) for the level-1 latches), and SCUT12 (C0 + C1, using configuration (s, n) for the level-1 latches). According to Chapter 3, the maximum robust PDF coverage can be obtained when SCUTs are chosen for testing based on the time borrowing status at the level-1 latches. For example, according to Corollary 2, SCUT0 and SCUT10 must be tested if time borrowing does not occur at L1 and L2.
In order to compare different test schedules and their test application costs, each SCUT is viewed as a collection of multiple sets-of-paths under test (SPUTs). An SPUT is a group of all paths that start at a particular input latch, pass through a particular set of latches (if any), and terminate at a particular output latch. The paths in an SPUT are viewed as parts of an SCUT, which determines the latches used and their scan chain configurations. In other words, an SPUT specifies a group of paths as well as the scan chain configurations in use. In our notation, additional subscripts are used to distinguish the various SPUTs that constitute an SCUT: the hyphenated numbers in the subscript give the indices of the latches included in the SPUT, and the number in parentheses gives the index of the SCUT that contains the SPUT. For example, SPUT0-2-3(12) is the group of all paths that start at L0, pass via L2, and terminate at L3, where the scan chain configurations are as specified in SCUT12.
In Figure 10, SCUT0 includes two SPUTs: one containing all paths that start at L0 and terminate at L1 (SPUT0-1(0)), and one containing all paths that start at L0 and terminate at L2 (SPUT0-2(0)). SCUT10 includes two SPUTs: SPUT1-3(10) and SPUT2-3(10). SCUT11 contains three SPUTs: SPUT0-1-3(11), SPUT1-3(11), and SPUT2-3(11). SCUT12 contains three SPUTs: SPUT0-2-3(12), SPUT1-3(12), and SPUT2-3(12). All SPUTs of the circuit in Figure 10 are listed in the second column of Table 2.
Note that SCUT11 does not include the group of all paths that start at L0 and terminate at L2 (SPUT0-2(11)). To capture responses for the testing of SPUT0-2(11), SCUT11 would have to reconfigure L2 in r-capture scan-out mode, whereas L2 is configured in scan mode in SCUT11 to apply values for the testing of SPUT0-1-3(11), SPUT1-3(11), and SPUT2-3(11). This modification of the scan chain configuration amounts to converting SCUT11 into SCUT0 as far as the paths between L0 and L2 are concerned. Hence, we need not test SPUT0-2(11) when SPUT0-2(0) is used. For the same reason, SCUT12 does not include SPUT0-1(12).
Note that the paths in SPUT1-3(11) and SPUT2-3(12) are tested using Property 1 when time borrowing does not occur at L1 and L2, respectively. Also note that the paths that start at L1 and terminate at L3 can be targeted as parts of more than one SCUT (i.e., as SPUT1-3(10), SPUT1-3(11), and SPUT1-3(12)), and the paths that start at L2 and terminate at L3 can likewise be targeted as parts of more than one SCUT (i.e., as SPUT2-3(10), SPUT2-3(11), and SPUT2-3(12)).
Let us assume that the SPUTs can be tested in any order, independent of the SCUTs to which
they belong. Note that the test schedule of the test generation approach described in Chapter 3 is a
special case in which all SPUTs associated with each SCUT are tested one after another. The SPUT formulation hence performs test scheduling at a finer level of granularity. We will further justify the use of
SPUTs in Section 4.3.1. The primary focus of this motivation example is to introduce the fundamental
ideas of the overall optimization problem and our proposed approach.
[Figure 10 shows a two-stage pipeline: latch L0 at level-0 feeds logic block C0, whose outputs are captured by latches L1 and L2 at level-1; L1 and L2 feed logic block C1, whose output is captured by latch L3 at level-2.]
Figure 10. A two-stage pipeline example.
For this two-stage pipeline example, assume that the maximum robust PDF coverage is 95%. The number of tests for each SPUT is shown in the fourth column of Table 2. In order to evaluate the average test application cost of different test schedules, the test application cost is assumed to be proportional to the number of tests applied. (This implicitly ignores costs that may be associated with reconfiguring SCUTs and assumes that every test configuration requires an equal number of scan clocks. However, the same ideas can easily be extended to a more realistic definition of test cost, as discussed in Section 6.1.1.)
Three test schedules are considered in Sections 4.1.1 to 4.1.3 to show that the test application cost can be improved further without compromising the key benefit of the approach presented in Chapters 2 and 3, namely high path delay fault coverage. Test schedule 1 in Section 4.1.1 shows the limitations of a non-adaptive approach. Test schedule 2 in Section 4.1.2 is based on the test schedule implied in Chapter 3. Test schedule 3 in Section 4.1.3 shows that the test application cost can be reduced below that of Test schedule 2.
Although the paths that start at L1 and terminate at L3 are included in SPUT1-3(10), SPUT1-3(11), and SPUT1-3(12), the coverage of the paths between L1 and L3 using SPUT1-3(10) is greater than or equal to the coverages using SPUT1-3(11) and SPUT1-3(12), as per Theorem 6 and Corollary 1. Likewise, the coverage of the paths between L2 and L3 for SPUT2-3(10) is greater than or equal to the coverages for SPUT2-3(11) and SPUT2-3(12). In reality, there may exist cases where applying tests to the paths in SPUT2-3(11) as a part of SCUT11 is better than applying tests to the same paths in the form SPUT2-3(10), i.e., as a part of SCUT10, provided that the scan chain configurations of the circuit include the one required by SCUT11, the cost associated with reconfiguring the circuit as SCUT10 is high, and the coverages from SPUT2-3(11) and SPUT2-3(10) are identical. However, since the test application cost function is simplified in this chapter such that it is solely determined by the number of tests applied, it is assumed that test schedules prefer SPUT1-3(10) to both SPUT1-3(11) and SPUT1-3(12), and prefer SPUT2-3(10) to both SPUT2-3(11) and SPUT2-3(12).
Test schedules are applied to a set of chip instances characterized by Table 2, where the personality of each chip instance (chip personality) is characterized by the results of r-r tests for the SPUTs. (For simplicity, r-f tests are not included in this example.) It is assumed for simplicity that every chip instance has one of nine chip personalities, P-1 to P-9, with the characteristics shown in Table 2; the percentage of chip instances that have each of these chip personalities is shown in the third row of Table 2. Although there are ten SPUTs, each of which either fails or passes r-r tests, not all 2^10 combinations of the r-r test results are possible due to dependencies among SPUTs. For example, if both SPUT0-1(0) and SPUT1-3(10) pass r-r tests, SPUT0-1-3(11) is guaranteed to pass r-r tests. If SPUT0-1(0) passes r-r tests and SPUT0-1-3(11) fails r-r tests, then SPUT1-3(10) is guaranteed to fail r-r tests.
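These dependencies can be checked mechanically. Below is a minimal Python sketch under an assumed encoding where True means an SPUT passes its r-r tests; note that the two implications stated above are two forms of a single constraint:

```python
# Sketch (assumed encoding, True = passes its r-r tests): both implications
# above reduce to one constraint: SPUT0-1(0) and SPUT1-3(10) cannot both
# pass while the concatenated SPUT0-1-3(11) fails.
def consistent(p01, p13, p0113):
    return not (p01 and p13 and not p0113)

assert consistent(True, True, True)       # e.g., chip personality P-1
assert not consistent(True, True, False)  # the impossible combination
assert consistent(True, False, False)     # e.g., chip personality P-2
assert consistent(False, True, False)     # e.g., chip personality P-5
```

All nine personality columns of Table 2 satisfy this predicate.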
Table 2. The characteristics of chip instances under test based on the results for r-r tests.

                                            Chip personalities: characteristics of chip instances based on
                                            the r-r test results, and their distribution
                                      No. of  P-1   P-2   P-3   P-4   P-5   P-6   P-7   P-8   P-9
SCUT             SPUT_A(B)**  Latches tests   45%*  10%*  18%*  11%*  9%*   3%*   2%*   1%*   1%*
SCUT0 (C0)       SPUT0-1(0)   L0-L1      4    Pass  Pass  Pass  Fail  Fail  Pass  Pass  Fail  Fail
                 SPUT0-2(0)   L0-L2      6    Pass  Pass  Pass  Pass  Pass  Fail  Fail  Fail  Fail
SCUT10 (C1);     SPUT1-3(10)  L1-L3      6    Pass  Fail  Pass  Pass  Pass  Pass  Pass  Pass  Pass
(s,s)            SPUT2-3(10)  L2-L3      8    Pass  Pass  Fail  Pass  Pass  Pass  Pass  Pass  Pass
SCUT11 (C0+C1);  SPUT0-1-3(11) L0-L1-L3 12    Pass  Fail  Pass  Pass  Fail  Pass  Pass  Pass  Fail
(n,s)            SPUT1-3(11)  L1-L3      6    Pass  Fail  Pass  Pass  Pass  Pass  Pass  Pass  Pass
                 SPUT2-3(11)  L2-L3      8    Pass  Pass  Fail  Pass  Pass  Pass  Pass  Pass  Pass
SCUT12 (C0+C1);  SPUT0-2-3(12) L0-L2-L3 24    Pass  Pass  Fail  Pass  Pass  Pass  Fail  Pass  Fail
(s,n)            SPUT1-3(12)  L1-L3      6    Pass  Fail  Pass  Pass  Pass  Pass  Pass  Pass  Pass
                 SPUT2-3(12)  L2-L3      8    Pass  Pass  Fail  Pass  Pass  Pass  Pass  Pass  Pass
Existence of a fault                          fault-free faulty faulty fault-free faulty fault-free faulty fault-free faulty
Maximum coverage                              95%   ·     ·     95%   ·     95%   ·     95%   ·
Time borrowing site                           None (P-1 to P-3); L1 only (P-4, P-5); L2 only (P-6, P-7); L1 and L2 (P-8, P-9)
*: The percentage of chip instances with the particular chip personality.
**: A lists the indices of the latches included in SPUT_A(B); B is the index of the SCUT_B that covers SPUT_A(B).
In this section, we assume that chip personality distribution is available and use it to compare
the efficiencies of different test schedules. More details on how to obtain the chip personality
distribution are presented in Section 6.1.2 and Appendix.
According to Table 2, the chip instances that belong to P-1, P-4, P-6, and P-8 are fault-free. The chip instances that belong to P-1, P-2, and P-3 have no time borrowing at the level-1 latches; the chip instances that belong to P-4 and P-5 have time borrowing only at L1; the chip instances that belong to P-6 and P-7 have time borrowing only at L2; and the chip instances that belong to P-8 and P-9 have time borrowing at both L1 and L2.
4.1.1. Test schedule 1
Suppose that a test engineer designed Test schedule 1 with the expectation that time borrowing would occur at L1 only and not at L2, where tests are applied in the non-adaptive order specified in Figure 11.
[Figure 11 shows the fixed test order: T1 (SPUT0-2(0)), then T2 (SPUT2-3(10)), then T3 (SPUT0-1-3(11)).]
Figure 11. Test schedule 1: Average cost = 23.84.
The test procedure and results are summarized as follows. Note that Nk refers to the number of tests for step k and Rk refers to the percentage of chips that are tested in step k (e.g., N2 and R2 for T2). First, the tests for SPUT0-2(0) are applied to all chip instances (R1 = 100%, N1 = 6), where the chip instances in P-6, P-7, P-8, and P-9 fail these r-r tests, i.e., time borrowing is detected at L2 for P-6, P-7, P-8, and P-9. After that, the tests for SPUT2-3(10) are applied to all chip instances (R2 = 100%, N2 = 8), where the chips in P-3 (18%) fail r-r tests, i.e., the chip instances in P-3 (18%) are identified as faulty chips and discarded. Then, the tests for SPUT0-1-3(11) are applied to the remaining 82% of the chip instances (R3 = 82%, N3 = 12), where the chips in P-2, P-5, and P-9 (20%) are identified as faulty chips and discarded.
The overall average test application cost per chip is 23.84 (= Σ_{k=1..3} Nk × Rk). This test schedule reports a robust PDF coverage of 95% for the chips in P-1, P-4, P-6, P-7, and P-8. However, the reported coverage is correct only for chip instances of types P-1 and P-4. Since Test schedule 1 is unable to adaptively change the subsequent SPUTs, it fails to test the multi-segment paths via L2 for P-6 and P-8 although time borrowing is detected by SPUT0-2(0), and hence the robust delay coverage reported for P-6 and P-8 is invalid. Moreover, Test schedule 1 fails to identify the faulty chips of type P-7, i.e., it results in test escape.
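The cost computation can be sketched as follows, with the step sizes and fractions taken from the description above:

```python
# Sketch: the average test application cost of Test schedule 1, computed
# as sum(N_k * R_k) over the three steps described above.
steps = [
    (6, 1.00),   # T1: SPUT0-2(0), applied to all chips
    (8, 1.00),   # T2: SPUT2-3(10), applied to all chips
    (12, 0.82),  # T3: SPUT0-1-3(11), applied to the 82% surviving T2
]
avg_cost = sum(n * r for n, r in steps)
print(round(avg_cost, 2))  # 23.84
```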
In summary, the above results indicate that it is necessary to design a test schedule such that it does not allow any test escape and it achieves the maximum robust PDF coverage for each chip instance. Also, the test scheduling must be capable of adjusting the subsequent SPUTs adaptively such that all TBLs are identified and all multi-segment paths via TBLs are tested. This can be done by testing SPUT0-2-3(12) for the chips that fail SPUT0-2(0) (P-6, P-7, P-8, and P-9).
4.1.2. Test schedule 2
Test schedule 2 follows what we propose in Chapter 3, where tests are performed from the first stage of a pipeline and extended to include subsequent stages by constructing SCUTs adaptively based on the time borrowing sites identified during each SCUT testing. Hence, all SPUTs within SCUT0 are targeted first. Based on the level-1 latches identified as sites of time borrowing, multi-segment paths and/or single-segment paths in C1 are targeted adaptively. This schedule is summarized in Figure 12.
After T1 and T2, all time borrowing sites are identified. Accordingly, P-1, P-2, and P-3 continue with T3; P-4 and P-5 with T3'; P-6 and P-7 with T3''; and P-8 and P-9 with T3'''. At T3, the chip instances in P-2 (10%) are identified as faulty and discarded. At T4, the chip instances in P-3 (18%) are identified as faulty and discarded. At T4', the chip instances in P-5 (9%) are identified as faulty and discarded. At T4'', the chip instances in P-7 (2%) are identified as faulty and discarded. At T3''', the chip instances in P-9 (1%) are identified as faulty and discarded. The average value of the total test application cost is computed as follows:
Overall average test application cost
  = 100% × (N1 + N2) + 73% × N3 + 63% × N4 + 20% × (N3' + N4')
    + 5% × (N3'' + N4'') + 2% × N3''' + 1% × N4''' = 25.4.
As described in Chapter 3, all faulty chip instances are identified, and all fault-free chip
instances are tested robustly with the corresponding maximum coverage since all SCUTs that are
required to achieve the maximum coverage defined by Chapter 3 are tested by Test schedule 2.
[Figure 12 shows the adaptive schedule: T1 (SPUT0-1(0)) and T2 (SPUT0-2(0)) are applied to all chips, identifying the time borrowing sites. The schedule then branches: no time borrowing (73%; P-1, P-2, P-3) continues with T3 (SPUT1-3(10)) and T4 (SPUT2-3(10)); L1 only (20%; P-4, P-5) with T3' (SPUT2-3(10)) and T4' (SPUT0-1-3(11)); L2 only (5%; P-6, P-7) with T3'' (SPUT1-3(10)) and T4'' (SPUT0-2-3(12)); L1 and L2 (2%; P-8, P-9) with T3''' (SPUT0-1-3(11)) and T4''' (SPUT0-2-3(12)). T3 discards P-2 (10%), T4 discards P-3 (18%), T4' discards P-5 (9%), T4'' discards P-7 (2%), and T3''' discards P-9 (1%).]
Figure 12. Test schedule 2: Average cost = 25.4.
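Under the simplified cost model, the average cost of Test schedule 2 can be reproduced from per-step (fraction of chips reached, number of tests) pairs. A sketch; the fractions follow the branch percentages and discards described above:

```python
# Sketch: expected cost of Test schedule 2 from (R_k, N_k) pairs.
steps = [
    (1.00, 4),   # T1: SPUT0-1(0), all chips
    (1.00, 6),   # T2: SPUT0-2(0), all chips
    (0.73, 6),   # T3: SPUT1-3(10), no-time-borrowing branch
    (0.63, 8),   # T4: SPUT2-3(10), after T3 discards P-2 (10%)
    (0.20, 8),   # T3': SPUT2-3(10), time borrowing at L1 only
    (0.20, 12),  # T4': SPUT0-1-3(11)
    (0.05, 6),   # T3'': SPUT1-3(10), time borrowing at L2 only
    (0.05, 24),  # T4'': SPUT0-2-3(12)
    (0.02, 12),  # T3''': SPUT0-1-3(11), time borrowing at L1 and L2
    (0.01, 24),  # T4''': SPUT0-2-3(12), after T3''' discards P-9 (1%)
]
avg_cost = sum(r * n for r, n in steps)
print(round(avg_cost, 2))  # 25.4
```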
4.1.3. Test schedule 3
Another interesting test schedule is designed to demonstrate that Test schedule 2, which is based
on Chapter 3, can be improved in terms of test application cost without compromising the robust PDF
coverage. Test schedule 3 is shown in Figure 13.
Whenever an SPUT that terminates at a level-1 latch is tested, the time borrowing status at the
output latch is checked and multiple alternatives for subsequent testing are explored as shown in
Figure 13. In this test schedule, all fault-free chip instances are tested robustly with the maximum
coverage since all SCUTs that are required to achieve the maximum coverage defined by Chapter 3
are tested in Test schedule 3. The overall average test application cost is 22.68, computed as in Section 4.1.2. This is a 10.7% improvement in test application cost compared to the cost of Test schedule 2, while identical robust PDF coverage is obtained for each chip personality. Thus, this example illustrates that we can further improve the test application cost via optimal test scheduling.
This example also illustrates that the overall test application cost depends not only on the test application costs of the SPUTs used in the test schedule but also on the probabilities of time borrowing at the latches. Hence, the complexity of the test scheduling problem grows with the number of logic blocks and the number of latches.
[Figure 13 shows the adaptive schedule: T1 (SPUT2-3(10)) is applied to all chips and discards P-3 (18%); T2 (SPUT0-1(0)) then determines time borrowing at L1. If yes (22%; P-4, P-5, P-8, P-9): T3 (SPUT0-1-3(11)) discards P-5 and P-9 (10%), and T4 (SPUT0-2(0)) determines time borrowing at L2: if yes (1%; P-8), T5 (SPUT0-2-3(12)) follows; if no (11%; P-4), no more tests are necessary. If no (60%; P-1, P-2, P-6, P-7): T3' (SPUT1-3(10)) discards P-2 (10%), and T4' (SPUT0-2(0)) determines time borrowing at L2: if yes (5%; P-6, P-7), T5' (SPUT0-2-3(12)) follows and discards P-7 (2%); if no (45%; P-1), no more tests are necessary.]
Figure 13. Test schedule 3: Average cost = 22.68.
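A corresponding sketch for Test schedule 3; the per-step fractions below are our reading of the branch percentages in Figure 13 and are assumptions, with only the 22.68 total stated in the text:

```python
# Sketch: expected cost of Test schedule 3 from (R_k, N_k) pairs.
# The fractions are a reconstruction of the Figure 13 branches.
steps = [
    (1.00, 8),   # T1: SPUT2-3(10), all chips; discards P-3 (18%)
    (0.82, 4),   # T2: SPUT0-1(0), survivors of T1
    (0.22, 12),  # T3: SPUT0-1-3(11), TB at L1 (P-4, P-5, P-8, P-9)
    (0.12, 6),   # T4: SPUT0-2(0), after T3 discards P-5, P-9 (10%)
    (0.01, 24),  # T5: SPUT0-2-3(12), TB also at L2 (P-8)
    (0.60, 6),   # T3': SPUT1-3(10), no TB at L1 (P-1, P-2, P-6, P-7)
    (0.50, 6),   # T4': SPUT0-2(0), after T3' discards P-2 (10%)
    (0.05, 24),  # T5': SPUT0-2-3(12), TB at L2 (P-6, P-7)
]
avg_cost = sum(r * n for r, n in steps)
print(round(avg_cost, 2))  # 22.68
```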
4.1.4. The overall optimization problem
In order to accomplish the overall optimization of DFT design and testing for latch-based high-speed circuits with time borrowing, the minimization of test application cost must be achieved under the constraint that the test scheduling method guarantees the maximum delay fault coverage obtainable by the delay testing method proposed in Chapter 3. Next we present a systematic approach for the overall optimization problem under this constraint.
4.2. Unique characteristics of the optimization problem
The test application cost minimization problem is similar to the classical test scheduling (or test
scoring) problem [20][24][31], to the extent that the order in which test vectors are applied is
important to reduce the expected value of test application cost. However, as we explain in this section,
our test application cost problem has some unique characteristics that make it more challenging than
the classical test scheduling problem. In particular, passing/failing test results have different
implications and benefits that depend on the time borrowing status at the input latch, the type of
output latch (whether it is a primary output), and dependencies among tests. Furthermore, we must
apply tests adaptively according to time borrowing status identified during testing.
In the test approach proposed in Chapter 3 and the motivation example in Section 4.1, for
simplicity of analysis it is assumed that a chip under test has either skipped r-f tests (since extreme
time borrowing was not expected) or passed all r-f tests applied (since extreme time borrowing does
not exist). However, in some chips, r-f tests may be used to identify faulty chip instances before
applying many r-r tests, which can reduce the test application cost. On the other hand, applying all r-f
tests for every SPUT may be impractically costly. Our preliminary ideas on how to incorporate r-f
tests as well as r-r tests into the overall optimization problem are considered in Section 4.2.1, where
we identify the meanings of test failure. The benefits of r-f tests as well as r-r tests are discussed in
Section 4.2.2. However, for simplicity, the rest of the problem formulation is carried out without
considering r-f tests. See Section 6.1.3 for more details regarding the extension that considers r-f tests
as well as r-r tests.
4.2.1. Meaning of test results – Dependencies among SPUTs
In conventional delay testing, a failing test simply identifies a faulty chip at that particular clock frequency, which can be discarded immediately without any further testing. In contrast, in our framework, failing a test does not necessarily mean that the chip under test is faulty. In addition, some tests are dependent on other tests. For instance, even when r-r tests for a target path pass, the latch at the end of the path (output latch) may still be a site of time borrowing if the latch at the beginning of the path (input latch) is a TBL. In other words, passing r-r tests can definitively identify the time borrowing status at the output latch only if the input latch is identified as an NTBL. Table 3 summarizes the meanings of all possible results of r-r and r-f tests for a target SPUT that starts at Lin (input latch) and ends at Lout (output latch), under different conditions on Lin and Lout. Note that Lin is considered an NTBL if it is a primary input.
Table 3. The implication of test results for a target SPUT from Lin to Lout.

Case  r-r     r-f     Is Lout a   Time       Time borrowing  Meaning
      tests   tests   primary     borrowing  at Lout
                      output?     at Lin?
1     (Fail)  Fail    No          Yes/No     Yes             Faulty chip instance
2     Fail    n/a*    Yes         Yes/No     Yes             Faulty chip instance
3     Fail    Pass    No          Yes/No     Yes             Multi-segment paths via Lout must be tested
4     Pass    (Pass)  No          No         No for the      Target SPUT does not borrow time at Lout
                                             target SPUT
5     Pass    n/a*    Yes         No         No for the      Target SPUT is fault-free
                                             target SPUT
6     Pass    (Pass)  No          Yes        Unknown         Cannot determine time borrowing status at
                                                             Lout since Lin is a time borrowing latch
7     Pass    n/a*    Yes         Yes        Unknown         Cannot determine time borrowing status at
                                                             Lout since Lin is a time borrowing latch
( ): test result automatically known from the other test result.
*: r-f tests are not applicable (n/a) since Lout is a primary output.
If Lout for an SPUT is a primary output of the latch-based part (Lout may be connected to flip-flop-based blocks), r-f tests are not necessary and hence only r-r tests are applicable (Cases 2, 5, and 7). If r-r tests for an SPUT pass, the r-f tests for the SPUT are known to pass. On the other hand, if r-f tests for an SPUT fail, the r-r tests for the SPUT are known to fail. This dependency between r-r tests and r-f tests for any SPUT is denoted in Table 3 using parentheses. Table 3 includes all possible combinations of the two test results as well as the conditions on Lin and Lout, except for the case where r-f tests for an SPUT fail while r-r tests for the SPUT pass. This case is impossible because r-r tests for an SPUT always fail if the corresponding r-f tests fail.
If r-f tests for an SPUT fail, the chip instance is identified as faulty regardless of the time borrowing status at Lin (Case 1). When Lout is a primary output, failing r-r tests for the SPUT indicates that the chip instance is faulty, regardless of the time borrowing status at Lin (Case 2). In Case 3, where r-r tests for an SPUT fail and r-f tests pass, Lout, which is not a primary output, is identified as a site of time borrowing regardless of the time borrowing status at Lin, and hence the multi-segment paths via Lout must be tested. In Cases 4 and 5, passing r-r tests implies that the target SPUT does not borrow time at Lout (Case 4) and that the SPUT is fault-free (Case 5), respectively, because Lin is known to be an NTBL. In contrast, in Cases 6 and 7, passing r-r tests cannot determine the time borrowing status at Lout, since Lin is a TBL. Hence, we see that passing r-r tests can determine the time borrowing status at Lout only if Lin is an NTBL.
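The case analysis of Table 3 can be captured in a small decision function. A minimal Python sketch; the function name and return strings are illustrative, not from the thesis:

```python
# A sketch of the case logic of Table 3. rf_pass is None when Lout is a
# primary output, since r-f tests are not applicable there.
def classify(rr_pass, rf_pass, lout_is_po, lin_is_tbl):
    if lout_is_po:
        if not rr_pass:
            return 'faulty chip instance'            # Case 2
        if not lin_is_tbl:
            return 'target SPUT is fault-free'       # Case 5
        return 'TB status at Lout unknown'           # Case 7
    if rf_pass is False:
        return 'faulty chip instance'                # Case 1
    if not rr_pass:
        return 'Lout is a TBL'                       # Case 3: test multi-segment paths via Lout
    if not lin_is_tbl:
        return 'no time borrowing at Lout for SPUT'  # Case 4
    return 'TB status at Lout unknown'               # Case 6

print(classify(False, None, True, False))  # faulty chip instance (Case 2)
```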
It should be noted that a test result for an SPUT may have dependencies with the test results for other SPUTs. For instance, in Cases 4 and 5, the test result for the current SPUT can be analyzed only if the time borrowing status at the input latch Lin is known to be NTBL. In addition, in Case 3, since Lout turns out to be a TBL, it is necessary to test the multi-segment SPUTs that pass via Lout. Hence, another important characteristic of the optimization problem is that it often requires adaptation, in the sense that the selection of subsequent target paths (i.e., SPUTs) is affected by the latches identified as sites of time borrowing by previous tests. As we have seen in Section 4.1.1, non-adaptive operation may result in test escape and/or over- or under-estimation of robust PDF coverage. More details of the characteristics and formulation of the overall optimization problem will be presented in Sections 4.3 to 4.5.
4.2.2. Benefits of r-r tests and r-f tests
When a test is applied to an SPUT, say SPUTk, we have an accumulated record of the test results from the SPUTs that have been tested prior to SPUTk. Depending on this record, the time borrowing status at the input latch, Lin, and the output latch, Lout, of SPUTk may or may not be known, based on the meanings of test results summarized in Table 3. In other words, the time borrowing status at Lin prior to testing SPUTk falls into one of the following three cases: (a) unknown (the time borrowing status is unknown), (b) non-time borrowing, or (c) time borrowing. Similarly, the time borrowing status at Lout prior to testing SPUTk is one of the above three cases, i.e., (a), (b), or (c), if Lout is not a primary output, or one of the two cases (a) and (b) if Lout is a primary output. (Note that when Lout is a primary output the chip under test will be discarded if Lout is identified as a TBL, i.e., if any r-r test fails at Lout.)
Application of a test to an SPUT provides different benefits depending on the accumulated record of the results of prior tests. First, suppose r-r tests are applied to SPUTk. If the time borrowing status at Lout is already known to be either time borrowing or non-time borrowing prior to testing SPUTk, applying r-r tests to SPUTk provides no benefit in terms of fault coverage, knowledge of time borrowing status, or information used by subsequent tests of other SPUTs, regardless of the time borrowing status at Lin. Also, when the time borrowing status at Lout is unknown and Lin is known to be time borrowing, passing r-r tests for SPUTk provides no benefit at Lout (Cases 6 and 7 of Table 3). On the other hand, when SPUTk fails r-r tests while the time borrowing status at Lout is unknown, the benefit when Lout is not a primary output is that Lout is identified as a TBL (Case 3 of Table 3), and the benefit when Lout is a primary output is that the chip instance is identified as having a delay fault and is discarded (Case 2 of Table 3). When SPUTk passes r-r tests while the time borrowing status at Lout is unknown and the time borrowing status at Lin is known to be non-time borrowing (Cases 4 and 5 of Table 3), the benefit is that this result may enhance the coverage if SPUTk and the other SPUTs terminating at Lout collectively and eventually identify Lout as an NTBL. However, if some other SPUT that terminates at Lout fails, this test result of SPUTk will not be used to enhance the coverage, since Lout is eventually identified as a TBL.
Second, suppose r-f tests are applied to SPUT_k, whose output latch L_out is not a primary output. Such r-f tests provide no benefit if L_out is known to be a NTBL, since these tests will always pass. Only when SPUT_k fails r-f tests and the time borrowing status at L_out is either unknown or time borrowing do r-f tests provide a benefit: the chip instance is identified as faulty and is discarded, regardless of the time borrowing status at L_in (Case 1 of Table 3).
The benefits of r-r tests and r-f tests are summarized in Table 4 and Table 5, respectively.
Table 4. A summary of the benefits of r-r tests. (The L_in and L_out statuses are the time borrowing statuses known from previous test results.)

Case 1: L_in time borrowing, non-time borrowing, or unknown; L_out time borrowing or non-time borrowing; r-r tests pass or fail. Benefit: none.
Case 2: L_in time borrowing; L_out unknown; r-r tests pass. Benefit: none.
Case 3: L_in non-time borrowing; L_out unknown; r-r tests pass. Benefit: may enhance coverage if this result and other results of SPUTs terminating at L_out collectively identify L_out as a non-time borrowing latch.
Case 4: L_in time borrowing, non-time borrowing, or unknown; L_out unknown; r-r tests fail. Benefit: (1) time borrowing detected (if L_out is not a primary output); (2) fault detected (if L_out is a primary output).
Table 5. A summary of the benefits of r-f tests. (The L_in and L_out statuses are the time borrowing statuses known from previous test results.)

Case 1: L_in time borrowing, non-time borrowing, or unknown; L_out non-time borrowing; r-f tests always pass. Benefit: none.
Case 2: L_in time borrowing, non-time borrowing, or unknown; L_out time borrowing or unknown; r-f tests pass. Benefit: none.
Case 3: L_in time borrowing, non-time borrowing, or unknown; L_out time borrowing or unknown; r-f tests fail. Benefit: fault detected.
In summary, our problem is significantly more complex than the conventional test scheduling problem, since the passing or failing of r-r tests for an SPUT has different meanings depending on the time borrowing status at the input latch, the type of the output latch (whether or not it is a primary output of a latch-based part of the circuit), and the results of r-r tests for other SPUTs (i.e., dependencies among SPUTs). Scheduling r-f tests is somewhat similar to the conventional test scheduling problem in the sense that a faulty chip is identified and discarded if r-f tests for an SPUT fail at any stage of testing. However, passing r-f tests for an SPUT does not provide any coverage for the SPUT, which makes even r-f tests different from conventional testing.
4.3. Framework for test scheduling to minimize test application cost
4.3.1. An SPUT-based approach
In Section 4.1, the notion of an SPUT (set of paths under test) was introduced to replace the SCUT. Recall that an SPUT is the group of all paths that start at a particular input latch L_in, pass via a particular sequence of latches (if any), and terminate at a particular output latch L_out. Hence, each SCUT is viewed as a collection of multiple SPUTs. We call this finer-grained approach an SPUT-based approach. Test scheduling at such a finer granularity (i.e., SPUT) can reduce the overall test application cost. For example, suppose there are eight SPUTs terminating at a latch L. If a majority of the chips under test borrow time at L and this can be detected by testing one particular SPUT, then the test application cost may be reduced by testing this particular SPUT first.
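The potential saving can be illustrated with a small expected-cost calculation. This is a hypothetical sketch: the numbers, and the assumption that a single "detector" SPUT reveals the borrowing, are ours.

```python
# Hypothetical illustration: 8 SPUTs terminate at latch L, each requiring
# 6 tests. 70% of chip instances borrow time at L, and only one particular
# SPUT (the "detector") reveals it. Once L is identified as a TBL, the
# remaining SPUTs terminating at L are pruned (cf. Reduction rule 1 in
# Section 4.4.1), assuming L is not a primary output.

TESTS_PER_SPUT = 6
NUM_SPUTS = 8
P_BORROW = 0.7  # fraction of chips that borrow time at L

def expected_tests(detector_position: int) -> float:
    """Expected number of tests when the detecting SPUT is scheduled at
    the given position (0 = first) among the SPUTs terminating at L."""
    # Borrowing chip: SPUTs before the detector pass, the detector fails,
    # and the rest are pruned.
    cost_borrowing = (detector_position + 1) * TESTS_PER_SPUT
    # Non-borrowing chip: every SPUT passes, so all must be applied.
    cost_clean = NUM_SPUTS * TESTS_PER_SPUT
    return P_BORROW * cost_borrowing + (1 - P_BORROW) * cost_clean

print(expected_tests(0))              # detector scheduled first: ~18.6 tests
print(expected_tests(NUM_SPUTS - 1))  # detector scheduled last:  ~48 tests
```

Under these assumed numbers, scheduling the detecting SPUT first more than halves the expected number of tests applied at L.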
Test scheduling may be performed at an even finer granularity (i.e., path) than SPUT. Such a path-based approach must be supported by diagnostic methods to identify the path(s) that cause the observed test failure. Accordingly, Observations 1, 2, and 3 regarding identification of TBLs and NTBLs in Section 3.2.3 must be modified as follows in the context of paths in order to implement a path-based approach.

In a path-based approach, a latch L is identified as a TBL for the path(s) that cause failure of tests at L (Modified Observation 2 for a path-based approach). If a latch L is identified as a TBL for a path p, all multi-segment paths that pass via L and cover p must be targeted (Modified Observation 3 for a path-based approach). L is identified as a NTBL for all other paths that do not cause failure of tests at L (Modified Observation 1 for a path-based approach).
In other words, L can be regarded either as a NTBL or as a TBL depending on the path under consideration. This is likely to reduce the number of target multi-segment paths. However, it also implies that all paths in the fan-in of L must be tested even after a test fails for path p, in order to identify the other paths in the fan-in of L that pass. This is likely to increase the number of tests, especially when most paths in the fan-in of L fail the tests. Recall that in an SPUT-based (or SCUT-based) approach, a latch is regarded as a TBL as soon as one path fails at the latch, and no more tests are needed for the paths that terminate at the latch. Consequently, a path-based approach does not necessarily guarantee a reduction of test application cost.
Implementing a path-based approach also entails considerable complication in test generation and the test procedure. For every test, a path-based approach must be supported by diagnostic methods that identify the paths that cause the observed test failure. Such diagnosis would add impractically high run-time complexity to the testing of every fabricated chip. Also, due to the adaptive nature of our testing approach, it is impractical to store all necessary diagnostic data in the memory of the test equipment, since such data would need to be stored for all possible time borrowing scenarios.
More importantly, path-based approaches cannot further improve the robust PDF coverage,
since Theorem 12 proves that no other scan-based approach can achieve a better robust PDF coverage
than the proposed approach described in Section 3.8.
In summary, due to exorbitant complexity of implementation, uncertain cost reduction, and no
coverage benefit, SPUT is the finest granularity at which we can practically carry out test scheduling
and test generation.
4.3.2. Search space
Now let us consider the space of all possible test schedules that must be searched (implicitly or explicitly) to find an optimal test schedule. As mentioned earlier, we ignore r-f tests in this formulation. Also, as observed above, SPUTs may be tested in any order. Therefore, a straightforward description of the search space is a complete tree that considers all possible orderings of the SPUTs. For a CUT with n SPUTs, such a search tree has n! paths from the root to the leaves. However, such a complete tree does not capture the dependencies among the test results for different SPUTs or the adaptive features discussed in Section 4.2.1.
Figure 14. A generic search tree for optimal test scheduling. (Test nodes SPUT_0-1, SPUT_0-2, SPUT_1-3, SPUT_2-3, and SPUT_0-2-3 branch on pass/fail results, leading to further tests, identification of time borrowing, or an "End: faulty" outcome.)
Hence, a generic search tree for the example in Figure 10 is illustrated in Figure 14. Our search space can be represented using two types of nodes, namely test nodes and choice nodes. A test node (a rectangular node in Figure 14) corresponds to the application of tests to an SPUT. Each test node leads to two choice nodes (oval nodes in Figure 14) via thin arrows, one for when the tests for the SPUT pass and the other for when they fail. A choice node points to one or more test nodes via thick arrows.

At a test node, we test the SPUT and take the thin solid arrow if the tests pass. Otherwise, we take the thin dotted arrow, which leads either to identification of a faulty chip instance or to identification of a time borrowing site, depending on whether the output latch is a primary output. In Figure 14, rectangles with dotted outlines denote SPUTs that terminate at a primary output, and rectangles with solid outlines denote SPUTs that terminate at a non-primary output. From an oval node (a choice node), one chooses an SPUT from the set of candidate SPUTs that have not yet been tested. At a choice node, we must select the test node so as to minimize the test application cost (under the constraint that the selection leads to the maximum coverage).
The derivation of test cost requires knowledge of the probabilities of occurrence of various chip personalities. The probability distribution of chip personalities that includes time borrowing behaviors (the chip personality distribution), as illustrated in Table 2, can be approximated using the results of statistical timing analysis for latch-based pipelines [6]. We develop our test scheduling algorithm based on such personality distributions. We discuss statistical timing analysis issues for latch-based pipelines in the Appendix.

It should be noted that a test schedule is not a single path from the root to a leaf of the search tree unless there exists only a single personality in the personality distribution. As implied in Figure 13, a test schedule is defined as follows.

Definition: A test schedule is a tree that consists of test nodes and thin arrows only, where a particular thick edge is selected at each choice node in the search tree.

Within a test schedule, each chip under test takes a particular path in the tree from the root to a leaf, based on the test results of the test nodes along its path.
4.4. Building a search tree
If we draw the search tree shown in Figure 14 by enumerating all possible combinations of all
SPUTs, it will include many unnecessary trees and nodes, whose test results are already known from
test results of other SPUTs. It may also include SPUTs whose testing will not augment fault coverage.
Hence, we propose several reduction rules to remove such redundant test nodes.
In this chapter, for simplicity we assume that the optimal set of scan chain configurations (i.e.,
the all-normal, all-scan, and every single-normal configuration) are available at every level of latches.
4.4.1. Reduction rules
Let SPUT_k contain all paths that start at L_0, pass via L_1, L_2, ..., L_{n-1} (unless SPUT_k has single-segment paths), and terminate at L_n.
It is noted in Case 1 of Table 4 that there is no benefit in applying r-r tests to SPUT_k if the time borrowing status at the output latch L_n is known to be either time borrowing or non-time borrowing. Hence, the following two reduction rules are derived.
Reduction rule 1: If L_n is not a primary output and r-r tests for SPUT_k fail, remove every SPUT that terminates at L_n from the sub-tree rooted at the current node in the search tree.

Suppose there exists SPUT_m that starts at L_m and terminates at L_n. Whether L_m is a NTBL or a TBL, testing SPUT_m is redundant, since L_n is already identified as a TBL from SPUT_k (Case 1 of Table 4), meaning that we are now interested in testing multi-segment SPUTs that pass via L_n (Case 3 of Table 3).
Reduction rule 2: If r-r tests for SPUT_k pass and this result together with previous passing tests identifies L_n as a NTBL, remove all SPUTs that terminate at L_n from the sub-tree rooted at the current node in the search tree.

According to Case 1 of Table 4, if L_n is identified as a NTBL, testing any SPUT that terminates at L_n provides no benefit regardless of the time borrowing status at its input latch. Hence, such SPUTs can be removed from the sub-tree rooted at the current node.
In addition to Reduction rule 2, the following reduction rule is derived regarding multi-segment paths that pass via L_n when L_n is identified as a NTBL.

Reduction rule 3: If r-r tests for SPUT_k pass and this result together with previous passing tests identifies L_n as a non-time borrowing site, remove all multi-segment SPUTs that pass via L_n from the sub-tree rooted at the current node in the search tree.

In this case, the fan-in of L_n in the CUT is already tested to its maximum coverage with passing results. Hence, targeting multi-segment SPUTs that pass via L_n only increases the number of target paths and redundantly retests paths in the fan-in of L_n.
Property 1 in Section 3.4.2 explains a way to test sub-paths that start at a NTBL, say L_0, which is configured in normal mode, by using multi-segment paths via L_0. In Reduction rule 4, a similar idea is used when multi-segment paths (or SPUTs) via L_0 are tested before the sub-paths that are covered by the multi-segment paths.

Reduction rule 4: When L_0 is identified as a NTBL after testing SPUTs that terminate at L_0, remove SPUT_k if a longer SPUT that starts at a latch in the fan-in of L_0, passes via the latches of SPUT_k, and terminates at L_n has been tested previously and passed r-r tests.

Suppose that tests are applied to a two-segment SPUT_m that starts at L_a, passes via L_b, and terminates at L_c, and all these tests pass. Assume that these tests are applied when L_a is already identified as a NTBL and the time borrowing statuses at L_b and L_c are unknown. In subsequent tests, suppose that L_b is identified as a NTBL. Although tests are not directly applied to SPUT_k, which starts at L_b and terminates at L_c, SPUT_k has already been tested indirectly in the testing of SPUT_m, since signals are transmitted to the output of L_b at the rising edge of its clock during the testing of SPUT_m, due to the fact that L_b is a NTBL.
These reduction rules can significantly reduce the search space while still guaranteeing that we
can find the optimal test schedule.
4.4.2. Covering test sequences in a test schedule
This section generalizes Section 3.7 and characterizes the required targets of test generation to cover the CUT in the context of the SPUT-based delay testing approach and test scheduling.

In the search tree depicted in Figure 14, a path from the root reaches a terminal node (leaf) if a fault is detected for a faulty chip instance or if sufficient tests are applied to cover the entire CUT for a fault-free chip instance. Each path from the root in a test schedule tree includes a set of SPUTs, each of which belongs to one of the following three classes.

(Class 1) An SPUT that is used to identify a TBL.
(Class 2) An SPUT that is used to identify a NTBL.
(Class 3) An SPUT that is redundant but not removed by the given reduction rules.
For fault-free chip instances, note that the robust PDF coverage is computed from the SPUTs of Class 2 that start and terminate at NTBLs and cover the entire CUT as disjoint partitions, as explained in Section 3.7. When all such SPUTs covering the CUT have been selected and applied, testing is complete and no more branches of the search tree are required for the corresponding fault-free chip personality. We call such a path in a test schedule tree a covering test sequence.

Definition: A covering test sequence is a path from the root to a leaf in the test schedule tree or the search space such that the path includes all necessary SPUTs of Class 2 that cover the entire CUT as disjoint partitions.

In contrast, the objective of delay testing for a faulty chip instance is to identify a delay fault at minimum cost. Hence, a test schedule for a faulty chip instance may include SPUTs of Class 3 (redundant) and SPUTs of Class 1 whose output latch is a primary output.
4.5. A deterministic optimization approach
There may exist multiple covering test sequences for a chip instance in the search tree. Although they provide the same maximum PDF coverage, they may differ in test cost. Suppose there are two different covering test sequences CS_1 and CS_2 for a chip instance, whose coverages are both guaranteed to equal the maximum coverage defined in Chapter 3. CS_1 and CS_2 may still have different test application costs due to the existence of SPUT(s) of Class 3, differences in the cost efficiency of identifying TBLs using SPUTs of Class 1, and/or differences in the cost of SPUTs of Class 2. Hence, the objective of the overall optimization problem is to find a test schedule that (i) consists only of covering test sequences, each of which is used by at least one fault-free chip personality, (ii) identifies every faulty chip instance, and (iii) minimizes the overall test application cost. Such a collection of covering test sequences that minimizes the overall test application cost is called an optimal test schedule.
Since the coverage is guaranteed to be optimal under the condition that the scan chain
configurations are selected as per Theorems 3 through 6 and Corollaries 1 and 2, the overall
optimization becomes a cost minimization problem.
We can derive an optimal test schedule deterministically using the generic search tree. Beginning with the root node (a choice node), the search tree is constructed in a manner similar to that described in Section 4.3.2, and the reduction rules are applied at each choice node to reduce the number of candidate SPUTs. The search tree expansion is complete when all legitimate leaves, as discussed in Section 4.4.2, have been reached. Then, starting from the leaves and tracing back to the parent nodes, we compute the cost of the sub-tree at each node. At each choice node in this process, we select the child node (test node) whose sub-tree has the least cost and remove all other child nodes and their sub-trees. When we reach the root in this manner, an optimal test schedule has been determined.
This deterministic optimization algorithm is illustrated in Figure 15. Recall that, for simplicity of formulation, the cost is assumed to be proportional to the number of tests. Let n_i denote the number of tests required for SPUT_i. SPUT_i has two child nodes (choice nodes), named SPUT_i+ and SPUT_i-, for the cases where SPUT_i passes and fails r-r tests, respectively.

Hence, the cost of testing the sub-tree rooted at SPUT_i is the sum of n_i and the weighted average of the costs of the child nodes SPUT_i+ and SPUT_i-, i.e.,

    n(SPUT_i) = n_i + {p_i n(SPUT_i+) + (1 - p_i) n(SPUT_i-)},     (3)
where the function n(node_i) denotes the cost of the sub-tree rooted at node_i, and p_i denotes the probability that SPUT_i passes r-r tests. n(node_i) is also called the average accumulated cost at node_i.

In Figure 15, let us suppose that SPUT_b, SPUT_c, SPUT_d, and SPUT_e are the last test nodes of the sub-tree rooted at SPUT_a. Since these test nodes incur no additional cost from subsequent nodes, the optimal average accumulated costs at these nodes are n*(SPUT_b) = n_b, n*(SPUT_c) = n_c, n*(SPUT_d) = n_d, and n*(SPUT_e) = n_e, where (*) denotes an optimal solution. The average accumulated costs at the other nodes are now computed in a bottom-up fashion. The optimal average accumulated cost n*(SPUT_a+) equals the minimum of the costs of its child nodes, i.e.,

    n*(SPUT_a+) = min[n*(SPUT_b), n*(SPUT_c)].     (4)
Similarly, n*(SPUT_a-) can be computed as

    n*(SPUT_a-) = min[n*(SPUT_d), n*(SPUT_e)].     (5)
The selection of the child node from a test node such as SPUT_a is determined by the test result obtained during testing. According to Equation (3), the optimal average accumulated cost n*(SPUT_a) can be computed as a weighted average of n*(SPUT_a+) and n*(SPUT_a-), i.e.,

    n*(SPUT_a) = n_a + p_a n*(SPUT_a+) + (1 - p_a) n*(SPUT_a-)
               = n_a + p_a min(n_b, n_c) + (1 - p_a) min(n_d, n_e).     (6)
In this manner, the average accumulated costs at all nodes in the search tree can be computed using the given probabilities p_i that SPUT_i passes r-r tests. Hence, this approach obtains an optimal test schedule, which is a tree that consists of test nodes and thin arrows only, where a particular thick edge is selected at each choice node in the search tree.
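The bottom-up computation of Equations (3) through (6) can be sketched as a short recursion. The node encoding and all numeric values below are hypothetical.

```python
# A sketch of the bottom-up computation of Equations (3)-(6). A test node
# is encoded (hypothetically) as (n_i, p_i, pass_children, fail_children),
# where the child lists hold the candidate test nodes reachable from the
# pass and fail choice nodes; leaves have empty child lists.

def optimal_cost(node):
    n_i, p_i, pass_kids, fail_kids = node
    # Equations (4)/(5): each choice node contributes its cheapest child.
    pass_cost = min((optimal_cost(c) for c in pass_kids), default=0.0)
    fail_cost = min((optimal_cost(c) for c in fail_kids), default=0.0)
    # Equation (3): own test count plus the probability-weighted choices.
    return n_i + p_i * pass_cost + (1 - p_i) * fail_cost

# The shape of Figure 15, with hypothetical costs and passing probability.
spult_b = (6, 1.0, [], [])
spult_c = (8, 1.0, [], [])
spult_d = (5, 1.0, [], [])
spult_e = (9, 1.0, [], [])
spult_a = (4, 0.75, [spult_b, spult_c], [spult_d, spult_e])
print(optimal_cost(spult_a))  # 4 + 0.75*min(6, 8) + 0.25*min(5, 9) = 9.75
```

A full implementation would also record which child achieved each minimum, since those choices constitute the optimal test schedule itself.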
Figure 15. An example of the cost function computation. (Test node SPUT_a, with cost n_a and passing probability p_a, branches to pass and fail choice nodes that lead to leaf test nodes SPUT_b, SPUT_c, SPUT_d, and SPUT_e with costs n_b through n_e.)
4.6. The complexity of the optimization problem
In this section, we show that our test scheduling optimization problem belongs to a class of problems for which the best known algorithm has exponential computational complexity, and a polynomial-time optimization algorithm may not exist.
Let us consider a latch-based pipeline where the number of combinational logic blocks is h and the number of latches at every level is k. The objective of our test scheduling approach is to schedule SPUTs at minimum test application cost. The total number of SPUTs, from single-segment SPUTs to h-segment SPUTs, is

    hk^2 + (h - 1)k^3 + ... + k^(h+1) = sum_{i=1}^{h} (h - i + 1) k^(i+1).

Now consider a special case where time borrowing occurs at all latches at level-1 through level-(h-1). If we further assume prior knowledge that this special case has occurred, then in our optimization problem we know that only h-segment SPUTs will be used in any optimal solution. In other words, in this special case, we need not apply tests to any single- or multi-segment SPUT that terminates at a latch at level-1 through level-(h-1), since time borrowing is already known to occur at the output latch of each such SPUT. Hence, a test scheduling algorithm only needs to consider the k^(h+1) h-segment SPUTs.
In this special case, the test scheduling approaches proposed in [20] and [24] for conventional testing can be directly applied to our scheduling of SPUTs, because our problem becomes identical to the conventional test scheduling problem. First, testing of a particular chip instance terminates as soon as a test for an h-segment SPUT fails (discarding the chip as faulty) or tests for all SPUTs have been applied and pass (the chip is identified as delay fault-free), the same as in conventional testing. Second, tests for the k^(h+1) SPUTs are independent, and a derived test schedule becomes a single sequence of SPUTs (instead of a tree, as in the general case), in the same manner as conventional test scheduling. Third, conventional test scheduling orders groups of tests, since related test vectors in a test set are typically grouped together [24]. This is analogous to the fact that in our special case we need to order k^(h+1) SPUTs, where each SPUT corresponds to a group of tests for the paths in the SPUT. Hence, the computational complexity of the test scheduling problem for latch-based circuits in this special case is the same as the complexity derived in [20] and [24], which is exponential, i.e., O(mn2^n), where m denotes the number of chips and n denotes the number of groups of tests to be scheduled. Note that n = O(k^h) in this special case.
The above argument demonstrates that, in general, the complexity of test scheduling for latch-based circuits is equal to or higher than that of conventional test scheduling, since more SPUTs, i.e., hk^2 + (h - 1)k^3 + ... + k^(h+1), need to be considered in scheduling, the meaning of passing/failing test results varies as discussed in Section 4.2, and tests may not be independent. Consequently, the number of SPUTs in our test scheduling easily explodes as the number of logic blocks and the number of latches in the circuit increase. In modern designs, pipelines with over 10 stages are not uncommon (e.g., the Intel Core 2 Duo processor has 14 stages [21]), and logic blocks may have over 30 inputs. For instance, if k = 30 and h = 14, we have 3.28×10^19 SPUTs to consider. Hence, in the next section, we propose two heuristic test scheduling approaches.
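The SPUT-count formula above can be checked numerically for small pipelines; a minimal sketch (the function name is ours):

```python
# A quick numerical check of the SPUT-count formula of Section 4.6:
# sum over segment counts i = 1..h of (h - i + 1) * k**(i + 1),
# for a pipeline with h logic blocks and k latches per level.

def total_spults(h: int, k: int) -> int:
    return sum((h - i + 1) * k ** (i + 1) for i in range(1, h + 1))

# h = 2, k = 2: 2*2^2 single-segment SPUTs + 1*2^3 two-segment SPUTs = 16
print(total_spults(2, 2))  # 16
```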
4.7. Proposed heuristic approaches
As mentioned in Section 4.5, the overall optimization becomes a cost minimization problem provided that the conditions for optimal coverage are met. Hence, we first identify two major sources of cost reduction and then propose heuristic approaches that try to maximize the cost reduction from these sources.
4.7.1. Key ideas and overview of the proposed heuristics
There are two major sources of cost reduction. For faulty chip instances, test application cost is reduced by identifying a fault using fewer tests. For fault-free chip instances, we can reduce test application cost by removing as many SPUTs of Class 3 as possible, by identifying TBLs with tests of SPUTs of Class 1, and by efficiently covering the CUT using SPUTs of Class 2.

Let us assume that statistical timing analysis is used to obtain an approximate chip personality distribution similar to that shown in Table 2. In our heuristic approaches, a test schedule is constructed starting at the root node. At each choice node, instead of adding all candidate SPUTs as child nodes, we apply a benefit function to select the heuristically best SPUT, i.e., the one that achieves the best benefit when tested at the current node in the test schedule.
HeuristicSchedulingRecursion( )
    If a covering test sequence is formed or a fault is identified, Return
    Else
        For each candidate SPUT
            Preprocessing (optional)
            Apply Reduction rule 1
        If L_out is a primary output, Return
        Select the best candidate SPUT using a benefit function
        BranchToPassingChildNode( )
        BranchToFailingChildNode( )
    Return

BranchToPassingChildNode( )
    Add a child node for the passing result of the chosen SPUT
    Include only the personalities that pass the chosen SPUT
    If L_out can be identified as a NTBL {
        Mark L_out as a NTBL
        Apply Reduction rules 2, 3, and 4
        Check whether any other latch can be identified as a NTBL }
    Check if a covering test sequence is formed
    Call recursion: HeuristicSchedulingRecursion( )

BranchToFailingChildNode( )
    Add a child node for the failing result of the chosen SPUT
    Include only the personalities that fail the chosen SPUT
    Mark L_out as a TBL
    Apply Reduction rule 1
    Check if a fault is identified
    Call recursion: HeuristicSchedulingRecursion( )

Figure 16. Overall approach that uses the proposed heuristics.
For instance, suppose that at a particular choice node there are two candidate SPUTs, SPUT_A and SPUT_B, and there are 20 chip instances under test comprising three personalities, P-x, P-y, and P-z, with the distribution 20%, 30%, and 50%, respectively. We want to compare the cost and benefit of each candidate SPUT in order to select the best candidate. If we select SPUT_A, the test application cost of SPUT_A is the same for the three chip personalities and is determined by the number of tests required for SPUT_A. On the other hand, the benefit of applying tests to SPUT_A varies from personality to personality. Hence, we obtain a numerical representation of the benefit of SPUT_A for each personality using a benefit function and take the weighted average, i.e., 0.2Bnft_A-x + 0.3Bnft_A-y + 0.5Bnft_A-z, where Bnft_A-x, Bnft_A-y, and Bnft_A-z denote the benefit values of SPUT_A for P-x, P-y, and P-z, respectively. In the same manner, we obtain the weighted average of the benefit for SPUT_B as 0.2Bnft_B-x + 0.3Bnft_B-y + 0.5Bnft_B-z. We then select the SPUT that has the highest weighted average and add it as the next test node. We design the benefit functions in Sections 4.7.2 and 4.7.3 such that we can compare the benefits of a particular SPUT across personalities and compare the weighted averages of different candidate SPUTs.
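The weighted-average selection step above can be sketched as follows. The distribution matches the running example (20%/30%/50%); the benefit values are hypothetical placeholders for Bnft_A-x, Bnft_A-y, and so on.

```python
# A sketch of the candidate selection step of Section 4.7.1: per-personality
# benefit values are combined with the personality distribution, and the
# SPUT with the highest weighted average is chosen.

distribution = {"P-x": 0.2, "P-y": 0.3, "P-z": 0.5}

benefit = {  # benefit[SPUT][personality]; all values hypothetical
    "SPUT_A": {"P-x": 10.0, "P-y": -4.0, "P-z": 6.0},
    "SPUT_B": {"P-x": 2.0, "P-y": 8.0, "P-z": 3.0},
}

def weighted_benefit(spult: str) -> float:
    return sum(distribution[p] * b for p, b in benefit[spult].items())

best = max(benefit, key=weighted_benefit)
print(best)  # SPUT_B (0.2*2 + 0.3*8 + 0.5*3 = 4.3 beats 3.8 for SPUT_A)
```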
After the best candidate SPUT is added to the test schedule as a test node, Reduction rules 1, 2, 3, and 4 are applied depending on the pass/fail result of the chosen SPUT. This reduces and updates the set of candidate SPUTs at the newly added choice nodes.

The overall algorithm that uses the proposed heuristics is summarized in Figure 16. We propose two particular benefit functions in Sections 4.7.2 and 4.7.3. Within the current framework, test application cost may be further reduced by adopting a more efficient benefit function.
4.7.2. Heuristic 1 (H1): Relative benefit function
The benefit function used in the first heuristic approach (H1) is called a relative benefit function. It is used to find the candidate SPUT that is likely to provide a higher benefit than the other candidates relative to a base test schedule. The base test schedule (denoted BTS) is obtained by modifying the SCUT-based approach of Chapter 3 into an SPUT-based version.

The motivation for H1 is that, since we do not know the optimal test schedule, we instead give higher priority to an SPUT that obtains a relatively higher benefit than it does in the base test schedule and avoids as much redundancy as possible.
Suppose that there are K candidate SPUTs at the current choice node and that each chip instance belongs to one of I personalities. Let c_k denote the number of tests required for candidate SPUT_k. Also, let TBin_ki and TBout_ki indicate whether the input latch L_in and the output latch L_out of SPUT_k are, respectively, a TBL or a NTBL for personality P-i. Note that the values of TBin_ki and TBout_ki are determined from the given chip personality distribution. Let rb_ki denote the relative benefit of SPUT_k for personality P-i.

The strategy is that after the rb_ki values are obtained for each personality P-i (i = 0, 1, ..., I-1), their weighted average is computed using Equation (7); this weighted average is the relative benefit of SPUT_k, denoted RB_k.
    RB_k = sum_i (d_i · rb_ki),     (7)

where d_i is the percentage of chip instances of personality P-i (e.g., the third row of Table 2). RB_k is calculated for every candidate SPUT_k (k = 0, 1, ..., K-1), and H1 selects the SPUT that has the highest RB_k value.
The rb_ki values are calculated as follows. For personality P-i, SPUT_k may be an SPUT of Class 1 (Section 4.4.2) that identifies a TBL or a fault, an SPUT of Class 2 that is used to identify a NTBL for fault-free chips, or an SPUT of Class 3 that is redundant. All possible cases and their benefit function values are defined as follows.
First, suppose that personality P-i represents faulty chips according to the chip personality distribution. If L_out is a primary output and SPUT_k fails, then the benefit of applying tests to SPUT_k is the identification of a faulty chip (Class 1). This benefit is maximized if we detect the fault with the least cost. Hence, we derive the relative benefit rb_ki of H1 over BTS by subtracting the total cost TC_i consumed in H1 for P-i from the total test application cost for P-i in BTS. TC_i is the sum of the costs of the SPUTs selected along the path from the root to the current node, including c_k. Note that H1 requires less cost to identify faulty chips of P-i as the benefit rb_ki becomes higher.
Another case of Class 1 occurs when personality P-i represents fault-free chips, TBout_ki is TBL, and SPUT_k fails. In this case, rb_ki is calculated by subtracting the total cost used for identifying time borrowing at L_out in H1 from the total cost used for identifying time borrowing at L_out in BTS.
Second, if TBin_ki and TBout_ki are both NTBL for a fault-free personality P-i, SPUT_k falls into Class 2 and may be used for coverage in both BTS and H1. In this case, if SPUT_k passes via TBLs only or is a single-segment SPUT, rb_ki is 0, meaning that H1 and BTS have identical benefit. However, if SPUT_k passes via one or more NTBLs, rb_ki is set to -c_k/2, a value determined empirically to give higher priority to SPUTs that do not pass via NTBLs. This is because, in general, we can reduce the number of tests by scanning more NTBLs, maximizing the benefit of scan [9].
Third, if P-i represents faulty chips and either L_out is not a primary output or SPUT_k passes r-r tests, applying tests to SPUT_k is redundant for P-i (Class 3). If P-i represents fault-free chips and TBout_ki is a TBL but SPUT_k passes r-r tests, then applying tests to SPUT_k is redundant (Class 3). Also, for a fault-free personality P-i, if SPUT_k passes r-r tests where TBin_ki is a TBL and TBout_ki is a NTBL, then applying tests to SPUT_k is redundant (Class 3). In the case of Class 3, we assign the negative value –c_k to rb_ki in order to give lower priority to SPUTs that are redundant and require higher test application time.
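The three-way case analysis above can be condensed into a small scoring helper. The following is a minimal sketch, assuming a simplified data model (a class label, the SPUT cost c_k, and a few path attributes) that is ours for illustration, not the dissertation's implementation:

```python
def relative_benefit(spu_class, cost, passes_via_ntbl=False,
                     single_segment=False, bts_cost=0, h1_cost=0):
    """Return rb_ki for one (SPUT_k, P-i) pair under H1's rules.

    Class 1: benefit is the BTS total cost minus the cumulative H1 cost,
             so cheaper detection in H1 scores higher.
    Class 2: 0 if the SPUT passes via TBLs only or is single-segment,
             otherwise -cost/2 (empirical de-prioritization).
    Class 3: -cost, to push redundant, expensive SPUTs down the ranking.
    """
    if spu_class == 1:
        return bts_cost - h1_cost
    if spu_class == 2:
        return 0 if (single_segment or not passes_via_ntbl) else -cost / 2
    return -cost  # Class 3: redundant

# Values matching Table 7's SPUT_0-1-2 column entries:
print(relative_benefit(1, 24, bts_cost=14, h1_cost=24))   # -10 (P-3)
print(relative_benefit(2, 24, passes_via_ntbl=True))      # -12.0 (P-1)
```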
Let us demonstrate H1 using the simple two-block pipeline example shown in Figure 17 with the chip personality distribution shown in Table 6. Note that P-1 and P-2 are fault-free, while P-3 and P-4 are faulty. Also, note that no latch is a TBL for P-1, only L_1 is a TBL for P-2, only L_2 is a TBL for P-3, and both L_1 and L_2 are TBLs for P-4.

BTS is a schedule in which SPUT_0-1 is applied to P-1, P-2, P-3, and P-4, and then SPUT_1-2 is applied to P-1 and P-3 while SPUT_0-1-2 is applied to P-2 and P-4 adaptively. If we apply the relative benefit function at the root node, we obtain the results shown in Table 7.
Figure 17. Test scheduling illustration. [Figure: a two-block pipeline in which latches L_0, L_1, and L_2, clocked by complementary clocks φ and φ̄, separate combinational blocks C_0 and C_1; the candidate paths are SPUT_0-1, SPUT_1-2, and SPUT_0-1-2.]
Table 6. A chip personality distribution for Figure 17.

SPUTs      | No. of tests | P-1 (25%) | P-2 (30%) | P-3 (30%) | P-4 (15%)
-----------|--------------|-----------|-----------|-----------|----------
SPUT_0-1   | 6            | Pass      | Fail      | Pass      | Fail
SPUT_1-2   | 8            | Pass      | Pass      | Fail      | Pass
SPUT_0-1-2 | 24           | Pass      | Pass      | Fail      | Fail
Table 7. Relative benefit function example.

SPUTs      | P-1 (25%)         | P-2 (30%)        | P-3 (30%)         | P-4 (15%)        | Weighted average
-----------|-------------------|------------------|-------------------|------------------|-----------------
SPUT_0-1   | Class 2, rb = 0   | Class 1, rb = 0  | Class 3, rb = –6  | Class 3, rb = –6 | –2.7
SPUT_1-2   | Class 2, rb = 0   | Class 3, rb = –8 | Class 1, rb = 6   | Class 3, rb = –8 | –1.8*
SPUT_0-1-2 | Class 2, rb = –12 | Class 2, rb = 0  | Class 1, rb = –10 | Class 1, rb = 6  | –5.1
Let us consider the case of SPUT_0-1-2 as an example. If we apply SPUT_0-1-2 to P-1, SPUT_0-1-2 falls into Class 2, where TBin and TBout are NTBLs. Since it passes via a NTBL, the relative benefit is set to –24/2 = –12. Similarly, SPUT_0-1-2 falls into Class 2 for P-2, with the difference that L_1 is a TBL; hence, the relative benefit is set to 0. Applying tests to SPUT_0-1-2 identifies a fault in both P-3 and P-4 (Class 1). The total cost for P-3 in BTS is 6 + 8 = 14, since SPUT_0-1 and SPUT_1-2 are tested in that order, while the total cost for P-3 in H1 is 24. Hence, the relative benefit for this case becomes 14 – 24 = –10. In the same manner, the relative benefit of SPUT_0-1-2 for P-4 is (6 + 24) – 24 = 6. We then compute the weighted average using Equation (7).

In this example, we select SPUT_1-2 as the first test node in the test schedule, since it obtains the highest weighted average relative benefit among the three candidate SPUTs.
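The root-node selection above can be reproduced directly from Table 7. A minimal sketch, where the dictionary literals simply transcribe the table's rb values and personality shares:

```python
# Equation (7): RB_k = sum_i d_i * rb_ki; H1 picks the SPUT with the
# largest RB_k. Data below is copied from Tables 6 and 7.

d = {"P-1": 0.25, "P-2": 0.30, "P-3": 0.30, "P-4": 0.15}

rb = {
    "SPUT_0-1":   {"P-1": 0,   "P-2": 0,  "P-3": -6,  "P-4": -6},
    "SPUT_1-2":   {"P-1": 0,   "P-2": -8, "P-3": 6,   "P-4": -8},
    "SPUT_0-1-2": {"P-1": -12, "P-2": 0,  "P-3": -10, "P-4": 6},
}

# Weighted average per candidate SPUT (rounded to tame float noise).
RB = {k: round(sum(d[p] * v for p, v in row.items()), 4)
      for k, row in rb.items()}
best = max(RB, key=RB.get)

print(RB)    # {'SPUT_0-1': -2.7, 'SPUT_1-2': -1.8, 'SPUT_0-1-2': -5.1}
print(best)  # SPUT_1-2
```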
4.7.3. Heuristic 2 (H2): Near-lower-bound function

In the second heuristic approach, H2, we first define the lower bound on test application cost for each chip personality and the SPUTs required by the lower bound. The basic idea of H2 is to select an SPUT that is most likely to make the overall test application cost of H2 as close as possible to that of the lower bound.

For this purpose, we define a function called the near-lower-bound function. As the name implies, this function gives a higher value to an SPUT that helps the test schedule of H2 stay closer to the lower bound, meaning higher benefit from applying tests to that SPUT. Using an equation similar to Equation (7), we compute the weighted average of the near-lower-bound function for every candidate SPUT, and then select the SPUT that obtains the highest value as the next test node.
Comparisons with the lower bound

A lower bound on the test application cost for a faulty chip personality P-i is the cost of an SPUT that fails at a primary output at the minimum cost, c_min,fault. For example, a fault of P-3 in Table 6 can be identified by both SPUT_1-2 and SPUT_0-1-2. Since SPUT_1-2 has the minimum cost of the two, the lower bound for P-3 in Table 6 is 8, given by SPUT_1-2. Hence, if SPUT_k fails at a primary output at cost c_k, the near-lower-bound function value nlb_ki of SPUT_k for P-i is defined as c_min,fault – c_k. For example, in Table 6, the near-lower-bound function value of SPUT_1-2 for P-3 is 8 – 8 = 0, and that of SPUT_0-1-2 for P-3 is 8 – 24 = –16.

For all other SPUTs that do not detect a fault in faulty personalities, nlb_ki is set to –c_k, since such SPUTs are not included in the test schedule of the lower bound.
Now let us consider the lower bound on test application cost for a fault-free chip. Ideally, if we already knew the time borrowing status at every latch in a particular chip instance, we would only need to apply tests to SPUTs of Class 2 to cover the CUT. Hence, the lower bound for a fault-free personality is defined by the cost of the Class 2 SPUTs that cover the CUT as disjoint partitions, where a partition is made at every NTBL. In other words, these Class 2 SPUTs start at a NTBL, pass via only TBLs (if any), and terminate at a NTBL. For example, suppose that SPUT_a-b-c-d-e starts at L_a (NTBL), passes via L_b (TBL), L_c (NTBL), and L_d (TBL), and terminates at L_e (NTBL). The same set of paths can be covered by two SPUTs of Class 2, namely SPUT_a-b-c, which covers L_a, L_b, and L_c, and SPUT_c-d-e, which covers L_c, L_d, and L_e. The definition of the lower bound for a fault-free chip is based on the assumption that the cost of SPUT_a-b-c-d-e is always higher than the sum of the costs of SPUT_a-b-c and SPUT_c-d-e.
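The partitioning rule above (cut a multi-segment SPUT at every interior NTBL) can be sketched as a small helper. The latch names and statuses below follow the SPUT_a-b-c-d-e example; the function itself is our illustration, not the dissertation's implementation:

```python
def partition_at_ntbls(latches, is_tbl):
    """Split a latch sequence into Class 2 sub-SPUTs: each piece starts
    and ends at a NTBL and passes only via TBLs in between."""
    pieces, start = [], 0
    for i in range(1, len(latches) - 1):
        if not is_tbl[latches[i]]:       # interior NTBL: cut here
            pieces.append(latches[start:i + 1])
            start = i
    pieces.append(latches[start:])       # final piece ends at the NTBL output
    return pieces

latches = ["La", "Lb", "Lc", "Ld", "Le"]
is_tbl = {"La": False, "Lb": True, "Lc": False, "Ld": True, "Le": False}
print(partition_at_ntbls(latches, is_tbl))
# [['La', 'Lb', 'Lc'], ['Lc', 'Ld', 'Le']]
```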
Based on this definition of the lower bound for fault-free chips, we define the near-lower-bound function value nlb_kj of SPUT_k for a fault-free chip personality P-j. If TBout_kj is a TBL and SPUT_k passes r-r tests, or if TBin_kj is a TBL and TBout_kj is a NTBL, testing of SPUT_k is not included in the test schedule of the lower bound and hence nlb_kj is set to the negative value –c_k.
If TBout_kj is a TBL and SPUT_k fails r-r tests, nlb_kj is c_min,fail – c_k, where c_min,fail is the cost of the SPUT that fails at L_out at minimum cost; its derivation is similar to that of c_min,fault.
If TBin_kj is a NTBL and TBout_kj is a NTBL, nlb_kj is calculated as c_min,cover – c_k, where c_min,cover is the sum of the costs of the SPUTs that are disjoint partitions of SPUT_k, where a partition is made at every NTBL, excluding the SPUTs that have already been tested at previous nodes.
Preprocessing for H2

One major difference between H1 and H2 is that H2 first identifies all faulty chip personalities and then applies tests to cover the remaining fault-free personalities. For the example shown in Table 6 and Figure 17, H2 first identifies the faulty chip personalities P-3 and P-4 and then applies tests to cover the fault-free chip personalities P-1 and P-2.

In addition, when handling fault-free personalities, H2 implements a preprocessing step as a heuristic effort to further reduce cost by skipping some tests. For example, suppose that the time borrowing status at L_1 in the circuit shown in Figure 17 is unknown at the current node in the search tree, and suppose that the current node includes two chip personalities, P-1 and P-2 of Table 6. According to the given personality distribution, L_1 is a NTBL in P-1 and a TBL in P-2. If we mark L_1 as a TBL without applying tests to verify this fact and apply tests to SPUT_0-1-2, then the test application cost is reduced for P-2 because we skip testing of SPUT_0-1, a cost reduction per chip of 6 for the 30% of chip instances of P-2. On the other hand, the test application cost increases for P-1, because applying tests to SPUT_0-1 and SPUT_1-2 costs less than applying tests to SPUT_0-1-2, a cost increase per chip of 10 (= 24 – (6 + 8)) for the 25% of chip instances of P-1. In this manner, our preprocessing heuristically computes the cost increase as well as the cost reduction. If the cost reduction prevails, L_1 is regarded as a TBL and Reduction rule 1 is applied. This preprocessing is executed for every latch whose time borrowing status is unknown whenever a new test node is added to the search tree. In the above example, we choose not to mark L_1 as a TBL, since the expected cost increase (0.25 × 10 = 2.5 per chip) is greater than the expected cost reduction (0.30 × 6 = 1.8 per chip). In this particular example, cost reduction prevails if the share of P-2 is more than 5/3 times that of P-1.
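The preprocessing decision just described reduces to a single expected-cost comparison. A minimal sketch using the numbers from Table 6 (variable names are ours):

```python
# H2 preprocessing for latch L_1 in Figure 17: weigh the expected cost
# saved by assuming L_1 is a TBL (skipping SPUT_0-1 for P-2) against the
# expected cost added (testing SPUT_0-1-2 instead of SPUT_0-1 + SPUT_1-2
# for P-1).

d_p1, d_p2 = 0.25, 0.30              # personality shares from Table 6
cost_01, cost_12, cost_012 = 6, 8, 24

reduction = d_p2 * cost_01                          # 0.30 * 6  = 1.8
increase = d_p1 * (cost_012 - (cost_01 + cost_12))  # 0.25 * 10 = 2.5

mark_as_tbl = reduction > increase
print(round(reduction, 2), round(increase, 2), mark_as_tbl)  # 1.8 2.5 False
```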
4.7.4. Experimental results

For our experiments, the five-stage pipelined multiplier of Chapter 3 is used, based on a personality distribution containing 23 personalities, P-0 through P-22. About half of the personalities represent fault-free chips and the rest represent faulty ones. Each personality is designed to have a different level of time borrowing (i.e., time borrowing frequency), as summarized in Table 8. The four categories are denoted by (i), (ii), (iii), and (iv) as shown in the table.
Table 8. Design of chip personality distribution.

TB frequency (No. of TBLs) | Low (1–17)                      | High (18–33)
---------------------------|---------------------------------|----------------------
Fault-free                 | (i) P-0, 1, 2, 5, 8, 12, 13, 14 | (ii) P-9, 17, 19, 21
Faulty                     | (iii) P-3, 4, 6, 7, 10, 15, 16  | (iv) P-11, 18, 20, 22
First, we use eleven different personality distributions at yield levels ranging from 100% to 0%. Each personality distribution is denoted using the convention p(i)-(ii)-(iii)-(iv), where each number refers to the percentage of chips in the corresponding category. For example, in p45-20-30-5, 45% of chips are fault-free with low time borrowing frequency, 20% are fault-free with high time borrowing frequency, 30% are faulty with low time borrowing frequency, and 5% are faulty with high time borrowing frequency. Within each category, the given percentage is evenly distributed among the corresponding personalities.
For each personality distribution, we apply BTS, H1, and H2; the results are shown in Table 9. Each result is compared with the lower bound (LB), and the percentage increase over the LB (i.e., the sub-optimality) is reported. We also carried out test scheduling by randomly selecting an SPUT at each choice node, executed ten times for each personality distribution. This experiment clearly shows that random approaches must be avoided for test scheduling, since the sub-optimality of the random selections ranges from 324% to 18354% over the LB. Hence, results of the random selection approach are omitted from Table 9.

As the results show, H1 and H2 significantly reduce test application cost in all cases compared to BTS. Note that H1 is more effective for high-yield cases and H2 is more effective for low-yield cases, which may be due to the fact that H2 identifies faulty chips first. Since BTS was focused on maximizing fault coverage for fault-free chips, its test application cost is particularly high for low-yield cases. On the other hand, H1 and H2 are significantly more efficient for low-yield cases.
Table 9. Test application cost comparisons of proposed approaches.

p(i)-(ii)-(iii)-(iv) | Yield | LB   | BTS  | % over LB | H1   | % over LB | H2    | % over LB
---------------------|-------|------|------|-----------|------|-----------|-------|----------
p100-0-0-0           | 100%  | 1218 | 1282 | 5.3%      | 1228 | 0.8%      | 1318  | 8.2%
p75-25-0-0           | 100%  | 2340 | 2451 | 4.7%      | 2363 | 0.9%      | 2427  | 3.7%
p50-50-0-0           | 100%  | 3463 | 3621 | 4.6%      | 3487 | 0.7%      | 3537  | 2.1%
p25-75-0-0           | 100%  | 4585 | 4790 | 4.5%      | 4606 | 0.5%      | 4652  | 1.5%
p0-100-0-0           | 100%  | 5708 | 5960 | 4.4%      | 5728 | 0.4%      | 10440 | 82.9%
p45-45-5-5           | 90%   | 3118 | 3464 | 11.1%     | 3157 | 1.3%      | 3188  | 2.3%
p40-40-10-10         | 80%   | 2773 | 3307 | 19.3%     | 2837 | 2.3%      | 2839  | 2.4%
p35-35-15-15         | 70%   | 2429 | 3151 | 29.7%     | 2505 | 3.2%      | 2491  | 2.6%
p30-30-20-20         | 60%   | 2084 | 2994 | 43.7%     | 2165 | 3.9%      | 2142  | 2.8%
p25-25-25-25         | 50%   | 1739 | 2837 | 63.1%     | 1820 | 4.7%      | 1790  | 2.9%
p20-20-30-30         | 40%   | 1395 | 2681 | 92.2%     | 1474 | 5.7%      | 1440  | 3.3%
p15-15-35-35         | 30%   | 1050 | 2524 | 140.4%    | 1128 | 7.4%      | 1090  | 3.8%
p10-10-40-40         | 20%   | 705  | 2368 | 235.7%    | 780  | 10.7%     | 737   | 4.5%
p5-5-45-45           | 10%   | 361  | 2211 | 513.3%    | 433  | 20.0%     | 388   | 7.6%
p0-0-50-50           | 0%    | 16   | 2054 | 12855%    | 85   | 434%      | 39    | 143.9%
Figure 18. Test application cost comparisons of proposed approaches. [Figure: average test application cost versus yield (100% down to 0%) for LB, BTS, H1, and H2.]
Table 10. Contributions to overhead of H1 and H2.

Yield | H1 over LB: fault-free | H1 over LB: faulty | H2 over LB: fault-free | H2 over LB: faulty
------|------------------------|--------------------|------------------------|-------------------
100%  | 24.88                  | 0.00               | 74.25                  | 0.00
90%   | 32.18                  | 7.21               | 66.83                  | 3.53
80%   | 51.50                  | 12.14              | 59.40                  | 6.81
70%   | 54.86                  | 21.69              | 51.98                  | 10.17
60%   | 50.70                  | 30.00              | 44.55                  | 13.56
50%   | 44.69                  | 36.28              | 35.63                  | 14.91
40%   | 35.75                  | 43.54              | 28.50                  | 17.40
30%   | 28.95                  | 49.00              | 21.15                  | 18.55
20%   | 19.30                  | 56.00              | 13.73                  | 18.11
10%   | 10.01                  | 61.97              | 6.86                   | 20.54
0%    | 0.00                   | 68.86              | 0.00                   | 22.82
Figure 19. Percentage over the lower bound. [Figure: percentage over the lower bound (0% to 10%) versus time borrowing frequency (low to high) for BTS, H1, and H2.]
The percentage sub-optimalities of H1 and H2 over the LB are partly due to the increase in test application cost for fault-free chips and partly due to the increase for faulty chips. Table 10 shows the contributions to the sub-optimality over the LB for fault-free and faulty chips.

In order to see how the proposed heuristic approaches work for fault-free chips, we compare five chip personality distributions at five different levels of time borrowing frequency, from low (p100-0-0-0) to high (p0-100-0-0), with yield maintained at 100% (the first five personality distributions in Table 9). H1 gives better results than BTS and H2, and its sub-optimality is consistently less than 1% of the LB in all five cases, as shown in Figure 19.
H2 shows an exceptionally high level of sub-optimality for the p0-100-0-0 case. In p0-100-0-0, it is necessary to test longer multi-segment SPUTs whose test application costs are inherently high. We observe that the near-lower-bound function in H2 fails to give sufficiently high priority to such high-cost long multi-segment SPUTs when they are required.
4.7.5. Analysis of the heuristic approach

In Section 4.7.1, we discussed two major sources of cost reduction: one for faulty chip instances, where a fault is identified using fewer tests, and the other for fault-free chip instances, where TBLs are identified using fewer tests, or no tests at all for potential TBLs. Hence, let us analyze two cases where one of the two sources of cost reduction prevails over the other.

First, consider a high-yield case where the majority of chip instances are fault-free. In this case, if time borrowing occurs scarcely, the proposed heuristic approach does not gain much cost reduction, since cost reduction for faulty chip instances is limited by the high yield and cost reduction for fault-free chip instances is limited by the rarity of time borrowing sites.
On the other hand, if time borrowing occurs ubiquitously in the high-yield case, the proposed heuristic approach does not gain much cost reduction for faulty chip instances but gains cost reduction by not applying tests to potential TBLs, if any exist, and by identifying TBLs using fewer tests. However, in this case, the cost reduction is comparatively small, since testing of the long multi-segment paths is unavoidable and the number of tests for the long multi-segment paths is greater than the sum of the numbers of tests for shorter sub-paths in most cases. For example, suppose that all chip instances of a five-stage pipeline are fault-free and all latches except the primary inputs and primary outputs are TBLs. Since we can regard all these TBLs as potential TBLs, we do not target any SPUT terminating at any of the potential TBLs. As a result, we only need to target the longest multi-segment paths, which run from primary inputs to primary outputs. However, in general the amount of reduction is very limited, since the number of five-segment paths is dominantly high. Suppose for simplicity that all primary inputs are located at the inputs of C_0 and all primary outputs are located at the outputs of C_4. Let n_i denote the number of paths from the input latches of C_0 to the output latches of C_i (i = 0, 1, …, 4). Let us assume that the number of paths increases exponentially at a rate r from n_0 to n_4 (i.e., n_{i+1}/n_i = r for i = 0, 1, 2, 3). For five-stage pipelines, n_4 (i.e., the number of five-segment paths) is always greater than Σ_{i=0}^{3} n_i (i.e., the sum of the numbers of tests for single- to four-segment paths) unless the increase rate r is less than 1.534. For the five-stage pipelined multiplier used in Section 4.7.4, in this particular extreme time borrowing fault-free case, our heuristic approach requires a test application cost of 9,328, assuming this is the only personality, while the base schedule requires 9,434. Although the reduction is in general very limited, note that, in this particular extreme time borrowing fault-free case, no other test schedule can reduce the test application cost below the value corresponding to the total number of multi-segment paths from primary inputs to primary outputs without compromising delay fault coverage (in the above example circuit, there are 9,328 multi-segment paths from primary inputs to primary outputs).
Second, consider a low-yield case where the majority of chip instances are faulty. If time borrowing occurs scarcely, the proposed heuristic approach may gain a significant cost reduction, since it tries to apply tests to SPUTs that terminate at a primary output that is highly likely to fail, without having to identify other NTBLs or TBLs that do not help identify a fault. Also, note that when selecting such SPUTs to identify a fault, SPUTs may be selected regardless of the time borrowing status at the input latch, while the base schedule applies tests to an SPUT only if its input latch is known to be a NTBL.

If time borrowing occurs ubiquitously in the low-yield case, the proposed heuristic approach may still gain a significant cost reduction, since it tries to apply tests only to SPUTs that terminate at a primary output that is highly likely to fail, and such SPUTs are not necessarily long multi-segment paths, as explained above. For example, in the five-stage pipelined multiplier of Section 4.7.4, when all latches except the primary inputs and the primary outputs are TBLs and one primary output is a TBL whose fault can be detected by 8 tests in C_4, the heuristic approach requires only 8 tests to detect the fault, while the base schedule requires 9,434 tests.
In this chapter, we focus on minimization of the test application cost for delay testing of latch-based circuits while maintaining optimal coverage. We show that conventional test scheduling methods are not applicable due to the unique characteristics of latch-based circuits with time borrowing. Hence, we formulate the minimization problem and propose two heuristic approaches. The experimental results show that the proposed heuristic test scheduling approaches achieve overall test application costs within 5% of the lower bound in most cases, for chip personalities with diverse yield and time borrowing scenarios.
CHAPTER 5
Flip-flop-based vs. latch-based designs
In Section 1.4, we discussed the major benefits of latch-based designs in comparison with flip-flop-based designs. Although the flip-flop-based version is easier to design and verify, and an extensive set of synthesis and verification tools is available for it, latch-based design is widely used in full-custom high-speed chips, especially in their delay-critical parts, due to two major benefits: higher performance and higher yield. Compared to flip-flop-based design, latch-based design may enhance performance by enabling intentional time borrowing and improve yield by allowing unintentional time borrowing. However, we can realize these two advantages only if the latch-based design is carried out to obtain a high-speed implementation and an appropriate delay test methodology is used. It is also important to ensure that latch-based designs can be tested in a manner that is economical and ensures high test quality. Otherwise, flip-flop-based design would prevail due to its ease of design, verification, and test.

Hence, in this chapter we compare a flip-flop-based circuit and the corresponding divide-and-conquer delay testing approach with its latch-based counterpart and our latch-based delay testing approach, in terms of delay fault coverage, performance, and yield. We derive a set of sufficient conditions for the latch-based circuit to achieve better performance, yield, and delay fault coverage.
5.1. Flip-flop-based counterpart of latch-based circuit

Suppose that we desire to implement the latch-based high-speed circuit shown in Figure 20(a) using complementary clocks φ and φ̄. As discussed in Section 1.5, a divide-and-conquer delay testing approach works for flip-flop-based pipelines but may not work for latch-based pipelines when time borrowing occurs at a latch within the circuit under test. Hence, our first design and test approach is to redesign the same circuit using flip-flops and to perform delay testing using the divide-and-conquer approach described in Section 1.5. This flip-flop-based counterpart with a divide-and-conquer test scheme is denoted D_FF T_FF and illustrated in Figure 20(b); it is obtained by migrating the latches at the inputs of C_{2k+1} (k = 0, 1, 2, …) in Figure 20(a) to the outputs of C_{2k+1}, based on the following theorem.

Theorem 13 [8]. A sequential path via a rising-edge-triggered flip-flop and a sequential path via an active-low latch and an active-high latch are equivalent.

This design and test scheme, i.e., D_FF T_FF, is compared with the latch-based design and test scheme denoted D_LAT T_LAT, shown in Figure 20(a), which represents the latch-based design with the latch-based delay testing approach proposed in Chapter 3.
Figure 20. Design and test schemes for high-speed circuit. [Figure: (a) D_LAT T_LAT, latch-based circuit and latch-based delay test (the given high-speed latch-based pipeline): latches L_A through L_O, clocked alternately by φ and φ̄, separate combinational blocks C_0 through C_3. (b) D_FF T_FF, FF-based circuit and FF-based delay test: the counterpart built with flip-flops FF0 through FF8 and latches L_X, L_Y, and L_Z around blocks C_0 through C_3.]
Figure 21. Timing requirements for the two designs. [Figure: (a) the latch-based design, showing Δ_CQ, Δ_DQ, and block delays δ_0 and δ_1, with and without time borrowing; (b) the FF-based counterpart, showing Δ_CQ, Δ_DC, t_skew, and the block delay ω_0 + ω_1.]
5.2. Performance comparison

Let us compare the timing requirements of the two designs shown in Figure 20, where the latch-based design has 2N blocks and the flip-flop-based design has N blocks. (Here 'block' denotes a combinational logic block.) Denote the clock-to-Q delay by Δ_CQ, the D-to-Q delay by Δ_DQ, the set-up time by Δ_DC, the slack by t_slack, the maximum clock skew by t_skew, and the delay of the critical path in C_i by δ_i in the latch-based circuit and ω_i in the flip-flop-based circuit.

Figure 21 illustrates the timing requirements of the two designs for C_0 and C_1 in Figure 20. In the latch-based design, Δ_DQ needs to be budgeted in the timing, whereas Δ_DC and t_slack need not be budgeted [17]. As shown in Figure 21(a), if δ_0 + Δ_CQ < T/2 (i.e., if time borrowing does not occur at the output of C_0), the signal cannot propagate to C_1 until the rising edge of φ̄, resulting in slack time denoted t_slack. In this case, it may be possible to increase the clock frequency. Hence, the design strategy for latch-based circuits is to maximize the clock frequency by taking advantage of time borrowing and avoiding slack. We therefore assume that the logic blocks in the latch-based circuit are designed such that time borrowing occurs at every latch via which critical paths pass.
Observation 4: In an ideal latch-based design, time borrowing occurs at every latch via which one or more critical paths pass.

In other words, the following inequalities are satisfied:

T/2 ≤ δ_0 + Δ_CQ ≤ T – t_skew – Δ_DC    (8)

T ≤ δ_0 + δ_1 + Δ_CQ + Δ_DQ ≤ 3T/2 – t_skew – Δ_DC    (9)

3T/2 ≤ δ_0 + δ_1 + δ_2 + Δ_CQ + 2Δ_DQ ≤ 2T – t_skew – Δ_DC    (10)

2T ≤ δ_0 + δ_1 + δ_2 + δ_3 + Δ_CQ + 3Δ_DQ ≤ 5T/2 – t_skew – Δ_DC    (11)

…………

(2N – 1)T/2 < δ_0 + δ_1 + δ_2 + … + δ_{2N–1} + Δ_CQ + (2N – 1)Δ_DQ ≤ N⋅T    (12)

The left inequalities in (8) through (12) indicate that time borrowing occurs at every latch via which one or more critical paths pass, and the right inequalities in (8) through (12) indicate that the amount of time borrowing does not exceed its maximum, i.e., T/2 – t_skew – Δ_DC, in Figure 20, where complementary clocks with 50% duty cycle are used. Note that (12) differs from all the other relations in this group, since it must satisfy a boundary condition that ensures that the last stage of the latch-based pipeline does not borrow time.
In general, logic delays such as δ_i and ω_i dominate the block delays compared to Δ_CQ, Δ_DQ, and Δ_DC. Hence, to simplify the comparison, we assume that Δ_CQ, Δ_DQ, and Δ_DC are approximately equal and denote them by Δ, i.e., Δ_CQ ≈ Δ_DQ ≈ Δ_DC ≈ Δ.
Notice that inequalities (8) through (12) are for the single-segment to multi-segment paths that start from the inputs of C_0. Timing requirements for other single-segment to multi-segment paths, such as paths in C_1, C_2, C_1+C_2, C_2+C_3, C_1+C_2+C_3, etc., are shown in (13) through (18):

t_skew + Δ_DC ≤ δ_1 + Δ ≤ T – t_skew – Δ_DC    (13)

t_skew + Δ_DC ≤ δ_2 + Δ ≤ T – t_skew – Δ_DC    (14)

……… (and similar relations for other single-segment paths)

T/2 + t_skew + Δ_DC ≤ δ_1 + δ_2 + 2Δ ≤ 3T/2 – t_skew – Δ_DC    (15)

T/2 + t_skew + Δ_DC ≤ δ_2 + δ_3 + 2Δ ≤ 3T/2 – t_skew – Δ_DC    (16)

……… (and similar relations for other two-segment paths)

T + t_skew + Δ_DC ≤ δ_1 + δ_2 + δ_3 + 3Δ ≤ 2T – t_skew – Δ_DC    (17)

T + t_skew + Δ_DC ≤ δ_2 + δ_3 + δ_4 + 3Δ ≤ 2T – t_skew – Δ_DC    (18)

……… (and similar relations for other three-segment paths and longer)
However, inequalities such as (13) through (18) are always satisfied if inequalities (8) through (12) are satisfied. For instance, if (8) and (9) hold, (13) is always satisfied. To prove this, let X_i = δ_i + Δ. Inequality (9) can then be rewritten as T ≤ X_0 + X_1 ≤ 3T/2 – t_skew – Δ_DC. Since T/2 ≤ X_0 ≤ T – t_skew – Δ_DC from (8), X_1 (i.e., δ_1 + Δ) takes its maximum value when X_0 is at its minimum value, which is T/2, and X_1 takes its minimum value when X_0 is at its maximum value, which is T – t_skew – Δ_DC. Hence, T/2 ≤ δ_1 + Δ ≤ T – t_skew – Δ_DC when X_0 = T/2, while t_skew + Δ_DC ≤ δ_1 + Δ ≤ T/2 when X_0 = T – t_skew – Δ_DC. From these two relations, we obtain t_skew + Δ_DC ≤ δ_1 + Δ ≤ T – t_skew – Δ_DC, which is identical to (13).

In the same manner, we can prove that (14) is always satisfied if (9) and (10) hold, that (15) is always satisfied if (8) and (10) hold, that (16) is always satisfied if (9) and (11) hold, and that (17) is always satisfied if (8) and (11) hold. Hence, the timing requirements for the single-segment to multi-segment paths that start from the inputs of C_0, namely (8), (9), (10), and (11), are sufficient conditions for the latch-based design to be fault-free and to conform with Observation 4.
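The interval argument above can be checked numerically: every (X_0, X_1) pair that satisfies (8) and (9) should also satisfy (13). A minimal sketch with arbitrary sample values for T, t_skew, and Δ_DC (the grid sweep is our illustration, not a proof):

```python
# Sweep (X_0, X_0 + X_1) pairs inside the ranges permitted by (8) and
# (9), and confirm X_1 always lands inside the range required by (13).

T, t_skew, delta_dc = 10.0, 0.5, 0.3
steps = 200

ok = True
for i in range(steps + 1):
    # (8): X_0 in [T/2, T - t_skew - delta_dc]
    x0 = T / 2 + i * (T - t_skew - delta_dc - T / 2) / steps
    for j in range(steps + 1):
        # (9): X_0 + X_1 in [T, 3T/2 - t_skew - delta_dc]
        s = T + j * (3 * T / 2 - t_skew - delta_dc - T) / steps
        x1 = s - x0
        # (13): t_skew + delta_dc <= X_1 <= T - t_skew - delta_dc
        if not (t_skew + delta_dc - 1e-9 <= x1
                <= T - t_skew - delta_dc + 1e-9):
            ok = False

print(ok)  # True
```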
Under Observation 4, the minimum clock period T_LAT for the 2N-stage latch-based circuit in Figure 20(a) is determined as

T_LAT = (1/N) Σ_{i=0}^{2N–1} δ_i + 2Δ.    (19)
On the other hand, the minimum clock period for the flip-flop-based circuit, T_FF, is determined as

T_FF = 2Δ + t_skew + max(ω_0 + ω_1, ω_2 + ω_3, …, ω_{2N–2} + ω_{2N–1}).    (20)
In the flip-flop-based design shown in Figure 20(b), the designer can achieve maximum performance when the block delays are equal. Hence, the strategy in flip-flop-based designs is to balance the block delays, i.e., ω_0 + ω_1 = ω_2 + ω_3 = …, based on which we make the following observation regarding a well-designed flip-flop-based circuit.

Observation 5: In an ideal flip-flop-based design, combinational logic block delays are evenly distributed such that the maximum delay of every combinational logic block is identical.
Since the performance of the flip-flop-based design may be sub-optimal when the same logic blocks C_0, C_1, C_2, and C_3 of the latch-based design in Figure 20(a) are re-used in the flip-flop-based counterpart in Figure 20(b), we assume that the flip-flop-based counterpart is slightly redesigned such that the stage delays are balanced while the total delay is the same as that of the latch-based design, i.e.,

ω_0 + ω_1 = ω_2 + ω_3 = … = ω_{2N–2} + ω_{2N–1}  and  Σ_{k=0}^{2N–1} δ_k = Σ_{k=0}^{2N–1} ω_k.    (21)
Hence, as per Observation 5, we now denote the logic blocks in the flip-flop-based design as C_0′, C_1′, …, C_{2N–1}′, and hence δ_i and ω_i may not be identical. Note that this allows us to compare an optimal latch-based design with the corresponding optimal flip-flop-based design.

Since (1/N) Σ_{i=0}^{2N–1} δ_i = (1/N) Σ_{i=0}^{2N–1} ω_i = ω_0 + ω_1 = ω_2 + ω_3 = … due to (21), we obtain the following result from (19) and (20):

T_FF = T_LAT + t_skew.    (22)

Equation (22) holds for any latch-based pipeline that includes an even number of logic blocks under Observations 4 and 5.
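Equations (19), (20), and (22) can be illustrated with a small numeric sketch for N = 2 (a four-block latch-based pipeline); the delay values below are arbitrary sample numbers, not taken from the dissertation's circuits:

```python
N = 2
delta = 0.2                  # unified Delta_CQ = Delta_DQ = Delta_DC
t_skew = 0.5
d = [3.0, 2.0, 2.5, 1.5]     # latch-based block delays delta_i

# Equation (19): T_LAT = (1/N) * sum(delta_i) + 2*Delta
t_lat = sum(d) / N + 2 * delta

# Balanced FF-based counterpart per (21): each of the N stages gets an
# equal share of the total combinational delay.
omega_stage = sum(d) / N

# Equation (20): T_FF = 2*Delta + t_skew + max stage delay
t_ff = 2 * delta + t_skew + omega_stage

print(round(t_lat, 6), round(t_ff, 6), round(t_ff - t_lat, 6))
# 4.9 5.4 0.5  -> the difference equals t_skew, as in Equation (22)
```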
With rapidly increasing clock frequencies, clock distribution is becoming increasingly challenging, making the clock skew t_skew a critical bottleneck for high-performance designs [3]. In particular, it is shown in [29] that the effect of interconnect process variations may cause up to 25% clock skew variation in a gigahertz microprocessor design.

Hence, if we carefully design latch-based circuits such that time borrowing occurs at every latch via which one or more critical paths pass (Observation 4), Equation (22) shows that the latch-based design always achieves better performance than the flip-flop-based design. Moreover, the performance benefit of the latch-based design increases as the clock skew increases.
Furthermore, it is often difficult to divide the total delay equally into the blocks of a flip-flop-based design because some functional blocks, e.g., cache memory accesses, cannot be easily decomposed in this manner. Hence, in practice, it is more likely that T_FF > T_LAT + t_skew.
5.3. Yield comparison

Intuitively, latch-based circuits are more tolerant of extra delays caused by defects and/or variations in the fabrication process than flip-flop-based circuits, since the extra delay in latch-based designs may be accommodated by subsequent stages using time borrowing over multi-segment paths. Flip-flop-based designs, on the other hand, are inherently less tolerant of extra delays because every flip-flop is a hard boundary at which time borrowing cannot occur. This implies that the performance yield, i.e., the yield at a desired clock frequency, for a latch-based circuit (D_LAT T_LAT) is likely to be higher than the yield of its flip-flop-based counterpart (D_FF T_FF).
In this section, we assume that both designs are carried out such that performance is maximized, as per Observations 4 and 5. In order to compare the yields of the two design and test schemes, we consider cases where both designs operate at the same clock period T. In addition, since we are interested in comparing yield losses caused by defects and/or process variations, we only consider cases where the extra delay caused by defects and/or process variations increases path delays, i.e., σ > 0.

Although the sizes and locations of the extra delays may differ between the latch-based design and the flip-flop-based design, we assume in this chapter that the extra delays in both designs follow the same probabilistic distribution. Hence, yield can be represented by the margin for extra delay. For example, if the latch-based design can accommodate extra delay of up to 3 units and the flip-flop-based design can accommodate extra delay of up to 2 units, then the yield of the latch-based design is deemed higher than that of the flip-flop-based design.
Let us consider three different ranges of the value of T. First, according to equation (22), if T < T_LAT, both designs provide zero yield. Second, if T_LAT ≤ T < T_FF (= T_LAT + t_skew), then D_FF T_FF provides zero yield, while D_LAT T_LAT may provide non-zero yield depending on the size of the extra delay, meaning D_LAT T_LAT achieves higher yield than D_FF T_FF in this case.
For the third case, where T ≥ T_FF, we prove that it is possible for the latch-based design to achieve higher yield than its flip-flop-based counterpart. Consider the same 2N-stage latch-based pipeline and its counterpart, i.e., the N-stage flip-flop-based pipeline, as shown in Figure 20. Let σ_k denote the maximum extra delay that may affect the delay of any path within C_k in the latch-based design (k = 0, 1, ⋅⋅⋅, 2N − 1), and γ_k denote the maximum extra delay that may affect the delay of any path within C′_k in the flip-flop-based design (k = 0, 1, 2, ⋅⋅⋅, 2N − 1).
For simplicity of explanation, consider N = 2. (N = 2 corresponds to a latch-based design with
four blocks.) When extra delays caused by variations and/or defects are considered in a latch-based
design, inequalities (8) through (12) are rewritten as follows.
T/2 ≤ δ_0 + Δ + σ_0 ≤ T − t_skew − Δ,   (23)

T ≤ δ_0 + δ_1 + 2Δ + σ_0 + σ_1 ≤ 3T/2 − t_skew − Δ,   (24)

3T/2 ≤ δ_0 + δ_1 + δ_2 + 3Δ + σ_0 + σ_1 + σ_2 ≤ 2T − t_skew − Δ,   (25)

3T/2 ≤ δ_0 + δ_1 + δ_2 + δ_3 + 4Δ + σ_0 + σ_1 + σ_2 + σ_3 ≤ 2T.   (26)
For corresponding flip-flop-based designs, the following inequalities must be satisfied.
ω_0 + ω_1 + 2Δ + t_skew + γ_0 + γ_1 ≤ T   (27)

ω_2 + ω_3 + 2Δ + t_skew + γ_2 + γ_3 ≤ T   (28)
As per Observation 5, the flip-flop-based counterpart is designed such that the stage delays are the same, denoted α, i.e.,

ω_0 + ω_1 = ω_2 + ω_3 ≡ α   (29)
Hence, (20) and (21) are rewritten as

T = T_FF = 2Δ + t_skew + α   (30)

∑_{k=0}^{3} δ_k = ∑_{k=0}^{3} ω_k = 2α   (31)
Since T = T_FF > T_LAT, slack occurs in the latch-based design. Hence, we consider a special case where

δ_0 = δ_1 = δ_2 = T/2 − Δ   (32)

and slack is allowed only in the last block C_3, which conforms to the condition specified in Observation 4, as depicted in Figure 22. For this special case, we show that the yield of the latch-based design is higher than that of its flip-flop-based counterpart.
Figure 22. Yield comparison (T = T_FF). [Timing diagram: over two clock periods, the flip-flop-based pipeline has balanced stage delays ω_0 + ω_1 = α and ω_2 + ω_3 = α, while the latch-based pipeline has δ_0 = δ_1 = δ_2 = T_FF/2 − Δ and slack t_slack in the last block.]
From δ_0 = δ_1 = δ_2 = T/2 − Δ, (30), and (31), we obtain

δ_3 = α/2 − 3t_skew/2.   (33)
By applying (32) and (33) to relations (23) through (26), the allowable extra delay values for single blocks and multi-block segments are derived as follows.
σ_0 ≤ T/2 − t_skew − Δ   (34)

σ_1 ≤ T/2 − t_skew − Δ   (35)

σ_2 ≤ T/2 − t_skew − Δ   (36)

σ_3 ≤ 2t_skew   (37)

σ_0 + σ_1 ≤ T/2 − t_skew − Δ   (38)

σ_1 + σ_2 ≤ T/2 − t_skew − Δ   (39)

σ_2 + σ_3 ≤ 2t_skew   (40)

σ_0 + σ_1 + σ_2 ≤ T/2 − t_skew − Δ   (41)

σ_1 + σ_2 + σ_3 ≤ 2t_skew   (42)

σ_0 + σ_1 + σ_2 + σ_3 ≤ 2t_skew   (43)
In short, the extra delay σ_i of any logic block in the latch-based design may be positive, as specified in (34) to (43). In contrast, the flip-flop-based counterpart cannot tolerate any extra delay, as implied in Figure 22, meaning γ_0 = γ_1 = γ_2 = γ_3 = 0. Hence, the latch-based design achieves higher yield than its flip-flop-based counterpart. This special case proves that the yield of a latch-based circuit can be higher than that of its flip-flop-based counterpart when a clock period T ≥ T_FF is used.
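The special case above can also be checked numerically. The sketch below is our own illustration (the values chosen for Δ, t_skew, and α are arbitrary assumptions, not from the dissertation): it encodes the upper bounds of inequalities (23) through (26) and the flip-flop constraints (27) and (28) for N = 2, and confirms that the latch-based design tolerates strictly positive extra delays while the balanced flip-flop-based counterpart operating at T = T_FF tolerates none.

```python
# Illustrative check of the Section 5.3 yield-margin argument for N = 2.
# Delta, t_skew, and alpha are assumed example values, not from the text.
Delta, t_skew, alpha = 1.0, 2.0, 10.0
T = 2 * Delta + t_skew + alpha                            # equation (30): T = T_FF
d = [T / 2 - Delta] * 3 + [alpha / 2 - 3 * t_skew / 2]    # (32) and (33)
assert abs(sum(d) - 2 * alpha) < 1e-9                     # consistency with (31)

def latch_ok(s):
    """Upper bounds of inequalities (23)-(26) with extra delays s[0..3]."""
    return (d[0] + Delta + s[0] <= T - t_skew - Delta and
            sum(d[:2]) + 2 * Delta + sum(s[:2]) <= 1.5 * T - t_skew - Delta and
            sum(d[:3]) + 3 * Delta + sum(s[:3]) <= 2 * T - t_skew - Delta and
            sum(d) + 4 * Delta + sum(s) <= 2 * T)

def ff_ok(g):
    """Inequalities (27) and (28) with balanced stages, per (29)."""
    return (alpha + 2 * Delta + t_skew + g[0] + g[1] <= T and
            alpha + 2 * Delta + t_skew + g[2] + g[3] <= T)

# Per (37), the last latch-based block tolerates up to 2 * t_skew of extra delay.
assert latch_ok([0.0, 0.0, 0.0, 2 * t_skew])
# The flip-flop-based counterpart at T = T_FF tolerates no extra delay at all.
assert ff_ok([0.0, 0.0, 0.0, 0.0]) and not ff_ok([0.1, 0.0, 0.0, 0.0])
```

With these values the margins are tight; for example, σ_0 can grow to exactly T/2 − t_skew − Δ, as in (34), before the check fails.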
As noted earlier, the most important motivation for latch-based design is to achieve higher performance using time borrowing. Hence, if a latch-based circuit is designed to operate at its minimum clock period T_LAT, or at a period close to T_LAT (i.e., T_LAT ≤ T < T_FF), then the latch-based design always achieves a dramatically higher yield than its flip-flop-based counterpart.
5.4. Delay fault coverage comparison
In this section, we compare the delay fault coverage of a flip-flop-based circuit when a divide-and-conquer test scheme is applied (D_FF T_FF) with the delay fault coverage of a latch-based circuit when the latch-based delay testing approach proposed in Chapter 3 is applied (D_LAT T_LAT).
We consider a flip-flop-based counterpart that uses logic blocks identical to those of the latch-based design, in the manner depicted in Figure 20(a) and (b). (Recall that in Sections 5.2 and 5.3, we had assumed that the flip-flop-based counterpart is slightly redesigned such that block delays are balanced to maximize its performance. That was only to ensure that the performance and yield comparisons were more than fair to flip-flop-based designs.) As proven in Theorem 12 in Section 3.8.1, D_LAT T_LAT is guaranteed to achieve the theoretical maximum robust path delay fault coverage. On the other hand, the coverage of D_FF T_FF is less than or at most equal to that of D_LAT T_LAT, because D_FF T_FF cannot scan in values to the nodes between C_{2k} and C_{2k+1} (k = 0, 1, ⋅⋅⋅).
Suppose that p is a path from L_B to L_D in C_0 and q is a path from L_D to L_H in C_1 in Figure 21(a) (similar to that shown in Figure 7). It is known that robust tests exist for both p and q when the same type of transition is implied at L_D. As per Theorem 11, the multi-segment path comprised of p and q is also robustly testable in D_LAT T_LAT, shown in Figure 21(a). However, the same path comprised of p and q is not necessarily robustly testable by D_FF T_FF in Figure 21(b), since the sensitization of block C_1 in Figure 21(b) is not independently controlled but depends on the logic values implied at the outputs of C_0 by the values applied at the inputs of FF0, FF1, and FF2. As a result, D_FF T_FF cannot achieve higher delay fault coverage than D_LAT T_LAT, where the latter is guaranteed to be the theoretical maximum by Theorem 12 in Section 3.8.1.
5.5. A summary of comparison results
In order to take full advantage of the performance and yield benefits of latch-based design over flip-flop-based design, we formulate sufficient conditions under which a latch-based design achieves better performance and better yield than the corresponding flip-flop-based design. These conditions may be used as guidelines for designers implementing high-speed latch-based circuits.
In particular, we prove that the minimum clock period for latch-based design (i.e., T_LAT) is smaller than the minimum clock period for flip-flop-based design (i.e., T_FF). Hence, latch-based design always achieves higher yield than flip-flop-based design when both designs operate at a clock period T in the high-speed range T_LAT ≤ T < T_FF, where flip-flop-based design provides zero yield. For the case where T is greater than T_FF, i.e., T is in the range where both designs provide non-zero yield, we prove that it is possible for latch-based designs to achieve higher yield than their flip-flop-based counterparts.
In summary, we prove that when the latch-based delay testing approach proposed in Chapter 3 is used and the latch-based circuit is carefully designed following the conditions described in this chapter, we achieve higher delay fault coverage, higher performance, and higher performance yield for latch-based designs compared to their flip-flop-based counterparts.
In particular, the latch-based delay testing approach proposed in Chapter 3 guarantees theoretical
maximum path delay fault coverage (of any scan-based method), which is higher than (or at least
equal to) the coverage for its flip-flop-based counterpart.
CHAPTER 6
Future research tasks
In Section 6.1, we describe the future research tasks needed to complete the overall test optimization presented in Chapter 4. In Section 6.2, we discuss future tasks for a generalized optimization that relaxes the simplifying assumptions made in Chapters 3 and 4. The hardware design and control issues related to scan DFT are discussed in Section 6.3.
6.1. The overall test optimization
In Chapter 3, we proposed a test generation approach that maximizes the robust PDF coverage of latch-based pipelines for any available set of scan chain configurations. In Section 4.1, however, a motivating example demonstrated that the approach proposed in Chapter 3 may not minimize test application cost. In Chapter 4, we therefore proposed an approach for minimizing test application time while guaranteeing the maximum delay fault coverage provided by the approach of Chapter 3.
This test application cost minimization problem is significantly more complex than the conventional test scheduling (or test scoring) problem, since passing or failing the r-r tests for an SPUT has different meanings depending on the time borrowing status at the input latch, the type of output latch (whether or not it is a primary output of a latch-based part of the circuit), and the results of r-r tests for other SPUTs (i.e., dependencies among SPUTs). Scheduling r-f tests is somewhat similar to the conventional test scheduling problem in the sense that a faulty chip is identified and discarded if the r-f tests for an SPUT fail at any stage of testing. However, since passing the r-f tests for an SPUT does not provide any coverage for the SPUT, r-f test scheduling also differs from conventional test scheduling.
In Chapter 4, we proposed a framework for this unique test scheduling problem that finds an optimal test schedule in a search tree comprised of test nodes (SPUTs) and choice nodes, based on a given (approximate) personality distribution of the chips under test. Reduction rules for the search tree were developed to reduce the complexity of the search process. As a preliminary approach, a deterministic optimization approach was formulated and its complexity determined. Heuristic approaches were also proposed and experimental results provided.
The following sections elaborate on the research tasks proposed to complete the overall test optimization.
6.1.1. Realistic test application cost
In this dissertation, for simplicity, test application cost is assumed to be proportional to the
number of tests applied. However, this definition of test application cost ignores many factors, such as the cost associated with reconfiguring SCUTs and the differences in the costs of applying different vectors due to differences in scan chain lengths. As noted earlier, the paths in an SPUT are viewed as
parts of an SCUT, which determines the latches used for testing and their scan chain configurations.
Hence, no reconfiguration of the scan chain is necessary if SPUTs included in an SCUT are tested
consecutively. On the other hand, if SPUTs included in different SCUTs are tested consecutively, the
scan chain must be reconfigured, which incurs additional test application cost. This cost varies
depending on how the scan chains are reconfigured. For instance, in Figure 9 in Section 3.9.3,
reconfiguration from SCUT_10 to SCUT_11 requires modifying the scan chain configuration for level-1 latches only, while reconfiguration from SCUT_12 to SCUT_22 requires modifying the scan chain configurations for level-0, level-1, level-2, and level-3 latches. Also, depending on the scan chain configurations and the scan chain connections in the circuit, vectors may be scanned in and scanned out via scan chains of different lengths, leading to different test application times. In our test approach, scan chain length can affect test application cost more significantly than in conventional testing, since we may test multi-segment paths using SCUTs that configure latches within the SCUT in scan mode.
Hence, we will develop and use a more realistic definition of test application cost that measures
test application time quantitatively by considering the above three parameters, namely the number of
tests, reconfiguration cost, and scan-chain length.
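As a sketch of such a measure, the toy cost model below (entirely illustrative; the function and parameter names are ours, not the dissertation's) charges each test the scan time of its SCUT's chain and adds a reconfiguration penalty whenever consecutively tested SPUTs belong to different SCUTs.

```python
# Toy test-application-cost model combining the three parameters named above:
# number of tests, reconfiguration cost, and scan-chain length. The names and
# cost units are our assumptions for illustration only.
def schedule_cost(schedule, chain_len, reconfig_cost):
    """schedule: list of (sput_id, scut_id) pairs in test-application order."""
    cost, prev_scut = 0, None
    for _sput, scut in schedule:
        if prev_scut is not None and scut != prev_scut:
            cost += reconfig_cost[(prev_scut, scut)]  # scan-chain reconfiguration
        cost += chain_len[scut]                       # scan-in/out time per test
        prev_scut = scut
    return cost

# Grouping SPUTs of the same SCUT avoids repeated reconfiguration penalties.
chains = {"A": 10, "B": 14}
penalty = {("A", "B"): 5, ("B", "A"): 5}
grouped = [(1, "A"), (2, "A"), (3, "B"), (4, "B")]
interleaved = [(1, "A"), (3, "B"), (2, "A"), (4, "B")]
assert schedule_cost(grouped, chains, penalty) < schedule_cost(interleaved, chains, penalty)
```

The example at the bottom reflects the observation above: testing SPUTs of the same SCUT consecutively removes the reconfiguration cost between them.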
6.1.2. Chip personality distribution
The optimality of test scheduling approaches such as the deterministic approach proposed in Section 4.5 is predicated on the availability of personality distributions of the chips under test. Hence, it is critical to develop techniques to identify chip personalities and the likelihood of each of their occurrences. In reality, however, the exact personality distribution cannot be known before we have tested every possible SPUT for all chip instances, since timing characteristics are determined by factors such as process variations, manufacturing defects, and noise in deep sub-micron designs. Based on the fact that these factors are statistical in nature, an approach to identify an approximate chip personality distribution using statistical timing analysis is proposed in the Appendix.
As discussed in the Appendix, latch-based pipelines with time borrowing have three unique characteristics, namely non-uniform input application times due to time borrowing, delay quantization via NTBLs, and reconvergent fan-out propagating via NTBLs.
We will develop a statistical timing analysis tool for latch-based circuits with time borrowing that concurrently tackles these three characteristics. First, a value at an input is applied at the rising edge if the corresponding input latch is an NTBL; otherwise, multi-segment paths are analyzed to capture the delayed input application due to time borrowing at the input latch. Second, we will propose a statistical delay model that represents the three distinct delay distributions via a latch (see Figure 23 in the Appendix).
Using the new tool, we will estimate the probability of time borrowing at each latch in the CUT. More importantly, we will analyze the dependencies among TBLs, i.e., the dependencies among passing/failing results of SPUTs.
In order to overcome inaccuracies caused by the approximations used, we will develop a method that starts with the approximate chip personality distribution derived from the timing analysis tool and continuously updates the personality distribution using the results for tested chips, once a statistically sufficient number of test results has been obtained.
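One simple way to realize such a continuous update is sketched below, purely as an illustration: the linear blending scheme and all names are our assumptions, not the method the dissertation will develop. It mixes the timing-analysis prior with the empirical frequencies observed on chips tested so far.

```python
# Illustrative-only sketch: blend a timing-analysis prior over chip
# personalities with empirical frequencies from tested chips.
from collections import Counter

def update_distribution(prior, observed, weight):
    """prior: {personality: probability} from statistical timing analysis;
    observed: list of personalities seen on tested chips (must be non-empty);
    weight in [0, 1] shifts trust from the prior toward the test data."""
    counts = Counter(observed)
    n = len(observed)
    return {p: (1 - weight) * prior.get(p, 0.0) + weight * counts[p] / n
            for p in set(prior) | set(counts)}

# Example: after four tested chips, the estimate moves toward what was observed.
posterior = update_distribution({"P1": 0.7, "P2": 0.3},
                                ["P1", "P2", "P2", "P2"], weight=0.5)
```

As `weight` grows with the number of tested chips, the estimate relies less on the approximate analysis and more on the observed data, matching the intent described above.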
6.1.3. Inclusion of r-f tests
For simplicity, the test scheduling formulation in Sections 4.3 through 4.5 excludes r-f tests from the discussion; however, the meaning of their results and the benefits of r-f tests are studied in Section 4.2. Typically, applying all r-f tests for every SPUT will be impractically costly, despite the fact that every failing r-f test immediately identifies a faulty chip that can be discarded. This is because an r-f test fails only under the most extreme amount of time borrowing, which is likely to occur only for highly timing-critical SPUTs. In order to minimize the cost incurred by applying r-f tests, they may be used selectively, only for the highly delay-critical SPUTs. Hence, we will extend the framework described in Chapter 4 to incorporate r-f tests; the extended framework will quantify the benefits and cost of applying r-f tests and include a method to select the delay-critical SPUTs for which r-f tests will be applied. We will use the timing analysis tool described in Section 6.1.2 and the derived chip personality distribution to find such delay-critical SPUTs. For the calculation of the cost of applying r-f tests, we will extend the cost function to be defined as part of the research tasks described in Section 6.1.1. As noted earlier, including a test node for r-f tests does not grow the search tree exponentially, because a failing r-f test simply identifies a faulty chip, terminating that branch.
6.2. Other delay testing approaches
For some typical circuits, robust coverage may be low. In such cases, we will consider non-robust testing as well as transition delay fault testing.
Applying non-robust tests within the proposed approach necessitates major changes. One key challenge is that we cannot configure an off-path latch in scan mode regardless of the time borrowing status at that latch, since Lemma 1 is no longer valid under non-robust testing. This will obviously increase the number of required scan chain configurations, adding to the test application time. In the process of implementing non-robust delay testing, we will first investigate whether it is necessary to redefine TBL/NTBL and the procedures used to identify the time borrowing status of latches. We will also study the implications of the results of r-r tests and r-f tests in the context of non-robust testing. Overall, we will analyze the benefits and costs of non-robust testing with respect to time borrowing/non-time borrowing identification, multi-segment path testing, coverage, test application cost, and so on.
Transition fault (TF) testing is an alternative approach for overcoming low coverage. In conventional TF testing for combinational logic, a delay fault is called a transition fault if its delay defect size is greater than the maximum slack S, defined as S = T − α, where T denotes the clock period and α denotes the minimum path delay via the fault site. TF testing guarantees detection of any delay fault whose defect size is greater than S, but does not guarantee detection of delay faults with smaller defect sizes. When TF testing is applied to a flip-flop-based pipeline, each combinational logic block can be tested individually if DFT is supported. However, when TF testing is applied to a latch-based pipeline, the conventional definition of a TF is no longer valid because of possible time borrowing at latches. Starting with a redefinition of the TF, we will adapt our entire framework to delay testing under the transition fault model.
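The conventional criterion just stated can be written as a one-line check (an illustrative helper of our own; the parameters are the clock period T and the minimum path delay α via the fault site, as defined above):

```python
def tf_detection_guaranteed(defect_size, clock_period, min_path_delay):
    """Conventional transition-fault criterion: detection is guaranteed
    iff the delay defect exceeds the maximum slack S = T - alpha."""
    slack = clock_period - min_path_delay
    return defect_size > slack

# With T = 10 and a minimum path delay of 6 via the fault site, S = 4:
assert tf_detection_guaranteed(5, 10, 6)        # 5 > 4: guaranteed detected
assert not tf_detection_guaranteed(3, 10, 6)    # 3 <= 4: not guaranteed
```

As the text notes, this criterion is exactly what breaks down in a latch-based pipeline, because time borrowing changes the effective slack at the fault site.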
6.3. Scan DFT design and control
In order to apply the proposed approach to real latch-based circuits, it is necessary to develop new scan latch and scan chain designs, along with control mechanisms, that are suitable for latch-based high-speed circuits. The new scan latch must be configurable in four operation modes, namely normal, scan-in, r-capture scan-out, and f-capture scan-out. One unique characteristic of the new scan latch design is that, while conventional scan latches in a particular level of a pipeline are all collectively configured either in normal mode or in scan mode, in our approach some latches in a level may be configured in normal mode while others are in scan-in mode. Hence, the new scan latch must have the capability to bypass a value scanned in while it is configured in normal mode. The new scan DFT will also suffer from larger area overhead than conventional scan DFT, since additional latches will be included in the new scan design and more control lines will be used.
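The bypass requirement can be illustrated with a toy shift model. This is entirely our own construction, not the proposed latch design: during a scan-shift step, a latch configured in normal mode must pass the incoming scan bit through unchanged so that downstream latches in scan-in mode can still be loaded.

```python
def scan_shift_step(modes, contents, incoming_bit):
    """One scan-shift step along a chain (toy model, our assumption).
    modes[i] is 'normal' or 'scan_in'; contents[i] is latch i's scan value."""
    bit = incoming_bit
    new = list(contents)
    for i, mode in enumerate(modes):
        if mode == "scan_in":
            new[i], bit = bit, contents[i]  # load, pass the old value onward
        # a 'normal' latch bypasses: the bit travels on, its content untouched
    return new

# Latch 1 is in normal mode, yet scan data still reaches latch 2 in two steps.
step1 = scan_shift_step(["scan_in", "normal", "scan_in"], [0, 0, 0], 1)
step2 = scan_shift_step(["scan_in", "normal", "scan_in"], step1, 0)
assert step1 == [1, 0, 0] and step2 == [0, 0, 1]
```

Without the bypass, the normal-mode latch would break the chain and the scan-in latches beyond it could not be loaded, which is the motivation stated above.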
We will develop a new scan DFT design that does not deteriorate the timing of the circuits, since latch-based pipelines are generally used for the most delay-critical parts of chips.
CHAPTER 7
Conclusion
Pipelines may be implemented using either flip-flops or latches. Flip-flops are edge triggered,
where a signal at the data input of a rising-edge triggered flip-flop propagates via the flip-flop at the
rising edge of clock. On the other hand, latches are level-sensitive, where a signal propagates via the
latch when the clock is at the level that makes the latch transparent, e.g., when the clock is high for an
active-high latch. The use of latches allows each block a nominal delay, e.g., half of the clock period for complementary clocks with a 50% duty cycle. When necessary, however, a block may take longer to complete its computation and provide its result to the next block, due to the level-sensitive property of latches. This is called time borrowing.
Compared to latch-based pipelines, flip-flop-based pipelines are easier to design and verify
using an extensive set of synthesis and verification tools. Hence, most ASIC designers prefer
implementing flip-flop-based pipelines. On the other hand, latch-based pipelines are more difficult
and challenging to design and verify because ensuring correct timing behavior is more difficult and
tool support is limited [9]. However, latch-based pipelines are widely used in full custom designed
high-speed chips, especially in their delay critical parts due to two major benefits, namely, higher
performance and higher yield [9].
In a latch-based pipeline, time borrowing may be intentionally planned during design, which is
called intentional time borrowing. Hence, latch-based pipelines can be designed to operate at higher
clock frequency because it is not necessary to carefully balance delays of logic blocks (stages) to
increase clock frequency and because latches are immune to clock skew to some degree. On the other
hand, flip-flop-based pipelines must carefully balance delays of logic blocks in order to achieve high
performance, since flip-flops present hard time boundaries between pipeline stages where no time
borrowing is permitted. Furthermore, clock skew must be budgeted for in the clock period. According to [8], experimental results show that latch-based designs are 5–19% faster than corresponding flip-flop-based designs, for a small increase in area.
In addition, latch-based pipelines may experience unintentional time borrowing, which is not planned during design but occurs in some fabricated copies of a design due to delay variations and/or defects during fabrication. Such a circuit may still be fault-free if the amount of time borrowing is accommodated by subsequent block(s), which can increase yield when an appropriate delay testing approach is applied. On the other hand, a flip-flop-based pipeline without sufficient timing margin will malfunction at the desired speed when variations and defects cause similar extra delay in the circuit, leading to yield loss.
In short, latch-based design enhances performance by enabling intentional time borrowing and
improves yield by allowing unintentional time borrowing, compared to flip-flop-based designs.
However, we can realize these two advantages only if latch-based design is carried out to obtain high-
speed implementations and an appropriate delay test methodology is used. Otherwise, flip-flop-based
pipelines would prevail due to the ease of design, verification, and test.
Approaches to static testing, such as stuck-at fault testing, are similar for both latch-based and flip-flop-based circuits; the same ATPG can be used to generate tests for both architectures. For delay testing, on the other hand, no methodology had been developed for latch-based circuits that considers the unique characteristics of time borrowing. Typically, a divide-and-conquer approach using scan DFT is used for flip-flop-based circuits, where each logic block is tested individually using scan. However, if the same divide-and-conquer approach is applied to a latch-based circuit, any chip instance with at least one intentional/unintentional time borrowing site will fail the tests and be discarded, leading to zero yield. Since we are considering high-speed applications of latch-based circuits for their performance and yield benefits, there must be one or more time borrowing sites, and hence a divide-and-conquer approach is not appropriate for delay testing of high-speed latch-based circuits.
Due to intentional/unintentional time borrowing in latch-based circuits, delay testing of such circuits must consider multi-segment paths obtained by concatenating appropriate paths in successive logic blocks separated by latches. In addition, delay testing of latch-based circuits must use scan DFT, because classical delay testing, which targets the entire circuit without DFT, typically suffers from impractically high test generation complexity, high test application time, and, for many circuits, meaninglessly low fault coverage.
Hence, in this research, we propose in Chapter 3 a new structural delay testing approach that uses scan DFT for latch-based circuits with time borrowing. Although multi-segment paths are targeted, it is shown in Section 3.8.1 that the complexity of the test generation procedure of the proposed approach is comparable to that of a divide-and-conquer approach. More importantly, our proposed approach achieves the theoretical maximum coverage, meaning that no other path delay testing approach for latch-based circuits, if one exists, can obtain higher robust path delay fault coverage. In terms of DFT support, we minimize the number of required scan chain configurations, as discussed in Section 3.6. Even when only a limited number of scan chain configurations is available, we have developed an approach that identifies the best available configuration(s) so that the sizes of target paths are reduced and fewer SCUTs are tested.
We also develop in Chapter 4 approaches to minimize the test application cost of the proposed delay testing approach for latch-based circuits with time borrowing, under the condition that the optimal delay fault coverage of the approach proposed in Chapter 3 is retained. It is shown that conventional test scheduling (test scoring) approaches are not applicable, due to the unique characteristics of latch-based circuits with time borrowing, namely the diverse implications of test results and the dependencies among tests. Hence, we formulate the test application cost minimization problem and present a deterministic approach. We also propose two heuristic approaches. The experimental results show that the proposed heuristic test scheduling approaches achieve overall test application costs within 5% of the lower bound in most cases, for chip personalities with diverse yield and time borrowing scenarios, while the optimal delay fault coverage is maintained.
In Chapter 5, we compare latch-based design and flip-flop-based design in terms of performance, yield, and delay fault coverage, and derive sufficient conditions under which a latch-based design is guaranteed to achieve higher performance and better yield than its flip-flop-based counterpart, while attaining optimal delay fault coverage.
In conclusion, latch-based circuits are more difficult and challenging to design and test, due to limited support by synthesis and ASIC verification tools and to the lack of a delay testing approach that considers the unique characteristics of latch-based design. However, a latch-based design can achieve higher performance than its flip-flop-based counterpart if the circuit is carefully designed following the timing conditions specified in Section 5.2. Also, by applying the proposed delay testing approach to a latch-based design that follows the conditions specified in Section 5.3, designers can guarantee higher yield than the flip-flop-based design, as well as the theoretical maximum coverage. In addition, using the proposed heuristic approaches, we can efficiently reduce test application cost.
This research can be further extended to other delay testing approaches, such as non-robust delay testing and transition fault testing, in which the time borrowing of latch-based circuits is considered. We are also carrying out detailed design of DFT circuitry that minimizes the performance penalty and area overhead of exploiting DFT in high-speed applications.
References
[1] N.M. Abdulrazzaq and Sandeep K. Gupta, “Test Generation for Path-Delay Faults in One-
dimensional Iterative Logic Arrays”, IEEE International Test Conference, 2000.
[2] A. Agarwal, D. Blaauw, V. Zolotov, and S. Vrudhula, “Statistical Timing Analysis using Bounds”,
ACM/IEEE Design, Automation, and Test in Europe Conference and Exhibition, 2003.
[3] A. Agarwal, D. Blaauw, and V. Zolotov, “Statistical Clock Skew Analysis Considering Intra-Die
Process Variations,” IEEE Transactions on Computer-Aided Design, 2004.
[4] Z. Barzilai and B.K. Rosen, “Comparison of AC Self-Testing Procedures”, IEEE International Test Conference, 1983.
[5] H. Chang and S.S. Sapatnekar, “Statistical Timing Analysis Considering Spatial Correlations
Using a Single PERT-like Traversal”, ACM/IEEE International Conference on Computer Aided
Design, 2003.
[6] M.C.T. Chao, L.C. Wang, K.T. Cheng, and S. Kundu, “Static Statistical Timing Analysis for
Latch-based Pipeline Designs”, ACM/IEEE International Conference on Computer Aided Design,
2004.
[7] L.-C. Chen, S.K. Gupta, and M.A. Breuer, “High Quality Robust Test for Path Delay Faults”,
IEEE VLSI Test Symposium, pp. 88–93, 1997.
[8] D. Chinnery and K. Keutzer, Closing the Gap Between ASIC & Custom, Kluwer Academic
Publishers, 2002.
[9] Kun Y. Chung and Sandeep K. Gupta, “Structural Delay Testing of Latch-based High-speed
Pipelines with Time Borrowing”, IEEE International Test Conference, 2003.
[10] Kun Y. Chung and Sandeep K. Gupta, “Structural Delay Testing of Latch-based High-speed
Pipelines with Time Borrowing”, Technical Report CENG 03-01, University of Southern
California, 2003.
[11] Kun Y. Chung and Sandeep K. Gupta, “Structural Delay Testing Under Restricted Scan of
Latch-based Pipelines with Time Borrowing”, Technical Report CENG-2005-5, University of
Southern California, 2005.
[12] Kun Y. Chung and Sandeep K. Gupta, “Low-cost Scan-based Delay Testing of Latch-based
circuits with Time Borrowing”, IEEE VLSI Test Symposium, 2006.
[13] T.H. Cormen, et al., Introduction to Algorithms, 2nd Ed., The MIT Press, 2001.
[14] A. Devgan and C. Kashyap, “Block-based Static Timing Analysis with Uncertainty”, ACM/IEEE International Conference on Computer Aided Design, 2003.
[15] K. Fuchs, F. Fink, and M.H. Schulz, “DYNAMITE: An Efficient Automatic Test Pattern
Generation System for Path Delay Faults”, IEEE Transactions on Computer-Aided Design, 1991.
[16] R. Gupta, R. Gupta, and M.A. Breuer, “The BALLAST Methodology for Structured Partial Scan
Design”, IEEE Transactions on Computers, 39(4), 1990.
[17] D. Harris, Skew-Tolerant Circuit Design, Academic Press, San Diego, CA, 2001.
[18] C. Hauck and C. Cheng, “VLSI Implementation of a Portable 266MHz 32-Bit RISC Core”,
Microprocessor Report, 2001.
[19] E.P. Hsieh, et al., “Delay Test Generation”, ACM/IEEE Design Automation Conference, 1977.
[20] S.D. Huss and R.S. Gyurcsik, “Optimal Ordering of Analog Integrated Circuit Tests to Minimize
Test Time”, ACM/IEEE Design Automation Conference, 1991.
[21] Intel News Release, “Intel unveils world’s best processor”,
http://www.intel.com/pressroom/archive/releases/20060727comp.htm, 2006.
[22] J.A.G. Jess, et al., “Statistical Timing for Parametric Yield Prediction of Digital Integrated
Circuits”, ACM/IEEE Design Automation Conference, 2002.
[23] Niraj Jha and Sandeep K. Gupta, Testing of Digital Systems, Cambridge University Press, 2003.
[24] W. Jiang and B. Vinnakota, “Defect-Oriented Test Scheduling”, IEEE Trans. on VLSI Systems,
2001.
[25] J. Le, X. Li, and L.T. Pileggi, “STAC: Statistical Timing Analysis with Correlation”, ACM/IEEE
Design Automation Conference, 2004.
[26] Y. Levendel and P.R. Menon, “Transition Fault in Combinational Circuits: Input Transition Test
Generation and Fault Simulation”, IEEE International Symposium on Fault-Tolerant Computing
Systems, 1986.
[27] C. Lin and S. Reddy, “On Delay Fault Testing in Logic Circuits”, IEEE Transactions on
Computer-Aided Design, 1987.
[28] J.J. Liou, K.T. Cheng, S. Kundu, and A. Krstic, “Fast Statistical Timing Analysis by
Probabilistic Event Propagation”, ACM/IEEE Design Automation Conference, 2001.
[29] Y. Liu, S.R. Nassif, L.T. Pileggi, and A.J. Strojwas, “Impact of interconnect variations on the
clock skew of a gigahertz microprocessor,” ACM/IEEE Design Automation Conference, 2000.
[30] U. Mahlstedt, “DELTEST: Deterministic Test Generation for Gate Delay Faults”, IEEE
International Test Conference, 1993.
[31] L. Milor and A.L.S. Vincentelli, “Minimizing Production Test Time to Detect Faults in Analog
Circuits”, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 1994.
[32] M. Orshansky and K. Keutzer, “A General Probabilistic Framework for Worst Case Timing
Analysis”, ACM/IEEE Design Automation Conference, 2002.
[33] E.S. Park and M.R. Mercer, “Robust and Nonrobust Tests for Path Delay Faults in a
Combinational Circuit”, IEEE International Test Conference, 1987.
[34] E.S. Park and M.R. Mercer, “An Efficient Delay Test Generation System for Combinational
Logic Circuits”, IEEE Transactions on Computer-Aided Design, 1992.
[35] E.S. Park, M.R. Mercer, and T.W. Williams, “Statistical Delay Fault Coverage and Defect Level
for Delay Faults”, IEEE International Test Conference, 1988.
[36] M.H. Schulz and F. Brglez, “Accelerated Transition Fault Simulation”, ACM/IEEE Design
Automation Conference, 1987.
[37] G.L. Smith, “Model for Delay Faults based upon paths”, IEEE International Test Conference,
1985.
[38] T.M. Storey and J.W. Barry, “Delay Test Simulation”, ACM/IEEE Design Automation
Conference, 1977.
[39] N.H.E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective,
2nd Ed., Addison-Wesley, 1992.
[40] J. Zeng, et al., “On Correlating Structural Tests with Functional Tests for Speed Binning of High
Performance Design”, IEEE International Test Conference, 2004.
[41] Y. Zhan, A.J. Strojwas, X. Li, L.T. Pileggi, “Correlation-Aware Statistical Timing Analysis with
Non-Gaussian Delay Distributions”, ACM/IEEE Design Automation Conference, 2005.
Appendix:
Chip personality distribution from statistical timing information
In addition to the nominal delay characteristics of devices and wires, the timing characteristics of deep sub-micron designs are determined by factors such as process variations, manufacturing defects, and noise. Since these factors are statistical in nature, the timing characteristics are best captured by statistical models [28]. Many researchers have therefore developed statistical timing analysis approaches, in which the delays of gates and interconnects are modeled as correlated or uncorrelated random variables with known probability density functions (pdfs). There are two broad categories of statistical timing analysis approaches. The first is path-based statistical timing analysis [22][32]. Since the number of paths often grows exponentially with circuit size, the major problem with this approach is critical path selection, which is especially challenging when both inter-die and intra-die variations are present [41]. The second is the block-based approach [2][5][6][14][25][28][41], which analyzes the circuit in a breadth-first manner, propagating delay probability density functions from primary inputs to primary outputs. Block-based statistical timing analysis approaches are widely accepted for their efficiency [28][41].
In our research, statistical timing analysis can be used to obtain the probabilistic delay characteristics of the CUT. Existing statistical timing analysis methods deal with combinational logic and can be applied directly to flip-flop-based circuits. For example, in flip-flop-based pipelines, statistical timing analysis can be performed for each combinational logic block separately by regarding each block as an independent combinational circuit.
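As an illustrative sketch of the block-based style of propagation (not the specific algorithms of [28][41]), the following Monte Carlo fragment propagates arrival-time samples through a hypothetical two-gate block: arrivals are summed with gate delays along a path and combined with max at multi-input gates. All gate-delay parameters are invented for illustration.

```python
import random

N = 100_000  # number of Monte Carlo samples

def gate_delay(mean, sigma):
    """Sample one gate delay from a Gaussian, truncated at zero."""
    return max(0.0, random.gauss(mean, sigma))

# Hypothetical block: inputs a, b feed gate G1; G1's output and input c feed G2.
# Arrival at a gate output = max of the input arrivals + the gate's own delay.
arrivals = []
for _ in range(N):
    a = b = c = 0.0                       # primary inputs switch at t = 0
    g1 = max(a, b) + gate_delay(1.0, 0.2)
    g2 = max(g1, c) + gate_delay(1.5, 0.3)
    arrivals.append(g2)

mean_arrival = sum(arrivals) / N          # sample mean of the output delay pdf
print(f"mean arrival at block output: {mean_arrival:.3f}")
```

A real block-based analyzer would propagate the pdfs analytically in topological order rather than by sampling; the sampling form above only illustrates the sum/max propagation rule.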
However, timing analysis for latch-based pipelines with time borrowing differs from traditional timing analysis of combinational circuits for the following reasons. First, transitions at the inputs of a logic block in a pipeline are not applied simultaneously if time borrowing occurs at one or more of the input latches. This dependency of delays across consecutive blocks of logic necessitates analyzing multi-segment paths.
Second, even if time borrowing does not occur at a latch, the pdf of the delay distribution at the output of the latch differs from the pdf at its input, because any event arriving before the latch becomes transparent is reflected at the latch output only after the rising edge of the clock. We call this phenomenon delay quantization at a latch. Examples of delay quantization are shown in Figure 23. Figure 23(a) shows the case where the transition always occurs before the reference time (t_ref) of the latch L; therefore, the signal always departs from L at t_ref. In the case shown in Figure 23(b), all delays that occur before t_ref are quantized to the clock edge (t_ref). In the case in Figure 23(c), since the transition always occurs after t_ref, the pdf at the output of L is identical to the pdf at its input.
[Figure 23 omitted: a latch L with clock clk, showing the delay pdf at its input and output for three cases: (a) p_tb = 0, all arrivals before t_ref; (b) 0 < p_tb < 1, with the time-borrowing probability equal to the pdf area beyond t_ref and the quantized mass shifted by ∆t_nom; (c) p_tb = 1, all arrivals after t_ref.]
Figure 23. Statistical delay distribution across a latch and the probability of time borrowing.
Provided that we obtain the pdfs at the inputs and outputs of the latches in the circuit, we can define and estimate the probability of time borrowing. The probability of time borrowing p_tb at a latch L is the probability that the correct signal arrives after the rising edge of the clock driving L. This probability equals the area under the pdf in the interval (t_ref, ∞).
p_tb = ∫_{t_ref}^{∞} f(x) dx,  (44)
where f(x) is the probability density function. In Figure 23, p_tb = 0 in (a), 0 < p_tb < 1 in (b), and p_tb = 1 in (c).
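Equation (44) has a closed form when the arrival pdf is Gaussian: the tail mass beyond t_ref is given by the complementary error function. The sketch below, with invented means and sigmas, reproduces the three cases of Figure 23.

```python
import math

def p_tb(t_ref, mean, sigma):
    """Eq. (44) for a Gaussian pdf f: the tail mass of
    N(mean, sigma^2) beyond t_ref."""
    z = (t_ref - mean) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)

# The three cases of Figure 23, with hypothetical numbers:
print(p_tb(t_ref=5.0, mean=2.0, sigma=0.3))  # (a): pdf entirely before t_ref -> ~0
print(p_tb(t_ref=5.0, mean=5.0, sigma=0.5))  # (b): pdf straddles t_ref -> 0.5
print(p_tb(t_ref=5.0, mean=8.0, sigma=0.3))  # (c): pdf entirely after t_ref -> ~1
```

For a non-Gaussian f(x), the same tail integral would be evaluated numerically over the pdf samples instead.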
Third, the reconvergence problem for multi-segment paths must be handled differently from conventional timing analysis methods when the reconverging events pass through latches. If there is a reconvergent fan-out, events propagating from the fan-out stem converge at a gate; the two events at the inputs of this gate are therefore not independent, and their delay distributions are correlated. Dealing with such correlations increases the complexity of timing analysis. Many approaches have been proposed in the literature to handle the reconvergence problem [2][5][6][14][25][28].
In latch-based pipelines, in contrast, the reconvergence problem is exacerbated, since multi-segment paths may have to be analyzed rather than only the single-segment paths of flip-flop-based pipelines. At the same time, the timing analysis task may be simplified if a reconvergent fan-out propagates via NTBLs. Suppose that one path of a reconvergent fan-out passes via a latch L_a and the other path passes via a latch L_b, as shown in Figure 24. If time borrowing does not occur at either L_a or L_b, any transition at c_8 and c_9 always occurs at an edge of the clock, making the delays of the two lines uncorrelated. In addition, the transitions at c_8 and c_9 are always hazard-free, as discussed in Section 3.5. Similarly, when time borrowing occurs at only one of the two latches L_a and L_b, the delays of the two converging lines remain uncorrelated. Therefore, the reconvergence problem caused by two correlated reconverging lines does not arise when at least one line passes via an NTBL. On the other hand, if time borrowing occurs at both L_a and L_b with 100% probability, the reconvergence must be handled in the conventional manner, since the delay distributions at c_6 and c_7 are then identical to those at c_8 and c_9. However, the reconvergence problem becomes much more complicated than the conventional reconvergence problem in combinational logic or flip-flop-based sequential logic when the probability of time borrowing is non-zero and less than 1 (e.g., Case (b) in Figure 23). In this case, the delay quantization across the latches must be taken into account to obtain the correct delay distribution at the output of each latch.
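A small Monte Carlo sketch (with an invented fan-out stem and path delays standing in for the Figure 24 topology) illustrates the decorrelation claim: the arrivals at the two latches share the stem's variance, but once both departures are pinned to the clock edge their covariance vanishes, whereas with guaranteed time borrowing the full stem covariance survives.

```python
import random

def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

N = 100_000
arr_a, arr_b = [], []
for _ in range(N):
    stem = random.gauss(2.0, 0.5)                 # shared fan-out stem delay
    arr_a.append(stem + random.gauss(1.0, 0.2))   # path arriving at latch L_a
    arr_b.append(stem + random.gauss(1.5, 0.2))   # path arriving at latch L_b

def departures(arrivals, t_ref):
    return [max(a, t_ref) for a in arrivals]      # delay quantization at a latch

# Both latches borrow time with probability ~1 (very early clock edge):
cov_borrow = covariance(departures(arr_a, 0.0), departures(arr_b, 0.0))
# Neither latch borrows (very late clock edge): departures pinned to t_ref.
cov_none = covariance(departures(arr_a, 10.0), departures(arr_b, 10.0))

print(f"covariance with borrowing: {cov_borrow:.3f}")   # ~ stem variance (0.25)
print(f"covariance without borrowing: {cov_none:.3f}")  # 0.0
```

The interesting intermediate regime, 0 < p_tb < 1, mixes the two behaviors and is why the quantized output pdf itself must be propagated.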
[Figure 24 omitted: gates G_1 through G_5 connected by lines c_1 through c_11; one path of the reconvergent fan-out passes through latch L_a (driving c_8) and the other through latch L_b (driving c_9), and the two paths reconverge downstream.]
Figure 24. An example reconvergent fan-out.
The approach in [6] is the first attempt at static statistical timing analysis for latch-based pipelines. It accounts for delay dependencies across pipeline stages by adopting a delay model that can represent the time-borrowing phenomenon across latches. However, cases such as Case (b) in Figure 23 are not modeled accurately, since the authors assume Gaussian distributions for all delay random variables in the pipeline. Also, their solution to the reconvergence problem does not address the unique properties that arise when latches are included in the reconvergent fan-out. Since they focus on propagating the worst-case delays across pipeline stages, only the latest signal departure time is computed at the output of each latch and used for the next pipeline stage, instead of the delay distribution, which may have been altered by delay quantization. Hence, their resulting distributions at latches are unsuitable for deriving the probability of time borrowing.
We will account for the above three unique properties of statistical timing analysis for latch-based circuits to develop a statistical timing analysis approach for such circuits, building on the test generation approach of Chapter 3.
Abstract
Latch-based circuits are used in full-custom high-speed chips, especially to implement delay-critical parts, due to two benefits: higher performance and higher yield at a desired performance. However, the unavailability of a delay test methodology that provides sufficiently high coverage has hindered their widespread use.
Asset Metadata
Creator: Chung, Kun Young (author)
Core Title: Structural delay testing of latch-based high-speed circuits with time borrowing
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publication Date: 08/03/2008
Defense Date: 04/25/2008
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tags: ATPG, delay testing, DFT, latch-based circuit, OAI-PMH Harvest, test scheduling, time borrowing
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Gupta, Sandeep K. (committee chair); Medvidovic, Nenad (committee member); Pedram, Massoud (committee member)
Creator Email: kun.chung@gmail.com, kunchung@poisson.usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m1527
Unique identifier: UC1390876
Identifier: etd-Chung-2307 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-601405 (legacy record id), usctheses-m1527 (legacy record id)
Legacy Identifier: etd-Chung-2307.pdf
Dmrecord: 601405
Document Type: Dissertation
Rights: Chung, Kun Young
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu
time borrowing