STRUCTURAL DELAY TESTING OF LATCH-BASED HIGH-SPEED
CIRCUITS WITH TIME BORROWING
by
Kun Young Chung
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2008
Copyright 2008 Kun Young Chung
Table of Contents
List of Tables
List of Figures
Abstract
CHAPTER 1. Introduction
1.1. Operation of latch-based circuits
1.2. Reference times and nominal delays
1.3. Types of time borrowing and their advantages
1.4. Design choice between flip-flop-based and latch-based circuits
1.5. Testing of flip-flop-based versus latch-based circuits
CHAPTER 2. Background – Challenges in delay testing of latch-based circuits
2.1. Key challenges in delay testing of latch-based circuits with time borrowing
2.2. Benefits of scan design-for-testability
2.3. The overall optimization problem
CHAPTER 3. A new delay testing approach
3.1. Basic assumptions
3.2. Key ideas behind the proposed approach
3.2.1. r-r test: A set of sufficient conditions on block delays
3.2.2. r-f test: A set of necessary conditions on block delays
3.2.3. Time borrowing detection
3.2.4. Basic delay testing strategies
3.3. Test of the first logic block (SCUT0)
3.4. Configuring on-path latches
3.4.1. On-path latch with time borrowing
3.4.2. On-path latch with no time borrowing
3.5. Configuring off-path latches
3.6. A set of scan chain configurations necessary to maximize coverage
3.7. Required tests for maximum coverage
3.8. Test generation under the optimal set of scan chain configurations
3.8.1. Theoretical maximum delay fault coverage for latch-based circuits
3.8.2. Proposed test generation approach
3.9. Test generation under limited scan chain configurations
3.9.1. Available scan chain configurations
3.9.2. Test generation strategy
3.9.3. Proposed test generation approach
3.10. Experimental results and comparison
3.10.1. Test generation approaches
3.10.2. Trends in the experimental results
CHAPTER 4. Test application cost minimization under maximum coverage
4.1. Motivation example
4.1.1. Test schedule 1
4.1.2. Test schedule 2
4.1.3. Test schedule 3
4.1.4. The overall optimization problem
4.2. Unique characteristics of the optimization problem
4.2.1. Meaning of test results – Dependencies among SPUTs
4.2.2. Benefits of r-r tests and r-f tests
4.3. Framework for test scheduling to minimize test application cost
4.3.1. An SPUT-based approach
4.3.2. Search space
4.4. Building a search tree
4.4.1. Reduction rules
4.4.2. Covering test sequences in a test schedule
4.5. A deterministic optimization approach
4.6. The complexity of the optimization problem
4.7. Proposed heuristic approaches
4.7.1. Key ideas and overview of the proposed heuristics
4.7.2. Heuristic 1 (H1): Relative benefit function
4.7.3. Heuristic 2 (H2): Near-lower-bound function
4.7.4. Experimental results
4.7.5. Analysis of the heuristic approach
CHAPTER 5. Flip-flop-based vs. latch-based designs
5.1. Flip-flop-based counterpart of latch-based circuit
5.2. Performance comparison
5.3. Yield comparison
5.4. Delay fault coverage comparison
5.5. A summary of comparison results
CHAPTER 6. Future research tasks
6.1. The overall test optimization
6.1.1. Realistic test application cost
6.1.2. Chip personality distribution
6.1.3. Inclusion of r-f tests
6.2. Other delay testing approaches
6.3. Scan DFT design and control
CHAPTER 7. Conclusion
References
Appendix: Chip personality distribution from statistical timing information
List of Tables
Table 1. Five-stage pipeline of array multiplier.
Table 2. The characteristics of chip instances under test based on the results for r-r tests.
Table 3. The implication of test results for a target SPUT from Lin to Lout.
Table 4. A summary of benefit of r-r tests.
Table 5. A summary of benefit of r-f tests.
Table 6. A chip personality distribution for Figure 17.
Table 7. Relative benefit function example.
Table 8. Design of chip personality distribution.
Table 9. Test application cost comparisons of proposed approaches.
Table 10. Contributions to overhead of H1 and H2.
List of Figures
Figure 1. An example latch-based linear pipeline.
Figure 2. Additional components for scan flip-flop and scan latch.
Figure 3. A three-stage linear pipeline.
Figure 4. A two-stage linear pipeline.
Figure 5. Relationships among scan chain configurations.
Figure 6. Property 2 helps improve robust coverage.
Figure 7. A robust test for a multi-segment path can be obtained by combining the robust tests of its single-segment subpaths.
Figure 8. Proposed ATPG algorithm.
Figure 9. Test procedure – managing multiple SCUTs.
Figure 10. A two-stage pipeline example.
Figure 11. Test schedule 1: Average cost = 23.84.
Figure 12. Test schedule 2: Average cost = 25.4.
Figure 13. Test schedule 3: Average cost = 22.68.
Figure 14. A generic search tree for optimal test scheduling.
Figure 15. An example of the cost function computation.
Figure 16. Overall approach that uses proposed heuristics.
Figure 17. Test scheduling illustration.
Figure 18. Test application cost comparisons of proposed approaches.
Figure 19. Percentage over the lower-bound.
Figure 20. Design and test schemes for high-speed circuit.
Figure 21. Timing requirements for the two designs.
Figure 22. Yield comparison (T = TFF).
Figure 23. Statistical delay distribution across a latch and the probability of time borrowing.
Figure 24. An example reconvergence fan-out.
Abstract
Latch-based circuits are used in full-custom-designed high-speed chips, especially to implement
some delay-critical parts, because they offer two benefits: higher performance and higher yield at the
desired performance. However, the unavailability of a delay test methodology that provides sufficiently
high coverage has hindered their widespread use.
In this dissertation, we show that conventional delay testing approaches cannot be used for
delay testing of latch-based circuits with time borrowing, and that design-for-test (DFT) is
necessary. We first focus on maximizing path delay fault coverage and propose the first path
delay testing approach and the associated DFT for such circuits. We prove that our latch-based delay
testing approach provides the theoretical maximum coverage (for any scan-based approach). We also
prove that this coverage is always greater than (or equal to) that for the latch-based circuit’s flip-flop-based
counterpart. Secondly, we focus on minimizing test application cost for delay testing of latch-based
circuits under the constraint that maximum coverage is achieved. We show that conventional
test scheduling methods may not be applicable due to the unique characteristics of latch-based circuits
with time borrowing. We then formulate the minimization problem and propose one deterministic
approach and two heuristic approaches for test scheduling of such circuits.
The experimental results show that, for many example circuits, the proposed approaches achieve
dramatically higher coverage of path delay faults than the classical approach, and achieve test
application costs that are within 5% of the corresponding lower bounds.
We then compare high-speed latch-based circuits with their flip-flop-based counterparts from
the viewpoint of path delay testing and present design guidelines for latch-based circuits that
guarantee that latch-based circuits also achieve higher yield and higher performance than their
flip-flop-based counterparts.
CHAPTER 1
Introduction
Pipelining of combinational logic is widely used in many parts of a circuit to improve
performance, with either flip-flops or latches as the storage elements. One of the advantages of
flip-flop-based pipelines over latch-based pipelines is that they are easier to design and better
supported by CAD tools. To ensure correct propagation of signals across flip-flop-based pipelines,
the delay of each stage (or logic block) must be smaller than the clock period. This timing
requirement is becoming increasingly difficult to satisfy, especially in high-speed parts of circuits,
as technology advances, since timing is increasingly affected by process variations
and/or defects in fabrication. Hence, latch-based pipelining is used in many high-speed custom-designed
circuits, since it enhances performance and improves yield via time borrowing,
which relaxes the timing requirement for each logic block. When time borrowing
occurs, a block may take longer than its nominal allocation before completing its computation
and providing the result to the next block.
Figure 1. An example latch-based linear pipeline.
A simplified latch-based high-speed pipeline can be modeled as shown in Figure 1, in which
Ci’s are combinational logic blocks and Lj’s are latches. Every latch is assumed to be a positive D-latch,
i.e., it becomes transparent when the corresponding clock is high. For simplicity of description,
we describe the approach assuming that complementary clocks are used. However, our approach is
applicable to any type of clocks, including two-phase non-overlapping, four-phase non-overlapping,
and four-phase overlapping [17]. To simplify the discussion, we assume that latches are ideal, i.e., all
their delays as well as their setup and hold times are zero. However, our approach for test
development inherently takes into account the actual setup time, hold time, and clock-to-Q delay of
every latch. The characteristics of real latches are also explicitly considered during the detailed design
of design-for-testability (DFT) circuitry.
Let us assume that C0 is the first combinational logic block of a high-speed pipeline. We
consider the inputs to the latches driving C0 as primary inputs and assume that a new combination of
values is applied at the inputs of block C0 at the rising edge of its driving clock, i.e., the clock
controlling the latches at its inputs. In Figure 1, this is the rising edge of clock φ at time t1. The values
at the outputs of C0 must stabilize some time before the subsequent falling edge of the block’s
receiving clock, i.e., the clock controlling the latches at its outputs. For C0 in Figure 1, this is the
subsequent falling edge of clock φ̄ (the complement of φ) at time t4. If the values at the outputs of C0
do not stabilize by this time, correct values cannot propagate via latches at the outputs of the block to
the inputs of the next block, C1, and a delay fault exists at the given clock frequency. On the other hand,
if the values at the outputs of C0 stabilize before the subsequent rising edge of its receiving clock, i.e.,
the subsequent rising edge of φ̄ at t2, then any change in values is passed to the inputs of the next
block, C1, only after the rising edge of the block’s receiving clock, φ̄. If the values at the outputs of C0
stabilize after the subsequent rising edge of its receiving clock, φ̄, but before the clock’s falling edge,
then any change in values passes immediately via the latches, and thus to the inputs of the next block.
Hence, a new combination of values may be applied at the inputs of C1 as early as the subsequent rising
edge of its driving clock, φ̄, or as late as this clock’s subsequent falling edge. The values at the outputs
of block C1 must stabilize some time before the subsequent falling edge of its receiving clock, φ, and so
on.
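The three cases described above can be illustrated with a minimal sketch, assuming ideal latches (zero setup time, hold time, and delay) as in the discussion. The names `Outcome`, `classify_output`, and `launch_time_next_block`, as well as the numeric values, are illustrative assumptions and are not part of the proposed approach.

```python
from enum import Enum


class Outcome(Enum):
    NO_BORROWING = "no time borrowing"
    BORROWING = "time borrowing"
    DELAY_FAULT = "delay fault"


def classify_output(t_stable, t_rise, t_fall):
    """Classify the time at which a block's outputs stabilize.

    t_rise and t_fall are the rising and falling edges of the block's
    receiving clock (t2 and t4 for C0 in Figure 1). Ideal latches are assumed.
    """
    if t_rise > t_fall:
        raise ValueError("the rising edge must not come after the falling edge")
    if t_stable <= t_rise:
        return Outcome.NO_BORROWING   # latch still opaque; value waits for t_rise
    if t_stable <= t_fall:
        return Outcome.BORROWING      # latch transparent; value passes immediately
    return Outcome.DELAY_FAULT        # latch already closed; value is missed


def launch_time_next_block(t_stable, t_rise, t_fall):
    """Time at which new values reach the inputs of the next block (None on a fault)."""
    if classify_output(t_stable, t_rise, t_fall) is Outcome.DELAY_FAULT:
        return None
    return max(t_stable, t_rise)


# Hypothetical numbers: clock period T = 10, so t2 = 5 and t4 = 10 for block C0.
T = 10.0
t2, t4 = T / 2, T
for t in (4.0, 7.0, 11.0):
    print(t, classify_output(t, t2, t4).value, launch_time_next_block(t, t2, t4))
```

In this sketch, the next block receives its inputs no earlier than the rising edge of the receiving clock and no later than its falling edge, which matches the window described in the text.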
Hence, unlike in a flip-flop-based circuit, in a correctly functioning latch-based circuit the
values at the inputs of a block need not be applied at a specific time, nor must the
corresponding response values become available at its outputs at a specific time. This allows a
block to borrow time from others in its fan-in/fan-out.
Without any loss of generality, we define a reference time as the earliest time at which new
values may be applied at the inputs of a block, namely the rising edge of the block’s driving clock.
The corresponding reference time for the responses at the outputs of a block is the rising edge of the
block’s receiving clock. In general, the reference times at the inputs and the outputs of a block are the
clock edges at which the latches at its inputs and outputs, respectively, become transparent. The
nominal delay for a block is defined as the time difference between the reference times at its inputs
and outputs. For each block in Figure 1, the nominal delay is half of the clock period, i.e., T/2, where
T is the period of the clock. If transitions at inputs and outputs of each block of a circuit satisfy these
reference times, the circuit will operate at the desired clock frequency. Such a circuit can be viewed
as a nominal circuit where no time is being borrowed by any block.
Now consider a scenario where the values at the inputs of C1 do not arrive before t2, the rising
edge of φ̄ (the complement of φ), and instead arrive at t3, as shown in Figure 1. In this case, we say
that C0 is borrowing time from C1. The time duration t3 – t2 in Figure 1 denotes the amount of time
borrowed. Similarly, if the outputs of C1 do not stabilize before the rising edge of φ (i.e., t4) but do so
shortly thereafter, then C1 is said to be borrowing time from the block in its fan-out. For our definition
of the reference times, C1 may borrow time from blocks in its fan-out to accommodate its own large
delay and/or to compensate for the time it lent to C0.
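As an illustration of the reference times, nominal delays, and borrowed time defined above, the following sketch propagates transitions through a linear pipeline clocked by complementary clocks. It assumes ideal latches, a single worst-case delay per block, and output reference times at multiples of T/2. The function name `simulate_pipeline` and the delay values are hypothetical.

```python
def simulate_pipeline(delays, period):
    """Return (borrowed, faulty_block) for a linear latch-based pipeline.

    delays[i] is the delay of block C_i. The reference time at the outputs of
    C_i is (i + 1) * T/2 (rising edge of its receiving clock), and the outputs
    must stabilize by the falling edge of that clock, T/2 later.
    borrowed[i] is the time C_i borrows from the block in its fan-out.
    faulty_block is the index of the first block that misses the falling
    edge of its receiving clock, or None if the pipeline is fault-free.
    """
    half = period / 2.0
    launch = 0.0                        # inputs of C_0 applied at its reference time
    borrowed = []
    for i, delay in enumerate(delays):
        ref_out = (i + 1) * half        # rising edge of the receiving clock
        deadline = ref_out + half       # falling edge of the receiving clock
        arrival = launch + delay
        if arrival > deadline:
            return borrowed, i          # delay fault at the given clock frequency
        borrowed.append(max(0.0, arrival - ref_out))
        launch = max(arrival, ref_out)  # the next block starts no earlier than ref_out
    return borrowed, None


# Hypothetical example: T = 10, so the nominal delay of each block is 5.
print(simulate_pipeline([7.0, 4.0, 3.0], 10.0))
print(simulate_pipeline([7.0, 9.0, 3.0], 10.0))
```

In the first call, the second block has a delay below its nominal allocation of T/2 yet still borrows time from its fan-out, because it must compensate for the time it lent to the first block.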
Time borrowing may be intentional in the sense that it may be planned during the design of a
circuit. Even when time borrowing is not planned during the design of a circuit, it may occur
unintentionally if variations and/or defects during fabrication cause such borrowing in some
fabricated copies of the circuit. Note that even when time borrowing occurs, unintentionally or
intentionally, the circuit is fault-free at the given clock frequency provided that the values at the
outputs of every time-borrowing logic block stabilize before the subsequent falling edge of the block’s
receiving clock. Hence, latch-based circuits can enhance performance (i.e., increase clock frequency) by
enabling intentional time borrowing, and improve yield via unintentional time borrowing.
Flip-flop-based pipelines are easy to design and verify using an extensive set of available tools
for synthesis and verification. Hence, most ASIC designers prefer flip-flop-based pipelines. On the
other hand, latch-based pipelines are more challenging to design and verify because
ensuring correct timing behavior is more difficult and tool support is limited [8]. However, latch-based
pipelines are used in full-custom-designed high-speed chips, especially in some of their
delay-critical parts, due to the abovementioned major benefits, namely, higher performance and higher
yield at the desired performance [8].
In a latch-based pipeline, time borrowing that is planned during design (intentional
time borrowing) enables pipelines that operate at a higher clock frequency,
because it is not necessary to carefully balance the delays of combinational logic blocks,
and because latches are immune to clock skew to some degree. On the other hand,
flip-flop-based pipelines must carefully balance delays of logic blocks in order to achieve high
performance, since flip-flops present hard time boundaries between pipeline stages where no time
borrowing is permitted. Furthermore, clock skew must be budgeted for in the clock period for
flip-flop-based circuits. While retiming can balance flip-flop-based pipeline stages to reduce clock period,
in some cases this is not possible. For example, accessing cache memory takes a substantial portion of
the clock period, limiting the clock period of the pipeline stage. There also needs to be additional
logic for tag comparison and data alignment [18]. With flip-flops, the only method for increasing the
speed may be to give the cache access an additional pipeline stage to complete if it is the critical path
limiting the clock period [18]. Experimental results in [8] show that latch-based designs are 5–19%
faster than corresponding flip-flop-based designs, for a small increase in area.
Unintentional time borrowing is not planned during design but occurs in some fabricated copies
of a latch-based circuit due to delay variations and/or (minor) defects
during fabrication. Such a fabricated copy of the circuit may still operate at the desired clock frequency
when the amount of time borrowed is accommodated by subsequent block(s). In this manner,
unintentional time borrowing increases yield at the desired clock frequency when an appropriate delay
testing approach is used. On the other hand, a flip-flop-based pipeline without sufficient timing
margin will malfunction at the desired clock period when variations and defects cause similar extra delay
in the circuit, leading to a reduction in yield.
In summary, compared to flip-flop-based design, latch-based design enhances performance by
enabling intentional time borrowing and improves yield by allowing unintentional time borrowing.
Importantly, as clock distribution is becoming increasingly difficult, the abovementioned performance
benefits are growing. For the above reasons, latch-based pipelines are used in full-custom-designed
high-speed circuits, especially in highly delay-critical parts of a circuit.
However, we can realize these two advantages only if latch-based design is carried out to obtain
high-speed implementations and an appropriate delay test methodology is used. Otherwise, flip-flop-based
pipelines would prevail due to the ease of design, verification, and test.
Approaches to static testing, such as stuck-at fault testing, are similar for latch-based
and flip-flop-based circuits. The same automatic test pattern generator (ATPG) can be used to
generate tests for both architectures, since tests are generated based on the circuit structure without
considering circuit timing. The only difference is that latch-based pipelines entail higher cost
when scan DFT is used to target faults in each block individually, because replacing a latch with a
scan latch requires two additional latches in general, whereas replacing a flip-flop with a scan flip-flop
requires an additional multiplexer, as shown in Figure 2 [23]. Hence, in this research we focus on
delay testing (path delay testing) of flip-flop-based and latch-based circuits.
In delay testing of flip-flop-based circuits, typically a divide-and-conquer approach using scan
DFT is used where each logic block is tested individually using scan. As soon as an erroneous
response is captured at any flip-flop in the circuit, the chip is identified as being faulty at the given
clock frequency. Such a chip is either discarded or “binned” to be sold at a slower rated clock
frequency. Also, since such an approach separately targets paths within individual blocks, it targets
shorter paths. Hence, higher delay fault coverage is obtained using a smaller number of tests,
compared to the approach without using scan DFT.
On the other hand, latch-based pipelines with time borrowing pose new challenges in delay
testing. Unlike flip-flop-based pipelines where each block of logic can be
tested (for delay faults) separately using scan DFT, none of the existing DFT techniques can be used
for delay testing of latch-based circuits with time borrowing, since time borrowing makes it necessary
to target multi-segment paths (a.k.a. multi-block paths), i.e., paths that span multiple blocks.
If the same divide-and-conquer approach is used for delay testing of latch-based pipelines, each
logic block will be tested independently using scan DFT. Using the nominal delay for each block, this
allows a maximum delay of T/2 (T is the clock period) for each logic block, if complementary clocks
are used as shown in Figure 1. The delay fault coverage we compute from this approach is high in
most cases since short paths are tested using scan. However, any fabricated copy of the chip that has
even one intentional or unintentional time borrowing site will fail the tests applied by such a
divide-and-conquer approach and be discarded. Since we are considering high-speed applications of
latch-based circuits, the circuit is likely to contain one or more time borrowing sites. Whenever this is
the case, such a divide-and-conquer approach will lead to zero yield for latch-based pipelines. In other
words, circuit designers cannot use intentional time borrowing, which eliminates the performance
benefits of latch-based designs. Also, there will be no yield benefit of using latch-based designs because
unintentional time borrowing is not allowed. Consequently, a simple divide-and-conquer delay testing
approach is not appropriate for delay testing of high-speed latch-based circuits, as it erodes much of
their benefits.
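The effect on yield can be sketched with a small comparison between the per-block (divide-and-conquer) criterion and at-speed operation of the whole pipeline. This is an illustration only: it assumes ideal latches, complementary clocks, one worst-case delay per block, and that the last block may borrow from its fan-out. The function names and the four chip instances are hypothetical and are not taken from the experiments in this dissertation.

```python
def passes_per_block_test(delays, period):
    """Divide-and-conquer criterion: every block must meet its nominal delay T/2."""
    return all(delay <= period / 2.0 for delay in delays)


def works_at_speed(delays, period):
    """True if every block stabilizes by the falling edge of its receiving clock."""
    half = period / 2.0
    launch = 0.0
    for i, delay in enumerate(delays):
        ref_out = (i + 1) * half        # rising edge of the receiving clock
        arrival = launch + delay
        if arrival > ref_out + half:    # missed the falling edge: delay fault
            return False
        launch = max(arrival, ref_out)  # time borrowing shifts the next launch
    return True


# Hypothetical fabricated copies of a three-block pipeline, clock period T = 10.
chips = {
    "A": [4.0, 4.0, 4.0],   # no time borrowing
    "B": [7.0, 4.0, 3.0],   # borrowing absorbed by later blocks
    "C": [6.0, 5.0, 4.0],   # borrowing absorbed by later blocks
    "D": [7.0, 9.0, 3.0],   # genuine delay fault
}
T = 10.0
per_block_pass = [name for name, d in chips.items() if passes_per_block_test(d, T)]
at_speed_ok = [name for name, d in chips.items() if works_at_speed(d, T)]
print("pass per-block test:", per_block_pass)
print("operate at speed:   ", at_speed_ok)
```

In this example, chips B and C operate correctly at the desired clock frequency but would be rejected by the per-block test, which is the yield loss discussed above.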
The objective of this research is to propose an optimal scan-based path delay testing approach for
latch-based circuits with time borrowing by optimizing the robust path delay fault (PDF) coverage as
well as the test application cost. We also suggest guidelines for latch-based designs such that the
performance and yield benefits of latch-based circuits are guaranteed under the optimal delay fault
coverage.
In Chapter 2, the key challenges in delay testing of latch-based circuits are discussed. This
motivates the need for developing new DFT designs and new approaches for DFT-based delay testing.
In Chapter 3, we focus on maximizing delay fault coverage and present a new path delay testing
approach that requires a very small number of scan chain configurations, while guaranteeing
maximum robust PDF coverage. This decreases the overheads due to DFT significantly. We then
prove that the proposed delay testing approach for latch-based circuits always achieves the theoretical
maximum PDF coverage regardless of the length of multi-segment paths targeted.
Furthermore, we propose a new test generation approach that works for any time borrowing
scenario, even for cases where only a fraction of the abovementioned scan chain configurations (or
some other set of configurations) is available. This is especially useful since it allows us to avoid scan
chain configurations that significantly degrade circuit performance.
In Chapter 4, we propose a new test scheduling approach for latch-based circuits to minimize
the test application cost while achieving the maximum coverage that the test generation method
presented in Chapter 3 can achieve. First, we show that conventional test scheduling approaches may
not be applicable due to the unique characteristics of latch-based circuits with time borrowing. We
then present our new formulation of the test cost minimization problem for path delay testing of latch-based
circuits, and present a deterministic approach as well as two heuristic approaches.
In Chapter 5, we compare high-speed latch-based circuits with their flip-flop-based counterpart
designs from the viewpoint of path delay testing, and propose design guidelines for latch-based high-speed
circuits that guarantee that a latch-based circuit achieves higher performance and higher yield
than its flip-flop-based counterpart. We also prove that a latch-based circuit tested using the delay testing
approach proposed in Chapter 3 obtains a maximum path delay fault coverage that is always greater
than (or equal to) the coverage of the corresponding flip-flop-based circuit.
In Chapter 6, we discuss the future research tasks and subjects such as other delay testing
approaches and the hardware design and control issues related to the scan DFT.
CHAPTER 2
Background – Challenges in delay testing of latch-based
circuits
In a flip-flop-based circuit, the transition at the output of a path in one block is latched into the
corresponding flip-flop at a specific clock edge before it begins to propagate via a path in the next
block. Hence, if the delay of the path in the first block is excessive, the transition misses the clock
edge and cannot be seen by the path in the next block. Also, if the delay of the path in the first block
is short, the transition at the input of a path in the next block still starts after the appropriate clock
edge. Thus, in delay testing of flip-flop-based circuits, PDFs in one combinational logic block can be
treated independently of the PDFs in the adjacent blocks.
In latch-based circuits, in contrast, test application at the inputs of a block may not always occur
at the rising edge of the driving clock due to time borrowing. If time borrowing never occurs at a
particular latch, transitions are always applied at the rising edge of the driving clock. However, if time
borrowing occurs at a latch, the exact time at which a transition is applied to the next block depends on
the amount of time borrowing.
Hence, for delay testing of latch-based circuits, we first need to know the latches that are sites of
time borrowing. Latches that are sites of intentional time borrowing are known prior to DFT design,
test development, and test application. On the other hand, latches that are sites of unintentional time
borrowing may vary from one fabricated copy of the chip (called a chip instance in the following) to
another and hence are not known prior to test application. More importantly, the precise amount of
time borrowing at a site of intentional/unintentional time borrowing varies from one chip instance to
another; even in a particular chip instance it varies from one vector to another.
Hence, one of the biggest challenges in delay testing of latch-based circuits is that it is
impossible to use the scan mode to apply a test at a latch without knowing that the latch is not a site of
time borrowing. Furthermore, even when time borrowing is known to occur at a latch, it is practically
impossible to use scan to apply tests where bits are skewed to precisely replicate the arbitrary and
unknown amounts of time borrowing at outputs of various latches. It is important to recall that simply
applying tests at the inputs of a block and observing responses at its outputs at nominal times will
cause many fault-free chips to be unnecessarily discarded. In fact, in circuits where time borrowing
has been intentionally exploited, this can lead to zero yield at the given clock frequency. Hence, we
need to develop a novel delay testing approach that applies tests and captures responses at clock
edges only, while considering intentional/unintentional time borrowing.
Consequently, if a latch is a site of time borrowing, it is necessary to test multi-segment paths,
i.e., paths obtained by concatenating appropriate paths in successive logic blocks separated by latches.
Since many latch-based parts of circuits (e.g., data-paths) contain an astronomical number of such
multi-segment paths [1], the classical test approach, which targets the entire pipeline without DFT,
typically suffers from impractically high test generation complexity, high test application time, and –
for many circuits – meaninglessly low fault coverage. Hence, the use of some new type of DFT is
imperative to significantly reduce test generation and test application times while providing
meaningfully high delay fault coverage by targeting shorter paths. The next section explains
these major benefits of exploiting scan DFT.
There are two major benefits of delay testing using scan DFT. First, appropriate use of scan
DFT reduces the number of target PDFs. For example, consider paths via L5 in the two-stage pipeline
shown in Figure 1. Let x paths in C0 terminate at L5 and y paths in C1 originate from L5. Then there
exist xy physical paths in the above two blocks via L5. Since, for each physical path, two PDFs – one
with a rising transition at its input and another with a falling transition – must be targeted, a total of
2xy PDFs that pass via the latch (as well as the two blocks) must be targeted. If one can verify, during
testing of a particular chip instance, that no time borrowing, intentionally or unintentionally, occurs at
this latch L5, paths in C0 and C1 can be targeted separately. In such a case, the total number of PDFs
corresponding to the latch that must be targeted drops from 2xy to 2(x+y). Since x and y are typically
large, the use of scan reduces the total number of target PDFs. Note that even greater reductions occur
when one considers the above arguments for multi-segment paths that pass via a larger number of
blocks.
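The reduction from 2xy to 2(x+y) can be checked numerically. The following is a minimal sketch; the function names and the example values of x and y are illustrative assumptions and do not come from any circuit discussed here.

```python
def pdfs_joint(x: int, y: int) -> int:
    """PDFs to target when paths via the latch must be tested as two-segment paths.

    x paths end at the latch, y paths start at it, and each physical path
    has a rising and a falling PDF.
    """
    return 2 * x * y


def pdfs_separate(x: int, y: int) -> int:
    """PDFs to target when the two blocks can be tested separately."""
    return 2 * (x + y)


# Hypothetical path counts, chosen only for illustration.
x, y = 100, 80
print(pdfs_joint(x, y))     # 16000
print(pdfs_separate(x, y))  # 360
```

Note that the separate count is smaller only when x and y are large enough (for x = y = 2 the two counts coincide), which is consistent with the remark that x and y are typically large.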
Second, the average length of a path targeted during test generation is shortened when the
proposed approach and DFT are used. It is not always possible to propagate a transition robustly
along a path, since sometimes conflicting logic values are required at side inputs of the path for robust
propagation. As the length of a target path increases, typically the possibility of a conflict between
values required at side inputs also increases. The use of scan at latches where no time borrowing
occurs reduces the average length of paths and hence, in many circuits, enhances PDF coverage.
As noted in Sections 2.1 and 2.2, test generation and the design of DFT circuits that apply tests
and capture responses pose unique problems, and it is imperative to develop new DFT designs
and a new delay testing technique that takes advantage of these new DFT designs. Developing this
type of new delay testing involves several sub-objectives that jointly define the overall optimization
problem. The sub-objectives include maximization of delay fault coverage, minimization of test
application cost, minimization of test generation time, design of optimal DFT circuitry, and so on.
Among these, maximization of delay fault coverage and minimization of test application cost are
two major problems for test engineers, and typically there is a trade-off between the two. In other
words, in order to reduce test application cost, delay fault coverage might have to be compromised to
some extent. However, even if we are willing to incur a sufficiently high test application cost, we may
not be able to achieve the desired level of delay fault coverage without implementing an appropriate test
generation methodology.
Hence, we prioritize the above sub-objectives to define the overall optimization problem as
follows. First, in Chapter 3, we propose a new structural test generation approach that maximizes the
robust PDF coverage by exploiting scan DFT that applies tests and captures responses at clock edges
only. Second, in Chapter 4, we discuss a new test scheduling method to minimize test application cost,
under the condition that the maximum robust PDF coverage achieved by the approach of Chapter 3 is maintained.
CHAPTER 3
A new delay testing approach
In our approach, a latch may operate in the following four modes.
(1) Normal mode. The latch is transparent when the corresponding clock is high and holds its state
when the clock is low.
(2) Scan mode.
(2a) Scan-in mode. Vectors are loaded via scan-in and applied at the rising edge of the
corresponding clock. Concurrently, the values previously captured in the latches are scanned
out as described next.
(2b) r-capture scan-out mode. The latch captures response at the rising edge of the corresponding
clock for scan out.
(2c) f-capture scan-out mode. The latch captures response at the falling edge of the
corresponding clock for scan out.
It is assumed that time borrowing does not occur at the “primary” inputs and outputs of the
entire latch-based circuit. This is often true because high-speed latch-based pipelines are typically
embedded in a larger flip-flop-based system. We may test each block individually. Alternatively, we
may test any set of contiguous blocks together as a single entity. In either case, we use the term sub-circuit
under test (SCUT) to describe the block(s) under test during a particular phase of testing. Each
SCUT is characterized by what blocks are included and what scan chain configurations (i.e., operation
modes of latches) are used for the latches in the SCUT.
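The latch modes and the notion of an SCUT can be captured in a small data model. The sketch below is only one possible representation; the class names, field names, and latch names are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class LatchMode(Enum):
    NORMAL = "normal"                 # (1) transparent while the clock is high
    SCAN_IN = "scan-in"               # (2a) vector applied at the rising edge
    R_CAPTURE = "r-capture scan-out"  # (2b) response captured at the rising edge
    F_CAPTURE = "f-capture scan-out"  # (2c) response captured at the falling edge


@dataclass
class SCUT:
    blocks: tuple  # contiguous logic blocks under test, e.g. ("C0", "C1")
    modes: dict    # latch name -> LatchMode (the scan chain configuration)


def is_scan(mode: LatchMode) -> bool:
    """True for any of the three scan sub-modes."""
    return mode is not LatchMode.NORMAL


# Hypothetical single-block SCUT: input latches apply, output latches capture.
scut0 = SCUT(
    blocks=("C0",),
    modes={
        "L01": LatchMode.SCAN_IN,
        "L02": LatchMode.SCAN_IN,
        "L11": LatchMode.R_CAPTURE,
        "L12": LatchMode.R_CAPTURE,
    },
)
print([name for name, mode in scut0.modes.items() if is_scan(mode)])
```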
The proposed test generation approach is based on robust path delay testing of path delay faults.
Hence, the following property of robust tests is used throughout this dissertation.
Lemma 1: Any robust test for a target PDF invokes a delay equal to or greater than the delay of the
target path, independently of the presence or absence of other delay variations or faults in the circuit.
This is because a robust test for a PDF guarantees that a transition at an on-path line cannot
occur unless a transition occurs at the previous on-path line, independently of the presence or absence
of any other delay variations or faults [7]. This basic property attributed to robust tests is strictly true
under many commonly used delay models. More details can be found in [7]. In particular, the
propagation of a transition along the path may be affected by the existence of non-static values at off-path
inputs. However, the conditions for robust propagation only allow those values at off-path inputs
which may delay but cannot accelerate the on-path propagation.
Under the r-r test application, tests are applied to the inputs of the SCUT at the rising edge of the
driving clock (i.e., the clock that drives the latches at the input of the SCUT), and the responses are
captured at the outputs of the SCUT at the rising edge of the receiving clock (i.e., the clock that drives
the latches at the output of the SCUT). Let us assume that the SCUT is comprised of m consecutive
blocks of logic in a linear pipeline (Ch, Ch+1, Ch+2, ⋅⋅⋅, Ch+m–1), and that the latches at the inputs of
target paths (input latches) in the SCUT are free of time borrowing. Then, the time interval TA(r,r),
denoting the nominal time allocated to the SCUT, is mT/2, where T is the clock period. Hence, the
following is a sufficient (but not necessary) condition for the SCUT to be free of delay faults.
\[
\sum_{i=h}^{h+m-1} \Delta C_i \;\le\; TA(r,r) \equiv \frac{m}{2}\,T, \tag{1}
\]
where ΔCi is the maximum delay of any multi-segment path in block Ci. If this condition is violated
for a latch (output latch) at the output of the SCUT, we discover that the SCUT borrows time via the
latch from the next block Ch+m, since a transition arrives at the latch after the output latch of Ch+m–1
becomes transparent. In particular, we have the following result. If every block of a CUT passes r-r
tests at clock period T, the CUT has no delay fault and no time borrowing at that clock frequency.
However, one or more SCUTs may fail r-r tests due to time borrowing while the circuit may still be free of
delay faults. The above observation is generalized to obtain Theorem 1.
Theorem 1. If a CUT can be partitioned, i.e., divided into disjoint SCUTs that collectively include
all blocks in the CUT, such that each SCUT passes its corresponding r-r tests at clock period T, then the
CUT is free of delay faults at that clock frequency [9].
Under the r-f test application, tests are applied to the inputs of the SCUT at the rising edge of the
driving clock and the responses are captured at the outputs of the SCUT at the falling edge of the
receiving clock. Let us assume that the SCUT is comprised of m consecutive blocks of logic in a
linear pipeline (Ch, Ch+1, Ch+2, ⋅⋅⋅, Ch+m–1), and that the latches at the inputs of the SCUT are free of
time borrowing. Let the time interval TA(r,f) denote the maximum time allowable for any multi-segment
path in a given SCUT. Then, the following is a necessary (but not sufficient) condition for
the SCUT to be free of delay faults.
\[
\sum_{i=h}^{h+m-1} \Delta C_i \;\le\; TA(r,f) \equiv \frac{(m+1)}{2}\,T, \tag{2}
\]
where ΔCi is the maximum delay of any multi-segment path in block Ci. If this condition is violated
for even one SCUT, the entire CUT is proven to have a delay fault at the clock period T. Note that the
r-f test application allows the maximum time duration for transitions to propagate via each path in an
SCUT. Hence, it is necessary for every block to pass r-f tests.
Theorem 2. If any SCUT fails its r-f tests at clock period T, then the circuit has delay faults at that
clock frequency.
Proof: If an SCUT fails r-f tests, the delay of the SCUT is longer than the maximum time
duration allowed for a transition. Since such additional delay cannot be accommodated via time borrowing, the
circuit has delay faults at the given clock frequency.■
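The two time budgets in conditions (1) and (2) can be compared in a short sketch. This is a simplification under stated assumptions: it pretends the worst-case block delays are known numbers (in practice only pass/fail outcomes are observed on the tester), and the function names, the picosecond unit, and the delay values are hypothetical.

```python
def ta_rr(m: int, T: float) -> float:
    """Nominal time allocated to an m-block SCUT under r-r application: mT/2."""
    return m * T / 2


def ta_rf(m: int, T: float) -> float:
    """Maximum time allowable under r-f application: (m+1)T/2."""
    return (m + 1) * T / 2


def classify_scut(block_delays, T):
    """Classify an SCUT from assumed worst-case block delays, per (1) and (2)."""
    m = len(block_delays)
    total = sum(block_delays)
    if total <= ta_rr(m, T):
        return "passes r-r"             # condition (1) holds: no time borrowing
    if total <= ta_rf(m, T):
        return "fails r-r, passes r-f"  # borrows time from the next block
    return "fails r-f"                  # condition (2) violated: delay fault


T = 1000  # hypothetical clock period in picoseconds
print(classify_scut([400, 450], T))  # passes r-r
print(classify_scut([700, 600], T))  # fails r-r, passes r-f
print(classify_scut([900, 800], T))  # fails r-f
```

The extra half period between the two budgets is exactly the window during which the output latch is transparent, which is why a delay in that range indicates time borrowing rather than a fault.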
Consider a scenario where, based on circuit design, we expect extreme time borrowing, i.e.,
where the delays of one or more single blocks and/or multi-block combinations are, in the
presence of delay variations or defects, likely to exceed the maximum delay allowed
by the relation described for TA(r,f). In this case, we treat each single block or multi-block
combination where extreme time borrowing is deemed likely to occur as an SCUT and apply suitable
tests to the SCUT using the r-f test application. If any of these SCUTs fails any of its r-f
tests, then the entire CUT is identified as having a delay fault at the desired clock period and delay
testing can be terminated immediately.
However, in much of this dissertation, for simplicity of discussion, it is assumed that a chip
under test has either skipped r-f tests (since extreme time borrowing was not expected) or passed all
the r-f tests applied (since extreme time borrowing does not exist).
Each latch in the circuit, other than primary input and primary output latches, is classified
either as a time borrowing latch (TBL) or as a non-time borrowing latch (NTBL). A latch L is
identified as a TBL if at least one test for an SCUT that terminates at L fails r-r tests by the capture of
an erroneous value at the latch. In contrast, a latch L is identified as an NTBL if, for all SCUTs that
terminate at the latch, no r-r test fails by causing an error at that latch. The test procedure of the
proposed structural delay testing is based on identification of time borrowing status of latches within
the CUT. We now summarize the above observations.
A latch L is identified as an NTBL if, for all SCUTs that terminate at the latch, no r-r
test fails by causing an error at that latch.
A latch L is identified as a TBL if at least one test for an SCUT that terminates at L
fails r-r tests by the capture of an erroneous value at the latch.
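The TBL/NTBL classification rule can be written down directly. The sketch below assumes the r-r outcomes have already been collected per latch; the input format and the latch names are hypothetical.

```python
def classify_latches(rr_outcomes):
    """Classify each latch from its r-r test outcomes.

    rr_outcomes maps a latch name to a list of booleans, one per r-r test of
    an SCUT terminating at that latch; True means the correct value was
    captured. A single erroneous capture makes the latch a TBL. A latch with
    an empty list is reported as NTBL here, which is a choice of this sketch.
    """
    return {
        latch: "NTBL" if all(outcomes) else "TBL"
        for latch, outcomes in rr_outcomes.items()
    }


# Hypothetical outcomes for three latches.
outcomes = {
    "L10": [True, False, True],  # one erroneous capture
    "L11": [True, True],
    "L12": [True],
}
print(classify_latches(outcomes))
# {'L10': 'TBL', 'L11': 'NTBL', 'L12': 'NTBL'}
```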
Consider a latch L that is identified as a TBL via testing. In general, in such a situation a non-empty
subset of paths that terminate at the latch are time borrowing while the set of remaining paths
(which might be empty) are not time borrowing. While it is theoretically feasible to develop a
methodology that considers L as time borrowing only with respect to the former set of paths, such a
methodology is practically unimplementable since it requires execution of delay diagnosis on each
copy of the chip to identify the former set. Since the complexity of such diagnosis is typically extremely
high, we consider L as time borrowing with respect to all paths that terminate at L, as stated below.
If a latch L is identified as a TBL, all multi-segment paths that pass via L must be
targeted by subsequent testing, despite the fact that some of the paths in the fan-in of L may not be borrowing
time.
The basic idea behind the proposed delay testing approach is to test the first logic block to
identify sites of time borrowing and adaptively add the subsequent logic blocks by deciding whether
to target multi-segment paths or single-segment paths. This process continues to the last logic block.
Consider the three-stage linear pipeline shown in Figure 3. Latches at inputs of logic block Ck are
called level-k latches. Initially, r-r tests are applied to the first logic block (C0). If C0 passes all these r-r
tests, time borrowing does not occur at any of level-1 latches and hence the next logic block (C1)
can be tested separately from C0. Likewise, if C1 also passes all r-r tests, time borrowing does not
occur at any of level-2 latches and hence the next logic block (C2) can be tested separately. Since time
borrowing does not occur at any latch in this particular case, each logic block is individually tested,
which tends to decrease the number of target paths and significantly increase the coverage.
On the other hand, if C0 fails r-r tests, we target multi-segment paths that span C0 and C1,
denoted as C0 + C1. These multi-segment paths are targeted by using scan DFT at level-0 and level-2
latches and configuring all level-1 latches in normal mode. However, this scheme of simply
combining consecutive logic blocks to target multi-segment paths does not improve coverage
significantly, especially in cases where time borrowing occurs extensively across the pipeline. For
example, in the worst case where C0 + C1 also fails r-r tests, we will end up testing the entire three-stage
pipeline (C0 + C1 + C2) jointly, while obtaining the same coverage as the classical approach at
the additional cost of testing C0 and C0 + C1.
Hence, we developed an SCUT-based approach [9] that targets multi-segment paths via TBLs
only, and targets shorter paths that start at NTBLs. This is done by configuring NTBLs in scan mode
and TBLs in normal mode. For instance, suppose the testing of C0 identifies L10 only as a TBL. Then,
in the next test session for C0 + C1, we configure L11 and L12 in scan mode, and L10 in normal mode. In
this manner, the two-segment paths via L10 are tested and the single-segment paths starting at L11 and
L12 are tested. Similarly, the results of this testing of C0 + C1 will decide the time borrowing status at
level-2 latches, which will adaptively determine the configurations of level-2 latches in the next test
session that includes C2.
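The configuration rule of this SCUT-based approach (NTBLs in scan mode, TBLs in normal mode) can be sketched as follows. The function name and the dictionary format are assumptions; the latch names follow the example in the text.

```python
def configure_level(latch_status):
    """Map each latch's time borrowing status to its mode in the next session.

    TBLs stay in normal mode so that multi-segment paths through them are
    exercised; NTBLs go to scan mode so that shorter paths start there.
    """
    return {
        latch: "normal" if status == "TBL" else "scan"
        for latch, status in latch_status.items()
    }


# Testing of C0 identified only L10 as a TBL.
status = {"L10": "TBL", "L11": "NTBL", "L12": "NTBL"}
print(configure_level(status))
# {'L10': 'normal', 'L11': 'scan', 'L12': 'scan'}
```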
Although this approach [9] improved the coverage significantly in many cases, the experiments
show that the coverage is still low for cases where time borrowing occurs extensively. Next we
propose an advanced SCUT-based approach that further improves the coverage even when time
borrowing occurs extensively, and reduces the complexity of DFT circuitry and scan chain routing.
In Sections 3.3 through 3.6, for simplicity of explanation, the discussion uses the two-stage
linear pipeline shown in Figure 4. However, the ideas proposed are applicable to general latch-based
networks. Assume that there are j latches at the inputs of the first combinational logic block C0 (level-0
latches), k latches between C0 and C1 (level-1 latches), and l latches at the outputs of C1 (level-2
latches).
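[Figure 4: Two-stage linear pipeline. Level-0 latches L01, L02, ..., L0j drive C0; level-1 latches L11, ..., L1i, ..., L1k lie between C0 and C1; level-2 latches L21, L22, ..., L2l are at the outputs of C1. A multi-segment path p consists of sub-path α in C0 and sub-path β in C1.]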
As in the approach described above [9], the paths in the first logic block C0 (= SCUT0) in Figure
4 are tested by themselves by operating level-0 latches in scan (scan-in) mode and level-1 latches in
scan (r-capture scan-out) mode. The purpose of testing SCUT0 is to identify TBLs among the level-1
latches. Assume that such testing detects time borrowing at a subset of level-1 latches denoted by the
set LTB, where
LTB = {L1i | L1i is a TBL, 1 ≤ i ≤ k}.
The remaining level-1 latches are not time borrowing sites and constitute the set LNTB, i.e.,
LNTB = {L1i | L1i is not a TBL, 1 ≤ i ≤ k}.
In this CUT we are interested in testing multi-segment paths that span C0 and C1.
When any particular multi-segment path is being tested, the target path passes via one level-1 latch, which
is referred to as the on-path latch for the path. All the other level-1 latches are called off-path latches
for the target path. Sections 3.4 and 3.5 describe how to test the multi-segment paths with high PDF
coverage using DFT by treating on-path and off-path latches differently and by considering LTB and
LNTB.
Consider a case where the objective of delay testing is to test an arbitrary multi-segment path p
comprised of a sub-path α in C0 and a sub-path β in C1 in the circuit shown in Figure 4. α and β are
connected by a level-1 latch L1i. In this case the latch L1i is the on-path latch and every other level-1
latch is an off-path latch.
First, consider the case where the on-path latch L1i is identified as a TBL during testing of
SCUT0 (i.e., L1i ∈ LTB). In order to test multi-segment paths that pass via L1i ∈ LTB, scan mode cannot
be used for L1i, since no known DFT circuitry can replicate an appropriately skewed test application
and response capture corresponding to the precise amount of time borrowing, which varies from
vector to vector and from one chip instance to another. Therefore, only normal mode can be used at
TBL L1i during testing of any multi-segment path that passes via the latch.
Now consider the case where the on-path latch L1i is identified as an NTBL during testing of
SCUT0 (i.e., L1i ∈ LNTB). Theorems 3 and 4 identify the relationship between (i) testing α and β
individually as sub-paths, and (ii) testing α and β jointly (denoted as α + β) as a multi-segment path.
Here, α + β stands for testing the path p with L1i configured in normal mode in the SCUT
comprised of C0 and C1.
Theorem 3. If any robust test for α passes when C0 is tested by itself and any robust test
for β passes when C1 is tested by itself, then the worst-case delay of multi-segment path p via L1i is
within the limit imposed by the given clocks.
Proof: We will prove this by contradiction. Let us start by assuming that the path p, i.e., α + β, fails
when the SCUT comprised of C0 and C1 is tested as a whole. In this case, the delay of p exceeds the sum
of nominal delays of C0 and C1. Assuming that no time borrowing occurs at the inputs of C0, this can
occur only under the following two scenarios: (i) the delay of α is greater than the nominal delay of C0, or
(ii) the delay of β is greater than the nominal delay of C1. In the former case, any robust test for α when
C0 was tested by itself would have failed, and in the latter case any robust test for β when C1 was
tested by itself would have failed (or both). Hence, if α and β each pass robust tests when their respective
blocks are tested, the delay of p cannot exceed its nominal delay.■
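The argument can be restated compactly in its direct form. The symbols below are introduced only for this restatement and are assumptions: d(·) denotes the delay of a path (with the delay via L1i lumped into the sub-path delays), and D_0 and D_1 denote the nominal delays of C0 and C1.

```latex
\[
d(\alpha) \le D_0 \;\wedge\; d(\beta) \le D_1
\;\Longrightarrow\;
d(p) = d(\alpha) + d(\beta) \le D_0 + D_1 .
\]
```

Equivalently, d(p) > D_0 + D_1 forces d(α) > D_0 or d(β) > D_1, which are the two scenarios considered in the proof.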
Theorem 4. If the multi-segment path α + β is robustly testable in the multi-segment
SCUT comprised of C0 and C1, then α in C0 by itself and β in C1 by itself are both individually robustly
testable.
Proof: When α + β is targeted in an SCUT comprised of C0 and C1, the conditions necessary (and
sufficient) for robust detection of p require that α be robustly sensitized within C0, as in the case
where α is tested in C0 by itself. Therefore, if α is not robustly testable in C0 by itself, then no robust
test exists for any multi-segment path that includes α. Similar reasoning also applies to β.■
In summary, if we conclude that time borrowing does not occur at the latch after testing the
block(s) in the fan-in of the latch, we can separately test the sub-paths in the fan-out of the latch
instead of testing multi-segment paths that pass via the latch.
While testing a sub-path in the fan-out of latch L1i at which no time borrowing occurs, L1i may
be configured either in normal mode or in scan mode. Next, we compare these two approaches in
Sections 3.4.2.1 and 3.4.2.2.
Approach 1: The multi-segment path α + β is targeted by configuring L1i in normal mode.
In this approach, the on-path latch L1i is configured in normal mode although L1i is an NTBL.
This may be the case if the DFT design does not support the required scan mode operation. Even if the
normal mode is in use, we can attain higher test quality based on Theorem 4 and the fact that time
borrowing does not occur at L1i by exploiting the following property.
Property 1. Because L1i is an NTBL, to test the sub-path β the test generation procedure
targets β as part of multi-segment paths that start at any level-0
latch and include β, with L1i configured in normal mode. Note that the sub-paths in the fan-in of L1i are
used only to produce a rising or a falling transition (as appropriate to test β) at the output of L1i, and
the logic values within C0 need not robustly propagate the transition along any particular path in the
fan-in of L1i. As long as the desired transition is initiated at the output of L1i, robust propagation of the
transition is required only for the sub-path β. By doing so, the number of target PDFs is also reduced
because only β is targeted instead of all multi-segment paths that include β.
In summary, it is shown in Theorem 4 that for a path in the fan-in of L1i (e.g., α), testing using
SCUT0 provides equal or higher robust coverage compared to testing using the multi-segment SCUT
comprised of C0 and C1. Also for the sub-paths in the fan-out of L1i (e.g., β), testing using Property 1
is as good as testing multi-segment paths in an ordinary manner because Property 1 does not require
robust sensitization along any particular path within C0. Hence, Property 1 shows that robust test
coverage can be further improved even without using the scan mode at the on-path latch, provided
that time borrowing is known not to occur at the on-path latch.
Approach 2: The sub-path β is targeted by operating L1i in scan mode.
In this approach, we only test the sub-path β that originates at an NTBL L1i. As L1i is configured
in scan mode, the sub-path β of the original target path (α + β) will be tested separately and the robust
coverage for sub-paths like β will be combined with the robust coverage for sub-paths like α in the
fan-in of the latch.
If L1i can be configured in scan mode, which is typically true in cases where L1i is connected to
the scan-out chain for time borrowing detection, Approach 2 is preferred to Approach 1 because
Approach 2 provides equal or higher coverage than Approach 1, as per Theorem 4.
Suppose a multi-segment path p that comprises α and β passing via L1i in Figure 4, is targeted.
The configuration of the on-path latch L1i is determined as explained in Section 3.4 (i.e., normal mode
is used if L1i is a TBL; either normal or scan mode is used if L1i is an NTBL).
Note that in both cases, any off-path latch can be configured in scan mode, independently of
whether or not that off-path latch is a site of time borrowing. This is due to the following two reasons.
First, if a static value is applied via scan at an off-path latch, the time borrowing status of the off-path
latch has no impact on the on-path delay. Second, even if a rising or a falling transition is applied via
scan, a robust test for a target path like β does not require off-path transitions to satisfy any specific
timing requirement as per Lemma 1. (In particular, an early off-path transition cannot reduce the on-path
delay. In our scheme the off-path transition is never later than in the normal mode.) Hence, even
for a latch where time borrowing is proven to occur, scan mode operation does not violate the robust
delay test conditions, provided that the latch is off-path.
Now let us consider two alternatives, Alternative-1 and Alternative-2, that operate the on-path
latch in the same mode (that meets the above requirements) and differ only in the configuration of the
off-path latches. Let the set of the off-path latches at level-1 configured in scan mode in Alternative-1,
$A^{1}_{scan}$, be a proper subset of the set of off-path latches configured in scan mode in Alternative-2,
$A^{2}_{scan}$ (i.e., $A^{1}_{scan} \subset A^{2}_{scan}$). Note that Alternative-1 includes the case where scan mode is used for none of
the off-path latches, i.e., $A^{1}_{scan} = \emptyset$. Hence, the classical approach where only normal mode is used for
every on-path and off-path latch is a special case of Alternative-1. By comparing the two alternatives,
we obtain the following results.
Theorem 5. Any robust test for the multi-segment path p using Alternative-1 or
Alternative-2 invokes a delay equal to or greater than the delay of p.
Proof: First, consider the case where the on-path latch L1i is configured in normal mode. In both
alternatives, scan mode is used for each level-0 latch and for each level-1 latch in $A^{1}_{scan}$ and $A^{2}_{scan}$, respectively. If two
different test vectors are applied in the two alternatives, the arrival times at the output of the on-path
latch L1i may be different in the two cases. However, due to the characteristic of robust tests given in
Lemma 1, the delay invoked for α and via L1i is guaranteed to be equal to or greater than the
worst-case delay of sub-path α plus the delay via L1i. Next, for the propagation of this transition along
β, the values applied at off-path level-1 latches may be different in the two alternatives. However, again
due to Lemma 1, the subsequent propagation along β will invoke an overall delay equal to or greater
than that of the target path p.
Second, in the case where the on-path latch L1i is configured in scan mode when L1i is an NTBL, the
transitions in both alternatives depart from L1i at the rising edge of the clock driving L1i. The
subsequent propagation along β will invoke an overall delay equal to or greater than that of the target
path p due to Lemma 1.■
Theorem 6. If a multi-segment path p is robustly testable in Alternative-1, then it is
robustly testable in Alternative-2.
Proof: Note that both alternatives require the same set of conditions for the values at on-path lines
and off-path inputs for robust detection of p. We can specify independent logic values (i) at every
level-1 latch that belongs to $A^{1}_{scan}$ in Alternative-1 and $A^{2}_{scan}$ in Alternative-2, respectively, as well as
(ii) at all level-0 latches in both alternatives. Since $A^{1}_{scan} \subset A^{2}_{scan}$, Alternative-2 provides a superset of
possible value assignments to satisfy the same set of conditions for robust detection of p.
Consequently, if a robust test exists for p in Alternative-1, then one surely exists in Alternative-2.■
Theorem 5 shows that the test quality obtained by any robust test applied using Alternative-2 is
equal to the test quality obtained by any robust test applied using Alternative-1. Theorem 6 shows that
the robust delay fault coverage for Alternative-2 is at least equal to, and may be superior to, that for
Alternative-1.
Hence, if we can use a scan chain configuration described in Alternative-2 to target a multi-segment
path p, then we need not use a scan chain configuration described in Alternative-1 to test the
path. The following result is a corollary to Theorems 5 and 6 assuming that the on-path latch L1i is a
time borrowing site.
While testing a multi-segment path via a latch at which time borrowing is known to
occur, the best robust test quality and the best robust coverage can be obtained by operating the on-path
latch in normal mode and all off-path latches in scan mode (single-normal configuration),
provided that DFT circuitry and control signals allow such a combination of modes.
For example, suppose there are four latches at level-1 of Figure 4, and testing of SCUT0 shows
that time borrowing occurs at L12, and multi-segment paths that pass via L12 are targeted. In this case,
normal mode is required at L12 since it is the on-path latch and a site of time borrowing. Depending
on the configuration of the off-path latches, 8 (= 2^3) configurations may be used as shown in Figure 5.
The relationships among different configurations given by Theorem 6 are represented by arrows
in Figure 5. If a path via the on-path latch (L12) is robustly testable using the configuration specified
by the destination of an arrow, the path is robustly testable using the configuration specified by the
source of the arrow.
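The relationships of Figure 5 can be reproduced with a short Python sketch. It assumes that an arrow runs from one configuration to another exactly when the source scans a proper superset of the off-path latches scanned by the destination, which is how Theorem 6 is applied above. The tuple encoding ('n' for normal, 's' for scan) and the function names are illustrative choices, not notation from the original work.

```python
from itertools import product


def scan_set(config):
    """Indices of the latches that are in scan mode ('s') in a configuration."""
    return frozenset(i for i, mode in enumerate(config) if mode == "s")


def configurations_with_on_path_normal(k, on_path):
    """All configurations of k latches in which the on-path latch is in normal mode."""
    return [c for c in product("ns", repeat=k) if c[on_path] == "n"]


def dominates(a, b):
    """True if configuration a scans a proper superset of the latches scanned by b."""
    return scan_set(a) > scan_set(b)


# Four level-1 latches (L11..L14); L12 (index 1) is the on-path time borrowing latch.
configs = configurations_with_on_path_normal(k=4, on_path=1)

# Configurations that are not at the destination of any arrow.
best = [c for c in configs if not any(dominates(o, c) for o in configs)]

print(len(configs), "configurations keep L12 in normal mode")
print("undominated:", best)
```

Under these assumptions, the only configuration without an incoming arrow is (s, n, s, s), the single-normal configuration named in Corollary 1.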
The following result is a corollary to Theorems 5 and 6, this time assuming that the on-path
latch L1i is a NTBL.
Corollary 2: While testing a multi-segment path via a latch at which time borrowing does not occur,
the best robust test quality and robust coverage can be obtained by operating the on-path latch as well
as all off-path latches in scan mode (all-scan configuration), provided that such a configuration is
supported by the DFT hardware and control.
We can modify Figure 5 to include the remaining eight possible scan chain configurations which
have L12 in scan mode and add corresponding arrows to show the relationships between different scan
chain configurations for this case where L12 is a non-time borrowing on-path latch.
If a target path p is tested using a configuration where one or more off-path latches are in normal
mode, then we can use the following property to modify the value applied at output of any latch
where no time borrowing occurs and yet is configured in normal mode.
Property 2: The output of a NTBL is always hazard-free, because
data stabilizes at the latch input before the latch becomes transparent.
By considering both hazardous and hazard-free values at the input of each such latch even when
a hazard-free value is desired at the latch’s output, robust tests may be found for some paths for
which such tests may not otherwise be found. Figure 6 shows an example case where time borrowing
is detected only at L1 but both latches are operating in normal mode to test a path via L1. The falling
transition is propagated via L1 to test the path shown in bold. Robust propagation of the falling
transition at the on-path input of G4 requires static-1 at its off-path input, which is the output of L2.
However, the output of G3 cannot have a static-1 signal because the values at the inputs of G3 are
already determined by the on-path values as a rising transition at one input and a falling transition at
the other. Hence, a conventional test generator will be unable to find a robust test for this path via L1.
On the other hand, our ATPG (automatic test pattern generator) exploits Property 2, and considers
hazardous-1 signal as well as static-1 at the input of L2. Hence, by exploiting Property 2, our ATPG
can successfully generate a robust test for the target path and improve coverage.
The above results provide a significant reduction in the number of scan chain configurations that
are required to maximize robust PDF coverage, even when time borrowing occurs at unexpected
latches (i.e., latches that are not sites of intentional time borrowing). In [9], the fully-adaptive
approach requires the DFT circuitry to support 2^k scan chain configurations at a level with a total of k
latches. However, Corollary 1 and Figure 5 show that when we detect time borrowing at the ith latch
in the level, then the configuration in which the ith latch is in normal mode and all the other latches are
in scan mode, by itself, maximizes the coverage for all multi-segment paths that pass via the ith latch.
Hence, no matter which and how many of the latches at the level are sites of time borrowing, the
robust PDF coverage can be maximized for the paths that pass via TBLs if the DFT supports the
following k single-normal configurations: (n, s, s, ···, s), (s, n, s, ···, s), (s, s, n, ···, s), ··· , (s, s, s, ···, n),
where n denotes normal mode and s denotes scan mode. As per Corollary 2, the all-scan configuration
(s, s, s, ···, s) provides the maximum coverage for all paths that pass via the latches where no time
borrowing occurs. Of course, we need the all-normal configuration (n, n, n, ···, n) to support normal
circuit operation. We call the above k+2 configurations the optimal set of scan chain
configurations, meaning that the all-normal, the all-scan, and every single-normal configuration are
available in every level of latches.
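As a minimal sketch of how small this set is, the k+2 configurations can be generated and counted against the 2^k configurations of the fully-adaptive approach. The function name and the tuple encoding are assumptions made for illustration, and k ≥ 2 is assumed so that the single-normal configurations differ from the all-normal one.

```python
def optimal_configuration_set(k):
    """Return the all-normal, all-scan, and k single-normal configurations for k latches.

    Assumes k >= 2; 'n' denotes normal mode and 's' denotes scan mode.
    """
    all_normal = ("n",) * k
    all_scan = ("s",) * k
    single_normal = [
        tuple("n" if j == i else "s" for j in range(k)) for i in range(k)
    ]
    return [all_normal, all_scan] + single_normal


for k in (4, 10):
    required = len(optimal_configuration_set(k))
    print(f"k = {k}: {required} configurations instead of {2 ** k}")
```

For a level of ten latches this amounts to 12 configurations rather than 1024.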
By decreasing the number of required scan chain configurations from 2^k to k+2 in a level of k
latches, the proposed approach reduces the complexity of DFT circuitry and scan chain routing. This
is a significant improvement over [9]. In general, we have the following results.
Maximum possible robust PDF coverage can be attained for any latch-based circuit
independently of the time borrowing scenario, provided that latches at inputs of every combinational
block can be configured in (n, n, ···, n, n), (s, s, ···, s, s), (n, s, ···, s, s), ···, (s, s, ···, n, s), (s, s, ···, s, n),
i.e., the all-normal, the all-scan-in, and all possible single-normal configurations.
For a faulty chip instance, it is not necessary to cover the entire CUT since a faulty chip instance
is discarded as soon as a delay fault is identified. In particular, a delay fault in a faulty chip instance
can be detected by any target path in an SCUT that terminates at a primary output and fails r-r tests
(under the assumption that r-f tests are not used), regardless of time borrowing status at the latch
where the target path begins. In other words, the objective of delay testing for faulty chip instances is
not to compute the delay fault coverage but to identify a delay fault at minimum test application cost.
In contrast, for fault-free chip instances, we must test all necessary target paths that are required
to cover the entire CUT in order to compute the delay fault coverage. Essentially, we are interested in
targeting every path of CUT that starts from a primary input and terminates at a primary output. A
multi-segment path p that spans from a primary input to a primary output can be tested either by itself
as a single target path or by multiple sub-paths that partition p using scan, where every partition is
made only at a NTBL. For instance, suppose that p starts from a primary input L0, passes via L1, and
terminates at a primary output L2. Let us assume that L1 is known as a NTBL and r-r tests are applied
to the sub-path from L0 to L1 and the sub-path from L1 to L2. If both sub-paths pass the r-r tests, we
can say p is robustly tested as proven by Theorems 3 and 4.
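The partitioning described above can be sketched in Python as follows. The sketch represents a path only by the ordered list of latches it visits and splits it at every interior NTBL, which is one admissible choice since the text permits, but does not require, a partition at each NTBL. The function name and the data representation are assumptions for illustration.

```python
def partition_at_ntbls(latches, ntbls):
    """Split a path, given as an ordered list of latch names, at interior NTBLs.

    Adjacent sub-paths share the latch at which the partition is made:
    it is the capture point of one sub-path and the launch point of the next.
    """
    sub_paths = []
    start = 0
    for i in range(1, len(latches) - 1):
        if latches[i] in ntbls:
            sub_paths.append(latches[start:i + 1])
            start = i
    sub_paths.append(latches[start:])
    return sub_paths


# Example from the text: p runs from L0 via L1 to L2, and L1 is a NTBL.
print(partition_at_ntbls(["L0", "L1", "L2"], ntbls={"L1"}))

# If L1 were a time borrowing latch, p would have to be targeted as a whole.
print(partition_at_ntbls(["L0", "L1", "L2"], ntbls=set()))
```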
In summary, for fault-free chip instances, it is required for a test generation algorithm to target
every path from a primary input to a primary output either as a single target path or as multiple
disjoint sub-paths where every partition is made only at a NTBL, such that the coverage is maximized
by selecting best scan chain configurations as per Theorems 3 through 6 and Corollaries 1 and 2. All
these required tests for robustly testable paths/sub-paths must be generated so that the test generation
may conclude and fault coverage can be computed. Hence, we obtain the following result.
The maximum robust PDF coverage is obtained if scan chain configurations are selected
as per Corollaries 1 and 2 assuming the optimal set of scan chain configurations are available, and
every robustly testable path from primary input to primary output is tested either by itself or by
multiple disjoint sub-paths that partition the path at NTBLs.
If the optimal set of scan chain configurations (i.e., the all-normal, the all-scan, and every possible
single-normal configuration) is available, we prove in Section 3.8.1 that the proposed test generation
approach guarantees the optimal robust PDF coverage. Hence, no other path delay testing approach
for latch-based circuits can obtain higher robust PDF coverage than that of our proposed approach.
In Section 3.8.2 we describe the test generation procedure under the optimal set of scan chain
configurations. We show that test generation for any single/multi-segment path in CUT is
significantly simplified due to the optimality of coverage the proposed approach achieves.
In general, path delay testing of a structurally long path in a circuit is difficult because a test must
meet specific requirements at many off-path inputs to sensitize the path. As noted in Chapters 1 and 2,
multi-segment paths must be targeted in latch-based high-speed pipelines in case time borrowing
occurs. This is believed to increase the complexity of test generation. However, our proposed latch-based
delay testing approach always achieves theoretical maximum delay fault coverage of a latch-based
circuit regardless of length of multi-segment paths being targeted or the complexity of test
generation. This can also reduce the complexity of test generation procedure significantly.
We obtain the following new results (Theorems 9, 10, 11, and 12), from which we prove that our
latch-based delay testing approach achieves theoretical maximum path delay fault coverage, provided
that the optimal set of scan chain configurations are available.
First, in Theorem 9, we identify paths for which it is structurally impossible to generate a robust
test using any scan-based path delay testing methods.
Theorem 9: If a single-segment path q in block Ci is robustly untestable when Ci is tested by itself,
then any multi-segment path Q that includes q is also robustly untestable using any scan-based path
delay testing method which controls and observes values only at latches.
Proof: It is given that a robust test does not exist for the path q, using any path delay testing method
where values are applied at the latches at inputs of Ci and responses captured at outputs of Ci. Note
that a robust test does not exist for q even when latches at inputs of Ci are independently controlled.
Hence, any multi-segment path Q that includes q cannot be robustly tested since no test for Q can
robustly sensitize its sub-path q, regardless of delay testing method used.
Theorem 10: Suppose that the total number of paths from primary
inputs to primary outputs in a latch-based circuit is N. If m paths out of these N paths include at least
one single-segment sub-path that is robustly untestable, then the theoretical maximum path delay fault
coverage of the latch-based circuit is (N – m)/N for any scan-based method.
Proof: By Theorem 9, no scan-based path delay testing approach can
generate a robust test for the m paths, since each of them includes at least one robustly untestable
single-segment sub-path. Hence, no scan-based path delay testing approach can robustly test more than
N – m paths in the circuit.
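The bound (N – m)/N can be illustrated with a small Python calculation. The sketch assumes that each path is described by the list of single-segment sub-paths it contains and that the set of robustly untestable single-segment sub-paths is already known; the identifiers and the toy circuit are hypothetical.

```python
from fractions import Fraction


def max_robust_coverage(paths, untestable_segments):
    """Upper bound (N - m)/N on robust path delay fault coverage.

    paths: list of paths, each a list of single-segment sub-path identifiers.
    untestable_segments: set of identifiers of robustly untestable sub-paths.
    """
    n = len(paths)
    m = sum(1 for p in paths if any(seg in untestable_segments for seg in p))
    return Fraction(n - m, n)


# Hypothetical two-stage pipeline: segments a1, a2 in C0 and b1, b2 in C1.
paths = [["a1", "b1"], ["a1", "b2"], ["a2", "b1"], ["a2", "b2"]]
bound = max_robust_coverage(paths, untestable_segments={"b2"})
print(bound)
```

Here two of the four paths contain the untestable segment b2, so at most half of the paths can be robustly tested.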
Next, the following theorems prove that the proposed latch-based delay testing approach is
guaranteed to robustly cover the remaining N – m paths regardless of time borrowing status inside the
circuit, attaining the theoretical maximum coverage.
Theorem 11: If every single-segment sub-path that is included in a k-segment path P in an n-stage
latch-based pipeline (1 ≤ k ≤ n) is robustly testable, then P is always robustly testable by the latch-based
path delay testing approach proposed in this chapter.
Proof: For simplicity, we first assume that P is a two-segment path where two single-segment paths p
in C0 and q in C1 are connected via latch LD. If p is robustly testable in C0 with a rising (falling)
transition arriving at LD and q is robustly testable in C1 with a rising (falling) transition departing
from LD, then we prove that the multi-segment path P comprised of p and q is robustly testable in
SCUT comprised of C0 and C1 by scanning all off-path latches between C0 and C1. As shown in
Figure 7, p is a path from LB to LD in C0 and q is a path from LD to LH in C1. Let a robust test Testp for
p in C0 apply vector (VA, VB, VC) at the input latches LA, LB, and LC, where VB is a transition that
propagates via p and terminates at LD with a rising (falling) transition. Let a robust test Testq for q in
C1 apply vector (VD, VE, VF) at the input latches LD, LE, and LF, where VD is a rising (falling) transition
that propagates via q. When we target the multi-segment path P comprised of p and q in an SCUT
comprised of C0 and C1 using the latch-based delay testing approach, we can scan LA, LB, and LC as
well as two off-path latches LE and LF, regardless of time borrowing status at LD, LE, and LF according
to Section 3.5. Recall that the on-path latch LD can be always configured in normal mode regardless
of time borrowing status according to Section 3.4. A simple combination of Testp and Testq becomes a
robust test vector (VA, VB, VC; VE, VF) for the two-segment path P, where VD value is a rising (falling)
transition implied by the values of VA, VB, and VC. An example is shown in Figure 7. Note that as
Testp and Testq are generated by controlling all off-path latches completely independently, robust test
(VA, VB, VC; VE, VF) for P is also generated by controlling all off-path latches completely
independently under the condition that the same type of transition is implied at LD. We can easily
generalize the above results for C0 and C1 with arbitrary number of inputs. Subsequently, we can
easily generalize these results to paths passing via an arbitrary number of blocks. Hence, we can
prove that the proposed latch-based delay testing approach is guaranteed to generate a robust test for
any multi-segment path that is comprised only of robustly testable single-segment paths.
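The construction used in this proof can be sketched in Python. The sketch assumes that a test is a mapping from latch names to applied values, that the values are encoded as the strings "rising", "falling", "static-0", and "static-1", and that the transition arriving at the on-path latch under Testp is known. The encoding, the function name, and the example values are illustrative assumptions, not taken from Figure 7.

```python
def combine_tests(test_p, arrival_at_on_path, test_q, on_path_latch):
    """Combine robust tests for p (in C0) and q (in C1) into a test for P.

    The on-path latch operates in normal mode, so its value is not scanned in;
    it is implied by test_p and must match the transition that test_q launches.
    """
    if test_q[on_path_latch] != arrival_at_on_path:
        raise ValueError("transition types at the on-path latch do not match")
    combined = dict(test_p)
    combined.update(
        {latch: value for latch, value in test_q.items() if latch != on_path_latch}
    )
    return combined


# Hypothetical values: Testp applies (VA, VB, VC), Testq applies (VD, VE, VF).
test_p = {"LA": "static-1", "LB": "rising", "LC": "static-0"}
test_q = {"LD": "rising", "LE": "static-1", "LF": "static-0"}

combined = combine_tests(test_p, "rising", test_q, on_path_latch="LD")
print(combined)
```

The result corresponds to the vector (VA, VB, VC; VE, VF): LD does not appear because its value is implied rather than applied.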
Theorems 9, 10, and 11 lead to the following result.
Theorem 12: The proposed latch-based path delay testing approach is guaranteed to achieve the
theoretical maximum delay fault coverage by configuring all off-path latches in scan mode.
Note that Theorem 12 implies that we can achieve the maximum robust PDF coverage even
when we target only the longest multi-segment paths that run from a primary input to a primary
output, at the expense of high test application cost. In other words, the proposed approach optimizes the
robust PDF coverage even when time borrowing occurs ubiquitously throughout CUT, overcoming
one of the inherent delay testing problems of latch-based circuits with time borrowing. Also, the
maximum robust PDF coverage obtained by the proposed approach is the same as the robust PDF
coverage achievable by testing every block of pipeline separately in a divide-and-conquer fashion.
As implied in Theorems 11 and 12, a robust test for a multi-segment path can be constructed
simply by combining tests for single-segment sub-paths of the target multi-segment path. Hence, the
proposed test generation approach initially generates and stores tests for all single-segment paths.
Similar to the basic delay testing strategy described in Section 3.2.4, test application starts from the
first stage of a pipeline and gradually expands/moves to subsequent stages. According to Corollaries
1 and 2, the all-scan configuration is used at a level of latches where the on-path latch is identified as
a NTBL and the single-normal configuration is used at a level where the on-path latch is identified as
a TBL. The test procedure is illustrated next.
First, the tests for single-segment paths in the first block C0 are applied and sites of time
borrowing and non-time borrowing are identified at the level-1 latches. For NTBLs at level-1, single-segment
paths in the fan-out of these latches are tested using the single-segment tests in C1 that are
already generated and stored, using the all-scan configuration at level-1 latches. For TBLs in level-1,
a single-normal configuration is used that configures the TBL in normal mode, and tests for each two-segment
path via the TBL are constructed by combining the tests for the corresponding two single-segment
sub-paths in C0 and C1, respectively, only excluding the value for the on-path latch that is
now configured in normal mode. In the same manner, single-segment paths in C2 or multi-segment
paths in C1 + C2 and/or C0 + C1 + C2 are tested based on the time borrowing status at level-2 latches.
This procedure continues until the last block is considered.
The next section describes our approach for test generation under any set of available scan chain
configurations – even those that do not include all of the above k+2 scan chain configurations.
Some scan chain configurations may not be allowed due to considerations such as performance
overheads associated with using scan at particular latches. Also, to further reduce DFT overheads, a
small number of scan chain configurations may be identified during circuit design using the
probability of time borrowing at each latch. Restrictions on scan chain configurations, however,
trigger the complication of not having the optimal configuration available to test a target path under
the particular time borrowing detected in a particular instance of the circuit under test. In this context,
we propose and demonstrate a new test generation approach that optimizes coverage by considering
the time borrowing status of a CUT in combination with the available scan chain configurations. We
demonstrate that this new approach exploits the properties and the theorems presented above for any
available set of scan chain configurations to provide high coverage.
We assume that the greater the flexibility in the operation of a latch, the higher the overall DFT
overheads. We also assume that the available configurations of latches at each level are determined
prior to test generation (and definitely before any chip instances are tested).
Of course, this test generation algorithm directly covers the case where all scan chain
configurations are available. Property 2 is exploited if a NTBL is operating in normal mode as an off-path
latch.
One of the most important parts of the test generation under restrictions on the scan chain
configurations is the selection of the best available configuration(s) for each test session. This
selection process is essentially based on Corollaries 1 and 2 and Figure 5. For multi-segment paths
that pass via a latch where time borrowing occurs, the best configuration is the single-normal
configuration in which only the on-path latch is in normal mode. If this single-normal configuration is
not available, a configuration should be chosen such that the on-path latch is in normal mode and as
many off-path latches are in scan mode as possible, based on Theorems 5 and 6.
In some cases, multiple configurations may be used to generate tests for a given set of target paths.
test. For example, suppose that the circuit shown in Figure 4 has four level-1 latches L11, L12, L13, and
L14. If multi-segment paths passing via L12, which is identified as a TBL, are being targeted, the
configuration (s, n, s, s) is optimal requiring L12 in normal mode, as shown in Figure 5. However,
suppose only the following configurations are supported by DFT: {(n, n, n, n), (s, s, s, s), (s, n, n, n),
(s, n, s, n), (s, n, n, s)}. Among the configurations, we will use two, namely (s, n, s, n) and (s, n, n, s),
since (a) both of these configurations provide better coverage than (n, n, n, n) and (s, n, n, n) (see
Figure 5), and (b) each one of these configurations may provide coverage for some paths which the
other configuration may not cover (since there is no arrow from either of these configurations to the
other in Figure 5). We have developed an algorithm that identifies a minimal subset of the available
scan chain configurations to be used for testing any set of target paths, under any given scenario of
time borrowing.
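The example above can be checked with a short Python sketch. It assumes that a configuration is usable for paths via a time borrowing latch only if it keeps that latch in normal mode, and that a configuration is dropped when another usable one scans a proper superset of its scan-mode latches. This is a simplified reading of the selection rule, and the function names are hypothetical.

```python
def scan_set(config):
    """Indices of the latches that are in scan mode ('s') in a configuration."""
    return frozenset(i for i, mode in enumerate(config) if mode == "s")


def select_configurations(available, on_path):
    """Keep the undominated configurations that leave the on-path TBL in normal mode."""
    compatible = [c for c in available if c[on_path] == "n"]
    return [
        c
        for c in compatible
        if not any(scan_set(other) > scan_set(c) for other in compatible)
    ]


# Configurations supported by DFT for (L11, L12, L13, L14); L12 is the on-path TBL.
available = [
    tuple("nnnn"),
    tuple("ssss"),
    tuple("snnn"),
    tuple("snsn"),
    tuple("snns"),
]
selected = select_configurations(available, on_path=1)
print(selected)
```

Under these assumptions the sketch keeps (s, n, s, n) and (s, n, n, s), in agreement with points (a) and (b) above.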
In general, there may exist multiple versions of SCUTs in one test session (each session differs
in the last stage of the target SCUTs) because the best scan chain configurations may be different for
different target paths (details will be discussed in Section 3.9.3). Since multiple versions of SCUTs
are tested, it is also necessary to avoid testing the same path many times. Our proposed ATPG selects
the best set of SCUTs, manages multiple versions of SCUTs, avoids any unnecessary repetition in
testing of paths, and properly computes the delay fault coverage for the entire pipeline. We have
developed an algorithm that identifies the subset of all available scan chain configurations to be used
for testing of any set of target paths, under any given scenario of time borrowing (Figure 8).
Procedure: ATPG_MultiSCUTs( ) {
    Read the pipeline circuit file and available latch configurations;
    Initialize SCUT_list[level] for each level;
    For each level {
        /* select best configurations for the latches in the current level */
        For each latch of the current level {
            If (time borrowing = true & corresponding single-normal mode is available)
                Select the single-normal mode;
            Else if (time borrowing = false & all-scan mode is available)
                Select the all-scan mode;
            If (no configuration is selected above)
                Select all compatible configurations μ1, μ2, ···, μr;
            For all pairs of configurations μj and μk {
                If μj has a superset of the latches in scan mode in μk,
                    Then, eliminate μk;
            }
        }
        /* construct SCUTs */
        For each configuration selected by at least one latch {
            If (selected configuration consists of scan modes only)
                Construct an SCUT with a single stage;
                Add new SCUT to SCUT_list[level];
            Else
                Combine the selected configuration of current level with
                    all entries of SCUT_list[level – 1];
                Add new SCUT(s) to SCUT_list[level];
        }
        For all latches of all levels within the longest SCUT {
            If (time borrowing = false)
                Initialize sub-path_list[ ] for this latch to trace tested paths;
        }
        For all stage inputs within the longest SCUT {
            Initialize sub-path_list[ ] for the stage input to trace tested paths;
        }
        For each SCUT in SCUT_list[level] {
            Based on the latch configurations,
                Remove/inactivate transitive fanins of latches in scan mode;
                Remove/inactivate transitive fanins of stage outputs except for
                    those in the last stage;
            Determine the current primary inputs and primary outputs;
            /* Test of an SCUT */
            Call TestATPG procedure for the current SCUT {
                For each target path {
                    Clear line values;
                    PreProcessRobust( ) {
                        Robustly sensitize the target path;
                        If (any line is removed), skip the target;
                        If (latch is met that is in normal w/o time borrowing)
                            Target the path starting at this latch;
                    }
                    ATPGprocedure( ) {
                        Generate test for the target path;
                        Removed/inactivated parts should be ignored;
                        Implication( ) considering glitch-free signal at
                            output of latches w/o time borrowing;
                        Write test vector to a file;
                        Accumulate PDF coverage information;
                        A path is not tested more than once;
                    }
                }
            }
        } /* end of testing all SCUTs of the current level */
        Get time borrowing results at the current output latches;
    }
}
The proposed test generation approach is illustrated using a three-stage linear pipeline shown in
Figure 9. Suppose that for level-1 latches, (LD, LE, LF), DFT is designed to support three
configurations: {(n, n, n), (n, s, s), (s, s, s)}; and for level-2 latches, (LG, LH, LI), to support two
configurations: {(n, n, n), (s, s, s)}. It is assumed that time borrowing occurs at LD, LE, and LI in a
copy of the chip under test.
The SCUTs at each level are shown in gray in Figure 9. Bold solid lines are used to represent the
paths targeted in each SCUT, and dotted lines are used to represent the paths that are not targeted but
used to apply values at off-path inputs. The fan-ins of latches operating in scan mode are ignored in
Figure 9 since they are not considered by the ATPG. As shown in the figure, the hazard-free property
described in Property 2 is exploited when non-time borrowing latches, LF, LG, and LH, operate in
normal mode as off-path inputs.
The test procedure is summarized as follows. First, C0 is tested (SCUT0) and time borrowing is
detected at LD and LE. To target the two-segment paths via LD, the configuration (n, s, s) is selected at
level-1 latches to obtain SCUT10. To target the paths via LE, (s, n, s) is desired but not available.
Therefore, (n, n, n) is selected for level-1 latches to obtain SCUT11. However, Property 2 is exploited
at the non-time borrowing off-path latch LF. To target the sub-paths starting from LF, (s, s, s) is
selected for level-1 latches to obtain SCUT12. During testing of SCUT10, SCUT11, and/or SCUT12, time
borrowing is detected at LI. For the sub-paths starting at LG and LH, the best configuration (s, s, s) is
used for the level-2 latches. Lastly, to test the multi-segment paths via LI, (n, n, n) is selected for the
level-2 latches. When combined with each of the three previous SCUTs, namely SCUT10, SCUT11, and
SCUT12, this gives rise to SCUT21, SCUT22, and SCUT23.
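The configuration choices in this walkthrough can be reproduced with a Python sketch of the per-latch selection step of the procedure. It assumes that, when the preferred configuration is unavailable, the compatible configurations for a time borrowing latch are those that keep it in normal mode, and that any available configuration is compatible for a NTBL. Only the selection step is modeled; SCUT construction and test generation are omitted, and all names are hypothetical.

```python
def scan_set(config):
    """Indices of the latches that are in scan mode ('s') in a configuration."""
    return frozenset(i for i, mode in enumerate(config) if mode == "s")


def select_for_latch(available, latch, borrows_time):
    """Choose the configuration(s) used to target paths via one on-path latch."""
    k = len(available[0])
    single_normal = tuple("n" if j == latch else "s" for j in range(k))
    all_scan = ("s",) * k
    if borrows_time and single_normal in available:
        return [single_normal]
    if not borrows_time and all_scan in available:
        return [all_scan]
    # Fallback: keep compatible configurations that are not dominated.
    compatible = [c for c in available if not borrows_time or c[latch] == "n"]
    return [
        c
        for c in compatible
        if not any(scan_set(other) > scan_set(c) for other in compatible)
    ]


# Level-1 latches (LD, LE, LF): time borrowing at LD and LE.
level1 = [tuple("nnn"), tuple("nss"), tuple("sss")]
choice_ld = select_for_latch(level1, latch=0, borrows_time=True)
choice_le = select_for_latch(level1, latch=1, borrows_time=True)
choice_lf = select_for_latch(level1, latch=2, borrows_time=False)

# Level-2 latches (LG, LH, LI): time borrowing at LI only.
level2 = [tuple("nnn"), tuple("sss")]
choice_lg = select_for_latch(level2, latch=0, borrows_time=False)
choice_li = select_for_latch(level2, latch=2, borrows_time=True)

print(choice_ld, choice_le, choice_lf, choice_lg, choice_li)
```

With these inputs the sketch selects (n, s, s) for LD, (n, n, n) for LE, (s, s, s) for LF and LG, and (n, n, n) for LI, matching the configurations named above.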
The proposed approach is applied to several circuits for various time borrowing scenarios under
a diverse set of available scan chain configurations. The circuits include a five-stage linear pipelined
array multiplier, five- and ten-stage versions of a linear pipeline that uses copies of the circuit C17
from the ISCAS ’85 benchmark suite (the connections among the stages are based on the pipeline
used in [9]), five- and ten-stage versions of a pipeline MIN (minimum vector selector from [9]), and
five- and ten-stage versions of a pipeline that uses copies of T1 from [1].
To verify the improved coverage provided by the proposed method, the robust PDF coverage
and the number of tests are compared for four different approaches under the given scan chain
configurations.
(1) Classical: The entire CUT is tested as a single SCUT, since classical
approaches cannot use DFT for delay testing of latch-based circuits with time borrowing.
(2) ITC03.ext: The approach in
[9] is modified to deal with cases where not all scan chain configurations are available.
In particular, the approach in [9] does not consider any restriction on the scan chain
configurations, and configures the latches such that (a) normal mode is used for all latches that are
sites of time borrowing, and (b) scan mode is used for all latches that are not sites of time borrowing.
Note that condition-(b) is not required but used to attain higher coverage. Hence, the extended version
of the approach in [9] (ITC03.ext) is implemented such that it chooses a single configuration for a
level of latches that satisfies condition-(a) and has the most NTBLs in scan mode. If multiple
configurations satisfy these two conditions and have the same number of latches in scan mode,
ITC03.ext arbitrarily selects one of them.
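The selection rule of ITC03.ext, as described above, can be sketched in Python. The sketch picks a single configuration per level that keeps every time borrowing latch in normal mode and has the most NTBLs in scan mode; ties are broken by taking the first such configuration in the given order, which stands in for the arbitrary choice. The function name and the example inputs are assumptions for illustration.

```python
def itc03_ext_select(available, tbls):
    """Pick one configuration: all TBLs in normal mode, most NTBLs in scan mode.

    available: list of configurations (tuples of 'n'/'s').
    tbls: set of indices of latches that are sites of time borrowing.
    Returns None if no available configuration satisfies condition (a).
    """
    feasible = [c for c in available if all(c[i] == "n" for i in tbls)]
    if not feasible:
        return None

    def ntbls_in_scan(config):
        return sum(1 for i, mode in enumerate(config) if mode == "s" and i not in tbls)

    # max() returns the first maximal element, which serves as the arbitrary tie-break.
    return max(feasible, key=ntbls_in_scan)


available = [
    tuple("nnnn"),
    tuple("ssss"),
    tuple("snnn"),
    tuple("snsn"),
    tuple("snns"),
]
one_tbl = itc03_ext_select(available, tbls={1})
two_tbls = itc03_ext_select(available, tbls={1, 2})
print(one_tbl, two_tbls)
```

Because only one configuration is returned per level, a tie such as (s, n, s, n) versus (s, n, n, s) leaves the paths covered only by the other configuration untested by this rule.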
(3) Proposed.v1: We claim that the approach
we propose improves the test quality due to two new features:
(F1) Improvements due to Theorems 5 and 6 (and Corollaries 1 and 2), which suggest using scan
mode for as many off-path latches as possible.
(F2) Improvements due to Property 2 that utilizes the hazard-free property at the outputs of
latches that are not sites of time borrowing but configured in normal mode.
The first version of the proposed method, Proposed.v1, implements F1 only.
(4) Proposed.v2: The final version of the proposed approach,
Proposed.v2, implements both new features, i.e., F1 as well as F2.
The robust PDF coverage for ITC03.ext is always greater than or equal to that for the classical
approach, regardless of what scan chain configurations are available. Proposed.v1 may improve test
quality compared to ITC03.ext if Theorems 5 and 6 (i.e., F1) are applicable. In Proposed.v2, the test
quality can be further improved compared to Proposed.v1 if Property 2 (i.e., F2) is applicable.
In practice, the latches at a particular level of a pipeline are connected by a scan chain to scan-out
the captured responses while testing the SCUTs that terminate at the latches. In this case, we can
use the same scan chain to scan vectors to test SCUTs that start at these latches. Hence, we assume in
general that the all-scan configuration is available by default at a level of latches where all latches can
be scanned-out. For this reason, we present in Table 1 the experimental results for cases where the
all-scan configuration is assumed to be available at every level of latches.
We have performed an extensive set of experiments for all above circuits (a linear pipelined
multiplier and five- and ten-stage versions of linear pipelines using copies of C17 [9], MIN [9], and
T1 [1]), assuming arbitrary scan chain configurations, and under various time borrowing scenarios to
demonstrate the benefits of the proposed techniques. The complete results for all circuits for these
diverse scan chain configurations can be found in [11]. Here we present a small subset of results that
illustrate the typical trends.
Table 1 shows the test generation results for a five-stage pipelined multiplier for two different
scan chain configurations, namely configuration-A and configuration-B, and six different time
borrowing scenarios, namely S1 to S6. The scan chain configuration-A assumes that only the all-normal
and all-scan configurations are available at every latch (see the third and fourth columns of
Table 1). The scan chain configuration-B assumes that the all-normal, the all-scan, and all the k
single-normal configurations, i.e., the configurations {(n, s, s, ⋅⋅⋅, s), (s, n, s, ⋅⋅⋅, s), (s, s, n, ⋅⋅⋅, s), ⋅⋅⋅, (s,
s, s, ⋅⋅⋅, n)}, are available at every level of k latches (see the fifth and the sixth columns of Table 1).
The time borrowing scenarios are listed starting with scenarios with the fewest time borrowing sites
(S1: no time borrowing) to the most time borrowing sites (S6: time borrowing at every latch).
Table 1. Test generation results for a five-stage pipelined multiplier.

                                      Available scan chain configurations at every level
                                      all-normal, all-scan        all-normal, all-scan,
                                                                  every single-normal
Time borrowing sites    Approach      No. of tests  Robust PDF    No. of tests  Robust PDF
                                                    coverage (%)                coverage (%)
S1: None                Classical     640           9.54          640           9.54
                        ITC03.ext     478           100           478           100
                        Proposed.v1   478           100           478           100
                        Proposed.v2   478           100           478           100
S2: L12, L14, L34,      Classical     640           9.54          640           9.54
    L36, L37            ITC03.ext     902           95.35         902           95.35
                        Proposed.v1
                        Proposed.v2
S3: L10, L14, L24,      Classical     640           9.54          640           9.54
    L28, L33, L39,      ITC03.ext     1459          9.54          1443          36.61
    L44                 Proposed.v1
                        Proposed.v2 °
S4: L21, L24, L27,      Classical     640           9.54          640           9.54
    L31, L33, L36,      ITC03.ext     2158          18.65         2158          18.65
    L42, L44            Proposed.v1
                        Proposed.v2 °
S5: L14, L15, L21,      Classical     640           9.54          640           9.54
    L22, L27, L29,      ITC03.ext     1459          9.54          1443          36.61
    L30, L33, L37,      Proposed.v1
    L44                 Proposed.v2 °
S6: All latches         Classical     640           9.54          640           9.54
                        ITC03.ext     1459          9.54          1459          9.54
                        Proposed.v1   1459          9.54
                        Proposed.v2   1459          9.54

Footnote markers in the table indicate entries where Theorems 5 and 6 (i.e., F1) improve the coverage
from ITC03.ext and entries where Property 2 (i.e., F2) improves the coverage from ITC03.ext.
In Table 1 we also report the reasons behind test quality improvements provided by the
proposed approaches Proposed.v1 and Proposed.v2, compared to the test quality of ITC03.ext. Note
that the number of tests may not directly quantify test application times, because the number of stages
constituting each SCUT may vary depending on the configurations used and the time borrowing
scenario.
In all scenarios, except the scenario where time borrowing occurs at none of the latches, the
proposed approach improves the robust PDF coverage significantly while sometimes applying far
fewer tests. For example, in scan chain configuration-A with time borrowing scenario S3, ITC03.ext
requires more than twice the number of tests needed for the classical approach, while obtaining the
same low robust PDF coverage (9.54%). In contrast, the proposed approaches Proposed.v1 and
Proposed.v2 achieve much higher coverage, namely 82.40% and 86.15%, respectively, while
using far fewer tests, due to Theorems 5 and 6 (i.e., F1 in Section 3.10.1) and Property 2 (i.e., F2 in
Section 3.10.1).
Even in an extreme case of time borrowing, namely scenario S6, where time borrowing occurs at
every latch, the proposed approach, under scan chain configuration-B, can achieve 100% robust PDF
coverage due to Theorems 5 and 6 (i.e., F1) by using only k+2 scan chain configurations at a level
with k latches, while ITC03.ext is unable to improve the coverage of 9.54%.
Similar improvements are observed in the pipelines using copies of MIN, C17, and T1 as
demonstrated in [11]. Also as expected, it is confirmed that Property 2 improves the robust PDF
coverage in almost all scan chain configurations where the all-scan configuration is not available.
With regard to the effect of pipeline length (the number of stages) on the test quality, our results
for the five- and ten-stage pipelines using MIN, C17, and T1 in [11] illustrate that as the number of
stages increases, the coverage decreases for the classical approach and ITC03.ext, while the coverage
for the proposed approach does not decrease significantly. This shows that the proposed approach is
more efficient for delay testing of pipelines with more stages when compared to the classical and
ITC03.ext approaches.
The experiments demonstrate that our test approach, when applied under restricted scan chain
configurations, retains most of the benefits of the fully-adaptive approach while
dramatically decreasing DFT overheads by using far fewer scan chain configurations. In the next
chapter, we propose an approach to minimize test application time.
CHAPTER 4
Test application cost minimization under maximum coverage
In Chapter 3, we focused on the optimization of robust PDF coverage and limited ourselves to
approaches where test application starts from the first stage of a pipeline and gradually
expands/moves to subsequent stages. The following example demonstrates the potential benefits of
alternative test schedules that lead to reduction of test application cost. Note that all the assumptions
made in Chapter 3 also apply in this chapter.
Consider a simple two-stage pipeline as shown in Figure 10. Let us assume that scan chain
configurations (n, n), (n, s), (s, n), and (s, s) are available for the level-1 latches (L1, L2). Hence, four
SCUTs may be constructed for testing: SCUT0 (C0 by itself), SCUT10 (C1 by itself; using configuration
(s, s) for level-1 latches), SCUT11 (C0 + C1; using configuration (n, s) for level-1 latches), SCUT12 (C0
+ C1; using configuration (s, n) for level-1 latches). According to Chapter 3, the maximum robust
PDF coverage can be obtained when SCUTs are chosen for testing based on the time borrowing status
at the level-1 latches. For example, according to Corollary 2, SCUT0 and SCUT10 must be tested if
time borrowing does not occur at L1 and L2.
In order to compare different test schedules and their test application costs, each SCUT is
viewed as a collection of multiple sets-of-paths under test (SPUTs). An SPUT is a group of all paths
that start at a particular input latch, pass through a particular set of latches (if any), and terminate at a
particular output latch. The paths in an SPUT are viewed as parts of an SCUT, which determines the
latches used and their scan chain configurations. In other words, an SPUT specifies a group of paths
as well as the scan chain configurations in use. In our notation, additional subscripts are used to
distinguish various SPUTs that constitute an SCUT. To aid the explanation in this section, the
hyphenated numbers in the subscript represent the indices of latches that are included in the SPUT, and
the number in parentheses indicates the index of the SCUT that contains the SPUT. For example,
SPUT0-2-3(12) is a group of all paths that start at L0, pass via L2, and terminate at L3, where scan chain
configurations are as specified in SCUT12.
In Figure 10, SCUT0 includes two SPUTs: an SPUT containing all paths that start at L0 and
terminate at L1 (SPUT0-1(0)) and an SPUT containing all paths that start at L0 and terminate at L2
(SPUT0-2(0)). SCUT10 includes two SPUTs: SPUT1-3(10) and SPUT2-3(10). SCUT11 contains three SPUTs:
SPUT0-1-3(11), SPUT1-3(11), and SPUT2-3(11). SCUT12 contains three SPUTs: SPUT0-2-3(12), SPUT1-3(12), and
SPUT2-3(12). All SPUTs of the circuit in Figure 10 are listed in the second column of Table 2.
Note that SCUT11 does not include the group of all paths that start at L0 and terminate at L2
(SPUT0-2(11)). In SCUT11, L2 is configured in scan mode in order to apply values for the testing of
SPUT0-1-3(11), SPUT1-3(11), and SPUT2-3(11). To capture responses for the testing of SPUT0-2(11), L2
would instead have to be reconfigured in r-capture scan-out mode. As far as the paths between L0 and
L2 are concerned, this reconfiguration simply converts SCUT11 into SCUT0. Hence,
we need not test SPUT0-2(11) when SPUT0-2(0) is used. For the same reason, SCUT12 does not include
SPUT0-1(12).
Note that the paths in SPUT1-3(11) and SPUT2-3(12) are tested using Property 1 when time
borrowing does not occur at L1 and L2, respectively. Also, note that the paths that start at L1 and
terminate at L3 can be targeted as parts of more than one SCUT (i.e., as SPUT1-3(10), SPUT1-3(11), and
SPUT1-3(12)), and the paths that start at L2 and terminate at L3 can be targeted as parts of more than one
SCUT (i.e., as SPUT2-3(10), SPUT2-3(11), and SPUT2-3(12)).
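The SCUT-to-SPUT relationship described above can be written down as a small lookup structure. The following is only an illustrative sketch: the dictionary layout and the helper names (`SCUTS`, `sput_name`, `all_sputs`, `sputs_between`) are assumptions made for this example, while the SPUT membership of each SCUT is taken from the text.

```python
# SCUTs of the Figure 10 example and the SPUTs each one contains, as listed in the text.
# Keys are SCUT indices; values are the latch-index tuples of the SPUTs in that SCUT.
# (Layout and helper names are illustrative assumptions.)
SCUTS = {
    "0": [(0, 1), (0, 2)],
    "10": [(1, 3), (2, 3)],
    "11": [(0, 1, 3), (1, 3), (2, 3)],
    "12": [(0, 2, 3), (1, 3), (2, 3)],
}


def sput_name(latches, scut):
    """Format an SPUT in the notation of the text, e.g. SPUT0-2-3(12)."""
    return "SPUT" + "-".join(str(i) for i in latches) + "(" + scut + ")"


def all_sputs():
    """List every SPUT of the example, in SCUT order."""
    return [sput_name(lat, s) for s, paths in SCUTS.items() for lat in paths]


def sputs_between(start, end):
    """SPUTs whose paths run directly from latch `start` to latch `end`."""
    return [sput_name(lat, s) for s, paths in SCUTS.items()
            for lat in paths if lat == (start, end)]


print(len(all_sputs()))     # total number of SPUTs in the example
print(sputs_between(1, 3))  # the L1-to-L3 paths are targeted by three SCUTs
```

Listing the SPUTs this way makes the two observations of the text easy to check: SPUT0-2(11) and SPUT0-1(12) are absent, and the single-segment paths of C1 appear in three different SCUTs.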
Let us assume that the SPUTs can be tested in any order, independent of the SCUTs to which
they belong. Note that the test schedule of the test generation approach described in Chapter 3 is a
special case where all SPUTs associated with each SCUT are tested one after another. The SPUT
formulation is hence test scheduling at a finer level of granularity. We will further justify the use of
SPUTs in Section 4.3.1. The primary focus of this motivating example is to introduce the fundamental
ideas of the overall optimization problem and our proposed approach.
For this example of a two-stage pipeline, assume that the maximum robust PDF coverage is 95%.
The number of tests for each SPUT is as shown in the fourth column of Table 2. In order to evaluate
the average test application cost for different test schedules, the test application cost is assumed to be
proportional to the number of tests applied. (This implicitly ignores costs that may be associated with
reconfiguring SCUTs and assumes that every test configuration requires an equal number of scan clocks.
However, the same ideas can be easily extended to a more realistic definition of test cost, which will be
discussed in Section 6.1.1.)
Three test schedules are considered in Sections 4.1.1 to 4.1.3, respectively, to show that the test
application cost may be improved further without compromising the key benefit of the proposed
approach presented in Chapters 2 and 3, namely high path delay fault coverage. Test schedule 1 in
Section 4.1.1 is used to show the limitations of a non-adaptive approach. Test schedule 2 in Section
4.1.2 is based on the test schedule implied in Chapter 3. Test schedule 3 in Section 4.1.3 is used to
show that the test application cost can be reduced from that of Test schedule 2.
Although the paths that start at L1 and terminate at L3 are included in SPUT1-3(10), SPUT1-3(11),
and SPUT1-3(12), the coverage of the paths between L1 and L3 using SPUT1-3(10) is greater than or equal
to the coverages using SPUT1-3(11) and SPUT1-3(12) as per Theorem 6 and Corollary 1. Likewise, the
coverage of the paths between L2 and L3 for SPUT2-3(10) is greater than or equal to the coverages for
SPUT2-3(11) and SPUT2-3(12). In reality, there may exist cases where applying tests to paths in
SPUT2-3(11) as a part of SCUT11 is better than applying tests to the same paths in the form of
SPUT2-3(10), i.e., as a part of SCUT10. This requires that the scan chain configurations of the circuit
include the one required by SCUT11, that the cost associated with reconfiguring the circuit as SCUT10
is high, and that the coverages from SPUT2-3(11) and SPUT2-3(10) are identical. However, since the test
application cost function is simplified in this chapter such that it is solely determined by the number
of tests applied, it is assumed that test schedules prefer SPUT1-3(10) to both SPUT1-3(11) and
SPUT1-3(12), and prefer SPUT2-3(10) to both SPUT2-3(11) and SPUT2-3(12).
Test schedules are applied to a set of chip instances characterized by Table 2, where the
personality of each chip instance (chip personality) is characterized by the results of r-r tests for the
SPUTs. (For simplicity, r-f tests are not included in this example.) It is assumed for simplicity that
every chip instance has one of the nine chip personalities, P-1 to P-9, with the different characteristics
shown in Table 2, and the percentages of chip instances that have each of these chip personalities
are shown in the third row of Table 2. Although there are ten SPUTs, each of which either fails or
passes r-r tests, not all 2^10 combinations of the results for r-r tests are possible due to dependencies
among SPUTs. For example, if both SPUT0-1(0) and SPUT1-3(10) pass r-r tests, SPUT0-1-3(11) is
guaranteed to pass r-r tests. If SPUT0-1(0) passes r-r tests and SPUT0-1-3(11) fails r-r tests, then
SPUT1-3(10) is guaranteed to fail r-r tests.
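As a small worked check of this counting argument, the sketch below enumerates all 2^10 pass/fail combinations and applies only the dependency quoted above. The function names are assumptions made for illustration, and the real set of dependencies among SPUTs is larger than this single rule, so the resulting count is an upper bound on the feasible combinations rather than the actual number.

```python
from itertools import product

SPUTS = [
    "SPUT0-1(0)", "SPUT0-2(0)", "SPUT1-3(10)", "SPUT2-3(10)",
    "SPUT0-1-3(11)", "SPUT1-3(11)", "SPUT2-3(11)",
    "SPUT0-2-3(12)", "SPUT1-3(12)", "SPUT2-3(12)",
]


def rule_forward(r):
    """Both segments pass => the combined multi-segment SPUT passes."""
    return not (r["SPUT0-1(0)"] and r["SPUT1-3(10)"]) or r["SPUT0-1-3(11)"]


def rule_backward(r):
    """First segment passes and combined SPUT fails => second segment fails."""
    return not (r["SPUT0-1(0)"] and not r["SPUT0-1-3(11)"]) or not r["SPUT1-3(10)"]


# True means "passes r-r tests", False means "fails r-r tests".
combos = [dict(zip(SPUTS, bits)) for bits in product([True, False], repeat=len(SPUTS))]

total = len(combos)
allowed = sum(rule_forward(r) for r in combos)
same_rule = all(rule_forward(r) == rule_backward(r) for r in combos)

print(total, allowed, same_rule)
```

The two sentences of the text turn out to be two phrasings of one logical constraint, which is why `same_rule` holds for every combination.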
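Table 2. Chip personalities: characteristics of chip instances based on the r-r test results, and their distribution

| SCUT | SPUTA(B)** | Latches | No. of tests | P-1 | P-2 | P-3 | P-4 | P-5 | P-6 | P-7 | P-8 | P-9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | 45%* | 10%* | 18%* | 11%* | 9%* | 3%* | 2%* | 1%* | 1%* |
| SCUT0 (C0) | SPUT0-1(0) | L0-L1 | 4 | Pass | Pass | Pass | | | Pass | Pass | | |
| SCUT0 (C0) | SPUT0-2(0) | L0-L2 | 6 | Pass | Pass | Pass | Pass | Pass | | | | |
| SCUT10 (C1); (s,s) | SPUT1-3(10) | L1-L3 | 6 | Pass | | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
| SCUT10 (C1); (s,s) | SPUT2-3(10) | L2-L3 | 8 | Pass | Pass | | Pass | Pass | Pass | Pass | Pass | Pass |
| SCUT11 (C0+C1); (n,s) | SPUT0-1-3(11) | L0-L1-L3 | 12 | Pass | | Pass | Pass | | Pass | Pass | Pass | |
| SCUT11 (C0+C1); (n,s) | SPUT1-3(11) | L1-L3 | 6 | Pass | | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
| SCUT11 (C0+C1); (n,s) | SPUT2-3(11) | L2-L3 | 8 | Pass | Pass | | Pass | Pass | Pass | Pass | Pass | Pass |
| SCUT12 (C0+C1); (s,n) | SPUT0-2-3(12) | L0-L2-L3 | 24 | Pass | Pass | | Pass | Pass | Pass | | Pass | |
| SCUT12 (C0+C1); (s,n) | SPUT1-3(12) | L1-L3 | 6 | Pass | | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
| SCUT12 (C0+C1); (s,n) | SPUT2-3(12) | L2-L3 | 8 | Pass | Pass | | Pass | Pass | Pass | Pass | Pass | Pass |
| Existence of a fault | | | | fault-free | faulty | faulty | fault-free | faulty | fault-free | faulty | fault-free | faulty |
| Maximum coverage | | | | 95% | · | · | 95% | · | 95% | · | 95% | · |
| Time borrowing site | | | | None | None | None | L1 only | L1 only | L2 only | L2 only | L1 and L2 | L1 and L2 |

*: The percentage of chip instances with the particular chip personality.
**: A lists the indices of latches included in SPUTA(B); B is the index of SCUTB that covers SPUTA(B).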
In this section, we assume that chip personality distribution is available and use it to compare
the efficiencies of different test schedules. More details on how to obtain the chip personality
distribution are presented in Section 6.1.2 and the Appendix.
According to Table 2, the chip instances that belong to P-1, P-4, P-6, and P-8 are fault-free. The
chip instances that belong to P-1, P-2, and P-3 have no time borrowing at the level-1 latches, the chip
instances that belong to P-4 and P-5 have time borrowing only at L1, the chip instances that belong to
P-6 and P-7 have time borrowing only at L2, and the chip instances that belong to P-8 and P-9 have
time borrowing at both L1 and L2.
Suppose that a test engineer designed Test schedule 1 with the expectation that time borrowing
would occur at L1 only and not at L2, where tests are applied in the non-adaptive order specified in
Figure 11.
The test procedure and results are summarized as follows. Note that Nk refers to the number of
tests for step k and Rk refers to the percentage of chips that are tested in step k (e.g., N2 and R2 for T2).
First, the tests for SPUT0-2(0) are applied to all chip instances (R1 = 100%, N1 = 6), where the chip
instances in P-6, P-7, P-8, and P-9 fail these r-r tests, i.e., time borrowing is detected at L2 for P-6, P-7,
P-8, and P-9. After that, the tests for SPUT2-3(10) are applied to all chip instances (R2 = 100%, N2 = 8),
where the chip instances in P-3 (18%) fail the r-r tests, i.e., they are identified as faulty
chips and discarded. Then, the tests for SPUT0-1-3(11) are applied to the remaining 82% of the chip
instances (R3 = 82%, N3 = 12), where the chip instances in P-2, P-5, and P-9 (20%) are identified as
faulty chips and discarded.
The overall average test application cost per chip is 23.84 (= $\sum_{k=1}^{3} N_k R_k$). This test schedule
reports the robust PDF coverage of 95% for the chips in P-1, P-4, P-6, P-7, and P-8. However, the
reported coverage is correct only for chip instances of types P-1 and P-4. Since Test schedule 1 is
unable to adaptively change the subsequent SPUTs, it fails to test the multi-segment paths via L2 for
P-6 and P-8 although time borrowing is detected by SPUT0-2(0), and hence the robust delay coverage
reported for P-6 and P-8 is invalid. Moreover, Test schedule 1 fails to identify the faulty chips of type
P-7, i.e., it results in test escapes.
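The figures quoted for Test schedule 1 can be reproduced with a short simulation. This is a sketch under stated assumptions: the data structures and the function name `run_fixed_schedule` are illustrative, and a chip is assumed to be discarded only when it fails an SPUT that terminates at L3. The percentages, test counts, and failing personalities are those given in the text.

```python
# Share of chip instances per personality, in percent (third row of Table 2).
SHARE = {"P-1": 45, "P-2": 10, "P-3": 18, "P-4": 11, "P-5": 9,
         "P-6": 3, "P-7": 2, "P-8": 1, "P-9": 1}
FAULTY = {"P-2", "P-3", "P-5", "P-7", "P-9"}

# (SPUT, number of tests, personalities failing its r-r tests, terminates at L3?)
SCHEDULE_1 = [
    ("SPUT0-2(0)", 6, {"P-6", "P-7", "P-8", "P-9"}, False),
    ("SPUT2-3(10)", 8, {"P-3"}, True),
    ("SPUT0-1-3(11)", 12, {"P-2", "P-5", "P-9"}, True),
]


def run_fixed_schedule(schedule):
    """Apply the steps in a fixed order; return (average cost, surviving personalities)."""
    alive = set(SHARE)
    cost_x100 = 0  # accumulate N_k * R_k with R_k in percent to avoid rounding noise
    for _name, n_tests, failing, ends_at_output in schedule:
        cost_x100 += n_tests * sum(SHARE[p] for p in alive)
        if ends_at_output:
            alive -= failing  # a failure at the primary output means a faulty chip
    return cost_x100 / 100, alive


cost, shipped = run_fixed_schedule(SCHEDULE_1)
escapes = sorted(shipped & FAULTY)
print(cost, sorted(shipped), escapes)
```

The surviving set contains P-7 although P-7 is faulty, which is the test escape discussed above; the non-adaptive order never reaches the multi-segment paths via L2.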
In summary, the above results indicate that it is necessary to design a test schedule such that it
does not allow any test escape and it achieves the maximum robust PDF coverage for each chip
instance. Also, the test scheduling must be capable of adjusting the subsequent SPUTs adaptively
such that all TBLs are identified and all multi-segment paths via TBLs are tested. This can be done by
testing SPUT0-2-3(12) for the chips that fail SPUT0-2(0) (P-6, P-7, P-8, and P-9).
Test schedule 2 follows what we propose in Chapter 3, where tests are performed from the first
stage of a pipeline and extended to include subsequent stages by constructing SCUTs adaptively
based on time borrowing sites identified during each SCUT testing. Hence, all SPUTs within SCUT0
are targeted first. Based on the level-1 latches identified as sites of time borrowing, multi-segment
paths and/or single-segment paths in C1 are targeted adaptively. This schedule is summarized in
Figure 12.
During T1 and T2, all time borrowing sites are identified. Accordingly, P-1, P-2, and P-3 continue
with T3, P-4 and P-5 with T3', P-6 and P-7 with T3'', and P-8 and P-9 with T3'''. At T3, the chip
instances in P-2 (10%) are identified as faulty and discarded. At T4, the chip instances in P-3 (18%)
are identified as faulty and discarded. At T4', the chip instances in P-5 (9%) are identified as faulty
and discarded. At T4'', the chip instances in P-7 (2%) are identified as faulty and discarded. At T3''',
the chip instances in P-9 (1%) are identified as faulty and discarded. The average value of the total test
application cost is computed as follows:
$$\text{Overall average test application cost} = 100\% \times \sum_{k=1}^{2} N_k + 73\% \times N_3 + 63\% \times N_4 + 20\% \times \sum_{k=3}^{4} N_{k'} + 5\% \times \sum_{k=3}^{4} N_{k''} + 2\% \times N_{3'''} + 1\% \times N_{4'''} = 25.4.$$
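The total of 25.4 can be checked numerically. The text does not state which SPUT each step of Test schedule 2 targets (that information is in Figure 12), so the assignment in the comments below is an assumption, chosen to be consistent with the test counts of Table 2, with the personalities discarded at each step, and with the reported total.

```python
# (percentage of chip instances reaching the step, number of tests applied there)
# The SPUT named in each comment is an assumed assignment, not taken from the text.
TERMS = [
    (100, 4 + 6),   # T1, T2: SPUT0-1(0) and SPUT0-2(0), applied to every chip
    (73, 6),        # T3: SPUT1-3(10) for P-1, P-2, P-3 (P-2 discarded here)
    (63, 8),        # T4: SPUT2-3(10) for the remaining P-1, P-3 (P-3 discarded)
    (20, 8 + 12),   # T3', T4': SPUT2-3(10), then SPUT0-1-3(11) for P-4, P-5
    (5, 6 + 24),    # T3'', T4'': SPUT1-3(10), then SPUT0-2-3(12) for P-6, P-7
    (2, 12),        # T3''': SPUT0-1-3(11) for P-8, P-9 (P-9 discarded here)
    (1, 24),        # T4''': SPUT0-2-3(12) for the remaining P-8
]

average_cost = sum(pct * n_tests for pct, n_tests in TERMS) / 100
print(average_cost)
```

Working in integer percentages and dividing once at the end keeps the arithmetic exact until the final step.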
As described in Chapter 3, all faulty chip instances are identified, and all fault-free chip
instances are tested robustly with the corresponding maximum coverage since all SCUTs that are
required to achieve the maximum coverage defined by Chapter 3 are tested by Test schedule 2.
Another interesting test schedule is designed to demonstrate that Test schedule 2, which is based
on Chapter 3, can be improved in terms of test application cost without compromising the robust PDF
coverage. Test schedule 3 is shown in Figure 13.
Whenever an SPUT that terminates at a level-1 latch is tested, the time borrowing status at the
output latch is checked and multiple alternatives for subsequent testing are explored as shown in
Figure 13. In this test schedule, all fault-free chip instances are tested robustly with the maximum
coverage since all SCUTs that are required to achieve the maximum coverage defined by Chapter 3
are tested in Test schedule 3. The overall average test application cost is 22.68, using calculations
similar to those in Section 4.1.2. This is a 10.7% improvement in test application cost compared to
the cost for Test schedule 2, while identical robust PDF coverage is obtained for each chip personality.
Thus, this example illustrates that we can further improve the test application cost via optimal test
scheduling.
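The quoted improvement follows directly from the two average costs reported for Test schedules 2 and 3:

```latex
\[
\frac{25.4 - 22.68}{25.4} \;=\; \frac{2.72}{25.4} \;\approx\; 0.107 \;=\; 10.7\%
\]
```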
This example also illustrates that the overall test application cost depends not only on the test
application costs of the SPUTs used in a test schedule but also on the probabilities of time borrowing at
latches. Hence, the complexity of the test scheduling problem grows with the number of logic blocks and
the number of latches.
In order to accomplish the overall optimization of DFT design and testing for latch-based high-speed
circuits with time borrowing, the minimization of test application cost must be achieved under
the constraint that the test scheduling method guarantees the maximum delay fault coverage that can
be obtained by the delay testing method proposed in Chapter 3. Next, we present a systematic
approach for the overall optimization problem under this constraint.
The test application cost minimization problem is similar to the classical test scheduling (or test
scoring) problem [20][24][31], to the extent that the order in which test vectors are applied is
important to reduce the expected value of test application cost. However, as we explain in this section,
our test application cost problem has some unique characteristics that make it more challenging than
the classical test scheduling problem. In particular, passing/failing test results have different
implications and benefits that depend on the time borrowing status at the input latch, the type of
output latch (whether it is a primary output), and dependencies among tests. Furthermore, we must
apply tests adaptively according to time borrowing status identified during testing.
In the test approach proposed in Chapter 3 and the motivating example in Section 4.1, for
simplicity of analysis it is assumed that a chip under test has either skipped r-f tests (since extreme
time borrowing was not expected) or passed all r-f tests applied (since extreme time borrowing does
not exist). However, in some chips, r-f tests may be used to identify faulty chip instances before
applying many r-r tests, which can reduce the test application cost. On the other hand, applying all r-f
tests for every SPUT may be impractically costly. Our preliminary ideas on how to incorporate r-f
tests as well as r-r tests into the overall optimization problem are considered in Section 4.2.1, where
we identify the meanings of test failure. The benefits of r-f tests as well as r-r tests are discussed in
Section 4.2.2. However, for simplicity, the rest of the problem formulation is carried out without
considering r-f tests. See Section 6.1.3 for more details regarding the extension that considers r-f tests
as well as r-r tests.
In conventional delay testing, a failing test simply identifies a faulty chip at that particular clock
frequency, which can be discarded immediately without any further testing. In contrast, in our
framework, failing a test does not necessarily mean that the chip under test is faulty. In addition, some
tests are dependent on other tests. For instance, even when r-r tests for a target path pass, the latch at
the end of the path (output latch) may still be a site of time borrowing if the latch at the
beginning of the path (input latch) is a TBL. In other words, passing r-r tests can definitively identify
the time borrowing status at the output latch only if the input latch is identified as an NTBL. Table 3
summarizes the meanings of all possible results of r-r and r-f tests for a target SPUT that starts at Lin
(input latch) and ends at Lout (output latch), under different conditions on Lin and Lout. Note that Lin is
considered an NTBL if it is a primary input.
Table 3

| Case | Result of r-r tests | Result of r-f tests | Is Lout a primary output? | Time borrowing at Lin? | Time borrowing at Lout | Meaning |
|---|---|---|---|---|---|---|
| 1 | (Fail) | Fail | No | Yes/No | Yes | Faulty chip instance |
| 2 | Fail | n/a* | Yes | Yes/No | Yes | Faulty chip instance |
| 3 | Fail | Pass | No | Yes/No | Yes | Multi-segment paths via Lout must be tested |
| 4 | Pass | (Pass) | No | No | No for the target SPUT | Target SPUT does not borrow time at Lout |
| 5 | Pass | n/a* | Yes | No | No for the target SPUT | Target SPUT is fault-free |
| 6 | Pass | (Pass) | No | Yes | Unknown | Cannot determine time borrowing status at Lout since Lin is a time borrowing latch |
| 7 | Pass | n/a* | Yes | Yes | Unknown | Cannot determine time borrowing status at Lout since Lin is a time borrowing latch |

( ): test result automatically known from the other test result.
*: r-f tests are not applicable (n/a) since Lout is a primary output.
If Lout for an SPUT is a primary output of the latch-based part (Lout may be connected to
flip-flop-based blocks), r-f tests are not necessary and hence only r-r tests are applicable (Cases 2, 5, and
7). If r-r tests for an SPUT pass, the r-f tests for the SPUT are known to pass. On the other hand, if r-f
tests for an SPUT fail, r-r tests for the SPUT are known to fail. This dependency between r-r tests and
r-f tests for any SPUT is denoted in Table 3 using parentheses. Table 3 includes all possible combinations
of the two test results and the conditions on Lin and Lout, except for the case where r-f tests for an
SPUT fail while r-r tests for the SPUT pass. This case is impossible because r-r tests for an SPUT
always fail if the corresponding r-f tests fail.
If r-f tests for an SPUT fail, the chip instance is identified as faulty regardless of time borrowing
status at Lin (Case 1). When Lout is a primary output, failing r-r tests for the SPUT indicates that the
chip instance is faulty, regardless of the time borrowing status at Lin (Case 2). In Case 3 where r-r
tests for an SPUT fail and r-f tests pass, Lout, which is not a primary output, is identified as a site of
time borrowing regardless of the time borrowing status at Lin, and hence the multi-segment paths via
Lout must be tested. In Cases 4 and 5, passing r-r tests implies that the target SPUT does not borrow
time at Lout (Case 4) and that the SPUT is fault-free (Case 5), respectively, because Lin is known to be an
NTBL. In contrast, in Cases 6 and 7, passing r-r tests cannot determine the time borrowing status at
Lout, since Lin is a TBL. Hence, we see that passing r-r tests can determine the time borrowing status at
Lout only if Lin is an NTBL.
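The case analysis of Table 3 can be expressed as a small decision function. This is a minimal sketch, not part of the proposed method: the function name `interpret` and its argument names are assumptions, and an r-f result of `None` (not applicable or not applied) is treated like a pass, in line with the simplifying assumption used in this chapter.

```python
def interpret(rr_pass, rf_pass, lout_is_primary_output, lin_borrows_time):
    """Return (case number in Table 3, meaning) for one target SPUT.

    rr_pass: True if the r-r tests pass.
    rf_pass: True/False, or None when r-f tests are n/a or were not applied.
    """
    if lout_is_primary_output:
        rf_pass = None  # r-f tests are not applicable at a primary output
    if rr_pass and rf_pass is False:
        # Excluded from Table 3: r-r tests always fail when r-f tests fail.
        raise ValueError("r-r tests cannot pass when r-f tests fail")
    if rf_pass is False:
        return 1, "faulty chip instance"
    if not rr_pass:
        if lout_is_primary_output:
            return 2, "faulty chip instance"
        return 3, "multi-segment paths via Lout must be tested"
    if lin_borrows_time:
        case = 7 if lout_is_primary_output else 6
        return case, "time borrowing status at Lout cannot be determined"
    if lout_is_primary_output:
        return 5, "target SPUT is fault-free"
    return 4, "target SPUT does not borrow time at Lout"


print(interpret(False, True, False, False))
print(interpret(True, None, True, True))
```

Note that the time borrowing status at Lin only influences the outcome when the r-r tests pass, which mirrors the observation that a failing result is interpreted regardless of Lin.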
It should be noted that a test result for an SPUT may have dependencies with test results for
other SPUTs. For instance, in Cases 4 and 5, the test result for the current SPUT can be analyzed only
if the time borrowing status at the input latch Lin is known as being an NTBL. In addition, in Case 3,
since Lout turns out to be a TBL, it is necessary to test multi-segment SPUTs that pass via Lout. Hence,
another important characteristic of the optimization problem is that it often requires adaptation in the
sense that the selection of subsequent target paths (i.e., SPUTs) is affected by the latches identified as
sites of time borrowing by the previous tests. As we have seen in Section 4.1.1, non-adaptive
operation may result in test escape and/or over- or under-estimation of robust PDF coverage. More
details of the characteristics and formulation of the overall optimization problem will be presented in
Sections 4.3 to 4.5.
When a test is applied to an SPUT, say SPUTk, we have an accumulated record of the test results
from the SPUTs that have been tested prior to SPUTk. Depending on this record, the time borrowing
status at the input latch, Lin, and the output latch, Lout, of SPUTk may or may not be known based on
the meanings of test results summarized in Table 3. In other words, the time borrowing status at Lin
prior to testing SPUTk can fall into one of the following three cases: (a) unknown (time borrowing
status is unknown), (b) non-time borrowing, or (c) time borrowing. Similarly, the time borrowing
status at Lout prior to testing SPUTk is one of the above three cases, i.e., (a), (b), or (c), if Lout is not a
primary output, or one of the two cases, namely (a) and (b), if Lout is a primary output. (Note that
when Lout is a primary output the chip under test will be discarded if Lout is identified as a TBL, i.e., if
any r-r test fails at Lout.)
Application of a test to an SPUT provides different benefits depending on the accumulated
record of the results of prior tests. First, suppose r-r tests are applied to SPUTk. If the time borrowing
status at Lout is already known as being either time borrowing or non-time borrowing prior to testing
SPUTk, applying r-r tests to SPUTk does not provide any benefit in terms of fault coverage,
knowledge of time borrowing status, or information used by subsequent tests of other SPUTs,
regardless of the time borrowing status at Lin. Also, when the time borrowing status at Lout is unknown
and Lin is known as being time borrowing, passing of r-r tests for SPUTk at Lout provides no benefit
(Cases 6 and 7 of Table 3). On the other hand, in a case where SPUTk fails r-r tests under the
condition that the time borrowing status at Lout is unknown, the benefit when Lout is not a primary
output is that Lout is identified as a TBL (Case 3 of Table 3), and the benefit when Lout is a primary
output is that the chip instance is identified as having a delay fault and is discarded (Case 2 of Table
3). In the case where SPUTk passes r-r tests under the condition that the time borrowing status at Lout is
unknown and Lin is known to be non-time borrowing (Cases 4 and 5
of Table 3), the benefit of the r-r tests of SPUTk is that this result may enhance the coverage if SPUTk and
other SPUTs terminating at Lout collectively and eventually identify Lout as being an NTBL. However,
if some other SPUT that terminates at Lout fails, this test result of SPUTk will not be used to enhance
the coverage, since Lout is eventually identified as being a TBL.
Second, suppose r-f tests are applied to SPUTk, whose output latch Lout is not a primary output.
There is no benefit of such r-f tests if Lout is known to be an NTBL, since these tests will always pass.
Only when SPUTk fails r-f tests and the time borrowing status at Lout is either unknown or time
borrowing do r-f tests provide the benefit that the chip instance is identified as being faulty and
discarded, regardless of the time borrowing status at Lin (Case 1 of Table 3).
The benefits of r-r tests and r-f tests are summarized in Table 4 and Table 5, respectively.
Table 4

| Case | Time borrowing status at Lin known from previous test results | Time borrowing status at Lout known from previous test results | Results for r-r tests applied | Benefit |
|---|---|---|---|---|
| 1 | Time borrowing / non-time borrowing / status unknown | Time borrowing / non-time borrowing | Pass/fail | None |
| 2 | Time borrowing | Unknown | Pass | None |
| 3 | Non-time borrowing | Unknown | Pass | May enhance coverage if this result and other results of SPUTs terminating at Lout collectively identify Lout as being a non-time borrowing latch. |
| 4 | Time borrowing / non-time borrowing / status unknown | Unknown | Fail | 1. Time borrowing detected (if Lout is not a primary output); 2. Fault detected (if Lout is a primary output) |
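Table 5

| Case | Time borrowing status at Lin known from previous test results | Time borrowing status at Lout known from previous test results | Results for r-f tests applied | Benefit |
|---|---|---|---|---|
| 1 | Time borrowing / non-time borrowing / status unknown | Non-time borrowing | (Always pass) | None |
| 2 | Time borrowing / non-time borrowing / status unknown | Time borrowing / status unknown | Pass | None |
| 3 | Time borrowing / non-time borrowing / status unknown | Time borrowing / status unknown | Fail | Fault detected |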
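The two benefit tables can be read as lookup functions over the accumulated record of time borrowing status. The following sketch is illustrative only: the `Status` enum and the functions `rr_benefit` and `rf_benefit` are assumed names, and the combination that Table 4 does not list (both Lin and Lout unknown, r-r tests pass) is reported as such rather than guessed.

```python
from enum import Enum


class Status(Enum):
    UNKNOWN = "status unknown"
    NON_BORROWING = "non-time borrowing"
    BORROWING = "time borrowing"


def rr_benefit(lin, lout, rr_pass, lout_is_primary_output=False):
    """Benefit of applying r-r tests to an SPUT, following Table 4."""
    if lout is not Status.UNKNOWN:
        return "none"                       # Case 1: status at Lout already known
    if not rr_pass:                         # Case 4: failure with Lout unknown
        if lout_is_primary_output:
            return "fault detected"
        return "time borrowing detected"
    if lin is Status.BORROWING:
        return "none"                       # Case 2
    if lin is Status.NON_BORROWING:
        return "may enhance coverage"       # Case 3
    return "not listed in Table 4"          # Lin and Lout both unknown, tests pass


def rf_benefit(lout, rf_pass):
    """Benefit of applying r-f tests to an SPUT, following Table 5."""
    if lout is Status.NON_BORROWING:
        return "none"                       # Case 1: such tests always pass
    return "none" if rf_pass else "fault detected"  # Cases 2 and 3


print(rr_benefit(Status.NON_BORROWING, Status.UNKNOWN, True))
print(rf_benefit(Status.UNKNOWN, False))
```

A scheduler could use such functions to skip tests whose benefit is known in advance to be "none", which is one way the accumulated record reduces test application cost.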
In summary, our problem is significantly more complex than the conventional test scheduling
problem, since passing or failing r-r tests for an SPUT have different meanings depending on the time
borrowing status at the input latch, the type of output latch (whether it is a primary output of a latch-based
part of a circuit or not), and results of r-r tests for other SPUTs (i.e., dependencies among
SPUTs). Scheduling r-f tests is somewhat similar to the conventional test scheduling problem in the
sense that a faulty chip is identified and discarded if r-f tests for an SPUT fail at any stage of testing.
However, passing r-f tests for an SPUT does not provide any coverage for the SPUT, which makes even
r-f tests different from conventional testing.
In Section 4.1, the notion of SPUT (set-of-paths under test) is first introduced to replace SCUT.
Recall that an SPUT is the group of all paths that start at a particular input latch Lin, pass via a
particular sequence of latches (if any), and terminate at a particular output latch Lout. Hence, each
SCUT is viewed as a collection of multiple SPUTs. We call this finer-grained approach an SPUT-based
approach. Test scheduling at such a finer granularity (i.e., SPUT) can reduce the overall test
application cost. For example, suppose there are eight SPUTs terminating at a latch L. If a majority of
the chips under test borrow time at L and this can be detected by testing one particular SPUT, then the
test application cost may be reduced by testing this particular SPUT first.
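The eight-SPUT example can be made concrete with a short expected-cost calculation. All numbers below are hypothetical (five tests per SPUT, 70% of chips borrowing time at L, detection by the SPUT labelled `S7`), and the function name `expected_tests` is an assumption. The sketch further assumes that testing of the paths terminating at L stops as soon as time borrowing is detected there, and it ignores any follow-up multi-segment tests.

```python
def expected_tests(order, n_tests, detecting_sput, borrow_pct):
    """Expected number of tests applied at latch L per chip.

    Chips that borrow time at L stop once `detecting_sput` has been tested;
    all other chips go through every SPUT in `order`.
    """
    total = sum(n_tests[s] for s in order)
    until_detection = 0
    for s in order:
        until_detection += n_tests[s]
        if s == detecting_sput:
            break
    return (borrow_pct * until_detection + (100 - borrow_pct) * total) / 100


# Hypothetical data: eight SPUTs terminating at L, five tests each.
sputs = [f"S{i}" for i in range(8)]
n_tests = {s: 5 for s in sputs}

detect_first = expected_tests(["S7"] + sputs[:7], n_tests, "S7", 70)
detect_last = expected_tests(sputs, n_tests, "S7", 70)
print(detect_first, detect_last)
```

Under these assumed numbers, moving the detecting SPUT to the front reduces the expected number of tests at L from 40 to 15.5 per chip, which is the kind of saving that SCUT-level scheduling cannot express.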
Test scheduling may be performed at an even finer granularity (i.e., path) than SPUT. Such a
path-based approach must be supported by diagnostic methods to identify the path(s) that cause the
observed test failure. Accordingly, Observations 1, 2, and 3 regarding identification of TBLs and
NTBLs in Section 3.2.3 must be modified as follows in the context of paths in order to implement a
path-based approach.
In a path-based approach, a latch L is identified as a TBL for the path(s) that cause failure of
tests at L (Modified Observation 2 for a path-based approach). If a latch L is identified as a TBL for a
path p, all multi-segment paths that pass via L and cover p must be targeted (Modified Observation 3
for a path-based approach). L is identified as an NTBL for all other paths that do not cause failure of
tests at L (Modified Observation 1 for a path-based approach).
In other words, L can be regarded either as an NTBL or as a TBL depending on the path under
consideration. This is likely to reduce the number of target multi-segment paths. However, it also
implies that all paths in the fan-in of L must be tested even after a test fails for path p in order to
identify other paths in the fan-in of L that pass. This is likely to increase the number of tests,
especially when most paths in the fan-in of L fail the tests. Recall that in an SPUT-based (or SCUT-based)
approach, a latch is regarded as a TBL as soon as one path fails at the latch, and no further tests are
needed for the paths that terminate at the latch. Consequently, a path-based approach does not
necessarily guarantee a reduction of test application cost.
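The trade-off between the two granularities can be illustrated with a toy count. This sketch assumes one test per path in the fan-in of L and a hypothetical list of pass/fail outcomes; the function names are illustrative. It counts the tests applied at L and the number of fan-in paths whose multi-segment extensions via L must subsequently be targeted.

```python
def sput_based(outcomes):
    """Stop at the first failing path; L then counts as a TBL for every path.

    Returns (tests applied at L, fan-in paths needing multi-segment testing).
    """
    for applied, passed in enumerate(outcomes, start=1):
        if not passed:
            return applied, len(outcomes)
    return len(outcomes), 0


def path_based(outcomes):
    """Test every path; L is a TBL only for the paths that fail.

    Returns (tests applied at L, fan-in paths needing multi-segment testing).
    """
    failing = sum(1 for passed in outcomes if not passed)
    return len(outcomes), failing


# Hypothetical outcomes for six fan-in paths of L (True = test passes).
outcomes = [True, False, True, True, False, True]
print(sput_based(outcomes))
print(path_based(outcomes))
```

With these assumed outcomes the path-based variant applies more tests at L but leaves fewer multi-segment paths to target, so which approach is cheaper overall depends on the relative cost of the two kinds of tests.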
Implementing a path-based approach entails considerable complication in the test generation and the
test procedure. For every test, a path-based approach must be supported by diagnostic methods that
identify the paths that cause the observed test failure. Such diagnosis would add impractically high
run-time
| Title | Structural delay testing of latch-based high-speed circuits with time borrowing |
| Author | Chung, Kun Young |
| Author email | kun.chung@gmail.com; kunchung@poisson.usc.edu |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Electrical Engineering |
| School | Viterbi School of Engineering |
| Date defended/completed | 2008-04-25 |
| Date submitted | 2008 |
| Restricted until | Unrestricted |
| Date published | 2008-08-03 |
| Advisor (committee chair) | Gupta, Sandeep K. |
| Advisor (committee member) | Pedram, Massoud; Medvidovic, Nenad |
| Abstract | Latch-based circuits are used in full custom designed high-speed chips, especially to implement some delay critical parts due to two benefits: higher performance and higher yield at desired performance. However, the unavailability of a delay test methodology that provides sufficiently high coverage has hindered their widespread use.; In this dissertation, we show that the conventional delay testing approaches cannot be used for delay testing of latch-based circuits with time borrowing, and show that it is necessary to use design-for-test (DFT). We first focus on maximizing path delay fault coverage and propose the first path delay testing approach and the associated DFT for such circuits. We prove that our latch-based delay testing approach provides the theoretical maximum coverage (for any scan-based approach). We also prove that this coverage is always greater than (or equal to) that for the latch-based circuit’s flip-flop-based counterpart. Secondly, we focus on minimizing test application cost for delay testing latch-based circuits under the constraint that maximum coverage is achieved. We show that conventional test scheduling methods may not be applicable due to the unique characteristics of latch-based circuits with time borrowing. We then formulate the minimization problem and propose a deterministic and two heuristic approaches for test scheduling of such circuits.; The experimental results show that, for many example circuits, the proposed approaches achieve dramatically higher coverage of path delay faults compared to classical approach, and achieve test application costs that are within 5% of the corresponding lower-bounds.; We then compare high-speed latch-based circuits with their flip-flop-based counterparts from the viewpoint of path delay testing and present design guidelines for latch-based circuits that guarantee that latch-based circuits also achieve higher yield and higher performance than their flip-flop-based counterparts. |
| Keyword | delay testing; time borrowing; DFT; ATPG; test scheduling; latch-based circuit |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Provenance | Electronically uploaded by the author |
| Type | texts |
| Legacy record ID | usctheses-m1527 |
| Rights | Chung, Kun Young |
| Repository name | Libraries, University of Southern California |
| Repository address | Los Angeles, California |
| Repository email | http://www.usc.edu/isd/libraries/services/ask_a_librarian/email/ |
| Full text | STRUCTURAL DELAY TESTING OF LATCH-BASED HIGH-SPEED CIRCUITS WITH TIME BORROWING
by Kun Young Chung
A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)
August 2008
Copyright 2008 Kun Young Chung

Table of Contents
List of Tables
List of Figures
Abstract
CHAPTER 1. Introduction
1.1. Operation of latch-based circuits
1.2. Reference times and nominal delays
1.3. Types of time borrowing and their advantages
1.4. Design choice between flip-flop-based and latch-based circuits
1.5. Testing of flip-flop-based versus latch-based circuits
CHAPTER 2. Background – Challenges in delay testing of latch-based circuits
2.1. Key challenges in delay testing of latch-based circuits with time borrowing
2.2. Benefits of scan design-for-testability
2.3. The overall optimization problem
CHAPTER 3. A new delay testing approach
3.1. Basic assumptions
3.2. Key ideas behind the proposed approach
3.2.1. r-r test: A set of sufficient conditions on block delays
3.2.2. r-f test: A set of necessary conditions on block delays
3.2.3. Time borrowing detection
3.2.4. Basic delay testing strategies
3.3. Test of the first logic block (SCUT0)
3.4. Configuring on-path latches
3.4.1. On-path latch with time borrowing
3.4.2. On-path latch with no time borrowing
3.5. Configuring off-path latches
3.6. A set of scan chain configurations necessary to maximize coverage
3.7. Required tests for maximum coverage
3.8. Test generation under the optimal set of scan chain configurations
3.8.1. Theoretical maximum delay fault coverage for latch-based circuits
3.8.2. Proposed test generation approach
3.9. Test generation under limited scan chain configurations
3.9.1. Available scan chain configurations
3.9.2. Test generation strategy
3.9.3. Proposed test generation approach
3.10. Experimental results and comparison
3.10.1. Test generation approaches
3.10.2. Trends in the experimental results
CHAPTER 4. Test application cost minimization under maximum coverage
4.1. Motivation example
4.1.1. Test schedule 1
4.1.2. Test schedule 2
4.1.3. Test schedule 3
4.1.4. The overall optimization problem
4.2. Unique characteristics of the optimization problem
4.2.1. Meaning of test results – Dependencies among SPUTs
4.2.2. Benefits of r-r tests and r-f tests
4.3. Framework for test scheduling to minimize test application cost
4.3.1. An SPUT-based approach
4.3.2. Search space
4.4. Building a search tree
4.4.1. Reduction rules
4.4.2. Covering test sequences in a test schedule
4.5. A deterministic optimization approach
4.6. The complexity of the optimization problem
4.7. Proposed heuristic approaches
4.7.1. Key ideas and overview of the proposed heuristics
4.7.2. Heuristic 1 (H1): Relative benefit function
4.7.3. Heuristic 2 (H2): Near-lower-bound function
4.7.4. Experimental results
4.7.5. Analysis of the heuristic approach
CHAPTER 5. Flip-flop-based vs. latch-based designs
5.1. Flip-flop-based counterpart of latch-based circuit
5.2. Performance comparison
5.3. Yield comparison
5.4. Delay fault coverage comparison
5.5. A summary of comparison results
CHAPTER 6. Future research tasks
6.1. The overall test optimization
6.1.1. Realistic test application cost
6.1.2. Chip personality distribution
6.1.3. Inclusion of r-f tests
6.2. Other delay testing approaches
6.3. Scan DFT design and control
CHAPTER 7. Conclusion
References
Appendix: Chip personality distribution from statistical timing information

List of Tables
Table 1. Five-stage pipeline of array multiplier.
Table 2. The characteristics of chip instances under test based on the results for r-r tests.
Table 3. The implication of test results for a target SPUT from Lin to Lout.
Table 4. A summary of benefit of r-r tests.
Table 5. A summary of benefit of r-f tests.
Table 6. A chip personality distribution for Figure 17.
Table 7. Relative benefit function example.
Table 8. Design of chip personality distribution.
Table 9. Test application cost comparisons of proposed approaches.
Table 10. Contributions to overhead of H1 and H2.

List of Figures
Figure 1. An example latch-based linear pipeline.
Figure 2. Additional components for scan flip-flop and scan latch.
Figure 3. A three-stage linear pipeline.
Figure 4. A two-stage linear pipeline.
Figure 5. Relationships among scan chain configurations.
Figure 6. Property 2 helps improve robust coverage.
Figure 7. A robust test for a multi-segment path can be obtained by combining the robust tests of its single-segment subpaths.
Figure 8. Proposed ATPG algorithm.
Figure 9. Test procedure – managing multiple SCUTs.
Figure 10. A two-stage pipeline example.
Figure 11. Test schedule 1: Average cost = 23.84.
Figure 12. Test schedule 2: Average cost = 25.4.
Figure 13. Test schedule 3: Average cost = 22.68.
Figure 14. A generic search tree for optimal test scheduling.
Figure 15. An example of the cost function computation.
Figure 16. Overall approach that uses proposed heuristics.
Figure 17. Test scheduling illustration.
Figure 18. Test application cost comparisons of proposed approaches.
Figure 19. Percentage over the lower-bound.
Figure 20. Design and test schemes for high-speed circuit.
Figure 21. Timing requirements for the two designs.
Figure 22. Yield comparison (T = TFF).
Figure 23. Statistical delay distribution across a latch and the probability of time borrowing.
Figure 24. An example reconvergence fan-out.

Abstract

Latch-based circuits are used in full custom designed high-speed chips, especially to implement some delay critical parts due to two benefits: higher performance and higher yield at desired performance. However, the unavailability of a delay test methodology that provides sufficiently high coverage has hindered their widespread use. In this dissertation, we show that the conventional delay testing approaches cannot be used for delay testing of latch-based circuits with time borrowing, and show that it is necessary to use design-for-test (DFT). We first focus on maximizing path delay fault coverage and propose the first path delay testing approach and the associated DFT for such circuits. We prove that our latch-based delay testing approach provides the theoretical maximum coverage (for any scan-based approach).
We also prove that this coverage is always greater than (or equal to) that for the latch-based circuit’s flip-flop-based counterpart. Secondly, we focus on minimizing test application cost for delay testing latch-based circuits under the constraint that maximum coverage is achieved. We show that conventional test scheduling methods may not be applicable due to the unique characteristics of latch-based circuits with time borrowing. We then formulate the minimization problem and propose one deterministic and two heuristic approaches for test scheduling of such circuits. The experimental results show that, for many example circuits, the proposed approaches achieve dramatically higher coverage of path delay faults compared to the classical approach, and achieve test application costs that are within 5% of the corresponding lower bounds. We then compare high-speed latch-based circuits with their flip-flop-based counterparts from the viewpoint of path delay testing and present design guidelines for latch-based circuits that guarantee that latch-based circuits also achieve higher yield and higher performance than their flip-flop-based counterparts.

CHAPTER 1. Introduction

Pipelining of combinational logic is widely used in many parts of a circuit to improve performance, where either flip-flops or latches are used. One of the advantages of flip-flop-based pipelines over latch-based pipelines is that flip-flop-based pipelines are easier to design and better supported by CAD tools. To ensure correct propagation of signals across flip-flop-based pipelines, the delay of each stage (or logic block) must be smaller than the clock period. As technology advances, this timing requirement is becoming increasingly difficult to satisfy, especially in high-speed parts of circuits, since timing is more significantly affected by process variations and/or defects in fabrication.
Hence, latch-based pipelining is used in many high-speed custom-designed circuits, since it enhances performance and improves yield via time borrowing, which relaxes the timing requirement for each logic block. When time borrowing occurs, a block may take longer than its nominal allocation before completing its computation and providing the result to the next block.

[Figure 1. An example latch-based linear pipeline.]

A simplified latch-based high-speed pipeline can be modeled as shown in Figure 1, in which each Ci is a combinational logic block and each Lj is a latch. Every latch is assumed to be a positive D-latch, i.e., it becomes transparent when the corresponding clock is high. For simplicity of description, we describe the approach assuming that complementary clocks, φ and φ̄, are used. However, our approach is applicable to any type of clocks, including two-phase non-overlapping, four-phase non-overlapping, and four-phase overlapping [17]. To simplify the discussion, we assume that latches are ideal, i.e., all their delays as well as their setup and hold times are zero. However, our approach for test development inherently takes into account the actual setup time, hold time, and clock-to-Q delay of every latch. The characteristics of real latches are also explicitly considered during the detailed design of design-for-testability (DFT) circuitry.

Let us assume that C0 is the first combinational logic block of a high-speed pipeline. We consider the inputs to the latches driving C0 as primary inputs and assume that a new combination of values is applied at the inputs of block C0 at the rising edge of its driving clock, i.e., the clock controlling the latches at its inputs. In Figure 1, this is the rising edge of clock φ at time t1. The values at the outputs of C0 must stabilize some time before the subsequent falling edge of the block’s receiving clock, i.e., the clock controlling the latches at its outputs.
For C0 in Figure 1, this is the subsequent falling edge of clock φ̄ at time t4. If the values at the outputs of C0 do not stabilize by this time, correct values cannot propagate via the latches at the outputs of the block to the inputs of the next block, C1, and a delay fault exists at the given clock frequency. On the other hand, if the values at the outputs of C0 stabilize before the subsequent rising edge of its receiving clock, i.e., the subsequent rising edge of φ̄ at t2, then any change in values is passed to the inputs of the next block, C1, only after the rising edge of the block’s receiving clock, φ̄. If the values at the outputs of C0 stabilize after the subsequent rising edge of its receiving clock, φ̄, but before the clock’s falling edge, then any change in values passes immediately via the latches, and thus to the inputs of the next block. Hence, a new combination of values may be applied at the inputs of C1 as early as the subsequent rising edge of its driving clock, φ̄, or as late as this clock’s subsequent falling edge. The values at the outputs of block C1 must stabilize some time before the subsequent falling edge of its receiving clock, φ, and so on.

Hence, unlike in a flip-flop-based circuit, in a correctly functioning latch-based circuit the values at the inputs of a block need not be applied at a specific time. Nor do the corresponding response values need to become available at its outputs at a specific time. This allows a block to borrow time from others in its fan-in/fan-out.

Without any loss of generality, we define a reference time as the earliest time at which new values may be applied at the inputs of a block, namely the rising edge of the block’s driving clock. The corresponding reference time for the responses at the outputs of a block is the rising edge of the block’s receiving clock.
In general, the reference times at the inputs and the outputs of a block are the clock edges at which the latches at its inputs and outputs, respectively, become transparent. The nominal delay for a block is defined as the time difference between the reference times at its inputs and outputs. For each block in Figure 1, the nominal delay is half of the clock period, i.e., T/2, where T is the period of the clock. If transitions at the inputs and outputs of each block of a circuit satisfy these reference times, the circuit will operate at the desired clock frequency. Such a circuit can be viewed as a nominal circuit where no time is being borrowed by any block.

Now consider a scenario where the values at the inputs of C1 do not arrive before t2, the rising edge of φ̄, and instead arrive at t3 as shown in Figure 1. In this case, we say that C0 is borrowing time from C1. The time duration t3 – t2 in Figure 1 denotes the amount of time borrowed. Similarly, if the outputs of C1 do not stabilize before the rising edge of φ (i.e., t4) but do so shortly thereafter, then C1 is said to be borrowing time from the block in its fan-out. Under our definition of the reference times, C1 may borrow time from blocks in its fan-out to accommodate its own large delay and/or to compensate for the time it lent to C0.

Time borrowing may be intentional in the sense that it may be planned during the design of a circuit. Even when time borrowing is not planned during the design of a circuit, it may occur unintentionally if variations and/or defects during fabrication cause such borrowing in some fabricated copies of the circuit. Note that even when time borrowing occurs, unintentionally or intentionally, the circuit is fault-free at the given clock frequency provided that the values at the outputs of every time-borrowing logic block stabilize before the subsequent falling edge of the block’s receiving clock.
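
The timing rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dissertation's method: it assumes a linear pipeline with complementary clocks and ideal latches, so that block i has its input reference time at i·T/2, its output latches become transparent at (i+1)·T/2, and they close half a period later. The function name `propagate` and the example delays are hypothetical.

```python
def propagate(block_delays, T):
    """Propagate worst-case arrival times through a linear latch pipeline.

    Illustrative assumptions: complementary clocks, ideal latches, and
    block i has its input reference time at i * T / 2.
    Returns (fault_free, borrowed), where borrowed[i] is the amount of
    time block i borrows from the block in its fan-out.
    """
    half = T / 2
    departure = 0.0                    # values applied to block 0 at t = 0
    borrowed = []
    for i, delay in enumerate(block_delays):
        ref_out = (i + 1) * half       # output latches become transparent
        deadline = ref_out + half      # output latches close (falling edge)
        arrival = departure + delay    # outputs of block i stabilize
        if arrival > deadline:
            return False, borrowed     # delay fault at this clock period
        borrowed.append(max(0.0, arrival - ref_out))
        # A late value passes a transparent latch immediately; an early
        # value waits for the rising edge of the receiving clock.
        departure = max(ref_out, arrival)
    return True, borrowed


print(propagate([6, 4], T=10))    # (True, [1.0, 0.0]): block 0 borrows 1 unit
print(propagate([11, 1], T=10))   # (False, []): block 0 misses the falling edge
```

In the first call, block 0 exceeds its nominal delay of T/2 = 5 by one time unit, and block 1 absorbs the borrowed time, so the pipeline is fault-free at this clock period.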
Hence, latch-based circuits can enhance performance (i.e., increase clock frequency) by enabling time borrowing, and improve yield via unintentional time borrowing.

Flip-flop-based pipelines are easy to design and verify using an extensive set of available tools for synthesis and verification. Hence, most ASIC designers prefer flip-flop-based pipelines. On the other hand, latch-based pipelines are more difficult to design and verify because ensuring correct timing behavior is harder and tool support is limited [8]. However, latch-based pipelines are used in full-custom high-speed chips, especially in some of their delay-critical parts, due to the abovementioned major benefits, namely, higher performance and higher yield at the desired performance [8].

In a latch-based pipeline, if time borrowing is planned during design (intentional time borrowing), the pipeline can operate at a higher clock frequency, because it is not necessary to carefully balance the delays of combinational logic blocks to increase clock frequency, and because latches are immune to clock skew to some degree. On the other hand, flip-flop-based pipelines must carefully balance the delays of logic blocks in order to achieve high performance, since flip-flops present hard time boundaries between pipeline stages where no time borrowing is permitted. Furthermore, clock skew must be budgeted for in the clock period for flip-flop-based circuits. While retiming can balance flip-flop-based pipeline stages to reduce the clock period, in some cases this is not possible. For example, a cache memory access takes a substantial portion of the clock period, limiting the clock period of the pipeline stage. Additional logic is also needed for tag comparison and data alignment [18]. With flip-flops, the only method for increasing the speed may be to give the cache access an additional pipeline stage to complete if it is the critical path limiting the clock period [18]. Experimental results in [8] show that latch-based designs are 5–19% faster than the corresponding flip-flop-based designs, for a small increase in area.

Unintentional time borrowing is not planned during design but occurs in some fabricated copies of a design due to delay variations and/or (minor) defects during fabrication. If it occurs in a latch-based circuit, such a fabricated copy may still operate at the desired clock frequency when the amount of time borrowed is accommodated by subsequent block(s). In this manner, unintentional time borrowing increases yield at the desired clock frequency when an appropriate delay testing approach is used. On the other hand, a flip-flop-based pipeline without sufficient timing margin will malfunction at the desired clock period when variations and defects cause similar extra delay in the circuit, leading to a reduction in yield.

In summary, compared to flip-flop-based design, latch-based design enhances performance by enabling intentional time borrowing and improves yield by allowing unintentional time borrowing. Importantly, as clock distribution is becoming increasingly difficult, the abovementioned performance benefits are growing. For the above reasons, latch-based pipelines are used in full-custom high-speed circuits, especially in highly delay-critical parts of a circuit. However, we can realize these two advantages only if latch-based design is carried out to obtain high-speed implementations and an appropriate delay test methodology is used. Otherwise, flip-flop-based pipelines would prevail due to the ease of design, verification, and test.

The approaches of static testing, such as stuck-at fault testing, are similar for both latch-based and flip-flop-based circuits.
The same automatic test pattern generator (ATPG) can be used to generate tests for both architectures, since tests are generated based on the circuit structure without considering the timing of circuits. The only difference is that latch-based pipelines entail higher cost when scan DFT is used to target faults in each block individually, because replacing a latch with a scan latch requires two additional latches in general, whereas replacing a flip-flop with a scan flip-flop requires an additional multiplexer as shown in Figure 2 [23]. Hence, in this research we focus on delay testing (path delay testing) of flip-flop-based and latch-based circuits.

In delay testing of flip-flop-based circuits, typically a divide-and-conquer approach using scan DFT is used where each logic block is tested individually using scan. As soon as an erroneous response is captured at any flip-flop in the circuit, the chip is identified as being faulty at the given clock frequency. Such a chip is either discarded or “binned” to be sold at a slower rated clock frequency. Also, since such an approach separately targets paths within individual blocks, it targets shorter paths. Hence, higher delay fault coverage is obtained using a smaller number of tests, compared to the approach without scan DFT.

On the other hand, latch-based pipelines with time borrowing pose new challenges in delay testing. Unlike flip-flop-based pipelines, where each block of logic can be tested (for delay faults) separately using scan DFT, none of the existing DFT techniques can be used for delay testing of latch-based circuits with time borrowing, since time borrowing makes it necessary to target multi-segment paths (a.k.a. multi-block paths), i.e., paths that span multiple blocks.

If the same divide-and-conquer approach is used for delay testing of latch-based pipelines, each logic block will be tested independently using scan DFT.
Using the nominal delay for each block, this allows a maximum delay of T/2 (T is the clock period) for each logic block, if complementary clocks are used as shown in Figure 1. The delay fault coverage we compute from this approach is high in most cases since short paths are tested using scan. However, any fabricated copy of the chip that has even one intentional/unintentional time borrowing site will fail the tests applied by such a divide-and-conquer approach and be discarded. Since we are considering high-speed applications of latch-based circuits, the circuit is likely to contain one or more time borrowing sites. Whenever this is the case, such a divide-and-conquer approach will lead to zero yield for latch-based pipelines. In other words, circuit designers cannot use intentional time borrowing, which forfeits the performance benefits of latch-based designs. Also, there will be no yield benefit of using latch-based designs because unintentional time borrowing is not allowed. Consequently, a simple divide-and-conquer delay testing approach is not appropriate for delay testing of high-speed latch-based circuits, as it erodes much of their benefit.

The objective of this research is to propose an optimal scan-based path delay testing approach for latch-based circuits with time borrowing by optimizing the robust path delay fault (PDF) coverage as well as the test application cost. We also suggest guidelines for latch-based designs such that the performance and yield benefits of latch-based circuits are guaranteed under the optimal delay fault coverage.

In Chapter 2, the key challenges in delay testing of latch-based circuits are discussed. This motivates the need for developing new DFT designs and new approaches for DFT-based delay testing.

In Chapter 3, we focus on maximizing delay fault coverage and present a new path delay testing approach that requires a very small number of scan chain configurations, while guaranteeing maximum robust PDF coverage.
This decreases the overheads due to DFT significantly. We then prove that the proposed delay testing approach for latch-based circuits always achieves the theoretical maximum PDF coverage regardless of the length of the multi-segment paths targeted. Furthermore, we propose a new test generation approach that works for any time borrowing scenario, even for cases where only a fraction of the abovementioned scan chain configurations (or some other set of configurations) is available. This is especially useful since it allows us to avoid scan chain configurations that significantly degrade circuit performance.

In Chapter 4, we propose a new test scheduling approach for latch-based circuits to minimize the test application cost while achieving the maximum coverage that the test generation method presented in Chapter 3 can achieve. First, we show that conventional test scheduling approaches may not be applicable due to the unique characteristics of latch-based circuits with time borrowing. We then present our new formulation of the test cost minimization problem for path delay testing of latch-based circuits, and present a deterministic approach as well as two heuristic approaches.

In Chapter 5, we compare high-speed latch-based circuits with their flip-flop-based counterpart designs from the viewpoint of path delay testing, and propose design guidelines for latch-based high-speed circuits that guarantee that a latch-based circuit achieves higher performance and higher yield than its flip-flop-based counterpart. We also prove that a latch-based circuit tested using the delay testing approach proposed in Chapter 3 obtains the maximum path delay fault coverage, which is always greater than (or equal to) the coverage of the corresponding flip-flop-based circuit.

In Chapter 6, we discuss future research tasks and subjects, such as other delay testing approaches and the hardware design and control issues related to the scan DFT.
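
The zero-yield argument made earlier in this chapter against the divide-and-conquer approach can be illustrated with a small Python sketch. It is a simplified model under stated assumptions (linear pipeline, complementary clocks, ideal latches, nominal delay T/2 per block); the function names and the example delays are hypothetical and are not taken from the dissertation.

```python
def passes_block_by_block(block_delays, T):
    """Divide-and-conquer check: every block must meet its nominal delay T/2."""
    return all(delay <= T / 2 for delay in block_delays)


def passes_with_time_borrowing(block_delays, T):
    """Pipeline-level check in which a late value may pass a transparent latch."""
    half = T / 2
    departure = 0.0                       # values applied to block 0 at t = 0
    for i, delay in enumerate(block_delays):
        ref_out = (i + 1) * half          # receiving latches become transparent
        arrival = departure + delay
        if arrival > ref_out + half:      # receiving latches have already closed
            return False
        departure = max(ref_out, arrival)
    return True


chip = [6, 4]   # hypothetical block delays; block 0 borrows 1 unit from block 1
print(passes_block_by_block(chip, T=10))        # False: rejected block by block
print(passes_with_time_borrowing(chip, T=10))   # True: functional at period T
```

A chip instance with even one time borrowing site fails the first check although it operates correctly at the given clock period, which is the situation the proposed approach is meant to avoid.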
CHAPTER 2
Background – Challenges in delay testing of latch-based circuits

In a flip-flop-based circuit, the transition at the output of a path in one block is latched into the corresponding flip-flop at a specific clock edge before it begins to propagate via a path in the next block. Hence, if the delay of the path in the first block is excessive, the transition misses the clock edge and cannot be seen by the path in the next block. Also, if the delay of the path in the first block is short, the transition at the input of a path in the next block still starts after the appropriate clock edge. Thus, in delay testing of flip-flop-based circuits, PDFs in one combinational logic block can be treated independently of the PDFs in the adjacent blocks.

In latch-based circuits, in contrast, test application at the inputs of a block may not always occur at the rising edge of the driving clock, due to time borrowing. If time borrowing never occurs at a particular latch, transitions are always applied at the rising edge of the driving clock. However, if time borrowing occurs at a latch, the exact time at which a transition is applied to the next block depends on the amount of time borrowing. Hence, for delay testing of latch-based circuits, we first need to know which latches are sites of time borrowing. Latches that are sites of intentional time borrowing are known prior to DFT design, test development, and test application. On the other hand, latches that are sites of unintentional time borrowing may vary from one fabricated copy of the chip (called a chip instance in the following) to another, and hence are not known prior to test application. More importantly, the precise amount of time borrowing at a site of intentional or unintentional time borrowing varies from one chip instance to another; even in a particular chip instance, it varies from one vector to another.
Hence, one of the biggest challenges in delay testing of latch-based circuits is that it is impossible to use the scan mode to apply a test at a latch without knowing that the latch is not a site of time borrowing. Furthermore, even when time borrowing is known to occur at a latch, it is practically impossible to use scan to apply tests where bits are skewed to precisely replicate the arbitrary and unknown amounts of time borrowing at the outputs of various latches.

It is important to recall that simply applying tests at the inputs of a block and observing responses at its outputs at nominal times will cause many fault-free chips to be unnecessarily discarded. In fact, in circuits where time borrowing has been intentionally exploited, this can lead to zero yield at the given clock frequency. Hence, we need to develop a novel delay testing approach that applies tests and captures responses at clock edges only, while considering intentional and unintentional time borrowing.

Consequently, if a latch is a site of time borrowing, it is necessary to test multi-segment paths, i.e., paths obtained by concatenating appropriate paths in successive logic blocks separated by latches. Since many latch-based parts of circuits (e.g., data-paths) contain an astronomical number of such multi-segment paths [1], the classical test approach, which targets the entire pipeline without DFT, typically suffers from impractically high test generation complexity, high test application time, and, for many circuits, meaninglessly low fault coverage. Hence, the use of some new type of DFT is imperative to significantly reduce test generation and test application times while providing meaningfully high values of delay fault coverage by targeting shorter paths. The next section explains these major benefits of exploiting scan DFT.

There are two major benefits of delay testing using scan DFT. First, appropriate use of scan DFT reduces the number of target PDFs.
For example, consider paths via L5 in the two-stage pipeline shown in Figure 1. Let x paths in C0 terminate at L5 and y paths in C1 originate from L5. Then there exist xy physical paths in the above two blocks via L5. Since two PDFs must be targeted for each physical path (one with a rising transition at its input and another with a falling transition), a total of 2xy PDFs that pass via the latch (as well as the two blocks) must be targeted. If one can verify, during testing of a particular chip instance, that no time borrowing occurs at this latch L5, intentionally or unintentionally, paths in C0 and C1 can be targeted separately. In such a case, the total number of PDFs corresponding to the latch that must be targeted drops from 2xy to 2(x+y). Since x and y are typically large, the use of scan reduces the total number of target PDFs. Note that even greater reductions occur when one considers the above arguments for multi-segment paths that pass via a larger number of blocks.

Second, the average length of a path targeted during test generation is shortened when the proposed approach and DFT are used. It is not always possible to propagate a transition robustly along a path, since sometimes conflicting logic values are required at side inputs of the path for robust propagation. As the length of a target path increases, the possibility of a conflict between values required at side inputs typically also increases. The use of scan at latches where no time borrowing occurs reduces the average length of paths and hence, in many circuits, enhances PDF coverage.

As noted in Sections 2.1 and 2.2, test generation and the design of DFT circuits to apply tests and capture responses pose unique problems, and it is imperative to develop new DFT designs and a new delay testing technique that takes advantage of these new DFT designs. Developing this type of new delay testing involves several sub-objectives that jointly define the overall optimization problem.
The sub-objectives include maximization of delay fault coverage, minimization of test application cost, minimization of test generation time, design of optimal DFT circuitry, and so on. Among these, maximization of delay fault coverage and minimization of test application cost are two major problems for test engineers, and typically there is a trade-off between the two. In other words, in order to reduce test application cost, delay fault coverage might have to be compromised to some extent. However, even if we are willing to incur a sufficiently high test application cost, we may not be able to achieve the desired level of delay fault coverage without implementing an appropriate test generation methodology.

Hence, we prioritize the above sub-objectives to define the overall optimization problem as follows. First, in Chapter 3, we propose a new structural test generation approach that maximizes the robust PDF coverage by exploiting scan DFT that applies tests and captures responses at clock edges only. Second, in Chapter 4, we discuss a new test scheduling method to minimize test application cost, under the condition that the maximum robust PDF coverage obtained by the approach proposed in Chapter 3 is maintained.

CHAPTER 3
A new delay testing approach

In our approach, a latch may operate in the following four modes.
(1) Normal mode. The latch is transparent when the corresponding clock is high and holds its state when the clock is low.
(2) Scan mode.
(2a) Scan-in mode. Vectors are loaded via scan-in and applied at the rising edge of the corresponding clock. Concurrently, the values previously captured in the latches are scanned out as described next.
(2b) r-capture scan-out mode. The latch captures the response at the rising edge of the corresponding clock for scan out.
(2c) f-capture scan-out mode. The latch captures the response at the falling edge of the corresponding clock for scan out.
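The four operating modes listed above can be captured in a small data structure. The following is only a minimal sketch: the identifiers `LatchMode`, `is_scan_mode`, `normal_mode_output`, and `capture_edge` are hypothetical names introduced for illustration and do not appear in the dissertation.

```python
from enum import Enum


class LatchMode(Enum):
    """Latch operating modes; the identifier names are illustrative only."""
    NORMAL = "normal"        # (1) transparent while the clock is high
    SCAN_IN = "scan-in"      # (2a) vector applied at the rising edge
    R_CAPTURE = "r-capture"  # (2b) response captured at the rising edge
    F_CAPTURE = "f-capture"  # (2c) response captured at the falling edge


def is_scan_mode(mode: LatchMode) -> bool:
    """Every mode other than NORMAL is a sub-mode of scan mode (2)."""
    return mode is not LatchMode.NORMAL


def normal_mode_output(clock_high: bool, data_in: int, held: int) -> int:
    """Normal-mode behavior: pass the input while the clock is high, else hold."""
    return data_in if clock_high else held


def capture_edge(mode: LatchMode) -> str:
    """Clock edge at which a scan-out mode captures the response."""
    edges = {LatchMode.R_CAPTURE: "rising", LatchMode.F_CAPTURE: "falling"}
    if mode not in edges:
        raise ValueError(f"{mode.name} is not a capture mode")
    return edges[mode]
```

The sketch models only the logical behavior of each mode, not the DFT hardware that selects among them.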
It is assumed that time borrowing does not occur at the "primary" inputs and outputs of the entire latch-based circuit. This is often true because high-speed latch-based pipelines are typically embedded in a larger flip-flop-based system.

We may test each block individually. Alternatively, we may test any set of contiguous blocks together as a single entity. In either case, we use the term sub-circuit under test (SCUT) to describe the block(s) under test during a particular phase of testing. Each SCUT is characterized by which blocks are included and which scan chain configurations (i.e., operation modes of latches) are used for the latches in the SCUT.

The proposed test generation approach is based on robust testing of path delay faults. Hence, the following property of robust tests is used throughout this dissertation.

Lemma 1: Any robust test for a target PDF invokes a delay equal to or greater than the delay of the target path, independently of the presence or absence of other delay variations or faults in the circuit.

This is because a robust test for a PDF guarantees that a transition at an on-path line cannot occur unless a transition occurs at the previous on-path line, independently of the presence or absence of any other delay variations or faults [7]. This basic property of robust tests is strictly true under many commonly used delay models. More details can be found in [7]. In particular, the propagation of a transition along the path may be affected by the existence of non-static values at off-path inputs. However, the conditions for robust propagation allow only those values at off-path inputs that may delay but cannot accelerate the on-path propagation.
Under the r-r test application, tests are applied to the inputs of the SCUT at the rising edge of the driving clock (i.e., the clock that drives the latches at the input of the SCUT), and the responses are captured at the outputs of the SCUT at the rising edge of the receiving clock (i.e., the clock that drives the latches at the output of the SCUT). Let us assume that the SCUT is composed of m consecutive blocks of logic in a linear pipeline $(C_h, C_{h+1}, C_{h+2}, \ldots, C_{h+m-1})$, and that the latches at the inputs of target paths (input latches) in the SCUT are free of time borrowing. Then, the time interval TA(r,r), denoting the nominal time allocated to the SCUT, is mT/2, where T is the clock period. Hence, the following is a sufficient (but not necessary) condition for the SCUT to be free of delay faults.

$$\sum_{i=h}^{h+m-1} \Delta C_i \le \mathit{TA}(r,r) \equiv \frac{mT}{2}, \qquad (1)$$

where $\Delta C_i$ is the maximum delay of any multi-segment path in block $C_i$. If this condition is violated for a latch (output latch) at the output of the SCUT, we discover that the SCUT borrows time via the latch from the next block $C_{h+m}$, since a transition arrives at the latch after the output latch of $C_{h+m-1}$ becomes transparent. In particular, we have the following result.

If every block of a CUT passes r-r tests at clock period T, the CUT has no delay fault and no time borrowing at that clock frequency.

However, one or more SCUTs may fail r-r tests due to time borrowing even though the circuit does not have a delay fault. The above observation is generalized to obtain Theorem 1.

Theorem 1. If a CUT can be partitioned, i.e., divided into disjoint SCUTs that collectively include all blocks in the CUT, such that each SCUT passes the corresponding r-r tests at clock period T, then the CUT is free of delay faults at that clock frequency [9].
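The r-r condition and Theorem 1 can be illustrated numerically. The sketch below assumes, purely for illustration, that the maximum delay of each block is known in picoseconds (in actual testing only pass/fail outcomes are observed); the function names and the delay values are hypothetical.

```python
from typing import Sequence


def time_allocated_rr(num_blocks: int, clock_period_ps: int) -> float:
    """Nominal time allocated to an SCUT of m blocks: TA(r,r) = m*T/2."""
    return num_blocks * clock_period_ps / 2


def scut_passes_rr(block_delays_ps: Sequence[int], clock_period_ps: int) -> bool:
    """Sufficient condition: the sum of block delays is at most TA(r,r)."""
    allocated = time_allocated_rr(len(block_delays_ps), clock_period_ps)
    return sum(block_delays_ps) <= allocated


def partition_passes_rr(partition: Sequence[Sequence[int]],
                        clock_period_ps: int) -> bool:
    """Theorem 1 premise: every SCUT of the partition passes its r-r check."""
    return all(scut_passes_rr(scut, clock_period_ps) for scut in partition)


T_PS = 1000                       # assumed clock period (illustrative)
per_block = [[600], [300]]        # C0 and C1 tested as separate SCUTs
combined = [[600, 300]]           # C0 + C1 tested as one SCUT

block_by_block_ok = partition_passes_rr(per_block, T_PS)
combined_ok = partition_passes_rr(combined, T_PS)
```

With these assumed values, C0 alone exceeds T/2 and so fails its r-r check (it borrows time from C1), while the partition consisting of the single SCUT C0 + C1 satisfies the condition, which is the situation Theorem 1 addresses.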
Under the r-f test application, tests are applied to the inputs of the SCUT at the rising edge of the driving clock, and the responses are captured at the outputs of the SCUT at the falling edge of the receiving clock. Let us assume that the SCUT is composed of m consecutive blocks of logic in a linear pipeline $(C_h, C_{h+1}, C_{h+2}, \ldots, C_{h+m-1})$, and that the latches at the inputs of the SCUT are free of time borrowing. Let the time interval TA(r,f) denote the maximum time allowable for any multi-segment path in a given SCUT. Then, the following is a necessary (but not sufficient) condition for the SCUT to be free of delay faults.

$$\sum_{i=h}^{h+m-1} \Delta C_i \le \mathit{TA}(r,f) \equiv \frac{(m+1)T}{2}, \qquad (2)$$

where $\Delta C_i$ is the maximum delay of any multi-segment path in block $C_i$. If this condition is violated for even one SCUT, the entire CUT is proven to have a delay fault at the clock period T. Note that the r-f test application allows the maximum time duration for transitions to propagate via each path in an SCUT. Hence, it is necessary for every block to pass r-f tests.

Theorem 2. If any SCUT fails its r-f tests at clock period T, then the circuit has delay faults at that clock frequency.

Proof: If an SCUT fails r-f tests, the delay of the SCUT is longer than the maximum time duration allowed for a transition. Since such additional delay cannot be accommodated via time borrowing, the circuit has delay faults at the given clock frequency.■

Consider a scenario where, based on the circuit design, we expect extreme time borrowing, i.e., where the delay of one or more single blocks and/or one or more multiple-block combinations is likely, in the presence of delay variations or defects, to exceed the maximum delay allowed by the relation given for TA(r,f). In this case, we consider each single block or multiple-block combination where extreme time borrowing is deemed likely to occur as an SCUT, and apply suitable tests to the SCUT using the r-f test application.
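The relationship between the r-r and r-f limits can be sketched as a three-way classification of an SCUT. This is a simplified illustration that assumes the delay of the slowest multi-segment path through the SCUT is known in picoseconds, which is not the case in real testing; `classify_scut` and its return strings are hypothetical.

```python
def classify_scut(slowest_path_delay_ps: int, num_blocks: int,
                  clock_period_ps: int) -> str:
    """Classify an SCUT of m blocks against TA(r,r) and TA(r,f)."""
    ta_rr = num_blocks * clock_period_ps / 2        # TA(r,r) = m*T/2
    ta_rf = (num_blocks + 1) * clock_period_ps / 2  # TA(r,f) = (m+1)*T/2
    if slowest_path_delay_ps > ta_rf:
        # Fails r-f: the excess cannot be absorbed by time borrowing.
        return "delay fault"
    if slowest_path_delay_ps > ta_rr:
        # Fails r-r but passes r-f: the SCUT borrows time from the next
        # block; whether the whole CUT is fault-free remains undecided.
        return "time borrowing"
    # Passes r-r: no time borrowing at the SCUT output.
    return "no time borrowing"
```

For a single block and an assumed clock period of 1000 ps, a 400 ps path passes r-r, a 600 ps path fails r-r but passes r-f, and an 1100 ps path fails r-f. The middle outcome is inconclusive on its own, because passing r-f is necessary but not sufficient.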
Obviously, if any of these SCUTs fails any of its r-f tests, then the entire CUT is identified as having a delay fault at the desired clock period, and delay testing can be terminated immediately. However, in much of this dissertation, for simplicity of discussion, it is assumed that a chip under test has either skipped r-f tests (since extreme time borrowing was not expected) or passed all the r-f tests applied (since extreme time borrowing does not exist).

Each latch in the circuit, other than primary input and primary output latches, is classified either as a time borrowing latch (TBL) or as a non-time borrowing latch (NTBL). A latch L is identified as a TBL if at least one test for an SCUT that terminates at L fails r-r tests by the capture of an erroneous value at the latch. In contrast, a latch L is identified as an NTBL if, for all SCUTs that terminate at the latch, no r-r test fails by causing an error at that latch. The test procedure of the proposed structural delay testing is based on identification of the time borrowing status of latches within the CUT. We now summarize the above observations.

A latch L is identified as an NTBL if, for all SCUTs that terminate at the latch, no r-r test fails by causing an error at that latch.

A latch L is identified as a TBL if at least one test for an SCUT that terminates at L fails r-r tests by the capture of an erroneous value at the latch.

Consider a latch L that is identified as a TBL via testing. In general, in such a situation a non-empty subset of the paths that terminate at the latch are time borrowing, while the set of remaining paths (which might be empty) are non-time borrowing. While it is theoretically feasible to develop a methodology that considers L as time borrowing only with respect to the former set of paths, such a methodology is practically unimplementable since it requires execution of delay diagnosis on each chip instance to identify the former set.
Since the complexity of such diagnosis is typically extremely high, we consider L as time borrowing with respect to all paths that terminate at L, as stated below.

If a latch L is identified as a TBL, all multi-segment paths that pass via L must be targeted by subsequent testing, despite the fact that some of the paths in the fan-in of L may not be borrowing time.

The basic idea behind the proposed delay testing approach is to test the first logic block to identify sites of time borrowing, and to adaptively add the subsequent logic blocks by deciding whether to target multi-segment paths or single-segment paths. This process continues to the last logic block.

Consider the three-stage linear pipeline shown in Figure 3. Latches at the inputs of logic block Ck are called level-k latches. Initially, r-r tests are applied to the first logic block (C0). If C0 passes all these r-r tests, time borrowing does not occur at any of the level-1 latches, and hence the next logic block (C1) can be tested separately from C0. Likewise, if C1 also passes all r-r tests, time borrowing does not occur at any of the level-2 latches, and hence the next logic block (C2) can be tested separately. Since time borrowing does not occur at any latch in this particular case, each logic block is individually tested, which tends to decrease the number of target paths and significantly increase the coverage.

On the other hand, if C0 fails r-r tests, we target multi-segment paths that span C0 and C1, denoted as C0 + C1. These multi-segment paths are targeted by using scan DFT at level-0 and level-2 latches and configuring all level-1 latches in normal mode. However, this scheme of simply combining consecutive logic blocks to target multi-segment paths does not improve coverage significantly, especially in cases where time borrowing occurs extensively across the pipeline.
For example, in the worst case where C0 + C1 also fails r-r tests, we will end up testing the entire three-stage pipeline (C0 + C1 + C2) jointly, while obtaining the same coverage as the classical approach at the additional cost of testing C0 and C0 + C1.

Hence, we developed an SCUT-based approach [9] that targets multi-segment paths via TBLs only, and targets shorter paths that start at NTBLs. This is done by configuring NTBLs in scan mode and TBLs in normal mode. For instance, suppose the testing of C0 identifies only L10 as a TBL. Then, in the next test session for C0 + C1, we configure L11 and L12 in scan mode, and L10 in normal mode. In this manner, the two-segment paths via L10 are tested and the single-segment paths starting at L11 and L12 are tested. Similarly, the results of this testing of C0 + C1 will decide the time borrowing status of the level-2 latches, which will adaptively determine the configurations of the level-2 latches in the next test session that includes C2.

Although this approach [9] improved the coverage significantly in many cases, experiments show that the coverage is still low for cases where time borrowing occurs extensively. Next we propose an advanced SCUT-based approach that further improves the coverage even when time borrowing occurs extensively, and reduces the complexity of DFT circuitry and scan chain routing.

In Sections 3.3 through 3.6, for simplicity of explanation, the discussion uses the two-stage linear pipeline shown in Figure 4. However, the ideas proposed are applicable to general latch-based networks. Assume that there are j latches at the inputs of the first combinational logic block C0 (level-0 latches), k latches between C0 and C1 (level-1 latches), and l latches at the outputs of C1 (level-2 latches).
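The adaptive rule of the SCUT-based approach [9] described earlier in this section (TBLs in normal mode, NTBLs in scan mode) can be sketched as follows. The input format, a count of erroneous r-r captures per latch, and the function names are assumptions made for illustration only.

```python
from typing import Dict


def classify_latches(rr_error_counts: Dict[str, int]) -> Dict[str, str]:
    """A latch is a TBL if at least one r-r test captured an error at it."""
    return {latch: ("TBL" if errors > 0 else "NTBL")
            for latch, errors in rr_error_counts.items()}


def next_session_modes(status: Dict[str, str]) -> Dict[str, str]:
    """TBLs stay in normal mode; NTBLs are configured in scan mode."""
    return {latch: ("normal" if kind == "TBL" else "scan")
            for latch, kind in status.items()}


# Example from the text: testing C0 identifies only L10 as a TBL.
observed = {"L10": 2, "L11": 0, "L12": 0}   # illustrative error counts
status = classify_latches(observed)
modes = next_session_modes(status)
```

Here `modes` assigns normal mode to L10 and scan mode to L11 and L12, matching the test session for C0 + C1 described in the example.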
[Figure 4: A two-stage linear pipeline. Level-0 latches L01, L02, ..., L0j are at the inputs of C0; level-1 latches L11, ..., L1i, ..., L1k lie between C0 and C1; level-2 latches L21, L22, ..., L2l are at the outputs of C1. A multi-segment path p consists of a sub-path α in C0 and a sub-path β in C1.]

As in the approach described above [9], the paths in the first logic block C0 (= SCUT0) in Figure 4 are tested by themselves by operating level-0 latches in scan (scan-in) mode and level-1 latches in scan (r-capture scan-out) mode. The purpose of testing SCUT0 is to identify TBLs among the level-1 latches. Assume that such testing detects time borrowing at a subset of level-1 latches denoted by the set LTB, where LTB = {L1i | L1i is a TBL, 1 ≤ i ≤ k}. The remaining level-1 latches are not time borrowing sites and constitute a set LNTB, i.e., LNTB = {L1i | L1i is not a TBL, 1 ≤ i ≤ k}.

In this CUT we are interested in testing multi-segment paths that span C0 and C1. When any particular multi-segment path is being tested, the target path passes via one level-1 latch, which is referred to as the on-path latch for the path. All the other level-1 latches are called off-path latches for the target path. Sections 3.4 and 3.5 describe how to test the multi-segment paths with high PDF coverage using DFT, by treating on-path and off-path latches differently and by considering LTB and LNTB.

Consider a case where the objective of delay testing is to test an arbitrary multi-segment path p composed of a sub-path α in C0 and a sub-path β in C1 in the circuit shown in Figure 4. α and β are connected by a level-1 latch L1i. In this case the latch L1i is the on-path latch and every other level-1 latch is an off-path latch.

First, consider the case where the on-path latch L1i is identified as a TBL during testing of SCUT0 (i.e., L1i ∈ LTB). In order to test multi-segment paths that pass via L1i ∈ LTB, scan mode cannot be used for L1i, since no known DFT circuitry can replicate an appropriately skewed test application and response capture corresponding to the precise amount of time borrowing, which varies from vector to vector and from one chip instance to another.
Therefore, only normal mode can be used at TBL L1i during testing of any multi-segment path that passes via the latch.

Now consider the case where the on-path latch L1i is identified as an NTBL during testing of SCUT0 (i.e., L1i ∈ LNTB). Theorems 3 and 4 identify the relationship between (i) testing α and β individually as sub-paths, and (ii) testing α and β jointly (denoted as α + β) as a multi-segment path. (That is, α + β stands for the testing of the path p by configuring L1i in normal mode in the SCUT composed of C0 and C1.)

Theorem 3. If any robust test for α passes when C0 is tested by itself and any robust test for β passes when C1 is tested by itself, then the worst-case delay of the multi-segment path p via L1i is within the limit imposed by the given clocks.

Proof: We will prove this by contradiction. Let us start by assuming that the path p, i.e., α + β, fails when the SCUT composed of C0 and C1 is tested. In this case, the delay of p exceeds the sum of the nominal delays of C0 and C1. Assuming that no time borrowing occurs at the inputs of C0, this can occur only under the following two scenarios: (i) the delay of α is greater than the nominal delay of C0, or (ii) the delay of β is greater than the nominal delay of C1. In the former case, any robust test for α would have failed when C0 was tested by itself, and in the latter case any robust test for β would have failed when C1 was tested by itself (or both). Hence, if α and β each pass robust tests when the respective blocks are tested, the delay of p cannot exceed its nominal delay.■

Theorem 4. If the multi-segment path α + β is robustly testable in the multi-segment SCUT composed of C0 and C1, then α in C0 by itself and β in C1 by itself are both individually robustly testable.

Proof: When α + β is targeted in an SCUT composed of C0 and C1, the conditions necessary (and sufficient) for robust detection of p require that α be robustly sensitized within C0, as in the case where α is tested in C0 by itself.
Therefore, if α is not robustly testable in C0 by itself, then no robust test exists for any multi-segment path that includes α. Similar reasoning also applies to β.■

In summary, if we conclude that time borrowing does not occur at the latch after testing the block(s) in the fan-in of the latch, we can separately test the sub-paths in the fan-out of the latch instead of testing multi-segment paths that pass via the latch. While testing a sub-path in the fan-out of a latch L1i at which no time borrowing occurs, L1i may be configured either in normal mode or in scan mode. Next, we compare these two approaches in Sections 3.4.2.1 and 3.4.2.2.

Approach 1: The multi-segment path α + β is targeted by configuring L1i in normal mode.

In this approach, the on-path latch L1i is configured in normal mode although L1i is an NTBL. This may be the case if the DFT design does not support the required scan mode operation. Even if the normal mode is in use, we can attain higher test quality, based on Theorem 4 and the fact that time borrowing does not occur at L1i, by exploiting the following property.

Property 1: Because L1i is an NTBL, the test generation procedure can test the sub-path β by targeting it as part of any multi-segment path that starts at a level-0 latch and includes β, with L1i configured in normal mode.

Note that the sub-paths in the fan-in of L1i are used only to produce a rising or a falling transition (as appropriate to test β) at the output of L1i, and the logic values within C0 need not robustly propagate the transition along any particular path in the fan-in of L1i. As long as the desired transition is initiated at the output of L1i, robust propagation of the transition is required only for the sub-path β. By doing so, the number of target PDFs is also reduced, because only β is targeted instead of all multi-segment paths that include β.
In summary, Theorem 4 shows that for a path in the fan-in of L1i (e.g., α), testing using SCUT0 provides equal or higher robust coverage compared to testing using the multi-segment SCUT composed of C0 and C1. Also, for the sub-paths in the fan-out of L1i (e.g., β), testing using Property 1 is at least as good as testing multi-segment paths in the ordinary manner, because Property 1 does not require robust sensitization along any particular path within C0. Hence, Property 1 shows that robust test coverage can be further improved even without using the scan mode at the on-path latch, provided that time borrowing is known not to occur at the on-path latch.

Approach 2: The sub-path β is targeted by operating L1i in scan mode.

In this approach, we test only the sub-path β that originates at an NTBL L1i. As L1i is configured in scan mode, the sub-path β of the original target path (α + β) is tested separately, and the robust coverage for sub-paths like β is combined with the robust coverage for sub-paths like α in the fan-in of the latch. If L1i can be configured in scan mode, which is typically true in cases where L1i is connected to the scan-out chain for time borrowing detection, Approach 2 is preferred to Approach 1 because Approach 2 provides equal or higher coverage than Approach 1, as per Theorem 4.

Suppose a multi-segment path p that comprises α and β and passes via L1i in Figure 4 is targeted. The configuration of the on-path latch L1i is determined as explained in Section 3.4 (i.e., normal mode is used if L1i is a TBL; either normal or scan mode is used if L1i is an NTBL). Note that in both cases, any off-path latch can be configured in scan mode, independently of whether or not that off-path latch is a site of time borrowing. This is due to the following two reasons.

First, if a static value is applied via scan at an off-path latch, the time borrowing status of the off-path latch has no impact on the on-path delay.
Second, even if a rising or a falling transition is applied via scan, a robust test for a target path like β does not require off-path transitions to satisfy any specific timing requirement, as per Lemma 1. (In particular, an early off-path transition cannot reduce the on-path delay. In our scheme the off-path transition is never later than in the normal mode.) Hence, even for a latch where time borrowing is proven to occur, scan mode operation does not violate the robust delay test conditions, provided that the latch is off-path.

Now let us consider two alternatives, Alternative-1 and Alternative-2, that operate the on-path latch in the same mode (one that meets the above requirements) and differ only in the configuration of the off-path latches. Let the set of off-path latches at level-1 configured in scan mode in Alternative-1, $A^{1}_{scan}$, be a proper subset of the set of off-path latches configured in scan mode in Alternative-2, $A^{2}_{scan}$ (i.e., $A^{1}_{scan} \subset A^{2}_{scan}$). Note that Alternative-1 includes the case where scan mode is used for none of the off-path latches, i.e., $A^{1}_{scan} = \emptyset$. Hence, the classical approach, where only normal mode is used for every on-path and off-path latch, is a special case of Alternative-1. By comparing the two alternatives, we obtain the following results.

Theorem 5. Any robust test for the multi-segment path p using Alternative-1 or Alternative-2 invokes a delay equal to or greater than the delay of p.

Proof: First, consider the case where the on-path latch L1i is configured in normal mode. In both alternatives, scan mode is used for each level-0 latch and for each level-1 latch in $A^{1}_{scan}$ and $A^{2}_{scan}$, respectively. If two different test vectors are applied in the two alternatives, the arrival times at the output of the on-path latch L1i may be different in the two cases.
However, due to the characteristic of robust tests given in Lemma 1, the delay invoked for α and via L1i is guaranteed to be equal to or greater than the worst-case delay of sub-path α plus the delay via L1i. Next, for the propagation of this transition along β, the values applied at off-path level-1 latches may be different in the two alternatives. However, also due to Lemma 1, the subsequent propagation along β will invoke an overall delay equal to or greater than that of the target path p.

Second, in the case where the on-path latch L1i is an NTBL and is configured in scan mode, the transitions in both alternatives depart from L1i at the rising edge of the clock driving L1i. The subsequent propagation along β will invoke an overall delay equal to or greater than that of the target path due to Lemma 1.■

Theorem 6. If a multi-segment path p is robustly testable in Alternative-1, then it is robustly testable in Alternative-2.

Proof: Note that both alternatives require the same set of conditions on the values at on-path lines and off-path inputs for robust detection of p. We can specify independent logic values (i) at every level-1 latch that belongs to $A^{1}_{scan}$ in Alternative-1 and $A^{2}_{scan}$ in Alternative-2, respectively, as well as (ii) at all level-0 latches in both alternatives. Since $A^{1}_{scan} \subset A^{2}_{scan}$, Alternative-2 provides a superset of the possible value assignments to satisfy the same set of conditions for robust detection of p. Consequently, if a robust test exists for p in Alternative-1, then one surely exists in Alternative-2.■

Theorem 5 shows that the test quality obtained by any robust test applied using Alternative-2 is equal to the test quality obtained by any robust test applied using Alternative-1. Theorem 6 shows that the robust delay fault coverage for Alternative-2 is at least equal to, and may be superior to, that for Alternative-1.
Hence, if we can use a scan chain configuration described in Alternative-2 to target a multi-segment path p, then we need not use a scan chain configuration described in Alternative-1 to test the path. The following result is a corollary to Theorems 5 and 6, assuming that the on-path latch L1i is a time borrowing site.

Corollary 1. While testing a multi-segment path via a latch at which time borrowing is known to occur, the best robust test quality and the best robust coverage can be obtained by operating the on-path latch in normal mode and all off-path latches in scan mode (single-normal configuration), provided that the DFT circuitry and control signals allow such a combination of modes.

For example, suppose there are four latches at level-1 of Figure 4, testing of SCUT0 shows that time borrowing occurs at L12, and multi-segment paths that pass via L12 are targeted. In this case, normal mode is required at L12 since it is the on-path latch and a site of time borrowing. Depending on the configuration of the off-path latches, 8 (= $2^3$) configurations may be used, as shown in Figure 5. The relationships among different configurations given by Theorem 6 are represented by arrows in Figure 5. If a path via the on-path latch (L12) is robustly testable using the configuration specified by the destination of an arrow, the path is also robustly testable using the configuration specified by the source of the arrow.

The following result is a corollary to Theorems 5 and 6, this time assuming that the on-path latch L1i is an NTBL.

Corollary 2. While testing a multi-segment path via a latch at which time borrowing does not occur, the best robust test quality and robust coverage can be obtained by operating the on-path latch as well as all off-path latches in scan mode (all-scan configuration), provided that such a configuration is supported by the DFT hardware and control.
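The four-latch example above can be enumerated directly. The sketch below lists the $2^3$ choices of off-path latches placed in scan mode when L12 is the on-path latch in normal mode, and uses set inclusion as the ordering that Theorem 6 induces. The latch names L11, L13, and L14 and the helper functions are assumptions for illustration; the text names only L12.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence


def off_path_scan_sets(latches: Sequence[str],
                       on_path: str) -> List[FrozenSet[str]]:
    """All subsets of off-path latches that could be put in scan mode."""
    off_path = [name for name in latches if name != on_path]
    return [frozenset(chosen)
            for size in range(len(off_path) + 1)
            for chosen in combinations(off_path, size)]


def at_least_as_good(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Per Theorem 6, scan set a covers whatever scan set b covers if b <= a."""
    return b <= a


level1 = ["L11", "L12", "L13", "L14"]   # L11, L13, L14 are assumed names
candidates = off_path_scan_sets(level1, on_path="L12")

# The single-normal configuration puts every off-path latch in scan mode.
single_normal = frozenset(name for name in level1 if name != "L12")
dominates_all = all(at_least_as_good(single_normal, c) for c in candidates)
```

Under these assumptions `candidates` has eight entries and the single-normal configuration is at least as good as each of them, which is why it alone suffices for paths via a time borrowing latch.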
We can modify Figure 5 to include the remaining eight possible scan chain configurations, which have L12 in scan mode, and add corresponding arrows to show the relationships between different scan chain configurations for the case where L12 is a non-time borrowing on-path latch.

If a target path p is tested using a configuration where one or more off-path latches are in normal mode, then we can use the following property to modify the value applied at the output of any latch where no time borrowing occurs and which is nevertheless configured in normal mode.

Property 2: The output of a NTBL is always hazard-free, because data stabilizes at the latch input before the latch becomes transparent.

By considering both hazardous and hazard-free values at the input of each such latch even when a hazard-free value is desired at the latch’s output, robust tests may be found for some paths for which such tests may not otherwise be found. Figure 6 shows an example case where time borrowing is detected only at L1 but both latches are operating in normal mode to test a path via L1. The falling transition is propagated via L1 to test the path shown in bold. Robust propagation of the falling transition at the on-path input of G4 requires static-1 at its off-path input, which is the output of L2. However, the output of G3 cannot have a static-1 signal because the values at the inputs of G3 are already determined by the on-path values as a rising transition at one input and a falling transition at the other. Hence, a conventional test generator will be unable to find a robust test for this path via L1. On the other hand, our ATPG (automatic test pattern generator) exploits Property 2, and considers a hazardous-1 signal as well as static-1 at the input of L2. Hence, by exploiting Property 2, our ATPG can successfully generate a robust test for the target path and improve coverage.
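A minimal sketch of how Property 2 relaxes the requirement at a latch input is given below. The value names ("static-1", "hazardous-1", and so on) and the function name are hypothetical, and only static values are modeled; this is an illustration of the idea, not the value system used by the ATPG.

```python
def acceptable_input_values(required_output, ntbl_in_normal_mode):
    """Values allowed at a latch input when `required_output` is needed at its output.

    For a non-time-borrowing latch in normal mode, a hazardous version of a
    static value is also acceptable, since the latch output is hazard-free.
    """
    hazardous_version = {"static-0": "hazardous-0", "static-1": "hazardous-1"}
    values = {required_output}
    if ntbl_in_normal_mode and required_output in hazardous_version:
        values.add(hazardous_version[required_output])
    return values

# Figure 6 situation: static-1 is required at the output of L2, which is a
# NTBL operating in normal mode.
print(sorted(acceptable_input_values("static-1", True)))
# ['hazardous-1', 'static-1']
```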
The above results provide a significant reduction in the number of scan chain configurations that are required to maximize robust PDF coverage, even when time borrowing occurs at unexpected latches (i.e., latches that are not sites of intentional time borrowing). In [9], the fully-adaptive approach requires the DFT circuitry to support 2^k scan chain configurations at a level with a total of k latches. However, Corollary 1 and Figure 5 show that when we detect time borrowing at the ith latch in the level, then the configuration in which the ith latch is in normal mode and all the other latches are in scan mode, by itself, maximizes the coverage for all multi-segment paths that pass via the ith latch. Hence, no matter which and how many of the latches at the level are sites of time borrowing, the robust PDF coverage can be maximized for the paths that pass via TBLs if the DFT supports the following k single-normal configurations: (n, s, s, ···, s), (s, n, s, ···, s), (s, s, n, ···, s), ···, (s, s, s, ···, n), where n denotes normal mode and s denotes scan mode. As per Corollary 2, the all-scan configuration (s, s, s, ···, s) provides the maximum coverage for all paths that pass via the latches where no time borrowing occurs. Of course, we need the all-normal configuration (n, n, n, ···, n) to support normal circuit operation. We call the above k+2 configurations the optimal set of scan chain configurations, meaning that the all-normal, the all-scan, and every single-normal configuration are available in every level of latches. By decreasing the number of required scan chain configurations from 2^k to k+2 in a level of k latches, the proposed approach reduces the complexity of DFT circuitry and scan chain routing. This is a significant improvement over [9]. In general, we have the following results.
Maximum possible robust PDF coverage can be attained for any latch-based circuit independently of the time borrowing scenario, provided that latches at inputs of every combinational block can be configured in (n, n, ···, n, n), (s, s, ···, s, s), (n, s, ···, s, s), ···, (s, s, ···, n, s), (s, s, ···, s, n), i.e., the all-normal, the all-scan, and all possible single-normal configurations.

For a faulty chip instance, it is not necessary to cover the entire CUT since a faulty chip instance is discarded as soon as a delay fault is identified. In particular, a delay fault in a faulty chip instance can be detected by any target path in an SCUT that terminates at a primary output and fails r-r tests (under the assumption that r-f tests are not used), regardless of time borrowing status at the latch where the target path begins. In other words, the objective of delay testing for faulty chip instances is not to compute the delay fault coverage but to identify a delay fault at minimum test application cost.

In contrast, for fault-free chip instances, we must test all target paths that are required to cover the entire CUT in order to compute the delay fault coverage. Essentially, we are interested in targeting every path of the CUT that starts from a primary input and terminates at a primary output. A multi-segment path p that spans from a primary input to a primary output can be tested either by itself as a single target path or by multiple sub-paths that partition p using scan, where every partition is made only at a NTBL. For instance, suppose that p starts from a primary input L0, passes via L1, and terminates at a primary output L2. Let us assume that L1 is known to be a NTBL and r-r tests are applied to the sub-path from L0 to L1 and the sub-path from L1 to L2. If both sub-paths pass the r-r tests, we can say p is robustly tested, as proven by Theorems 3 and 4.
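The partitioning rule in the example above can be sketched as follows. The function name and the dictionary encoding of time borrowing status are assumptions for illustration; the sketch splits at every intermediate NTBL, which is the finest partition the rule permits (a path may also be left unpartitioned).

```python
def partition_at_ntbls(latches, is_tbl):
    """Split a path, given as its latch sequence, at every intermediate NTBL.

    latches: latch names from primary input to primary output.
    is_tbl:  maps each intermediate latch name to True if time borrowing occurs there.
    """
    sub_paths = []
    current = [latches[0]]
    for latch in latches[1:-1]:
        current.append(latch)
        if not is_tbl[latch]:
            # A NTBL closes the current sub-path and starts the next one.
            sub_paths.append(current)
            current = [latch]
    current.append(latches[-1])
    sub_paths.append(current)
    return sub_paths

print(partition_at_ntbls(["L0", "L1", "L2"], {"L1": False}))
# [['L0', 'L1'], ['L1', 'L2']]
print(partition_at_ntbls(["L0", "L1", "L2"], {"L1": True}))
# [['L0', 'L1', 'L2']]
```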
In summary, for fault-free chip instances, a test generation algorithm is required to target every path from a primary input to a primary output, either as a single target path or as multiple disjoint sub-paths where every partition is made only at a NTBL, such that the coverage is maximized by selecting the best scan chain configurations as per Theorems 3 through 6 and Corollaries 1 and 2. All these required tests for robustly testable paths/sub-paths must be generated so that the test generation may conclude and fault coverage can be computed. Hence, we obtain the following result.

The maximum robust PDF coverage is obtained if scan chain configurations are selected as per Corollaries 1 and 2, assuming the optimal set of scan chain configurations is available, and every robustly testable path from primary input to primary output is tested either by itself or by multiple disjoint sub-paths that partition the path at NTBLs.

If the optimal set of scan chain configurations (i.e., the all-normal, the all-scan, and every possible single-normal configuration) is available, we prove in Section 3.8.1 that the proposed test generation approach guarantees the optimal robust PDF coverage. Hence, no other path delay testing approach for latch-based circuits can obtain higher robust PDF coverage than that of our proposed approach. In Section 3.8.2 we describe the test generation procedure under the optimal set of scan chain configurations. We show that test generation for any single/multi-segment path in the CUT is significantly simplified due to the optimality of coverage the proposed approach achieves.

In general, path delay testing of a structurally long path in a circuit is difficult because a test must meet specific requirements at many off-path inputs to sensitize the path. As noted in Chapters 1 and 2, multi-segment paths must be targeted in latch-based high-speed pipelines in case time borrowing occurs.
This is believed to increase the complexity of test generation. However, our proposed latch-based delay testing approach always achieves the theoretical maximum delay fault coverage of a latch-based circuit regardless of the length of the multi-segment paths being targeted or the complexity of test generation. This can also reduce the complexity of the test generation procedure significantly.

We obtain the following new results (Theorems 9, 10, 11, and 12), from which we prove that our latch-based delay testing approach achieves the theoretical maximum path delay fault coverage, provided that the optimal set of scan chain configurations is available. First, in Theorem 9, we identify paths for which it is structurally impossible to generate a robust test using any scan-based path delay testing method.

Theorem 9: If a single-segment path q in block Ci is robustly untestable when Ci is tested by itself, then any multi-segment path Q that includes q is also robustly untestable using any scan-based path delay testing method which controls and observes values only at latches.

Proof: It is given that a robust test does not exist for the path q, using any path delay testing method where values are applied at the latches at inputs of Ci and responses captured at outputs of Ci. Note that a robust test does not exist for q even when latches at inputs of Ci are independently controlled. Hence, any multi-segment path Q that includes q cannot be robustly tested since no test for Q can robustly sensitize its sub-path q, regardless of the delay testing method used.

Theorem 10: Suppose that the total number of paths from primary inputs to primary outputs in a latch-based circuit is N. If m paths out of these N paths include at least one single-segment sub-path that is robustly untestable, then the theoretical maximum path delay fault coverage of the latch-based circuit is (N – m)/N for any scan-based method.
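As a small worked instance of the bound (N – m)/N, the sketch below counts the paths that contain at least one robustly untestable single-segment sub-path. The path and segment identifiers are made up for illustration and do not come from any circuit in this work.

```python
from fractions import Fraction

def max_robust_coverage(paths, untestable_segments):
    """Upper bound (N - m)/N on robust path delay fault coverage.

    paths: each path is a list of single-segment sub-path identifiers.
    untestable_segments: identifiers of robustly untestable single-segment paths.
    """
    n = len(paths)
    m = sum(1 for path in paths
            if any(segment in untestable_segments for segment in path))
    return Fraction(n - m, n)

# Hypothetical two-stage circuit: segments a, b in the first block and c, d in
# the second; segment d is assumed to be robustly untestable.
paths = [["a", "c"], ["a", "d"], ["b", "c"], ["b", "d"]]
print(max_robust_coverage(paths, {"d"}))  # 1/2
```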
Proof: By Theorem 9, no scan-based path delay testing approach can generate a robust test for the m paths, since each of them includes at least one robustly untestable single-segment sub-path. Hence, no scan-based path delay testing approach can robustly test more than N – m paths in the circuit.

Next, the following theorems prove that the proposed latch-based delay testing approach is guaranteed to robustly cover the remaining N – m paths regardless of time borrowing status inside the circuit, attaining the theoretical maximum coverage.

Theorem 11: If every single-segment sub-path that is included in a k-segment path P in an n-stage latch-based pipeline (1 ≤ k ≤ n) is robustly testable, then P is always robustly testable by the latch-based path delay testing approach proposed in this chapter.

Proof: For simplicity, we first assume that P is a two-segment path where two single-segment paths p in C0 and q in C1 are connected via latch LD. If p is robustly testable in C0 with a rising (falling) transition arriving at LD and q is robustly testable in C1 with a rising (falling) transition departing from LD, then we prove that the multi-segment path P comprised of p and q is robustly testable in the SCUT comprised of C0 and C1 by scanning all off-path latches between C0 and C1. As shown in Figure 7, p is a path from LB to LD in C0 and q is a path from LD to LH in C1. Let a robust test Testp for p in C0 apply vector (VA, VB, VC) at the input latches LA, LB, and LC, where VB is a transition that propagates via p and terminates at LD with a rising (falling) transition. Let a robust test Testq for q in C1 apply vector (VD, VE, VF) at the input latches LD, LE, and LF, where VD is a rising (falling) transition that propagates via q.
When we target the multi-segment path P comprised of p and q in an SCUT comprised of C0 and C1 using the latch-based delay testing approach, we can scan LA, LB, and LC as well as the two off-path latches LE and LF, regardless of time borrowing status at LD, LE, and LF, according to Section 3.5. Recall that the on-path latch LD can always be configured in normal mode regardless of time borrowing status, according to Section 3.4. A simple combination of Testp and Testq becomes a robust test vector (VA, VB, VC; VE, VF) for the two-segment path P, where the value VD is a rising (falling) transition implied by the values of VA, VB, and VC. An example is shown in Figure 7. Note that as Testp and Testq are generated by controlling all off-path latches completely independently, the robust test (VA, VB, VC; VE, VF) for P is also generated by controlling all off-path latches completely independently, under the condition that the same type of transition is implied at LD. We can easily generalize the above results for C0 and C1 with an arbitrary number of inputs. Subsequently, we can easily generalize these results to paths passing via an arbitrary number of blocks. Hence, we can prove that the proposed latch-based delay testing approach is guaranteed to generate a robust test for any multi-segment path that is comprised only of robustly testable single-segment paths.

Theorems 9, 10, and 11 lead to the following result.

Theorem 12: The proposed latch-based path delay testing approach is guaranteed to achieve the theoretical maximum delay fault coverage by configuring all off-path latches in scan mode.

Note that Theorem 12 implies that we can achieve the maximum robust PDF coverage even when we target only the longest multi-segment paths that start from a primary input and end at a primary output, at the expense of high test application cost.
In other words, the proposed approach optimizes the robust PDF coverage even when time borrowing occurs ubiquitously throughout the CUT, overcoming one of the inherent delay testing problems of latch-based circuits with time borrowing. Also, the maximum robust PDF coverage obtained by the proposed approach is the same as the robust PDF coverage achievable by testing every block of the pipeline separately in a divide-and-conquer fashion.

As implied in Theorems 11 and 12, a robust test for a multi-segment path can be constructed simply by combining tests for the single-segment sub-paths of the target multi-segment path. Hence, the proposed test generation approach initially generates and stores tests for all single-segment paths. Similar to the basic delay testing strategy described in Section 3.2.4, test application starts from the first stage of a pipeline and gradually expands/moves to subsequent stages. According to Corollaries 1 and 2, the all-scan configuration is used at a level of latches where the on-path latch is identified as a NTBL and the single-normal configuration is used at a level where the on-path latch is identified as a TBL. The test procedure is illustrated next.

First, the tests for single-segment paths in the first block C0 are applied and sites of time borrowing and non-time borrowing are identified at the level-1 latches. For NTBLs at level-1, single-segment paths in the fan-out of these latches are tested using the single-segment tests in C1 that are already generated and stored, using the all-scan configuration at level-1 latches. For TBLs in level-1, a single-normal configuration is used that configures the TBL in normal mode, and tests for each two-segment path via the TBL are constructed by combining the tests for the corresponding two single-segment sub-paths in C0 and C1, respectively, excluding only the value for the on-path latch that is now configured in normal mode. In the same manner, single-segment paths in C2 or multi-segment paths in C1 + C2 and/or C0 + C1 + C2 are tested based on the time borrowing status at level-2 latches. This procedure continues until the last block is considered.

The next section describes our approach for test generation under any set of available scan chain configurations, even those that do not include all of the above k+2 scan chain configurations. Some scan chain configurations may not be allowed due to considerations such as performance overheads associated with using scan at particular latches. Also, to further reduce DFT overheads, a small number of scan chain configurations may be identified during circuit design using the probability of time borrowing at each latch. Restrictions on scan chain configurations, however, introduce the complication that the optimal configuration may not be available to test a target path under the particular time borrowing detected in a particular instance of the circuit under test. In this context, we propose and demonstrate a new test generation approach that optimizes coverage by considering the time borrowing status of a CUT in combination with the available scan chain configurations. We demonstrate that this new approach exploits the properties and the theorems presented above for any available set of scan chain configurations to provide high coverage. We assume that the greater the flexibility in the operation of a latch, the higher the overall DFT overheads. We also assume that the available configurations of latches at each level are determined prior to test generation (and definitely before any chip instances are tested). Of course, this test generation algorithm directly covers the case where all scan chain configurations are available. Property 2 is exploited if a NTBL is operating in normal mode as an off-path latch.
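The construction of a two-segment test from stored single-segment tests, as used in the test procedure under the optimal set of configurations, can be sketched as follows. The dictionary representation of a test, the value strings, and the function name are assumptions for illustration; the latch names follow the Figure 7 discussion, and the concrete values are made up.

```python
def combine_segment_tests(test_p, test_q, on_path_latch, transition_at_latch):
    """Combine single-segment tests for p and q into a test for the path P.

    test_p: values applied at the input latches of the first block.
    test_q: values applied at the input latches of the second block,
            including the on-path latch.
    transition_at_latch: transition with which p arrives at the on-path latch.
    """
    if test_q[on_path_latch] != transition_at_latch:
        raise ValueError("segment tests imply different transitions at the on-path latch")
    combined = dict(test_p)
    # The on-path latch runs in normal mode, so its value is implied by test_p
    # and is left out of the combined vector.
    combined.update({latch: value for latch, value in test_q.items()
                     if latch != on_path_latch})
    return combined

test_p = {"LA": "0", "LB": "rise", "LC": "1"}   # hypothetical Testp
test_q = {"LD": "rise", "LE": "1", "LF": "0"}   # hypothetical Testq
print(combine_segment_tests(test_p, test_q, "LD", "rise"))
# {'LA': '0', 'LB': 'rise', 'LC': '1', 'LE': '1', 'LF': '0'}
```

The compatibility check mirrors the condition that the same type of transition must be implied at the on-path latch by both segment tests.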
One of the most important parts of test generation under restrictions on the scan chain configurations is the selection of the best available configuration(s) for each test session. This selection process is essentially based on Corollaries 1 and 2 and Figure 5. For multi-segment paths that pass via a latch where time borrowing occurs, the best configuration is the single-normal configuration in which only the on-path latch is in normal mode. If this single-normal configuration is not available, a configuration should be chosen such that the on-path latch is in normal mode and as many off-path latches are in scan mode as possible, based on Theorems 5 and 6. In some cases, multiple configurations may be used to target a given set of paths to generate a test. For example, suppose that the circuit shown in Figure 4 has four level-1 latches L11, L12, L13, and L14. If multi-segment paths passing via L12, which is identified as a TBL, are being targeted, the configuration (s, n, s, s), which has L12 in normal mode, is optimal, as shown in Figure 5. However, suppose only the following configurations are supported by DFT: {(n, n, n, n), (s, s, s, s), (s, n, n, n), (s, n, s, n), (s, n, n, s)}. Among these configurations, we will use two, namely (s, n, s, n) and (s, n, n, s), since (a) both of these configurations provide better coverage than (n, n, n, n) and (s, n, n, n) (see Figure 5), and (b) each one of these configurations may provide coverage for some paths which the other configuration may not cover (since there is no arrow from either of these configurations to the other in Figure 5). We have developed an algorithm that identifies a minimal subset of the available scan chain configurations to be used for testing any set of target paths, under any given scenario of time borrowing.
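The selection in the four-latch example above can be reproduced with a short sketch: keep the available configurations that have the on-path latch in normal mode, then discard any configuration whose set of scanned latches is a proper subset of another candidate's. The function names and tuple encoding are assumptions for illustration, not the authors' implementation.

```python
def scan_set(cfg):
    """Indices of latches in scan mode in a configuration tuple."""
    return frozenset(i for i, mode in enumerate(cfg) if mode == "s")

def select_configurations(available, on_path):
    """Non-dominated available configurations with the on-path latch in normal mode."""
    compatible = [c for c in available if c[on_path] == "n"]
    selected = []
    for cfg in compatible:
        # cfg is dominated if another candidate scans a proper superset of its latches.
        dominated = any(scan_set(other) > scan_set(cfg) for other in compatible)
        if not dominated:
            selected.append(cfg)
    return selected

available = [
    ("n", "n", "n", "n"),
    ("s", "s", "s", "s"),
    ("s", "n", "n", "n"),
    ("s", "n", "s", "n"),
    ("s", "n", "n", "s"),
]
print(select_configurations(available, on_path=1))
# [('s', 'n', 's', 'n'), ('s', 'n', 'n', 's')]
```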
In general, there may exist multiple versions of SCUTs in one test session (each session differs in the last stage of the target SCUTs) because the best scan chain configurations may be different for different target paths (details will be discussed in Section 3.9.3). Since multiple versions of SCUTs are tested, it is also necessary to avoid testing the same path many times. Our proposed ATPG selects the best set of SCUTs, manages multiple versions of SCUTs, avoids any unnecessary repetition in testing of paths, and properly computes the delay fault coverage for the entire pipeline. We have developed an algorithm that identifies the subset of all available scan chain configurations to be used for testing of any set of target paths, under any given scenario of time borrowing (Figure 8).

    Procedure: ATPG_MultiSCUTs( ) {
      Read the pipeline circuit file and available latch configurations;
      Initialize SCUT_list[level] for each level;
      For each level {
        /* select best configurations for the latches in the current level */
        For each latch of the current level {
          If (time borrowing = true & corresponding single-normal mode is available)
            Select the single-normal mode;
          Else if (time borrowing = false & all-scan mode is available)
            Select the all-scan mode;
          If (no configuration is selected above)
            Select all compatible configuration modes μ1, μ2, ···, μr;
          For all pairs of configurations μj and μk {
            If μj has a superset of the latches that μk has in scan,
              Then, eliminate μk;
          }
        }
        /* construct SCUTs */
        For each configuration selected by at least one latch {
          If (selected configuration consists of scan modes only)
            Construct an SCUT with a single stage;
            Add new SCUT to SCUT_list[level];
          Else
            Combine the selected configuration of current level with all entries of SCUT_list[level - 1];
            Add new SCUT(s) to SCUT_list[level];
        }
        For all latches of all levels within the longest SCUT {
          If (time borrowing = false)
            Initialize sub-path_list[ ] for this latch to trace tested paths;
        }
        For all stage inputs within the longest SCUT {
          Initialize sub-path_list[ ] for the stage input to trace tested paths;
        }
        For each SCUT in SCUT_list[level] {
          Based on the latch configurations,
            Remove/inactivate transitive fanins of latches in scan mode;
            Remove/inactivate transitive fanins of stage outputs except for those in the last stage;
            Determine the current primary inputs and primary output;
          /* Test of an SCUT */
          Call TestATPG procedure for the current SCUT {
            For each target path {
              Clear line values;
              PreProcessRobust( ) {
                Robustly sensitize the target path;
                If (any line is removed), skip the target;
                If (latch is met that is in normal w/o time borrowing)
                  Target the path starting at this latch;
              }
              ATPGprocedure( ) {
                Generate test for the target path;
                Removed/inactivated parts should be ignored;
                Implication( ) considering glitch-free signal at output of latches w/o time borrowing;
                Write test vector to a file;
                Accumulate PDF coverage information;
                A path is not tested more than once;
              }
            }
          }
        } /* end of testing all SCUTs of the current level */
        Get time borrowing results at the current output latches;
      }
    }

The proposed test generation approach is illustrated using a three-stage linear pipeline shown in Figure 9. Suppose that for level-1 latches, (LD, LE, LF), DFT is designed to support three configurations: {(n, n, n), (n, s, s), (s, s, s)}; and for level-2 latches, (LG, LH, LI), to support two configurations: {(n, n, n), (s, s, s)}. It is assumed that time borrowing occurs at LD, LE, and LI in a copy of the chip under test. The SCUTs at each level are shown in gray in Figure 9. Bold solid lines are used to represent the paths targeted in each SCUT, and dotted lines are used to represent the paths that are not targeted but used to apply values at off-path inputs. The fan-ins of latches operating in scan mode are ignored in Figure 9 since they are not considered by the ATPG.
As shown in the figure, the hazard-free property described in Property 2 is exploited when the non-time borrowing latches LF, LG, and LH operate in normal mode as off-path inputs. The test procedure is summarized as follows. First, C0 is tested (SCUT0) and time borrowing is detected at LD and LE. To target the two-segment paths via LD, the configuration (n, s, s) is selected at level-1 latches to obtain SCUT10. To target the paths via LE, (s, n, s) is desired but not available. Therefore, (n, n, n) is selected for level-1 latches to obtain SCUT11; in this configuration, Property 2 is exploited at the non-time borrowing off-path latch LF. To target the sub-paths starting from LF, (s, s, s) is selected for level-1 latches to obtain SCUT12. During testing of SCUT10, SCUT11, and/or SCUT12, time borrowing is detected at LI. For the sub-paths starting at LG and LH, the best configuration (s, s, s) is used for the level-2 latches. Lastly, to test the multi-segment paths via LI, (n, n, n) is selected for the level-2 latches. When combined with each of the three previous SCUTs, namely SCUT10, SCUT11, and SCUT12, this gives rise to SCUT21, SCUT22, and SCUT23.

The proposed approach is applied to several circuits for various time borrowing scenarios under a diverse set of available scan chain configurations. The circuits include a five-stage linear pipelined array multiplier, five- and ten-stage versions of a linear pipeline that uses copies of the circuit C17 from the ISCAS ’85 benchmark suite (the connections among the stages are based on the pipeline used in [9]), five- and ten-stage versions of a pipeline MIN (minimum vector selector from [9]), and five- and ten-stage versions of a pipeline that uses copies of T1 from [1]. To verify the improved coverage provided by the proposed method, the robust PDF coverage and the number of tests are compared for four different approaches under the given scan chain configurations.
(1) The entire CUT is tested as a single SCUT, since classical approaches cannot use DFT for delay testing of latch-based circuits with time borrowing.

(2) The approach in [9] is modified to deal with cases where not all scan chain configurations are available. In particular, the approach in [9] does not consider any restriction on the scan chain configurations, and configures the latches such that (a) normal mode is used for all latches that are sites of time borrowing, and (b) scan mode is used for all latches that are not sites of time borrowing. Note that condition-(b) is not required but used to attain higher coverage. Hence, the extended version of the approach in [9] (ITC03.ext) is implemented such that it chooses a single configuration for a level of latches that satisfies condition-(a) and has the most NTBLs in scan mode. If multiple configurations satisfy these two conditions and have the same number of latches in scan mode, ITC03.ext arbitrarily selects one of them.

(3) We claim that the approach we propose improves the test quality due to two new features:
    (F1) Improvements due to Theorems 5 and 6 (and Corollaries 1 and 2), which suggest using scan mode for as many off-path latches as possible.
    (F2) Improvements due to Property 2, which utilizes the hazard-free property at the outputs of latches that are not sites of time borrowing but are configured in normal mode.
The first version of the proposed method, Proposed.v1, implements F1 only.

(4) The final version of the proposed approach, Proposed.v2, implements both new features, i.e., F1 as well as F2.

The robust PDF coverage for ITC03.ext is always greater than or equal to that for the classical approach, regardless of what scan chain configurations are available. Proposed.v1 may improve test quality compared to ITC03.ext if Theorems 5 and 6 (i.e., F1) are applicable. In Proposed.v2, the test quality can be further improved compared to Proposed.v1 if Property 2 (i.e., F2) is applicable.
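The selection rule described for ITC03.ext in item (2) can be sketched as below: among the available configurations that keep every time borrowing latch in normal mode, pick one with the most NTBLs in scan mode. The function name and data encoding are assumptions for illustration; ties are broken by list order, which stands in for the arbitrary choice.

```python
def itc03_ext_select(available, is_tbl):
    """Pick a single configuration for one level of latches.

    available: list of mode tuples ('n' = normal, 's' = scan).
    is_tbl:    tuple of booleans, True where time borrowing was detected.
    Returns None if no available configuration satisfies condition (a).
    """
    candidates = [cfg for cfg in available
                  if all(mode == "n" for mode, tbl in zip(cfg, is_tbl) if tbl)]
    if not candidates:
        return None
    # Condition (b), relaxed: maximize the number of NTBLs in scan mode.
    return max(candidates,
               key=lambda cfg: sum(1 for mode, tbl in zip(cfg, is_tbl)
                                   if mode == "s" and not tbl))

# Three latches with the configurations {(n, n, n), (n, s, s), (s, s, s)}.
available = [("n", "n", "n"), ("n", "s", "s"), ("s", "s", "s")]
print(itc03_ext_select(available, (True, True, False)))    # ('n', 'n', 'n')
print(itc03_ext_select(available, (True, False, False)))   # ('n', 's', 's')
print(itc03_ext_select(available, (False, False, False)))  # ('s', 's', 's')
```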
In practice, the latches at a particular level of a pipeline are connected by a scan chain to scan out the captured responses while testing the SCUTs that terminate at the latches. In this case, we can use the same scan chain to scan in vectors to test SCUTs that start at these latches. Hence, we assume in general that the all-scan configuration is available by default at a level of latches where all latches can be scanned out. For this reason, we present in Table 1 the experimental results for cases where the all-scan configuration is assumed to be available at every level of latches. We have performed an extensive set of experiments for all of the above circuits (a linear pipelined multiplier and five- and ten-stage versions of linear pipelines using copies of C17 [9], MIN [9], and T1 [1]), assuming arbitrary scan chain configurations, and under various time borrowing scenarios to demonstrate the benefits of the proposed techniques. The complete results for all circuits for these diverse scan chain configurations can be found in [11]. Here we present a small subset of results that illustrate the typical trends.

Table 1 shows the test generation results for a five-stage pipelined multiplier for two different scan chain configurations, namely configuration-A and configuration-B, and six different time borrowing scenarios, namely S1 to S6. The scan chain configuration-A assumes that only the all-normal and all-scan configurations are available at every level of latches (see the third and fourth columns of Table 1). The scan chain configuration-B assumes that the all-normal, the all-scan, and all the k single-normal configurations, i.e., the configurations {(n, s, s, ⋅⋅⋅, s), (s, n, s, ⋅⋅⋅, s), (s, s, n, ⋅⋅⋅, s), ⋅⋅⋅, (s, s, s, ⋅⋅⋅, n)}, are available at every level of k latches (see the fifth and the sixth columns of Table 1).
The time borrowing scenarios are listed starting with the scenario with the fewest time borrowing sites (S1: no time borrowing) and ending with the one with the most (S6: time borrowing at every latch).

Table 1. Test generation results for a five-stage pipelined multiplier.

Available scan chain configurations at every level:
  Configuration-A: all-normal, all-scan
  Configuration-B: all-normal, all-scan, every single-normal

                     Configuration-A                     Configuration-B
  Approach           No. of tests   Robust PDF           No. of tests   Robust PDF
                                    coverage (%)                        coverage (%)

Time borrowing sites, S1: None
  Classical          640            9.54                 640            9.54
  ITC03.ext          478            100                  478            100
  Proposed.v1        478            100                  478            100
  Proposed.v2        478            100                  478            100

Time borrowing sites, S2: L12, L14, L34, L36, L37
  Classical          640            9.54                 640            9.54
  ITC03.ext          902            95.35                902            95.35
  Proposed.v1
  Proposed.v2

Time borrowing sites, S3: L10, L14, L24, L28, L33, L39, L44
  Classical          640            9.54                 640            9.54
  ITC03.ext          1459           9.54                 1443           36.61
  Proposed.v1
  Proposed.v2 °

Time borrowing sites, S4: L21, L24, L27, L31, L33, L36, L42, L44
  Classical          640            9.54                 640            9.54
  ITC03.ext          2158           18.65                2158           18.65
  Proposed.v1
  Proposed.v2 °

Time borrowing sites, S5: L14, L15, L21, L22, L27, L29, L30, L33, L37, L44
  Classical          640            9.54                 640            9.54
  ITC03.ext          1459           9.54                 1443           36.61
  Proposed.v1
  Proposed.v2 °

Time borrowing sites, S6: All latches
  Classical          640            9.54                 640            9.54
  ITC03.ext          1459           9.54                 1459           9.54
  Proposed.v1        1459           9.54
  Proposed.v2        1459           9.54

Markers in the table indicate where Theorems 5 and 6 (i.e., F1) improve the coverage over ITC03.ext and where Property 2 (i.e., F2) improves the coverage over ITC03.ext.

In Table 1 we also report the reasons behind test quality improvements provided by the proposed approaches Proposed.v1 and Proposed.v2, compared to the test quality of ITC03.ext. Note that the number of tests may not directly quantify test application times, because the number of stages constituting each SCUT may vary depending on the configurations used and the time borrowing scenario. In all scenarios, except the scenario where time borrowing occurs at none of the latches, the proposed approach improves the robust PDF coverage significantly while sometimes applying far fewer tests.
For example, in scan chain configuration-A with time borrowing scenario S3, ITC03.ext requires more than twice the number of tests needed for the classical approach, while obtaining the same low robust PDF coverage (9.54%). In contrast, the proposed approaches Proposed.v1 and Proposed.v2 achieve much higher coverage, namely 82.40% and 86.15%, respectively, while using far fewer tests, due to Theorems 5 and 6 (i.e., F1 in Section 3.10.1) and Property 2 (i.e., F2 in Section 3.10.1). Even in an extreme case of time borrowing, namely scenario S6, where time borrowing occurs at every latch, the proposed approach, under scan chain configuration-B, can achieve 100% robust PDF coverage due to Theorems 5 and 6 (i.e., F1) by using only k+2 scan chain configurations at a level with k latches, while ITC03.ext is unable to improve on the coverage of 9.54%. Similar improvements are observed in the pipelines using copies of MIN, C17, and T1, as demonstrated in [11]. Also as expected, it is confirmed that Property 2 improves the robust PDF coverage in almost all scan chain configurations where the all-scan configuration is not available.

With regard to the effect of pipeline length (the number of stages) on the test quality, our results for the five- and ten-stage pipelines using MIN, C17, and T1 in [11] illustrate that as the number of stages increases, the coverage decreases for the classical approach and ITC03.ext, while the coverage for the proposed approach does not decrease significantly. This shows that the proposed approach is more efficient for delay testing of pipelines with more stages when compared to the classical and ITC03.ext approaches.

The experiments demonstrate that our test approach, when applied under restricted scan chain configurations, does not sacrifice much of the benefit of the fully-adaptive approach while dramatically decreasing DFT overheads by using far fewer scan chain configurations.
In the next chapter, we propose an approach to minimize test application time.

CHAPTER 4. Test application cost minimization under maximum coverage

In Chapter 3, we focused on the optimization of robust PDF coverage and limited ourselves to approaches where test application starts from the first stage of a pipeline and gradually expands/moves to subsequent stages. The following example demonstrates the potential benefits of alternative test schedules that lead to a reduction of test application cost. Note that all the assumptions made in Chapter 3 also apply in this chapter.

Consider a simple two-stage pipeline as shown in Figure 10. Let us assume that scan chain configurations (n, n), (n, s), (s, n), and (s, s) are available for the level-1 latches (L1, L2). Hence, four SCUTs may be constructed for testing: SCUT0 (C0 by itself), SCUT10 (C1 by itself; using configuration (s, s) for level-1 latches), SCUT11 (C0 + C1; using configuration (n, s) for level-1 latches), and SCUT12 (C0 + C1; using configuration (s, n) for level-1 latches). According to Chapter 3, the maximum robust PDF coverage can be obtained when SCUTs are chosen for testing based on the time borrowing status at the level-1 latches. For example, according to Corollary 2, SCUT0 and SCUT10 must be tested if time borrowing does not occur at L1 and L2.

In order to compare different test schedules and their test application costs, each SCUT is viewed as a collection of multiple sets-of-paths under test (SPUTs). An SPUT is a group of all paths that start at a particular input latch, pass through a particular set of latches (if any), and terminate at a particular output latch. The paths in an SPUT are viewed as parts of an SCUT, which determines the latches used and their scan chain configurations. In other words, an SPUT specifies a group of paths as well as the scan chain configurations in use. In our notation, additional subscripts are used to distinguish the various SPUTs that constitute an SCUT.
To aid the explanation in this section, the hyphenated numbers in the subscript represent the indices of the latches that are included in the SPUT, and the number in parentheses indicates the index of the SCUT that contains the SPUT. For example, SPUT0-2-3(12) is the group of all paths that start at L0, pass via L2, and terminate at L3, where the scan chain configurations are as specified in SCUT12. In Figure 10, SCUT0 includes two SPUTs: an SPUT containing all paths that start at L0 and terminate at L1 (SPUT0-1(0)) and an SPUT containing all paths that start at L0 and terminate at L2 (SPUT0-2(0)). SCUT10 includes two SPUTs: SPUT1-3(10) and SPUT2-3(10). SCUT11 contains three SPUTs: SPUT0-1-3(11), SPUT1-3(11), and SPUT2-3(11). SCUT12 contains three SPUTs: SPUT0-2-3(12), SPUT1-3(12), and SPUT2-3(12). All SPUTs of the circuit in Figure 10 are listed in the second column of Table 2.

Note that SCUT11 does not include the group of all paths that start at L0 and terminate at L2 (SPUT0-2(11)). SCUT11 would have to reconfigure L2 in r-capture scan-out mode in order to capture responses for the testing of SPUT0-2(11), whereas L2 is configured in scan mode in SCUT11 to apply values for the testing of SPUT0-1-3(11), SPUT1-3(11), and SPUT2-3(11). This modification of the scan chain configuration amounts to converting SCUT11 to SCUT0 as far as the paths between L0 and L2 are concerned. Hence, we need not test SPUT0-2(11) when SPUT0-2(0) is used. For the same reason, SCUT12 does not include SPUT0-1(12). Note that the paths in SPUT1-3(11) and SPUT2-3(12) are tested using Property 1 when time borrowing does not occur at L1 and L2, respectively. Also, note that the paths that start at L1 and terminate at L3 can be targeted as parts of more than one SCUT (i.e., as SPUT1-3(10), SPUT1-3(11), and SPUT1-3(12)), and the paths that start at L2 and terminate at L3 can likewise be targeted as parts of more than one SCUT (i.e., as SPUT2-3(10), SPUT2-3(11), and SPUT2-3(12)).
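The SPUT notation described above can be sketched in a few lines of Python. This is only an illustration: the names `Sput`, `parse_sput`, `SCUTS`, and `sputs_between` are hypothetical and do not come from the dissertation, and the membership table simply transcribes the SPUTs listed in the text for the circuit of Figure 10.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class Sput(NamedTuple):
    latches: Tuple[int, ...]  # latch indices along the paths, input latch first
    scut: int                 # index of the SCUT that contains this SPUT


_LABEL = re.compile(r"SPUT(\d+(?:-\d+)*)\((\d+)\)")


def parse_sput(label: str) -> Sput:
    """Split a label such as 'SPUT0-2-3(12)' into latch indices and SCUT index."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not an SPUT label: {label!r}")
    latches = tuple(int(index) for index in match.group(1).split("-"))
    return Sput(latches, int(match.group(2)))


# SPUTs per SCUT, transcribed from the text (two-stage pipeline of Figure 10).
SCUTS: Dict[int, List[str]] = {
    0: ["SPUT0-1(0)", "SPUT0-2(0)"],
    10: ["SPUT1-3(10)", "SPUT2-3(10)"],
    11: ["SPUT0-1-3(11)", "SPUT1-3(11)", "SPUT2-3(11)"],
    12: ["SPUT0-2-3(12)", "SPUT1-3(12)", "SPUT2-3(12)"],
}


def sputs_between(start: int, end: int) -> List[str]:
    """All SPUT labels whose paths start at latch `start` and end at latch `end`."""
    found = []
    for labels in SCUTS.values():
        for label in labels:
            sput = parse_sput(label)
            if sput.latches[0] == start and sput.latches[-1] == end:
                found.append(label)
    return found
```

Calling `sputs_between(1, 3)` returns the three alternative SPUTs that cover the paths from L1 to L3, which is the redundancy the text points out.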
Let us assume that the SPUTs can be tested in any order, independent of the SCUTs to which they belong. Note that the test schedule of the test generation approach described in Chapter 3 is a special case where all SPUTs associated with each SCUT are tested one after another. The SPUT formulation is hence test scheduling at a finer level of granularity. We will further justify the use of SPUTs in Section 4.3.1. The primary focus of this motivation example is to introduce the fundamental ideas of the overall optimization problem and our proposed approach.

For this example of a two-stage pipeline, assume that the maximum robust PDF coverage is 95%. The number of tests for each SPUT is shown in the fourth column of Table 2. In order to evaluate the average test application cost for different test schedules, the test application cost is assumed to be proportional to the number of tests applied. (This implicitly ignores costs that may be associated with reconfiguring SCUTs and assumes that every test configuration requires an equal number of scan clocks. However, the same ideas can easily be extended to a more realistic definition of test cost, which will be discussed in Section 6.1.1.)

Three test schedules are considered in Sections 4.1.1 to 4.1.3, respectively, to show that the test application cost may be improved further without compromising the key benefit of the proposed approach presented in Chapters 2 and 3, namely high path delay fault coverage. Test schedule 1 in Section 4.1.1 is used to show the limitations of a non-adaptive approach. Test schedule 2 in Section 4.1.2 is based on the test schedule implied in Chapter 3. Test schedule 3 in Section 4.1.3 is used to show that the application cost can be reduced from that of Test schedule 2.
Although the paths that start at L1 and terminate at L3 are included in SPUT1-3(10), SPUT1-3(11), and SPUT1-3(12), the coverage of the paths between L1 and L3 using SPUT1-3(10) is greater than or equal to the coverages using SPUT1-3(11) and SPUT1-3(12), as per Theorem 6 and Corollary 1. Likewise, the coverage of the paths between L2 and L3 for SPUT2-3(10) is greater than or equal to the coverages for SPUT2-3(11) and SPUT2-3(12). In reality, there may exist cases where applying tests to paths in SPUT2-3(11) as a part of SCUT11 is better than applying tests to the same paths in the form of SPUT2-3(10), i.e., as a part of SCUT10, provided that the scan chain configurations of the circuit include the one required by SCUT11, that the cost associated with reconfiguring the circuit as SCUT10 is high, and that the coverages from SPUT2-3(11) and SPUT2-3(10) are identical. However, since the test application cost function is simplified in this chapter such that it is solely determined by the number of tests applied, it is assumed that test schedules prefer SPUT1-3(10) to both SPUT1-3(11) and SPUT1-3(12), and prefer SPUT2-3(10) to both SPUT2-3(11) and SPUT2-3(12).

Test schedules are applied to a set of chip instances characterized by Table 2, where the personality of each chip instance (chip personality) is characterized by the results of r-r tests for the SPUTs. (For simplicity, r-f tests are not included in this example.) It is assumed for simplicity that every chip instance has one of the nine chip personalities, P-1 to P-9, with the different characteristics shown in Table 2, and the percentages of chip instances that have each of these chip personalities are shown in the third row of Table 2. Although there are ten SPUTs, each of which either fails or passes r-r tests, not all 2^10 combinations of the results for r-r tests are possible due to dependencies among SPUTs.
For example, if both SPUT0-1(0) and SPUT1-3(10) pass r-r tests, SPUT0-1-3(11) is guaranteed to pass r-r tests. If SPUT0-1(0) passes r-r tests and SPUT0-1-3(11) fails r-r tests, then SPUT1-3(10) is guaranteed to fail r-r tests.

Table 2. Chip personalities: characteristics of chip instances based on the r-r test results and their distribution
| Chip personality | P-1 | P-2 | P-3 | P-4 | P-5 | P-6 | P-7 | P-8 | P-9 |
| Percentage of chip instances* | 45% | 10% | 18% | 11% | 9% | 3% | 2% | 1% | 1% |
| Existence of a fault | fault-free | faulty | faulty | fault-free | faulty | fault-free | faulty | fault-free | faulty |
| Maximum coverage | 95% | · | · | 95% | · | 95% | · | 95% | · |
| Time borrowing site | None | None | None | L1 only | L1 only | L2 only | L2 only | L1 and L2 | L1 and L2 |

| SCUT | SPUTA(B)** | Latches | No. of tests | Personalities marked Pass for r-r tests |
| SCUT0 (C0) | SPUT0-1(0) | L0-L1 | 4 | 5 of 9 |
| SCUT0 (C0) | SPUT0-2(0) | L0-L2 | 6 | 5 of 9 |
| SCUT10 (C1); (s,s) | SPUT1-3(10) | L1-L3 | 6 | 8 of 9 |
| SCUT10 (C1); (s,s) | SPUT2-3(10) | L2-L3 | 8 | 8 of 9 |
| SCUT11 (C0+C1); (n,s) | SPUT0-1-3(11) | L0-L1-L3 | 12 | 6 of 9 |
| SCUT11 (C0+C1); (n,s) | SPUT1-3(11) | L1-L3 | 6 | 8 of 9 |
| SCUT11 (C0+C1); (n,s) | SPUT2-3(11) | L2-L3 | 8 | 8 of 9 |
| SCUT12 (C0+C1); (s,n) | SPUT0-2-3(12) | L0-L2-L3 | 24 | 6 of 9 |
| SCUT12 (C0+C1); (s,n) | SPUT1-3(12) | L1-L3 | 6 | 8 of 9 |
| SCUT12 (C0+C1); (s,n) | SPUT2-3(12) | L2-L3 | 8 | 8 of 9 |
*: The percentage of chip instances with the particular chip personality.
**: A lists the indices of latches included in SPUTA(B); B is the index of SCUTB that covers SPUTA(B).

In this section, we assume that the chip personality distribution is available and use it to compare the efficiencies of different test schedules. More details on how to obtain the chip personality distribution are presented in Section 6.1.2 and the Appendix. According to Table 2, the chip instances that belong to P-1, P-4, P-6, and P-8 are fault-free.
The chip instances that belong to P-1, P-2, and P-3 have no time borrowing at the level-1 latches, the chip instances that belong to P-4 and P-5 have time borrowing only at L1, the chip instances that belong to P-6 and P-7 have time borrowing only at L2, and the chip instances that belong to P-8 and P-9 have time borrowing at both L1 and L2.

Suppose that a test engineer designed Test schedule 1 with the expectation that time borrowing would occur at L1 only and not at L2, where tests are applied in the non-adaptive order specified in Figure 11. The test procedure and results are summarized as follows. Note that Nk refers to the number of tests for step k and Rk refers to the percentage of chips that are tested in step k (e.g., N2 and R2 for T2). First, the tests for SPUT0-2(0) are applied to all chip instances (R1 = 100%, N1 = 6), where the chip instances in P-6, P-7, P-8, and P-9 fail these r-r tests, i.e., time borrowing is detected at L2 for P-6, P-7, P-8, and P-9. After that, the tests for SPUT2-3(10) are applied to all chip instances (R2 = 100%, N2 = 8), where the chips in P-3 (18%) fail r-r tests, i.e., the chip instances in P-3 (18%) are identified as faulty chips and discarded. Then, the tests for SPUT0-1-3(11) are applied to the remaining 82% of the chip instances (R3 = 82%, N3 = 12), where the chips in P-2, P-5, and P-9 (20%) are identified as faulty chips and discarded.

The overall average test application cost per chip is 23.84 (= $\sum_{k=1}^{3} N_k R_k$). This test schedule reports the robust PDF coverage of 95% for the chips in P-1, P-4, P-6, P-7, and P-8. However, the reported coverage is correct only for chip instances of types P-1 and P-4. Since Test schedule 1 is unable to adaptively change the subsequent SPUTs, it fails to test the multi-segment paths via L2 for P-6 and P-8 although time borrowing is detected by SPUT0-2(0), and hence the robust delay coverage reported for P-6 and P-8 is invalid.
Moreover, Test schedule 1 fails to identify the faulty chips of type P-7, i.e., it results in test escape. In summary, the above results indicate that it is necessary to design a test schedule such that it does not allow any test escape and achieves the maximum robust PDF coverage for each chip instance. Also, the test scheduling must be capable of adjusting the subsequent SPUTs adaptively such that all TBLs are identified and all multi-segment paths via TBLs are tested. This can be done by testing SPUT0-2-3(12) for the chips that fail SPUT0-2(0) (P-6, P-7, P-8, and P-9).

Test schedule 2 follows what we propose in Chapter 3, where tests are performed from the first stage of a pipeline and extended to include subsequent stages by constructing SCUTs adaptively based on the time borrowing sites identified during each SCUT testing. Hence, all SPUTs within SCUT0 are targeted first. Based on the level-1 latches identified as sites of time borrowing, multi-segment paths and/or single-segment paths in C1 are targeted adaptively. This schedule is summarized in Figure 12. From T1 and T2, all time borrowing sites are identified. Accordingly, P-1, P-2, and P-3 continue with T3, P-4 and P-5 with T3', P-6 and P-7 with T3'', and P-8 and P-9 with T3'''. At T3, the chip instances in P-2 (10%) are identified as faulty and discarded. At T4, the chip instances in P-3 (18%) are identified as faulty and discarded. At T4', the chip instances in P-5 (9%) are identified as faulty and discarded. At T4'', the chip instances in P-7 (2%) are identified as faulty and discarded. At T3''', the chip instances in P-9 (1%) are identified as faulty and discarded. The average value of the total test application cost is computed as follows:

$$\text{Overall average test application cost} = 100\% \times \sum_{k=1}^{2} N_k + 73\% \times N_3 + 63\% \times N_4 + 20\% \times \sum_{k=3'}^{4'} N_k + 5\% \times \sum_{k=3''}^{4''} N_k + 2\% \times N_{3'''} + 1\% \times N_{4'''} = 25.4$$

As described in Chapter 3, all faulty chip instances are identified, and all fault-free chip instances are tested robustly with the corresponding maximum coverage, since all SCUTs that are required to achieve the maximum coverage defined by Chapter 3 are tested by Test schedule 2.

Another test schedule is designed to demonstrate that Test schedule 2, which is based on Chapter 3, can be improved in terms of test application cost without compromising the robust PDF coverage. Test schedule 3 is shown in Figure 13. Whenever an SPUT that terminates at a level-1 latch is tested, the time borrowing status at the output latch is checked and multiple alternatives for subsequent testing are explored, as shown in Figure 13. In this test schedule, all fault-free chip instances are tested robustly with the maximum coverage, since all SCUTs that are required to achieve the maximum coverage defined by Chapter 3 are tested in Test schedule 3. The overall average test application cost is 22.68, using calculations similar to those in Section 4.1.2. This is a 10.7% improvement in test application cost compared to Test schedule 2, while identical robust PDF coverage is obtained for each chip personality.

Thus, this example illustrates that we can further improve the test application cost via optimal test scheduling. It also illustrates that the overall test application cost depends not only on the test application costs of the SPUTs used in the test schedule but also on the probabilities of time borrowing at the latches. Hence, the complexity of the test scheduling problem grows with the number of logic blocks and the number of latches.
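The cost figures quoted in this example can be checked with a short calculation. The sketch below is not part of the dissertation; the function name `expected_cost` is hypothetical. The steps of Test schedule 1 are taken directly from the text, whereas the step-to-SPUT assignment for Test schedule 2 is an assumption inferred from the prose and the test counts in Table 2 (Figure 12 is not reproduced here). Exact rational arithmetic is used to avoid floating-point noise.

```python
from fractions import Fraction


def expected_cost(steps):
    """Average cost per chip: sum of N_k * R_k over (tests, percent_of_chips) steps."""
    return sum(Fraction(tests) * Fraction(percent, 100) for tests, percent in steps)


# Test schedule 1 (non-adaptive), as stated in the text.
schedule1 = [
    (6, 100),   # T1: SPUT0-2(0)
    (8, 100),   # T2: SPUT2-3(10)
    (12, 82),   # T3: SPUT0-1-3(11), after P-3 (18%) is discarded
]

# Test schedule 2 (adaptive). ASSUMPTION: the SPUT tested at each step is
# inferred from the text and Table 2, not read from Figure 12.
schedule2 = [
    (4, 100), (6, 100),   # T1, T2: SPUT0-1(0), SPUT0-2(0)
    (6, 73),              # T3: SPUT1-3(10) on P-1..P-3
    (8, 63),              # T4: SPUT2-3(10) after P-2 is discarded
    (8, 20), (12, 20),    # T3', T4': SPUT2-3 and SPUT0-1-3(11) on P-4, P-5
    (6, 5), (24, 5),      # T3'', T4'': SPUT1-3 and SPUT0-2-3(12) on P-6, P-7
    (12, 2),              # T3''': SPUT0-1-3(11) on P-8, P-9
    (24, 1),              # T4''': SPUT0-2-3(12) on P-8 only
]

cost1 = expected_cost(schedule1)          # 23.84
cost2 = expected_cost(schedule2)          # 25.4
cost3 = Fraction("22.68")                 # value reported for Test schedule 3
improvement = (cost2 - cost3) / cost2     # about 10.7%

print(float(cost1), float(cost2), round(float(improvement) * 100, 1))
```

Under these assumptions the totals reproduce the reported 23.84 and 25.4, and the relative saving of Test schedule 3 over Test schedule 2 rounds to 10.7%.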
In order to accomplish the overall optimization of DFT design and testing for latch-based high-speed circuits with time borrowing, the minimization of test application cost must be achieved under the constraint that the test scheduling method guarantees the maximum delay fault coverage that can be obtained by the delay testing method proposed in Chapter 3. Next we present a systematic approach for the overall optimization problem under this constraint.

The test application cost minimization problem is similar to the classical test scheduling (or test scoring) problem [20][24][31], to the extent that the order in which test vectors are applied is important for reducing the expected value of the test application cost. However, as we explain in this section, our test application cost problem has some unique characteristics that make it more challenging than the classical test scheduling problem. In particular, passing/failing test results have different implications and benefits that depend on the time borrowing status at the input latch, the type of output latch (whether it is a primary output), and dependencies among tests. Furthermore, we must apply tests adaptively according to the time borrowing status identified during testing.

In the test approach proposed in Chapter 3 and the motivation example in Section 4.1, for simplicity of analysis it is assumed that a chip under test has either skipped r-f tests (since extreme time borrowing was not expected) or passed all r-f tests applied (since extreme time borrowing does not exist). However, in some chips, r-f tests may be used to identify faulty chip instances before applying many r-r tests, which can reduce the test application cost. On the other hand, applying all r-f tests for every SPUT may be impractically costly. Our preliminary ideas on how to incorporate r-f tests as well as r-r tests into the overall optimization problem are considered in Section 4.2.1, where we identify the meanings of test failure.
The benefits of r-f tests as well as r-r tests are discussed in Section 4.2.2. However, for simplicity, the rest of the problem formulation is carried out without considering r-f tests. See Section 6.1.3 for more details regarding the extension that considers r-f tests as well as r-r tests.

In conventional delay testing, a failing test simply identifies a faulty chip at that particular clock frequency, which can be discarded immediately without any further testing. In contrast, in our framework, failing a test does not necessarily mean that the chip under test is faulty. In addition, some tests are dependent on other tests. For instance, even when r-r tests for a target path pass, the latch at the end of the path (output latch) may possibly be a site of time borrowing in case the latch at the beginning of the path (input latch) is a TBL. In other words, passing r-r tests can definitively identify the time borrowing status at the output latch only if the input latch is identified as an NTBL. Table 3 summarizes the meanings of all possible results of r-r and r-f tests for a target SPUT that starts at Lin (input latch) and ends at Lout (output latch), under different conditions on Lin and Lout. Note that Lin is considered an NTBL if it is a primary input.

Table 3
| Case | Result of r-r tests | Result of r-f tests | Is Lout a primary output? | Time borrowing at Lin? | Time borrowing at Lout | Meaning |
| 1 | (Fail) | Fail | No | Yes/No | Yes | Faulty chip instance |
| 2 | Fail | n/a* | Yes | Yes/No | Yes | Faulty chip instance |
| 3 | Fail | Pass | No | Yes/No | Yes | Multi-segment paths via Lout must be tested |
| 4 | Pass | (Pass) | No | No | No for the target SPUT | Target SPUT does not borrow time at Lout |
| 5 | Pass | n/a* | Yes | No | No for the target SPUT | Target SPUT is fault-free |
| 6 | Pass | (Pass) | No | Yes | Unknown | Cannot determine time borrowing status at Lout since Lin is a time borrowing latch |
| 7 | Pass | n/a* | Yes | Yes | Unknown | Cannot determine time borrowing status at Lout since Lin is a time borrowing latch |
( ): test result automatically known from the other test result.
*: r-f tests are not applicable (n/a) since Lout is a primary output.
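The case analysis of Table 3 can be sketched as a small lookup function. This is an illustrative reading of the table, not code from the dissertation; the function name `interpret` and its parameters are hypothetical. An r-f result of `None` stands for "not applied or not applicable", and the function rejects the combination the text declares impossible (r-r pass with r-f fail).

```python
from typing import Optional, Tuple


def interpret(rr_pass: bool, rf_pass: Optional[bool],
              lout_is_primary_output: bool, lin_is_tbl: bool) -> Tuple[int, str]:
    """Return (case number, meaning) for one SPUT according to Table 3."""
    if rr_pass and rf_pass is False:
        # r-r tests always fail when the corresponding r-f tests fail.
        raise ValueError("impossible: r-r tests pass while r-f tests fail")

    if lout_is_primary_output:
        # r-f tests are not applicable when Lout is a primary output.
        if not rr_pass:
            return 2, "faulty chip instance"
        if lin_is_tbl:
            return 7, "time borrowing status at Lout cannot be determined"
        return 5, "target SPUT is fault-free"

    if not rr_pass:
        if rf_pass is None:
            raise ValueError("r-f result needed to tell Case 1 from Case 3")
        if not rf_pass:
            return 1, "faulty chip instance"
        return 3, "multi-segment paths via Lout must be tested"

    # r-r tests pass, so r-f tests are known to pass as well.
    if lin_is_tbl:
        return 6, "time borrowing status at Lout cannot be determined"
    return 4, "target SPUT does not borrow time at Lout"
```

For instance, `interpret(False, True, False, False)` yields Case 3: Lout is a time borrowing site, so the schedule must add the multi-segment SPUTs that pass via Lout.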
If Lout for an SPUT is a primary output of the latch-based part (Lout may be connected to flip-flop-based blocks), r-f tests are not necessary and hence only r-r tests are applicable (Cases 2, 5, and 7). If r-r tests for an SPUT pass, the r-f tests for the SPUT are known to pass. On the other hand, if r-f tests for an SPUT fail, r-r tests for the SPUT are known to fail. This dependency between r-r tests and r-f tests for any SPUT is denoted in Table 3 using parentheses. Table 3 includes all possible cases of the two test results as well as the conditions of Lin and Lout, except for the case where r-f tests for an SPUT fail while r-r tests for the SPUT pass. This case is impossible because r-r tests for an SPUT always fail if the corresponding r-f tests fail.

If r-f tests for an SPUT fail, the chip instance is identified as faulty regardless of the time borrowing status at Lin (Case 1). When Lout is a primary output, failing r-r tests for the SPUT indicates that the chip instance is faulty, regardless of the time borrowing status at Lin (Case 2). In Case 3, where r-r tests for an SPUT fail and r-f tests pass, Lout, which is not a primary output, is identified as a site of time borrowing regardless of the time borrowing status at Lin, and hence the multi-segment paths via Lout must be tested. In Cases 4 and 5, passing r-r tests implies that the target SPUT does not borrow time at Lout (Case 4) and that the SPUT is fault-free (Case 5), respectively, because Lin is known to be an NTBL. In contrast, in Cases 6 and 7, passing r-r tests cannot determine the time borrowing status at Lout, since Lin is a TBL. Hence, we see that passing r-r tests can determine the time borrowing status at Lout only if Lin is an NTBL. It should be noted that a test result for an SPUT may have dependencies with test results for other SPUTs.
For instance, in Cases 4 and 5, the test result for the current SPUT can be analyzed only if the input latch Lin is known to be an NTBL. In addition, in Case 3, since Lout turns out to be a TBL, it is necessary to test multi-segment SPUTs that pass via Lout. Hence, another important characteristic of the optimization problem is that it often requires adaptation, in the sense that the selection of subsequent target paths (i.e., SPUTs) is affected by the latches identified as sites of time borrowing by the previous tests. As we have seen in Section 4.1.1, non-adaptive operation may result in test escape and/or over- or under-estimation of robust PDF coverage. More details of the characteristics and formulation of the overall optimization problem will be presented in Sections 4.3 to 4.5.

When a test is applied to an SPUT, say SPUTk, we have an accumulated record of the test results from the SPUTs that have been tested prior to SPUTk. Depending on this record, the time borrowing status at the input latch, Lin, and the output latch, Lout, of SPUTk may or may not be known, based on the meanings of test results summarized in Table 3. In other words, the time borrowing status at Lin prior to testing SPUTk falls into one of the following three cases: (a) unknown (time borrowing status is unknown), (b) non-time borrowing, or (c) time borrowing. Similarly, the time borrowing status at Lout prior to testing SPUTk is one of the above three cases, i.e., (a), (b), or (c), if Lout is not a primary output, or one of the two cases, namely (a) and (b), if Lout is a primary output. (Note that when Lout is a primary output, the chip under test will be discarded if Lout is identified as a TBL, i.e., if any r-r test fails at Lout.) Application of a test to an SPUT provides different benefits depending on the accumulated record of the results of prior tests. First, suppose r-r tests are applied to SPUTk.
If the time borrowing status at Lout is already known to be either time borrowing or non-time borrowing prior to testing SPUTk, applying r-r tests to SPUTk does not provide any benefit in terms of fault coverage, knowledge of time borrowing status, or information used by subsequent tests of other SPUTs, regardless of the time borrowing status at Lin. Also, when the time borrowing status at Lout is unknown and Lin is known to be time borrowing, passing r-r tests for SPUTk at Lout provides no benefit (Cases 6 and 7 of Table 3). On the other hand, when SPUTk fails r-r tests under the condition that the time borrowing status at Lout is unknown, the benefit when Lout is not a primary output is that Lout is identified as a TBL (Case 3 of Table 3), and the benefit when Lout is a primary output is that the chip instance is identified as having a delay fault and is discarded (Case 2 of Table 3). When SPUTk passes r-r tests under the condition that the time borrowing status at Lout is unknown and Lin is known to be non-time borrowing (Cases 4 and 5 of Table 3), the benefit of the r-r tests of SPUTk is that this result may enhance the coverage if SPUTk and the other SPUTs terminating at Lout collectively and eventually identify Lout as an NTBL. However, if some other SPUT that terminates at Lout fails, this test result of SPUTk will not be used to enhance the coverage, since Lout is eventually identified as a TBL.

Second, suppose r-f tests are applied to SPUTk, whose output latch Lout is not a primary output. There is no benefit from such r-f tests if Lout is known to be an NTBL, since these tests will always pass. Only when SPUTk fails r-f tests and the time borrowing status at Lout is either unknown or time borrowing do r-f tests provide a benefit, namely that the chip instance is identified as faulty and discarded, regardless of the time borrowing status at Lin (Case 1 of Table 3).
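The benefit analysis above can be sketched as two small functions. This is a hedged illustration rather than the dissertation's method: the names `rr_benefit` and `rf_benefit` and the status strings `"unknown"`, `"ntbl"`, and `"tbl"` are hypothetical. One combination (r-r tests pass while the status of both Lin and Lout is unknown) is not addressed in the text, so the sketch returns `None` for it instead of guessing.

```python
from typing import Optional

# Status values are assumptions of this sketch: "unknown", "ntbl", or "tbl".


def rr_benefit(lin: str, lout: str, passed: bool,
               lout_is_primary_output: bool) -> Optional[str]:
    """Benefit of applying r-r tests to an SPUT, given what is already known."""
    if lout != "unknown":
        return "none"  # status at Lout already known: nothing new is learned
    if not passed:
        if lout_is_primary_output:
            return "fault detected"
        return "time borrowing detected"
    if lin == "tbl":
        return "none"  # passing result is inconclusive when Lin borrows time
    if lin == "ntbl":
        return "may enhance coverage"
    return None  # Lin unknown and tests pass: not covered by the text


def rf_benefit(lout: str, passed: bool) -> str:
    """Benefit of applying r-f tests to an SPUT whose Lout is not a primary output."""
    if lout == "ntbl":
        return "none"  # r-f tests always pass at a non-time borrowing latch
    return "none" if passed else "fault detected"
```

A scheduler could use such functions to skip tests whose benefit is "none" given the record accumulated so far, which is the pruning opportunity the text describes.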
The benefits of r-r tests and r-f tests are summarized in Table 4 and Table 5, respectively.

Table 4
| Case | Time borrowing status at Lin known from previous test results | Time borrowing status at Lout known from previous test results | Results for r-r tests applied | Benefit |
| 1 | Time borrowing / non-time borrowing / status unknown | Time borrowing / non-time borrowing | Pass/fail | None |
| 2 | Time borrowing | Unknown | Pass | None |
| 3 | Non-time borrowing | Unknown | Pass | May enhance coverage if this result and other results of SPUTs terminating at Lout collectively identify Lout as being a non-time borrowing latch |
| 4 | Time borrowing / non-time borrowing / status unknown | Unknown | Fail | 1. Time borrowing detected (if Lout is not a primary output); 2. Fault detected (if Lout is a primary output) |

Table 5
| Case | Time borrowing status at Lin known from previous test results | Time borrowing status at Lout known from previous test results | Results for r-f tests applied | Benefit |
| 1 | Time borrowing / non-time borrowing / status unknown | Non-time borrowing | (Always pass) | None |
| 2 | Time borrowing / non-time borrowing / status unknown | Time borrowing / status unknown | Pass | None |
| 3 | Time borrowing / non-time borrowing / status unknown | Time borrowing / status unknown | Fail | Fault detected |

In summary, our problem is significantly more complex than the conventional test scheduling problem, since passing or failing r-r tests for an SPUT has different meanings depending on the time borrowing status at the input latch, the type of output latch (whether or not it is a primary output of a latch-based part of a circuit), and the results of r-r tests for other SPUTs (i.e., dependencies among SPUTs). Scheduling r-f tests is somewhat similar to the conventional test scheduling problem in the sense that a faulty chip is identified and discarded if r-f tests for an SPUT fail at any stage of testing. However, passing r-f tests for an SPUT does not provide any coverage for the SPUT, which makes even r-f tests different from conventional testing.

In Section 4.1, the notion of SPUT (set-of-paths under test) is first introduced to replace SCUT.
Recall that an SPUT is the group of all paths that start at a particular input latch Lin, pass via a particular sequence of latches (if any), and terminate at a particular output latch Lout. Hence, each SCUT is viewed as a collection of multiple SPUTs. We call this finer-grained approach an SPUT-based approach. Test scheduling at such a finer granularity (i.e., SPUT) can reduce the overall test application cost. For example, suppose there are eight SPUTs terminating at a latch L. If a majority of the chips under test borrow time at L and this can be detected by testing one particular SPUT, then the test application cost may be reduced by testing this particular SPUT first.

Test scheduling may be performed at an even finer granularity (i.e., path) than SPUT. Such a path-based approach must be supported by diagnostic methods to identify the path(s) that cause the observed test failure. Accordingly, Observations 1, 2, and 3 regarding the identification of TBLs and NTBLs in Section 3.2.3 must be modified as follows in the context of paths in order to implement a path-based approach. In a path-based approach, a latch L is identified as a TBL for the path(s) that cause failure of tests at L (Modified Observation 2 for a path-based approach). If a latch L is identified as a TBL for a path p, all multi-segment paths that pass via L and cover p must be targeted (Modified Observation 3 for a path-based approach). L is identified as an NTBL for all other paths that do not cause failure of tests at L (Modified Observation 1 for a path-based approach). In other words, L can be regarded either as an NTBL or as a TBL depending on the path under consideration. This is likely to reduce the number of target multi-segment paths. However, it also implies that all paths in the fan-in of L must be tested even after a test fails for path p in order to
This is likely to increase the number of tests, especially when most paths in the fan-in of L fail the tests. Recall that in an SPUT-based (or SCUT-based) approach, a latch is regarded as a TBL as soon as one path fails at the latch, and no more tests are needed for the paths that terminate at the latch. Consequently, a path-based approach does not necessarily guarantee a reduction of test application cost. Implementing a path-based approach also entails considerable complication in test generation and in the test procedure. For every test, a path-based approach must be supported by diagnostic methods that identify the paths that cause the observed test failure. Such diagnosis would add impractically high run-time |
| Filename | etd-Chung-2307 |
| Archival file | uscthesesreloadpub_Volume65/etd-Chung-2307.pdf |