ADVANCED SERIAL SCAN DESIGN FOR TESTABILITY

by

Rajesh Gupta

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Engineering)

May 1991

Copyright 1991 Rajesh Gupta

UMI Number: DP22816. All rights reserved. Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90089-4015

This dissertation, written by Rajesh Gupta under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dedication

As this dissertation is the last step in my formal education, I would like to dedicate it to my parents.

Acknowledgements

I am grateful to Prof. Melvin Breuer for the inspiration and guidance I received from him during my dissertation work. In addition, I would like to thank him for providing access to excellent hardware and software resources at all times.

During my years at USC I benefited greatly from interacting with many colleagues and friends. In particular I would like to mention Dr. Rajiv Gupta, Kuen-Jong Lee, Jung-Cheun Lien, Rajagopalan Srinivasan, Amit Majumdar, and Dr. Charles Njinda. Sridhar Narayana helped me obtain some of the experimental results. I also wish to thank Prof. Alice Parker, who gave me some useful feedback, and Prof. Ming-Deh Huang, who served on my dissertation committee.

I would like to acknowledge the financial support provided by the Defense Advanced Research Projects Agency through Contract No. N00014-87-K-0861 (monitored by the Office of Naval Research) and by the Semiconductor Research Corporation through Contract No. 88-DP-075.

Contents

Dedication
Acknowledgements
Abstract
1 Introduction
  1.1 Design for Testability
  1.2 Serial Scan Design
    1.2.1 Circuit Modification
    1.2.2 Test Generation and Application
  1.3 Scan Design Costs
  1.4 Reducing Scan Design Costs
2 Background
  2.1 Partitioning Approaches
  2.2 Partial Scan Approaches
  2.3 Multiple Scan Chain Approaches
3 Partial Scan Design with Balanced Structures
  3.1 Introduction
  3.2 Basic Circuit Model
  3.3 B-Structures and their Properties
  3.4 Scan Design Using B-Structures
  3.5 Proof of Correctness
    3.5.1 Single-Pattern Testability
    3.5.2 Generating Single-Pattern Tests
    3.5.3 Observations on Testing B-Structures
  3.6 Algorithm for Scan Register Selection
    3.6.1 Removal of Feedback Arcs
    3.6.2 Balancing Acyclic Sequential Structures
      3.6.2.1 Verification Procedure
      3.6.2.2 Balancing Procedure
  3.7 Implementation of Scan Path
    3.7.1 Example
    3.7.2 Construction of Scan Path
    3.7.3 Test Application
    3.7.4 Circuit Modifications
  3.8 Testing Register Functional Modes
  3.9 Case Study
  3.10 Summary
4 Partial Scan Design with Unbalanced Structures
  4.1 Introduction
  4.2 Optimal Test Scheduling
    4.2.1 The Compaction Principle
    4.2.2 Modeling Schedule Constraints
    4.2.3 Picking a Schedule
  4.3 Test Generation Model
    4.3.1 Condensing the Test Generation Model
    4.3.2 Test Pattern Generation
  4.4 Summary
5 Partial Scan Design of Circuits Containing Switches
  5.1 Switches
  5.2 Circuit Model
    5.2.1 Atomic Combinational Logic Units
    5.2.2 Generalized Topology Graph
  5.3 Switched Balanced Structures
    5.3.1 The Class of SB-Structures
    5.3.2 Testability Properties of SB-Structures
  5.4 Algorithm for Scan Register Selection
    5.4.1 Removal of Feedback Registers
    5.4.2 Balancing Acyclic Sequential Structures
      5.4.2.1 Verification Procedure
      5.4.2.2 Balancing Procedure
  5.5 Partial Scan Testing Using I-Paths
    5.5.1 I-Paths
    5.5.2 Kernels with I-Paths
    5.5.3 Unsatisfiable Kernels
  5.6 Finding a Satisfiable Kernel
    5.6.1 Expansion Procedure
    5.6.2 Dealing with No-Match Conflicts
    5.6.3 Dealing with Data Conflicts
    5.6.4 Dealing with Control Conflicts
    5.6.5 Satisfiability Procedure
    5.6.6 Example
  5.7 Summary
6 Partitioned Partial Scan Testing
  6.1 Introduction
  6.2 Output-Based Partitioning
  6.3 Switch-Based Partitioning
  6.4 Size-Based Partitioning
  6.5 Global Partitioning Strategy
  6.6 Summary
7 Test Scheduling
  7.1 Introduction
  7.2 The Test Scheduling Problem
    7.2.1 Test Application with Multiple Scan Chains
    7.2.2 Kernel Relationships
      7.2.2.1 Incompatible Kernels
      7.2.2.2 Dependent Kernels
    7.2.3 Modeling Test Relationships
    7.2.4 The No-Dependence Scheduling Problem
  7.3 General Test Scheduling Algorithm
    7.3.1 Terminology
    7.3.2 Incremental Scheduling
    7.3.3 Optimal Scheduling
    7.3.4 Discussion
  7.4 Summary
8 Scan Path Chaining
  8.1 The Chaining Problem
  8.2 Test Application in Fully Compatible Designs
    8.2.1 Combined Test
    8.2.2 Separate Test
    8.2.3 Overlapped Test
    8.2.4 Comparison
  8.3 Single Chain in Fully Compatible Design
    8.3.1 Flip-Flop Position Ranges
    8.3.2 Single Chain Algorithm
    8.3.3 Non-Ideal Solutions
    8.3.4 A Simplifying Transformation
    8.3.5 Case Study
  8.4 Multiple Chains for Two Compatible Kernels
    8.4.1 Modeling the Problem
    8.4.2 Problem Characteristics
    8.4.3 Constructing Optimal Chains
      8.4.3.1 Nonlinear Problem Formulation
      8.4.3.2 Linearizing the Problem
    8.4.4 Discussion
    8.4.5 Experimental Results
  8.5 The Rest of the Iceberg
9 Conclusion
  9.1 Partial Scan Design
  9.2 Partitioning
  9.3 Test Scheduling
  9.4 Scan Path Chaining
Reference List

List of Tables

3.1 Comparison of full scan and BALLAST partial scan.
4.1 Relationships among inputs and outputs.
8.1 Results of multiple scan path chaining for circuit with $l_1 = 20$, $l_2 = 80$, $n_1 = 15$, and $(n_2 + n_{12}) = 10$.
8.2 Average saving in test time for different circuit examples over various numbers of scan chains.

List of Figures

1.1 General synchronous circuit. (a) Huffman model, (b) circuit with scan path.
1.2 Storage element designs for various scan techniques.
1.3 Example of use of switching elements for partial scan.
3.1 Illustration of BALLAST methodology. (a) Synchronous circuit, (b) partial scan design of (a).
3.2 Example of topology graph.
3.3 (a) Kernel of Figure 3.1(b); (b) combinational equivalent of (a).
3.4 Additional illustrations of the methodology. (a) Example with HOLD mode; (b) kernel of (a); (c) example with unbalanced paths; (d) kernel of (c).
3.5 (a) Test for e stuck-at-1, (b) transformed test with no HOLD operations.
3.6 Illustration of balance procedure. (a) Original topology graph, (b) balanced topology graph. (Bold arcs represent HOLD registers.)
3.7 Various scan design solutions for the same circuit.
3.8 Example of scan path implementation.
3.9 Organization of scan path into groups of scan path registers.
3.10 Modeling FF functional modes. (a) FF connecting two clouds, (b) model of RESET operation, (c) model of PRESET operation, (d) model of HOLD operation.
3.11 Combinational equivalent for detecting HOLD faults.
4.1 Example of ACYST. (a) Partial scan design, (b) acyclic kernel, (c) test generation model.
4.2 Example of acyclic kernel.
4.3 Construction of schedule constraint graph. (a) Constraints for (H, D) only; (b) constraints for (H, D) and (A, D).
4.4 Basic test generation model. (a) Acyclic structure; (b) basic TGM.
4.5 TGM for balanced structure. (a) Balanced structure, (b) combinational equivalent.
4.6 Condensation of test generation model. (a) Step 1, (b) step 2.
5.1 General form of a switch. (a) MUX, (b) bus.
5.2 Generalized topology graph. (a) Circuit showing labels, (b) GTG with ACLUs as nodes.
5.3 SB-structure example. (a) SB-structure, (b) combinational equivalent.
5.4 Transformation of the GTG for feedback register analysis. (a) Circuit, (b) GTG $G$, (c) transformed GTG $G_f$.
5.5 Illustration of checkSW procedure.
5.6 Illustration of balanceSW where no finite mincut exists.
5.7 Illustration of kernels with I-paths. (a) Circuit with two kernels, (b) test plan for K1, (c) test plan for K2.
5.8 Use of I-paths in reducing partial scan overheads. (a) Circuit, (b) test plan.
5.9 Relationship between maximal ($K_{max}$) and minimal ($K_{min}$) kernels.
5.10 No-match situation. (a) Kernels K1, K2; (b) test plan for K2.
5.11 Unsatisfiable kernel due to data conflict. (a) Circuit, (b) test plan showing conflict.
5.12 Relationship between maximal ($K_{max}$), minimal ($K_{min}$) and minimal satisfiable ($K_{sat}$) kernels.
5.13 Resolving a no-match conflict for K1 in Figure 5.10.
5.14 Schematic illustration of primary data conflicts. (a) Input side of kernel, (b) output side of kernel.
5.15 Kernel minimization: I-paths and minimal kernel.
5.16 Kernel minimization: I-paths and minimal satisfiable kernel.
6.1 Subdividing a kernel by output-based partitioning.
6.2 Overlapping kernels. (a) Low overlap, (b) high overlap.
6.3 Subdividing a kernel by switch-based partitioning.
6.4 Illustration of switch-based partitioning procedure. (a) Original kernel K, (b) state space, (c) kernels generated, (d) combinational equivalents.
6.5 Subdividing a kernel by size-based partitioning.
7.1 Example of partitioned testing. (a) Partitioned kernels, (b) test relationship graph.
7.2 TRG example to illustrate dependence groups.
7.3 Illustration of incremental scheduling: (a) partial schedule S', (b) incremented schedule S'' without interruptions, (c) incremented schedule S'' with interruption.
7.4 Implicit enumeration search space for generating a schedule.
7.5 Schedules generated by the search algorithm. (a) Schedule for (A, B, C) as well as (B, A, C); (b) schedule for (C, B, A).
8.1 Circuit example for illustrating different test schemes.
8.2 Example of single scan path chaining problem.
8.3 Ranges for placement of scan FFs. (a) For session $TS_3$, (b) for session $TS_2$, (c) for session $TS_1$, (d) combined ranges.
8.4 Illustration of computation of minimum session cycle. (a) Case 1, (b) case 2, (c) case 3.
8.5 Example of single scan chain ordering.
8.6 Example of single scan path chaining with empty ideal range for a register.
8.7 Example of single scan path chaining with no perfect matching.
8.8 Case study for single scan chain. (a) Schematic description of circuit, (b) optimal scan chain ordering.
8.9 Circuit model for two-kernel multiple scan chain study.
8.10 Optimal chaining solutions. (a) Original configuration, (b) in region A, (c) in region B, (d) in intersection region.
8.11 Venn diagram of search space for multiple chain design problem.
8.12 Decrease in overall test time with multiple scan chains.
9.1 SIESTA 1.0 system architecture.
9.2 SIESTA user interface.
9.3 The partial scan design space.

Abstract

Serial scan design is an approach to design for testability that can greatly reduce the cost of test generation for sequential circuits. It represents a class of techniques in which the storage elements are connected together into a continuous shift register chain in test mode, with the ends of the chain connected to I/O pins. This allows the storage elements to be fully controlled and fully observed, simplifying the test problem and reducing it to that of testing a combinational structure. Despite its benefits, serial scan design is often unattractive to circuit designers because of overheads in chip area, performance, pin count, and/or test time. Further, the traditional form of serial scan design is fairly rigid and does not provide flexibility in meeting specific design goals. This thesis presents an integrated scan design methodology called SIESTA that uses a range of strategies to reduce scan design costs. The concept of partial scan, in which only a subset of the circuit's storage elements is included in the scan path, is used to reduce area overhead and performance costs. Techniques are presented for selecting scan storage elements such that the overheads are minimized while ensuring that combinational test generation is sufficient for the resulting design. Partitioning for test is employed to help reduce test generation and test application costs. Algorithms are presented for efficiently scheduling tests for various circuit partitions so as to minimize the overall circuit test time.
A study of the scan path chaining problem, for both single and multiple chain designs, is also presented. The results demonstrate that an optimal configuration of multiple scan chains for minimum test time may actually have scan chains of unequal lengths. Given a circuit under design, the SIESTA methodology is able to apply different strategies to generate a range of testable design solutions that trade off different design costs against each other. Thus it can help meet the specific goals and constraints on the circuit.

Chapter 1

Introduction

"If you would know what nobody knows, read what everybody reads, just one year afterwards."
    Ralph W. Emerson

The objective of the research presented here is to develop an integrated scan design system. Various scan design schemes have been developed previously, but implementing them optimally has remained an open problem. In the following sections we discuss the need for design for testability in general, study the well-known serial scan DFT techniques, and explain why an integrated scan system is required.

1.1 Design for Testability

Due to the rapid increase in the density of digital ICs, the amount of logic within a single chip has become extremely high. This makes chips hard to test, since there is little access to the internal circuit elements except through the I/O pins. Automatic test pattern generation (ATPG) is a computation-intensive problem even for combinational circuits. For general sequential circuits the test generation problem is almost intractable due to the difficulty of bringing the circuit to an arbitrary state.

Design-for-testability (DFT) techniques [1] are intended to reduce the difficulty of test generation by providing increased access to the internal elements of a circuit, particularly the storage elements.
Ad hoc DFT techniques, as the name implies, provide guidelines for circuit designers on how to increase controllability and observability, the two key factors related to test generation. Some of these techniques dictate that special test points be added to parts of a circuit that are difficult to access. Structured DFT techniques, on the other hand, provide well-defined design rules and are usually associated with a specific test methodology such as deterministic testing, exhaustive testing, or pseudorandom testing. Examples of structured DFT techniques are Built-In Logic Block Observer (BILBO) [2] and Level-Sensitive Scan Design (LSSD) [3]. BILBO is an example of a fully built-in DFT technique, in which test data is generated and analyzed on the chip, at the cost of high overhead. LSSD is a type of serial scan technique in which test data is stored off the chip, which means that a large fraction of the test time is taken up in moving data on and off the chip. Both techniques have advantages and disadvantages, which will not be discussed here. The Testable Design Expert System (TDES) [4] was developed to handle the tradeoffs among various techniques by using a common model for describing such techniques. In this research, however, we focus on serial scan design techniques and their particular implementation issues.

1.2 Serial Scan Design

Serial scan design represents a range of DFT techniques having the distinguishing characteristic that test data is stored outside the circuit and is shifted in and out of it serially. This is normally achieved by connecting some or all of the internal storage elements (latches or flip-flops) into a shift register when in the test mode, and providing access to the end points of the shift register through I/O pins. This shift register is known as a scan path.
Instead of a single scan path, multiple scan paths may be used, with their end points connected to separate pins or multiplexed to share the same pins.

A generic scan architecture is illustrated in Figure 1.1. Figure 1.1(a) shows the Huffman model representation of a general synchronous sequential circuit. It consists of a block of combinational logic (which may or may not be fully connected) along with a set of storage elements connected in feedback loops. Figure 1.1(b) shows a modified circuit in which all the storage elements can be configured into a scan path in test mode. This architecture is known as full scan design, since all storage elements in the circuit are included in the scan path. The scan path can be accessed via scan-in (SI) and scan-out (SO) pins. At least one additional test control signal, which is not shown, is required for distinguishing between the test mode and the normal operation mode. Thus three additional I/O pins are required, irrespective of the type of scan design used. It is possible to multiplex the scan-in and scan-out pins (but not the test control pin) with existing pins used for normal system operation, at the cost of some additional multiplexing logic.

[Figure 1.1: General synchronous circuit. (a) Huffman model, (b) circuit with scan path.]

1.2.1 Circuit Modification

Each storage element in the scan path needs to be modified so that in the test mode it can shift data serially. A large number of storage element designs have been proposed [3, 5, 6, 7]. The various designs can usually be characterized by the type of storage elements they use. There are three basic types [8]: single latches, double latches (master-slave latches), and flip-flops. For each type, one or more of the storage element designs in Figure 1.2 may be applicable. The 2PSRL and MDSRL are applicable to latch-based designs; the 2PFF and MDFF are applicable to FF-based designs. In each storage element design, L1 and L2 are latches; D1 and D2 are the system and test data inputs, respectively; CK, CK1, and CK2 are system clocks. In Figures 1.2(b) and 1.2(d), T is a test mode control signal that determines which of the data inputs is read in. In Figures 1.2(a) and 1.2(c), TCK is a test clock signal that enables the test data to be loaded when a pulse is applied to it. All designs may have an optional LOAD ENABLE control signal LE (not shown).

[Figure 1.2: Storage element designs for various scan techniques. (a) 2PSRL: 2-port shift register latch; (b) MDSRL: multiplexed data shift register latch; (c) 2PFF: 2-port flip-flop; (d) MDFF: multiplexed data flip-flop.]

All circuits constructed using one of the above scan path storage element (SPSE) designs have two basic modes of operation. In the normal mode the circuit carries out its normal system function. In the test mode the test mode control signal or the test clock signal (whichever is applicable) is activated so that the SPSEs behave collectively as a serial shift register.

1.2.2 Test Generation and Application

A set of test patterns for a scan-testable circuit can be obtained by removing the SPSEs from it and carrying out ATPG on the remainder of the circuit. Assuming that all storage elements are in the scan path, the remainder of the circuit is fully combinational. Thus the complexity of ATPG is significantly reduced compared to that for the original sequential circuit.
Note that all the inputs and outputs of the combinational portion of the circuit are accessible either as primary I/O or through SPSEs. Given the set of test patterns for the combinational logic, they can be applied to the circuit using the following test plan. Assume that the scan path consists of k SPSEs and that N test vectors need to be applied.

1. Keep the system in the test mode for k clock cycles and shift the appropriate portion of the first test pattern into the scan path.
2. Repeat N times:
   (a) Keep the system in the normal mode for 1 clock cycle while the appropriate portion of the current test pattern is applied at the circuit primary inputs. At the end of the clock cycle, sample the values at the circuit primary outputs.
   (b) Keep the system in the test mode for k clock cycles and shift out the test result in the scan path; simultaneously, shift in the appropriate portion of the next test pattern.

Applying all N tests with this plan therefore takes k + N(k + 1) clock cycles.

1.3 Scan Design Costs

The scan technique described above requires that the circuit storage elements be modified into SPSEs such that they can be connected into a scan path in the test mode. These modifications affect the quality of the design in various ways, some of which are described below.

Test generation effort
Clearly the problem of testing the circuit is reduced to that of testing a combinational circuit, since all inputs and outputs of the combinational portion of a scan-testable circuit are fully accessible via the scan path. Despite recent advances in test generation techniques for sequential circuits, ATPG for a combinational circuit is still orders of magnitude simpler than for a sequential circuit of comparable size. This implies that the test generation effort required for the scan-testable version is orders of magnitude lower than that for the original sequential circuit.
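The test plan of Section 1.2.2 can be made concrete with a small behavioral simulation. The sketch below is a hypothetical Python model written for illustration, not part of the original work: the function name apply_scan_tests, the bit-tuple encoding, the convention that chain[0] sits next to scan-in, and the example logic are all assumptions. The combinational block is passed in as a function comb(pi, state) returning (primary outputs, next state).

```python
from typing import Callable, List, Sequence, Tuple

Bits = Tuple[int, ...]
# Hypothetical model of the combinational logic:
# comb(primary_inputs, present_state) -> (primary_outputs, next_state)
CombLogic = Callable[[Bits, Bits], Tuple[Bits, Bits]]


def apply_scan_tests(comb: CombLogic, k: int,
                     tests: Sequence[Tuple[Bits, Bits]]) -> Tuple[List[Tuple[Bits, Bits]], int]:
    """Apply tests given as (primary-input part, scan part) to a full-scan
    circuit with k SPSEs using the serial test plan. Returns the observed
    (primary outputs, captured state) for each test and the clock cycles used."""
    if not tests:
        return [], 0
    chain = [0] * k  # chain[0] is next to scan-in, chain[k-1] drives scan-out
    cycles = 0

    def shift(bits_in: Sequence[int]) -> List[int]:
        """Test mode: one clock per bit; returns the bits seen at scan-out."""
        nonlocal cycles
        seen = []
        for b in bits_in:
            seen.append(chain[-1])
            chain[1:] = chain[:-1]  # every SPSE takes its predecessor's value
            chain[0] = b
            cycles += 1
        return seen

    # Step 1: k test-mode cycles load the scan part of the first pattern.
    # Bits are fed last-element-first so that chain[i] ends up equal to state[i].
    shift(list(reversed(tests[0][1])))

    results = []
    for i, (pi, _) in enumerate(tests):
        # Step 2(a): one normal-mode cycle; apply PIs, sample POs, capture next state.
        po, nxt = comb(pi, tuple(chain))
        chain[:] = nxt
        cycles += 1
        # Step 2(b): k test-mode cycles shift the response out while the next
        # pattern shifts in (don't-care zeros after the last test).
        nxt_scan = tests[i + 1][1] if i + 1 < len(tests) else (0,) * k
        seen = shift(list(reversed(nxt_scan)))
        results.append((po, tuple(reversed(seen))))
    return results, cycles


# Example: 3 scan flip-flops, 1 primary input, 1 primary output (made-up logic).
def example_comb(pi: Bits, s: Bits) -> Tuple[Bits, Bits]:
    a = pi[0]
    return (s[0] ^ s[1] ^ s[2],), (s[0] ^ a, s[0] & s[1], s[2] | a)


tests = [((1,), (1, 0, 1)), ((0,), (1, 1, 0)), ((1,), (0, 0, 0))]
responses, cycles = apply_scan_tests(example_comb, 3, tests)
print(cycles)  # 15
```

With k = 3 and N = 3 the simulation uses 15 clock cycles, which equals k + N(k + 1): k cycles for the initial load plus, for each test, one capture cycle and k overlapped shift cycles. Overlapping the shift-out of one response with the shift-in of the next pattern is what keeps the per-test cost at k + 1 rather than 2k + 1.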
The reduction in test generation cost is the major achievement of the serial scan design methodology: it makes it feasible in practice to obtain comprehensive test sets for complex circuits and to achieve a high degree of confidence in their correct operation. However, depending on the manner in which the serial scan approach is used, this benefit is achieved at the cost of certain overheads, described below.

Area overhead. The modification of the storage elements to make them scannable causes an increase in the circuit area. The amount of the increase depends on the type of scan design used and the number of storage elements to be connected into the scan path. The area overhead often makes serial scan design unattractive to designers because it reduces the space available for functional logic on a chip. If, on the other hand, the total size of a chip is allowed to increase, the yield (i.e., the percentage of manufactured chips that are fault-free) of the circuit may decrease.

Performance degradation. The modification of a storage element to make it scannable introduces an additional delay of approximately two gate delays at its input, even in normal system operation. In a traditional full scan design every storage element is modified; hence the system clock cycle may need to be extended by this amount, degrading the circuit's performance. In the competitive integrated circuit market this can be a serious liability and, along with the area overhead, sometimes discourages designers from using the scan design approach. Clearly, if storage elements that lie in critical timing paths could be excluded from the scan path, the overall performance of the circuit could be made more acceptable; this fact will be exploited in the work presented here.

Test time. In a full scan circuit each test applied to the circuit consists of a single pattern fed to the scan path rather than a complex initialization sequence.
However, the scan path can only be accessed serially, so the time to apply a given pattern is proportional to the number of storage elements in the scan path. Since a typical full scan design may have hundreds of storage elements in the scan path, applying a test for a given target fault may take hundreds of clock cycles. A high overall test time for a design can be expensive in a production environment, where automatic test equipment is usually costly and in high demand, and in maintenance testing, where a system may need to be shut down for the duration of a test.

I/O pin count. The scan designs described previously require three I/O pins to be added to the circuit: scan-in, scan-out, and the test mode control signal or test clock (whichever is applicable). The first two can be multiplexed with existing system I/O, thus trading off I/O pins against multiplexer area.

In summary, scan design leads to lower test generation effort at the cost of higher area, extra I/O pins, degraded system performance, and a long time to apply each test. Below we discuss different ways in which these costs can be reduced.

1.4 Reducing Scan Design Costs

Due to the modern thrust toward high transistor density on ICs and high operating speeds, the overheads due to scan design can be expensive and make scan design unattractive to designers. However, the overheads can be reduced by making use of a range of design options, described below.

Partial scan. Scan design techniques in which all circuit storage elements are included in a scan path are known as full scan techniques. In partial scan, only a subset of the storage elements is made scannable. This leads to reduced area overhead and possibly reduced test time. Performance degradation can also be avoided by keeping registers in critical paths out of the scan path.
Most partial scan techniques achieve these benefits at the expense of lower fault coverage or higher test generation effort [9, 10, 11]. However, this research has yielded techniques that achieve the benefits of partial scan without incurring these expenses; these techniques are described in later chapters.

Partitioning. In traditional scan techniques, test patterns are generated for the circuit as a whole; hence each pattern needs to be shifted into the whole scan path. If the circuit is partitioned suitably and individual partitions are tested separately, only a part of the scan path may need to be accessed for each pattern. In some cases this may lead to a reduction in the overall test time. Another potential benefit is that the overall test generation cost may be reduced, since this cost increases rapidly with the size of the circuit under test.

Use of switching elements. Circuit elements such as multiplexers (MUXes) and demultiplexers (DEMUXes), which are used as “switches,” can transfer data unchanged from input to output under certain values of their control inputs. In traditional testing such elements are treated as random combinational logic elements. By partitioning the circuit appropriately, it may be possible to use the switching elements so that the same scan path storage elements are used to test different partitions at different times. Thus partial scan can be used without loss of fault coverage within any partition. This type of test is illustrated in Figure 1.3, where R1 can be used to provide test patterns to either K1 or K2 by appropriately controlling the MUX. Separate test sessions are required to test K1 and K2, since R1 cannot supply the required patterns to both simultaneously. Note that some registers and switching elements may need to be tested separately; for example, R2 and the MUX in Figure 1.3 are not fully exercised in all their operation modes during the test of the two partitions.
Multiple scan path chains. The bulk of the test time for a scan-testable circuit is consumed in shifting test data serially in and out of the scan path. This time can be reduced by using several scan path chains and scanning them in parallel. Thus the test time is reduced at the expense of two I/O pins per scan chain.

Figure 1.3: Example of use of switching elements for partial scan.

Given a circuit to be made testable, a multi-dimensional space of scan design solutions can be generated by using different combinations of the scan design options described above with different parameters. Each design in the space is characterized by its test generation effort, area overhead, test time, test storage, I/O pin overhead and performance degradation. The problem of scan design can thus be stated as follows: given a set of constraints on the various design costs, use a combination of the design options described above to obtain a design in the solution space that satisfies the constraints.

In the next chapter we look briefly at previous attempts to use one or more of the scan design options described above in an automated design system. In Chapter 3 a new approach to partial scan, which reduces the cost of full scan but retains the benefit of combinational test generation, is presented. This approach, which is implemented in a subsystem called BALLAST, is based on defining a class of easily testable sequential structures having a “balanced” property. In Chapters 4 and 5 this approach is extended in two different ways. In the former, some of the restrictions of BALLAST are relaxed, leading to a further decrease in the scan design overheads at the expense of some increase in test generation complexity. In the latter, any multiplexers and/or buses present in the circuit are used during test to improve the controllability and observability of the circuit.
Chapter 6 studies different ways of partitioning the circuit in order to reduce overall test costs, and a procedure for scheduling tests for the various partitions is developed in Chapter 7. Chapter 8 discusses the problem of constructing single as well as multiple scan path chains with the objective of minimizing the overall test time. Chapter 9 lists some of the achievements of this dissertation and discusses some open problems for future work.

Chapter 2

Background

“The merit belongs to the beginner, should his successor do even better.”
-- Egyptian proverb

In this chapter we study previous approaches to the problem of managing costs and tradeoffs in scan-based design. Each approach is classified by the type of option it primarily uses: partial scan, partitioning, or multiple scan chains.

2.1 Partitioning Approaches

Four approaches to partitioning for test are described here. The first identifies independent connected regions of combinational logic. The second uses multiplexers to gain access to various partitions. The third uses the functionality of the logic elements to set up high-level sensitized paths to and from internal partitions. The fourth uses shift registers to gain access to the partitions and to isolate them from each other.

Cloud-Based Partitioning [12]. In a full scan design, the combinational logic under test can be subdivided into distinct regions of connected combinational logic called clouds. Each cloud is a maximal region of connected combinational logic that has either primary inputs or scan path storage elements at its inputs, and either primary outputs or scan path storage elements at its outputs. The CRETE system [12] takes in a hierarchical circuit description and reorganizes the hierarchy to identify the clouds. ATPG can then be carried out on each cloud separately and the test sets can be merged later.
This helps to reduce the maximum size of the circuit processed by ATPG and also leads to reduced overall test time. If any cloud is itself very large, the other types of partitioning listed below can be applied to subdivide it further.

Multiplexer Partitioning [13]. This technique was proposed for the case where a combinational circuit is to be tested exhaustively. Multiplexers are inserted in the circuit and connected such that, during the test of a partition, all its embedded inputs and outputs are accessible via primary inputs and outputs of the circuit. To apply this technique in serial scan design, the outputs and inputs of the scan path storage elements can be treated as primary inputs and primary outputs, respectively, since they are fully accessible using the scan path. In general this technique can be used to reduce the size of the individual partitions under test and to achieve controllability and observability of internal wires. However, the multiplexers added for test can significantly increase the area overhead and may affect the speed of operation of the circuit.

I-Mode-Based Partitioning [4, 13, 14]. In some circuits, partitioning can also be achieved by appropriately setting the control inputs of certain partitions (called data transporters in [14]), giving them a transparent behavior (called an I-mode in [4]) so that they can transmit test data to and from other partitions that are under test. For example, most ALUs can be made transparent to data at one input by some combination of values at the control inputs. A data transfer path set up in a circuit by a series of such partitions with I-modes enabled is called an I-path [4]. I-paths can be used to obtain high controllability and observability of internal circuit elements at a low hardware cost.
Functional Partitioning [15]. In this technique the circuit is decomposed into subunits on a functional basis so that functional test vectors can be used. The storage elements associated with each subunit are connected into a separate scan path. To provide access to signals that connect different subunits, either multiplexer partitioning (described above) or a special feedback shift register (FSR) for each unit is used. The FSR is inserted in the interconnection lines so that in test mode the subunit under test can be provided with test patterns by scanning data into the FSR. Only one subunit is tested at a time, and all the scan paths share the same scan-in and scan-out pins through multiplexers and demultiplexers. An additional partitioning control shift register (PCSR) with its own scan-in and scan-out pins is used to configure the circuit during test so that the desired subunit can be accessed. This approach has a high overhead due to the large number of shift registers required for isolating the subunits; however, it may be feasible if good functional tests are available for the subunits.

2.2 Partial Scan Approaches

Three scan design approaches based on partial scan are presented here. The first uses heuristics based on testability estimates to select storage elements for the scan path. The second attempts to cover as many faults as possible out of a target fault set using combinational TPG. The third uses an analysis of the state transitions to generate tests and to help construct the scan path.

Partial Scan based on Testability Evaluation [9]. In this approach only hard-to-test storage elements are included in the scan path. The testability of each circuit node is evaluated using the Sandia Controllability and Observability Analysis Program (SCOAP). Each node is assigned six values that represent the difficulty of controlling and observing it.
Higher values indicate nodes that are harder to test. Using the SCOAP values, hard-to-test storage elements are selected and connected into a scan path. Since these nodes can then be easily initialized and observed, the overall ATPG cost for the circuit decreases. The number of storage elements in the scan path can be based on the permissible area overhead or on the desired levels of the SCOAP values in the circuit. According to experimental results, the greatest incremental cost reduction is achieved by including 15% to 30% of the hardest-to-test storage elements in the scan path. This technique is simple to implement and can be used to trade off testability improvement against overhead. However, it depends on SCOAP for identifying hard-to-control circuit nodes, and SCOAP estimates may not be very accurate for some circuits. In this approach the portion of the circuit consisting of all circuit logic excluding the scan path FFs, called the kernel, is sequential; hence a sequential ATPG program is required.

Partial Scan based on Target Fault Set [10]. This approach requires initial functional testing to be carried out with high fault coverage (typically at least 70%). The partial scan analysis then targets only the set of remaining untested faults. Two alternative algorithms are proposed. In the first, all possible test vectors are initially generated for each target fault using a modified version of the PODEM program; for each test vector, the set of FFs that must be in the scan path is determined. A minimal subset of FFs is then selected such that all (or the required number of) target faults can be detected. In the second algorithm only one test vector is generated for each target fault, but the distance heuristic of PODEM is used to minimize the number of additional FFs that need to be scanned.
Each time a test vector is generated, the distance heuristics are updated so that the FFs already in the scan path can be used for the remaining target faults at no cost. According to experimental results using this approach, 95% fault coverage can be achieved with less than 65% of the FFs in the scan path.

One advantage of this approach compared to the previous one is that the kernel is combinational, hence it requires only combinational ATPG. However, there is a high processing overhead, especially in the first of the two alternative algorithms, which requires all test patterns to be enumerated for each fault. Another disadvantage is that prior functional testing is a prerequisite for this approach to be useful in achieving high fault coverage.

Partial Scan based on State Transitions [11]. This technique determines a minimal subset of storage elements to be made scannable such that all faults in the circuit are easily testable using multiple-pattern test sequences. A test sequence for a given fault consists of two parts: a justification sequence, which must begin with the circuit in a fixed reset state, and a propagation sequence, which must have a limited length. In case of justification or propagation failure, the algorithm uses the state transition graph of the circuit to determine all minimal subsets of storage elements which, if made scannable, allow the fault to be detected. These minimal subsets are determined for all faults under consideration. Heuristics are then used to select the set of storage elements in the scan path such that at least one feasible test sequence exists for each fault.

This technique requires a state transition table for the circuit to be available. It essentially searches for a near-optimal solution at the cost of processing a large amount of information; hence it has a high computational complexity and may be impractical for large circuits.
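The final selection step of both the target-fault-set approach [10] (every target fault needs at least one test vector whose required FFs are all scanned) and the state-transition approach [11] (at least one minimal subset per fault must be scanned) can be posed as a covering problem. The Python sketch below illustrates that problem with a simple greedy rule, which is an assumption and not the algorithm of [10] or [11]: repeatedly add the candidate FF set that costs the fewest additional scan FFs, breaking ties by how many pending faults it makes detectable. The input format and function name are hypothetical, and the result is not guaranteed to be minimal.

```python
def select_scan_ffs(options):
    """Greedily choose scan FFs so that every fault becomes detectable.

    options: dict mapping each fault to a list of FF sets (hypothetical
    format); scanning all FFs of any one set makes that fault detectable.
    Returns a set of scan FFs. Greedy, so not guaranteed to be minimal.
    """
    if any(not sets for sets in options.values()):
        raise ValueError("every fault needs at least one candidate FF set")
    selected = set()

    def detectable(fault, scan):
        return any(req <= scan for req in options[fault])

    pending = {f for f in options if not detectable(f, selected)}
    while pending:
        best_key, best_extra = None, None
        for fault in sorted(pending):
            for req in options[fault]:
                extra = req - selected              # FFs this choice would add
                trial = selected | extra
                gain = sum(detectable(g, trial) for g in pending)
                key = (len(extra), -gain)           # fewest new FFs, then most faults
                if best_key is None or key < best_key:
                    best_key, best_extra = key, extra
        selected |= best_extra                      # the chosen fault is now detectable
        pending = {f for f in pending if not detectable(f, selected)}
    return selected


faults = {
    "f1": [{"FF1"}, {"FF2", "FF3"}],
    "f2": [{"FF1", "FF4"}],
    "f3": [{"FF2"}, {"FF4"}],
}
print(sorted(select_scan_ffs(faults)))
```

Each iteration makes at least the chosen fault detectable, so the loop always terminates. In the example, fault f2 forces both FF1 and FF4 into the scan path, and these two FFs also cover f1 and f3.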
Partial Scan based on Circuit Structure [16, 17]. These techniques analyze the interconnection of combinational logic with FFs, and select scan FFs such that the resulting kernel is acyclic or close to acyclic.

Cheng et al. [16] assume that the difficulty of test generation increases with the length of the cycles in the circuit, where the length of a cycle is the number of FFs it passes through. They use a heuristic to select a minimal number of scan FFs so as to break all cycles except self-loops, i.e., cycles consisting of combinational logic and a single FF. The argument is that self-loops do not contribute substantially to the sequential behavior of a circuit. For example, test generation for a subcircuit containing a single self-loop can be carried out in at most two time frames. However, this argument does not necessarily extend to the circuit as a whole, and the authors do not provide any conclusive evidence that a collection of interacting self-loops is easier to test than a single long loop with the same number of FFs.

The technique of Kunzmann [17] is based on ideas that are somewhat similar to those in Chapter 3 of this work but were developed independently. The approach is to select scan FFs so as to obtain a kernel that is “equivalent” (or “balanced,” in our terminology). Such a kernel is acyclic; further, if any pair of circuit nodes is connected by more than one path, all such paths must have equal numbers of FFs. During test mode all non-scan FFs are placed in a bypass mode so that the kernel behaves as a combinational structure. By selecting FFs in the manner described above, only combinational ATPG is required to obtain complete stuck-at tests for the resulting circuit. Kunzmann's technique has the disadvantage that it requires all storage elements, including non-scan FFs, to be modified in some way, which diminishes the advantage of using partial scan.
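The cycle-breaking goal attributed to Cheng et al. [16] can be sketched in Python as follows. The FF dependency graph has an edge from FF a to FF b when a feeds b through combinational logic. Self-loops are ignored, and scan FFs are picked one at a time until no strongly connected component contains more than one FF. The fan-in times fan-out selection score is a hypothetical heuristic chosen for illustration; the actual heuristic of [16] may differ, and the function names are assumptions.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm (iterative); returns a list of node lists."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    order, seen = [], set()
    for root in nodes:                          # pass 1: DFS finishing order
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            u, it = stack[-1]
            nxt = next((w for w in it if w not in seen), None)
            if nxt is None:
                stack.pop()
                order.append(u)
            else:
                seen.add(nxt)
                stack.append((nxt, iter(succ[nxt])))
    comps, assigned = [], set()
    for root in reversed(order):                # pass 2: DFS on reversed edges
        if root in assigned:
            continue
        comp, stack = [], [root]
        assigned.add(root)
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in pred[u]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def _score(n, live):
    """Hypothetical heuristic: fan-in x fan-out, ties broken by name."""
    fan_in = sum(1 for _, v in live if v == n)
    fan_out = sum(1 for u, _ in live if u == n)
    return (fan_in * fan_out, n)


def select_cycle_breaking_ffs(ffs, edges):
    """Pick scan FFs until only self-loops remain as cycles."""
    remaining, scan = set(ffs), []
    while True:
        live = [(u, v) for u, v in edges
                if u in remaining and v in remaining and u != v]  # drop self-loops
        tangled = [n for comp in strongly_connected_components(sorted(remaining), live)
                   if len(comp) > 1 for n in comp]
        if not tangled:
            return scan
        pick = max(tangled, key=lambda n: _score(n, live))
        scan.append(pick)
        remaining.discard(pick)


# A-B-C form a loop; C and D also have self-loops, which are tolerated.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "C"), ("C", "D"), ("D", "D")]
print(select_cycle_breaking_ffs(["A", "B", "C", "D"], edges))
```

In the example only C needs to be scanned. Once C is removed from the kernel, the only remaining cycle is the self-loop on D, which this style of partial scan deliberately leaves in place.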
In general, structure-based partial scan approaches have the advantage that a high-level model of the circuit is sufficient, provided it captures the data dependencies among the FFs and the combinational logic. This helps to simplify the analysis for selecting scan FFs. Our analytical study in Chapters 3-5 will bring out the strong dependency of the ATPG cost on structural features such as cycles, unbalanced paths, and the location of any buses/multiplexers (“switches”) present in the circuit. The results of this study will be utilized in a range of scan path methodologies that attempt to overcome the limitations of the earlier approaches.

Recently an optimization-based approach to partial scan design has been proposed by Chickermane et al. [18]. They pose the problem of selecting scan FFs as an optimization problem with different objective functions. Given a cost function associated with converting each FF to a scan FF, and an upper bound on the cost of the design, the goal is to select a satisfactory set of scan FFs so as to maximize certain testability measures. The testability measures used include the sequential depth of the resulting kernel, the number of cycles, the estimated test sequence length, and SCOAP-based controllability/observability values. This study shows that, provided a good estimator of the cost of scanning each FF is available, an optimization-based approach is feasible for partial scan design. The authors have presented an estimation function based on a standard cell design style; however, estimating this cost in the general case is a complex problem.

2.3 Multiple Scan Chain Approaches

Two approaches to multiple scan path chaining are described in this section. The first attempts to construct scan paths of equal length, while the second forms scan path chains that can run independently of each other.
Multiplexed Access Scan Testable (MAST) Design [19]. In this technique the scan path is broken up into several equal-length scan path chain segments, and the scan-in/scan-out ports of each are multiplexed with existing system I/O pins. Thus the possible number of scan chains is bounded by the number of available system I/O pins, and all chains can be operated in parallel in test mode. With this setup the time required to access all the storage elements depends on the length of the longest scan chain. The MAST technique distributes the storage elements evenly among the scan chains; this minimizes the length of the longest scan chain. The equal-length scan chain approach is effective when the circuit is tested as a unit in a single test session. However, Chapter 8 will show that when the circuit is tested in a partitioned manner, relaxing the equal-length chain requirement can lead to better results; i.e., the lowest overall test time may actually be obtained with an unequal-length scan chain configuration.

DFTEXPERT Scan Path Design [14]. Unlike the MAST approach, the DFTEXPERT approach assigns storage elements to scan path chains based on the interconnection patterns among the registers and data processors (DPs), the latter being blocks of random logic. The design process [14] assumes that every DP has exactly one input port and one output port; hence it may not be easily extensible to general circuits. In DFTEXPERT, the circuit is partitioned into independent networks, each of which is characterized as a single-DP net, a chain net or a graph net. A single-DP net consists of a DP with its driver and receiver registers (i.e., registers that supply test patterns and load in the test results, respectively).
A chain net consists of a series of alternating registers and DPs, in which the first register is a driver, the last register is a receiver, and the intermediate registers serve as both drivers and receivers. A graph net is a general net that belongs to neither of the first two types. The formation of scan chains in DFTEXPERT is guided by rules. The registers in single-DP and chain nets may be included in the same scan chain or in different scan chains operating in parallel. The registers in a graph net are always connected in the same scan chain. Nets of small size may be combined into groups based on an affinity measure between nets. The scan path chaining rules attempt to use a range of strategies in order to meet the constraints imposed on circuit overheads and on test time.

In DFTEXPERT, the ordering of registers in each scan chain is carried out by heuristics based on the number of DPs that use each register in the chain to provide or receive test data. However, the test lengths of the DPs (i.e., the number of test patterns required for each DP) are not taken into account. In Chapter 8 we will show how taking the relative test lengths of the various DPs into account can help in configuring multiple scan chains such that the overall test time is minimized.

Chapter 3

Partial Scan Design with Balanced Structures

“The greatest truths are the simplest. . . .”
-- Julius Charles Hare and Augustus William Hare, ‘Guesses at Truth’ (1827)

“I have made this letter longer than usual because I lack the time to make it shorter.”
-- Pascal

3.1 Introduction

Automatic test pattern generation (ATPG) for sequential circuits is generally considered to be an intractable problem that requires large amounts of computation even for circuits of moderate size.
Full scan design techniques attempt to alleviate this problem by connecting all flip-flops (FFs) or latches into a scan path during test mode so that all these elements become easily controllable and observable. Thus in a circuit designed using full scan, the portion of the circuit consisting of all circuit logic excluding the scan path FFs, which we shall refer to as the kernel of the circuit, is fully combinational. Due to the overhead of modifying the FFs, partial scan techniques [9] have been proposed in which only a subset of the FFs is included in the scan path. This implies that the kernel is itself sequential. Hence these techniques either require sequential ATPG [11, 16] or use combinational ATPG techniques without covering all faults in the kernel [10].

In this chapter we present an alternative partial scan methodology, BALLAST (BALAnced structure Scan Test), that requires only combinational ATPG and leads to complete coverage of all detectable single stuck-at (SSA) faults. Scan path FFs are selected in such a way that the resulting kernel belongs to a certain class of easily testable structures called B-structures. Test patterns for the kernel are obtained by treating it as combinational, with the FFs within it simply replaced by delayless wires. During test application, each pattern is held constant in the scan path for a fixed number of clock cycles before the output pattern of the kernel is sampled and shifted out.

The BALLAST methodology is illustrated using the circuit of Figure 3.1, consisting of six registers and four combinational logic blocks. The only registers in the scan path are R3 and R6. In this example the kernel has a pipelined structure. For the time being let us assume that the scan path registers are provided with the ability to hold data constant across consecutive clock cycles (later this requirement will be relaxed).
Test patterns can be obtained by using combinational ATPG in the manner described above. To test the kernel, each test pattern is shifted into the scan path and held in it for two clock cycles before the kernel output is loaded back into the scan path and shifted out. We will prove that using this approach we can detect all faults in the kernel that can cause errors in the logical operation of the circuit. We shall also study the implications of “unbalanced” paths in the kernel with unequal delays, and of the presence of HOLD modes in non-scan registers for normal system operation.

A partial scan technique with some similarities to BALLAST was developed independently by Kunzmann [17]. This technique has several limitations which are addressed by BALLAST. First, in addition to modifying the storage elements in the scan path, even the non-scan storage elements need to be modified to make them transparent in test mode. This diminishes the advantage of using partial scan. Second, non-scan storage elements are passive during test (since they are bypassed) and do not get exercised. Third, the technique is not applicable if any storage elements have LOAD ENABLE control signals.

Figure 3.1: Illustration of BALLAST methodology: (a) synchronous circuit with clouds C1-C4 and registers R1-R6; (b) partial scan design of (a), with a HOLD control (for test) and scan-in/scan-out (SI/SO) connections.

3.2 Basic Circuit Model

In this work we consider only synchronous sequential circuits in which every cyclic path contains at least one clocked storage element. The storage elements are assumed to be flip-flops (FFs). The circuit may have any number of clocks. However, the FF clocks must all be controlled by primary input signals, and no clock signal may feed the data input of any FF, either directly or through combinational logic. The FFs may have LOAD ENABLE control signals, with restrictions similar to those for the clock signals.
The FFs may also have RESET/PRESET controls, provided they are present in all FFs and all are controlled by the same primary RESET/PRESET control signal.

In general a synchronous sequential circuit S consists of blocks of combinational logic connected to each other either directly or through registers. A register is defined as a collection of one or more FFs, all driven by the same clock signal and controlled by the same mode control signal (if any). Any subset of the FFs in a register forms a valid register.

The set of registers in the circuit can be partitioned into two subsets based on the presence or absence of explicit LOAD ENABLE controls on their FFs. We define the load set L as the set of registers in the circuit whose FFs have no explicit load enable control; these registers always operate in the LOAD mode (in which data is read from the data input during every clock cycle). Similarly, we define the hold set H as the set of registers whose FFs have an explicit LOAD ENABLE control signal; these registers have two modes of operation: a HOLD mode (in which they retain their value across consecutive clock cycles) as well as a LOAD mode.

The combinational logic in S can be partitioned into clouds, where each cloud is a maximal region of connected combinational logic such that its inputs are either primary inputs or outputs of FFs and its outputs are either primary outputs or inputs to FFs. In Figure 3.1(a) each block of combinational logic C1, C2, C3, C4 represents a cloud. A vacuous cloud is a special type of cloud consisting of simple wires with no logic; a vacuous cloud is present wherever a primary input directly feeds an FF, or an FF directly feeds a primary output, or an FF directly feeds another FF.

Figure 3.2: Example of topology graph.
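The cloud definition above can be sketched in Python with a union-find structure. The netlist format and the function name are hypothetical: the netlist is a list of gate names plus (driver, sink) wire pairs whose endpoints may also be FFs or primary I/O. The sketch assumes each driver has a single output net and treats nets as part of the combinational logic, so all gates fed by the same FF output fall in the same cloud. This matches the observation in the following paragraph that each FF reads from exactly one cloud and feeds exactly one cloud. A net whose group contains no gates (a primary input or FF wired straight to FFs or primary outputs) is reported as a vacuous cloud.

```python
def find_clouds(gates, wires):
    """Partition combinational logic into clouds.

    gates: names of combinational gates.
    wires: (driver, sink) pairs; endpoints that are not gates are FFs or
           primary inputs/outputs, which act as cloud boundaries.
    Returns (clouds, vacuous): one sorted gate list per cloud, and the
    drivers whose output nets form vacuous (wire-only) clouds.
    """
    gates = set(gates)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for driver, sink in wires:
        net = ("net", driver)               # the single output net of `driver`
        find(net)                           # register the net even if no gate touches it
        if driver in gates:
            union(driver, net)              # a gate and its output net share a cloud
        if sink in gates:
            union(net, sink)                # a net and the gates it feeds share a cloud

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    clouds, vacuous = [], []
    for members in groups.values():
        logic = sorted(m for m in members if m in gates)
        if logic:
            clouds.append(logic)
        else:                               # a bare net: PI/FF wired straight to FFs/POs
            vacuous.extend(m[1] for m in members)
    return sorted(clouds), sorted(vacuous)


wires = [("PI1", "g1"), ("g1", "g2"), ("g2", "FF1"), ("FF1", "g3"),
         ("g3", "g4"), ("g4", "PO1"), ("FF2", "FF3")]
print(find_clouds(["g1", "g2", "g3", "g4"], wires))
```

Here FF1 separates two clouds, {g1, g2} and {g3, g4}, and the direct FF2-to-FF3 connection forms a vacuous cloud.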
From the way in which clouds are defined, it follows that no two clouds can be directly connected; they must be separated by one or more registers. Further, each FF in a register must receive data from exactly one cloud and must feed exactly one cloud. We can therefore constrain the grouping of FFs into registers (by splitting registers where necessary) such that each register receives data from exactly one cloud and feeds exactly one cloud. Under this constraint the topology of the circuit can be modeled by a directed topology graph G = (V, A, H, w), in which nodes in V represent clouds, each arc in A represents a connection between two clouds through a register, arcs in H ⊆ A represent HOLD registers, and w: A → Z+ (positive integers) defines the number of FFs in each register. We will use w(a) to represent the cost of converting the register a into a scan path register. Figure 3.2 shows a typical topology graph; bold arcs represent HOLD registers and the others represent LOAD registers.

3.3 B-Structures and their Properties

Let S be an arbitrary synchronous sequential circuit with topology graph G = (V, A, H, w).

Definition: S is said to be a balanced sequential structure (B-structure) if:
1. G is acyclic;
2. for all v1, v2 ∈ V, all directed paths (if any) from v1 to v2 are of equal length (this actually includes condition 1); and
3. for all h ∈ H, if h is removed from G, the resulting graph is disconnected. □

The example of Figure 3.2 satisfies these conditions and is therefore a B-structure. Given a balanced sequential structure S_B, we define its combinational equivalent, C_B, as the combinational circuit formed by replacing each FF in every register in S_B by a wire (if the output of the register uses the Q output of the FF), or an inverter (if it uses the complemented Q output of the FF), or both. Define the depth, d, of S_B as the length of the longest directed path in its topology graph.
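The three conditions of the definition can be checked mechanically. The Python sketch below represents G by a list of clouds and a list of arcs (u, v), one per register, with H given as a set of arc indices. The names are hypothetical, and the weights w are omitted because the check does not use them. Path length is counted in arcs, i.e., one clock cycle per register. Condition 2 is tested by comparing the shortest and longest path lengths from every node. Condition 3 is interpreted as requiring each HOLD arc to be a bridge: removing it disconnects its two endpoints when arc directions are ignored. The depth d is computed as the length of the longest directed path.

```python
from collections import defaultdict, deque


def topo_order(nodes, arcs):
    """Kahn's algorithm over the topology graph; returns None if G has a cycle."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order if len(order) == len(nodes) else None


def still_connected(arcs, skip):
    """True if the endpoints of arc `skip` stay connected, ignoring arc
    directions, after that arc is removed."""
    src, dst = arcs[skip]
    adj = defaultdict(list)
    for i, (u, v) in enumerate(arcs):
        if i != skip:
            adj[u].append(v)
            adj[v].append(u)
    seen, stack = {src}, [src]
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return dst in seen


def is_b_structure(nodes, arcs, hold=()):
    """nodes: clouds (V); arcs: registers as (u, v) pairs (A);
    hold: indices into arcs of the HOLD registers (H)."""
    order = topo_order(nodes, arcs)
    if order is None:
        return False                          # condition 1: G must be acyclic
    succ = defaultdict(list)
    for u, v in arcs:
        succ[u].append(v)
    for s in nodes:                           # condition 2: equal-length paths
        lo, hi = {s: 0}, {s: 0}               # shortest / longest path from s
        for u in order:
            if u in lo:
                for v in succ[u]:
                    lo[v] = min(lo.get(v, lo[u] + 1), lo[u] + 1)
                    hi[v] = max(hi.get(v, hi[u] + 1), hi[u] + 1)
        if lo != hi:
            return False
    # condition 3: removing any HOLD arc must disconnect its endpoints
    return not any(still_connected(arcs, h) for h in hold)


def depth(nodes, arcs):
    """Length (in registers) of the longest directed path in an acyclic G."""
    longest = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in arcs:
        succ[u].append(v)
    for u in topo_order(nodes, arcs):
        for v in succ[u]:
            longest[v] = max(longest[v], longest[u] + 1)
    return max(longest.values(), default=0)


V = ["C1", "C2", "C3", "C4"]
A = [("C1", "C2"), ("C1", "C3"), ("C2", "C4"), ("C3", "C4")]
print(is_b_structure(V, A), depth(V, A))
print(is_b_structure(V, A + [("C1", "C4")]))
```

For this four-cloud example, both paths from C1 to C4 pass through two registers, so it is a B-structure of depth 2. Adding a direct register from C1 to C4 creates paths of lengths 1 and 2 and violates condition 2. Declaring the C1-to-C2 register a HOLD register would violate condition 3, because C1 and C2 remain connected through C3 and C4.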
Given an input pattern $I$ applied to $S^B$, define the single-pattern output of $S^B$ for $I$ as the steady-state output of $S^B$ when $I$ is held constant at the inputs to $S^B$ and all its registers are operated in LOAD mode for at least $d$ clock cycles. Given some fault $f$ in $S^B$, if the single-pattern outputs for $I$ of the good and the faulty circuits are different, then $I$ is a single-pattern test for $f$.

B-structures have two interesting properties which allow them to be used as kernels in a BALLAST partial scan design: (1) every detectable fault is single-pattern testable, and (2) a complete single-pattern test set for all detectable SSA faults can be derived using combinational test generation techniques. Both properties will be elaborated on and proved in Section 3.5. Next we present an overview of the proposed BALLAST methodology.

3.4 Scan Design Using B-Structures

When any register in a sequential circuit is included in a scan path it serves as a control and observation point for the rest of the circuit. In effect it becomes a primary output of the cloud feeding it and a primary input of the cloud it drives. Thus in our circuit model, the inclusion of a register in the circuit scan path corresponds to its removal from the topology graph of the circuit; the reduced topology graph represents the kernel, i.e., the portion of the circuit to be tested using the scan path.

Figure 3.3: (a) Kernel of Figure 3.1(b); (b) combinational equivalent of (a).

The following is an outline of the BALLAST methodology.

1. Construct the topology graph $G$ of the circuit as defined in Section 3.3.
2. Select a minimal cost set of arcs, $R$, to be removed from $G$ such that the remaining topology graph is balanced. Arcs in $R$ represent the registers that must be included in the scan path. Let $S^B$ be the B-structure corresponding to the resulting topology graph, which represents the kernel of the circuit.
   For the example of Figure 3.1 the kernel is the B-structure shown in Figure 3.3(a). Algorithms for implementing this step are presented in Section 3.6.
3. Determine the combinational equivalent $C^B$ of the kernel $S^B$. The combinational equivalent of the kernel in Figure 3.3(a) is shown in Figure 3.3(b). Using traditional combinational ATPG, determine a complete test set $T$ for $C^B$. Since $S^B$ is balanced, $T$ will constitute a complete single-pattern test set for all detectable faults in the kernel $S^B$. (See Section 3.5 for a proof of correctness of this statement.)
4. Construct a scan path by appropriately ordering the registers in $R$ and connecting them so that they are capable of (i) shifting test patterns in and out, (ii) holding a pattern constant at the kernel inputs for $d$ clock cycles (where $d$ is the depth of the B-structure comprising the kernel), and (iii) loading in the test results from the kernel. Requirement (ii) can be achieved by providing all scan registers with a HOLD mode as in Figure 3.1(b). In Section 3.7 we will show how the need for all scan registers to have a HOLD mode can be partially removed.

Figure 3.4 shows two illustrations of the above methodology which are variations on the example of Figure 3.1. In Figure 3.4(a) the register R1 has a HOLD mode provided for normal system operation. Figure 3.4(b) shows the kernel when R1, R3 and R6 are made scannable; clearly it is a B-structure and hence a valid kernel. Figure 3.4(c) shows a second variation in which the two paths between C1 and C4 have unequal delays. By scanning R1, R3 and R6, one of the paths is broken and the resulting kernel in Figure 3.4(d) is again a B-structure.

Given a circuit designed in the manner described above, the test plan for applying a sequence of single-pattern tests to the circuit is as follows, where $N$ is the number of test patterns to be applied and $l$ is the total number of FFs in the scan path.

1. Operate all scan registers in the SHIFT mode for $l$ clock cycles. (Scan in the first test pattern.)
2. Repeat $N$ times:
   (a) Place all scan registers in the HOLD mode and all non-scan registers in the LOAD mode for $d$ clock cycles. (This allows test data to propagate through the kernel.)
   (b) Operate all scan registers in the LOAD mode for 1 clock cycle. (Load the test result into the scan registers.)
   (c) Operate all scan registers in the SHIFT mode for $l$ clock cycles. (Scan out the test result and scan in the next test pattern.) □

The partial scan technique described in this section is applicable to general sequential circuits. It is particularly well suited to pipelined circuits with limited feedback and feedforward connections. On the other hand, it is not very effective for circuits in which there are few clouds or in which every storage element is connected in a feedback loop through a single cloud; in the latter case the technique reduces to full scan.

Figure 3.4: Additional illustrations of the methodology. (a) Example with HOLD mode; (b) kernel of (a); (c) example with unbalanced paths; (d) kernel of (c).

3.5 Proof of Correctness

In this section the testability properties of B-structures are studied with the objective of proving that any B-structure is a valid kernel in the BALLAST test methodology. We shall focus on the kernel of the circuit under consideration in isolation from the scan path, and treat all connections of the kernel from/to scan path registers as primary I/O lines of the kernel. The BALLAST methodology ensures that the kernel is a B-structure; this has the following implications.
Although the kernel is sequential, the fact that it is acyclic greatly simplifies the process of obtaining a test sequence [20, Section 3.5]. But this fact is of limited practical importance to us, since arbitrary multiple-pattern test sequences cannot be applied to the kernel using a serial scan path. However, by restricting the kernel to being a B-structure, we shall demonstrate that single-pattern tests exist for all detectable faults in the kernel and that a complete test set of this type can be obtained using ordinary combinational ATPG on the combinational equivalent of the kernel.

3.5.1 Single-Pattern Testability

Let $S^B$ be an arbitrary B-structure and $C^B$ its combinational equivalent. The set of faults of a B-structure refers to the union of the sets of single stuck-at faults of the individual clouds. A stuck-at fault in a register can be considered to be equivalent to a stuck-at fault in one of the clouds adjacent to it. A detectable fault is one that can be detected by some sequence of test patterns.

By definition of the combinational equivalent, $S^B$ can be viewed as the combinational circuit $C^B$ with delays introduced appropriately within it. Since delays do not affect the steady-state output function of combinational logic, we have the following lemma.

Lemma 1. For any input pattern $I$, the output of $C^B$ and the single-pattern output of $S^B$ are identical.

Proof: By induction on the depth of $S^B$. Let $G$ be the topology graph of $S^B$ and let its depth be $d$.

$d = 0$: If $d$ is 0, $S^B$ is combinational, and the statement is trivially true.

$d > 0$: Assume that the statement is true for $0 \le d \le n - 1$, and consider $d = n$. Remove from $G$ all "first-level" nodes, i.e., nodes that have no incoming arcs, and all corresponding first-level arcs. There must be at least one first-level node because $G$ is acyclic.
Let the resulting graph be $G_1$, representing a corresponding balanced structure $S_1^B$ of depth $n - 1$ with combinational equivalent $C_1^B$. When the input vector $I$ is applied to $S^B$, all clouds corresponding to first-level nodes must settle at constant values within the first clock cycle; hence the values loaded in by the first-level registers must in fact be their final steady-state values. Thus after the first clock cycle, $S_1^B$ receives a constant input pattern. Since the depth of $S_1^B$ is $n - 1$, and both $C_1^B$ and $S_1^B$ receive the same input pattern after the first clock cycle, the output of $C_1^B$ and the single-pattern output of $S_1^B$ must be identical, by induction. Thus the lemma is true for $d = n$. □

Note that the only property of B-structures used in the above lemma is the fact that the topology graph is acyclic; hence the lemma is true of all acyclic structures, balanced or unbalanced.

Lemma 2. Let $f_s$ be a fault in $S^B$ and let $f_c$ be the corresponding fault in $C^B$. Then any test pattern $t$ for $f_c$ in $C^B$ is a single-pattern test for $f_s$ in $S^B$.

Proof: Let $C_f^B$ and $S_f^B$ be the faulty circuits produced by $f_c$ and $f_s$ respectively. Since $t$ detects $f_c$, the outputs of $C^B$ and $C_f^B$ must differ for input $t$. Due to Lemma 1, the single-pattern outputs of $S^B$ and $S_f^B$ must differ for input $t$. Thus $t$ is a single-pattern test for $f_s$ in $S^B$. □

Note that the above lemma does not prove that there is a single-pattern test for every detectable fault in $S^B$.

Theorem 1. Every B-structure is fully testable for all detectable SSA faults using single-pattern tests.

Proof Outline: Let the B-structure $S^B$ have depth $d$ and let its combinational equivalent be $C^B$. Let $f$ be a detectable SSA fault in $S^B$ and let $T$ be a test for $f$. $T$ is in general a sequence of patterns to be applied to all data as well as control inputs of $S^B$. It is sufficient to prove that $T$ can be transformed into a single-pattern test for $f$.
Let $T$ cause the fault to be first detected at a particular output $z$ of $S^B$. Consider each data input $x_i$ of $S^B$ (i.e., any input other than a control input for a register). Since $S^B$ is balanced, all paths from $x_i$ to $z$ must have equal delays. (Note that since $S^B$ is a B-structure, if there is a HOLD register in any of these paths then all the paths must pass through it, thus preserving equal delays.) This implies that the output value at $z$ when the fault is detected depends on the value at $x_i$ during at most one clock cycle, say $\tau_{x_i}$. Hence if we transform $T$ to $T'$ such that for each input $x_i$, the value required at clock cycle $\tau_{x_i}$ in $T$ is applied at every clock cycle in the sequence $T'$, then $T'$ must be a valid test for $f$. We can now transform $T'$ into $T''$ by operating all HOLD registers in $S^B$ in the LOAD mode throughout the test. Note that the final output values at $z$ for $T'$ and $T''$ respectively must be identical, since both can be simulated by applying the corresponding input pattern to $C^B$. It follows that the resulting sequence $T''$ must also detect $f$ by producing an erroneous output at $z$. Hence any test sequence can be transformed into a single-pattern test.

In the above proof outline, the crucial step is the transformation of an arbitrary test sequence into a single-pattern test. Below, a more complete proof is presented in which the complete transformation procedure is listed.

Proof: It is sufficient to prove that given any detectable fault $f$ in the balanced structure $S^B$, there exists a single-pattern test vector for $f$. We first present a formal proof and then illustrate the proof using an example. Let $G = (V, A, H, w)$ be the topology graph of $S^B$.
During any clock cycle $t$, let the state of $S^B$ be defined by the tuple $G^t = (G, I^t, h^t, x^t)$, where $I^t(v), v \in V$ represents the input pattern applied at the circuit primary inputs (if any) of the cloud $v$ during clock cycle $t$; $h^t(a), a \in H$ represents the mode signal of the HOLD register $a$ during clock cycle $t$; and $x^t(a), a \in A$ represents the logic value of the register $a$ during clock cycle $t$. A 5-valued logic system is used. The possible logic values are $\{0, 1, D, \bar{D}, x\}$, where $D$ and $\bar{D}$ represent erroneous states due to a fault, and $x$ represents an unspecified or don't-care value. Assume (without loss of generality) that for a HOLD register $a$, $h^t(a) = 0$ represents the LOAD mode and $h^t(a) = 1$ represents the HOLD mode.

Since $f$ is detectable, some test sequence $T$ for $f$ must exist. Let $T$ consist of a sequence of patterns applied to the circuit primary inputs feeding the clouds and to the control signals feeding the HOLD registers. Let the first pattern be applied at clock cycle 1 and let the fault be first observed at a circuit primary output at clock cycle $m$. The application of $T$ to $S^B$ can be fully described by the sequence of states $(G^1, G^2, \ldots, G^m)$ that the circuit experiences.

We shall now show that based on $T$ it is possible to derive a single-pattern test for $f$. The following procedure transforms $T$ into a single-pattern test $T'$.

procedure transform($T = (G^1, \ldots, G^m)$):

1. Pick a node $v_0 \in V$ such that the corresponding logic in $S^B$ has one or more outputs that are circuit primary outputs and at least one primary output has value $D$ or $\bar{D}$ in state $G^m$.
2. Define a relation called activation time, $\alpha: V \to \{1, 2, \ldots, m\}$, where $\alpha(v)$ represents the clock cycle during which the node $v$ is actively involved in sensitizing or propagating the effect of the fault $f$; i.e., there is a functional dependency between the output of node $v$ during clock cycle $\alpha(v)$ and the output of $v_0$ during clock cycle $m$.
   Note that $\alpha(v)$ need not be defined for all $v$. (It will be shown that the activation time of every node, if defined, must have a unique value.) Set $\alpha(v_0) \leftarrow m$.
3. Construct a set $\mathcal{F}$, representing a frontier of nodes being processed, and set $\mathcal{F} \leftarrow \{v_0\}$.
4. Repeat the following until $\mathcal{F} = \emptyset$:
   (a) Pick some $v \in \mathcal{F}$ having the highest activation time $\alpha(v)$, and remove it from $\mathcal{F}$. Set $k \leftarrow \alpha(v)$.
   (b) For all incoming arcs $a$ of $v$, do the following:
      i. $u \leftarrow$ source node of $a$; add $u$ to $\mathcal{F}$, with $\alpha(u) \leftarrow k - 1$.
      ii. If $a \in H$ and $h^k(a) = 1$ then
         A. $t \leftarrow \min j$ such that $h^{j+1}(a) = h^{j+2}(a) = \cdots = h^k(a) = 1$. If there is no such $j$, skip steps B and C. (Note that in this case the value present in register $a$ at time $k$ is undefined, hence it cannot have an influence on the outcome of the test.) Here $t$ represents the clock cycle during which node $u$ is active, and register $a$ loads the active data, in the original test plan.
         B. $\tau \leftarrow k - t$, the number of HOLD cycles of $a$ in the current sequence. The next step eliminates the HOLD cycles while keeping the test valid.
         C. The removal of $a$ from $G$ must disconnect $G$ into separate components, by definition of balanced structures. Let $G_u$ be the component that contains $u$. Modify the test by delaying all electrical activity within $G_u$ by $\tau$ clock cycles as follows:
            $\forall v \in V$ in $G_u$: $I^j(v) \leftarrow I^{j-\tau}(v)$, $\tau < j < k$;
            $\forall b \in H$ in $G_u$: $h^j(b) \leftarrow h^{j-\tau}(b)$, $\tau < j < k$;
            $\forall e \in A$ in $G_u$: $x^j(e) \leftarrow x^{j-\tau}(e)$, $\tau < j < k$;
            $h^k(a) \leftarrow 0$.
         It can easily be seen that the sequence of modified states $(G^1, \ldots, G^m)$ is still a valid test for $f$.
   The preceding portion of the procedure transforms the test sequence such that in the final state sequence every register is in the LOAD mode at the time when the node driving it becomes active, and adjacent nodes are active during adjacent clock cycles. Note that the activation time $\alpha(v)$ of each node $v$ depends only on its distance from the output node $v_0$.
   Since $G$ is balanced, this implies that for each node there is a unique clock cycle during which it is active.
5. Let $t_0 \leftarrow \min_v \alpha(v)$, the earliest activation time among nodes that have been assigned an activation time. Then $(G^{t_0}, \ldots, G^m)$ is a valid test for $f$.
6. Any circuit primary input feeding a node $v$ must have the value determined by $I^{\alpha(v)}(v)$ during the activation time $\alpha(v)$ of $v$; at other times its values do not affect the test. Hence we can transform the input patterns as follows to obtain a valid test:
   $\forall v \in V$: $I^t(v) \leftarrow I^{\alpha(v)}(v)$, $t_0 \le t \le m$.
   In other words, the input pattern consisting of the pattern $I^{\alpha(v)}(v)$ applied to each node $v$ is a single-pattern test for $f$ in $S^B$.
7. Return $I^{t_0}(v), \forall v \in V$; this represents the single-pattern test vector $T'$.

The above procedure demonstrates that every detectable fault in $S^B$ has a single-pattern test; this proves the theorem. □

Example

Figure 3.5(a) shows a typical B-structure with three clouds and three registers. For simplicity in this example, each register consists of a single FF. R3 is a HOLD register while R1 and R2 are LOAD registers. The lines a, b, c, d are circuit primary inputs to the various clouds and g is a circuit primary output. h controls the HOLD mode of R3 such that if h = 0 during clock cycle t, then R3 loads new data between clock cycles t and t + 1, and if h = 1, it holds the data present during clock cycle t for an additional cycle.

Consider the fault $f$ which makes the line e stuck at 1. $f$ is detectable, and Figure 3.5(a) shows a test sequence $T$, consisting of three test patterns, that detects $f$. The patterns shown at the primary inputs and at h must be applied in order from left to right over consecutive clock cycles. The states of the internal signals during this time are also shown. Note that the fault is first detected at primary output g in clock cycle 3.
We shall now show that based on $T$ it is possible to derive a single-pattern test for $f$. We first transform the test so that all HOLD registers (i.e., R3) operate only in the LOAD mode, and then further transform it so that the value applied to each primary input is constant throughout the test.

Given the test sequence $T$, we say that a cloud C is active during a clock cycle $k$ if it is actively involved in sensitizing or propagating the effect of the fault during clock cycle $k$. In other words, there is a functional dependency between the output of cloud C3, at whose output the fault is first detected at clock cycle 3, and the output of C at clock cycle $k$. The cycles during which the various clouds in Figure 3.5 are active are indicated by underlining the corresponding logic values in the state sequences.

Figure 3.5: (a) Test for e stuck-at-1; (b) transformed test with no HOLD operations.

Clearly, C3 is active at clock cycle 3. C1 must be active at clock cycle 2, since it feeds C3 through the LOAD register R1, which has a constant delay of 1 clock cycle. Note that every path between C1 and C3 must have exactly the same number of LOAD registers, because this structure is balanced; hence C1 must be active at clock cycle 2 and at no other time. The other cloud feeding C3 is C2, and in this case the connection is via a HOLD register, which introduces a variable delay. The control sequence applied to R3 in test sequence $T$ is '01x', which means that R3 is actually in the HOLD mode at the time (clock cycle 2) just before C3 becomes active. The most recent clock cycle at which new data was loaded into R3 is 1; hence the erroneous output of C3 actually depends on the output of C2 in clock cycle 1. Thus C2 is active in clock cycle 1.
Figure 3.5(b) shows a transformed test in which C2 is active in clock cycle 2 instead of 1, and the HOLD operation of R3 is eliminated. In effect, all logical activity in C2 and in all clouds feeding C2 (directly or indirectly) is delayed by one clock cycle, so that the HOLD cycle of R3 can be eliminated. Note that delaying the activity in the portion of the circuit feeding C2 leaves the activity in the rest of the circuit unaffected. This follows from the fact that all paths between this portion of the circuit and the rest of the circuit must pass through the HOLD register R3. Thus the sequence of states shown in Figure 3.5(b) is a valid test for $f$ in which the error is first observed at clock cycle 3.

The transformation above ensures that every cloud feeding C3 (which is active at clock cycle 3) is active at clock cycle 2. The transformation process must be repeated for all clouds feeding C1 and C2, respectively (none in this case), until (i) the clock cycles during which the various clouds are active have been determined, and (ii) every HOLD register operates in the LOAD mode in the transformed test. From the preceding arguments it follows that every cloud is active during some unique clock cycle; further, no cloud can be active earlier than clock cycle 2, since adjacent clouds are active during consecutive clock cycles, C3 is active at clock cycle 3, and the depth of the B-structure is 1. Hence the test can be further transformed such that (a) it consists of only two vectors, applied at clock cycles 2 and 3; and (b) the input pattern required at the primary inputs of each cloud C during its active cycle is applied during both cycles 2 and 3. The resulting single-pattern test vector $T'$ for $f$ is a = 1, b = 0, c = 0 and d = 1.

The above example illustrates the procedure for transforming an arbitrary test into a single-pattern test, and shows how every detectable fault turns out to be single-pattern testable.
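Once every register operates in LOAD mode, the bookkeeping in steps 2 to 7 of procedure transform amounts to a backward levelization from the observing cloud $v_0$. The following Python sketch illustrates this under our own assumptions: arcs are given as (source cloud, destination cloud) pairs, the original test's primary-input values are supplied as a hypothetical nested mapping indexed by cycle and cloud, and the function names are illustrative. It assigns $\alpha(v_0) = m$ and $\alpha(u) = \alpha(v) - 1$ for every arc $u \to v$, reports a conflict if a cloud would need two activation times (which cannot happen in a balanced graph), and then copies each cloud's primary-input values from its own activation cycle as in steps 5 to 7. It does not model the HOLD elimination of step 4(b)ii or 5-valued fault simulation.

```python
from collections import defaultdict, deque


def activation_times(arcs, v0, m):
    """Backward levelization from the observing cloud v0 (active in cycle m).

    arcs: (source cloud, destination cloud) pairs, all registers in LOAD mode.
    Returns {cloud: activation time} for every cloud with a path to v0; raises
    ValueError if some cloud would need two different activation times."""
    preds = defaultdict(list)
    for src, dst in arcs:
        preds[dst].append(src)
    alpha = {v0: m}
    frontier = deque([v0])  # FIFO order visits clouds in non-increasing alpha
    while frontier:
        v = frontier.popleft()
        for u in preds[v]:
            if u not in alpha:
                alpha[u] = alpha[v] - 1
                frontier.append(u)
            elif alpha[u] != alpha[v] - 1:
                raise ValueError(f"cloud {u!r} has no unique activation time")
    return alpha


def single_pattern_vector(alpha, inputs_by_cycle):
    """Steps 5-7: t0 is the earliest activation time, and each cloud's
    primary-input values are copied from its own activation cycle.

    inputs_by_cycle[t][v]: values at cloud v's primary inputs in cycle t of the
    original test (clouds without primary inputs are simply absent)."""
    t0 = min(alpha.values())
    vector = {v: inputs_by_cycle[t][v]
              for v, t in alpha.items() if v in inputs_by_cycle.get(t, {})}
    return t0, vector


# Example: a balanced diamond A -> {B, C} -> D, fault first seen at D in cycle 5.
arcs = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
alpha = activation_times(arcs, "D", 5)
print(alpha)  # {'D': 5, 'B': 4, 'C': 4, 'A': 3}
```

With an unbalanced graph such as A -> B -> D plus A -> D, `activation_times` raises `ValueError`, mirroring the role of the balance condition in the proof.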
Corollary to Theorem 1. The maximum required duration of any single-pattern test is $d$ clock cycles, where $d$ is the depth of the B-structure.

Proof: This follows from the fact that in the transformed test, all registers operate in the LOAD mode, hence adjacent clouds are active during adjacent clock cycles. □

In Theorem 1 we have shown that every B-structure is single-pattern testable. Note that the single-pattern testable property could be characterized as a type of "delay-independent" behavior, in the sense that for every input-output pair, any value at the output depends on the value at the input at not more than one clock cycle. Based on the information available in our circuit model, i.e., the interconnection of the clouds and registers, B-structures represent a complete class of structures that follow this behavior and are single-pattern testable. Since B-structures can be easily identified (using algorithms which will be presented in Section 3.6), they lead to an efficient partial scan methodology. It should be noted, however, that other single-pattern testable structures that are not B-structures could theoretically exist.¹

3.5.2 Generating Single-Pattern Tests

We now proceed to show that a complete single-pattern test set for a B-structure can be obtained using ordinary combinational ATPG. Lemma 2 showed that any fault that is detectable in $C^B$ is single-pattern testable in $S^B$. It remains to be shown that every detectable fault in $S^B$ is detectable in $C^B$.

¹In fact, in Chapter 5 we will make use of additional information about circuits, viz. the presence of multiplexers and buses, to identify a larger class of structures, called SB-structures, that have the single-pattern testable property.

Lemma 3. Let $f_s$ be a fault in $S^B$ and let $f_c$ be the corresponding fault in $C^B$. If $f_s$ is detectable in $S^B$ then $f_c$ is detectable in $C^B$.

Proof: Let $C_f^B$ and $S_f^B$ be the faulty circuits.
Since $f_s$ is detectable in $S^B$, by Theorem 1 there must be a single-pattern test $t$ for this fault. Hence $S^B$ and $S_f^B$ must have different single-pattern outputs for $t$. This implies, by Lemma 1, that $C^B$ and $C_f^B$ must have different outputs for $t$. Thus $t$ is a test for $f_c$ in $C^B$. □

Theorem 2. Given a balanced structure $S^B$, any complete test set for all detectable faults in its combinational equivalent $C^B$ is a complete single-pattern test set for all detectable faults in $S^B$.

Proof: Let $T$ be a test set for all detectable faults in $C^B$. We need to show that $T$ is a single-pattern test set for all detectable faults in $S^B$. Let $f_s$ be a detectable fault in $S^B$ and $f_c$ the corresponding fault in $C^B$. Since $f_s$ is detectable, by Lemma 3 $f_c$ must also be detectable. Since $T$ detects all detectable faults in $C^B$, it must contain some pattern $t$ that detects $f_c$. Hence, by Lemma 2, $t$ must detect $f_s$. Thus $T$ detects all detectable faults in $S^B$. □

The two theorems reduce the problem of test generation for B-structures to the simpler problem of combinational ATPG, from which a complete single-pattern test set can be obtained. The single-pattern nature of the test sequences makes it possible, in a partial scan circuit, to easily test a kernel that is a B-structure. Each single-pattern test can be shifted into the scan path and held constant for the required number of clock cycles before the test results are loaded into the scan path and shifted out. The implementation issues related to the scan path will be discussed in Section 3.7. Note that there is no loss in single stuck-at fault coverage when this form of partial scan is used in place of full scan.

3.5.3 Observations on Testing B-Structures

Although combinational ATPG is used in both the partial scan and full scan cases, the test generation effort may not be the same in both cases.
To explain this, let us denote the combinational equivalent of the B-structure kernel in the partial scan design by PSKCE, and the kernel in the full scan design (which is obviously combinational) by FSK. FSK is simply a collection of all the clouds in the circuit. In PSKCE, the various clouds are cascaded with each other into a single connected structure (according to the connection of registers in the original B-structure). Thus, given a particular SSA fault present in both PSKCE and FSK, the fault site is in general farther from the inputs and outputs of the circuit in PSKCE than it is in FSK. This implies a greater amount of analysis to derive a test for the given target fault in PSKCE than in FSK. However, note that the test pattern for this fault may detect more additional faults as a side effect in PSKCE than in FSK, since there are more "critical" faults that get simultaneously activated and sensitized in PSKCE. In general the relative ATPG effort for the partial and full scan designs is not easily predictable and depends on the nature of the ATPG algorithm.

Regarding test length, however, a somewhat stronger statement can be made. In a full scan design, the test length for the circuit is equal to the maximum test length among its individual clouds, since test patterns can be generated separately and applied simultaneously to different clouds. The following lemma relates this test length to that for a B-structure kernel in a partial scan design.

Lemma 4. Treating only detectable SSA faults in a B-structure as target faults, the test length for the combinational equivalent of the B-structure must be at least as high as the test length for any individual cloud within it.

Proof by Example: Consider the B-structure in Figure 3.3(a). Let the combinational equivalent of this B-structure have a test set $T$ of length $L$ that covers all its detectable SSA faults. Now consider any cloud, say C2.
Let the set of faults in C2 covered by $T$ be $F_2$. Let $T_2$ denote any set of test patterns for C2 that detects all the faults in $F_2$, and let $L_2$ denote the length of $T_2$. We will show that irrespective of the nature of $T$, there must exist a test set $T_2$ such that $L \ge L_2$. Assume the contrary, i.e., $L < L_2$ for every such $T_2$. Then for each test pattern in $T$, determine by simulation the corresponding pattern present in R2 during the single-pattern test application. The resulting set of patterns at R2, if applied to C2, must cause all faults in $F_2$ to be detected at the output of C2. By fault simulation on this set of patterns, and dropping those that do not detect any new target fault from $F_2$, a test set for C2 with length $L_2 \le L$ is obtained, contradicting the assumption. □

Corollary. Given a partial scan design in which the kernel is a B-structure, and given a single-pattern test set of length $L$ that covers a given set of faults, there exists a test set of length at most $L$ for the corresponding full scan design that covers the same set of faults.

Lemma 4 and its corollary imply that the number of test patterns for a partial scan design cannot be lower than, and is generally higher than, the number of patterns required to detect the same set of faults in the corresponding full scan design. However, it is possible that the full scan design has more detectable faults within the combinational logic than the partial scan design. For example, there may be faults in the cloud C2 that can be detected by some pattern applied directly at the inputs of C2 but cannot be detected by any single-pattern test applied to the B-structure. Such faults are said to be sequentially redundant, since they are detectable using full scan but are not detectable during normal operation of the original sequential circuit.
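The fault-dropping step at the end of the proof of Lemma 4 is ordinary greedy test compaction: patterns are examined in order and kept only if they detect a target fault not already detected. A minimal Python sketch follows. The fault simulator is abstracted as a hypothetical callback `detects(pattern)` returning the set of faults a pattern detects, and `compact_by_fault_dropping` is an illustrative name. Because a pattern is kept only when it adds coverage, the result is never longer than the input, which is the $L_2 \le L$ step of the proof; patterns that repeat at R2 are dropped automatically.

```python
def compact_by_fault_dropping(patterns, detects, targets):
    """Greedy compaction by fault dropping.

    patterns: test patterns in application order
    detects:  callback pattern -> set of faults it detects (hypothetical
              stand-in for a fault simulator)
    targets:  target fault set (F2 in the proof of Lemma 4)
    Returns (kept patterns, target faults that remain undetected)."""
    remaining = set(targets)
    kept = []
    for p in patterns:
        newly = set(detects(p)) & remaining
        if newly:  # p detects at least one target fault not yet covered
            kept.append(p)
            remaining -= newly
    return kept, remaining


# Example: patterns observed at R2, with a hypothetical fault-simulation table for C2.
table = {"p1": {"f1", "f2"}, "p2": {"f2"}, "p3": {"f3"}, "p4": {"f1", "f3"}}
kept, undetected = compact_by_fault_dropping(
    ["p1", "p2", "p3", "p4", "p1"], table.__getitem__, {"f1", "f2", "f3"})
print(kept, undetected)  # ['p1', 'p3'] set()
```

The result depends on pattern order; a different order can give a different (possibly shorter) compacted set, but never one longer than the input.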
3.6 Algorithm for Scan Register Selection

Given an arbitrary sequential circuit to which partial scan is to be applied, BALLAST selects a set of registers to be made scannable such that the kernel is a B-structure and the cost of modifying the circuit is minimized. It assumes that the modification cost is the same for registers having equal width, except in the case of a tie between a LOAD register and a HOLD register. In this case the latter is chosen to be in the scan path, because converting a HOLD register into a scan register may require less area. The selection is carried out using the algorithms presented in this section.

Let $G = (V, A, H, w)$ be the topology graph of the circuit. Formally, we need to determine a set of registers $R \subseteq A$ such that the topology graph of the kernel, $(V, A - R, H - R, w)$, is balanced and $\sum_{a \in R} w(a)$ is minimized. A near-optimal solution can be obtained using the following steps.

Step 1. Transform $G = (V, A, H, w)$ into an acyclic topology graph $G_A$ by removing a set of "feedback" arcs $R_A$ such that $\sum_{a \in R_A} w(a)$ is minimized.

Step 2. Transform $G_A$ into a balanced topology graph $G_B$ by removing a set of arcs $R_B$ such that $\sum_{a \in R_B} w(a)$ is minimized.

Step 3. $R = R_A \cup R_B$ is the desired set of arcs, and the resulting topology graph $G_B = (V, A - R, H - R, w)$ represents the kernel.

Both steps 1 and 2 above are intractable, as will be explained later. Note that because the problem is partitioned in this way, optimal solutions for the individual steps do not imply an optimal overall solution.

3.6.1 Removal of Feedback Arcs

The problem of carrying out Step 1 above, which requires a minimum-weight feedback arc set $R_A$ to be determined, is known to be NP-complete [21]. Our implementation of BALLAST uses the branch-and-bound algorithm outlined below. We will refer to this algorithm as breakCycles.
This algorithm is adapted from the one presented in [22], to which the reader is referred for details.

Let $I = \{1, 2, \ldots, |V|\}$, and let $P: V \to I$ represent some total ordering of the nodes in $V$. Then $P$ uniquely determines an arc set $R_A(P)$ of "feedback" arcs, based on the ordering $P$ of nodes, as follows:

$R_A(P) = \{(i, j) \in A \mid P(j) < P(i)\}$.

Essentially, $R_A(P)$ contains all arcs from a higher-numbered node to a lower-numbered node if the nodes are enumerated according to $P$. Deletion of all the arcs in $R_A(P)$ would eliminate all cycles. Thus the problem is to find an ordering $P$ such that the total weight of the arcs in $R_A(P)$ is minimum; then $R_A = R_A(P)$.

To characterize our branch-and-bound algorithm we need to define a state in the search space, a branching rule, and a bounding rule. A state is defined as a tuple $(W, \sigma)$ where $W \subseteq V$ and $\sigma$ is a total ordering on $W$. A state with $W = V$ is called a leaf state and represents a potential solution; any non-leaf state represents only a partial solution. Given a current state $(W, \sigma)$ with $W \neq V$, branching is carried out by generating a new state of the form $(W', \sigma')$ where $W' = W \cup \{w'\}$, $w' \in V - W$; $\forall w \in W$, $\sigma'(w) = \sigma(w)$; and $\sigma'(w') = |W| + 1$.

A partial solution state $(W, \sigma)$ can be bounded, i.e., removed from further consideration along with all states derived from it, if the lower bound on the weight of any leaf state that can be generated from it is higher than the weight of the best potential solution generated so far. Our algorithm computes the lower bound as follows. Let $F_L$ represent the set of local feedback arcs whose source and destination nodes are both in $W$, and let $F_G$ represent the set of global feedback arcs with source node in $V - W$ and destination node in $W$; i.e.,

$F_L = \{(i, j) \in A \mid i, j \in W \text{ and } \sigma(j) < \sigma(i)\}$;
$F_G = \{(i, j) \in A \mid i \in V - W \text{ and } j \in W\}$.
Using $F$ to denote $F_L \cup F_G$, a lower bound on the cost of eliminating all feedback arcs is given by
$$\sum_{a \in F} w(a).$$

The breakCycles algorithm begins with the state $(W = \emptyset, \sigma = \emptyset)$, and branching and bounding continue until every state in the state space has been either visited or eliminated by bounding. Since all possible states are implicitly enumerated, the solution obtained is optimal.

The number of states that can be eliminated by bounding depends not only on the computation of the lower bounds but also on the branching rule used to select the next state to branch to. With a good branching rule, one or more good solutions can be found early in the search, which helps to eliminate partial solutions later. Our algorithm uses the following guideline. Each time a branch is generated at a given state, the possible child states differ only in the newly added node. As the next state to visit, we select the one whose new node has the largest number of other nodes in G reachable from it. The underlying goal is to place as many arcs as possible in the forward direction of the generated ordering, and thus to find good solutions quickly. The number of nodes reachable from each node is determined beforehand by computing the transitive closure of the topology graph.

3.6.2 Balancing Acyclic Sequential Structures

A simplified form of Step 2 has also been shown to be computationally intractable [17]. In this section we present a heuristic procedure, balance, for solving this problem. balance uses a verification procedure, check, to verify that a given structure is balanced.

3.6.2.1 Verification Procedure

Given the topology graph G of a sequential structure S, the following procedure checks whether S is balanced.
First the procedure checks whether the removal of each arc in the HOLD set would disconnect the topology graph. Then, starting at each root node (i.e., a node with no incoming arcs) in turn, it levelizes the portion of the graph reachable from it. If every node is found to be at a unique distance from each root, the graph is pronounced balanced and the procedure returns SUCCESS. If an imbalance is detected at any time, the procedure exits and returns FAILURE.

function check(G = (V, A, H, w)): Returns SUCCESS if G is balanced, FAILURE otherwise.
1. Construct the graph $G' = (V, A - H)$ by removing all HOLD registers from G. Determine the connected components $\{C_1, C_2, \ldots, C_k\}$ of $G'$.
2. Construct the graph $G''$ consisting of the original topology graph G with each connected component $C_i$ of $G'$ replaced by a single node. Note that the arc set of $G''$ is H.
3. Determine whether $G''$ is a tree. If it is, condition 3 of the definition of B-structures is satisfied; proceed to the next step. Otherwise return FAILURE.
4. Repeat for each $C_i$:
   (a) Determine the set ROOTS of root nodes of $C_i$, where root nodes are defined as nodes with no incoming arcs. If there are one or more root nodes, proceed to the next step; otherwise return FAILURE, since condition 1 in the definition must be violated.
   (b) Pick a root node $v_1$ in ROOTS. Starting at $v_1$, carry out a breadth-first traversal of all nodes in $C_i$ reachable from $v_1$ by a directed path, and assign each node visited a level number equal to its distance from $v_1$. If at any time a node $v_2$ needs to be assigned a level number when it has previously been assigned a different level number with respect to $v_1$, stop the search and return FAILURE, since $C_i$ must violate either condition 1 or condition 2. Continue the traversal until no more nodes can be visited.
   (c) Repeat step 4(b) for all root nodes in ROOTS.
5. S is a B-structure; return SUCCESS. □

Figure 3.2 indicates the level number assigned by the procedure to each node.
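The check procedure translates fairly directly into code. The Python sketch below is a minimal illustration under our own assumptions, not the original implementation: the topology graph is passed as a list of cloud names, a dict mapping each (hypothetical) register name to its (source cloud, destination cloud) pair, and the set of HOLD register names; the weights w play no role in the check. Union-find handles the undirected connectivity of Steps 1 to 3, and one breadth-first traversal per root handles Step 4.

```python
from collections import deque


def _components(nodes, edges):
    """Union-find over undirected edges; returns (representative of each node, acyclic?)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    acyclic = True
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            acyclic = False  # this edge closes an undirected cycle
        else:
            parent[ru] = rv
    return {v: find(v) for v in nodes}, acyclic


def check(nodes, arcs, hold):
    """Return True (SUCCESS) if the topology graph is balanced, else False (FAILURE).

    nodes: list of cloud names; arcs: register name -> (source cloud, destination cloud);
    hold: names of HOLD registers (a subset of the keys of arcs).
    """
    load_arcs = [uv for r, uv in arcs.items() if r not in hold]
    # Step 1: connected components of G' = (V, A - H), ignoring arc direction.
    comp, _ = _components(nodes, load_arcs)
    reps = set(comp.values())
    # Step 2: contract each component to one node; the arc set of G'' is H.
    contracted = [(comp[arcs[r][0]], comp[arcs[r][1]]) for r in hold]
    # Step 3: G'' must be a tree, i.e. acyclic with exactly |reps| - 1 arcs.
    _, acyclic = _components(reps, contracted)
    if not acyclic or len(contracted) != len(reps) - 1:
        return False
    succ = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for u, v in load_arcs:
        succ[u].append(v)
        indeg[v] += 1
    # Step 4: levelize each component from each of its roots.
    for rep in reps:
        roots = [v for v in nodes if comp[v] == rep and indeg[v] == 0]
        if not roots:
            return False  # 4(a): no root, so condition 1 is violated
        for root in roots:  # 4(b) and 4(c)
            level = {root: 0}
            queue = deque([root])
            while queue:
                u = queue.popleft()
                for v in succ[u]:
                    if v not in level:
                        level[v] = level[u] + 1
                        queue.append(v)
                    elif level[v] != level[u] + 1:
                        return False  # unequal path lengths (or a cycle)
    return True  # Step 5: S is a B-structure


if __name__ == "__main__":
    diamond = {"r1": ("a", "b"), "r2": ("b", "d"), "r3": ("a", "c"), "r4": ("c", "d")}
    print(check(["a", "b", "c", "d"], diamond, set()))
    print(check(["a", "b", "c", "d"], {**diamond, "r5": ("a", "d")}, set()))
```

On this hypothetical four-cloud example, two reconvergent paths of two registers each are accepted, while adding a direct register r5 from a to d is rejected in Step 4(b). Two registers joining the same pair of components, at least one of them a HOLD register, are rejected in Step 3.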
In the above procedure, the computational complexity is $O(n + m)$, where $n = |V|$ and $m = |A|$.

3.6.2.2 Balancing Procedure

An acyclic topology graph $G = (V, A, H, w)$ is balanced if and only if all paths between any given pair of nodes are of equal length and the removal of any arc in $H$ disconnects $G$. A heuristic procedure, balance, for balancing G is presented below. It returns a set of arcs $R \subseteq A$ such that the derived topology graph $(V, A - R, H - R, w)$ is balanced. The procedure assumes that G is connected; if G is disconnected, the procedure must be invoked on each connected component to balance it separately.

The procedure works recursively: it partitions the topology graph into two smaller topology graphs, balances them independently, and then merges the solutions. The partition is obtained by determining a minimum-cost cutset (mincut) CS of the topology graph. The intuitive idea behind using a mincut is to minimize the sensitivity of the overall solution to the degree of optimality of the merging process, which is based on a greedy heuristic. The mincut can be determined by applying the maximum flow algorithm [23] to the topology graph, treating primary inputs as sources and primary outputs as sinks, with the capacity of arc $a \in A$ defined by $w(a)$.

During merging, a greedy algorithm is first used to determine a maximal set of LOAD registers taken from the mincut CS that can join the two balanced sub-structures while keeping the merged topology graph balanced. From the definition of B-structures it follows that if any HOLD register is used to connect the two balanced sub-structures, no other register may connect them. Thus an alternative to using the set of LOAD arcs derived above is to use the maximum-weight HOLD arc in the mincut. The costs of these two solutions are compared to determine the set of arcs actually used for merging.
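As a concrete illustration of the mincut step, the sketch below computes a minimum-weight cutset with the Edmonds-Karp maximum-flow algorithm (shortest augmenting paths). That choice is ours; [23] may describe a different flow algorithm. It assumes, for illustration, that the caller names the clouds fed by primary inputs as sources and the clouds driving primary outputs as sinks; these are tied to a super-source and a super-sink of unbounded capacity. The cut returned is the set of registers leaving the clouds that remain reachable from the super-source once the flow is maximum.

```python
from collections import defaultdict, deque

INF = float("inf")


def min_cutset(arcs, weight, sources, sinks):
    """Minimum-weight set of registers separating `sources` from `sinks`.

    arcs: register -> (source cloud, destination cloud); weight: register -> w(a).
    Assumes no cloud is both a source and a sink. Returns (cut registers, source side).
    """
    S, T = object(), object()  # super-source and super-sink sentinels
    cap = defaultdict(float)   # residual capacity of each ordered node pair
    nbrs = defaultdict(set)    # residual-graph neighbours (both directions)

    def add(u, v, c):
        cap[(u, v)] += c
        nbrs[u].add(v)
        nbrs[v].add(u)

    for r, (u, v) in arcs.items():
        add(u, v, weight[r])
    for s in sources:
        add(S, s, INF)
    for t in sinks:
        add(t, T, INF)

    while True:
        # Edmonds-Karp: breadth-first search for a shortest augmenting path.
        prev = {S: None}
        queue = deque([S])
        while queue and T not in prev:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in prev and cap[(u, v)] > 0:
                    prev[v] = u
                    queue.append(v)
        if T not in prev:
            break  # no augmenting path left: the flow is maximum
        path, v = [], T
        while prev[v] is not None:
            path.append((prev[v], v))
            v = prev[v]
        flow = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= flow
            cap[(v, u)] += flow

    side = set(prev) - {S}  # clouds still reachable from S in the residual graph
    cut = {r for r, (u, v) in arcs.items() if u in side and v not in side}
    return cut, side


if __name__ == "__main__":
    arcs = {"r1": ("a", "b"), "r2": ("b", "c"), "r3": ("a", "c")}
    print(min_cutset(arcs, {"r1": 3, "r2": 1, "r3": 1}, ["a"], ["c"]))
```

For this hypothetical three-cloud graph, cutting r2 and r3 (total weight 2) is preferred to cutting r1 and r3 (weight 4), and the two sides of the cut are {a, b} and {c}.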
function balance(G = (V, A, H, w): acyclic topology graph): Returns $R \subseteq A$ such that $(V, A - R, H - R, w)$ is balanced.
1. If check(G) = SUCCESS, then return $R \leftarrow \emptyset$; else proceed.
2. $CS \leftarrow$ minimum-cost cutset of G; let $G_s$, $G_d$ be the subgraphs of G induced by CS.
3. Balance $G_s$ and $G_d$ separately: $R \leftarrow \text{balance}(G_s) \cup \text{balance}(G_d) \cup CS$.
4. Let $CS_H \leftarrow CS \cap H$ and $CS_L \leftarrow CS \cap (A - H)$ be a partition of CS into its HOLD and LOAD registers. Sort the arcs in $CS_L$ in order of decreasing cost.
5. $CS'_L \leftarrow \emptyset$, the set of LOAD arcs retained in the topology graph when merging $G_s$ and $G_d$.
6. For all arcs a in $CS_L$, in order of decreasing cost, check whether including a makes the merged graph unbalanced: if check$(V, (A - R) \cup CS'_L \cup \{a\}, H - CS, w)$ = SUCCESS, then $CS'_L \leftarrow CS'_L \cup \{a\}$.
7. Let $a_H \leftarrow$ highest-cost arc in $CS_H$.
8. If $\sum_{a_L \in CS'_L} w(a_L) > w(a_H)$, then $R \leftarrow R - CS'_L$; else $R \leftarrow R - \{a_H\}$.
9. Return R. □

The run time of the above algorithm consists of two parts: (a) the computation of a minimum cutset at each recursive step, which may be repeated O(m) times, where m = |A|; and (b) the invocation of the check procedure, which is carried out at most once for every arc in a minimum cutset, i.e., O(m) times. Since part (a) dominates the time required by the check procedure, the overall complexity is that of computing O(m) minimum cutsets.

Figure 3.6: Illustration of balance procedure. (a) Original topology graph; (b) balanced topology graph. (Bold arcs represent HOLD registers.)

In this section we have described a procedure for determining a near-minimal-cost set of registers to be made scannable. Since a branch-and-bound algorithm is used to solve part of the problem, the time complexity of the procedure grows exponentially with the number of clouds in the worst case.
However, the computation time is not excessive, since the procedure treats each cloud of connected combinational logic as a single node irrespective of the number of gates or transistors within the cloud.

Figure 3.6(a) shows a topology graph that is not balanced (HOLD registers are indicated by bold arcs). Figure 3.6(b) shows the result of applying balance to it; the arcs removed are (e, f) and (g, i), and the registers in the circuit that correspond to these arcs must be included in the scan path.

Note that although the heuristic balance produces a single scan FF selection, it could easily be extended to produce a range of solutions, for example by partitioning the topology graph with cuts other than the mincut. Thus for the circuit of Figure 3.1, the range of alternative scan designs shown in Figure 3.7 could be derived (shaded blocks represent scan registers). Every scan register implies an additional delay for signals passing through it. Hence these designs may have varying levels of performance degradation, depending on which scan registers, if any, lie in critical timing paths. By obtaining the combinational equivalent of each kernel, the test length and associated test application time can also be determined for each design. Among these designs, the one that best meets the user's goals on area overhead, performance degradation, test time, etc. can then be selected.

Figure 3.7: Various scan design solutions for the same circuit.

3.7 Implementation of Scan Path

The BALLAST test application procedure, in its simplest form, consists of scanning a pattern into the selected registers and holding it constant for d clock cycles, where d is the depth of the B-structure comprising the kernel. If the overhead of providing all selected SPRs with HOLD modes is acceptable, no extra work needs to be done.
However, not all patterns need to be held at the kernel inputs during all d clock cycles. In this section we show how, by introducing dummy bits, the need for some SPRs to have a HOLD mode can be eliminated.

3.7.1 Example

Consider the BALLAST partial scan design shown in Figure 3.1(b). The scan path contains the registers R6 and R3, which are ordered in the scan path as shown. Each test pattern consists of two parts, t3 and t6, corresponding to the SPRs R3 and R6 respectively. If R3 and R6 are both provided with HOLD modes, one way to apply the test is to scan in the combined pattern t3t6, hold it in the scan path for 2 additional clock cycles, and then load the output of C4 into R6 and R3 after the 3rd clock cycle.

The test result loaded back into the scan path depends on the value in R3 during the 1st clock cycle and on the value in R6 during the 3rd clock cycle. Hence if we can arrange to have t3 in R3 in the 1st clock cycle and t6 in R6 in the 3rd clock cycle, the HOLD modes in these registers can be eliminated without affecting the test result. This can be done by inserting two dummy ("don't care") bits in the test pattern between t3 and t6. When t3 becomes available in R3, t6 is still displaced by 2 bits from its final destination in R6. After two more shifts, during the 3rd cycle, R6 contains the correct input pattern. Hence the desired test result is loaded into R3 and R6 after the 3rd clock cycle. The modified test pattern with two dummy bits thus eliminates the need for HOLD modes in both R3 and R6.

The example above is a simple illustration of the concept used to minimize the overhead due to additional HOLD modes. In general, however, it may not be possible to eliminate the HOLD modes of all the SPRs. We now describe how the number of additional HOLD modes can be minimized in an arbitrary circuit.
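The effect of the two dummy bits can be checked with a small bit-level simulation. The sketch below is our own model, with its assumptions stated explicitly: one bit is shifted per clock; the scan path runs scan-in → R6 → R3 → scan-out (R3 must lie nearer scan-out for t3 to arrive first); the 1st clock cycle of the test is the cycle in which t3 first fills R3; and the register widths and bit values are made up.

```python
def shift(chain, bit):
    """One scan clock: `bit` enters at the scan-in end (index 0); the last bit leaves."""
    return [bit] + chain[:-1]


def view(chain, start, width):
    """Register contents listed in the order the bits were shifted in."""
    return chain[start:start + width][::-1]


def simulate(t3, t6, dummies=2):
    """Shift t3, then `dummies` don't-care bits, then t6 into scan-in -> R6 -> R3 -> scan-out.

    Returns ({cycle: R3 contents}, {cycle: R6 contents}), where cycle 1 is the
    first clock at which R3 holds all of t3 (the start of the kernel test).
    """
    w6, w3 = len(t6), len(t3)
    chain = ["?"] * (w6 + w3)  # unknown initial contents
    stream = list(t3) + ["x"] * dummies + list(t6)
    r3, r6 = {}, {}
    for n, bit in enumerate(stream, start=1):
        chain = shift(chain, bit)
        cycle = n - (w6 + w3) + 1  # becomes 1 once t3 has fully reached R3
        if cycle >= 1:
            r3[cycle] = view(chain, w6, w3)
            r6[cycle] = view(chain, 0, w6)
    return r3, r6


if __name__ == "__main__":
    r3, r6 = simulate(list("101"), list("0110"))
    for c in sorted(r3):
        print(c, "R3 =", "".join(r3[c]), " R6 =", "".join(r6[c]))
```

With t3 = 101 and t6 = 0110, R3 holds 101 in cycle 1 and R6 holds 0110 in cycle 3, which are exactly the two values the test result depends on; neither register uses a HOLD mode.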
3.7.2 Construction of Scan Path

In general an SPR can play two roles: it can serve as a driver, i.e., an SPR that feeds test inputs to a cloud, or as a receiver, i.e., an SPR that is fed test results by a cloud. A single SPR can play both roles simultaneously; it can drive test patterns into one cloud and receive test results from another (or the same) cloud. An example of a pure driver or pure receiver is a boundary scan element in a chip employing the boundary scan methodology.

Given that at least one path exists between a driver and a receiver, let the distance between them denote the number of registers (excluding the driver and receiver themselves) on any path from the driver to the receiver through the kernel. Since the kernel is a B-structure, all such paths have the same number of registers. If the distances from a given driver to all receivers to which it has a path (including itself, if it is also a receiver) have the same value, define the latency of the driver as this value; otherwise the latency is undefined. A latency of x for a driver implies that the test result loaded into the scan path during clock cycle k depends only on the value present in the driver during clock cycle k − 1 − x.

These concepts are illustrated in Figure 3.8. The registers R2, R3, R6, R7 and R8 are included in the scan path and ensure that the kernel is a valid B-structure. Note that R7 and R8 are not strictly required to be scannable for the kernel to be a B-structure; however, they may be included in the scan path if the circuit employs boundary scan. In this circuit R7 is a pure driver, R8 is a pure receiver, and R2, R3 and R6 play both roles. In the original circuit all registers are LOAD registers except R2, which has a HOLD mode. The latency of each driver in Figure 3.8 is indicated alongside it as [x]; undefined latencies are denoted by [U].
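The latency definition can be computed directly. In the sketch below, which is our own illustration rather than a procedure from the text, the kernel is given as (source cloud, destination cloud) pairs, one per non-scan register, together with hypothetical maps from each SPR to the clouds it drives and to the clouds whose outputs it receives. Because the kernel is a B-structure, a breadth-first search from a cloud yields the unique register count to every reachable cloud.

```python
from collections import deque


def register_distances(kernel_arcs, start):
    """Number of kernel registers on a path from cloud `start` to each reachable cloud.

    In a B-structure all paths between two clouds have the same length, so the
    breadth-first depth is the distance.
    """
    succ = {}
    for u, v in kernel_arcs:
        succ.setdefault(u, []).append(v)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def latency(driver, drives, receives, kernel_arcs):
    """Latency of SPR `driver`, or None if it is undefined.

    drives[r]: clouds fed by SPR r; receives[r]: clouds whose outputs are loaded
    into SPR r; kernel_arcs: (source cloud, destination cloud) per non-scan register.
    """
    seen = set()
    for cloud in drives[driver]:
        dist = register_distances(kernel_arcs, cloud)
        for sources in receives.values():
            seen.update(dist[c] for c in sources if c in dist)
    return seen.pop() if len(seen) == 1 else None


if __name__ == "__main__":
    arcs = [("A", "B"), ("B", "C")]
    drives = {"R1": ["A"], "R2": ["B"], "R3": ["C"]}
    print([latency(r, drives, {"R4": ["C"]}, arcs) for r in drives])
    print([latency(r, drives, {"R4": ["C"], "R5": ["B"]}, arcs) for r in drives])
```

In this hypothetical chain A → B → C with a single receiver R4 fed by C, the drivers at A, B and C have latencies 2, 1 and 0. Adding a second receiver R5 fed by B makes the latencies of the drivers at A and B undefined, so Rule 1 below would require HOLD modes for them.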
We now need to determine (1) which SPRs, if any, should be provided with a new HOLD mode; (2) an ordering of all the SPRs in the scan path; and (3) a pattern of dummy bits to be inserted into each test pattern. We consider these issues in turn in the following rules for constructing the scan path.

For a driver with undefined latency, the test result due to each test pattern depends on a single pattern being present in the driver during more than one clock cycle; this follows from the definition of latency. For example, consider the driver R3 in Figure 3.8. If the scan path is to be loaded with the test result during clock cycle k, the result to be loaded into R2 depends on the pattern in R3 during cycle k − 1, and the results to be loaded into R8, R6 and R3 depend on the pattern in R3 during cycle k − 3. For single-pattern testability, the same desired value must be present in R3 during k − 1 as well as k − 3. Hence R3 must be provided with a HOLD mode.

Figure 3.8: Example of scan path implementation.

The general rule is stated below.

Rule 1. All drivers whose latencies are undefined must have HOLD modes.

The scan path is constructed by forming d + 2 groups of SPRs, where d is the depth of the B-structure comprising the kernel. Figure 3.9 shows the sequence of groups in the scan path. The group closest to the scan-in pin is Group 0, and the group closest to the scan-out pin is Group d + 1. Within each group, the ordering of SPRs may be based on routing considerations. SPRs are assigned to groups according to the three rules presented below.
Since pure receivers never need to be supplied with test input data, all such registers can be placed at the scan-out end of the scan path, so as to minimize the number of clock cycles needed to shift a new pattern into the scan path. In the example of Figure 3.8 (which has d = 2), R8 is placed in Group 3 of the scan path. This principle is stated in the following rule.

Rule 2. All pure receivers are placed in Group d + 1.

Figure 3.9: Organization of scan path into groups of scan path registers (from the scan-in pin: Group 0, ..., Group d − 1, Group d, Group d + 1, then the scan-out pin).

Drivers are placed in the scan path according to their latency values. As we have seen, a driver with a well-defined latency need not have a HOLD mode, provided the arrival time of the desired test data in this driver is synchronized with the arrival times of test data in the other drivers. For example, in Figure 3.8, R6 and R7 are not provided with HOLD modes. For the test to be applied correctly to the kernel, R6 must receive its correct pattern exactly one clock cycle after R7, since their latency values differ by 1.

One way to achieve this synchronization is as follows. Drivers with higher latency are placed further along the scan path. The test patterns for all drivers are concatenated into a single pattern and scanned in; however, dummy bits are placed at selected positions in the test pattern so that test data destined for drivers with lower latencies lag behind those for drivers with higher latencies. The following two rules deal with the ordering of registers so that the test patterns can be applied in this manner; the actual formatting of the pattern is described in a final rule.

Rule 3. All drivers not having a HOLD mode and having latency l must be placed in Group l.

Note that all drivers not having a HOLD mode must have well-defined latency values.
This rule ensures that drivers having higher latency are placed further along the scan path than those with lower latency. Thus drivers in Group d receive their correct patterns first; at this instant the testing of the kernel effectively begins, even though the patterns destined for drivers in Group 0 still lag by d clock cycles.

So far we have not considered the placement of drivers having HOLD modes. The following rule places such drivers in the same group as drivers having latency d. This ensures that they are among the first drivers to receive their correct pattern, which they then hold for the duration of the test (i.e., for the next d clock cycles).

Rule 4. All drivers provided with HOLD modes for test, and all drivers with HOLD modes already present, are placed in Group d.

Thus both Rules 3 and 4 may assign drivers to Group d. According to these rules, R2 and R3 are placed in Group 2, R7 is placed in Group 1, and R6 is placed in Group 0. The resulting scan path is shown in Figure 3.8.

3.7.3 Test Application

Once the drivers have been ordered according to the grouping determined above, we simply need to ensure that the pattern for each group of drivers lags behind the pattern for its neighboring group by one clock cycle. This is done by the following rule.

Rule 5. Each single-pattern test is modified by introducing one dummy bit between each pair of consecutive groups i and i + 1, 0 ≤ i ≤ d − 1.

Thus a total of d dummy bits is added. Two or more adjacent dummy bits may be present in a sequence when there are empty groups in the scan path. In the example of Figure 3.8, if ti represents an input pattern for register Ri and x represents a dummy bit, the modified test pattern to be scanned in (ordered from left to right) is t2 t3 x t7 x t6. No input pattern is required for R8, since it is a pure receiver.
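Rules 1 to 5 are mechanical enough to sketch in a few lines. The Python below is an illustration under our own assumptions: each SPR is described by a hypothetical record stating whether it is a driver (otherwise it is a pure receiver), its latency (None when undefined) and whether it already has a HOLD mode. The example reproduces the grouping of Figure 3.8 from the facts stated in the text (R3 has undefined latency, R2 already has a HOLD mode, R7 and R6 are placed by latency, R8 is a pure receiver); the order of R2 and R3 inside Group 2 is our assumption, since the text leaves within-group order to routing.

```python
def assign_groups(sprs, d):
    """Apply Rules 1-4 to the scan path registers.

    sprs: name -> {"driver": bool, "latency": int or None, "has_hold": bool}
    (a register that is not a driver is a pure receiver).
    Returns (groups 0..d+1 in scan-in to scan-out order, SPRs needing a new HOLD mode).
    """
    groups = {g: [] for g in range(d + 2)}
    new_hold = []
    for name, s in sprs.items():
        if not s["driver"]:
            groups[d + 1].append(name)         # Rule 2: pure receivers
            continue
        hold = s["has_hold"]
        if s["latency"] is None and not hold:
            new_hold.append(name)              # Rule 1: undefined latency needs HOLD
            hold = True
        if hold:
            groups[d].append(name)             # Rule 4: HOLD drivers go to Group d
        else:
            groups[s["latency"]].append(name)  # Rule 3: latency l goes to Group l
    return groups, new_hold


def format_pattern(groups, d, patterns):
    """Rule 5: concatenate driver patterns (leftmost shifted in first), with one
    dummy bit 'x' between consecutive groups i and i + 1 for 0 <= i <= d - 1."""
    tokens = []
    for g in range(d, -1, -1):  # Group d is shifted in first, so it travels furthest
        tokens.extend(patterns[name] for name in groups[g])
        if g > 0:
            tokens.append("x")
    return " ".join(tokens)


if __name__ == "__main__":
    sprs = {
        "R2": {"driver": True, "latency": None, "has_hold": True},  # latency irrelevant here
        "R3": {"driver": True, "latency": None, "has_hold": False},
        "R6": {"driver": True, "latency": 0, "has_hold": False},
        "R7": {"driver": True, "latency": 1, "has_hold": False},
        "R8": {"driver": False, "latency": None, "has_hold": False},
    }
    groups, new_hold = assign_groups(sprs, d=2)
    print(groups, new_hold)
    print(format_pattern(groups, 2, {name: "t" + name[1:] for name in sprs}))
```

The sketch places R6, R7, R2 and R3, and R8 in Groups 0, 1, 2 and 3 respectively, flags only R3 for a new HOLD mode, and emits the pattern t2 t3 x t7 x t6. An empty group simply contributes an extra adjacent x, as noted above.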
When the scan path is implemented as described in this section, the formal test procedure given in Section 3.4 needs to be modified by generalizing step 2(a) as follows:

2. (a) Place all scan registers having a HOLD mode in the HOLD mode, all other scan registers in the SHIFT mode, and all non-scan registers in the LOAD mode for d clock cycles. (This allows test data to propagate through the kernel.)

3.7.4 Circuit Modifications

The registers in the original circuit may be classified into four types depending on (1) whether or not they have a HOLD capability, and (2) whether or not they are to be included in the scan path. The modifications required for each type of register so that it can perform the appropriate functions according to the test plan are listed below.

Non-Scan Registers Without HOLD Mode: These require no modification.

Non-Scan Registers With HOLD Mode: The individual HOLD controls of all such registers should be externally controllable so that the registers can be operated in LOAD mode while the kernel is being tested.

Scan Registers With HOLD Mode: These registers need to be converted into scan path registers by the addition of SHIFT modes. Their HOLD mode signals must be externally controllable so that they can be made to hold test patterns for the required number of clock cycles. No HOLD control signal for a scan register may serve as a control signal for a non-scan register.

Scan Registers Without HOLD Mode: In addition to being augmented with SHIFT modes, some of these registers need to be provided with HOLD modes as well. The HOLD control signals for all such registers may be controlled by a single external pin.

These modifications may result in a small logic overhead and possibly additional I/O pins for ensuring that the HOLD signals are appropriately configured.
3.8 Testing Register Functional Modes

As mentioned previously, a complete set of single-pattern test vectors can be obtained covering all detectable faults in the combinational logic. While the combinational logic is being exercised, every FF must operate in the LOAD mode; hence other built-in modes of operation of the FFs may not get exercised. In this section we deal with the problem of testing the RESET (or CLEAR), PRESET and HOLD modes of operation, if present in any FFs in the circuit. Further, we focus our attention on faults in non-scan FFs, i.e., FFs within the kernel. FFs in the scan path are either tested for free during tests for the kernel or can easily be tested using special patterns shifted into and out of the scan path.

The three functional modes listed above give rise to six possible fault modes: stuck-at-RESET, cannot-RESET, stuck-at-PRESET, cannot-PRESET, stuck-at-HOLD and cannot-HOLD. We map faults in the functional operation of FFs onto structural faults on the control lines of the FFs. Given the B-structure $S_B$ under test, we generate the combinational equivalent $C_B$ using the functional combinational equivalent of each FF, depending on its built-in functions, rather than simple wires and/or inverters.

Figure 3.10(a) shows an FF connecting two clouds C1 and C2, and Figures 3.10(b) and (c) show how the combinational equivalents of FFs having RESET and PRESET modes, respectively, are used. Each fault in these functional modes is functionally equivalent to a stuck-at-0/1 fault on the corresponding control line. Thus it is sufficient to detect both stuck-at-0 and stuck-at-1 faults on the control lines using the combinational equivalent. Note that some of the faults may be tested for free while testing the clouds of the kernel. The control lines may be treated as primary inputs for the purpose of test pattern generation.
If a control line fans out to more than one FF, as shown in Figure 3.10, faults on all the lines marked x (i.e., the fanout stem as well as the fanout branches) must be tested. This ensures that all appropriate fault modes are covered.

Figure 3.10(d) shows the combinational equivalent of a HOLD FF. Unlike the faults considered earlier, the manifestation of HOLD faults depends on the previous state of the circuit. Two single-pattern test vectors are required to detect the cannot-HOLD fault. While the first vector is applied, all FFs operate in the LOAD mode; while the second vector is applied, the FF under test (and possibly other FFs) operates in the HOLD mode. Two copies of the combinational equivalent of the B-structure, viz. $C_B^1$ and $C_B^2$, are required for generating a test. This is illustrated in Figure 3.11. Note that multiplexers are not required to model the HOLD FFs in $C_B^1$, since all FFs operate in the LOAD mode while the first vector is applied. Test pattern generation on the combined combinational structure yields two patterns $t_1$ and $t_2$, corresponding to $C_B^1$ and $C_B^2$ respectively. During test, $t_1$ is first scanned in and applied to the kernel in the normal way with all FFs in the LOAD mode. After the kernel outputs have stabilized, the control signal under test is switched to the HOLD mode. $t_2$ is then scanned in and applied to the kernel in the normal way, except that the control signal under test is in the HOLD mode. The output of the kernel for the second pattern is used to detect the fault.

Figure 3.10: Modeling FF functional modes. (a) FF connecting two clouds; (b) model of RESET operation; (c) model of PRESET operation; (d) model of HOLD operation.

The approach described above is sufficient if the HOLD control line for the FF under consideration is independent of all other control lines. If there are dependencies among the control lines, however, testing them is more complex and some form of functional testing may be required.

Figure 3.11: Combinational equivalent for detecting HOLD faults.

3.9 Case Study

Both full scan and the BALLAST technique were applied to a Viterbi decoder designed by the Jet Propulsion Laboratory. The basic building block was a chip containing 16 butterfly circuits. Each chip contained 448 FFs, and the total gate count was 7,280. In our overhead analysis we have ignored the area due to the 16 shift registers present on the chip, since they do not cause an area overhead for either full or partial scan. In both versions a global test mode signal was used to control the operation of the scan path FFs. The characteristics of both designs are shown in Table 3.1. The logic overhead figures are based on the increase in gate count.

Table 3.1: Comparison of full scan and BALLAST partial scan.
Type of scan | Full scan | BALLAST
No. of scan path FFs | 448 | 256
Logic overhead | 24.6% | 14.1%
Pin overhead | 3 | 4
No. of test patterns | 34 | 41
Clock cycles to apply each pattern | 449 | 262
Overall test time | 15,714 | 10,998

BALLAST required only 256 of the 448 FFs to be made scannable, of which only 96 were required to hold data during test application. This was achieved in the following manner. Instead of one global clock signal controlling all FFs in the circuit, two separate clock signals, CK1 and CK2, were used. CK2 was connected to the 96 scan FFs that needed to hold data during test; CK1 was connected to all other scan and non-scan FFs. Both clocks were operated simultaneously at all times except during the part of the test plan when the 96 scan FFs connected to CK2 were required to hold their data. During these clock cycles only CK1 was operated, while CK2 remained inactive.
This implementation required one additional I/O pin, and the separate clock signals would add some routing overhead. The resulting kernel had depth 5. The number of single-pattern tests generated by combinational ATPG on the combinational equivalent circuit was higher than in the full scan case (41 versus 34). However, the overall test time for the BALLAST circuit was 30% lower (10,998 versus 15,714) because of the shorter scan path, even though dummy bits were added to each test pattern.

3.10 Summary

In this chapter a methodology for partial scan design has been described. It is based on a new class of synchronous sequential circuits called balanced structures, which have the following properties that make them useful as kernels in a partial scan circuit: (i) they are single-pattern testable for all detectable faults in the combinational logic and nearly all faults in the storage elements; (ii) the FFs internal to these networks need not be made scannable; and (iii) they can be treated as combinational circuits for the purpose of test generation.

The concept of balanced structures can be used to reduce various overheads in partial scan design for arbitrary sequential circuits. By identifying the balanced sub-structure of the circuit that has the largest number of FFs, and configuring it as a kernel to be tested using a partial scan path, the test overhead can be minimized without any loss in fault coverage. The computation time for this analysis is in general orders of magnitude lower than the time required for sequential ATPG on the original circuit. Some FFs in the scan path may need a HOLD mode so that single-pattern tests can be applied to the kernel. By ordering the scan path registers appropriately, the number of such HOLD modes can be reduced to a minimum, provided the test patterns are modified by inserting dummy bits where necessary.
Case studies indicate that the logic overhead for the scan path can be reduced significantly using this partial scan methodology, particularly in pipelined circuits such as those that often occur in digital signal processing chips. By eliminating scan registers from the critical path, the performance of the circuit can also be enhanced.

Chapter 4
Partial Scan Design with Unbalanced Structures

"Less is more." (Robert Browning)

4.1 Introduction

Automatic test pattern generation (ATPG) for acyclic sequential structures is known to require substantially less computation effort than for arbitrary sequential structures [20]. This fact is exploited in the BALLAST approach presented in Chapter 3 as well as in other techniques [16]. Essentially, these techniques select flip-flops (FFs) to be included in the scan path such that the portion of the circuit effectively under test, the kernel, is either acyclic or close to acyclic. For example, BALLAST makes the kernel acyclic and balanced, which ensures that a single-pattern test is sufficient for testing all faults. For a general unbalanced kernel, however, a sequence of one or more patterns may be required to detect a given fault.

In this chapter we assume that a set of FFs has already been selected such that the resulting kernel is acyclic. This selection could be carried out using the algorithm of Section 3.6.1. Any sequential ATPG program can then be used to obtain test sequences for the faults in the kernel. To apply a sequence of test patterns to the kernel, the circuit must be designed so that, in the test mode, the clock signals for scan FFs can be controlled independently of the clock signals for the non-scan FFs. A test sequence can then be applied to the kernel by alternating the following two steps:

1. Serially shift a test pattern into the scan path while disabling the clock signal feeding the non-scan FFs (this effectively puts the non-scan FFs in a HOLD mode).
2. While disabling the clock signal feeding the scan FFs (putting them in a HOLD mode), activate the clock signal for the non-scan FFs for one clock cycle (this enables test data to propagate through one level in the kernel).

Formally, a test sequence for a fault in a sequential circuit consists of a set of consecutive time frames, in each of which patterns containing both specified and don't-care values may need to be applied at the various inputs. The length of a sequence is the total number of time frames in it. For an acyclic circuit structure the length is related to the depth, i.e., the highest number of FFs in any path in the structure. If d is the depth of a structure, the test sequence length is bounded by d + 1. However, a given test sequence may contain unassigned or don't-care input values, so that not all primary inputs need to be provided with new data in each of the d + 1 clock cycles. If the inputs and outputs of the structure under test are directly accessible, the time for applying the sequence is d + 1 clock cycles and is not affected by the presence of don't-care inputs. In a partial scan design, however, many of the inputs and outputs of the structure are accessed by shifting data serially. Hence the presence of don't-care input values could potentially lead to a great saving in test time. In such circuits, where the length of the scan path is usually much greater than d, the test time is dominated by the time to shift new patterns into and out of the scan path.

Kernels in the BALLAST methodology are an example of acyclic structures that need fewer than d + 1 input patterns. In BALLAST, scan path FFs are selected so as to make the kernel not only acyclic but also balanced, i.e., every path between any two points in the kernel has the same number of FFs. Such kernels (known as
Such kernels (known as B-structures; see Chapter 3) require only a single-pattern test for any fault. Each test pattern in general needs to be held in the scan path for d + 1 clock cycles. The number of FFs to be included in the scan path in the BALLAST approach, however, is higher than that required just to make the kernel acyclic; hence the overhead cost is higher.

In this chapter we present an alternative partial scan approach, ACYST (ACYclic structured Scan Test). We study the implications of using a kernel that has an acyclic structure but may be unbalanced. Clearly the area overhead in this case is lower than for a balanced kernel. However, there are two drawbacks. First, a simple combinational equivalent cannot be used for ATPG as in BALLAST. Second, a given fault may require a sequence of up to d + 1 patterns to detect it; this can lead to a long test time for applying a single test sequence, since the individual patterns need to be shifted separately into the scan path. The ACYST methodology deals with these two drawbacks introduced by the unbalanced nature of the kernel. Although it cannot eliminate them, it minimizes their impact, making the use of an unbalanced kernel more acceptable.

For example, consider the circuit shown in Figure 4.1(a), consisting of combinational logic blocks A, B and C interconnected with registers. Each connection shown may consist of any number of wires. Registers R3 and R4 are selected to be made scannable, since the resulting kernel, shown in Figure 4.1(b), is acyclic. Note that the kernel is obtained from the original circuit simply by removing the scan registers and replacing them by pseudo-inputs/outputs. In the kernel, inputs/outputs connected to the same combinational logic block are merged together; thus the kernel effectively has one output at C, which feeds scan path registers and the primary output O1, and inputs at both A and B, which are fed by scan registers and/or the primary input I1.
The depth of this kernel is 2. Typical sequential ATPG programs would construct an iterative array consisting of up to 3 copies of the kernel, each representing one time frame, and attempt to find a test sequence for a given fault (if one exists) within these time frames. Since the kernel has only one primary output, at C, any test sequence for a fault must propagate an error to this output. Assume that the error is visible at the output in time frame 2. Because of the topology of the circuit, this output value can depend only on the primary input values at A in time frames 0 and 2 and on the primary input value at B in time frame 1. All other input values are essentially don't-cares in all possible test sequences and are indicated by 'x' in the input sequences in Figure 4.1(c) (to be applied in order from left to right). Note that each 'x' represents a vector of don't-care values. An ATPG program would normally assign random logic values to these inputs. This means that approximately half the test time in this case could be taken up in shifting random data into the scan path.

Figure 4.1: Example of ACYST. (a) Partial scan design, (b) acyclic kernel, (c) test generation model.

The effectiveness of this test process can be improved in two ways. First, the test pattern generator can be enhanced to fill in the unassigned input values with deterministic patterns that detect one or more additional faults, rather than with random data. In general there is no guarantee that any additional faults can be detected in this way, especially if most of the circuit faults have already been covered. Second, the test sequence can be compacted so that the shifting time is reduced without reducing its effectiveness. The latter approach is the subject of this chapter.
In the simple example of Figure 4.1, A has its input value specified in time frame 0 but not in frame 1, while B has its input value specified in frame 1 but not in frame 0. Hence we can combine the first two patterns by shifting a0 and b1 simultaneously into the scan path, holding them there for two clock cycles instead of one, and then shifting in a2. This is a modification of the basic test procedure described earlier in this section. Thus the number of shift cycles of the scan path (i.e., the number of times a new pattern is shifted in) is reduced from three to two without losing any of the deterministic part of the test sequence. Note that this applies irrespective of which fault is under test. Intuitively, the fact that there are two "unbalanced" paths from A to C with unequal delays indicates that in general two distinct input patterns at A will be required to guarantee detection of an arbitrary fault.

In Section 4.2 we present a more formal and general discussion of how to test unbalanced acyclic structures. We use a formal model to compute a lower bound on the number of shift cycles required to test for an arbitrary fault, and present a test compaction algorithm that achieves the lower bound. In Section 4.3 we study the problem of simplifying the iterative array model used for ATPG by making use of the fact that the kernel is acyclic. For example, Figure 4.1(c) shows a test generation model (TGM) consisting of a reduced iterative array that is sufficient for any fault in the kernel. The TGM can be further condensed based on the compacted schedule that will be used for test application, as described in Section 4.3.

4.2 Optimal Test Scheduling

A given test sequence for a partial scan design whose kernel is an acyclic structure of depth d may consist of up to d + 1 time frames. Our objective is to find a way of compacting the patterns in a test sequence so as to minimize the number of time frames at which new data needs to be applied.
This ensures that when the test patterns are applied using the scan path, the shifting time (which usually dominates the test time) is minimized. Assume that the time frames are numbered from 0 (the earliest) to d (the latest, in which the fault gets detected). We define the schedule as the list, in ascending order, of the time frames in the test sequence that require new data to be shifted in. Thus a schedule (0, 1, 2, ..., d) means that new data is shifted in at every time frame, while (0) represents a single-pattern test. In the example of Figure 4.1 presented earlier, time frames 0 and 1 are combined, hence the schedule is (0, 2).

We shall refer to each element in the schedule as a shift step, since in the corresponding time frame a new pattern needs to be shifted into the scan path, and to each time frame not in the schedule as a hold step, since it requires the contents of the scan path to be held for an additional clock cycle. Note that the test pattern scanned in during a given shift step i in the schedule (..., i, j, ...) is actually the result of compacting the test patterns for time frames i, i + 1, i + 2, ..., j - 1.

4.2.1 The Compaction Principle

In our earlier example using Figure 4.1, we combined the test patterns for time frames 0 and 1 because neither the input at A nor the input at B needs a value specified in both frames. This compaction applies to all test sequences in this example. Before studying more complex cases we state the following principle that governs our compaction problem. The term minimal test sequence refers to a test sequence in which all unspecified input values are left as don't-care values.

Compaction Principle: A set of consecutive time frames in a minimal test sequence may be compacted together into a single shift step in a schedule only if no input is required to be assigned values in more than one of these time frames.
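As an illustration only, schedule intervals and the compaction principle can be checked mechanically. The following Python sketch is not part of the ACYST method as published; the function names schedule_intervals and satisfies_compaction_principle are hypothetical, and a minimal test sequence is assumed to be summarized by the set of time frames in which each input carries a specified value (here, the Figure 4.1 example).

```python
from typing import Dict, List, Sequence, Set, Tuple


def schedule_intervals(schedule: Sequence[int], d: int) -> List[Tuple[int, ...]]:
    """Split time frames 0..d into schedule intervals.

    An interval is a shift step followed by its hold steps: for consecutive
    schedule entries i and j it holds frames i, i+1, ..., j-1, and the last
    interval runs up to frame d.
    """
    steps = list(schedule)
    if not steps or steps[0] != 0 or steps != sorted(set(steps)) or steps[-1] > d:
        raise ValueError("schedule must be strictly increasing, start at 0 and end at or before d")
    bounds = steps + [d + 1]
    return [tuple(range(bounds[k], bounds[k + 1])) for k in range(len(steps))]


def satisfies_compaction_principle(
    specified: Dict[str, Set[int]], schedule: Sequence[int], d: int
) -> bool:
    """True if no input needs values in two frames of the same schedule interval.

    specified[x] is the set of time frames in which the minimal test sequence
    assigns a value to input x; every other frame is a don't-care for x.
    """
    for interval in schedule_intervals(schedule, d):
        frames = set(interval)
        for frames_with_values in specified.values():
            if len(frames_with_values & frames) > 1:
                return False
    return True


# Figure 4.1 example: A is specified in frames 0 and 2, B in frame 1 (d = 2).
FIG_4_1 = {"A": {0, 2}, "B": {1}}
print(schedule_intervals((0, 2), 2))                       # [(0, 1), (2,)]
print(satisfies_compaction_principle(FIG_4_1, (0, 2), 2))  # True
print(satisfies_compaction_principle(FIG_4_1, (0,), 2))    # False: A needed twice
```

Only the positions of specified values are used, never the values themselves, which mirrors the structural nature of the principle.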
We will apply this compaction principle before test pattern generation is actually carried out. Note that the principle does not make use of the actual values of the test patterns; it uses only the information, derived from the circuit structure, about which input values can be specified and which must be don't-cares in the various time frames. In a single-output acyclic structure, such as our previous example, the required information about the input values can be derived using the following rule: an input can be assigned a value in time frame x if there exists a path from that input to the output that passes through d - x FFs (assuming the error is first observed at the output in time frame d). Later in this section we describe how to determine, based on this information, an optimally compacted schedule that satisfies the compaction principle.

4.2.2 Modeling Schedule Constraints

Let us now consider the multi-output structure in Figure 4.2, in which blocks A, B, C, etc. are combinational and unlabeled blocks are registers. Its depth d is 4, and it has three inputs and four outputs, all of arbitrary width. As before, the inputs and outputs are accessible only through a scan path, which is not shown. Given an arbitrary fault in the circuit, a test sequence may propagate the fault to any of the outputs. At some outputs it may be possible to detect a fault at a time frame earlier than d. However, note that the same test sequence displaced in time can be used to detect a fault at different time frames. Hence we shall assume without loss of generality that a fault is to be detected at time frame d. This is justified since any fault that is propagated into the scan path at time frame d will be observed during the first shift step of the subsequent test sequence. This assumption will lead to a simplified test generation model, discussed in Section 4.3.
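The path rule above lends itself to a short computation. The Python sketch below assumes the kernel is given as a hypothetical DAG of combinational blocks whose edges are labeled with the number of FFs on the connection; the edge list used is an assumed topology that merely reproduces the dependencies stated for Figure 4.1, not the actual figure. Because it applies the rule per input-output pair, it also covers the multi-output case.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Kernel as a DAG of combinational blocks: edges[u] lists (v, ffs), meaning
# block u feeds block v through `ffs` flip-flops (0 = direct wire).
Edges = Dict[str, List[Tuple[str, int]]]


def ff_counts(edges: Edges, output: str) -> Dict[str, FrozenSet[int]]:
    """For each block, the FF counts of all paths from that block to `output`."""

    @lru_cache(maxsize=None)
    def counts(u: str) -> FrozenSet[int]:
        found = {0} if u == output else set()
        for v, ffs in edges.get(u, []):
            found |= {ffs + k for k in counts(v)}
        return frozenset(found)

    blocks = set(edges) | {v for succ in edges.values() for v, _ in succ}
    return {b: counts(b) for b in blocks}


def sigma(edges: Edges, inputs: Sequence[str], outputs: Sequence[str], d: int):
    """sigma[(x, y)]: frames in which input x may be specified for output y.

    Path rule: a path from x to y through k FFs makes frame d - k
    specifiable when the error is observed at y in frame d.
    """
    result = {}
    for y in outputs:
        per_block = ff_counts(edges, y)
        for x in inputs:
            result[(x, y)] = sorted(d - k for k in per_block.get(x, frozenset()))
    return result


# Hypothetical kernel consistent with the Figure 4.1 discussion: A reaches C
# directly and via B (one FF on each hop); B reaches C through one FF.
EDGES = {"A": [("B", 1), ("C", 0)], "B": [("C", 1)]}
print(sigma(EDGES, ["A", "B"], ["C"], d=2))  # {('A', 'C'): [0, 2], ('B', 'C'): [1]}
```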
The patterns shown at each input in Figure 4.2 indicate the time frames at which an input may possibly need to be specified in order to detect a fault at one of the outputs at time frame d. This information is determined in the same way as in the single-output case, except that for a given input, paths to all outputs have to be taken into account. Thus, for example, the input to C has values at time frames 3 and 4 but is always unspecified at all other times. The input to A may need specified values at any time frame, because corresponding to each time frame there is some path with the appropriate number of FFs ending in time frame d at one of the outputs. Hence it appears at first glance that the compaction principle will not allow a reduced test schedule. However, to detect any fault it is sufficient to propagate it to just one of the outputs. If some test sequence for a fault propagates an error to more than one output, there may exist a reduced form of the same sequence that propagates it to only one output. Table 4.1 shows, for each output, which input values need to be specified such that the fault is observed at that output in time frame 4. The table shows that no test sequence would require all 5 values at A to be specified. It is also clear that all 5 frames cannot be compacted together, since the output at D (which we shall refer to as D for short) requires two different input values at A and also two different input values at H. Under these constraints it seems intuitively clear that a bare minimum of two shift steps will be needed in the schedule for an arbitrary test sequence in order to satisfy the compaction principle stated earlier. A model for representing the schedule constraints is described below.

Figure 4.2: Example of acyclic kernel. (Input sequences shown in the figure: at A, a0 a1 a2 a3 a4; at C, x x x c3 c4; at H, x h1 h2 x x.)
Table 4.1: Relationships among inputs and outputs (input values on which each output depends).

Output   Sequence at A   Sequence at C   Sequence at H
A        x x x x a4      -               -
C        x x a2 x x      x x x x c4      -
D        a0 a1 x x x     x x x c3 x      x h1 h2 x x
E        x x x a3 x      -               -

Given an input x and an output y, let σ(x, y) be defined as the ordered list of time frames at which the input sequence at x for output y can have specified values. Thus, for example, Table 4.1 shows that σ(A, D) = (0, 1) and σ(H, C) = (). |σ(x, y)| denotes the number of elements in σ(x, y).

We shall attempt to find a minimal schedule by constructing a schedule constraint graph (SCG). We define an SCG as a directed graph G = (V, A), where V = {0, 1, 2, ..., d} represents the set of time frames and an arc (f1, f2) in A implies that frame f1 must occur strictly before frame f2 in any compacted test sequence. An SCG is constructed using the following procedure, which takes as input the values σ(x, y) for all inputs x and all outputs y.

procedure constructSCG(σ): Returns schedule constraint graph G = (V, A).
{
    V ← {0, 1, 2, ..., d}, where d = depth of the circuit;
    A ← { };
    For all input-output pairs (x, y) of the circuit such that |σ(x, y)| ≥ 2 do:
    /* Add constraints corresponding to this input-output pair */
    {
        L ← σ(x, y);
        While |L| ≥ 2 do:
        {
            i ← first element of L;
            j ← second element of L;
            Remove i from L;
            /* Time frames i and j cannot be compacted together */
            For each k, 0 ≤ k ≤ i, do: A ← A ∪ {(k, j)};
            For each k, j ≤ k ≤ d, do: A ← A ∪ {(i, k)};
        }
    }
}

For the circuit of Figure 4.2 the construction of the SCG is illustrated in Figure 4.3. We begin with the set of nodes V = {0, 1, 2, 3, 4} and no arcs in A. Referring to Table 4.1, there are two input-output pairs that may contribute arcs to the SCG: σ(H, D) = (1, 2) and σ(A, D) = (0, 1).
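The procedure constructSCG translates almost line for line into Python. The sketch below is hypothetical (the names construct_scg and TABLE_4_1 are illustrative only); it takes the σ values read off Table 4.1 and uses inclusive loop bounds, which reproduce the arcs described for Figure 4.3.

```python
from typing import Dict, Sequence, Set, Tuple

Arc = Tuple[int, int]


def construct_scg(sigma: Dict[Tuple[str, str], Sequence[int]], d: int) -> Set[Arc]:
    """Arc set A of the schedule constraint graph over V = {0, ..., d}.

    sigma[(x, y)] is the ascending list of frames at which input x may need a
    specified value for output y (as in Table 4.1).
    """
    arcs: Set[Arc] = set()
    for frames in sigma.values():
        L = list(frames)
        while len(L) >= 2:
            i, j = L[0], L[1]
            L.pop(0)
            # Frames i and j must not share a shift step:
            for k in range(0, i + 1):   # every frame up to i precedes j
                arcs.add((k, j))
            for k in range(j, d + 1):   # i precedes every frame from j on
                arcs.add((i, k))
    return arcs


# sigma values read off Table 4.1 (d = 4); lists with fewer than two
# entries contribute no arcs.
TABLE_4_1 = {
    ("A", "A"): [4], ("A", "C"): [2], ("A", "D"): [0, 1], ("A", "E"): [3],
    ("C", "C"): [4], ("C", "D"): [3],
    ("H", "D"): [1, 2], ("H", "C"): [],
}
print(sorted(construct_scg({("H", "D"): [1, 2]}, 4)))
# [(0, 2), (1, 2), (1, 3), (1, 4)]   (Figure 4.3(a))
print(sorted(construct_scg(TABLE_4_1, 4)))
# [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)]   (Figure 4.3(b))
```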
The fact that σ(H, D) = (1, 2) implies that there must be a shift step separating time frames 1 and 2, since distinct test patterns may be required at input H. In terms of constraints on the schedule, this implies that:
1. all time frames up to and including time frame 1 must occur before time frame 2 in the schedule; and
2. all time frames from 2 onward must occur after time frame 1 in the schedule.
The first item above contributes arcs (0, 2) and (1, 2), while the second contributes arcs (1, 3) and (1, 4). Thus the constraints due to the input-output pair (H, D) translate into the arcs shown in Figure 4.3(a), which is the result of the first iteration of the outer 'for' loop in the procedure. In the second iteration the constraints due to the pair (A, D) are added, resulting in the completed SCG shown in Figure 4.3(b). Note that in the above example it is not sufficient to have only the arcs (0, 1) and (1, 2) in the SCG. By adding the other arcs we are explicitly encoding the fact that, although some time frames represented by nodes in V may be compacted with others, they can never be scheduled in reverse order.¹

¹ With these constraints encoded explicitly, our problem is actually a special restriction of the equal execution time job scheduling problem [24, p. 402] with the number of processors not less than the number of jobs.

Figure 4.3: Construction of schedule constraint graph. (a) Constraints for (H, D) only; (b) constraints for (H, D) and (A, D).

In the procedure constructSCG, the outer 'for' loop may be repeated for all N_I inputs and all N_O outputs. Within the loop the time complexity is O(d²), hence the overall time complexity is O(N_I N_O d²).

4.2.3 Picking a Schedule

The SCG is essentially a representation of information on which time frames may be compacted together and which may not.
Based on the SCG we are in a position to make the following statements about the schedules resulting from compaction.

Lemma 5: Given a compacted schedule S = (f1, f2, ..., fs), where 0 = f1 < f2 < ... < fs ≤ d, S satisfies the compaction principle if for every arc (a, b) in the SCG there is some f_i in S such that a < f_i ≤ b.

Proof: Assume that for all arcs (a, b) in the SCG there is some f_i in S such that a < f_i ≤ b. Assume for the purpose of contradiction that the compaction principle is violated by S. Then there must be some input of the circuit that needs distinct values in some time frames a < b that are compacted into the same shift step in S. This implies that the SCG has an arc (a, b). But the fact that a and b are in the same shift step also implies that there is no f_i in S such that a < f_i ≤ b, which is a contradiction. □

The above lemma essentially means that a given schedule S is valid, i.e., does not violate the compaction principle, if no two time frames that have an arc between them in the SCG are merged into the same shift step.

Lemma 6: The number of nodes in the longest directed path in the schedule constraint graph is a lower bound on the number of steps in any schedule that satisfies the compaction principle.

Proof: Let P be a longest path in the SCG, and let it consist of the s nodes f1, f2, ..., fs in sequence. From the construction of the SCG, every arc (f_i, f_i+1) implies that if the frames f_i and f_i+1 were compacted into the same shift step, the compaction principle would be violated. Since every arc goes from a lower-numbered to a higher-numbered frame and each shift step covers consecutive frames, no two nodes of P can share a shift step. Hence there cannot be fewer than s shift steps in any valid schedule. □

In Figure 4.3(b) the path consisting of nodes 0, 1, 2 is the longest, hence at least three shift steps are required in any compacted schedule.
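Lemmas 5 and 6 can likewise be checked on the example. This Python sketch uses hypothetical helper names. It tests a candidate schedule against the arc condition of Lemma 5, read as a < f ≤ b for some shift step f: frames a and b fall in different shift steps exactly when some step starts after a and no later than b. It computes the Lemma 6 bound as the node count of a longest path, using the arcs of Figure 4.3(b).

```python
from functools import lru_cache
from typing import Dict, Iterable, List, Sequence, Tuple

Arc = Tuple[int, int]


def is_valid_schedule(schedule: Sequence[int], arcs: Iterable[Arc]) -> bool:
    """Lemma 5 test: every arc (a, b) needs a shift step f with a < f <= b."""
    return all(any(a < f <= b for f in schedule) for a, b in arcs)


def longest_path_nodes(nodes: Iterable[int], arcs: Iterable[Arc]) -> int:
    """Lemma 6 bound: node count of a longest directed path in the SCG."""
    nodes = list(nodes)
    succ: Dict[int, List[int]] = {v: [] for v in nodes}
    for a, b in arcs:
        succ[a].append(b)

    @lru_cache(maxsize=None)
    def longest_from(v: int) -> int:
        # Arcs always go from a lower to a higher frame, so recursion terminates.
        return 1 + max((longest_from(w) for w in succ[v]), default=0)

    return max(longest_from(v) for v in nodes)


# Completed SCG of Figure 4.3(b).
FIG_4_3B = {(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)}
print(longest_path_nodes(range(5), FIG_4_3B))  # 3
print(is_valid_schedule((0, 1, 2), FIG_4_3B))  # True
print(is_valid_schedule((0, 2), FIG_4_3B))     # False: arc (0, 1) not separated
```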
Our problem is now to find a schedule (f1, f2, ..., fs) of minimum length that satisfies the following condition: given any arc (a, b) in A, the nodes a and b must not be compacted into the same shift step in the schedule, i.e., there must be an f_i in the schedule such that a < f_i ≤ b. We present below a greedy algorithm that achieves the lower bound of Lemma 6. It essentially places the nodes of the SCG in levels such that the nodes in the longest path lie in consecutive levels, and all arcs that begin at a particular level end at some higher-numbered level. Then all nodes (time frames) at the same level can be compacted into the same shift step in the schedule.

algorithm schedule(G = (V, A): schedule constraint graph): Returns S, a schedule of minimum length satisfying G.
{
    l ← 0;
    While |V| > 0 do:
    {
        l ← l + 1;
        R_l ← nodes in V having no incoming arcs;
        /* R_l consists of consecutively numbered time frames starting with the
           lowest-numbered time frame in V; see proof of correctness */
        Remove the nodes in R_l, along with adjacent arcs, from G;
    }
    /* Final value of l represents number of steps in schedule */
    Return schedule S = (m1, m2, ..., m_l), where m_i = lowest-numbered time frame in R_i, 1 ≤ i ≤ l.
}

The sets R_i determined by this algorithm for the SCG of Figure 4.3 are {0}, {1} and {2, 3, 4}, hence the schedule is (0, 1, 2). The computation involved in finding R_l in each iteration is of order O(d²), assuming that an adjacency matrix is used to represent the SCG. Since the number of iterations is bounded by d + 1, the overall complexity is O(d³). Below we demonstrate that the algorithm schedule works correctly in all cases.

Proof of Correctness: We need to prove two assertions: first, that S is a schedule satisfying the compaction principle; second, that the resulting schedule is optimal. Consider the first iteration of the 'while' loop.
By construction, the lowest-numbered node in G (i.e., 0 for l = 1) cannot have incoming arcs, hence it must be included in R_l. Let this node be r1, and let the highest-numbered node in R_l be r2. We will now show that all nodes r such that r1 ≤ r ≤ r2 must be in R_l. Assume that there is in fact a node v, r1 < v < r2, that is not in R_l. Then there must be a node u < v with an arc (u, v) in G. By construction of G, the arc (u, v) was added for some pair of consecutive frames i < j taken from some σ(x, y), and the node i, which satisfies u ≤ i < v and is therefore still in G, has outgoing arcs to all nodes v′ ≥ v. Hence there must be an arc (i, r2) in G, which is a contradiction since r2 is in R_l. Thus R_l represents a group of consecutively numbered time frames starting with the lowest-numbered one currently in V.

After the nodes in R_l are removed from G, the resulting graph is similar in form to G, since only a consecutive set of lowest-numbered nodes has been removed. Hence the arguments above can be applied recursively to the resulting graph in the subsequent iterations. Thus every set R_l consists of consecutive time frames. Note also that for any arc in G, the two adjacent nodes cannot be in the same R_l. From Lemma 5 it follows that S is a valid schedule satisfying the compaction principle.

Since all nodes with no incoming arcs are removed in each iteration of the 'while' loop, the length of the longest path must decrease by 1 each time. Thus the number of iterations is equal to the number of nodes in the longest path. According to Lemma 6, this is in fact a lower bound on the number of steps in any valid schedule. Hence the schedule S returned by the algorithm is optimal. □

In this section we have shown how to determine an optimally compacted schedule based on the structure of the acyclic circuit under test. This schedule can be utilized in two ways. First, it can be used in conjunction with a traditional sequential ATPG program to compact each test sequence produced before random data is used to fill in unspecified input values.
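A minimal Python sketch of algorithm schedule and of this first use follows. It is an illustration under stated assumptions, not the author's implementation: schedule_from_scg levels the SCG by repeatedly removing nodes with no incoming arcs, and compact (a hypothetical helper) merges the specified values of a minimal test sequence into one pattern per shift step, refusing any merge that would violate the compaction principle. The pattern values are placeholders.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def schedule_from_scg(nodes: Iterable[int], arcs: Iterable[Tuple[int, int]]) -> List[int]:
    """Greedy levelling as in algorithm schedule; returns (m1, ..., m_l)."""
    remaining: Set[int] = set(nodes)
    live = set(arcs)
    result: List[int] = []
    while remaining:
        has_incoming = {b for _, b in live}
        level = sorted(v for v in remaining if v not in has_incoming)  # R_l
        if not level:
            raise ValueError("cycle found; an SCG is always acyclic")
        result.append(level[0])  # lowest-numbered frame in R_l
        remaining.difference_update(level)
        # Remove arcs adjacent to the removed nodes.
        live = {(a, b) for a, b in live if a in remaining and b in remaining}
    return result


def compact(sequence: Dict[int, Dict[str, str]], schedule: Sequence[int], d: int) -> List[Dict[str, str]]:
    """Merge a minimal test sequence into one pattern per shift step.

    sequence[t] maps inputs to their specified values in frame t; inputs not
    listed are don't-cares. Raises if the compaction principle is violated.
    """
    bounds = list(schedule) + [d + 1]
    patterns: List[Dict[str, str]] = []
    for k in range(len(schedule)):
        pattern: Dict[str, str] = {}
        for t in range(bounds[k], bounds[k + 1]):
            for inp, value in sequence.get(t, {}).items():
                if inp in pattern:
                    raise ValueError(f"input {inp} is specified twice in one interval")
                pattern[inp] = value
        patterns.append(pattern)
    return patterns


FIG_4_3B = {(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)}
print(schedule_from_scg(range(5), FIG_4_3B))  # [0, 1, 2]

# Figure 4.1 sequence a0, b1, a2 compacted with the schedule (0, 2).
print(compact({0: {"A": "a0"}, 1: {"B": "b1"}, 2: {"A": "a2"}}, (0, 2), 2))
# [{'A': 'a0', 'B': 'b1'}, {'A': 'a2'}]
```

Returning the lowest frame of each level matches the return statement of the algorithm; the rest of each level becomes hold steps.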
In the sequences produced by ATPG, the time frame at which the fault is detected may be treated as frame d, and the sequence can be compacted according to the schedule. Some test sequences produced by ATPG may propagate a fault effect to more than one output. Such sequences should be preprocessed by selecting any one of those outputs and then forcing an input value to a don't-care whenever the value in that time frame does not influence the selected output at the time of detection. The second and more efficient way to utilize the schedule is to use it as a guide for test generation itself. In the following section we show how to construct a restricted test generation model to replace the traditional iterative array used in sequential ATPG. Test generation on this model directly results in compacted test sequences for the desired schedule.

4.3 Test Generation Model

We now turn to the problem of test pattern generation for an acyclic structure. Given an optimized schedule with the smallest number of shift steps, we shall use it to guide the test generation process and simplify it where possible.

In test generation for general cyclic circuits, sequential ATPG programs typically construct an iterative array containing repeated copies of the circuit in order to represent the behavior of the circuit in different time frames [1]. With cyclic circuits, the size of the iterative array required to detect an arbitrary fault may grow exponentially with the number of FFs in the circuit. However, in an acyclic circuit every irredundant fault must be detectable within d + 1 clock cycles, where d
Not only is a sequential ATPG program unnecessary, this also avoids any execution overhead caused by the need to m aintain iterative arrays of various lengths. The concept of combinational TGMs is illustrated in Figure 4.4. Figure 4.4(a) shows a simplified version of the structure in Figure 4.2. It has three outputs at C, D and E respectively. We assume that any fault under test will be detected at one of the outputs at tim e frame 4. Figure 4.4(b) shows all outputs placed in tim e frame 4, I and the portion of the circuit that feeds each output is laid out in a levelized fashion corresponding to the time frames. Blocks that are required to be in more than one tim e frame are replicated; thus for example A occurs at several different tim e frames in the expanded structure since the output values may depend on the behavior of A in various tim e frames. Each copy of a repeated block has been pruned to remove any logic that will not be used for test generation; this is indicated by shaded regions but will not be explicitly shown from now on. The subscripts on A 0, f?i, etc. refer to the tim e j frames in which the corresponding instances of the logic blocks exist; the highest I subscript is clearly the depth d = 4. All registers in the expanded structure have been replaced by wires and the resulting TGM is combinational. This is the general form of the TGM before any compaction; we shall refer to it as the b asic T G M and it represents the schedule (0 , 1 , 2,..., d). To generate a test for a fault in the original sequential circuit, the fault must i first be mapped to the set of corresponding fault instances in the combinational ■ TGM. (This is analogous to the modeling of faults in iterative arrays.) Ordinary j combinational ATPG can now be carried out on the TGM. 
The test pattern obtained can be transformed into a test sequence for the sequential circuit using the following rule: all input patterns at logic blocks with subscript i must be used as the ith pattern in the test sequence. This of course applies if the schedule (0, 1, 2, ..., d) is used with no compaction.

Figure 4.4: Basic test generation model. (a) Acyclic structure; (b) basic TGM.

Figure 4.5: TGM for balanced structure. (a) Balanced structure, (b) combinational equivalent.

4.3.1 Condensing the Test Generation Model

When the test schedule is compacted as described in Section 4.2, not only is the test time per test sequence minimized, but we can also take advantage of the compacted schedule to condense the TGM. For the special case of balanced structures [25] such as the one shown in Figure 4.5(a), it has been shown that a single pattern is always sufficient for detecting any fault, i.e., the optimal schedule is always (0). The TGM for this class of structures is simply the combinational equivalent of the structure, formed by replacing all FFs by wires as shown in Figure 4.5(b). Thus each logic block appears only once in the TGM, and only single faults need to be considered during combinational ATPG. However, this is not the case with general unbalanced structures. Given the schedule to be used, we shall show how a maximally condensed TGM can be derived. We shall prove that, provided the schedule satisfies the compaction principle, the condensed TGM is sufficient for complete test pattern generation.

In condensing the TGM we begin with the basic TGM for the schedule (0, 1, 2, ..., d) and modify it based on the schedule provided. We essentially utilize the fact that each input pattern applied to the kernel at a shift step in the schedule is also applied during the subsequent hold steps. The condensation process can be carried out by repeating the following two steps, which are illustrated in Figure 4.6 for the circuit of Figure 4.4. The term schedule interval refers to a shift step and its subsequent hold steps.

Step 1: Repeat the following for each schedule interval. If any input signal occurs at more than one time frame in the same interval, connect the different copies together by fanning out the earliest copy of the signal (i.e., the one occurring in the lowest-numbered time frame) to the other copies, so that only one copy of the input signal remains within the interval.

For example, consider the circuit of Figure 4.4(a), for which (0, 1) is an optimal schedule. Figure 4.4(b) shows the basic TGM, which is to be condensed. The schedule (0, 1) has only one interval containing more than one step, namely the one consisting of time frames 1, 2, 3 and 4. In this interval the input feeding logic block A occurs three times, hence these inputs are connected together as shown in Figure 4.6(a). Similarly, the inputs to C in time frames 3 and 4 are connected together. Note that A1, A2 and A3 now receive identical inputs, and in fact they represent exactly the same behavior extended over three clock cycles. The following operation will replace them with one merged copy in A1.

Step 2: Repeat the following operation until no further changes can be made in the TGM. Let f be a logic block in the circuit and let f_i1, f_i2, ..., f_in be different copies of f such that i1 < i2 < ... < in and every signal feeding an input of f_i1 also fans out to the corresponding input of each of f_i2, ..., f_in. Then remove f_i2, ..., f_in from the TGM and fan out each output of f_i1 to all the signals originally fed by the corresponding outputs of f_i2, ..., f_in.

At first this step can be applied to remove A2 and A3 and fan out the output of A1 to B3 and E4 as well.
Because block A must have exactly the same behavior in time frames 1, 2 and 3, we have simply combined the three copies for the purpose of ATPG and fanned out the outputs appropriately. Note that this does not alter the execution of the test in any way; it only incorporates some information present in the test schedule into the test generation process, reducing the amount of analysis carried out during ATPG. In the resulting structure both B2 and B3 are fed by A1, hence they can be merged into B2. Finally, in a similar way, C4 can be merged into C3. No further merging is possible, and the final condensed TGM for the schedule (0, 1) is shown in Figure 4.6(b).

Figure 4.6: Condensation of test generation model. (a) Step 1, (b) step 2.

The condensation steps can be applied to any basic TGM, given a schedule, to yield a condensed TGM. Thus for the balanced acyclic structure of Figure 4.5(a), with optimal schedule (0), using Steps 1 and 2 we obtain the simple combinational equivalent of Figure 4.5(b).

The computational complexity of the condensation steps depends on the implementation and on the level of description of the circuit. The computation in Step 2 can be minimized by forming maximally connected clusters of combinational logic blocks and carrying out the circuit manipulations on this form of the circuit. Later the lower-level logic descriptions can be filled in for each high-level block, pruned where necessary as described earlier.

4.3.2 Test Pattern Generation

Given an acyclic structure C, a schedule S = (φ1, φ2, ..., φn) and a condensed TGM T_c for C based on the schedule S, the following procedure can be used to generate a test for an arbitrary fault f in C. Let f_c be the corresponding fault (possibly a multiple fault, since some logic may be replicated) in T_c.
Let us assume that f_c is detectable in T_c; then ordinary combinational ATPG can be used to derive a test pattern for f_c in T_c. Note that, due to the nature of the condensation process, any given input signal to C can occur in T_c only at the time frames φ1, φ2, ..., φn in S. For each φ_i, let p_i be an input pattern containing the values of all inputs that occur in time frame φ_i in T_c, and containing don't-care values for all other inputs. Then the sequence of patterns (p1, p2, ..., pn), if applied according to the schedule S, will detect f in C.

To justify the use of the condensed TGM we need to validate the assumption that if f is detectable in C, then f_c is detectable in T_c. This is done by the following theorem.

Theorem 3: Given a fault f detectable in an acyclic circuit C, and given a schedule S that satisfies the compaction principle, the corresponding fault f_c (possibly multiple) is detectable in the condensed TGM T_c.

Proof: Let T_B be the basic TGM of C and let f_B be the fault (possibly multiple) corresponding to f in T_B. Since f is detectable in C, f_B must be detectable in T_B using some test pattern t_B. Suppose the error is propagated to output O_d in T_B. Then the cone of logic feeding O_d in T_B has certain input values in t_B that constitute a sufficient test pattern t′_B for f_B, irrespective of the other input values. Note that in t′_B no input signal takes on more than one distinct value within the same schedule interval; otherwise the compaction principle would be violated by the schedule S. Hence for every input signal in the condensed TGM T_c there is a unique value that can be applied to it in every schedule interval in order to simulate the behavior of T_B. Let the input pattern formed by these values be t_c; note that it is a condensed form of t′_B.
Since t'_B detects f_B, it must cause different output values at O_d in the good and faulty versions of T_B. Hence t_C must cause different output values at O_d in the good and faulty versions of T_C. Thus f_C is detectable in T_C. □

The above theorem proves that for any detectable fault in C, a test sequence that follows the schedule S can be generated using combinational ATPG on T_C. This leads to the following corollary.

Corollary Given an acyclic circuit C and a schedule S that satisfies the compaction principle, a complete test pattern set for the condensed TGM T_C results in a complete test sequence set for C using the schedule S. □

We have thus shown that the condensed TGM derived in this section is a sufficient and complete model for test generation. This TGM is smaller than the expanded iterative array used by traditional sequential ATPG programs, so some redundant computations during test generation are eliminated. If the schedule is minimal, the model guarantees that an arbitrary fault can be detected using the smallest possible number of shift steps.

Note, however, that the condensed TGM may contain more than one copy of some logic blocks. Storing the TGM may therefore take more memory than storing a single copy of the original circuit. Sequential ATPG programs typically use a single stored copy of the circuit to represent all time frames. Combinational ATPG on the condensed TGM may therefore need more memory than sequential ATPG on the original sequential circuit. Each logic block could appear up to d times in the condensed TGM, where d is the depth of the circuit structure. Thus in the worst case the TGM could be up to d times larger than the circuit.

4.4 Summary

In this chapter we have studied the problem of testing acyclic structures in partial scan designs.
We have presented a new approach to test sequence compaction in which the objective function is the number of distinct patterns to be shifted into the scan path per test sequence. In our approach, each test sequence is compacted into the smallest number of patterns needed to be shifted into the scan path. This leads to the lowest test time to detect an arbitrary fault. We also presented an algorithm for determining the optimal schedule based on the structure of the circuit. We have also presented a specialized test generation model (TGM) for acyclic structures. Like the iterative array model, this model reduces the ATPG problem to that of combinational ATPG with multiple faults. A special feature of this model is that it uses the optimal schedule, determined separately, to derive a condensed TGM that is smaller than the iterative array used in conventional sequential ATPG. This leads to fewer redundant computations during ATPG. The optimal scheduling algorithm and the test generation model can potentially reduce the testing costs in partial scan designs, especially for signal processing and pipelined data path circuits.

Chapter 5
Partial Scan Design of Circuits Containing Switches

"It's them as take advantage that get advantage i' this world."
George Eliot, Adam Bede

5.1 Switches

Circuits designed using a top-down hierarchical approach usually consist of functional blocks and registers connected to each other via MUXes and buses. This is especially true of data path and signal processing circuits. For example, the PIRAMID silicon compiler developed by Philips [26] uses a bus-based architecture. Every PIRAMID design consists of several execution units, each implementing a specific function, and a number of buses through which they interact.
Bus arbitration is carried out through control input lines, which are accessible through a logically separate control unit.

We shall refer to MUXes and buses whose control input lines are accessible during test as switches. The general form of a switch is shown in Figure 5.1. Ordinary data path architectures with separate controllers, such as the PIRAMID design style mentioned above, meet this condition on the control lines. This holds provided either the control lines are directly accessible or a control scan path is provided to access them. (In most designs a control scan path is also invaluable for testing the control unit. The control unit is typically a finite state machine implementation and therefore hard to test functionally.)

Figure 5.1: General form of a switch. (a) MUX, (b) bus.

Switches can be used to improve the testability of the circuit in a variety of ways, some of which are listed below.

• The class of structures that have the nice testability properties of B-structures can be expanded if switches are present. This information can help to reduce design-for-test overheads.
• Switches can be used to set up data transfer paths through a circuit, increasing the controllability and observability of internal elements and thus reducing the complexity of ATPG.
• By setting every switch to a fixed configuration in each separate test session, the circuit under test can be implicitly partitioned for the purpose of ATPG, potentially reducing both ATPG complexity and overall test time.

Clearly, whenever switches are present in a circuit for functional purposes, they provide an opportunity to reduce the costs associated with testing the circuit. In this chapter the benefits of switches listed above will be illustrated and studied in detail.

5.2 Circuit Model

In Section 3.2 we introduced a circuit model based on a topology graph (TG).
The clouds of a circuit form the nodes of the TG and registers form the arcs. Merging all combinational logic, including MUXes and buses, into clouds minimizes the computational complexity of the partial scan design algorithms of Chapters 3 and 4. However, to take advantage of switches we need to modify the circuit model so that the information about these structures is not lost. The new model presented below allows switches to be represented as individual nodes in the graph.

5.2.1 Atomic Combinational Logic Units

In the topology graph (TG) circuit model of Section 3.2, each connected combinational logic region is combined into a single cloud. The cloud is the smallest circuit unit considered in the analysis, and each cloud becomes a node in the TG. In general a cloud may contain switches buried inside it along with random combinational logic. We need to modify the concept of a node so that the functional properties of switch logic are no longer ignored.

In the new model, a circuit consists of three types of elements: registers, switches, and random combinational logic. We need to construct a compact graph model that can be used for efficient circuit analysis. A new graph model derived from the TG model will be presented shortly. In this model, nodes represent switches as well as random combinational logic, and arcs represent delayless wires as well as registers. The important issue is identifying the set of nodes, from which the set of arcs will follow naturally.

We assume that the circuit under consideration is provided in the form of a register-transfer (RT) level description. In particular, switches (MUXes and buses) must be clearly identified as such. Wires, FFs, or switches that form an array at the RT level must also be appropriately identified, for example a 16-bit bus that carries a binary value or a 32-bit register that stores a vector of status bits.
The remainder of the circuit may consist of arbitrary circuit elements, either combinational logic (switches, gates, or blocks) or storage elements (FFs or registers). A typical circuit is shown in Figure 5.2(a). Each combinational block A, B, C has an associated gate-level structure that is not shown. To help construct a graph model, all non-switch combinational logic is partitioned into atomic combinational logic units (ACLUs) as follows. Each ACLU will form a node in the circuit graph.

1. Assign labels to every circuit primary output, every register input and every switch input. Wires that belong to the same vector must have the same label, and wires that do not must have distinct labels. The labels in the circuit of Figure 5.2(a) are shown as circled numbers.
2. For every non-switch combinational element c in the circuit, construct a set of labels label(c). A label l is in label(c) if and only if some path starts at c, ends at a wire having label l, and passes only through non-switch combinational logic. Thus, for example, all gates in B are assigned the label set {2}. A has two outputs with different labels; hence some gates in A have label set {1}, some have label set {4}, and others have label set {1, 4}.
3. Cluster the combinational elements into ACLUs such that every element in a given ACLU has exactly the same set of labels; i.e., c1 and c2 are in the same ACLU if and only if label(c1) = label(c2). Thus all gates in B form one ACLU, while A is subdivided into three different ACLUs A1, A2, A3 as shown.

In addition to the set of ACLUs derived above, two more types of ACLUs are defined. First, every switch (or array of switches) with externally accessible control input lines, such as M in Figure 5.2(a), is considered to be an ACLU.
Second, if any wire in the circuit fans out to feed more than one ACLU, a special type of ACLU may be constructed at the fanout point. This ACLU is called a fanout node and has one input and the appropriate number of outputs. In particular, if the output of a switch or register fans out to more than one destination, there must be an explicit fanout node fed by it. The switch M is an example.

Any ACLU that has a primary input is referred to as a primary input (PI) node. Similarly, any ACLU that has a primary output is referred to as a primary output (PO) node.

Figure 5.2: Generalized topology graph. (a) Circuit showing labels, (b) GTG with ACLUs as nodes.

5.2.2 Generalized Topology Graph

Definition A connection between two ACLUs is either a group of wires from an output port of one ACLU to an input port of the other ACLU, or a register that is fed by one ACLU and feeds the other ACLU.

Thus each connection transmits data, either within a single clock cycle or with a delay of one or more clock cycles. We define the type of a connection as one of the elements {0, 1, v}. Simple wired connections are of type 0. LOAD registers are of type 1, since data is transmitted with a delay of 1 clock cycle. HOLD registers are of type v, since data is transmitted with a variable delay.

Definition A generalized topology graph (GTG) is a directed graph G = (V, A, c, w). V is the set of ACLUs that includes all the combinational logic in the circuit. A = {a_i} represents the set of connections between ACLUs, with each connection a_i ∈ V × V being either a group of simple wires or a register. c: A → {0, 1, v} defines the types of the connections in A. w: A → Z+ (positive integers) defines the bit widths of the connections.
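Steps 1 to 3 of the ACLU construction amount to grouping gates by the set of labeled wires they can reach through non-switch logic. The Python sketch below illustrates this under simplifying assumptions of our own. The netlist is given as a hypothetical `fanout` dictionary over non-switch combinational elements only. Labels for primary outputs, register inputs and switch inputs (step 1) are assumed to be already assigned in a `sink_label` dictionary. Fanout nodes are not constructed. It is an illustration, not the author's implementation.

```python
from collections import defaultdict


def label_sets(fanout, sink_label):
    """Step 2: label(c) for every non-switch combinational element c.

    fanout maps each non-switch combinational element to its successors.
    A successor is either another such element (a key of fanout) or a
    labeled sink wire (a key of sink_label): a primary output, a register
    input or a switch input. Switches and registers are never keys of
    fanout, so the paths explored stop at them.
    """
    memo = {}

    def visit(c):
        # Combinational logic in a synchronous circuit is acyclic,
        # so this recursion terminates.
        if c not in memo:
            labels = set()
            for s in fanout.get(c, ()):
                if s in sink_label:
                    labels.add(sink_label[s])
                else:
                    labels |= visit(s)
            memo[c] = frozenset(labels)
        return memo[c]

    return {c: visit(c) for c in fanout}


def cluster_aclus(fanout, sink_label):
    """Step 3: elements with identical label sets form one ACLU."""
    aclus = defaultdict(set)
    for c, labels in label_sets(fanout, sink_label).items():
        aclus[labels].add(c)
    return dict(aclus)


# Hypothetical netlist: a3 reaches both labeled outputs of a block,
# a1 and a2 each reach only one, and b1 -> b2 reaches a single label.
fanout = {
    "a1": ["po_1"], "a2": ["reg_in_4"], "a3": ["a1", "a2"],
    "b1": ["b2"], "b2": ["mux_in_2"],
}
sink_label = {"po_1": "1", "reg_in_4": "4", "mux_in_2": "2"}
aclus = cluster_aclus(fanout, sink_label)
```

The example netlist is invented for illustration. Like block A in the text, it produces three separate ACLUs: one for gates that reach only label 1, one for label 4 only, and one for both. The two gates of the B-like chain share the label set {2} and form a single ACLU.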
In a GTG, the set of registers is {r ∈ A | c(r) ∈ {1, v}}. As before, we will use w(r) to represent the cost of converting a register r into a scan path register. Note that between a given pair of nodes u, v ∈ V there could be multiple arcs, representing different types of connections. Figure 5.2(b) shows the GTG of the circuit in Figure 5.2(a). The register R1 is represented by the arc from F to B.

Definition The length of an arc a in a GTG is 1 if c(a) = v, and c(a) otherwise. The length of a path is the sum of the lengths of the arcs in it.

5.3 Switched Balanced Structures

In Chapter 3 we defined a class of easily testable structures called balanced structures (B-structures). We showed that they are single-pattern testable and require only combinational ATPG. That analysis ignored the functional behavior of the combinational logic. In this section we study how switches can implicitly partition a circuit. As a result, a structure that is balanced in parts but unbalanced as a whole can behave as a balanced structure for the purpose of ATPG. In the following discussion the set SW ⊆ V represents the set of switch nodes.

5.3.1 The Class of SB-Structures

Consider the structure shown in Figure 5.3(a). It consists of three B-structures B1, B2 and B3 connected together via a switch as shown. Because B1 and B2 have different depths, the overall structure is not a B-structure, and the theorems of Chapter 3 do not apply to it. Now consider the operation of the circuit if the control line c were set permanently so that the switch always received its input data from B1. In this configuration the circuit output would be independent of B2. In fact B2 would be logically removed from the circuit, and the circuit would behave as if it were balanced.
Conversely, if the control line c were set to the opposite value, B1 would be logically removed, and again the remaining structure would behave as a balanced structure. The following analysis shows that the entire structure is single-pattern testable even when the control line is allowed to change freely. The test uses the combinational equivalent shown in Figure 5.3(b). Thus, although this structure is not a B-structure, the presence of the switch gives it the properties of B-structures. We shall refer to it as a switched balanced structure, or SB-structure.

We first define the class of SB-structures and then study its properties. Let S be an arbitrary synchronous sequential circuit. Let G = (V, A, c, w) be a GTG of S in which every switch with externally accessible controls is a separate node in V. Let SW be the set of switch nodes. A HOLD register is an arc h ∈ A with c(h) = v. The following definition is derived from the definition of a B-structure in Chapter 3.

Definition S is an SB-structure if:
1. G is acyclic;
2. ∀ v1, v2 ∈ V, all directed paths from v1 to v2 (if any) are of equal length or pass through the same switch node vs ∈ SW, vs ≠ v1;
3. ∀ v1, v2 ∈ V, if any directed path from v1 to v2 passes through a HOLD register h ∈ A, then all such paths pass through h or all pass through the same switch node vs ∈ SW, vs ≠ v1. □

Figure 5.3: SB-structure example. (a) SB-structure, (b) combinational equivalent.

The SB-structure of Figure 5.3(a) satisfies this definition. The combinational equivalent of an SB-structure is constructed in exactly the same way as for a B-structure in Section 3.3. Figure 5.3(b) shows the combinational equivalent of the SB-structure in Figure 5.3(a). Other terms associated with B-structures are defined analogously for SB-structures.
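On small graphs, the three conditions of the definition can be checked literally by enumerating every directed path between every pair of nodes. The Python sketch below does exactly that. It is exponential in the worst case and serves only to make the definition concrete; the efficient levelizing check, checkSW, appears in Section 5.4.2.1. The sketch rests on assumptions of our own:

- Arcs are `(u, v, ctype)` triples, with ctype 0 for a wire, 1 for a LOAD register and `"v"` for a HOLD register.
- Arc lengths follow the GTG length definition.
- A path is taken to "pass through" its end node v2. A switch at v2 can therefore absorb unequal paths itself, which matches the way checkSW adds a switch to its own SWP set.

```python
HOLD = "v"  # connection type of a HOLD register; wires are 0, LOAD registers 1


def arc_length(ctype):
    """Length of an arc: 1 for a HOLD register, c(a) otherwise."""
    return 1 if ctype == HOLD else ctype


def is_acyclic(nodes, arcs):
    """Kahn's algorithm on arcs given as (u, v, ctype) triples."""
    indeg = {x: 0 for x in nodes}
    for _, v, _ in arcs:
        indeg[v] += 1
    ready = [x for x in nodes if indeg[x] == 0]
    seen = 0
    while ready:
        u = ready.pop()
        seen += 1
        for src, v, _ in arcs:
            if src == u:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
    return seen == len(nodes)


def all_paths(arcs, src, dst):
    """Every directed path src -> dst (src != dst) as a list of arc indices."""
    out = {}
    for i, (u, _, _) in enumerate(arcs):
        out.setdefault(u, []).append(i)
    paths = []

    def dfs(node, path):
        if node == dst:
            paths.append(list(path))
            return
        for i in out.get(node, []):
            path.append(i)
            dfs(arcs[i][1], path)
            path.pop()

    dfs(src, [])
    return paths


def is_sb_structure(nodes, arcs, switches):
    """Check conditions 1-3 of the SB-structure definition literally."""
    if not is_acyclic(nodes, arcs):                      # condition 1
        return False
    for v1 in nodes:
        for v2 in nodes:
            paths = all_paths(arcs, v1, v2) if v1 != v2 else []
            if not paths:
                continue
            on_path = [{v1} | {arcs[i][1] for i in p} for p in paths]
            shared = set.intersection(*on_path) & (switches - {v1})
            lengths = {sum(arc_length(arcs[i][2]) for i in p) for p in paths}
            if len(lengths) > 1 and not shared:          # condition 2
                return False
            holds = [{i for i in p if arcs[i][2] == HOLD} for p in paths]
            if set.union(*holds) != set.intersection(*holds) and not shared:
                return False                             # condition 3
    return True


# s feeds switch M over a LOAD register (length 1) and over a wire (length 0).
nodes = {"s", "x", "M", "t"}
arcs = [("s", "x", 1), ("x", "M", 0), ("s", "M", 0), ("M", "t", 1)]
print(is_sb_structure(nodes, arcs, {"M"}))  # True: the paths reconverge at switch M
print(is_sb_structure(nodes, arcs, set()))  # False: same graph, M treated as plain logic
```

The two calls differ only in whether M is declared a switch. This mirrors the situation of Figure 5.3(a), where unequal-depth parts are acceptable only because they meet at a switch.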
5.3.2 Testability Properties of SB-Structures

SB-structures have exactly the same testability properties as the B-structures of Chapter 3. That is, they are single-pattern testable (Theorem 1), and combinational ATPG is sufficient for them (Theorem 2). The proofs of the theorems, however, have to be modified. Lemmas 1 and 2 apply to all acyclic structures, including SB-structures, but the proof of Theorem 1 is specific to B-structures. Below we restate the theorem and indicate a modified proof. SB denotes an arbitrary SB-structure and CB denotes its combinational equivalent. Let the depth of SB be d.

Theorem 4 Every SB-structure is fully testable for all detectable faults using single-pattern tests.

Proof We shall modify the procedure transform in the original proof into a procedure transform' that applies to SB-structures. Let G = (V, A, c, w) be the GTG of SB, and SW ⊆ V the set of switches. Define H as the set of HOLD registers, i.e., H = {h ∈ A | c(h) = v}. Define the state G^j of SB exactly as before. The modified procedure transform' uses the same Steps 1, 2, 3, 5, 6 and 7. The modified Step 4 is listed below.

4. Repeat the following until F = ∅:
   (a) Pick some v ∈ F having the highest activation time a(v), and remove it from F. Set k ← a(v).
   (b) Construct a set Λ of arcs to be processed, as follows. If v ∈ SW, i.e., v is a switch, use the control input value to the switch at clock cycle a(v) to find the input connection a_sel that the switch selects at this clock cycle, and set Λ ← {a_sel}. Otherwise set Λ ← {all arcs incident onto v}. For all arcs a ∈ Λ do the following.
      i. u ← source node of a. If c(a) = 0 (i.e., a is a wire, not a register) then a' ← k, else a' ← k − 1. If u has been visited earlier (i.e., a(u) has been assigned a value), then a(u) must be equal to a' (see Claim 4(b)i below). Otherwise add u to F, with a(u) ← a'.
      ii. If a ∈ H and $h^k(a) = 1$ then:
         A. $t \leftarrow \min j$ such that $h^{j+1}(a) = h^{j+2}(a) = \cdots = h^k(a) = 1$. If there is no such j, skip Steps B and C. In the original test plan, t is the clock cycle during which node u is active and register a loads the active data.
         B. $\tau \leftarrow k - t$, the number of HOLD cycles of a in the current sequence. The next step eliminates the HOLD cycles while keeping the test valid.
         C. Let G_u be the subgraph of G consisting of all nodes from which u is reachable (i.e., G_u is the cone of influence of node u). Then any change in the input data and/or the state of the substructure G_u will not affect the outcome of the test, provided the values at u at clock cycle a(u) are unchanged (see Claim 4(b)iiC below). Hence we can modify the test by delaying all electrical activity within G_u by τ clock cycles as follows:
            $\forall v \in V$ in $G_u$: $I^{j}(v) \leftarrow I^{j-\tau}(v)$, $\tau < j < k$;
            $\forall h \in H$ in $G_u$: $h^{j}(h) \leftarrow h^{j-\tau}(h)$, $\tau < j < k$;
            $\forall a \in A$ in $G_u$: $x^{j}(a) \leftarrow x^{j-\tau}(a)$, $\tau < j < k$;
            $h^{k}(a) \leftarrow 0$.

It can easily be seen that the sequence of modified states (G^1, ..., G^m) is still a valid test for f. The preceding portion of the procedure transforms the test sequence so that, in the final state sequence, every register is in LOAD mode when the node driving it becomes active. Note also that every node v that is active during the test has a unique clock cycle a(v) during which it is active. (This is proved in Claim 4(b)i.) □

The correctness of the modified procedure depends on the two claims proved below.

Claim 4(b)i In Step 4(b)i of procedure transform', if u has been visited before (i.e., a(u) has been assigned a value) then a(u) = a'.

Proof Assume, on the contrary, that u has been visited before and assigned a(u) ≠ a'. Then there must be two paths of different lengths from u to z. By the definition of SB-structures, both must pass through the same switch w ∈ SW.
Also, w must already have been visited according to the order of traversing nodes, and must have a well-defined a(w) value. Thus both paths from u to w, ending at different data inputs of w, must have been traversed. However, in the procedure transform', only one incoming arc to a switch node is ever placed in the frontier F during the traversal. This is a contradiction, so the claim must hold. □

Claim 4(b)iiC In Step 4(b)iiC of procedure transform', any change in the input data or the state of the substructure G_u will not affect the outcome of the test, provided the values at u at clock cycle a(u) are unchanged.

Proof Consider an arbitrary node u' in G_u. There are directed paths u' → u and u → z, and the latter passes through the HOLD register h. Hence there must be a path u' → z passing through h. Assume that some changes in the values at node u' do not affect the values at node u at clock cycle a(u), but do affect the outcome of the test. Then there must be another path u' → z that does not pass through h. By the definition of SB-structures, both paths must pass through some switch node w ∈ SW. Moreover, w must already have been visited according to the order of traversing nodes, and must have a well-defined a(w) value. Thus both the distinct paths, u → w and u' → w, ending at different data inputs of w, must have been traversed. This is a contradiction: w is a switch, so only one of its incoming arcs could ever have been placed in the frontier F during the traversal. □

The two claims above validate the procedure transform'. This completes the proof of Theorem 4 along the same lines as that of Theorem 1 in Chapter 3. □

Lemma 3 and Theorem 2 can be proved to hold for SB-structures by using Theorem 4 instead of Theorem 1 in their proofs.
We have thus proved that both theorems on B-structures in Chapter 3 also apply to SB-structures. SB-structures therefore have the same testability properties.

The concept of SB-structures leads to two benefits. First, the overhead for partial scan design can be reduced. Because of the switches, it is sufficient for the structure under test to be balanced in parts even if the overall structure is not a B-structure. Second, an SB-structure can be partitioned into smaller regions of logic that can be tested as independent kernels, thus lowering the overall ATPG cost. This aspect will be studied in Chapter 6.

5.4 Algorithm for Scan Register Selection

In Chapter 3 we presented the BALLAST procedure for selecting scan path registers. The selection was carried out in two steps using the topology graph (TG) circuit model. The first step determined a minimal-weight set of scan registers to make the kernel acyclic. The second selected a minimal-weight set of additional scan registers to make the resulting kernel balanced. Both steps are NP-complete. In this section we study the analogous problem of selecting a minimal set of scan registers using the generalized topology graph (GTG) model. The problem is stated as follows.

Let G = (V, A, c, w) be the GTG of the circuit. We need to determine a set of scan registers R ⊆ A such that the resulting kernel is an SB-structure and $\sum_{a \in R} w(a)$ is minimized. As before, we solve the problem in three steps.

Step 1. Remove a set of "feedback" arcs R_A from G such that $\sum_{a \in R_A} w(a)$ is minimized and the resulting kernel G_A is acyclic.
Step 2. Remove an additional set of arcs R_B from G_A such that $\sum_{a \in R_B} w(a)$ is minimized and the resulting kernel G_B is an SB-structure.
Step 3. R = R_A ∪ R_B is the desired set of arcs, and the resulting topology graph G_B = (V, A − R, c, w) represents the kernel.
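For very small graphs, Step 1 can be solved exactly by exhaustive search. This makes the restriction to registers concrete: only arcs of type 1 or v are candidates, and wires must stay in the kernel. The Python sketch below is our own illustration, not the branch-and-bound algorithm of Section 3.6. Arcs are assumed to be `(u, v, ctype, width)` tuples, and the function names are hypothetical. The search examines every subset of registers, so its cost grows as 2 to the number of registers. It returns None when some cycle consists only of wires. Lemma 8 below rules out this asynchronous-loop case for synchronous circuits.

```python
from itertools import combinations


def is_acyclic(nodes, arcs):
    """Kahn's algorithm; arcs are (u, v, ctype, width) tuples."""
    indeg = {x: 0 for x in nodes}
    for a in arcs:
        indeg[a[1]] += 1
    ready = [x for x in nodes if indeg[x] == 0]
    seen = 0
    while ready:
        u = ready.pop()
        seen += 1
        for a in arcs:
            if a[0] == u:
                indeg[a[1]] -= 1
                if indeg[a[1]] == 0:
                    ready.append(a[1])
    return seen == len(nodes)


def min_feedback_registers(nodes, arcs):
    """Exhaustive Step 1: cheapest set of register arcs whose removal
    leaves the kernel acyclic. Wires (ctype 0) are never candidates.
    Returns (total width, set of arc indices), or None if some cycle
    is made of wires only."""
    regs = [i for i, a in enumerate(arcs) if a[2] != 0]
    best = None
    for k in range(len(regs) + 1):
        for subset in combinations(regs, k):
            kept = [a for i, a in enumerate(arcs) if i not in subset]
            if is_acyclic(nodes, kept):
                weight = sum(arcs[i][3] for i in subset)
                if best is None or weight < best[0]:
                    best = (weight, set(subset))
    return best


# Two loops share register R1 (A -> B); R2 and R3 each break one loop.
nodes = {"A", "B", "C"}
arcs = [
    ("A", "B", 1, 8),    # R1, LOAD register, 8 bits
    ("B", "A", 1, 4),    # R2, LOAD register, 4 bits
    ("B", "C", 0, 8),    # wire, never a scan candidate
    ("C", "A", "v", 2),  # R3, HOLD register, 2 bits
]
best = min_feedback_registers(nodes, arcs)
print(best)  # (6, {1, 3}): scanning R2 and R3 is cheaper than scanning R1
```

The example shows that the cheapest feedback set need not be the smallest one. A single 8-bit register breaks both loops, yet two narrower registers cost less.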
Steps 1 to 3 above correspond to the steps in the original procedure, which applied to the TG model. The TG is a special case of the GTG, so both subproblems are also NP-complete in the GTG model. We use the same basic solution approach and modify it for the new problem. The algorithms used in Steps 1 and 2 for balancing the TG must be extended in two ways before they can be used to balance the GTG.

1. In the TG model, every arc represents a register and hence a potential scan register, with a modification weight proportional to its width. In the GTG model, an arc could be a register or a simple wire. Only registers are candidates for being converted into scan registers. Hence the algorithms need to be modified to distinguish between registers and wires.
2. In the TG model, every node is a cloud of combinational logic, with no information about its functionality. In the GTG model, some of the nodes may be switches, and the concept of balanced structures is extended to the class of switched balanced structures. Hence the algorithms need to be extended accordingly.

The following discussion of the modified algorithms describes how both issues above affect them.

5.4.1 Removal of Feedback Registers

In Step 1 of the balancing procedure, we need to make a minimal-weight set of registers scannable such that the resulting kernel is acyclic. The arcs in the GTG G = (V, A, c, w) include both registers and simple wires, so it is not sufficient to find the minimum feedback arc set (MFAS) of G. Instead we transform G into another graph G_F whose MFAS is exactly the required set of registers. In constructing G_F we use the following observation.

Lemma 7 If u, v ∈ V, and a = (u, v) is a wire, i.e., c(a) = 0, then no register a' = (u, v) can be in a minimal-weight set of scan registers that makes the kernel acyclic.

Proof Let R1 be a register corresponding to arc a' = (u, v).
Assume that R1 is one of the registers in a minimal set of scan registers that results in an acyclic kernel. If R1 is added back to the kernel, the kernel must still be acyclic, since there is already a wire from node u to node v in the kernel. Hence the set of scan registers could not have been minimal, which contradicts the assumption. □

We can use this observation to construct the graph G_F = (V_F, A_F, c_F, w_F) as follows. Let V_F = V be the set of nodes, and construct two sets of arcs A_F1 and A_F2. Let A_F1 = {a = (u, v) ∈ A | c(a) ∈ {1, v} and there is no wire between u and v}. For all arcs (registers) a in A_F1, set c_F(a) = c(a) and w_F(a) = w(a). Let A_F2 = {a = (u, v) ∈ A | c(a) = 0}. For all arcs (wires) a in A_F2, set c_F(a) = 0 and w_F(a) = ∞. Then A_F = A_F1 ∪ A_F2 is the set of arcs of G_F. For example, Figure 5.4(a) shows a simple circuit in which B is a bus (i.e., a switch). Figure 5.4(b) shows its GTG G and Figure 5.4(c) shows the transformed graph G_F. Note that G_F omits all registers that, by Lemma 7, cannot be in the set of scan registers. In G_F all wires are represented as arcs having infinite weight. Thus any finite-weight set of feedback arcs represents a set of registers, and our problem is to find a minimum-weight set. Below we show that such a set must exist.

Figure 5.4: Transformation of the GTG for feedback register analysis. (a) Circuit (all connections have width 8), (b) GTG G, (c) transformed GTG G_F.

Lemma 8 For a synchronous circuit with GTG G and the transformed GTG G_F constructed as described above, G_F must have a finite-weight set of feedback arcs.

Proof Assume, for the purpose of contradiction, that G_F has no finite-weight set of feedback arcs. Then there must be a cycle in G_F consisting entirely of infinite-weight arcs.
Each infinite-weight arc is derived from a wire connection. The original circuit must therefore have a continuous cyclic path of wires and combinational elements with no storage elements, i.e., an asynchronous loop. This is a contradiction since by assumption the circuit is synchronous. □

We have thus transformed the GTG G into a new graph G_F to which the branch-and-bound algorithm of Section 3.6 can be applied. The MFAS obtained from this algorithm, denoted by R_A, corresponds to the registers that must be made scannable in order to eliminate all cycles in the kernel.

5.4.2 Balancing Acyclic Sequential Structures

The result of Step 1 described above is a set of arcs R_A representing scan registers. The corresponding kernel is G_A = (V_A, A_A, c_A, w_A), where V_A = V, A_A = A − R_A, and c_A and w_A are c and w, respectively, restricted to the domain A_A. We use a heuristic balanceSW, derived from balance of Chapter 3, to carry out Step 2. As before, we use a verification procedure checkSW, derived from check, which returns SUCCESS only if G_A is an SB-structure.

5.4.2.1 Verification Procedure

Like check in Chapter 3, checkSW attempts to levelize the GTG starting from various root nodes. Because arcs may represent different types of connections, the levelizing procedure is different. The level number of a node x is denoted by l(x). An arc a leaving node x is also assumed to have level number l(a) identical to l(x). Every level number is either a nonnegative integer or the symbol v. It is an integer if all paths from the current root to the current node or arc have that length. It is v if there is no unique path length, or if one of the paths includes a connection of type v (i.e., a HOLD register).

Figure 5.5: Illustration of checkSW procedure.

To deal with the presence of switches in SB-structures, checkSW has some enhancements.
Unlike in B-structures, a node may have a variable level number caused by unequal-length converging paths, provided all the paths pass through the same switch. For every node x, the procedure computes a set SWP(x) of switches in SW such that all paths from the current root node to x pass through every switch in SWP(x). These sets are used to check whether a given set of unbalanced paths is acceptable. (Note that the root node itself must never be included in SWP(x), even if it is a switch. This follows from the definition of SB-structures.)

Another enhancement is in the set of starting points for the levelizing process. In check it is sufficient to levelize the input graph starting at each root node in turn. In checkSW it is also necessary to levelize starting at each switch. Figure 5.5 shows why; there, each arc represents a register. There are paths of unequal length between the switch node s and the node x. When the levelizing process starts at the root node r, the set SWP(x) is computed as {s}, so no violation is apparent at x. However, when the levelizing process starts at s, the set SWP(x) is computed as { }, and the unbalanced paths ending at x are detected as a violation. The set LEVELS stores the root nodes and the switches that will be used as starting points for levelization. In the listing below, for node x, src(x) represents the set of source nodes for all incoming arcs to x, and dst(x) represents the set of destination nodes for all outgoing arcs from x.

function checkSW(G = (V, A, c, w)): returns SUCCESS if G is an SB-structure, FAILURE otherwise.
1. LEVELS ← {root nodes of G} ∪ SW, where root nodes are nodes with no incoming arcs. If LEVELS ≠ ∅, proceed to the next step. Otherwise return FAILURE, since G has no root nodes and cannot be acyclic.
2. Pick a node r in LEVELS and remove r from LEVELS.
   l(r) ← 0; SWP(r) ← ∅; L ← {r} (nodes currently levelized).
3. While L ≠ V (i.e., L ⊂ V) do the following.
   (a) Pick x ∈ V − L such that src(x) ⊆ L.
   (b) ∀a = (u, x) ∈ A: if c(a) = v or l(u) = v then l(a) ← v, else l(a) ← l(u) + c(a).
   (c) If x ∈ SW, SWP(x) ← SWP(x) ∪ {x}.
   (d) If the l(a) values computed above are all equal to some numeric value l_a (i.e., l_a ≠ v), then l(x) ← l_a. Otherwise an imbalance is present: if SWP(x) ≠ ∅ (the unbalanced paths all pass through some switch), then l(x) ← v, else return FAILURE.
   (e) Add x to L as a node that has been assigned a level.
4. If LEVELS ≠ ∅, go to 2.
5. G is an SB-structure; return SUCCESS. □

Because of the way level numbers are computed, checkSW does not need the initial steps of check that analyze the connections of HOLD registers.

5.4.2.2 Balancing Procedure

Like balance, the heuristic procedure balanceSW uses a mincut approach to recursively subdivide the GTG, balance the parts separately, and merge the results. To turn balance into balanceSW, we need to consider two issues. The first is the presence of both wires and registers as arcs in the graph model. The second is the extension of the class of B-structures to that of SB-structures.

The earlier discussion of breaking feedback loops showed that this problem reduces to the one solved in Chapter 3 once the GTG containing wires and registers is transformed. The same transformation is not appropriate for the problem of removing unbalanced feedforward paths. We use a similar transformation in which every wire is given a width (i.e., cost) of infinity, while every register keeps its width. The intent is that every mincut of the GTG contains only registers. However, after this transformation a finite-weight mincut, i.e., a cut consisting only of registers, is not guaranteed to exist.
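A max-flow computation illustrates the effect of this transformation. The Python sketch below rests on assumptions of our own, since the exact cut used by balance is defined in Chapter 3:

- We cut between a hypothetical super-source attached to every root node and a super-sink attached to every leaf node.
- A minimum cut is computed with the Edmonds-Karp algorithm.
- The "infinite" wire weight is one more than the total register width. This choice is consistent with ∞ > max w(a), and it also makes any register-only cut cheaper than any cut containing a wire.

The returned flag therefore separates the situation of Case 1 from that of Case 2. Names such as `register_only_mincut` are hypothetical.

```python
from collections import defaultdict, deque

SRC, SNK = "<source>", "<sink>"  # hypothetical terminals; must not clash with node names


def register_only_mincut(nodes, arcs):
    """Min root-to-leaf cut after giving every wire an 'infinite' weight.

    arcs are (u, v, ctype, width) tuples with ctype 0 for a wire.
    Returns (indices of cut arcs, True if the cut holds only registers).
    """
    big = sum(a[3] for a in arcs if a[2] != 0) + 1  # finite stand-in for infinity
    cap = defaultdict(lambda: defaultdict(int))
    for u, v, ctype, width in arcs:
        cap[u][v] += big if ctype == 0 else width
    has_in = {a[1] for a in arcs}
    has_out = {a[0] for a in arcs}
    huge = big * (len(arcs) + 1)  # terminal arcs are never worth cutting
    for x in nodes:
        if x in has_out and x not in has_in:
            cap[SRC][x] += huge   # root node
        if x in has_in and x not in has_out:
            cap[x][SNK] += huge   # leaf node

    flow = 0
    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent = {SRC: None}
        queue = deque([SRC])
        while queue and SNK not in parent:
            u = queue.popleft()
            for v, r in cap[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if SNK not in parent:
            break
        path, v = [], SNK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push

    side, stack = {SRC}, [SRC]  # source side: still reachable in the residual graph
    while stack:
        u = stack.pop()
        for v, r in cap[u].items():
            if r > 0 and v not in side:
                side.add(v)
                stack.append(v)
    cut = [i for i, a in enumerate(arcs) if a[0] in side and a[1] not in side]
    return cut, flow < big


# A register-only cut {A->B, A->C} exists.
cut, finite = register_only_mincut(
    {"A", "B", "C"}, [("A", "B", 1, 8), ("B", "C", 0, 8), ("A", "C", 1, 3)])
print(cut, finite)  # [0, 2] True

# A -> B -> D is an all-wire path, so every cut must contain a wire.
cut2, finite2 = register_only_mincut(
    {"A", "B", "C", "D"},
    [("A", "B", 0, 8), ("B", "D", 0, 8), ("A", "C", 1, 8), ("C", "D", 1, 8)])
print(finite2)  # False
```

The second example plays the role of a structure containing a wire-only path from inputs to outputs. No choice of "infinity" can keep wires out of its cut, which is exactly the situation the following cases handle.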
Case 1. When a finite-weight mincut CS exists, this cut is used to partition the GTG exactly as in balance. The two induced partitions Gs and Gd are balanced separately (recursively). The resulting balanced sub-GTGs are then merged by reintroducing a subset CS' ⊆ CS of the mincut register arcs into the result GTG, chosen such that:

• the resulting GTG is an SB-structure;
• the combined weight of the arcs in CS' is maximal.

Unlike in balance, in balanceSW the set of registers reintroduced during merging can contain more than one HOLD register (this follows from the definition of SB-structures). Hence the merging procedure does not distinguish between LOAD and HOLD registers.

Case 2. If no finite-weight mincut exists, i.e., the mincut CS contains one or more wires (infinite-weight arcs), then a path containing only wires must exist from the inputs of the structure to its outputs. Depending on the nature of the cut CS, one of two strategies is used.

Case 2(a). If CS contains at least one register, the procedure first shrinks the cutset CS so that it contains only registers; i.e., CSr ← {a ∈ CS | c(a) ≠ 0}. The resulting set CSr is no longer a cutset, since at least one arc has been removed. Let G' be the graph formed by removing the arcs in CSr from the original GTG. G' is balanced recursively; then, in the resulting GTG, a maximal-weight subset CS' ⊆ CSr is reintroduced such that the GTG remains balanced.

Case 2(b). If CS consists entirely of wires, the procedure first removes all registers from the original GTG, resulting in a purely combinational structure. These registers are placed in the set CSr. (Here the name CSr is actually a misnomer, since this set has no relation to any cutset.) The procedure then determines a maximal-weight set of register arcs CS' ⊆ CSr that can be reintroduced into the GTG without causing an imbalance.

To illustrate the strategies used in Cases 2(a) and 2(b), consider the circuit in Figure 5.6.
There is no cut consisting only of registers; the mincut consists of R1 and the wire from B to D, which has infinite weight. Using Case 2(a), R1 is tentatively removed from the structure and placed in CSr. The balancing procedure is then invoked recursively on the remaining circuit. Within the recursive call, the mincut consists of the wire from B to D, which has infinite weight. (Note that in our analysis the number infinity is treated as a large finite number.) Using Case 2(b), R2 is removed, since its presence causes an imbalance. On return from the recursive call, R1 is placed back from CSr into the structure, since it does not cause an imbalance. Thus the resulting solution requires R2 to be removed from the structure (i.e., made scannable) but not R1.

Figure 5.6: Illustration of balanceSW where no finite mincut exists.

The overall organization of balanceSW, which is listed below, is similar to that of balance in Chapter 3. The sets CS and CSr described in the above discussion are both represented by the same variable CS in the procedure listing. This is possible because they are never required at the same time, and both need to be manipulated in the same way after being constructed.

function balanceSW (G = (V, A, c, w): acyclic GTG): Returns R ⊆ A such that (V, A − R, c, w) is balanced.

1. If checkSW(G) = SUCCESS then return R ← ∅; else proceed.
2. Transform G = (V, A, c, w) to Gu = (Vu, Au, cu, wu), in which all wire arcs have "infinite" weight, as follows. We use ∞ to represent some large finite number, ∞ > max_{a∈A} w(a).
   Gu ← G;
   ∀a ∈ Au: if cu(a) = 0 then wu(a) ← ∞.
3. CS ← minimal-cost cutset of Gu.
   If CS contains only registers (finite-weight arcs), do the following:
   (a) Determine Gs and Gd, the subgraphs of Gu induced by CS.
   (b) Balance Gs and Gd separately:
       R ← balanceSW(Gs) ∪ balanceSW(Gd) ∪ CS.
   Else if CS contains wires and at least one register, do the following:
   (a) CS ← {a ∈ CS | c(a) ≠ 0}.
   (b) G' ← G with the registers in CS removed.
   (c) R ← balanceSW(G') ∪ CS.
   Else (CS contains only wires) do the following:
   (a) CS ← {a ∈ Au | c(a) ≠ 0}.
   (b) R ← CS.
4. Sort the arcs in CS in order of decreasing weight.
5. CS' ← ∅ (the set of arcs retained in the GTG, initially empty).
6. For all arcs a in CS, in order of decreasing weight:
   check whether the inclusion of a makes the merged graph unbalanced;
   if checkSW(Vu, (Au − R) ∪ CS' ∪ {a}, cu, wu) = SUCCESS then CS' ← CS' ∪ {a}.
7. R ← R − CS'; return R. □

Thus the algorithms of Chapter 3, which are applied to the TG circuit model containing clouds and registers, have been modified in two ways. They can now be applied to circuits containing switches, and they can take advantage of the expanded class of balanced structures. Since the basic outline of the algorithms is unchanged, their computational complexities as functions of graph size are similar.

For a given circuit, however, the TG is smaller than the GTG, since every node in the former represents a whole cloud of combinational logic. Hence the modified algorithms generally require more computation. Their actual complexity depends on how many switches are present and, relatedly, on how much logic is contained in each ACLU. In summary, the explicit consideration of switches may lead to a lower number of required scan registers, but achieving this may require more computation.

5.5 Partial Scan Testing Using I-Paths

In the preceding sections, the properties of switches have been used to expand the class of balanced structures based on embedded switches. We now consider another aspect of switches: they can often be used to transfer test data unchanged between scan registers and other parts of the circuit.
5.5.1 I-Paths

When switches are present in a circuit for functional purposes, they can also be used during test to transport input patterns and output results. This can give a higher degree of controllability and observability of the inner parts of a circuit. Essentially, switches help to set up paths along which data can be transmitted unchanged. Such paths are referred to as identity paths or I-paths. The concept of I-paths has been defined and developed extensively in [4]. Here we derive a restricted definition of an I-path that is suited to its role within the partial scan design problem. All I-paths according to this definition satisfy the definition of I-paths in [4], but the converse does not hold.

As before, the GTG G = (V, A, c, w) represents the circuit under consideration, and SW ⊆ V represents the set of switches. In addition, FN ⊆ V represents the set of fanout nodes.

Definition. An I-path is a loop-free path in the GTG in which every node is either a switch or a fanout node.

In the circuit of Figure 5.2, M1 is a switch and F1 is a fanout node. The individual nodes M1 and F1 constitute two separate I-paths of length 0 each; the path from M1 to F1 is another I-path, of length 1.

To store information about useful I-paths, we introduce two sets, IL and OL. IL ⊆ A is a set of inlets, or connections that are fully controllable. Thus IL = {a ∈ A | ∃ an I-path starting at some PI node and ending at the source node of a}. In the circuit of Figure 5.2, IL = {(M1, F1), (F1, B), (F1, C)}. OL ⊆ A is a set of outlets, or connections that are fully observable. Thus OL = {a ∈ A | ∃ an I-path starting at the destination node of a and ending at some PO node}. The circuit of Figure 5.2 has no outlets.
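Because an I-path may pass only through switches and fanout nodes, IL and OL can be computed with one forward and one backward reachability search. The Python sketch below is an illustration under several assumptions:

• The GTG is a list of (src, dst) arcs.
• A PI is allowed as the first node of an I-path, and a PO as the last. In a partial scan design, scan-register outputs and inputs would play these roles.
• The example graph is hypothetical and is not the circuit of Figure 5.2.

Reachability suffices because any node reachable through switch and fanout nodes is also reachable along a loop-free path.

```python
from collections import deque


def inlets_outlets(arcs, switches, fanouts, pis, pos):
    """Return (IL, OL) for a GTG given as a list of (src, dst) arcs."""
    succ, pred = {}, {}
    for u, v in arcs:
        succ.setdefault(u, []).append(v)
        pred.setdefault(v, []).append(u)
    through = set(switches) | set(fanouts)  # nodes an I-path may pass through

    def reach(starts, nbrs):
        # Breadth-first search that only enters switch and fanout nodes.
        seen = set(starts)
        queue = deque(starts)
        while queue:
            n = queue.popleft()
            for m in nbrs.get(n, []):
                if m in through and m not in seen:
                    seen.add(m)
                    queue.append(m)
        return seen

    controllable = reach(pis, succ)  # nodes at the end of an I-path from a PI
    observable = reach(pos, pred)    # nodes at the start of an I-path to a PO
    il = {(u, v) for u, v in arcs if u in controllable}
    ol = {(u, v) for u, v in arcs if v in observable}
    return il, ol


# Hypothetical GTG: PI1 -> MUX1 -> FO1 fans out to logic U1 and U2,
# which reconverge at MUX2 feeding PO1; X is ordinary logic feeding MUX1.
ARCS = [("PI1", "MUX1"), ("X", "MUX1"), ("MUX1", "FO1"), ("FO1", "U1"),
        ("FO1", "U2"), ("U1", "MUX2"), ("U2", "MUX2"), ("MUX2", "PO1")]
IL, OL = inlets_outlets(ARCS, {"MUX1", "MUX2"}, {"FO1"}, {"PI1"}, {"PO1"})
```

Here the arcs leaving FO1 are inlets, and the arcs entering MUX2 are outlets. The arc from X is not an inlet, because X is ordinary logic.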
Figure 5.7: Illustration of kernels with I-paths. (a) Circuit with two kernels; (b) test plan for K1; (c) test plan for K2.
   (b) PLAN(K1, t): (t, R4, REC), (t-1, R1, DRI), (t-1, R2, DRI), (t-1, R3, DRI)
   (c) PLAN(K2, t): (t, R4, REC), (t-1, M, R1), (t-1, R1, DRI), (t-1, R3, DRI)

5.5.2 Kernels with I-Paths

In the absence of I-paths, the kernel in a partial scan design is simply the circuit with the scan registers removed and replaced with primary input/output pairs. In effect, the kernel is that portion of the circuit to which arbitrary test patterns can be applied and on which ATPG can be carried out. However, if there are any I-paths adjacent to any scan register or circuit primary input/output, it may be possible to isolate a smaller portion of the circuit to which test patterns can be applied using I-paths.

This is illustrated in Figure 5.7. Assuming all the registers R1-R4 are scan registers, the structure K1 could be considered as the kernel. However, the switch M provides an I-path from either register R1 or R2 to the inputs of C. This makes the output of M an inlet, along with the output of R3; the input of R4 is, of course, an outlet. Consider the subcircuit K2, consisting of block C, which has inlets at its inputs and an outlet at its output. Clearly we can treat K2 as a kernel that can be tested as a separate unit. Given a test pattern set for K2, each pattern can be applied in three steps:

• scan the pattern into (say) R1 and R3;
• set M to read data from R1 for one clock cycle;
• scan out the test result captured in R4.

Associated with every potential kernel K (such as K1 or K2) is a test plan, PLAN(K, t), which describes what I-paths (if any) are to be set up and how test patterns are to be applied. The test plans for K1 and K2 are shown in Figures 5.7(b) and 5.7(c), respectively. Each test plan consists of an unordered collection of 3-tuples of the form (f(t), X, mode).
In each 3-tuple:

• f(t) is some expression that is a function of time;
• X is a circuit object (a scan register or a switch) that actively participates in the test at time unit f(t);
• mode is the specific mode of operation of X at time f(t) during the test.

In our example it is assumed that test results are loaded into the scan path registers at clock cycle t, and all other time instants are expressed relative to this parameter.

The description of the mode of operation depends on the type of circuit structure. For a switch, the mode of operation is the name of the structure whose output is read by the switch at the corresponding time. For a scan register, the mode of operation is one of its functional modes (LOAD, HOLD, etc.), the keyword DRI, or the keyword REC.

• DRI indicates that the scan register must hold the appropriate test input data at the corresponding time. It is implicit that the data must have been held there since it arrived.
• REC indicates that the scan register loads a test result. It is implicit that this result must be held for as many clock cycles as required until it can be shifted out.

Returning to Figure 5.7, using K2 as the kernel rather than K1 has certain advantages.

• Since K2 has a lower depth than K1 (in terms of the amount of combinational logic from inputs to outputs), and is in general smaller than K1, the ATPG cost for K2 can be expected to be lower.
• Since the test for K2 uses only two input scan registers, R1 and R3, rather than three as in the case of K1, less time is required for shifting in test patterns, provided the scan path registers are chained appropriately.

Note that the switch M is exercised in only one of its two modes of operation in the test plan of Figure 5.7(c), and some simple functional testing is required for the other mode.
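The 3-tuple notation maps directly onto a small data structure. The sketch below is an illustration only; the Step type and helper names are hypothetical. It encodes the plans of Figures 5.7(b) and 5.7(c), storing f(t) as an integer offset from the capture cycle t. It then extracts the DRI and REC registers, which makes the comparison in the second bullet above mechanical.

```python
from typing import NamedTuple


class Step(NamedTuple):
    offset: int  # f(t) as an offset from the capture cycle t (0 is t, -1 is t-1)
    obj: str     # scan register or switch taking part in the test
    mode: str    # switch: source it reads; register: DRI, REC, LOAD, HOLD, ...


PLAN_K1 = [Step(0, "R4", "REC"), Step(-1, "R1", "DRI"),
           Step(-1, "R2", "DRI"), Step(-1, "R3", "DRI")]
PLAN_K2 = [Step(0, "R4", "REC"), Step(-1, "M", "R1"),
           Step(-1, "R1", "DRI"), Step(-1, "R3", "DRI")]


def registers_in_mode(plan, mode):
    """Sorted names of the objects that appear in `plan` with the given mode."""
    return sorted({s.obj for s in plan if s.mode == mode})


def schedule(plan):
    """Group the (unordered) plan by time step, earliest first."""
    by_time = {}
    for s in plan:
        by_time.setdefault(s.offset, []).append((s.obj, s.mode))
    return [(off, sorted(steps)) for off, steps in sorted(by_time.items())]
```

For K2, the DRI registers are R1 and R3, while K1 needs R1, R2 and R3. The schedule of PLAN_K2 places M reading R1 at t-1 and R4 receiving at t.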
Another benefit of using I-paths is illustrated in the hypothetical example of Figure 5.8(a). To break all cycles and make the structure balanced, both R1 and R2 would need to be made scannable. However, consider the effect of making R1 alone scannable, represented in the figure by the arrow inside R1. Due to the I-paths associated with the bus, the input to C becomes an inlet and its output becomes an outlet. Any test pattern can then be applied to the kernel formed by C, using R1 as both a driver and a receiver of test data. The corresponding test plan is shown in Figure 5.8(b). In this example the overhead due to partial scan has been reduced by using I-paths to isolate a balanced kernel.

Figure 5.8: Use of I-paths in reducing partial scan overheads. (a) Circuit; (b) test plan:
   PLAN(K, t): (t, R1, REC), (t-1, BUS, R2), (t-1, R2, LOAD), (t-2, BUS, R1), (t-2, R1, DRI)

Thus three potential benefits of using I-paths have been presented above: reduced scan overhead, reduced ATPG effort, and reduced test time.

In the following analysis we assume that a set of scan registers has already been selected for the circuit under consideration, such that the resulting kernel is balanced. The selection is assumed to use the algorithms described earlier in this chapter, which ignore I-paths. Note that these algorithms could be modified to give preference to registers that are close to switches. This heuristic would help ensure that the I-paths present in the circuit are best used to enhance internal controllability and observability.

Given the scan registers, we now study the problem of identifying a minimal kernel, in the presence of I-paths, that can be fully tested while achieving the benefits described above.

Figure 5.9: Relationship between maximal (Kmax) and minimal (Kmin) kernels.
5.5.3 Unsatisfiable Kernels

In the example of Figure 5.7, the kernel K2 was identified as the smallest part of the circuit that included all the non-switch logic and was surrounded by inlets and outlets. In any circuit, given the scan registers and the I-paths, such a kernel can be easily identified; we will refer to it as the minimal kernel. On the other hand, the kernel K1 in Figure 5.7 is called the maximal kernel, since it consists of the entire circuit excluding all scan registers (which are replaced by primary I/O). The relationship between the maximal and minimal kernels is illustrated schematically in Figure 5.9, where Kmax and Kmin represent the maximal and minimal kernels, respectively.

By construction, every input and output of the minimal kernel is individually accessible to some scan register(s) through I-paths. However, testing the minimal kernel in a given circuit may require applying arbitrary distinct patterns to every input simultaneously, which may not be possible with any conceivable test plan. Such a kernel is said to be unsatisfiable. Note that the maximal kernel in any design must always be satisfiable, since every input/output is connected to either a scan register or a primary I/O.

There are three types of conflict situations that can cause a kernel to be unsatisfiable. These three types are described below, and methods for resolving conflicts are discussed in Section 5.6.

No-match Conflict. In the circuit of Figure 5.10(a), which has two driving scan registers R1 and R2, the minimal kernel consists of the structure K1, which has three inputs. Clearly, given an arbitrary single-pattern test to be applied to K1, it is not possible to scan the entire pattern into the scan path and apply it to K1, unless the circuit is further modified.
This is evident from the fact that there is no one-to-one mapping of a subset of scan registers to the kernel inputs such that there is an I-path between each scan register and the corresponding kernel input. Clearly the kernel K2 must be satisfiable, since it is the maximal kernel; its test plan is shown in Figure 5.10(b). Hence one solution to the no-match conflict for K1 is simply to use the maximal kernel K2, i.e., to ignore I-paths in this structure. A more detailed analysis will be presented in Section 5.6.

Figure 5.10: No-match situation. (a) Kernels K1 and K2; (b) test plan for K2:
   PLAN(K2, t): (t, R3, REC), (t-2, R1, DRI), (t-2, R2, DRI)

Data Conflict. Another case of unsatisfiability is shown in Figure 5.11. Here every input/output of the kernel K1 can be mapped to an appropriate scan register. However, as the test plan of Figure 5.11(b) shows, the bus is required to read data from two different sources at the same time, which is an obvious data conflict. This conflict can be eliminated by including the fanout point at the output of the bus in the kernel, resulting in kernel K2. The kernel now has only one input, and the bus does not have to carry out conflicting tasks.

Figure 5.11: Unsatisfiable kernel due to data conflict. (a) Circuit; (b) test plan showing the conflict:
   PLAN(K1, t): (t, R3, REC), (t-2, BUS, R1), (t-2, R1, DRI), (t-2, BUS, R2), (t-2, R2, DRI)

Control Conflict. In the preceding analyses we have assumed that the control lines feeding all switches are independent of each other. In practice, however, there may be constraints on the control patterns that can be applied to the various switches and registers. For example, consider the circuit of Figure 5.8(a) and the test plan for kernel K shown in Figure 5.8(b). Suppose that R2 has a HOLD mode of operation, and that the control lines for the circuit are configured such that R2 cannot load
data in the same clock cycle in which the bus reads data from R2. In this case the operations required at time t-1 cannot be carried out, making K unsatisfiable.¹

Thus, given a kernel in a partial scan design, the following steps can be taken to check whether it is satisfiable:

• If possible, find a mapping of scan registers to kernel inputs (outputs) such that an I-path exists between every scan register and kernel input (output) pair.
• If both mappings exist, determine the test plan for the kernel based on these I-paths.
• If there are no data or control conflicts in the test plan, the kernel is satisfiable.

5.6 Finding a Satisfiable Kernel

Clearly the maximal kernel must be satisfiable, since it is the original kernel derived from the partial scan analysis and does not use I-paths. In the presence of I-paths, if the corresponding minimal kernel is satisfiable, it is sufficient to obtain a single-pattern test set for it and apply these tests according to its test plan. Note that if the maximal kernel is an SB-structure, the minimal kernel (which is part of the maximal kernel) must also be an SB-structure. Hence it should be possible to obtain a complete single-pattern test set using ordinary combinational ATPG on its combinational equivalent.

If the minimal kernel is unsatisfiable, there must exist some satisfiable kernel that is a superset of the minimal kernel and a subset of the maximal kernel. One such kernel is the maximal kernel itself. Our problem is to find the smallest such kernel, so as to obtain the maximum benefit from the I-paths available in the circuit. This is illustrated schematically in Figure 5.12, where Ksat represents the minimal satisfiable kernel. The procedure findKernel, which will be listed shortly, determines the minimal satisfiable kernel.
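The first step of the satisfiability check described at the end of Section 5.5 finds a one-to-one mapping of scan registers to kernel inputs. This is a bipartite matching problem. The sketch below uses the standard augmenting-path (Kuhn) method, a choice made here for illustration; it is not necessarily the method used in this work. The output side works the same way, with rec sets in place of dri sets. The driver sets in the usage example are hypothetical and only loosely modeled on the no-match situation of Figure 5.10(a).

```python
def match_ports(dri):
    """Match each kernel port to a distinct scan register that reaches it.

    dri maps port -> set of scan registers with an I-path to that port.
    Returns {port: register}, or None when no complete one-to-one match
    exists (a no-match conflict).
    """
    owner = {}  # register -> port it is currently matched to

    def assign(port, visited):
        # Try to give `port` a register, re-routing earlier ports if needed.
        for reg in sorted(dri[port]):
            if reg in visited:
                continue
            visited.add(reg)
            if reg not in owner or assign(owner[reg], visited):
                owner[reg] = port
                return True
        return False

    for port in dri:
        if not assign(port, set()):
            return None
    return {port: reg for reg, port in owner.items()}


# Hypothetical driver sets: three kernel inputs but only two scan registers.
NO_MATCH = {"A": {"R1"}, "B": {"R1", "R2"}, "C": {"R2"}}
```

With NO_MATCH the function returns None, mirroring the no-match conflict for K1. If port C is dropped, a complete match exists, even when an earlier port must be re-routed to a different register.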
¹The circuit model used in this research requires the registers to be independently controllable through external control lines, which implies that control conflicts cannot occur. The control conflict is nevertheless discussed here to make the list of conflict types complete.

Figure 5.12: Relationship between maximal (Kmax), minimal (Kmin) and minimal satisfiable (Ksat) kernels.

5.6.1 Expansion Procedure

The approach used in findKernel is to expand the unsatisfiable kernel in steps, eliminating conflicts one at a time. Each expansion step is based on a relatively simple procedure, described below, that adds one or more nodes to the kernel. The strategy for resolving conflicts and the complete findKernel procedure are presented afterwards.

The expansion process is carried out by the procedure mergeNode. It takes three inputs: the circuit under consideration, the current kernel within it, and some node outside the kernel (generally a switch or a fanout node) that is to be merged with the kernel.

function mergeNode (G = (V, A, c, w): GTG of circuit; K = (VK, AK, c, w): GTG of kernel; v: node that is to be merged into K): Returns K', GTG of resulting kernel.

1. V'K ← VK ∪ {v}.
2. For all u ∈ V: if there is a path from v to any node in VK passing through u, or there is a path from any node in VK to v passing through u, then add u to V'K.
3. A'K ← A ∩ (V'K × V'K).
4. Return K' = (V'K, A'K, c, w). □

The manner in which the merging procedure is actually invoked depends on what kind of conflict is currently being resolved.

5.6.2 Dealing with No-Match Conflicts

A no-match conflict can occur on either the input side or the output side of the kernel. The discussion below refers to a no-match conflict on the input side, but it applies equally to the output side.
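The mergeNode listing of Section 5.6.1 translates almost line for line into code. A node u lies on a path from v into the kernel exactly when u is reachable from v and can itself reach some kernel node. The symmetric condition holds for paths from the kernel to v. The Python sketch below makes the following assumptions:

• the graph is a list of (src, dst) arcs;
• the helper names are hypothetical;
• the example is only in the spirit of Figure 5.13, where a switch M and the fanout point F of R2 join the kernel.

```python
def _reach(starts, nbrs):
    """All nodes reachable from `starts` (inclusive) by following `nbrs`."""
    seen, stack = set(starts), list(starts)
    while stack:
        n = stack.pop()
        for m in nbrs.get(n, ()):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen


def merge_node(arcs, kernel_nodes, v):
    """Return (V'_K, A'_K) after merging node v into the kernel.

    u is added if it lies on a path v -> ... -> kernel or kernel -> ... -> v.
    """
    succ, pred = {}, {}
    for a, b in arcs:
        succ.setdefault(a, []).append(b)
        pred.setdefault(b, []).append(a)
    from_v, to_v = _reach([v], succ), _reach([v], pred)
    from_k, to_k = _reach(kernel_nodes, succ), _reach(kernel_nodes, pred)
    nodes = set(kernel_nodes) | {v} | (from_v & to_k) | (from_k & to_v)
    kept = {(a, b) for a, b in arcs if a in nodes and b in nodes}  # A ∩ (V'_K × V'_K)
    return nodes, kept


# Hypothetical example: kernel {B, C}; fanout F of R2 feeds B and switch M.
ARCS = [("R1", "M"), ("R2", "F"), ("F", "M"), ("F", "B"), ("M", "C")]
NODES, KEPT = merge_node(ARCS, {"B", "C"}, "F")
```

Merging F also pulls in the switch M, because M lies on the path from F to C. The scan registers R1 and R2 stay outside the kernel.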
In the minimal kernel, as well as in all kernels created by the expansion process, every kernel input must be accessible from some scan register through an I-path. So even if no complete one-to-one match of scan registers to kernel inputs exists, some partial one-to-one match must exist. A trivial example is a match between any one kernel input and one scan register that has an I-path feeding that input.

For example, consider the no-match situation for kernel K1 in Figure 5.10(a). One possible partial match for the kernel inputs is R1 driving the input of A and R2 driving the input of B. For the input of C there is then no possible matching scan register, since R2 is already being used to drive another kernel input. In general, a no-match situation implies that, for some input of the kernel under consideration, all possible driver scan registers have already been matched to other kernel inputs.

In this situation, let KI1 represent the kernel input for which no match can be found. Let R be any one of the scan registers that could drive this input through an I-path, and let KI2 be the kernel input to which R has already been matched. In our example, KI1 is the input of C and KI2 is the input of B. Let IP1 and IP2 be the I-paths connecting R to KI1 and KI2, respectively. Each I-path consists of registers, switches and fanout nodes. Hence there must be a fanout node v at which the two I-paths IP1 and IP2 fork. In our example, v is the fanout point at the output of R2.

Figure 5.13: Resolving a no-match conflict for K1 in Figure 5.10.

We refer to v as the conflict node for the no-match conflict currently under consideration. The conflict can be eliminated by expanding the kernel K to include node v, using the procedure mergeNode presented above, resulting in a larger kernel K'.
Thus in Figure 5.10(a), the switch M and the fanout point of R2 are added to the kernel, resulting in the kernel K3 shown in Figure 5.13. Since the set of kernel I/O for the expanded kernel K' differs from that for the original kernel K, the search for a match must be carried out again on K'. Resolving a no-match conflict and searching for a new match is repeated, if necessary, until a complete match is obtained for all kernel I/O. In the worst case the resulting kernel is the maximal kernel itself. In Figure 5.13, no complete match exists for the expanded kernel K3. One more expansion step is required, resulting in kernel K2 of Figure 5.10(a), whose test plan is shown in Figure 5.10(b).

The existence of a complete match does not guarantee that the associated test plan is conflict-free, unless the resulting kernel is the maximal kernel and uses no I-paths. We still need to check for data/control conflicts among the I-paths to ensure that the test plan is feasible.

5.6.3 Dealing with Data Conflicts

A data conflict is essentially a pair of steps in the test plan that cannot be executed simultaneously, making the test plan infeasible. Different conflicts may be related. For example, suppose a given I-path consisting of several switches is required to transmit two different streams of data simultaneously. The resulting test plan would then have several conflicts, one at each switch. In such a case, the conflict that involves the node closest to the kernel is called the primary conflict, and the others are called its secondary conflicts. Resolving a primary conflict simultaneously resolves the related secondary conflicts.

Figure 5.14 illustrates two cases of related data conflicts occurring along an I-path segment. The I-path segment consists of a series of registers, switches, and/or fanout points. In Figure 5.14(a), the I-path segment along which the conflict occurs
carries two streams of conflicting test input data diverging towards different inputs of the kernel. In this case there must be a fanout node v ∈ FN at the point of divergence, and this is the site of the primary data conflict. Similarly, in Figure 5.14(b), the two streams of conflicting test results must converge at a switch node v ∈ SW, which is the site of the primary data conflict. In both cases the node v is referred to as a primary conflict node.

In both cases (a) and (b) of Figure 5.14, the primary conflict can be eliminated by expanding the kernel K to include the node v, using the procedure mergeNode, resulting in a larger kernel K'. In the merging process some additional switch and/or fanout nodes are included in K'. Although the targeted primary and secondary conflicts are resolved, this procedure can create new conflicts. For example, consider the switch v' in Figure 5.14(a), which gets merged into K'. The switch v' may have additional inputs that were previously ignored when K was the kernel, but that now become inputs of the kernel K' itself. Thus K' may well be unsatisfiable due to new conflicts related to these new inputs. Note, however, that resolving a data conflict always enlarges the kernel. Repeated expansions must therefore stop when the kernel becomes the maximal kernel, which is known to be satisfiable.

Figure 5.14: Schematic illustration of primary data conflicts. (a) Input side of kernel; (b) output side of kernel. The panels mark the I-path conflict regions and the new inputs to K'.

5.6.4 Dealing with Control Conflicts

A similar philosophy is used in resolving control conflicts. Let the conflicting test plan steps be (t1, X1, M1) and (t2, X2, M2). In general a control conflict may involve two test plan steps occurring at different times, i.e., t1 ≠ t2.
This could be caused by the nature of the state transitions in the controlling circuitry. In this analysis we assume that information about control conflicts in the test plan is available to the software that eliminates conflicts. We deal neither with how this information is determined nor with its representation.

Whenever an I-path is used for transmitting test data, there is an implicit assumption that all elements (registers and switches) in the I-path, as well as across different I-paths, are independently controllable. If a control conflict is detected, some constraint buried within the control regime has been violated. The solution is to merge the conflicting I-path elements with the kernel and ensure that they are treated as ordinary functional logic during ATPG. The data propagation problem can then be tackled at a lower (say, gate) level rather than at the I-path level.

In terms of the graph model, the GTG manipulations that resolve a control conflict are identical to those for two separate data conflicts, one at X1 and one at X2. The two conflict nodes (each a switch or fanout node) are identified as before. The procedure mergeNode is then invoked twice, once for each of the two conflict nodes, to eliminate the control conflict.

5.6.5 Satisfiability Procedure

The function findKernel, listed below, uses mergeNode to help resolve data and control conflicts. findKernel takes as input the GTG of the circuit along with the set of scan registers, and returns the GTG of a minimal satisfiable kernel. It begins by determining the minimal kernel and checking whether it is satisfiable. (In some cases the minimal kernel may consist of more than one disjoint subcircuit. Although findKernel treats them as a single kernel, Chapter 6 will show how a kernel may be broken into subkernels that can be
tested separately.) If the minimal kernel is itself satisfiable, this kernel is returned. Otherwise, depending on the type of conflicts present, findKernel expands the kernel in steps until the resulting kernel has a conflict-free test plan. Note that the result of this procedure is not unique, since other solutions may also be possible.

function findKernel (G = (V, A, c, w): GTG of circuit; R ⊆ A: scan registers; SW ⊆ V: switches): Returns a subgraph K of G such that K is a minimal satisfiable kernel, along with the test plan PLAN(K, t).

1. Determine Kmax and Kmin, the maximal and minimal kernels.
   K ← Kmin.
2. For K:
   KI ← ordered list of input arcs, (ki1, ..., kin);
   KO ← ordered list of output arcs, (ko1, ko2, ..., kom).
3. For all a ∈ KI: dri(a) ← {r ∈ R | ∃ an I-path from r to a};
   for all a ∈ KO: rec(a) ← {r ∈ R | ∃ an I-path from a to r}.
4. Find a match of scan registers to kernel I/O as follows. Construct two lists RI = (ri1, ..., rin) and RO = (ro1, ..., rom), if they exist, such that:
   ∀x ∈ [1, n], rix ∈ dri(kix); ∀x ∈ [1, m], rox ∈ rec(kox); all rix distinct; all rox distinct.
   If no such lists exist [no-match conflict], do:
   (a) Determine a conflict node v associated with the current no-match conflict;
   (b) K ← mergeNode(G, K, v);
   (c) Go to Step 2.
   Else proceed.
5. Determine the test plan PLAN(K, t) and check it for conflicts. If there are no data/control conflicts in PLAN(K, t), or if K = Kmax, return K along with PLAN(K, t); else proceed.
6. If there is a data conflict, identify a primary data conflict; let it be between the steps (t1, X1, M1) and (t1, X1, M2), with M1 ≠ M2. Let v be the primary conflict node. Eliminate this primary conflict: K ← mergeNode(G, K, v). Go to Step 2.
7. If there is a control conflict, let it be between the steps (t1, X1, M1) and (t2, X2, M2), with X1 ≠ X2. Let v1 and v2 be the conflict nodes corresponding to X1 and X2, respectively.
   Eliminate both conflicts: K ← mergeNode(G, mergeNode(G, K, v1), v2). Go to Step 2. □

Assume that the original kernel Kmax determined by the partial scan analysis is an SB-structure. Then the kernel K returned by findKernel must also be an SB-structure, since it is contained in Kmax. The corresponding test plan guarantees that any single-pattern test for K can be applied. Given the test properties of SB-structures, a complete single-pattern test set for K can be obtained using ordinary combinational ATPG on its combinational equivalent.

In some cases the test plan for K may show that not all scan registers in R are actually used in testing the kernel. This happens because a given kernel I/O port may be accessible to more than one scan register, and some scan registers may not appear in the matching used in PLAN(K, t). For example, in Figure 5.7 only R1, R3 and R4 are required for applying tests to K2, while R2 is not. The unused registers are still useful for carrying out some simple functional testing on the switching logic that is not exercised by the test plan. Having identified these registers, the designer can choose between two options.

The first option is to remove the scan registers that are not used for testing the kernel from the set of scan registers, as a postprocessing step. This may save some area overhead, at the cost of making some of the switching logic harder to test. For example, in Figure 5.7, making R2 a non-scan register makes it more difficult to test the switch M completely, especially if this structure is a subcircuit buried in a larger circuit. Either functional testing or sequential ATPG may then be required to test it.

The second option is to retain such registers and use them to test the switching logic explicitly in a separate test session. For the structure of Figure 5.7, the following approach could be used.
1. Based on the combinational equivalent of the minimal satisfiable kernel, which is K2, use combinational ATPG to obtain a test set for this kernel. Apply these tests using scan registers R1, R3 and R4. (This subset of registers can be placed at appropriate positions in the scan chain so as to minimize the time to apply each test; the chaining issue will be studied in Chapter 8.)
2. By fault simulation, determine the faults that lie in the maximal kernel K1 but outside the minimal kernel K2 and that have not yet been detected. Based on the combinational equivalent of K1, use combinational ATPG to obtain a test set for these faults only. Apply these tests using all appropriate scan registers.

An interesting parallel can be drawn between the construction of kernels in [4] and that in this work. In [4], potential kernels are initially identified as arbitrary functional logic blocks at the register-transfer (RT) level, or as various clusters of these blocks, without regard to the presence of I-paths. Each of these kernels is then analyzed to determine whether I-paths exist that can route test data to and from it. In the approach presented here, on the other hand, a kernel is initially identified as a cluster of circuit blocks that is guaranteed to have an inlet/outlet at every port. Conflicts among these I-paths are then identified, and this information is used to reshape the kernel so that tests can be applied effectively. It can be argued that the approach developed in this work is more efficient, since no time is wasted in analyzing kernels that do not have inlets/outlets at all their ports.

5.6.6 Example

An example of kernel minimization using I-paths is shown in Figure 5.15. The structure shown is an SB-structure kernel in a partial scan design. It is actually part of a larger data path circuit generated by the USC-ADAM synthesis system.
For simplicity the scan registers are not shown; instead, the inputs/outputs of the structure that are connected to scan registers are marked by small horizontal arrows. Since every I/O port of the structure is either a circuit primary I/O or connected to a scan register, the structure clearly represents the maximal kernel in this design. It consists of two functional units (add-1 and mul-2), five MUXes, and one register. The depth of this SB-structure is 1. Interestingly, it is not a B-structure, because the two paths from mul-2 to Mux-10-1 have different delays.

The shaded portion of the figure indicates I-path logic that can be removed from the kernel to obtain the minimal kernel, namely the unshaded portion of the structure in the center. The test plan for this kernel is listed below.

PLAN(K_min, t):
(t, reg-6, REC)
(t-1, Mux-10-1, add-1)
(t-1, Mux-10-1, mul-2)
(t-1, Mux-5-1, reg-5)
(t-1, reg-5, DRI)
(t-1, reg-3, DRI)
(t-1, Mux-2-1, PI-1)
(t-1, Mux-2-2, PI-2)
(t-2, Mux-2-1, PI-1)
(t-2, Mux-2-2, PI-2)

Figure 5.15: Kernel minimization: I-paths and minimal kernel.

There is a data conflict at time (t-1), since Mux-10-1 is required to execute two different actions. Hence the minimal kernel is not satisfiable. The conflict is eliminated by merging Mux-10-1 with the minimal kernel, resulting in the kernel shown in Figure 5.16 (namely the unshaded region in the center). The test plan for this kernel is listed below.

PLAN(K_sat, t):
(t, reg-6, REC)
(t-1, Mux-5-1, reg-5)
(t-1, reg-5, DRI)
(t-1, reg-3, DRI)
(t-1, Mux-2-1, PI-1)
(t-1, Mux-2-2, PI-2)
(t-2, Mux-2-1, PI-1)
(t-2, Mux-2-2, PI-2)

This test plan has no conflicts; hence the kernel is satisfiable.
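The data-conflict check applied to the two plans above can be mechanized. The Python sketch below is an illustration under stated assumptions, not the procedure used in this work: each plan entry is encoded as a (time offset relative to t, node, action) triple, action labels such as "REC" and "DRI" are kept as plain strings, and only one kind of conflict is checked, namely a node that is asked to perform two different actions at the same time step.

```python
from collections import defaultdict

# Test plans from the example, encoded as (time offset relative to t, node, action).
PLAN_KMIN = [
    (0, "reg-6", "REC"),
    (-1, "Mux-10-1", "add-1"),
    (-1, "Mux-10-1", "mul-2"),
    (-1, "Mux-5-1", "reg-5"),
    (-1, "reg-5", "DRI"),
    (-1, "reg-3", "DRI"),
    (-1, "Mux-2-1", "PI-1"),
    (-1, "Mux-2-2", "PI-2"),
    (-2, "Mux-2-1", "PI-1"),
    (-2, "Mux-2-2", "PI-2"),
]

# K_sat absorbs Mux-10-1 into the kernel, so its plan no longer mentions that switch.
PLAN_KSAT = [entry for entry in PLAN_KMIN if entry[1] != "Mux-10-1"]


def find_data_conflicts(plan):
    """Map (time, node) to its set of actions, keeping only entries with more than one action."""
    actions = defaultdict(set)
    for time, node, action in plan:
        actions[(time, node)].add(action)
    return {key: acts for key, acts in actions.items() if len(acts) > 1}


def is_conflict_free(plan):
    """True when no node is asked to do two different things at the same time step."""
    return not find_data_conflicts(plan)


if __name__ == "__main__":
    print(find_data_conflicts(PLAN_KMIN))  # reports the two actions demanded of Mux-10-1 at t-1
    print(is_conflict_free(PLAN_KSAT))
```

Note that Mux-2-1 appears at both t-1 and t-2 without causing a conflict, because the check is keyed on the (time, node) pair rather than on the node alone.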
In order to obtain test patterns for this kernel, the register reg-2 must be replaced by wires, and ordinary combinational ATPG can be carried out on the resulting structure. The process of obtaining the minimal satisfiable kernel guarantees that all irredundant faults in it can be detected by a test set obtained in this manner and applied using the test plan PLAN(K_sat, t). Note that many of the faults in the I-path logic lying in the shaded region of Figure 5.16 are detected for free during the test for K_sat (and can be identified by fault simulation). Any remaining undetected faults can be detected by carrying out ATPG separately on the maximal kernel.

Figure 5.16: Kernel minimization: I-paths and minimal satisfiable kernel.

5.7 Summary

Switches, which occur in almost every circuit in the form of multiplexers and buses, are versatile circuit elements that can help reduce test costs in many ways. This chapter has explored several of these important benefits. First, the presence of switches can be used to expand the class of structures having the single-pattern testable property, and this information can be used for more efficient partial scan design. Second, switches help to set up I-paths for transporting test data, which can lead to reduced test generation costs and potentially lower test time. Finally, although this approach preserves the testability of the non-switch functional logic, it provides the option to reduce the scan design overhead (in a postprocessing step) by trading it off against the testability of the unexercised switching logic. In the following chapter one additional feature of switches will be encountered: their use in partitioning a circuit in a natural functional manner.
Chapter 6
Partitioned Partial Scan Testing

"United [they] stand, divided [they] fall." (English proverb, with apologies)

6.1 Introduction

In the preceding chapters, we have seen how scan registers can be selected for a sequential circuit such that some form of combinational ATPG can be used to obtain a complete test set for the circuit. Although the combinational ATPG problem is theoretically NP-complete, its complexity is significantly lower than that of sequential ATPG. Thus, in general, given a combinational test generation model of moderate size for the circuit under test, tests can be obtained fairly quickly using procedures such as PODEM [27]. ATPG procedures use heuristics which have been shown to reduce the time complexity to approximately O(n^2) or O(n^3) in the average case. However, as the following paragraphs will explain, the complexity may become exponential in large circuits unless some form of partitioning is carried out.

The faults in a combinational circuit under test can be grouped into two categories: irredundant (or detectable) faults and redundant faults. Irredundant faults are those that can be detected by a static test consisting of a single test vector; redundant faults are those that cannot. In other words, combinational ATPG algorithms such as the D-algorithm and PODEM will fail to find a test for a redundant fault. Redundant faults may be caused by additional logic that is provided in a circuit to eliminate hazards and/or races. Unfortunately such faults, if present, may consume an exponential amount of ATPG computation, since the ATPG procedure may search fruitlessly for a test vector until all possibilities are exhausted. (In some cases a timeout is used to limit the backtracking.)
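To make the last point concrete, here is a small Python toy (a brute-force search, not PODEM or the D-algorithm). It assumes the function f = ab + a'c + bc, in which the consensus term bc is the kind of logic added to cover a static hazard. With the output of the bc AND gate stuck-at-0, the faulty circuit computes ab + a'c, which equals f for every input, so the search can only give up after all 2^n vectors have been ruled out. Real ATPG prunes the search space far more cleverly, but a complete procedure still has to rule out every alternative, implicitly or explicitly, or hit a timeout, before it can declare a fault redundant.

```python
from itertools import product


def good_circuit(a, b, c):
    # f = ab + a'c + bc; the consensus term bc is logically redundant (hazard cover).
    return (a and b) or ((not a) and c) or (b and c)


def bc_stuck_at_0(a, b, c):
    # Same circuit with the output of the bc AND gate stuck-at-0 (a redundant fault).
    return (a and b) or ((not a) and c)


def ab_stuck_at_0(a, b, c):
    # A detectable fault for comparison: the ab AND gate output stuck-at-0.
    return ((not a) and c) or (b and c)


def exhaustive_test_search(good, faulty, n_inputs):
    """Try every input vector; return (first distinguishing vector or None, vectors tried)."""
    tried = 0
    for vector in product((False, True), repeat=n_inputs):
        tried += 1
        if bool(good(*vector)) != bool(faulty(*vector)):
            return vector, tried
    return None, tried


if __name__ == "__main__":
    print(exhaustive_test_search(good_circuit, bc_stuck_at_0, 3))  # (None, 8)
    print(exhaustive_test_search(good_circuit, ab_stuck_at_0, 3))  # ((True, True, False), 7)
```

The detectable fault is found after seven vectors, while the redundant one costs the full 2^3 = 8; with n inputs the gap grows to 2^n.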
Thus, since any redundant fault may contribute an exponential amount of computation, the ATPG cost for a very large circuit could be dominated by such faults and may become excessively high.

The ATPG cost for a large circuit can be reduced by partitioning it into smaller subcircuits that can be tested individually. This partitioning could be carried out manually, since human designers can easily identify functional modules, or clusters of modules, that can be treated as subkernels to be tested relatively independently of each other. It could also be carried out by software, provided the complexity of the partitioning process is low relative to that of the ATPG process itself. Some ideas for automatically partitioning a circuit for test are suggested in this chapter.

Whether the partitioning is carried out manually or automatically, however, two interesting problems arise. (1) Given a test plan and test vectors for each subkernel, how should the various tests be scheduled? There may be interactions among the various subkernels that do not allow them to be tested simultaneously. For example, two kernels that use the same switch for propagating test results cannot be tested in the same test session. (2) Given the number of circuit I/O pins that are available for constructing separate scan path chains, how should the scan registers be configured into one or more scan chains so as to achieve the lowest test time? These two problems, namely test scheduling and scan path chaining, will be studied in Chapters 7 and 8, respectively.

In the remainder of this chapter we study an approach for automatic partitioning of an acyclic and/or balanced kernel in a serial scan design. The approach consists of a combination of three different schemes. The first, output-based partitioning, breaks up the kernel by identifying all the circuit logic feeding each kernel output.
The second, switch-based partitioning, uses information about the switches present in the circuit to identify portions of the kernel that can be tested independently of others. The third, size-based partitioning, subdivides the kernel into smaller kernels by scanning additional registers. Using a suitable combination of these partitioning schemes, the single kernel resulting from the partial scan analysis can be subdivided into a set of smaller, more easily tested kernels. The principles behind the three partitioning schemes are presented in the following sections, after which a global partitioning strategy is described in Section 6.5.

To facilitate the discussion, we define a cone of logic as follows. G = (V, A, c, w) refers to the generalized topology graph (GTG) of the original circuit.

Definition. Given an acyclic and/or balanced kernel $K = (V_K, A_K, c, w)$ within $G$, and any node $v \in V_K$, the cone of $v$ in $K$, denoted by $\mathrm{cone}(K, v)$, is defined as $K(v) = (V_K(v), A_K(v), c, w)$, where

$$V_K(v) = \{\, u \in V_K \mid \text{there is a path in } K \text{ from } u \text{ to } v \,\}, \qquad A_K(v) = A_K \cap \left[ V_K(v) \times V_K(v) \right].$$

6.2 Output-Based Partitioning

The first form of partitioning is illustrated in Figure 6.1. Assume that this structure is part of a larger circuit, and that all four registers R1-R4 have been selected to be in the scan path. The original unpartitioned kernel, C, consists of two substructures C1 and C2, which could be balanced or unbalanced structures, depending on the type of partial scan used. The outputs of C1 and C2 feed the scan registers R3 and R4, respectively. Starting at each of these outputs and moving towards the inputs of C, we can identify the cones of logic feeding these two outputs. Each of these cones can be treated as a separate kernel. Thus the kernel associated with the output at R3 consists of C1 fed by the input at R1; the kernel for R4 consists of C2 fed by the inputs at both R1 and R2.
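The definition of cone(K, v) amounts to a backward reachability search from v. The Python sketch below assumes a kernel given as a set of node names plus a set of directed arcs (u, v), ignores the labels c and w, and counts v as reaching itself. As a hypothetical encoding of Figure 6.1, the scan registers R1 and R2 are modeled as source nodes, and C1 and C2 stand for the nodes whose outputs feed R3 and R4.

```python
def cone(nodes, arcs, v):
    """Return (V_K(v), A_K(v)): the nodes with a path to v (v included) and the arcs among them."""
    preds = {u: set() for u in nodes}
    for src, dst in arcs:
        preds[dst].add(src)
    reached, stack = {v}, [v]
    while stack:  # depth-first walk against the arc direction
        x = stack.pop()
        for u in preds[x]:
            if u not in reached:
                reached.add(u)
                stack.append(u)
    # A_K(v) is A_K restricted to V_K(v) x V_K(v).
    sub_arcs = {(s, d) for (s, d) in arcs if s in reached and d in reached}
    return frozenset(reached), frozenset(sub_arcs)


def output_based_kernels(nodes, arcs, outputs):
    """One cone per kernel output node (the set KS_OP), keyed by that output."""
    return {vo: cone(nodes, arcs, vo) for vo in outputs}


# Hypothetical encoding of Figure 6.1: C1 is fed by R1; C2 is fed by R1 and R2.
NODES = {"R1", "R2", "C1", "C2"}
ARCS = {("R1", "C1"), ("R1", "C2"), ("R2", "C2")}

if __name__ == "__main__":
    for out, (vs, sub) in output_based_kernels(NODES, ARCS, ["C1", "C2"]).items():
        print(out, sorted(vs), sorted(sub))
```

Both cones contain R1, which is exactly the shared driver that forces C1 and C2 into separate test sessions.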
ATPG can be carried out separately for C1 and C2. During test, the kernels C1 and C2 need to be tested in two separate sessions, because the same driver register R1 is required to supply different test data to C1 and C2.

Figure 6.1: Subdividing a kernel by output-based partitioning.

Note that to test C1, test patterns need to be scanned only into R1, and test results need to be scanned only out of R3. Depending on the relative number of test patterns for C1 and C2 and the sizes of the various scan registers, this can reduce test time in certain cases, provided the ordering of registers in the scan path is appropriate for this purpose. The problem of ordering the scan registers will be dealt with in Chapter 8. For the current example, however, assume that the scan path contains the registers connected in the order shown in Figure 6.1. Let the test length of C be 100 (patterns), and let the test lengths of the subkernels C1 and C2 be 100 and 20, respectively. Assume also that C1 and C2 are combinational. The test time for C is (41 × 100) + 16 = 4116 clock cycles. This test time can be reduced to 1744 clock cycles simply by testing C1 and C2 as separate kernels. During the test for C1, each test vector is shifted into R1 and each test result is shifted out of R3. Thus only a portion of the scan path needs to be accessed, and only 908 clock cycles are required to test C1. During the test for C2 the entire scan path needs to be accessed, requiring a total of 836 clock cycles. The overall test time, 908 + 836 = 1744 clock cycles, is therefore significantly lower than that for the unpartitioned kernel C in this example.

The process of output-based partitioning can be described in formal terms as follows. Let the GTG of the original circuit be G = (V, A, c, w), and let the kernel resulting from the partial scan analysis on G be K = (V_K, A_K, c, w).
Let KO ⊆ V_K denote the set of nodes in K at which there is an output to a scan register or to a primary output port of the circuit. Each such kernel output gives rise to a distinct subkernel. The set of all such output-based kernels, denoted by KS_OP, is given by the following expression:

$$KS_{OP} = \{\, \mathrm{cone}(K, v_o) \mid v_o \in KO \,\}.$$

Note that each kernel (cone) in KS_OP is in general an acyclic sequential structure (balanced or unbalanced, depending on the nature of the original kernel K). Because the subkernels generated in this way may overlap, they do not strictly form a partition of the original kernel K. However, based on these kernels it is possible to form a strict partition of the fault set of K. For a given kernel k_i ∈ KS_OP, let F(k_i) be the set of faults to be tested in it. When generating tests for the various kernels in KS_OP, faults that lie in the region of overlap among kernels k_i, k_j ∈ KS_OP need to be tested only once. Thus, after test generation is carried out on any kernel k_i, the faults already detected can be removed from the fault lists of all other kernels k_j, j ≠ i, in which they occur. By doing so we can ensure that the fault sets F(k_i) form a strict partition of the original set of faults F(K). Note that any detectable fault in F(K) must be detectable in at least one of the kernels k_i.

Depending on the nature of the circuit under consideration, there may or may not be a high degree of overlap among the various cones generated in this way. Below we discuss the implications of various degrees of overlap among kernels.

Low Overlap

Clearly, if two kernels k_i and k_j have very little logic in common and share few inputs, as illustrated in Figure 6.2(a), it is advantageous to treat them as separate kernels. This reduces the size of each circuit on which ATPG is invoked, and also has the potential for reduced test time, as demonstrated earlier.
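The fault-list bookkeeping described for KS_OP can be sketched as follows. This Python function is illustrative only: ATPG plus fault simulation is replaced by a hypothetical callable detects(kernel, targets) that returns the subset of targets for which a test was found, and the kernel names and fault labels in the usage example are invented. Each fault is credited to the first kernel whose tests detect it and is dropped from the targets of every later kernel; faults that no kernel detects are reported separately, so in this sketch the strict partition is taken over the detected faults.

```python
def partition_faults(kernel_order, fault_lists, detects):
    """
    kernel_order : kernels in the order ATPG is run on them
    fault_lists  : kernel -> set of faults lying in that kernel (lists may overlap)
    detects      : hypothetical stand-in for ATPG + fault simulation;
                   (kernel, targets) -> subset of targets detected
    Returns (credited, undetected), where credited maps each kernel to the faults it covers.
    """
    credited = {}
    already_detected = set()
    for k in kernel_order:
        targets = fault_lists[k] - already_detected  # overlap faults are targeted only once
        found = set(detects(k, targets)) & targets
        credited[k] = found
        already_detected |= found
    all_faults = set().union(*fault_lists.values())
    return credited, all_faults - already_detected


# Invented example: f3 lies in the overlap of k1 and k2; f5 is undetectable in both.
FAULTS = {"k1": {"f1", "f2", "f3"}, "k2": {"f3", "f4", "f5"}}
DETECTABLE = {"k1": {"f1", "f2", "f3"}, "k2": {"f3", "f4"}}

if __name__ == "__main__":
    credited, undetected = partition_faults(
        ["k1", "k2"], FAULTS, lambda k, targets: targets & DETECTABLE[k]
    )
    print(credited, undetected)
```

If k1 cannot propagate f3 to its output, f3 simply stays in the target list of k2, which is the situation examined under high overlap.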
High Overlap

Figure 6.2(b) shows a case where k_i and k_j have a large proportion of their circuit logic, as well as kernel inputs, in common with each other. In this case the reduction in the circuit size for ATPG is less significant, and there is low potential for reduced test time. In fact, there is a possibility of increased test generation effort.

Figure 6.2: Overlapping kernels. (a) Low overlap, (b) high overlap.

Assume that ATPG is carried out first for all faults in k_i and then for the undetected faults in k_j. Suppose there is a fault f in the region of k_i common to k_j that can never be propagated to the output of k_i, but can only be propagated to the output of k_j. During ATPG on k_i, some backtracking effort may be wasted in trying to detect f. If, however, ATPG had been carried out on k_i and k_j combined, f might have been detected with less overall effort. The actual effort in each of the two cases depends on the characteristics of the ATPG algorithm and its implementation.

From the above observations it is evident that kernels with little or no overlap should be treated independently, and kernels that overlap heavily should be merged. This guideline is used in the procedure processKernels, presented below, to merge kernels that have a large amount of circuitry in common.

Merging Overlapping Kernels

The procedure processKernels compares all pairs of kernels k_i and k_j, which may have been derived using output-based partitioning or any other means. If the intersection of the two kernels (i.e., the overlapping region) has a size (number of combinational gates) that is more than a certain fraction α (say, one third) of the size of their union (i.e., the combined size), it combines them into a single kernel. The procedure mergeKernel invoked by it, which is not listed here, simply returns a kernel formed by merging the two kernels passed as parameters.
The kernels are considered in order of increasing size, so that smaller kernels tend to get merged before being compared with other kernels.

procedure processKernels (KS: set of kernels): Modifies KS by merging some kernels with others depending on overlaps, and returns the modified set.
{
  ∀ k_i ∈ KS, in order of increasing size |k_i|, do:
    ∀ k_j ∈ KS − {k_i}, in order of increasing size |k_j|, do:
      if (|k_i ∩ k_j| > α · |k_i ∪ k_j|)   /* α is a parameter, 0 < α < 1 */
      {
        k_i ← mergeKernel(k_i, k_j);
        Remove k_j from KS.
      }
  Return KS.
}

6.3 Switch-Based Partitioning

A form of partitioning using the switches present in a circuit is shown in Figure 6.3. The circuit contains a kernel C surrounded at its inputs and outputs by scan registers. The kernel consists of two substructures C1 and C2 connected to a multiplexer M. The original kernel can be subdivided into the smaller kernels C1 and C2, since any test for C1 or C2 can be applied by setting the multiplexer M to read data from C1 or C2, respectively. Hence it is sufficient to run ATPG on C1 and C2 separately. The tests for the various kernels need to be applied in separate sessions. However, as is the case for output-based partitioning, not all registers may need to be scanned in every session. Hence the overall test time can in some cases be reduced by partitioning, depending on the ordering of registers in the scan path.

Figure 6.3: Subdividing a kernel by switch-based partitioning.

The example above illustrates an important fact, namely that switches provide a natural way to partition a circuit for ATPG. Given the set of switches contained in a kernel (if any), every combination of control settings for the various switches gives rise to one or more candidate kernels. For example, consider the circuit in Figure 6.4(a), which has two switches M1 and M2.
Assume that during a given test session, the control input of switch M2 is set such that M2 reads data from A. During this entire session, M2 ignores the data at its input fed by B. Hence only faults in I1, I2 and A (i.e., in the cone of logic feeding the left input of M2), along with M2 and C, can be detected in this session. In effect, by setting M2 to read from A, we have isolated the kernel corresponding to the leftmost structure in Figure 6.4(c). If M2 is configured instead to read data from B rather than A, then the resulting kernel contains I2, I3, M1, and B (i.e., the cone of logic feeding the right input of M2), along with M2 and C. This kernel can be further subdivided, by configuring M1 in its two possible modes, into two kernels to be tested in separate sessions. These two kernels correspond to the middle and rightmost structures in Figure 6.4(c). Thus we have shown how the original kernel of Figure 6.4(a) can be subdivided into three switch-based partitions by configuring its two switches in all possible ways. Note that the nodes C and M2, which feed the primary output directly without passing through any switches, are always present in every subkernel, irrespective of the switch configurations.

The procedure switchPart presented below can be used to construct the set of all switch-based partitions. The procedure generates a tree of states, each of which represents a partially constructed kernel for which some subset of the switches have already been configured to read data from one specific input port. Each leaf state corresponds to a complete kernel in which all applicable switches have been configured. The result of the search is a set of kernels KS_SW, each derived from a leaf state in the search tree. For the circuit of Figure 6.4(a), the states generated by the search process are shown in Figure 6.4(b).

As before, G = (V, A, c, w) is the original circuit, K = (V_K, A_K, c, w) is the
unpartitioned kernel, and KO ⊆ V_K is the set of kernel output nodes. During the search the procedure maintains a list of states along the path from the start (root) state S_0 to the current state S_i. Each state S_i is represented by a 3-tuple (K_i, SEQ_i, I_i), where K_i is the current configuration of the kernel being generated; SEQ_i is a sequence of switches whose control settings have not yet been configured; and I_i is a subset of the input nodes of the first switch in SEQ_i.

The first operation in the procedure switchPart is to generate the start state S_0 = (K_0, SEQ_0, I_0), as shown in Figure 6.4(b) for the current example. K_0 represents the part of the original kernel K that is present in all the kernels derived from switch-based partitioning. It is obtained by removing all switches from K and then merging the cones of all the output nodes KO in the resulting structure. SEQ_0 is a list of all switches in SW, sorted such that if s_i, s_j ∈ SW and there is a path from s_i to s_j in K, then s_i appears after s_j in SEQ_0. Note that this ordering is not unique, and that such an ordering must exist since K is acyclic. Finally, I_0 is the set of nodes at the inputs of the first switch in SEQ_0. As the search proceeds, the elements of I_0 will be removed one by one until I_0 becomes empty. Note that the value of I_0 shown in Figure 6.4(b) is the initial value when the start state is created.

In general, from a non-leaf state S_i the procedure attempts to generate a new state S_{i+1} by deciding on a tentative control setting for the first switch in SEQ_i,
say M. In effect, the switch M gets added to the kernel along with the cone of logic feeding one of the inputs of M. If no further states can be generated, the current state is identified as a leaf state and the current kernel is added to KS_SW. The procedure then backs up from the current state to an earlier one and tries to generate alternative states from the new current state. The search ends when all states have been exhausted.

Figure 6.4: Illustration of switch-based partitioning procedure. (a) Original kernel K, (b) state space, (c) kernels generated, (d) combinational equivalents.

The detailed procedure is listed below.

procedure switchPartition (circuit G = (V, A, c, w); original kernel K = (V_K, A_K, c, w); switches SW ⊆ V_K; kernel outputs KO ⊆ V_K): Returns KS_SW, set of kernels.
1. K' ← K with all switches in SW removed.
2. K'_0 ← ∪_{v_o ∈ KO} cone(K', v_o).
3. Construct start state S_0 = (K_0, SEQ_0, I_0) as follows.
   (a) K_0 ← K'_0.
   (b) SW' ← SW − {switches with no incoming arcs}.
   (c) SEQ_0 ← SW' sorted such that switch s_i in SEQ_0 does not have a path in K to any switch s_j in SEQ_0, j > i.
   (d) I_0 ← {nodes in K directly feeding the first element of SEQ_0}.
4. Let the current state be S_i. If SEQ_i is empty, this is a leaf state; do the following.
   (a) Add a copy of K_i to KS_SW.
   (b) Delete state S_i; find the most recent state S_j (i.e., the maximum j, 0 ≤ j < i) such that I_j ≠ ∅. If there is no such state, the search is completed; return KS_SW. Else, make S_j the current state and denote it by S_i.
5. From the current state S_i generate the next state S_{i+1} = (K_{i+1}, SEQ_{i+1}, I_{i+1}) as follows. (Note that SEQ_i must be nonempty, since S_i is not a leaf state.)
   (a) M ← first element of SEQ_i.
   (b) If I_i is empty do the following:
       i. If i = 0 return KS_SW.
       ii. i ← i − 1 (S_i is the new current state).
       iii. Go to 4.
       Else do the following:
       i. Let a be some arbitrary element of I_i; remove a from I_i.
       ii.
K_{i+1} ← K_i ∪ {M} ∪ cone(K', a).
   (c) SEQ_{i+1} ← SEQ_i − (M).
   (d) While the first element of SEQ_{i+1} neither feeds any node in K_{i+1} nor feeds any primary output of K'_0: remove the first element of SEQ_{i+1}.
   (e) I_{i+1} ← {nodes in K directly feeding the first element of SEQ_{i+1}}.
   (f) i ← i + 1 (S_i is the new current state).
6. Go to 4. □

The subdivided kernels from switch-based partitioning of the kernel of Figure 6.4(a) are shown in Figure 6.4(c). As in the case of output-based partitioning, the kernels in KS_SW may overlap; the procedure invocation processKernels(KS_SW) can be used to merge kernels that have a high degree of overlap, for example, the second and third kernels shown in Figure 6.4(c). The resulting combinational equivalents to be used for ATPG are shown in Figure 6.4(d). When a switch in a kernel has only one input present, the switch may be replaced by a simple wire in the combinational equivalent, as in the case of the first kernel. During test application this switch must be configured so that it selects the appropriate input. When two or more kernels are merged during postprocessing, the resulting kernel may have switches with more than one input present. In this case, for the purpose of ATPG, the switch should be modeled by a multiplexer having exactly as many inputs as are present in the kernel; the control inputs generated by ATPG should then be transformed into the appropriate control patterns during test application.

6.4 Size-Based Partitioning

Figure 6.5: Subdividing a kernel by size-based partitioning.

Both the output-based and the switch-based forms of partitioning can help to reduce the ATPG cost, as well as potentially the overall test time. However, if any of the resulting kernels is still too large, a third form of partitioning can be used to break it down at the cost of additional logic.
This scheme, called size-based partitioning, involves introducing extra scan registers such that a kernel gets physically divided into two or more smaller kernels. An example is shown in Figure 6.5. The original kernel C is broken down into the subdivided kernels C1 and C2, each of which may be an acyclic/balanced structure, by converting R3 into a scan register.

Size-based partitioning differs from the output-based and switch-based forms of partitioning in two major characteristics.
• Size-based partitioning introduces extra scan registers, leading to additional overhead and possibly performance penalties.
• The resulting kernels have the advantage that they can be tested concurrently. This issue will be elaborated on in Section 7.2 under the discussion of test scheduling.

The "size" of a kernel refers to some characteristic that influences the ATPG cost. Typical size characteristics are the number of combinational gates, the number of inputs/outputs, or the largest number of gates on any input-output path. The exact notion of size depends on the actual design and ATPG environment, e.g., on what kind of computing power is available and on the efficiency of the ATPG algorithm used. In the discussions here we assume that there is a function size(k) which, for any kernel k, returns either 0 if the kernel is acceptable or 1 if the kernel needs to be partitioned further.

Here we present an example of a simple greedy algorithm, limitSize, for size-based partitioning. It essentially determines a set of additional scan registers required to limit the depth, i.e., the largest number of non-scan registers on any path through the kernel. This procedure is an extension of that presented in [16], which has the limitation that it simply uses the number of registers on a path as an estimate of the test generation effort; other criteria are ignored.
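Since size(k) is left to the design and ATPG environment, the following is only one possible Python realization, with hypothetical thresholds (max_gates, max_ios and max_depth are placeholders, not values from this work). It assumes a kernel is described by an acyclic fan-in map (gate to list of driving gates inside the kernel) plus its number of I/O ports, and it checks the three characteristics listed above: gate count, I/O count, and the largest number of gates on any input-output path, the last computed by a longest-path pass over a topological order (graphlib, Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def longest_gate_path(fanin):
    """Largest number of gates on any path, for an acyclic fan-in map gate -> driving gates."""
    depth = {}
    for g in TopologicalSorter(fanin).static_order():  # drivers come before the gates they feed
        depth[g] = 1 + max((depth[p] for p in fanin.get(g, ())), default=0)
    return max(depth.values(), default=0)


def size(fanin, n_ios, max_gates=5000, max_ios=128, max_depth=60):
    """Return 1 if the kernel exceeds any (hypothetical) limit and must be split, else 0."""
    gates = set(fanin) | {p for drivers in fanin.values() for p in drivers}
    too_big = (
        len(gates) > max_gates
        or n_ios > max_ios
        or longest_gate_path(fanin) > max_depth
    )
    return 1 if too_big else 0


if __name__ == "__main__":
    chain = {"g2": ["g1"], "g3": ["g2"], "g4": ["g1"]}
    print(longest_gate_path(chain))           # 3
    print(size(chain, n_ios=3, max_depth=2))  # 1
```

A cyclic fan-in map makes static_order raise graphlib.CycleError, which is consistent with the assumption that the kernels here are acyclic.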
The limitSize procedure presented here allows the more general size function to be used as a partitioning criterion. The procedure builds up each partition by starting with a seed node and growing a cluster of circuit nodes until it reaches the size limit. Whenever a node is added to a cluster, all nodes lying in the same cloud (as defined in Chapter 3) must be added to the cluster. In other words, a maximal set of nodes that are connected to each other via wires must be added to the cluster simultaneously. This ensures that at any stage in the formation of the cluster, all connections at the boundary between the cluster and the rest of the circuit are through registers. Hence the cluster can be made to form a valid partition, isolated from other partitions, by scanning these registers. In the procedure below, cloud(v) denotes the set of nodes in the same cloud as the node v.

procedure limitSize (Circuit G = (V, A, c, w); original kernel K = (V_K, A_K, c, w)): Returns R, set of additional scan registers.
1. K' ← K; KS_SZ ← ∅.
2. While K' is not empty do:
   (a) Pick a node k on the periphery of K', i.e., k ∈ K' such that k is a PI/PO node of K' or there is a node v ∈ V adjacent to k but not in K'.
   (b) Construct kernel KP containing all nodes in cloud(k). K' ← K' − KP. TRIED ← ∅.
   (c) If size(KP) = 1 then (cloud(k) alone violates the size bound) go to step 2(d); else expand subkernel KP in a greedy fashion as follows. Do forever:
       i. Pick a node v ∈ K' − TRIED adjacent to a node in KP such that |cloud(v)| is highest; if there is no such node, go to step 2(d).
       ii. KP_temp ← KP ∪ cloud(v).
       iii. If size(KP_temp) = 1 then add v to TRIED; else add the nodes in cloud(v) to KP and remove them from K'.
   (d) KP is a maximal kernel satisfying the size limit; add it to KS_SZ.
3. R ← all registers in A_K that connect two nodes in two different partitions in KS_SZ.
4. Return R.
□

The procedure listed above yields a set of additional scan registers required to partition the original kernel into subkernels. A side-effect of the greedy approach is that, while most subkernels may be just under the size bound, a few residual kernels may be much smaller; this could be avoided if a more elaborate scheme were used.

6.5 Global Partitioning Strategy

Let K be the kernel resulting from the partial scan analysis of Chapter 5. The objective of the global partitioning strategy is to construct a set of kernels KS by using a combination of output-based, switch-based and size-based partitioning. A procedure globalPartition that carries out this task is listed below. First, it uses output-based and switch-based partitioning to break down the kernel as far as possible. If any of the resulting kernels violates the size criterion, size-based partitioning is used to introduce additional scan registers appropriately. In this case the entire procedure is repeated for the resulting kernel.

procedure globalPartition (Circuit G = (V, A, c, w); original kernel K = (V_K, A_K, c, w)): Returns set of additional scan registers R and set of kernels KS.
1. R ← ∅.
2. (Output-based partitioning)
   KS_OP ← {cone(K, v_o) | v_o ∈ KO}.
   KTEMP ← processKernels(KS_OP).
3. (Switch-based partitioning)
   KS_SW ← ∪_{KP ∈ KTEMP} switchPart(G, KP).
   KTEMP ← processKernels(KS_SW).
4. KBIG ← {KP ∈ KTEMP | size(KP) = 1}.
   If KBIG is empty then set KS ← KTEMP and return R and KS.
5. (Size-based partitioning is required)
   For all KP ∈ KBIG do:
   (a) R_SZ ← limitSize(G, KP).
   (b) R ← R ∪ R_SZ.
6. In the original kernel K, remove all registers that are also present in R. Go to step 2. □
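Step 5 of globalPartition relies on limitSize. A minimal Python sketch of that greedy procedure follows, under several assumptions that are not part of the original listing: the kernel is given as a set of node names; a dictionary maps each arc (u, v) to the name of the register on it, or None for a plain wire; the clouds are precomputed as disjoint node sets; a set of kernel I/O nodes stands in for the PI/PO test of step 2(a); and size is any callable returning 0 or 1. Ties in steps 2(a) and 2(c)i are broken by node name so that results are reproducible, and the sketch also returns the partitions so they can be inspected.

```python
def limit_size(nodes, arcs, clouds, ports, size):
    """Greedy size-based partitioning; returns (extra scan registers, partitions)."""
    cloud_of = {v: frozenset(c) for c in clouds for v in c}
    neighbours = {v: set() for v in nodes}
    for u, v in arcs:
        neighbours[u].add(v)
        neighbours[v].add(u)

    remaining = set(nodes)  # K'
    partitions = []  # KS_SZ
    while remaining:
        # 2(a): a peripheral node is a kernel port or touches a node outside K'.
        peripheral = sorted(v for v in remaining if v in ports or neighbours[v] - remaining)
        k = peripheral[0] if peripheral else min(remaining)
        kp = set(cloud_of[k])  # 2(b): start from the whole cloud of k
        remaining -= kp
        tried = set()
        if size(kp) == 0:  # 2(c): grow greedily unless cloud(k) alone is too big
            while True:
                frontier = [v for v in remaining - tried if neighbours[v] & kp]
                if not frontier:
                    break
                # Prefer the neighbour with the largest cloud; break ties by name.
                v = min(frontier, key=lambda x: (-len(cloud_of[x]), x))
                if size(kp | cloud_of[v]) == 1:
                    tried.add(v)
                else:
                    kp |= cloud_of[v]
                    remaining -= cloud_of[v]
        partitions.append(frozenset(kp))  # 2(d)

    # 3: registers whose two ends fall in different partitions must be scanned.
    part_of = {v: i for i, p in enumerate(partitions) for v in p}
    extra = {reg for (u, v), reg in arcs.items() if reg is not None and part_of[u] != part_of[v]}
    return extra, partitions


if __name__ == "__main__":
    # Hypothetical 4-node chain a -> b -> c -> d with a register on every arc.
    chain_arcs = {("a", "b"): "Rab", ("b", "c"): "Rbc", ("c", "d"): "Rcd"}
    regs, parts = limit_size({"a", "b", "c", "d"}, chain_arcs,
                             [{"a"}, {"b"}, {"c"}, {"d"}], ports={"a", "d"},
                             size=lambda s: 1 if len(s) > 2 else 0)
    print(regs, parts)
```

With a size bound of two nodes, the chain splits into {a, b} and {c, d}, and only the register between b and c has to be added to the scan path.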
Each subkernel in KS obtained from the partitioning process can be treated as an independent kernel for the purpose of test generation and test application. Every subkernel has the same testability characteristics as the original maximal kernel; i.e., if the maximal kernel is an SB-structure, then every subkernel is an SB-structure. For each subkernel, the I-path analysis of Chapter 5 can be carried out to identify a minimal satisfiable kernel, along with the associated test plan.

6.6 Summary

This chapter has explored the problem of testing a circuit in a partitioned manner. The maximal kernel resulting from the partial scan analysis of the preceding chapters can be subdivided into smaller kernels by using a combination of three different forms of partitioning: output-based, switch-based, and size-based. Although the resulting kernels may have regions of shared circuit logic, the sets of faults targeted for detection in the various kernels form a proper partition of the set of target faults in the original kernel. By forming subdivided kernels, the overall test generation cost can be reduced, and in some cases the total test application time can also be reduced. The reader is asked to bear in mind that enhancing the automatic partitioning process, to maximize its benefits, would require more knowledge about the relationships among partition features, ATPG cost, and test time; this is a subject for future research. The more important contribution of the partitioning study presented here is that it enables us to formulate the test scheduling problem, which is studied in Chapter 7.

Chapter 7

Test Scheduling

"The sum of the parts is lesser than the whole."
(Source unknown)

7.1 Introduction

The partitioning procedure of Chapter 6 results in a set KS of independently testable subkernels.
For each of these subkernels a combinational equivalent circuit can be derived in the same way as for the minimal satisfiable kernel of Chapter 5. ATPG can then be carried out separately on each combinational equivalent to obtain a test set for the corresponding kernel. We now have the problem of applying these tests in an efficient manner. The kernels could simply be tested one at a time in a sequence. However, as this chapter will show, the overall test time can be significantly reduced by exploiting the potential for testing different parts of the circuit in parallel.

Depending on the characteristics of the kernels in KS, some subsets of kernels may have mutual conflicts that require them to be tested in different test sessions; it may be possible to test other subsets concurrently with no conflicts. In this chapter the different types of kernel relationships that affect the scheduling of tests are studied, and subsequently a method for scheduling tests for the various partitions to achieve minimum overall test time is presented.

7.2 The Test Scheduling Problem

So far it has been assumed that all scan registers are to be connected together in a single scan chain. If instead they are distributed among multiple scan chains, each having its own scan-in/scan-out pins and, if necessary, its own test/normal mode control pin, the distribution has an influence on the scheduling of tests. The popular approach to multiple scan path chaining is to try to make all chains of equal length; however, it will be shown in Chapter 8 that the greatest reduction in test time can often be achieved by using unequal-length chains. In this section we assume that the scan chains have already been constructed; the problem of constructing the scan chains so as to minimize the overall test time will be studied in Chapter 8.

7.2.1 Test Application with Multiple Scan Chains

Consider the circuit example in Figure 7.1(a).
A, B and C are kernels resulting from the partitioning process. Each may be a combinational logic block or an SB-structure; non-scan registers within the subkernels are not shown. Assume the test lengths of A, B and C are 50, 100 and 30 patterns, respectively. The scan registers, each having a width of 8 bits, are connected into two scan chains, chain1 and chain2, as shown. For kernel K, let T(K) denote the number of test patterns, and let Ch(K) be the set of chains that are involved in testing K.

The number of clock cycles required to completely flush the scan chains in Ch(K) is $\max_{c \in Ch(K)} |c|$, where |c| denotes the length of (i.e., the number of FFs in) scan chain c. However, depending on the ordering of the registers in the chains and the positions of the registers that are actually used for testing K, it may not be necessary to flush the chains completely in order to apply tests to K. We define the drive cycle of K as the number of shifts sufficient to scan a test pattern for K into the scan chain(s) in Ch(K), and the receive cycle of K as the number of shifts sufficient to scan a test result from K out of the scan chain(s) in Ch(K). These terms are defined formally below.

Figure 7.1: Example of partitioned testing: (a) partitioned kernels, (b) test relationship graph.

Definition The drive cycle DC(K) [receive cycle RC(K)] of K is the smallest integer n such that for all chains c ∈ Ch(K), each FF in c that drives test patterns into [receives test results from] K is at most at the nth position away from the scan-in [scan-out] pin of c.

For the circuit of Figure 7.1(a), DC(A) = 8, DC(B) = 8, and DC(C) = 16; RC(A) = 16, RC(B) = 8, and RC(C) = 16. When applying tests to K, all chains in Ch(K) must be operated with a cycle that is long enough to drive as well as receive test data for K.
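The drive and receive cycles can be computed directly from the order of registers in each chain. The sketch below assumes a hypothetical encoding in which each chain in Ch(K) is listed from scan-in to scan-out as (register, width) pairs. The kernel K is characterized only by the names of its driver and receiver registers. Register names such as Ra and Rb are illustrative.

```python
from typing import Dict, List, Set, Tuple

# A scan chain listed from scan-in to scan-out as (register name, width in FFs).
Chain = List[Tuple[str, int]]

def drive_receive_cycles(chains: Dict[str, Chain], drivers: Set[str],
                         receivers: Set[str]) -> Tuple[int, int]:
    """Return (DC(K), RC(K)) over the chains in Ch(K)."""
    dc = rc = 0
    for regs in chains.values():
        length = sum(width for _, width in regs)   # |c|
        pos = 0                                    # FFs between scan-in and this register
        for name, width in regs:
            if name in drivers:
                # the register's last FF is pos + width shifts away from scan-in
                dc = max(dc, pos + width)
            if name in receivers:
                # the register's first FF needs length - pos shifts to reach scan-out
                rc = max(rc, length - pos)
            pos += width
    return dc, rc
```

Consider a 16-FF chain made of two 8-bit registers Ra and Rb, with Ra acting as both driver and receiver. The function returns DC = 8 and RC = 16, the same pair of values given above for kernel A.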
Definition The chain cycle CC(K) of K is max[DC(K), RC(K)].

For example, since DC(A) = 8 and RC(A) = 16, we have CC(A) = 16. Note that to scan out the output of A present in R4 it is necessary to shift out the contents of both R4 and R5, thus requiring 16 clock cycles per test. When a series of tests is applied to A consecutively, it may be possible to get by with just 8 shifts per test by providing an individual HOLD control for R5 and using the following scheme. After applying a given test pattern, the test result must be shifted from R4 into R5 in 8 clock cycles, and then held there (to inhibit unwanted data being loaded into R5) while the next test pattern is applied. At this time a new test result is available in R4 and the old test result is available in R5, and the shifting operation is repeated over the next 8 clock cycles. The result of the previous test can now be observed. Although this "pipelined" scheme would reduce the effective chain cycle of A to 8, in general it requires some scan registers to have individual HOLD controls, which is a high overhead. Hence we will continue to assume that the scan chain(s) are completely flushed for each test vector application.

In cases where the assignment of registers to the various scan chains is known but the ordering of the registers in each chain is not known, the exact chain cycle cannot be computed. However, a worst-case assumption can be made by using $CC(K) = \max_{c \in Ch(K)} |c|$, i.e., the time required to flush all the chains.

For kernel B, Ch(B) = {chain2} and CC(B) = 8. Thus every test requires 8 clock cycles for a test pattern to be scanned in while the previous test result is being scanned out, and 1 clock cycle for the test result to be loaded into the scan registers.
Further, if B is a sequential structure with depth d, each test pattern needs to be applied to the kernel over a duration of d additional clock cycles. In the following discussions we assume that all three kernels have depth 0. Thus 9 clock cycles are required to apply a test pattern to B, and the total test time for 100 test patterns is 900 clock cycles, ignoring the time required to flush out the scan chains after the last test pattern is applied.

In the case of kernel A, Ch(A) = {chain1} and CC(A) = 16. Since only 8 bits of actual test data are to be scanned into the driver register R1, the test patterns must be formatted and padded with dummy data so that R1 contains the appropriate pattern after 16 clock cycles of shifting. The total time to apply 50 test patterns to A is 17 × 50 = 850 clock cycles. Kernel C is connected to more than one scan chain; Ch(C) = {chain1, chain2} and CC(C) = 16. Its total test time is 17 × 30 = 510 clock cycles.

Thus in general, for a kernel K, the time TT(K) for testing it independently of all other kernels is given by the expression

$$TT(K) = T(K) \times \left[ CC(K) + d + 1 \right].$$

If all kernels are to be tested in separate test sessions, the overall test time TT for the circuit is given by

$$TT = \sum_{K \in KS} TT(K).$$

In the following section we investigate the conditions under which kernels can be tested concurrently, and how the various scan chains must be operated in order to achieve concurrent testing.

7.2.2 Kernel Relationships

Given a set KS of kernels to be tested and their test plans as described in Chapter 5, it is desirable to determine the relationships among them and accordingly schedule the tests so as to minimize the overall test time. There are two types of relationships among kernels that can affect the scheduling of tests: incompatibility and dependence.
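A few lines of Python reproduce the test-time arithmetic for the three kernels of Figure 7.1(a). The test lengths and the drive and receive cycles are the values given in the text. All depths are taken as 0, as assumed above. The function names are illustrative.

```python
def chain_cycle(dc: int, rc: int) -> int:
    """CC(K) = max[DC(K), RC(K)]."""
    return max(dc, rc)

def kernel_test_time(patterns: int, cc: int, depth: int = 0) -> int:
    """TT(K) = T(K) x [CC(K) + d + 1]: cc shift cycles, d application cycles, 1 capture cycle."""
    return patterns * (cc + depth + 1)

# (T(K), DC(K), RC(K)) for the kernels of Figure 7.1(a), all of depth 0
kernels = {"A": (50, 8, 16), "B": (100, 8, 8), "C": (30, 16, 16)}
tt = {k: kernel_test_time(t, chain_cycle(dc, rc)) for k, (t, dc, rc) in kernels.items()}
serial_total = sum(tt.values())  # every kernel tested in its own session
print(tt, serial_total)  # {'A': 850, 'B': 900, 'C': 510} 2260
```

Testing the three kernels in separate sessions therefore costs 2260 clock cycles. This is the baseline that concurrent scheduling tries to improve on.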
7.2.2.1 Incompatible Kernels

Incompatible kernels are those whose tests cannot be applied simultaneously. An incompatibility may either be a result of the partitioning process or be caused by conflicts among the I-paths in the test plans of the kernels. Let a kernel be represented by its GTG, and let TP(K) denote the test plan for kernel K.

Definition Two kernels K_i, K_j are incompatible if either of the following holds:
1. there is a circuit node v present in both K_i and K_j;
2. there is a test step (t1, X1, M1) in TP(K_i) and a test step (t1, X2, M2) in TP(K_j) such that M1 ≠ M2. □

In Figure 7.1(a), since the switch M transmits test results from both kernels A and C to the receiver register R4, it would be required to carry out conflicting actions if the kernels were to be tested concurrently. Thus A and C are incompatible due to the second condition in the definition above.

7.2.2.2 Dependent Kernels

In the example above, A and B are compatible and can be tested simultaneously. Further, Ch(A) = {chain1} and Ch(B) = {chain2}, i.e., there is no chain that is used by both these kernels. This implies that A and B can be tested independently using chain1 and chain2, respectively, without the chains having to operate in sync. For example, the 50 test patterns for A could be applied using the following sequence of operations: Repeat {scan in test pattern for A into chain1 (while scanning out previous test result) in 16 clock cycles; apply pattern to A; load test result into chain1 in 1 clock cycle}. Simultaneously, the 100 test patterns for B could be applied using the following sequence of operations: Repeat {scan in test pattern for B into chain2 (while scanning out previous test result) in 8 clock cycles; apply pattern to B; load test result into chain2 in 1 clock cycle}.
Of course this requires that the two chains be separately controllable; but the "asynchronous" operation helps to reduce the test time since chain2 does not have to waste time by operating at the longer chain cycle of chain1.

B and C are also compatible since they can be tested simultaneously. However, both use chain2 for applying tests, hence they are said to be dependent.

Definition Two compatible kernels K_i, K_j are dependent if both of the following hold:
1. Ch(K_i) ∩ Ch(K_j) ≠ ∅;
2. CC(K_i) ≠ CC(K_j).

The dependence between B and C implies that when testing C at the same time as B, the chains used for testing B (i.e., chain2) must be slowed down to be in sync with the other chains used for testing C (i.e., chain1). Thus chain2 would receive a new pattern for B every 16 clock cycles rather than every 8. Recall from the earlier analysis that the test time for C alone is TT(C) = 510 and that for B alone is TT(B) = 900. However, when testing them concurrently, the test time for B increases to more than 900 because of the slower operation of chain2 while C is also being tested. The reader will not be burdened at this point with the calculation of the new test time for B; this is deferred until the test scheduling model has matured. But it should be observed that in general a dependence between two kernels may tend to increase the test time, although the overall time for testing them in sync would still always be lower than the time for testing them separately as if they were incompatible.

7.2.3 Modeling Test Relationships

The incompatibility and dependence relationships among kernels are modeled using a test relationship graph (TRG), which is defined as follows. A TRG is an
A TRG is an 146 undirected graph (V, I , D) in which nodes in V represent the kernels resulting from partitioning; edges in I C V x V connect kernels that are incompatible; and edges in D C (V x V) — I connect compatible kernels that are dependent. Note that If)D = < j). For the example of Figure 7.1(a), the TRG is shown in Figure 7.1(b). Solid lines represent incom patibility edges in I and broken lines represent dependence edges in D. The TRG is an extension of the test incompatibility graph (TIG) defined by Craig et al. [28]. The TIG is used to model incompatibility relationships alone, and is similar to the TRG without any dependence edges. It is used for scheduling j built-in self-test (BIST) circuits. In the TIG model, every kernel is assumed to have a fixed test time which does not depend on whether any other kernels are tested | simultaneously. In contrast, in the case of scan design with multiple chains, the test ! tim e for a given kernel is not constant, since the chains may need to be slowed down ! depending on other kernels being tested at the same time. It is this feature of serial j scan designs that cannot be modeled in the TIG, and is modeled in the TRG using j the dependence relationship. D efin ition A subset of kernels V' C V is said to form a com p atib ility group if; I fl (V' x V ) = < f), i.e., there are no incompatibility edges among them. ; | Thus in Figure 7.1(b), both {A, B} and {B, C} are compatibility groups.! [Each single kernel alone forms a compatibility group. Any set of kernels forming aj compatibility group can be tested concurrently. 1 | D efin ition A set of nodes V' forming a compatibility group is said to be a d e p en d en ce group if the subgraph (V', I' = < f> , D' = (V7 x V') fl D) is connected. Thus { B ,C } and {B} are dependence groups but {A, B } is not. 1 I i D efin ition Given a dependence group K D , the chain group of K D , C h(K D ) = ! 
$\bigcup_{K \in KD} Ch(K)$, i.e., the set of all chains used for applying tests to one or more kernels in KD.

Figure 7.2: TRG example to illustrate dependence groups.

Analogous to the definition of the chain cycle of a kernel, we can define the chain cycle of a dependence group.

Definition The chain cycle of KD, denoted by CC(KD), is $\max_{K \in KD} CC(K)$, i.e., the maximum of the chain cycles among the kernels in KD.

For example, Ch({B, C}) = {chain1, chain2}, and CC({B, C}) = 16. The chain group is essentially the set of chains that must be used to test the kernels in the associated dependence group concurrently. All these chains must be run at the corresponding chain cycle, i.e., each new pattern must be scanned in over the corresponding number of clock cycles.

Every compatibility group KC ⊆ V can be partitioned into a set of dependence groups by obtaining the connected components of the subgraph (KC, ∅, (KC × KC) ∩ D). For example, Figure 7.2 shows a TRG with six kernels. The set {a, b, c, d, e} is a compatibility group, and it can be partitioned into the three dependence groups {a, b}, {c, d} and {e}.

The following property of dependence groups helps to construct a test schedule for a set of compatible kernels.

Lemma 9 Given a compatibility group KC and two of its dependence groups KD_i, KD_j, if CC(KD_i) ≠ CC(KD_j) then Ch(KD_i) ∩ Ch(KD_j) = ∅, i.e., the chain groups are mutually exclusive.

Proof Assume for the purpose of contradiction that CC(KD_i) ≠ CC(KD_j) and Ch(KD_i) ∩ Ch(KD_j) ≠ ∅. Let c ∈ Ch(KD_i) ∩ Ch(KD_j). Then chain c is used for applying tests to some kernel k_i ∈ KD_i as well as to some kernel k_j ∈ KD_j; hence there must be an edge {k_i, k_j} in the TRG. This implies that k_i and k_j must be in the same connected component of the subgraph (KC, ∅, (KC × KC) ∩ D) of the TRG. This contradicts the assumptions, since KD_i and KD_j are distinct connected components of that subgraph.
□

Corollary Given a compatibility group KC and a chain c, let KD_i, i = 1, 2, ..., m, be different dependence groups of KC satisfying c ∈ Ch(KD_i). Then CC(KD_1) = CC(KD_2) = ... = CC(KD_m).

Proof Consider any pair KD_i, KD_j, 1 ≤ i, j ≤ m. Since c belongs to both Ch(KD_i) and Ch(KD_j), the set Ch(KD_i) ∩ Ch(KD_j) is nonempty. Lemma 9 implies that CC(KD_i) = CC(KD_j). Applying this argument to all pairs of dependence groups in {KD_1, ..., KD_m}, we have CC(KD_1) = CC(KD_2) = ... = CC(KD_m). □

The corollary above implies that given a compatibility group KC of kernels that are to be tested concurrently, every scan chain used for applying tests must be associated either with a unique dependence group KD_i within KC or with a subset of dependence groups having the same chain cycle. Hence the following rule can be used for concurrently applying tests to the set of kernels KC in a particular test session.

Test Session Execution Rule: For each chain c, let KD_i be some dependence group, if it exists, such that c ∈ Ch(KD_i). Then in the current test session c must be operated at the chain cycle CC(KD_i). If no such dependence group exists, chain c is idle in this session.

Based on the model presented in this section, the test scheduling problem can be stated as follows: determine a schedule for applying tests to kernels such that (1) at all times, the kernels being tested form a compatibility group; and (2) the overall test time is minimized. Below we examine a special case of the problem in which there are no dependencies, and show that it is equivalent to the problem studied earlier by Craig et al. [28]. Since Craig's problem is intractable, the special case study also serves to show that the general scheduling problem in the presence of dependencies is intractable. Subsequently, in the following section, we study the general case of test scheduling with dependencies.
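The TRG, the dependence groups of a compatibility group, and the Test Session Execution Rule can be sketched as follows. The sketch makes some assumptions. The incompatible pairs are supplied directly rather than derived from test plans. Ch and CC are given per kernel. Dependence edges are added exactly as in the definition: compatible kernels that share a chain and differ in chain cycle. The functions are hypothetical helpers, not part of the original work. Instead of relying on the corollary, `session_chain_cycles` raises an error if a chain would be assigned two different cycles.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Edge = FrozenSet[str]

def build_trg(kernels: Iterable[str], incompatible: Iterable[Tuple[str, str]],
              Ch: Dict[str, Set[str]], CC: Dict[str, int]) -> Tuple[Set[Edge], Set[Edge]]:
    """Return the edge sets (I, D) of the test relationship graph."""
    I = {frozenset(p) for p in incompatible}
    D = {frozenset((a, b)) for a, b in combinations(sorted(kernels), 2)
         if frozenset((a, b)) not in I and Ch[a] & Ch[b] and CC[a] != CC[b]}
    return I, D

def is_compatibility_group(group: Iterable[str], I: Set[Edge]) -> bool:
    return all(frozenset(p) not in I for p in combinations(group, 2))

def dependence_groups(group: Iterable[str], D: Set[Edge]) -> List[Set[str]]:
    """Connected components of the subgraph (KC, empty, (KC x KC) & D)."""
    remaining, comps = set(group), []
    while remaining:
        comp, stack = set(), [min(remaining)]
        while stack:
            k = stack.pop()
            if k not in comp:
                comp.add(k)
                stack.extend(j for j in remaining if frozenset((k, j)) in D)
        remaining -= comp
        comps.append(comp)
    return comps

def session_chain_cycles(group, D, Ch, CC) -> Dict[str, int]:
    """Test Session Execution Rule: chain -> cycle; chains not listed stay idle."""
    cycles: Dict[str, int] = {}
    for kd in dependence_groups(group, D):
        cc = max(CC[k] for k in kd)                  # CC(KD)
        for c in set().union(*(Ch[k] for k in kd)):  # Ch(KD)
            if cycles.setdefault(c, cc) != cc:
                raise ValueError(f"chain {c} would need two different cycles")
    return cycles
```

For the kernels of Figure 7.1(a), this yields a single dependence edge, {B, C}. Testing {B, C} runs both chains at a cycle of 16. Testing {A, B} runs chain1 at 16 and chain2 at 8.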
7.2.4 The No-Dependence Scheduling Problem

This problem is mostly of academic interest since the chance of an arbitrary circuit having no dependencies is low. It is, however, useful in the special case of a partial scan design in which the following conditions hold:
1. either there is a single scan path or there are multiple scan paths with exactly the same length;
2. the assignment of registers to scan chains and/or the ordering of registers in each scan chain have not been determined yet or are not known.

In such a case, as we observed earlier, the worst case can be assumed in which every scan chain is flushed completely in order to drive/receive from any kernel, and the chain cycle is taken as N, the length of the single or multiple scan chain(s). Hence during the entire test, the scan chain(s) must operate at a chain cycle of N irrespective of what subset of the kernels is under test at any time.

Let K_max be the maximal kernel as defined in Chapter 5, and let its depth be d. Let KS = {K_1, ..., K_n} be the set of kernels in the circuit obtained by subdividing K_max. Note that every kernel K_i ∈ KS must have depth d(K_i) not higher than d. When applying tests to any kernel K_i, we will assume that each pattern is held in the scan path for d − d(K_i) clock cycles before the test plan for K_i is initiated. This has the advantage that any group of compatible kernels can be tested in a synchronized manner irrespective of their individual depths. Since in general d ≪ N, the increase in test time (if any) due to this synchronization is likely to be negligible.

Let each kernel K_i ∈ KS have test length T(K_i). Then the test time for K_i is TT(K_i) = (N + d + 1) × T(K_i) clock cycles. This test time is fixed irrespective of what other compatible kernels are tested in parallel with it. In other words, there are no dependencies among the kernels.
Hence we can construct a TRG in which there is an incompatibility edge for every pair of incompatible kernels, and there are no dependence edges. This graph is identical to the TIG defined by Craig et al. in [28], and the scheduling problem is identical to their unequal execution time scheduling problem with test time for node K_i equal to TT(K_i).

This problem is studied in detail in [28]. It is shown to be intractable, and two different heuristics are presented. The first heuristic breaks down each kernel test into subtests such that all subtests for all kernels have equal duration. A suboptimal algorithm is then used to schedule the resulting set of independent equal-duration tests. The second heuristic is similar to the first, but it assumes that any test for a kernel, once initiated, must run to completion; i.e., all subtests for a kernel must be executed in a contiguous manner with no interruptions.

7.3 General Test Scheduling Algorithm

In the general case of multiple chains that are not all of the same length, the TIG of Craig et al. [28] is not sufficient to model the problem. This is because of the dependencies among kernels, based on shared scan chains, which make the test time of a given kernel dependent on the rates at which the various scan chains are being operated. Given an arbitrary ordering of the kernels to be tested, a unique test schedule can be obtained by using a greedy scheduling approach. Based on this idea, an algorithm for obtaining an optimal schedule is presented in this section. It implicitly enumerates all orderings of the kernels and determines the ordering that leads to the schedule with the lowest test time.

7.3.1 Terminology

We begin the discussion of the scheduling algorithm by presenting the terms used in describing schedules.
A schedule is essentially a sequence of events that take place over a period of time; we define two types of events, those associated with kernels and those associated with scan chains.

Definition A kernel event is a tuple (ts, td, K, cc) where ts is a start time unit (representing a clock cycle) beginning at which tests are applied to kernel K ∈ KS; these tests are applied for a duration of td clock cycles during which all the chains in Ch(K) operate at a chain cycle cc ≥ CC(K).

The test for a given kernel K may be distributed over one or more kernel events. Thus for any given kernel event, td ≤ T(K) · (cc + d + 1).

Definition A chain event is a tuple (ts, td, c, cc) where ts is a start time unit (representing a clock cycle) beginning at which chain c operates at a chain cycle cc for a duration of td clock cycles.

Definition A schedule is a pair (KE, CE) where KE is a set of kernel events, CE is a set of chain events, and the earliest kernel/chain event has start time 0. The length of a schedule S, denoted by |S|, is

$$|S| = \max_{(ts,\,td,\,K,\,cc) \in KE} (ts + td).$$

The length of a schedule is the effective total test time in clock cycles.

A typical schedule for the circuit of Figure 7.1(a) is shown in Figure 7.3(a). Each horizontal bar alongside the name of a kernel represents a kernel event, and the number adjacent to it is the duration of the event. Each range alongside the name of a chain represents a chain event, and the number in parentheses adjacent to it is the cycle at which the chain is operated during this event. In this example there are two kernel events, (0, 850, A, 16) and (850, 510, C, 16), and two chain events, (0, 1360, chain1, 16) and (850, 510, chain2, 16). This is not a complete schedule for the circuit since there is no test for B.
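A schedule as defined above can be represented as two lists of event tuples. The sketch below uses hypothetical helper names. It computes |S| and checks three conditions a valid schedule must meet: each kernel's chain cycle is at least CC(K), incompatible kernels do not overlap in time, and every chain in Ch(K) runs at the kernel's cycle for the whole kernel event. It assumes that events for the same chain never overlap one another.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set

class KernelEvent(NamedTuple):
    ts: int       # start clock cycle
    td: int       # duration in clock cycles
    kernel: str
    cc: int       # chain cycle of the kernel's chains during the event

class ChainEvent(NamedTuple):
    ts: int
    td: int
    chain: str
    cc: int

def overlap(a, b) -> int:
    """Number of clock cycles shared by two events."""
    return max(0, min(a.ts + a.td, b.ts + b.td) - max(a.ts, b.ts))

def schedule_length(ke: List[KernelEvent]) -> int:
    """|S| = max over kernel events of (ts + td)."""
    return max(e.ts + e.td for e in ke)

def is_consistent(ke: List[KernelEvent], ce: List[ChainEvent], CC: Dict[str, int],
                  Ch: Dict[str, Set[str]], incompatible: Set[FrozenSet[str]]) -> bool:
    for i, a in enumerate(ke):
        if a.cc < CC[a.kernel]:
            return False                    # chains run faster than CC(K) allows
        if any(frozenset((a.kernel, b.kernel)) in incompatible and overlap(a, b)
               for b in ke[i + 1:]):
            return False                    # incompatible kernels tested together
        for c in Ch[a.kernel]:
            cover = [e for e in ce if e.chain == c and overlap(a, e)]
            if any(e.cc != a.cc for e in cover) or sum(overlap(a, e) for e in cover) < a.td:
                return False                # chain c not at cycle a.cc throughout
    return True
```

Take the partial schedule of Figure 7.3(a), with C's kernel event as (850, 510, C, 16). Its length is 1360 and the check passes. If C is moved to start at cycle 800, it overlaps the incompatible kernel A and the check fails.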
Note that the schedule of Figure 7.3(a) contains no inconsistencies that could invalidate it, such as two incompatible kernels being scheduled to be tested simultaneously, or the chains in Ch(K) being operated at different cycles or at a cycle lower than CC(K) while the kernel K is being tested.

Thus the scheduling problem is to construct a schedule such that the appropriate number of test patterns is applied to each kernel and the length of the schedule is minimal. We first present a greedy solution approach in which the schedule is grown in an incremental manner, adding tests for one kernel at a time, until all kernels are tested. Clearly the resulting schedule depends on the order in which the kernels are considered. An implicit enumeration algorithm is subsequently presented for exploring all the possible useful orderings of the kernels under consideration.

7.3.2 Incremental Scheduling

Let S' be a partial schedule for testing a subset KS' of the set of kernels KS. S' may or may not be an optimal schedule for KS'. Given a kernel K ∉ KS', we wish to expand the schedule S', without disturbing any of the tests already scheduled, so that the resulting schedule S'' contains the test for K and |S''| is minimal. The problem is illustrated for the circuit of Figure 7.1(a) in Figure 7.3. Figure 7.3(a) shows a partial schedule S' = (KE', CE') for the subset of kernels KS' = {A, C}. KE' contains the kernel events (0, 850, A, 16) and (850, 510, C, 16); CE' contains the chain events (0, 1360, chain1, 16) and (850, 510, chain2, 16). The schedule is to be expanded by including a test for kernel B.

In incrementing the schedule, one of two possible scheduling disciplines can be used. These are analogous to the two scheduling disciplines described in [28] for the BIST scheduling problem. The first discipline, scheduling without interruptions, is analogous to scheduling with run to completion [28].
It is presented here for completeness; however, it should be noted that running tests to completion is a stronger concern in BIST testing, where seeds and signatures need to be saved temporarily, than in scan testing, where a test can be interrupted and continued after any interval without a large effect on the test control regime. Hence the second discipline, scheduling with interruptions, is more relevant in the context of scan testing.

Scheduling Without Interruptions This discipline requires that every kernel be tested over a contiguous time period. For example, consider the inclusion of the test for B in the schedule S' shown in Figure 7.3(a). B is compatible with A, hence the test for B could begin at time 0, in parallel with the test for A, by operating chain2 at a chain cycle of 8. At this chain cycle the test would last for 100 × (8 + 1) = 900 clock cycles. However, note that according to CE' chain2 has already been scheduled to operate at a cycle of 16 starting at time 850. Since the test would have to be interrupted in order for the two chains to be synchronized, this schedule must be rejected. Instead, to avoid the interruption, chain2 must begin operating at a cycle of 16 beginning at time 0. This would ensure that the two chains are always perfectly synchronized. However, since B is now to be tested at a chain cycle of 16 throughout, the test time for B increases to 100 × (16 + 1) = 1700, which in fact dominates the entire schedule as shown in Figure 7.3(b).

Scheduling With Interruptions If instead we permit the test for B to be interrupted, the first 94 test patterns for B could actually be applied at a chain cycle of 8 (using chain2) at the same time that all 50 test patterns for A are applied at a chain cycle of 16 (using chain1). This is shown in Figure 7.3(c). The 94th test for B ends at time 846, while the 50th test for A ends at time 850.
Hence chain2 must be idle for 4 clock cycles before the two chains can synchronize and operate at the same chain cycle of 16. Once they are synchronized, the remaining 6 test patterns for B can be applied over the next 102 clock cycles, in parallel with the complete test for C. The resulting schedule S'' is shown in Figure 7.3(c). Clearly, by allowing the interruption in the schedule, the test time has been reduced.

The details of the analysis described above are listed in the procedure firstfit below, which adds a test for a single kernel K to an existing partial schedule S', resulting in an augmented schedule S''. It is capable of scheduling both with and without interruptions. The procedure essentially searches for an interval in S' during which the test set for K (or some portion of it, if interruptions are allowed) could be applied without clashing with existing tests. A "clash" refers either to another kernel that is incompatible with K and hence prevents K from being tested at the same time, or to a set of chains that are already being operated at certain cycles which do not allow K to be tested at the same time.

Figure 7.3: Illustration of incremental scheduling: (a) partial schedule S', (b) incremented schedule S'' without interruptions, (c) incremented schedule S'' with interruption.

procedure firstfit (Kernel K, partial schedule S' = (KE', CE')): Returns schedule S'' = (KE'', CE'').
1.
   TK ← T(K) (remaining number of patterns for K); KE'' ← KE'; CE'' ← CE' (initial schedule); t1 ← 0 (tentative start time for test of K).
2. If TK = 0 return S'' = (KE'', CE''); else continue.
3. (Check for kernel clashes at t1.) If ∃(ts', td', K', cc') ∈ KE' such that (i) K and K' are incompatible, and (ii) ts' ≤ t1 < ts' + td', then t1 ← ts' + td'; go back to step 3; else continue.
4. (Check for chain clashes at t1.) Construct RCE, the set of chain events that involve chains associated with K and overlap with the test for K:
   RCE ← {(ts', td', c', cc') ∈ CE' | c' ∈ Ch(K) and ts' ≤ t1 < ts' + td'}.
   If RCE is empty, then cc ← CC(K); go to step 5. Else if there is a positive integer X ≥ CC(K) such that ∀(ts', td', c', cc') ∈ RCE, cc' = X, then cc ← X; go to step 5. Else t1 ← min over (ts', td', c', cc') ∈ RCE of (ts' + td'); go back to step 3.
5. (No clash exists at t1.) Determine the number of patterns that can be applied to K starting at t1 as follows.
   (a) Tentative end-point of the current kernel event: t2 ← t1 + TK × (cc + d + 1), where d is the depth of the original kernel resulting from the partial scan analysis.
   (b) (Check all kernel events.) ∀(ts', td', K', cc') ∈ KE':
      i. If [K and K' are incompatible] and t1 < ts' < t2 then t2 ← ts'.
      ii. If [interruptions are allowed] and [K and K' are dependent] and ts' ≤ t1 < ts' + td' < t2 then (it might be possible to speed up the chain cycle after the K' test is completed at time ts' + td') t2 ← ts' + td'.
   (c) (Check all chain events.) ∀(ts', td', c', cc') ∈ CE', if t1 < ts' < t2 and c' ∈ Ch(K) and cc' ≠ cc then t2 ← ts'.
   (d) t1 and t2 are now the end-points of the current interval in which tests for K can be applied. Determine how many patterns can be applied: TL ← ⌊(t2 − t1) / (cc + d + 1)⌋.
   (e) If [TL = 0] or [no interruptions are allowed and TL < TK] then advance the tentative start time t1 (i.e., move the tentative start time for the test forward); go to step 3.
Compute actual duration of kernel event: td < — T L x (cc + d + 1 ). j Add kernel and chain events for K to S": I<E" +- K E " U {(f 1, td, K, cc)}- | VcG Ch(I<), CE" <- C E "U {(tl,td,c,cc)}. 7. Number of test patterns remaining: T K <- T K - TL. | t l < — tl + td (tentative start tim e for rest of test). Go to step 2 . □ At the end of Step 4 a chain cycle cc has been decided for the kernel K ; depending on what chain events, if any, are in progress at the start tim e tl. If there are none, cc simply defaults to CC (K ). In Step 5, this chain cycle value is taken as 1 fixed; thus if any chain required for testing K has already been scheduled to run a t 1 I a different chain cycle, say X , during a later period, the test for K cannot overlap j with this period. However, a variation on the procedure listed above could allow the I chain cycle selected for K to be increased to X so as to allow it to overlap with the 1 157 subsequent chain event witli cycle X . Note that this would only be possible when no chain event was in progress at the start time tl and hence the default chain cycle of C C (K ) was selected in Step 4. The schedule example of Figure 7.3(b) is actually based on this variation. Given a fixed ordering of kernels, the procedure firstfit above can be used to j construct a schedule in incremental steps based on the ordering. This is done by the { procedure scheduleO rder listed below. The issue of determining a good ordering ' t is dealt with in the next section. J proced ure scheduleO rder (Sequence of kernels, K SE Q ): Returns complete schedule S = ( K E ,C E ). ' ' \ 1 . ! 2. \/K € K S E Q , in order: i 5 firstfit (If, 5). j 3. R etu rn S. □ Thus with both scheduling disciplines (i.e., with or without interruptions), j given an ordering of the kernels, a schedule can be constructed in an incremental greedy fashion by scheduling one kernel at a time. Clearly the order of considering j the kernels affects the overall test time. 
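The two procedures above can be sketched in Python to check the schedule lengths quoted in this section. This is a minimal sketch under stated assumptions, not the author's implementation: events are plain tuples (ts, td, name, cc) as in the text, and `Kernel`, `Schedule`, `pair` and `schedule_order` are hypothetical helpers. Where step 5(e) only says to move the start time forward, the sketch jumps t1 to t2 (an assumption). The model of Figure 7.1 (A on chain1, B on chain2, C on both chains, A and C incompatible, B and C dependent, all depths 0) is inferred from the discussion of Figures 7.3 to 7.5 rather than stated explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    name: str
    patterns: int       # T(K): number of test patterns
    chains: frozenset   # Ch(K): scan chains needed to test K
    cycle: int          # CC(K): default chain cycle
    depth: int = 0      # d: kernel depth


@dataclass(frozen=True)
class Schedule:
    kernel_events: tuple = ()   # (ts, td, kernel name, cc)
    chain_events: tuple = ()    # (ts, td, chain, cc)

    def length(self):
        return max((ts + td for ts, td, _, _ in self.kernel_events), default=0)


def pair(a, b):
    return frozenset((a, b))


def firstfit(k, s, incompatible, dependent, interruptions=True):
    ke, ce = list(s.kernel_events), list(s.chain_events)
    tk, t1 = k.patterns, 0                                   # step 1
    while tk > 0:                                            # step 2
        # Step 3: a running incompatible kernel pushes t1 to its end.
        clash = [ts + td for ts, td, kn, _ in s.kernel_events
                 if pair(k.name, kn) in incompatible and ts <= t1 < ts + td]
        if clash:
            t1 = clash[0]
            continue
        # Step 4: chain events in progress at t1 on Ch(K) fix the chain cycle.
        rce = [(ts, td, cc) for ts, td, c, cc in s.chain_events
               if c in k.chains and ts <= t1 < ts + td]
        cycles = {cc for _, _, cc in rce}
        if not rce:
            cc = k.cycle
        elif len(cycles) == 1 and min(cycles) >= k.cycle:
            cc = min(cycles)
        else:
            t1 = min(ts + td for ts, td, _ in rce)
            continue
        # Step 5: shrink [t1, t2) until nothing clashes inside it.
        per = cc + k.depth + 1                               # cycles per pattern
        t2 = t1 + tk * per
        for ts, td, kn, _ in s.kernel_events:
            if pair(k.name, kn) in incompatible and t1 < ts < t2:
                t2 = ts
            elif (interruptions and pair(k.name, kn) in dependent
                  and ts <= t1 < ts + td < t2):
                t2 = ts + td                                 # cycle may speed up later
        for ts, td, c, c_cc in s.chain_events:
            if t1 < ts < t2 and c in k.chains and c_cc != cc:
                t2 = ts
        tl = (t2 - t1) // per
        if tl == 0 or (not interruptions and tl < tk):
            t1 = t2                                          # assumed form of step 5(e)
            continue
        # Steps 6-7: record kernel and chain events, then continue the test.
        td = tl * per
        ke.append((t1, td, k.name, cc))
        ce.extend((t1, td, c, cc) for c in sorted(k.chains))
        tk, t1 = tk - tl, t1 + td
    return Schedule(tuple(ke), tuple(ce))


def schedule_order(kseq, incompatible, dependent, interruptions=True):
    s = Schedule()                                           # S <- (empty, empty)
    for k in kseq:
        s = firstfit(k, s, incompatible, dependent, interruptions)
    return s


# Assumed model of Figure 7.1 (inferred from the text, not given explicitly).
A = Kernel("A", 50, frozenset({"chain1"}), 16)
B = Kernel("B", 100, frozenset({"chain2"}), 8)
C = Kernel("C", 30, frozenset({"chain1", "chain2"}), 16)
INCOMPATIBLE = {pair("A", "C")}
DEPENDENT = {pair("B", "C")}

for order in ([A, B, C], [A, C, B], [C, B, A]):
    print([k.name for k in order], schedule_order(order, INCOMPATIBLE, DEPENDENT).length())
# ['A', 'B', 'C'] 1410
# ['A', 'C', 'B'] 1360
# ['C', 'B', 'A'] 1360
```

With these assumptions the sketch reproduces the lengths in the text: 1410 for (A, B, C) and 1360 for (A, C, B), where the test for B idles for 4 cycles as in Figure 7.3(c). The same ordering without interruptions yields 2550, because the base procedure cannot raise B's chain cycle to 16 at time 0 (that is the variation behind Figure 7.3(b)).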
For example, in Figure 7.3(b), the fact that C was scheduled before B made it impossible to fit in the test for B before the test for C without a change in the operating cycle of chain2. If instead B had been scheduled first, the test for B could initially have been scheduled to run uninterrupted until time 900, with chain2 operating at a cycle of 8; the test for C could then have been applied over the next 510 clock cycles, resulting in a shorter schedule with length 1410. Since the schedule is clearly sensitive to the ordering of the kernels, we present below an algorithm that implicitly enumerates all possible orderings while searching for the one that leads to the minimal-length schedule.

7.3.3 Optimal Scheduling

In this section an algorithm schedule for generating a schedule for a set of kernels KS is presented. The algorithm implicitly enumerates all possible orderings of the kernels in KS, based on some default ordering KSEQ, and determines the ordering that results in the shortest schedule. Note that it is the ordering that is actually optimal; the associated schedule is optimal only under the constraint that the scheduling process is carried out incrementally, one kernel at a time.

Before listing the algorithm schedule, we will illustrate the scheduling process using the example of Figure 7.1(a). For this circuit let the default ordering KSEQ be (A, B, C). Assume also that the scheduling discipline for this circuit allows interruptions in tests for kernels.

The first action is to generate a schedule for the kernels in the default order. This can be seen in the search space diagram of Figure 7.4. Starting at the root node, the leftmost path traces the incremental evolution of the schedule as the kernels are considered in the order A, B, C. At each node the sequence of kernels scheduled so far is listed along with the total test time for the partial schedule for these kernels.
The schedule generated for the complete sequence (A, B, C) is shown in Figure 7.5(a), and the total test time at this state is 1410 clock cycles. This schedule will serve as the current "best schedule" until a better schedule (if any) is found.

The algorithm now backtracks to generate the new states (A, C) and (A, C, B); the schedules generated at these states are shown in Figure 7.3(a) and (c), respectively. The latter schedule contains an interruption in the test for B; however, the resulting test time is 1360, hence it replaces the current best schedule.

The generation of new states, or branching, continues in the depth-first search order of the state space shown in Figure 7.4. At each state the current schedule is generated and its length is compared with that of the current best schedule. The schedule at the leaf state (B, A, C) is identical to the one for (A, B, C) shown in Figure 7.5(a), with length 1410. At the next visited state (B, C) the resulting schedule has length 1410; this is greater than the current best schedule length 1360. Hence this state is removed from further consideration, which is called pruning; in other words, no further branching is carried out from this state. Instead, the algorithm backtracks to the preceding states in order to continue branching. Pruning also takes place at the state (C, A), at which the length of the partial schedule is 1360; clearly any states generated by branching from this state must have a length at least equal to that of the current best schedule.

Figure 7.4: Implicit enumeration search space for generating a schedule.

The schedule generated for the final state (C, B, A) is shown in Figure 7.5(b). The test time is 1360, which is equal to the length of the current best schedule.
However, the new schedule has the advantage that there is no interruption in the test for any kernel, hence it may be preferred as the best schedule. Thus the result of the scheduling algorithm is the schedule of Figure 7.5(b), with test time 1360 clock cycles.

The details of the scheduling algorithm are presented in the procedure schedule listed below. The procedure maintains a series of states ST0, ..., STi, where ST0 is the root state, STi is the current state, and the others represent the intermediate states in the path from ST0 to STi in the search space. Each state STj consists of a tuple (KSEQj, SCHj, NKj), where KSEQj is the sequence of kernels currently being scheduled, SCHj is the corresponding partial schedule, and NKj is a sequence of kernels whose tests need to be added to the current schedule by considering them in some order so as to make the schedule complete. Each element of NKj is used for branching out to a new subtree of states. As the search proceeds, once an element has been used to create a new branch, it is removed from NKj.

Figure 7.5: Schedules generated by the search algorithm. (a) Schedule for (A, B, C) as well as (B, A, C); (b) schedule for (C, B, A).

procedure schedule (Sequence of kernels KSEQ): Returns SCH, schedule for the kernels in KSEQ.
1. (Generate initial root state ST0)
   KSEQ0 ← (); SCH0 ← (∅, ∅); NK0 ← KSEQ.
   Initial value of state counter: i ← 0.
   Initial "best" test time: TT ← ∞.
2. (Check if "best schedule" needs to be updated)
   If i = |KSEQ|, i.e., all kernels are tested in the current schedule, then if (|SCHi| < TT) or (|SCHi| = TT and SCHi has fewer interruptions than SCH), then SCH ← SCHi and TT ← |SCHi|.
3. (Backtrack if necessary)
   If NKi = (), i.e., no more branching is possible from this state, then determine the highest value of j, if one exists, 0 ≤ j < i, such that NKj ≠ (). If there is no such j (search is completed), return SCH; else (make STj the current state) i ← j.
4. (Branch out from state STi)
   K1 ← first element of NKi; remove K1 from NKi.
   (Generate STi+1)
   KSEQi+1 ← KSEQi with K1 appended;
   NKi+1 ← sequence of kernels that are in KSEQ but not in KSEQi+1, in order of occurrence in KSEQ.
   (Generate new schedule by adding the new kernel to the old schedule)
   SCHi+1 ← firstfit(K1, SCHi).
   (Prune if possible)
   If |SCHi+1| ≥ TT then NKi+1 ← ().
5. (Move to the newly created state) i ← i + 1; go to step 2. □

The search tree generated for the circuit of Figure 7.1(a) based on a scheduling discipline that allows interruptions is shown in Figure 7.4. At each node, the procedure schedule simply invokes firstfit to build a new schedule in an incremental manner. Thus the scheduling discipline in schedule is the same as that enforced in firstfit.

7.3.4 Discussion

For the circuit of Figure 7.1, if the three kernels were tested one after another without overlapping, the overall test time would be the sum of the independent test times for A (50 × 17 = 850), B (100 × 9 = 900), and C (30 × 17 = 510), which is 2260 clock cycles. One of the optimal schedules generated by the procedure schedule is shown in Figure 7.5(b); the total test time is 1360 clock cycles. This result is influenced by certain characteristics of the circuit design which are discussed below.
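The implicit enumeration performed by procedure schedule (Section 7.3.3) can be sketched as follows. In this hypothetical Python version, `add_kernel` stands in for firstfit, `length` returns |SCH|, and `interruptions` counts interrupted kernel tests for the tie-break in step 2; the states on the current path are kept explicitly, as in steps 1 to 5. To keep the example self-contained, the partial-schedule lengths are taken from the search tree of Figure 7.4 (single-kernel lengths from the independent test times above) instead of being recomputed.

```python
import math


def schedule(kseq, empty, add_kernel, length, interruptions):
    kseq = tuple(kseq)
    states = [[(), empty, list(kseq)]]        # step 1: ST_0 = (KSEQ_0, SCH_0, NK_0)
    i, best, tt = 0, None, math.inf
    while True:
        seq_i, sch_i, nk_i = states[i]
        # Step 2: a complete schedule may replace the current best one.
        if len(seq_i) == len(kseq):
            n = length(sch_i)
            if n < tt or (n == tt and interruptions(sch_i) < interruptions(best)):
                best, tt = sch_i, n
        # Step 3: backtrack to the deepest state on the path that can still branch.
        if not nk_i:
            open_states = [j for j in range(i) if states[j][2]]
            if not open_states:
                return best
            i = open_states[-1]
            del states[i + 1:]
            seq_i, sch_i, nk_i = states[i]
        # Step 4: branch on the first remaining kernel of NK_i.
        k1 = nk_i.pop(0)
        seq_next = seq_i + (k1,)
        nk_next = [k for k in kseq if k not in seq_next]
        sch_next = add_kernel(k1, sch_i)
        if length(sch_next) >= tt:            # prune: no extension can be shorter
            nk_next = []
        states.append([seq_next, sch_next, nk_next])
        i += 1                                # step 5


# Lengths read from Figure 7.4; states the search never generates are absent.
LENGTHS = {
    ("A",): 850, ("B",): 900, ("C",): 510,
    ("A", "B"): 900, ("A", "C"): 1360, ("B", "A"): 900,
    ("B", "C"): 1410, ("C", "A"): 1360, ("C", "B"): 1140,
    ("A", "B", "C"): 1410, ("A", "C", "B"): 1360,
    ("B", "A", "C"): 1410, ("C", "B", "A"): 1360,
}
INTERRUPTED = {("A", "C", "B"): 1}            # B is interrupted in Figure 7.3(c)

best = schedule("ABC", (), lambda k, seq: seq + (k,), LENGTHS.__getitem__,
                lambda seq: INTERRUPTED.get(seq, 0))
print(best)  # ('C', 'B', 'A')
```

States that the text never generates, such as (B, C, A) and (C, A, B), are missing from the table, so a lookup failure would reveal a pruning error. The search returns (C, B, A), the uninterrupted 1360-cycle schedule of Figure 7.5(b).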
The configuration of the scan chains in this example ensures that kernels A and B are independent, i.e., they can be tested "out of sync" with each other. This feature allows B to be tested relatively quickly by using the shorter scan chain chain2, without having to be in sync with the longer scan chain chain1 used for testing A and C. A different configuration of scan chains, say with half the width of R1 moved out of chain1 into chain2 so as to equalize the chain lengths, might degrade the solution. In this situation the test for B would dominate the test schedule, and the chains would both have length 20; the resulting test time would be 100 × (20 + 1) = 2100 clock cycles, ignoring the flushing time for the scan chain at the end of the test.

Another feature that influences the test schedule is the manner in which the logic under test in the circuit is clustered into kernels. Different ways of forming the subkernels by partitioning would lead to different schedules. Consider for example the unpartitioned kernel K consisting of A, B, C along with M, and assume that K is to be treated as a single kernel for the purposes of test generation and test application. K would involve a somewhat higher effort in test generation than the subdivided kernels. The resulting test set for K would probably have at least 100 patterns. With two chains of equal length, the test time would be at least 100 × (20 + 1) = 2100 clock cycles. Thus in this example subdividing the kernel is clearly beneficial with respect to both test generation and test application. In other cases, however, subdividing the kernel may lead to a higher test time, implying a tradeoff between the costs of test generation and test application. In general it is difficult to predict a priori, during kernel formation, the influence of different kernel partitioning strategies on these two measures.
It is hoped that the ideas presented in this scheduling study may be useful in future research dealing with the development of estimators to guide the partitioning process itself.

7.4 Summary

Chapter 6 showed how the maximal kernel can be partitioned into subkernels for the purpose of test. The subdivision of the kernel gives rise to the problem of scheduling the tests for the various kernels so as to minimize the total test time. In this chapter we have assumed that the assignment of scan registers to scan chains and the ordering of registers in the scan chains have already been decided. Based on this assumption, the various relationships among kernels with respect to test scheduling have been studied, and an implicit enumeration algorithm for obtaining an efficient schedule has been presented. It is clear that there is a strong interrelationship between the problem of constructing the single/multiple scan chain(s) and the test scheduling problem studied above. The scan path chaining problem is tackled in Chapter 8.

Chapter 8 Scan Path Chaining

"When no exact solution to a difficult problem can be found, look for either an approximate solution to the complete problem, or an exact solution to a simplified form of the problem."
(Source unknown)

8.1 The Chaining Problem

The scan design approach can greatly reduce the cost of test pattern generation for general sequential circuits. However, it constrains test data (both input patterns and output results) to be shifted in and out of the circuit in a serial manner, typically through a single scan path. Usually a large proportion of the test time is taken up in this shifting process, while only a small proportion is used for actually propagating test data through the circuitry under test.
As circuit sizes increase, the amount of test data passing through the scan-in and scan-out pins increases correspondingly, leading to test application times that may be unacceptably high.

The high test time caused by the need for serial shifting of test data can be reduced in two ways. The first is to distribute the scan registers among a set of multiple scan chains that operate in parallel with their own scan-in and scan-out pins. Note that during the process of scanning test data, all pins that do not serve as scan-in or scan-out pins are idle. Hence by using multiplexers, some or all of the system I/O pins can be made to serve as scan-in or scan-out pins while test data is being scanned. Additional unused pins available on a chip package can also be used for this purpose. In this way the rate of test data entering/leaving the circuit can be maximized. The second approach to reducing test time is to order the registers in a scan chain such that registers that need to be accessed more frequently are closer to the scan-in/scan-out pins. This reduces the average time for applying tests to the circuit.

In general the scan path chaining problem can be stated as follows: given a circuit, the test length (number of test vectors) for each kernel, the set of scan registers (each having one or more FFs), and the number of scan chains that can be used, (1) determine an assignment of scan registers to chains, and (2) determine the ordering of the registers in each chain, such that the test time for the resulting design is minimized, and constraints on scan path routing area and/or test control complexity (if any) are satisfied. In this chapter some studies of the scan path chaining problem are carried out with a view to understanding the nature of the problem and its interaction with the test scheduling problem studied in Chapter 7.
Recognizing the difficulty of the general chaining problem, in this chapter we examine two special restricted cases for which accurate solutions can be found more easily. Both involve designs in which all kernels are compatible with each other, which we will call fully compatible designs. In Section 8.2 we describe three canonical test application schemes for such designs. In the following two sections we consider two cases of the chaining problem for fully compatible designs; Section 8.3 deals with the single chain problem, and Section 8.4 deals with a specified number of multiple chains for a restricted class of circuits, namely those with two kernels. The latter study brings out an important fact, namely that an optimal configuration of multiple chains may actually require the chains to have unequal lengths. In both problems we assume that the objective is to minimize test time without regard to routing area or control complexity.

This chapter essentially serves to open up a Pandora's box of scan path chaining problems. By providing a glimpse of a couple of the problems, and showing how they relate to the test scheduling problem, it will hopefully serve as a precursor to a much more comprehensive analysis of this difficult problem.

8.2 Test Application in Fully Compatible Designs

In a fully compatible design, any arbitrary subset of kernels can be tested simultaneously without conflicts. An example of a fully compatible design is a full scan design in which every cloud of connected combinational logic (as defined in Chapter 3) forms a separate kernel. Another example is a partial scan design in which the only form of partitioning used to decompose the kernel is size-based partitioning. Note that in a partial scan design with no partitioning, there is only one kernel and the design is obviously fully compatible.
As we have already mentioned, the main difficulty in the general chaining problem is its close relationship with the general test scheduling problem, which was dealt with in Chapter 7. However, in the case of fully compatible designs the optimal scheduling of tests is much more straightforward than in the general scheduling problem. This makes the chaining problem for fully compatible designs somewhat more manageable, and is the motivation for the case studies in this chapter.

We begin by examining the simplified test scheduling problem for fully compatible designs, given scan chains that have already been constructed. There are three ways in which tests can be applied to the kernels in such a circuit. These are called the combined, separate and overlapped schemes, respectively. These schemes vary in control complexity and in relative test application time. Each results in a distinct schedule for testing the various kernels.

In discussing the schemes we will use the following notation, mostly borrowed from Chapter 7. Let KS = {K1, ..., Kn} be the set of kernels to be tested. Ch is the set of chains; Ch(Ki) is the set of chains involved in testing Ki. Given a set of kernels KC ⊆ KS, we define the drive cycle DC(KC), the receive cycle RC(KC), and the chain cycle CC(KC) of a set of kernels exactly as in Chapter 7.

Figure 8.1: Circuit example for illustrating different test schemes.

8.2.1 Combined Test

The combined test scheme is the traditional scheme used in applying tests to scan-based circuits, in which tests are applied to all the kernels simultaneously. Note that no scan register can drive more than one kernel and no scan register can receive from more than one kernel; otherwise there would be incompatibilities among the kernels. Thus it is possible to merge the test sets for the kernels into a single test pattern set of length $T = \max_{K_i \in KS} T(K_i)$.
To apply each test, the scan chain is clocked in the SHIFT mode L times, where L is the maximum length (i.e., maximum number of FFs) among the scan chains in Ch. Thus the total test time in clock cycles is

$$TT_{COM} = T \times (L + d + 1) + L,$$

where d is the maximal kernel depth as defined in Chapter 7, and the last L clock cycles serve to flush out the scan chain(s) after the last pattern has been applied.

Consider the circuit in Figure 8.1, in which A and B are the two kernels, both combinational. Both scan registers R1 and R2 are 8 bits wide and are connected into a single scan chain. A has 20 test vectors and B has 100. Using the combined test scheme, in effect A and B are treated as a single kernel with 100 test patterns. Each pattern is applied by shifting it into the entire scan chain over 16 clock cycles and then loading the test result into the scan chain in the next clock cycle. The total test time is 100 × 17 + 16 = 1716 clock cycles.

In the combined test scheme, the control and clocking schemes are relatively simple since the same sequence of operations on the scan path (i.e., shift L times, hold d times, load once) is repeated throughout the entire test. Another feature is that the test time is independent of the ordering of registers in the scan chains. Thus in the case of a single scan chain, the routing of the scan path can be carried out early in the physical design stage even if estimates of the test lengths of the kernels are not yet available. However, the practice of flushing all chains to apply each test vector can be wasteful, since not all scan registers may need to be accessed every time. The following two schemes address this issue at the expense of more complicated control and clocking.

8.2.2 Separate Test

In the separate test scheme, every kernel Ki is tested in its own private test session TSi during which no other kernels are tested.
The motivation for this scheme is that if the test lengths of the kernels differ widely, as for example in the circuit of Figure 8.1, then it is wasteful to combine the tests for the kernels and to flush the entire scan chain for each test. Instead, the various kernels can be tested independently, and in each session TSi a minimal amount of shifting, i.e., CC(Ki) clock cycles per test, can be carried out to apply tests to the current kernel Ki. Thus the test time per session is

$$TT_{SEP}(K_i) = T(K_i) \times (CC(K_i) + d + 1),$$

and the overall test time is

$$TT_{SEP} = \sum_{K_i \in KS} TT_{SEP}(K_i) + \max_{K_i \in KS} CC(K_i).$$

The second term in the expression above represents the extra clock cycles used up in flushing out portions of the scan chains between sessions. If the kernels are tested in order of increasing test length, all this flushing takes place at the end of the last session. It is clear from the discussion above that the separate scheme requires more complex control than the combined scheme, since there are multiple test sessions with different chain cycles.

In the circuit of Figure 8.1, A can be tested in the first test session and B in the second. The chain cycle CC(A) is 16, hence the test time for A is 20 × 17 = 340. In the second session, due to the ordering of the scan chain shown, the chain cycle CC(B) is 8, hence the test time is 100 × 9 = 900. The overall test time is 340 + 900 + 16 = 1256. In this example TT_SEP is lower than TT_COM. Note that the ordering of the scan chain in this case is favorable to separate testing. In some cases, however, the test time for separate testing may be higher than that for combined testing, even with the most favorable scan chain ordering.

8.2.3 Overlapped Test

In the overlapped test scheme, the kernels in KS are ordered according to nondecreasing test length. Let KSEQ = (K1, K2, ..., Kn) be the sequence of kernels in KS ordered in this manner, i.e., T(K1) ≤ T(K2) ≤ ... ≤ T(Kn).

Consider the circuit of Figure 8.1, for which KSEQ = (A, B). The test is carried out in two sessions, TS1 and TS2. In TS1, tests are applied to both A and B simultaneously until the 20 test vectors for A are exhausted. During this session test patterns must be scanned into/out of both R1 and R2, and the scan chain is operated at a cycle CC(A, B) = 16. Hence the test time in TS1 is 20 × 17 = 340. In TS2, the remaining 80 test patterns for B are applied in a separate manner. Now the chain can be operated at a cycle of CC(B) = 8, and the session test time is 80 × 9 = 720. The overall test time is 340 + 720 + 16 = 1076, including the flushing time for the scan chain.

In general, given the sequence of n kernels KSEQ, the following procedure can be used to apply the tests using the overlapped scheme:

(Session TS1) Apply T(K1) tests to all kernels K1, ..., Kn.
For i = 2 to n do:
{ (Session TSi) Apply T(Ki) − T(Ki−1) test patterns to kernels Ki, ..., Kn. }

Let TT_OVL(TSi) denote the test time for test session TSi. Thus for the first session,

$$TT_{OVL}(TS_1) = T(K_1) \times (CC(K_1, \ldots, K_n) + d + 1).$$

For all other sessions,

$$TT_{OVL}(TS_i) = (T(K_i) - T(K_{i-1})) \times (CC(K_i, \ldots, K_n) + d + 1).$$

The overall test time is

$$TT_{OVL} = \sum_{i=1}^{n} TT_{OVL}(TS_i) + CC(K_1, \ldots, K_n),$$

where the second term represents the total flushing time.

8.2.4 Comparison

The overlapped test scheme is the most efficient of the three schemes, i.e., for all circuits, TT_OVL ≤ TT_COM and TT_OVL ≤ TT_SEP. This is proved formally in the two theorems below.

Theorem 5  TT_OVL ≤ TT_COM.

Proof

$$\begin{aligned} TT_{OVL} &= \sum_{i=1}^{n} TT_{OVL}(TS_i) + CC(K_1, \ldots, K_n) \\ &= T(K_1)\,(CC(K_1, \ldots, K_n) + d + 1) + \sum_{i=2}^{n} (T(K_i) - T(K_{i-1}))\,(CC(K_i, \ldots, K_n) + d + 1) + CC(K_1, \ldots, K_n). \end{aligned}$$

For all i, CC(Ki, ..., Kn) + d + 1 ≤ CC(K1, ..., Kn) + d + 1; hence

$$\begin{aligned} TT_{OVL} &\le (CC(K_1, \ldots, K_n) + d + 1) \Big[ T(K_1) + \sum_{i=2}^{n} (T(K_i) - T(K_{i-1})) \Big] + CC(K_1, \ldots, K_n) \\ &= (CC(K_1, \ldots, K_n) + d + 1)\, T(K_n) + CC(K_1, \ldots, K_n). \end{aligned}$$

Since T(Kn) = T, and the chain cycle of any set of kernels cannot exceed the maximum chain length, i.e., CC(K1, ..., Kn) ≤ L, the right-hand side is at most T × (L + d + 1) + L. Thus TT_OVL ≤ TT_COM. □

Theorem 6  TT_OVL ≤ TT_SEP.

Proof  Expanding TT_OVL as in the proof of Theorem 5,

$$TT_{OVL} = T(K_1)\,(CC(K_1, \ldots, K_n) + d + 1) + \sum_{i=2}^{n} (T(K_i) - T(K_{i-1}))\,(CC(K_i, \ldots, K_n) + d + 1) + CC(K_1, \ldots, K_n).$$

Using the definition of CC(Ki, ..., Kn) from Section 7.2.3, we have

$$CC(K_i, \ldots, K_n) + d + 1 = \max_{i \le j \le n} [CC(K_j)] + d + 1 = \max_{i \le j \le n} [CC(K_j) + d + 1] \le \sum_{j=i}^{n} [CC(K_j) + d + 1].$$

Hence

$$TT_{OVL} \le T(K_1) \sum_{j=1}^{n} [CC(K_j) + d + 1] + \sum_{i=2}^{n} (T(K_i) - T(K_{i-1})) \sum_{j=i}^{n} [CC(K_j) + d + 1] + CC(K_1, \ldots, K_n).$$

On the right-hand side, collect the terms containing [CC(Kj) + d + 1] for each j; the coefficient of this term is T(K1) + (T(K2) − T(K1)) + ... + (T(Kj) − T(Kj−1)) = T(Kj). This results in

$$\begin{aligned} TT_{OVL} &\le T(K_1)[CC(K_1) + d + 1] + T(K_2)[CC(K_2) + d + 1] + \cdots + T(K_n)[CC(K_n) + d + 1] + CC(K_1, \ldots, K_n) \\ &= \sum_{i=1}^{n} T(K_i)\,(CC(K_i) + d + 1) + CC(K_1, \ldots, K_n) \\ &= \sum_{i=1}^{n} TT_{SEP}(K_i) + CC(K_1, \ldots, K_n). \end{aligned}$$

By the same definition, CC(K1, ..., Kn) is the maximum of CC(Ki) over all kernels, so the last expression equals TT_SEP. Thus TT_OVL ≤ TT_SEP. □

Thus we have shown that the overlapped test scheme never leads to a higher test time than the other two schemes, in any circuit and irrespective of the scan chain organization. However, the combined scheme has two advantages over the overlapped and separate schemes: (1) test control and clocking are simpler, and (2) the ordering of registers in the scan chain is not important, which implies that the scan path routing can be carried out purely on geometric considerations, without referring to the test lengths of the various kernels.
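The three test-time expressions can be compared numerically with a short sketch. Each kernel is given as a hypothetical (T(Ki), CC(Ki)) pair, and, as in the proof of Theorem 6, the chain cycle of a set of kernels is taken to be the maximum of the individual chain cycles; L is assumed to be at least that maximum. The function names are illustrative only.

```python
def tt_combined(kernels, L, d=0):
    """TT_COM = T x (L + d + 1) + L, where T is the largest test length."""
    T = max(t for t, _ in kernels)
    return T * (L + d + 1) + L


def tt_separate(kernels, d=0):
    """TT_SEP: one private session per kernel, plus a final flush."""
    sessions = sum(t * (cc + d + 1) for t, cc in kernels)
    return sessions + max(cc for _, cc in kernels)


def tt_overlapped(kernels, d=0):
    """TT_OVL: sessions TS_1..TS_n over kernels in nondecreasing test length."""
    ks = sorted(kernels)                        # sort by T(K_i), ties by CC
    total, done = 0, 0
    for i, (t, _) in enumerate(ks):
        cc_rest = max(cc for _, cc in ks[i:])   # CC(K_i, ..., K_n), taken as a max
        total += (t - done) * (cc_rest + d + 1)
        done = t
    return total + max(cc for _, cc in ks)      # flush at CC(K_1, ..., K_n)


# Figure 8.1: A has 20 vectors with CC(A) = 16, B has 100 with CC(B) = 8; L = 16.
fig81 = [(20, 16), (100, 8)]
print(tt_combined(fig81, L=16), tt_separate(fig81), tt_overlapped(fig81))
# 1716 1256 1076
```

For the circuit of Figure 8.1 this reproduces the values TT_COM = 1716, TT_SEP = 1256 and TT_OVL = 1076 derived in the text.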
Note that there is no situation in which the separate scheme is clearly superior; it has approximately the same control complexity as the overlapped scheme, yet it is not as efficient. Although we have described all three test schemes for completeness, in the remaining discussions on the chaining problem for fully compatible circuits our attention will be focused mostly on the combined and overlapped schemes.

Two case studies of the chaining problem for fully compatible circuits are presented in the following sections. In each we consider a constrained version of the problem. The first deals with constructing a single scan chain for a fully compatible design. For this problem we assume the overlapped test scheme, and present a heuristic whose output includes a confidence level indicating the degree of optimality of the solution. The second study deals with constructing multiple scan chains for fully compatible designs. Because the problem is highly complex, we consider only the two-kernel case and develop an optimal solution method.

8.3 Single Chain in Fully Compatible Design

In this section we consider the problem of constructing a single scan chain for a fully compatible circuit such that the test time, based on the overlapped test scheme, is minimal. We begin by defining some terminology.

Let the circuit have N flip-flops (FFs) FS = {F1, F2, ..., FN}. Note that registers must be broken down into individual FFs; this is because an optimal ordering of the scan chain may require the FFs in a scan register to be in different parts of the chain. The scan chain to be constructed can be represented by a sequence of N slots numbered 1, 2, ..., N to which scan FFs are to be assigned. The first slot represents the FF closest to the scan-in pin and the last slot represents the FF closest to the scan-out pin. The chaining problem is to assign one FF to each slot, i.e., to construct a position mapping P : FS → [1, N], where the notation [x, y] represents the set {integers i | x ≤ i ≤ y}.

The idea used in our approach is to determine, for every FF Fj, a range of slots Range(Fj) = [low(Fj), high(Fj)] in which it may be placed in order for the chain ordering to be optimal. We will show that given this information it is simple to determine (based on the well-known perfect matching problem) an optimal solution, i.e., a mapping P that satisfies the range constraints, if such a solution exists.

8.3.1 Flip-Flop Position Ranges

Consider the circuit in Figure 8.2. There are three kernels, A, B, C, with test lengths 20, 30, 100, respectively, and the sequence of kernels in order of nondecreasing test length is KSEQ = (A, B, C). All five registers shown, R1 ... R5, are to be connected in a single scan chain so as to minimize the test time under the overlapped scheme. For simplicity we assume that every register has exactly one FF. Since KSEQ has three kernels, the test for the circuit is organized into three sessions TS1, TS2, TS3, as described in Section 8.2.3.

Figure 8.2: Example of single scan path chaining problem.

Let KSi denote the set of kernels that are tested in session TSi. Thus KS1 = {A, B, C}, KS2 = {B, C}, and KS3 = {C}. In a fully compatible circuit, each scan FF can drive test patterns into at most one kernel and receive test results from at most one kernel. In a given session TSi, depending on which subset of kernels is being tested, each FF may or may not be involved in driving test patterns, and may or may not be involved in receiving test results. Thus we can partition the set of scan FFs into four sets according to the roles they play in test session TSi. The idle set Ii is the set of scan FFs that neither drive nor receive from any kernel in KSi; in other words, they play no role in this session.
The driver set Di (receiver set Ri) is the set of scan FFs that drive (receive from) some kernel in KSi but do not receive from (drive) any kernel in KSi. Finally, the driver-receiver set Ci is the set of scan FFs that drive some kernel in KSi as well as receive from some kernel in KSi. Based on these four sets we define ii = |Ii|, di = |Di|, ri = |Ri|, and ci = |Ci|.

For example, in session TS3 only C is under test; thus D3 = {R2, R4}, R3 = {R5}, and C3 = ∅. Since there are two driver FFs and one receiver FF, every time a test is applied in session TS3, two bits of useful data must be shifted into the scan chain and at least one bit of useful data must be shifted out. Irrespective of the ordering of the scan chain, at least two shift operations are therefore required. We refer to this number as the minimum session cycle, defined as follows.

Definition  In a given test session TSi, the minimum session cycle MSCi is the minimum number of clock cycles, over all possible orderings of the scan chain, that are required to shift in a new test pattern while shifting out the result of the previous test.

In other words, the minimum session cycle MSCi is the lowest possible chain cycle that could be used in session TSi, assuming that the chain could be ordered solely so as to minimize this chain cycle. In our example MSC3 = 2, and the general rule is stated below:

$$\text{If } C_i = \emptyset \text{ then } MSC_i = \max(d_i, r_i). \qquad (8.1)$$

There may be a large number of possible ways to position the non-idle scan FFs in the scan chain so as to achieve the minimum session cycle for this session. One such assignment is P(R2) = 1, P(R4) = 2, and P(R5) = 4; another is obtained by exchanging the positions of R2 and R4. In fact any assignment in which all driving FFs lie in the first two slots and all receiving FFs lie in the last two slots achieves the minimum session cycle.
The positioning of FFs that are idle in this session is immaterial, since they do not influence the test time in this session. In general, for every FF $F_j$, if $F_j$ serves as a driver it is desirable to position it within the first $MSC_i$ slots in the chain, and if it serves as a receiver it is desirable to position it within the last $MSC_i$ slots in the chain. Thus we can compute $Range_i(F_j)$, the range of positions to which $F_j$ may be assigned so as to minimize the test time in session $TS_i$, as follows:

$$Range_i(F_j) = \begin{cases} [1, N] & \text{if } F_j \in I_i \\ [1, MSC_i] & \text{if } F_j \in D_i \\ [N - MSC_i + 1, N] & \text{if } F_j \in R_i \\ [N - MSC_i + 1, MSC_i] & \text{if } F_j \in C_i \end{cases} \qquad (8.2)$$

Note that FFs in the driver-receiver set $C_i$ must lie in the middle portion of the chain; this will be elaborated on shortly. Thus for session $TS_3$ we have $Range_3(R2) = Range_3(R4) = [1, 2]$, $Range_3(R5) = [4, 5]$, and $Range_3(R1) = Range_3(R3) = [1, 5]$. These ranges are illustrated in Figure 8.3(a).

Now consider session $TS_2$. We have $D_2 = \{R2\}$, $R_2 = \emptyset$, and $C_2 = \{R4, R5\}$. Once again we wish to compute the minimum session cycle, i.e., the lowest possible chain cycle among all orderings of the chain. Clearly this can be achieved by placing R2 nearest to the scan-in pin, placing R4 and R5 close to the middle of the chain, and placing the remaining idle FFs arbitrarily. For example, the ordering (R2, -, R4, R5, -), where "-" represents some idle FF, leads to a minimal chain cycle of 4. In this case it is not possible to have a chain cycle lower than 4 with any ordering, hence the minimum session cycle must be 4. Below we show how to formally derive this value.

Figure 8.3: Ranges for placement of scan FFs. (a) For session $TS_3$, (b) for session $TS_2$, (c) for session $TS_1$, (d) combined ranges.

In general, to compute the minimum session cycle $MSC_i$ when $C_i$ is nonempty, we can construct a hypothetical chain that is optimal for the current session, as follows. Ideally the $c_i$ driver-receiver FFs in $C_i$ should be placed in the middle of the scan chain, and the remaining $N - c_i$ FFs (including the FFs in $D_i$ and $R_i$) distributed equally on either side. However, depending on the values of $d_i$ and $r_i$, the hypothetical chain may not have the $C_i$ FFs in the exact middle.

Case 1: If both $r_i$ and $d_i$ are less than or equal to $\lceil (N - c_i)/2 \rceil$, then the $c_i$ FFs in $C_i$ must be placed as a group in the middle of the chain, with all the $D_i$ FFs preceding this group and all the $R_i$ FFs following it. This is illustrated in Figure 8.4(a). Thus the minimum session cycle is $\lceil (N - c_i)/2 \rceil + c_i$.

Case 2: If $d_i > \lceil (N - c_i)/2 \rceil$, then the chain must have the $D_i$ FFs in the first $d_i$ slots, followed by the $C_i$ FFs in the next $c_i$ slots. This is illustrated schematically in Figure 8.4(b). Thus the drive cycle for the current session must be $d_i + c_i$. Note that the remaining slots can always accommodate the $R_i$ FFs, since $d_i + c_i + r_i \le N$. Since the group of driver-receiver FFs $C_i$ is skewed from the middle of the chain towards the scan-out pin, the receive cycle cannot be greater than the drive cycle. Thus the chain cycle for this configuration, which corresponds to the minimum session cycle, is $d_i + c_i$.

Case 3: If $r_i > \lceil (N - c_i)/2 \rceil$, then by an argument analogous to that in Case 2, the minimum session cycle is $r_i + c_i$. This case is illustrated in Figure 8.4(c).

Combining the above cases, we get the following rule for the minimum session cycle of session $TS_i$ when $C_i$ is nonempty:

$$\text{If } C_i \neq \emptyset \text{ then } MSC_i = \max\left(d_i,\; r_i,\; \left\lceil \frac{N - c_i}{2} \right\rceil\right) + c_i. \qquad (8.3)$$

Together, equations 8.1 and 8.3 provide a complete rule for determining the minimum session cycle in all cases.
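Equations 8.1 and 8.3 are easy to check mechanically. The Python sketch below (hypothetical helper names) compares the closed form against exhaustive search over all orderings for every mix of $d_i$ drivers, $r_i$ receivers, $c_i$ driver-receivers and idle FFs with $N \le 6$, assuming slot 1 is nearest the scan-in pin so that a driver in slot $p$ needs $p$ shifts and a receiver needs $N - p + 1$. It is a sanity check of the case analysis above, not a proof.

```python
from itertools import permutations


def msc_formula(n, d, r, c):
    """Minimum session cycle from equations (8.1) and (8.3)."""
    if c == 0:
        return max(d, r)                          # (8.1)
    return max(d, r, (n - c + 1) // 2) + c        # (8.3); (n-c+1)//2 = ceil((n-c)/2)


def msc_brute(n, d, r, c):
    """Exhaustive minimum chain cycle over all orderings of n FFs holding
    d drivers, r receivers, c driver-receivers and n-d-r-c idle FFs."""
    roles = "D" * d + "R" * r + "C" * c + "I" * (n - d - r - c)
    best = n
    for order in set(permutations(roles)):
        drive = max((p for p, x in enumerate(order, 1) if x in "DC"), default=0)
        receive = max((n - p + 1 for p, x in enumerate(order, 1) if x in "RC"),
                      default=0)
        best = min(best, max(drive, receive))
    return best


# Compare the closed form with exhaustive search for every role mix, N <= 6.
for n in range(1, 7):
    for d in range(n + 1):
        for r in range(n + 1 - d):
            for c in range(n + 1 - d - r):
                assert msc_formula(n, d, r, c) == msc_brute(n, d, r, c), (n, d, r, c)

print(msc_formula(5, 2, 1, 0), msc_formula(5, 1, 0, 2), msc_formula(5, 1, 0, 4))  # 2 4 5
```

The three printed values correspond to sessions $TS_3$, $TS_2$ and $TS_1$ of the running example.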
These equations can be used to compute $MSC_i$ for all sessions $TS_i$. Based on these computed values, equation 8.2 can be used to determine the values $Range_i(F_j)$ for all FFs $F_j$.

Figure 8.4: Illustration of computation of minimum session cycle. (a) Case 1, (b) case 2, (c) case 3.

Returning to the current example, we have $MSC_2 = 4$, and the corresponding ranges of the FFs for optimizing session $TS_2$ are shown in Figure 8.3(b). Finally, in the remaining session $TS_1$, we have $D_1 = \{R1\}$, $R_1 = \emptyset$, and $C_1 = \{R2, R3, R4, R5\}$. This results in a minimum session cycle $MSC_1 = 5$. Since this is equal to the length of the chain, the ranges for all FFs are $[1, 5]$ in this session, as shown in Figure 8.3(c).

Thus for each test session, not only have we shown how to determine some ordering of the scan chain that minimizes the test time in that session, but in fact we have characterized all possible orderings that achieve the minimum session time, in the form of the ranges for all FFs shown in Figures 8.3(a,b,c). Given any particular ordering of the FFs, it is easy to check whether or not it satisfies the optimal ranges for a given session. For example, the ordering (R1, R4, R2, R5, R3) satisfies the optimal ranges for $TS_2$ and also for $TS_1$, but not for $TS_3$. Given the ranges for the various sessions, the question arises: is there any ordering that satisfies the optimal ranges for all the sessions?

Figure 8.3(d) shows the result of taking the intersection, for each FF, of the ranges in Figures 8.3(a,b,c). These ranges are computed as follows:

$$\forall F_j \in FS, \quad Range(F_j) = \bigcap_{1 \le i \le |KSEQ|} Range_i(F_j).$$

We will refer to $Range(F_j)$ as the ideal range of $F_j$. Note that in this example all the ideal ranges are nonempty.
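Combining equations 8.1 to 8.3 with the intersection above gives a direct procedure for the ideal ranges. The following Python sketch reproduces the combined ranges of Figure 8.3(d) under a hypothetical register-to-kernel topology inferred from the session sets quoted in the text (Figure 8.2 is not reproduced). `session_ranges` and `ideal_ranges` are illustrative names, each session $TS_i$ is assumed to test $K_i, \ldots, K_n$ with distinct test lengths, and an empty range is represented as a pair with lo > hi.

```python
# Hypothetical topology inferred from the session sets in the text:
# register -> (kernel it drives, kernel it receives from).
TOPOLOGY = {"R1": ("A", None), "R2": ("C", "A"), "R3": ("A", "A"),
            "R4": ("C", "B"), "R5": ("B", "C")}
KSEQ = ["A", "B", "C"]  # nondecreasing test length (20, 30, 100)


def session_ranges(topology, tested):
    """Range_i(F_j) for every FF in one session, per equations (8.1)-(8.3)."""
    n = len(topology)
    role = {}
    for ff, (drv, rcv) in topology.items():
        d, r = drv in tested, rcv in tested
        role[ff] = "C" if d and r else "D" if d else "R" if r else "I"
    counts = {k: list(role.values()).count(k) for k in "DRC"}
    d, r, c = counts["D"], counts["R"], counts["C"]
    msc = max(d, r) if c == 0 else max(d, r, (n - c + 1) // 2) + c
    bounds = {"I": (1, n), "D": (1, msc), "R": (n - msc + 1, n),
              "C": (n - msc + 1, msc)}
    return {ff: bounds[role[ff]] for ff in topology}


def ideal_ranges(topology, kseq):
    """Intersect the per-session ranges; (lo, hi) with lo > hi means empty."""
    ideal = {ff: (1, len(topology)) for ff in topology}
    for i in range(len(kseq)):
        tested = set(kseq[i:])          # session TS_{i+1} tests K_{i+1..n}
        for ff, (lo, hi) in session_ranges(topology, tested).items():
            a, b = ideal[ff]
            ideal[ff] = (max(a, lo), min(b, hi))
    return ideal


print(ideal_ranges(TOPOLOGY, KSEQ))
# {'R1': (1, 5), 'R2': (1, 2), 'R3': (1, 5), 'R4': (2, 2), 'R5': (4, 4)}
```

The chain ordering (R2, R4, R3, R5, R1) discussed later in this section places every FF inside these ranges.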
(It is possible that some ideal ranges may be empty; this situation will be discussed in Section 8.3.3.) From the manner in which the various ranges have been derived, it follows that if an ordering of the chain satisfies the ideal ranges, then that ordering must achieve the lowest possible test time for every session, and consequently the lowest possible test time for the entire test. This result is stated in the following theorem.

Theorem 7: Given a position mapping $P: FS \rightarrow [1, N]$, and given that $Range(F_j) \neq \emptyset$ for all $F_j \in FS$, the ordering of the scan chain corresponding to $P$ is optimal (i.e., leads to minimum test time using the overlapped scheme) if $P(F_j) \in Range(F_j)$ for all $F_j \in FS$. □

We will refer to any ordering of the chain that satisfies all the ideal ranges as an ideal ordering or an ideal solution. The result of Theorem 7 will be used in the next section to help solve the chaining problem.

8.3.2 Single Chain Algorithm

We present in this section a heuristic procedure, called singleChain, that returns an ideal ordering if one exists, and otherwise returns an ordering in which a maximal number of FFs lie in their ideal ranges. If an ideal ordering exists, then according to Theorem 7 the overall test time for the circuit is minimized, i.e., the ordering is optimal. If, however, not all FFs lie in their ideal ranges, then the ordering returned by the procedure may or may not be optimal, i.e., there may exist another ordering, not found by this procedure, that leads to a lower test time using the overlapped test scheme. In the latter case there is no obvious way to determine whether or not the returned solution is optimal. However, the procedure computes a confidence value, which is the fraction of the FFs that lie in their respective ideal ranges. This value indicates the degree of confidence in the optimality of the solution. A confidence value of 1 indicates that the solution is definitely optimal.
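The overall flow of singleChain, whose seven steps are stated formally below, can be previewed as a compact Python sketch. It departs from the text in one stated way: the maximum matching is found with the simple augmenting-path (Kuhn) method, which runs in $O(n \cdot e)$ time rather than the $O(\sqrt{n} \cdot e)$ flow-based bound given with the procedure, but yields a matching of the same maximum size. Ties in the greedy placement of step 6 are broken toward the lower slot, a choice the text leaves open. All function names are hypothetical, and ranges are given as (lo, hi) pairs with lo > hi meaning empty.

```python
def max_bipartite_matching(ranges, n):
    """Maximum matching of FFs to slots 1..n by augmenting paths (Kuhn's
    method). ranges maps FF -> (lo, hi); FF may take slot l iff lo <= l <= hi."""
    slot_owner = {}                                  # slot -> matched FF

    def try_assign(ff, seen):
        lo, hi = ranges[ff]
        for slot in range(max(lo, 1), min(hi, n) + 1):
            if slot in seen:
                continue
            seen.add(slot)
            # Use a free slot, or re-route its current owner elsewhere.
            if slot not in slot_owner or try_assign(slot_owner[slot], seen):
                slot_owner[slot] = ff
                return True
        return False

    for ff in ranges:
        try_assign(ff, set())
    return {ff: slot for slot, ff in slot_owner.items()}


def single_chain(ranges):
    """Sketch of singleChain: returns (FF -> slot mapping, confidence |M|/N)."""
    n = len(ranges)
    pos = max_bipartite_matching(ranges, n)          # steps 1-4
    matched = len(pos)
    if matched == n:                                 # step 5: ideal ordering
        return pos, 1.0
    free = set(range(1, n + 1)) - set(pos.values())
    # Step 6: most constrained FFs first (empty ranges have size 0).
    rest = sorted((ff for ff in ranges if ff not in pos),
                  key=lambda ff: max(0, ranges[ff][1] - ranges[ff][0] + 1))
    for ff in rest:
        lo, hi = ranges[ff]
        target = (lo + hi) / 2 if lo <= hi else (n + 1) / 2
        p = min(free, key=lambda s: (abs(s - target), s))   # nearest free slot
        pos[ff] = p
        free.remove(p)
    return pos, matched / n                          # step 7


fig_8_2 = {"R1": (1, 5), "R2": (1, 2), "R3": (1, 5), "R4": (2, 2), "R5": (4, 4)}
positions, confidence = single_chain(fig_8_2)
print(confidence)  # 1.0
```

With the ideal ranges of the running example a perfect matching exists, so the confidence is 1.0; which of the several ideal orderings is returned depends on the order in which the matching explores the FFs.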
The procedure singleChain maps the problem of finding an ideal ordering to the well-known maximum-size bipartite matching problem [23, Chapter 9]. Given a bipartite graph $(V, U, A)$, where $V$ and $U$ are two sets of nodes and $A \subseteq V \times U$ is a set of arcs, the maximum-size bipartite matching problem is to find a maximum-cardinality set $M \subseteq A$ such that any two arcs $a, b \in M$ have no node in common. This problem can be solved in $O(\sqrt{n} \cdot e)$ time, where $n$ is the number of nodes and $e$ the number of arcs, by mapping it to the maximum network flow problem [23]. If the resulting matching is a perfect matching, i.e., $|M| = |V| = |U|$, the corresponding ordering of the FFs is an ideal solution, and a confidence value of 1.0 is returned. If not, the matching $M$ yields a partial placement of the FFs in the chain. In this case the procedure places the remaining FFs in the vacant slots, each as close as possible to its ideal range. This is carried out greedily, starting with FFs that have empty or small ranges (i.e., more strongly constrained FFs) and proceeding to FFs with larger ranges. The confidence value returned is $|M|/N$, where $|M|$ is the number of FFs lying within their ideal ranges and $N$ is the total number of FFs.

procedure singleChain (number of FFs $N$; set of FFs $FS$; ideal ranges $Range(F_j)$ for all $F_j \in FS$):
Returns $P$, a mapping of FFs to $[1, N]$, and a confidence value $c$, $0 \le c \le 1$.
1. Construct the bipartite graph $B = (V, U, A)$, where $V$, $U$ are two sets of nodes and $A$ is the set of arcs, as follows:
   $V \leftarrow U \leftarrow [1, N]$;
   $A \leftarrow \{(j, l) \in V \times U \mid l \in Range(F_j)\}$.
2. Obtain $M \subseteq A$, the maximum-size bipartite matching of $B$.
3. $SRC \leftarrow \{j \in [1, N] \mid (j, l) \in M\}$; $DST \leftarrow \{l \in [1, N] \mid (j, l) \in M\}$.
4. Determine positions for FFs that are placed within their ideal ranges. For each $j \in SRC$ do:
   (a) Determine $(j, l)$, the arc in $M$ that is connected to node $j$ in $B$.
   (b) $P(F_j) \leftarrow l$.
5. If the solution is ideal, i.e., $|M| = N$, then return $P$ along with $c = 1.0$.
6. Determine positions for FFs that cannot be placed within their ideal ranges:
   $FSEQ \leftarrow$ sequence of FFs $F_j$ with $j \in [1, N] - SRC$, ordered by increasing size of $Range(F_j)$.
   For each $F_j \in FSEQ$ in order do:
   (a) Let $p$ be the integer in $[1, N] - DST$ that is nearest to the middle of $Range(F_j)$ if $Range(F_j) \neq \emptyset$, or nearest to $(N + 1)/2$ otherwise.
   (b) $P(F_j) \leftarrow p$.
   (c) $DST \leftarrow DST \cup \{p\}$.
7. The function $P$ is now defined for all FFs in the domain $FS$, and represents an ordering of the FFs such that a maximum number of FFs lie within their ideal ranges. Set the confidence level $c \leftarrow |M|/N$, and return $P$ along with $c$. □

Figure 8.5: Example of single scan chain ordering.

The actual ordering of FFs returned by the algorithm depends on the implementation of the maximum-size bipartite matching algorithm. One possible solution for the circuit of Figure 8.2 is indicated in Figure 8.3(d), where the thickened portion of each range represents the slot in which the corresponding FF is placed. Note that this is a perfect matching of FFs to slots, i.e., the solution is ideal. The scan path for this ordering, (R2, R4, R3, R5, R1), is shown in Figure 8.5.

8.3.3 Non-Ideal Solutions

In the above discussion we used an example for which an ideal solution exists, i.e., the procedure returns an optimal solution with confidence level 1.0. There are two possible cases in which no ideal solution can be found: (1) one or more of the ideal ranges of the FFs are empty; (2) the ideal ranges of the FFs are all nonempty, but no perfect matching of FFs to chain slots exists. For completeness, we present below an example of each of these two cases.

Figure 8.6: Example of single scan path chaining with empty ideal range for a register.

Figure 8.7: Example of single scan path chaining with no perfect matching.

Example 1: Empty Range. Figure 8.6 shows a circuit example with two combinational kernels, A (requiring 5 test vectors) and B (requiring 20 test vectors). Each register consists of a single FF. This circuit has $KSEQ = (A, B)$. It is tested in two sessions, $TS_1$ and $TS_2$, with minimum session cycles $MSC_1 = 2$ and $MSC_2 = 1$. For the FF R2 this results in $Range_1(R2) = [2, 2]$ and $Range_2(R2) = [1, 1]$. Hence the ideal range $Range(R2)$ is empty, i.e., $\emptyset$. However, the ideal ranges of the other two FFs are nonempty: $Range(R1) = [1, 1]$ and $Range(R3) = [3, 3]$. Hence the chaining procedure first places R1 in slot 1 and R3 in slot 3. Subsequently R2 is placed in the remaining slot 2. Thus the ordering generated by singleChain is (R1, R2, R3), and the confidence level is 2/3.

A confidence value lower than 1.0 implies that the ordering may or may not be optimal; in our example the solution is actually nonoptimal. For the solution (R1, R2, R3), the total test time using the overlapped scheme is 62 clock cycles. Now consider another ordering, (R2, R1, R3). The total test time in this case is 53 clock cycles, which shows that the solution obtained by singleChain is not optimal.

Consider an interesting variation on the example in which A requires 15 test patterns instead of 5. Since A still has the shorter test, $KSEQ$ is still (A, B). The procedure singleChain returns the same non-ideal solution (R1, R2, R3), for which the total test time is again 62 clock cycles. With the alternative ordering (R2, R1, R3), however, the test time is now 73 clock cycles; this implies that the solution from the procedure could actually be optimal.
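The cycle counts quoted in Example 1 can be reproduced with a simple accounting, offered here as our reconstruction rather than the definition of Section 8.2.3: session $TS_i$ applies $T(K_i) - T(K_{i-1})$ patterns, each costing the session's chain cycle plus one capture clock, and one extra chain cycle of $TS_1$ loads the first pattern. The topology R1 → A → R2 → B → R3 is likewise an assumption, since Figure 8.6 is not reproduced here. With both assumptions, the Python sketch below gives 62 and 53 cycles when A needs 5 vectors and 62 and 73 when A needs 15, matching the text; the function names are hypothetical.

```python
def chain_cycle(order, drives, receives, tested):
    """Shifts per test for one ordering; slot 1 is nearest the scan-in pin."""
    n = len(order)
    drive = receive = 0
    for slot, ff in enumerate(order, start=1):
        if drives.get(ff) in tested:
            drive = max(drive, slot)
        if receives.get(ff) in tested:
            receive = max(receive, n - slot + 1)
    return max(drive, receive)


def overlapped_test_time(order, drives, receives, lengths):
    """Assumed accounting: initial load of CC_1 shifts, then (CC_i + 1)
    clocks for each of the patterns that session TS_i adds."""
    kseq = sorted(lengths, key=lengths.get)       # nondecreasing test length
    total, prev = 0, 0
    for i, kernel in enumerate(kseq):
        cc = chain_cycle(order, drives, receives, set(kseq[i:]))
        if i == 0:
            total += cc                           # assumed initial load
        total += (lengths[kernel] - prev) * (cc + 1)
        prev = lengths[kernel]
    return total


# Assumed topology for Figure 8.6: R1 -> A -> R2 -> B -> R3.
DRIVES = {"R1": "A", "R2": "B"}
RECEIVES = {"R2": "A", "R3": "B"}

for lengths in ({"A": 5, "B": 20}, {"A": 15, "B": 20}):
    times = [overlapped_test_time(o, DRIVES, RECEIVES, lengths)
             for o in (("R1", "R2", "R3"), ("R2", "R1", "R3"))]
    print(lengths["A"], times)
# 5 [62, 53]
# 15 [62, 73]
```

The sketch makes the weakness concrete: the ordering preferred by singleChain does not change between the two cases, while the better ordering does.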
This variation on the example highlights a weakness in the procedure singleChain: in computing an ordering it considers only the relative order of the test lengths of the kernels, not their actual values. If an ideal ordering exists, then the actual test lengths are irrelevant, since the ordering must be optimal. When no ideal ordering exists, however, the solution may suffer from this lack of consideration of the actual test lengths. Taking the exact test lengths into account when constructing a non-ideal solution is an open problem requiring further analysis.

Example 2: Nonempty Ranges, No Perfect Matching. Another example of single scan path chaining is shown in Figure 8.7. There are three combinational kernels, with the test lengths shown, and five FFs. Consider the FFs R1, R2 and R3. Analysis shows that $Range(R1) = [3, 3]$ and $Range(R2) = Range(R3) = [3, 4]$. Even without considering the ranges of the other FFs, it is evident that no perfect matching of the FFs to the chain slots can exist, since the three FFs R1, R2 and R3 cannot be squeezed into the two slots at positions 3 and 4. In this case the procedure would first produce a partial matching with four of the five FFs in their ideal ranges, e.g., (R5, -, R1, R2, R4). The remaining FF R3 would then be placed outside its ideal range, in the remaining slot 2. The associated confidence level is 0.8.

8.3.4 A Simplifying Transformation

Above we have presented a procedure for constructing a near-optimal single scan chain for a fully compatible circuit. The procedure views the circuit as a sequence of $n$ kernels, $KSEQ = (K_1, \ldots, K_n)$, in ascending test length order. Each kernel $K_i$ has certain scan registers driving it and certain scan registers receiving from it, and gives rise to a test session $TS_i$ during which kernels $K_i, \ldots, K_n$ are tested. In certain circuits, however, a subset of the kernels in $KSEQ$, say $\{K_i, \ldots, K_j\}$, may have exactly the same test length; hence the test sessions $TS_{i+1}, \ldots, TS_j$ have zero duration under the overlapped scheme. In this case the problem can be simplified by merging $K_i, \ldots, K_j$ into a single kernel, say $K_{i \ldots j}$. Any register that drives (receives from) a kernel in $\{K_i, \ldots, K_j\}$ is also a driver (receiver) for the merged kernel. This merging can be carried out for every subsequence within $KSEQ$ of kernels having the same test length, resulting in a shorter sequence of kernels. Nevertheless, the ideal range of each FF is exactly the same for the modified problem. Thus the result of the chaining algorithm for the modified set of kernels is also a solution to the original chaining problem. This transformation can therefore be used to reduce the computational complexity wherever possible, without altering the solution space.

8.3.5 Case Study

The single scan path chaining analysis of this section was applied to a single bit slice (i.e., a single butterfly circuit) of the Viterbi decoder described in Section 3.9. Based on full scan design, this circuit had 12 mutually compatible combinational kernels of various sizes, each essentially a cloud of the circuit. These kernels were connected together via a total of 28 scan FFs. Owing to similarities among some groups of kernels, there were only four distinct test lengths among the 12 kernels. Hence, after the simplifying transformation of merging equal-test-length kernels, the number of kernels for the chaining analysis was reduced to four.

These kernels and their interconnections are illustrated schematically in Figure 8.8(a). The four merged kernels and their respective test lengths are A [2], B [8], C [13], and D [15]. Each arrow in the diagram represents a set of scan FFs that receive test results from the source kernel and drive test patterns into the destination kernel.
Thus, for example, the self-loop at node B indicates that there are five scan FFs that serve as driver-receivers for kernel B. The parentheses alongside an arrow contain the names of the FFs, e.g., (FF7-FF11). There are a total of 28 scan FFs to be connected in a single scan chain. In this example, if the combined test scheme is used with an arbitrary scan chain ordering, a chain cycle of 28 must be used for applying all 15 tests to the circuit. The resulting test time is 463 clock cycles. Below we determine a scan chain ordering, and the resulting test time, based on the overlapped test scheme.

For the four test sessions, $TS_1, \ldots, TS_4$, the respective minimum session cycles are 28, 25, 21 and 19. Using these values, the ideal ranges can be computed for all 28 FFs. In every session there is at least one driver-receiver FF, hence equation 8.3 must be used. Note that FFs that lie along the same arrow in Figure 8.8(a), i.e., that receive from the same kernel and drive the same kernel, must have exactly the same range. The range for the group of FFs covered by each arrow in the diagram is indicated alongside it, e.g., [4, 25] for the self-loop at kernel B. Since there are only nine arrows in the diagram, the range computation needs to be carried out only nine times.

For this circuit the procedure singleChain is able to find an ideal chain ordering, i.e., an assignment of FFs to chain slots such that every FF is within its ideal range. The actual ordering returned by the procedure depends on which of the many possible perfect matchings is found during the maximum matching step. One such chain ordering is shown in Figure 8.8(b). Based on the overlapped test scheme, the test time using this scan chain is 392 clock cycles. According to Theorem 7 this must be an optimal solution.
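The case-study numbers are consistent with a simple accounting: merging equal test lengths (Section 8.3.4) leaves sessions that add 2, 6, 5 and 2 new patterns, and charging one initial load plus (chain cycle + 1) clocks per pattern reproduces both 463 and 392 cycles. The Python sketch below is our reconstruction of that arithmetic, not a formula stated in this section, and the helper names are hypothetical.

```python
def session_patterns(test_lengths):
    """Merge kernels with equal test lengths (Section 8.3.4) and return the
    number of new patterns applied in each remaining session."""
    distinct = sorted(set(test_lengths))
    return [b - a for a, b in zip([0] + distinct, distinct)]


def total_test_time(patterns, chain_cycles):
    """Assumed accounting: one initial load at the first chain cycle, then a
    shift of chain_cycles[i] plus one capture clock per pattern of session i."""
    return chain_cycles[0] + sum(t * (cc + 1) for t, cc in zip(patterns, chain_cycles))


merged = session_patterns([2, 8, 13, 15])             # four merged kernels A..D
combined = total_test_time([max([2, 8, 13, 15])], [28])  # full 28-bit shift per test
overlapped = total_test_time(merged, [28, 25, 21, 19])   # MSC_1 .. MSC_4
print(merged, combined, overlapped, round(100 * (1 - overlapped / combined)))
# [2, 6, 5, 2] 463 392 15
```

Because `session_patterns` works on the set of distinct lengths, duplicate test lengths among the original 12 kernels collapse automatically, which is the effect of the simplifying transformation.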
Note that this solution achieves a saving of 15% in test time compared to the combined test scheme, at the cost of a more restricted scan path ordering. Thus in general, compared to the combined scheme, the overlapped scheme trades off routing area against test time. The actual increase in routing area, if any, cannot be determined until the final physical design is carried out. If in fact the wiring due to the optimal scan chain ordering is taken into account in the placement stage, the actual area cost may not be very high. Better still, if the ordering generated by singleChain could be made sensitive to the physical placement of the scan FFs, say by altering the process of obtaining a maximum matching, the additional routing cost could be minimized or even eliminated in some cases. The latter idea is a subject for future study.

Figure 8.8: Case study for single scan chain. (a) Schematic description of circuit, (b) optimal scan chain ordering.

8.4 Multiple Chains for Two Compatible Kernels

In this section we consider another restricted case of the general chaining problem defined in Section 8.1. Unlike in the previous problem, we now allow multiple scan chains. Note that this introduces two new degrees of freedom in the solution: the lengths of the various chains, and the actual assignment of FFs to the chains. To compensate for the vastly increased complexity of the solution space, in this section we consider the following restrictions on the problem, and attempt to formulate a solution.

1. There are exactly two kernels.
2. As in the previous study, the kernels are mutually compatible.
3. The number of chains is prespecified.
4. Rather than the canonical overlapped test application scheme, we use a special quasi-overlapped scheme, described below, in which the ordering of FFs in the scan chains is irrelevant.

The quasi-overlapped scheme used in this problem is actually a hybrid of the combined and overlapped schemes. If the circuit has a single scan chain, this scheme is identical to the combined scheme, i.e., the chain is flushed completely every time a test pattern is applied, irrespective of the ordering of the scan FFs. If there are multiple scan chains, however, a modified form of the overlapped scheme is used. Let $KSEQ = (K_1, K_2, \ldots, K_n)$ be the ordered sequence of kernels and $TS_1, TS_2, \ldots, TS_n$ the test sessions, exactly as defined under the overlapped scheme in Section 8.2.3. In the original overlapped scheme, a minimum amount of shifting is done to apply each test pattern, and the chain cycle used in session $TS_i$ is $CC(K_i, \ldots, K_n)$, which depends on the FF ordering within the scan chains. In the quasi-overlapped scheme, however, all chains that are active in a given session are flushed completely. Thus the chain cycle used in session $TS_i$ is

$$\max_{c \in Ch(K_i, \ldots, K_n)} |c|,$$

where $Ch(K_i, \ldots, K_n)$ is the set of chains used for applying tests during session $TS_i$ and $|c|$ is the number of FFs in chain $c$. Thus in this scheme the chain cycles in the various sessions do not depend on the ordering of FFs in the scan chains, only on the assignment of FFs to the scan chains.
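Because the quasi-overlapped chain cycle depends only on which chains are active, it reduces to a maximum over chain lengths. The Python sketch below computes the chain cycle of every session for a given assignment of FFs to chains. The function name, the `kernels_of` mapping and the tiny two-kernel example are hypothetical, and a chain is assumed to be active in $TS_i$ whenever it holds at least one FF that drives or receives from a kernel in $K_i, \ldots, K_n$.

```python
def quasi_overlapped_cycles(chains, kernels_of, kseq):
    """Chain cycle of each session TS_1..TS_n under the quasi-overlapped scheme.

    chains:     list of chains, each a list of FF names
    kernels_of: FF name -> set of kernels the FF drives or receives from
    kseq:       kernels in nondecreasing order of test length
    """
    cycles = []
    for i in range(len(kseq)):
        tested = set(kseq[i:])                     # kernels tested in TS_{i+1}
        active = [c for c in chains
                  if any(kernels_of[f] & tested for f in c)]
        # Every active chain is flushed completely, so its full length counts.
        cycles.append(max((len(c) for c in active), default=0))
    return cycles


# Hypothetical two-kernel example: a* serve K1 only, b1 K2 only, x1 both.
KERNELS_OF = {"a1": {"K1"}, "a2": {"K1"}, "a3": {"K1"}, "a4": {"K1"},
              "b1": {"K2"}, "x1": {"K1", "K2"}}
print(quasi_overlapped_cycles([["a1", "a2", "a3", "a4"], ["b1", "x1"]],
                              KERNELS_OF, ["K1", "K2"]))   # [4, 2]
print(quasi_overlapped_cycles([["a1", "a2", "x1"], ["a3", "a4", "b1"]],
                              KERNELS_OF, ["K1", "K2"]))   # [3, 3]
```

The two calls illustrate the trade-off exploited below: keeping the $K_1$-only FFs in their own chain lets that chain sit idle in the second session, even though the first session then needs a longer chain cycle.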
8.4.1 Modeling the Problem

Under the above restrictions, the problem of this section can be stated as follows: given two kernels and the associated scan FFs, and given the number of scan chains to be constructed, determine an assignment of the scan FFs to chains such that the overall test time under the quasi-overlapped test scheme is minimized.

Figure 8.9 shows the two kernels under test, $K_1$ and $K_2$, and the associated scan FFs. There are three sets of scan FFs. The first, $R_1$, contains $n_1$ FFs that are used for testing $K_1$ but not $K_2$. These FFs may be drivers, receivers, or driver-receivers with respect to $K_1$. It is not important to distinguish among these types of FFs when the quasi-overlapped scheme is used, since the presence of any of them anywhere in a given chain may require the whole chain to be flushed during test, irrespective of the FF type. Similarly, the set $R_2$ contains $n_2$ FFs that are used for testing $K_2$ but not $K_1$. Finally, $R_{12}$ contains $n_{12}$ FFs that are used for testing both $K_1$ and $K_2$. Thus each FF in $R_{12}$ either receives from $K_1$ and drives $K_2$, or drives $K_1$ and receives from $K_2$.

Let $k$ be the number of chains to be constructed. Each chain $c_i$ is essentially a set of FFs, i.e., a subset of $R_1 \cup R_2 \cup R_{12}$. The set of $k$ chains in the design under consideration is denoted by $Ch$.

Figure 8.9: Circuit model for two-kernel multiple scan chain study.

Our objective is to determine the set $Ch = \{c_1, c_2, \ldots, c_k\}$ such that

$$\bigcup_{c_i \in Ch} c_i = R_1 \cup R_2 \cup R_{12}, \qquad \forall c_i, c_j \in Ch,\ i \neq j:\ c_i \cap c_j = \emptyset,$$

and the overall test time is minimized.

In the discussion below, we use $Ch_1$, $Ch_2$ and $Ch_{12}$ to denote the three blocks of a partition of $Ch$, defined as follows:

$$Ch_1 = \{c \in Ch \mid c \subseteq R_1\}; \quad Ch_2 = \{c \in Ch \mid c \subseteq R_2 \cup R_{12}\}; \quad Ch_{12} = Ch - Ch_1 - Ch_2.$$

Thus each chain in $Ch_{12}$ contains at least one FF from each of $R_1$ and $(R_2 \cup R_{12})$.
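The classification of chains into $Ch_1$, $Ch_2$ and $Ch_{12}$ depends only on whether a chain holds FFs from $R_1$, from $R_2 \cup R_{12}$, or both. A minimal Python sketch follows; the FF names are hypothetical, every FF not in $R_1$ is assumed to belong to $R_2 \cup R_{12}$, and an empty chain (which formally satisfies both subset conditions) is arbitrarily placed in $Ch_2$.

```python
def partition_chains(chains, r1):
    """Split chains into (Ch_1, Ch_2, Ch_12).

    r1 is the set of FFs used only for K_1; every other FF is assumed to be
    in R_2 or R_12. An empty chain is put in Ch_2 (it satisfies both subset
    conditions, so the choice is arbitrary)."""
    ch1, ch2, ch12 = [], [], []
    for c in chains:
        has_r1 = any(f in r1 for f in c)
        has_other = any(f not in r1 for f in c)
        if has_r1 and has_other:
            ch12.append(c)
        elif has_r1:
            ch1.append(c)
        else:
            ch2.append(c)
    return ch1, ch2, ch12


print(partition_chains([["a1", "a2"], ["b1", "x1"], ["a3", "b2"]],
                       {"a1", "a2", "a3"}))
# ([['a1', 'a2']], [['b1', 'x1']], [['a3', 'b2']])
```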
Under the quasi-overlapped test scheme described above, the test for the circuit consists of two sessions, $TS_1$ and $TS_2$. For brevity, let $l_i$ denote the test length $T(K_i)$ of $K_i$, i.e., $l_1 = T(K_1)$ and $l_2 = T(K_2)$. Note that by assumption $l_1 \le l_2$. In $TS_1$, $l_1$ test patterns are applied to $K_1$ as well as $K_2$ in a combined manner. During this session all the chains in $Ch$ must be involved in applying tests. Thus the chain cycle for this session is $\max_{c \in Ch} |c|$. In $TS_2$, the remaining $l_2 - l_1$ test patterns are applied to $K_2$ alone. Clearly during this session all chains in $Ch_1$ must be idle, since they contain no FFs connected to $K_2$. Only chains in $Ch_2$ and in $Ch_{12}$ are involved in applying the tests, since each contains at least one FF connected to $K_2$. Accordingly, the chain cycle in this session is $\max_{c \in Ch_2 \cup Ch_{12}} |c|$.

Below we derive some interesting properties of any optimal chaining solution, i.e., any well-formed set of chains $Ch = \{c_1, c_2, \ldots, c_k\}$ that has minimum associated test time using the quasi-overlapped scheme. We take advantage of these properties to prune the search space and efficiently obtain an optimal solution.

8.4.2 Problem Characteristics

In constructing the $k$ scan chains for the circuit of Figure 8.9, each of the $n_1 + n_2 + n_{12}$ FFs could be assigned to any of the $k$ chains. Hence the total number of possible ways of constructing the chains is $k^{n_1 + n_2 + n_{12}}$. For small problem sizes it may be possible to enumerate all of them and compute the test time for each, since the test time computation for a fully compatible circuit is relatively simple under any test application scheme. For larger problems, however, the solution space grows rapidly. For example, with 4 chains and 30 FFs, the total number of distinct chain configurations is $4^{30} \approx 10^{18}$. Fortunately, as the study presented in this section demonstrates, it is possible to identify a limited subspace of solutions within
which an optimal solution must exist. This helps to restrict the search for the solution.

We first show that we can ignore chaining configurations in which there is more than one chain in $Ch_{12}$, i.e., more than one chain that has FFs from $R_1$ as well as from $R_2 \cup R_{12}$. For example, consider the configuration in Figure 8.10(a), where each vertical bar represents a chain and its length represents the number of FFs in the chain. $Ch_{12}$ contains two chains, $c_3$ and $c_4$. Assume that this configuration is optimal. The proof of the lemma below shows that another optimal configuration, shown in Figure 8.10(b), must exist. In the new optimal configuration, the FFs in $c_3$ and $c_4$ are redistributed, without altering the chain lengths, such that the resulting set $Ch_{12}$ has only one chain.

Lemma 10: There exists an optimal chain configuration in which $Ch_{12}$ has at most one chain.

Proof: Suppose $Ch$ is an optimal configuration of the chains with $|Ch_{12}| > 1$. Then transform the chain configuration $Ch$ as follows. Pick any two chains $c_i, c_j \in Ch_{12}$. Without loss of generality, assume that $|c_i| \le |c_j|$, i.e., $c_i$ is not longer than $c_j$. Define $N_1 = |(c_i \cup c_j) \cap R_1|$, i.e., the total number of FFs from $R_1$ present in the two chains. Define $N_2 = |c_i| + |c_j| - N_1$, i.e., the total number of FFs from $R_2$ and $R_{12}$ present in the two chains. Note that since $c_i$ is the shorter chain, $N_1 + N_2 = |c_i| + |c_j| \ge 2|c_i|$. Thus clearly $N_1 \ge |c_i|$ or $N_2 \ge |c_i|$. In either case, we reconstruct the two chains while keeping their respective lengths unchanged, as described below.

Case 1: If $N_1 \ge |c_i|$, place $|c_i|$ FFs from $R_1$ in $c_i$, and the remaining $N_1 + N_2 - |c_i|$ FFs (from both $R_1$ and $R_2 \cup R_{12}$) in $c_j$. $c_i$ now belongs to $Ch_1$ instead of $Ch_{12}$.

Case 2: If $N_2 \ge |c_i|$, place $|c_i|$ FFs from $R_2 \cup R_{12}$ in $c_i$, and the remaining $N_1 + N_2 - |c_i|$ FFs (from both $R_1$ and $R_2 \cup R_{12}$) in $c_j$. $c_i$ now belongs to $Ch_2$ instead of $Ch_{12}$.
In both cases, the chain configuration is altered such that at least one chain is removed from $Ch_{12}$ and migrates to either $Ch_1$ or $Ch_2$. Under the quasi-overlapped test application scheme, all the chains in $Ch$ are accessed in test session $TS_1$; since the length of each chain is unaltered, the test time in $TS_1$ is unchanged. In test session $TS_2$, only chains in $Ch_{12} \cup Ch_2$ are accessed. In the two cases above, no chain is added to this set and the chain lengths are unchanged; the set either remains the same or loses a chain to $Ch_1$. Thus in both cases the chain cycle in $TS_2$ cannot increase, and the test time for $TS_2$ cannot increase. We have therefore shown that by transforming the chains as described above, the number of chains in $Ch_{12}$ is reduced by at least one while the test time does not increase. Hence the new chain configuration resulting from this transformation must also be optimal.

The transformation described above can be repeated as long as $Ch_{12}$ has more than one chain. Each time, a new optimal configuration is generated in which the number of chains in $Ch_{12}$ decreases by at least one. Finally we must obtain a chain configuration with $|Ch_{12}| \le 1$. □

Figure 8.10: Optimal chaining solutions (legend: FFs from $R_1$; FFs from $R_2 \cup R_{12}$). (a) Original configuration, (b) in region A, (c) in region B, (d) in intersection region.

Figure 8.11: Venn diagram of search space for multiple chain design problem (chain configuration search space, Region A, Region B, and Region $A \cap B$).
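For small instances the $k^{n_1+n_2+n_{12}}$ configurations can be enumerated directly, which also allows Lemma 10 to be spot-checked: restricting the search to configurations with at most one chain in $Ch_{12}$ should not change the optimum. The Python sketch below uses a simplified cost of $l_1 \cdot CC_1 + (l_2 - l_1) \cdot CC_2$ with the two chain cycles of Section 8.4.1, ignoring capture clocks and the initial load (an assumption), on a hypothetical six-FF instance; the function names are illustrative.

```python
from itertools import product


def two_kernel_time(chains, r1, l1, l2):
    """Simplified quasi-overlapped cost: l1 patterns at CC_1 (longest chain),
    then l2 - l1 patterns at CC_2 (longest chain holding an FF outside R_1).
    Capture clocks and the initial load are ignored (an assumption)."""
    cc1 = max(len(c) for c in chains)
    cc2 = max((len(c) for c in chains if any(f not in r1 for f in c)), default=0)
    return l1 * cc1 + (l2 - l1) * cc2


def best_configuration(ffs, r1, k, l1, l2, max_mixed=None):
    """Exhaustive search over all k**len(ffs) assignments of FFs to k chains.
    With max_mixed set, configurations with more chains in Ch_12 are skipped."""
    best = None
    for assign in product(range(k), repeat=len(ffs)):
        chains = [[f for f, a in zip(ffs, assign) if a == j] for j in range(k)]
        mixed = sum(1 for c in chains
                    if any(f in r1 for f in c) and any(f not in r1 for f in c))
        if max_mixed is not None and mixed > max_mixed:
            continue
        t = two_kernel_time(chains, r1, l1, l2)
        best = t if best is None else min(best, t)
    return best


# Hypothetical instance: R_1 = {a1, a2, a3}; b1, b2 in R_2; x1 in R_12.
FFS = ["a1", "a2", "a3", "b1", "b2", "x1"]
R1_SET = {"a1", "a2", "a3"}
print(best_configuration(FFS, R1_SET, 3, 10, 50),
      best_configuration(FFS, R1_SET, 3, 10, 50, max_mixed=1))  # 100 100
print(4 ** 30)  # 1152921504606846976 assignments for 4 chains and 30 FFs
```

The last line shows why exhaustive search is hopeless beyond toy sizes and why the pruning results of this section matter.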
Lemma 10 shows that in searching for an optimal chain configuration, it is sufficient to consider those configurations in which at most one chain has FFs from both $R_1$ and $R_2 \cup R_{12}$, and for all other chains $c$, either $c \subseteq R_1$ or $c \subseteq R_2 \cup R_{12}$. Figure 8.11 shows a Venn diagram of the search space, in which Region A is the restricted search space resulting from Lemma 10. There are still many degrees of freedom in the configuration of the chains, both within Region A and outside it, because each chain is allowed to have an arbitrary number of FFs. Below we show another way in which the search space can be limited, by considering only chaining configurations in which all chains in $Ch_1$ are of equal length (±1), and all chains in $Ch_2 \cup Ch_{12}$ are of equal length (±1). For the optimal configuration in Figure 8.10(a), the proof of the lemma below shows that there must exist another optimal solution, shown in Figure 8.10(c), satisfying this restriction.

Lemma 11: There exists an optimal chain configuration in which all chains in $Ch_1$ are of equal length (±1) and all chains in $Ch_2 \cup Ch_{12}$ are of equal length (±1).

Proof: Suppose $Ch$ is an optimal configuration of the chains. Carry out the following transformations, which preserve the optimality of the configuration. First, consider the chains in $Ch_1$ and rearrange them such that all the FFs contained among them are evenly distributed. Formally, let $N_1 = \sum_{c \in Ch_1} |c|$, and reconstruct the $|Ch_1|$ chains in $Ch_1$ such that each has either $\lfloor N_1 / |Ch_1| \rfloor$ or $\lceil N_1 / |Ch_1| \rceil$ FFs. Clearly none of the resulting chains can be longer than the longest of the original chains in $Ch_1$. Next, consider the chains in $Ch_2 \cup Ch_{12}$ and rearrange them such that all the FFs contained among them are evenly distributed. Let $N_2 = \sum_{c \in Ch_2 \cup Ch_{12}} |c|$, and reconstruct the $|Ch_2 \cup Ch_{12}|$ chains such that each has either $\lfloor N_2 / |Ch_2 \cup Ch_{12}| \rfloor$ or $\lceil N_2 / |Ch_2 \cup Ch_{12}| \rceil$ FFs.
Clearly none of the resulting chains can be longer than the longest of the original chains in $Ch_2 \cup Ch_{12}$. With the resulting chain configuration, the chain cycles used in sessions $TS_1$ and $TS_2$ cannot be greater than the respective chain cycles of the original configuration. Thus the new configuration must also be optimal, and it satisfies the statement of the lemma. □

Thus Lemma 11 can be used to restrict the search to the space labeled Region B in Figure 8.11. The two lemmas presented above have shown that it is sufficient to search for a solution in either Region A or Region B. We will now show that it is actually sufficient to search only the intersection of these regions, by proving that an optimal configuration must exist within the intersection region. Figure 8.10(d) shows such an optimal configuration, which is derived from the optimal configuration in Figure 8.10(b).

Theorem 8. There exists an optimal chain configuration in which $Ch_{12}$ has at most one chain, all chains in $Ch_1$ are of equal length ($\pm 1$), and all chains in $Ch_2 \cup Ch_{12}$ are of equal length ($\pm 1$).

Proof. By Lemma 10 there exists some optimal chain configuration $Ch$ in which $Ch_{12}$ has at most one chain. Carry out the following transformations on this configuration. First, using a method similar to that in the proof of Lemma 11, rearrange the FFs contained in the chains in $Ch_1$ so that they are evenly distributed among the chains. This is similar to the first transformation step in that proof. The lengths of the resulting chains in $Ch_1$ must range from $x_1 - 1$ to $x_1$, where $x_1$ is some integer (i.e., $x_1 = \max_{c \in Ch_1} |c|$). Similarly, using the same method, rearrange the FFs contained in the chains in $Ch_2$ so that they are evenly distributed among the chains. Note that this differs from the second step in the earlier proof, since the chain in $Ch_{12}$ (if any) is not altered here.
The lengths of the resulting chains in $Ch_2$ must range from $x_2 - 1$ to $x_2$, where $x_2$ is some integer (i.e., $x_2 = \max_{c \in Ch_2} |c|$).

If there is a chain in $Ch_{12}$, say $c_{12}$, let its length be $x_{12}$. If $x_{12}$ equals either $x_2$ or $x_2 - 1$, or if there is no chain in $Ch_{12}$, then the present configuration satisfies the statement of the theorem. Otherwise consider the following two cases.

Case 1: If $c_{12}$ is the longest of the chains in this configuration, i.e., $x_{12} > x_1$ and $x_{12} > x_2$, then $c_{12}$ is the chain that determines the chain cycle in both sessions $TS_1$ and $TS_2$. Pick any FF in $c_{12}$, say $f$, and remove it from $c_{12}$. Depending on whether $f \in R_1$ or $f \in (R_2 \cup R_{12})$, add $f$ to one of the chains in $Ch_1$ or $Ch_2$, respectively. Note that it must be possible to do so while maintaining all chains in $Ch_1$ of equal length ($\pm 1$) and all chains in $Ch_2$ of equal length ($\pm 1$). In the new configuration created by this migration of $f$, the length of $c_{12}$ is closer to $x_2$ by at least one FF; however, $c_{12}$ is still the longest chain, and since it has been shortened, the test time in each of the two test sessions could not have increased. Hence the new configuration is also an optimal configuration. The process described in this paragraph can be repeated, updating the values of $x_1$ and $x_2$ and generating new optimal configurations, until the length of $c_{12}$ is equal to $x_2$ ($\pm 1$). This final configuration must satisfy the statement of the theorem.

Case 2: If $c_{12}$ is not the longest chain in this configuration, then arbitrarily remove one FF $f$ from a longest chain (which could be in either $Ch_1$ or $Ch_2$), and place $f$ in $c_{12}$. Note that irrespective of which chain $f$ came from, the test time in each of the two sessions could not have increased. This migration process can be repeated if necessary until the length of $c_{12}$ is equal to $x_2$ ($\pm 1$).
This final configuration must satisfy the statement of the theorem.

Thus the theorem is proved in all possible cases. □

The preceding theorem allows us to restrict the search for an optimal configuration to a small fraction of the search space, i.e., the intersection of Regions A and B in Figure 8.11. In the following section we develop an efficient way to traverse this space.

8.4.3 Constructing Optimal Chains

Due to the restricted characteristics of any configuration in the intersection of Regions A and B, we can define a simplified search problem as shown below. We first formulate a nonlinear integer programming problem and then show how to tackle the problem by reducing it to a bounded set of linear problems.

8.4.3.1 Nonlinear Problem Formulation

The problem can be stated as follows. Given the test lengths $l_1$ and $l_2$ of the two kernels, the lengths $n_1$, $n_2$ and $n_{12}$ of the three registers, and the total number of chains $k$ that are to be constructed, determine four integers $k_1$, $k_2$, $x_1$ and $x_2$ such that the following relationships hold.

1. There are $k_1$ chains in $Ch_1$ and $k_2$ chains in $(Ch_2 \cup Ch_{12})$. That is,
$$0 \le k_1, k_2 \le k \tag{8.4}$$
$$k_1 + k_2 = k \tag{8.5}$$
In the configuration of Figure 8.10(d), $k_1 = 3$ and $k_2 = 2$.

2. Every chain in $Ch_1$ has either $x_1$ or $x_1 - 1$ FFs from $R_1$. At least one of these chains must have $x_1$ FFs. If there is a chain in $Ch_{12}$ (with maximum length $x_2$), it may have up to $x_2 - 1$ FFs from $R_1$. These observations are embedded in the following constraints on $k_1$, $x_1$ and $x_2$:
$$k_1 (x_1 - 1) + 1 \le n_1 \le k_1 x_1 + (x_2 - 1) \tag{8.6}$$

3. Every chain in $Ch_2$ and $Ch_{12}$ has either $x_2$ or $x_2 - 1$ FFs. At least one of these chains must have $x_2$ FFs. If there is a chain in $Ch_{12}$, at least one of its FFs and up to $x_2 - 1$ of its FFs must be from $R_2 \cup R_{12}$. These observations are embedded in the following constraints on $k_2$ and $x_2$:
$$(k_2 - 1)(x_2 - 1) + 2 \le n_2 + n_{12} \le k_2 x_2 - 1 \tag{8.7}$$

4.
The test time under the quasi-overlapped scheme is minimized. Depending on the relationship between $x_1$ and $x_2$, there are two possible expressions for the overall test time, which is the objective function.

Case $x_1 \le x_2$: The chain cycle in both sessions $TS_1$ and $TS_2$ is $x_2$; hence the overall test time is
$$TT = (x_2 + d + 1)\, l_2 \tag{8.8}$$

Case $x_1 > x_2$: In $TS_1$ the chain cycle is $x_1$ and in $TS_2$ the chain cycle is $x_2$. The overall test time is
$$TT = (x_1 + d + 1)\, l_1 + (x_2 + d + 1)(l_2 - l_1) \tag{8.9}$$

The two expressions above serve as objective functions to be minimized. Since the relationship between $x_1$ and $x_2$ cannot be predicted in advance, both objective functions must be tried separately, and in each resulting solution the values of $x_1$ and $x_2$ must be compared to determine whether they are consistent with the objective function that was used. Note that according to the lemmas proved earlier there must exist some configuration of the chains that satisfies our constraints; hence at least one of the two cases must yield a satisfactory solution.

In the above formulation of the problem, some of the inequalities contain products of more than one of the variables to be optimized, i.e., $k_1$, $k_2$, $x_1$, $x_2$. Since all the variables take on integer values, this is an integer nonlinear programming problem.

8.4.3.2 Linearizing the Problem

In our optimization problem, the range of values that $k_1$ and $k_2$ can take on is fairly restricted, since the total number of chains $k$ is fixed and typically small. Note also that if $k_1$ and $k_2$ are treated as constants, the problem formulation above becomes an integer linear programming (ILP) problem [29], which is theoretically intractable but for which efficient software packages exist. We can take advantage of this fact to iterate over all possible combinations of $k_1$ and $k_2$ values, and for each, determine locally optimal values of $x_1$ and $x_2$ by solving the corresponding ILP problem.
The solution with the lowest overall test time is the global optimum. The procedure multipleChains listed below carries out this process.

procedure multipleChains($l_1, l_2, n_1, n_2, n_{12}, k$): returns $k_1^o, k_2^o, x_1^o, x_2^o$.
1. Current "optimal" test time: $TT^o \leftarrow \infty$.
2. For all $k_1$, $0 \le k_1 \le k$, do the following:
   (a) $k_2 \leftarrow k - k_1$.
   (b) Treating $k_1$ and $k_2$ as constants, and $x_1$ and $x_2$ as variables, solve the integer linear programming problem consisting of constraints 8.4 through 8.7 with objective function 8.8 above, to obtain values of $x_1$ and $x_2$ that minimize this objective function.
       If $x_1 \le x_2$, this is a valid local optimal solution for the current values of $k_1$ and $k_2$.
       Otherwise, solve the integer linear programming problem again, this time using objective function 8.9 instead of 8.8, to obtain new values of $x_1$ and $x_2$ that minimize this objective function.
       If $x_1 > x_2$, this is a valid local optimal solution for the current values of $k_1$ and $k_2$.
       Otherwise no valid solution exists for the current values of $k_1$ and $k_2$; hence skip step 2(c).
   (c) The values $k_1, k_2, x_1, x_2$ represent a local optimal solution, with test time $TT$ according to the appropriate equation 8.8 or 8.9. If $TT < TT^o$, then store the current solution:
       i. $TT^o \leftarrow TT$;
       ii. $k_1^o \leftarrow k_1$;
       iii. $k_2^o \leftarrow k_2$;
       iv. $x_1^o \leftarrow x_1$;
       v. $x_2^o \leftarrow x_2$.
3. Return $k_1^o, k_2^o, x_1^o, x_2^o$. □

8.4.4 Discussion

The algorithm presented above yields an optimal chaining configuration for fully compatible two-kernel circuits. It also applies to circuits with more than two kernels for which the simplifying transformation described in Section 8.3.4 results in two kernels. If the circuit model resulting from this transformation still has more than two kernels, however, the algorithm multipleChains may still be useful for obtaining a suboptimal solution.
To make this possible, a process similar to the simplifying transformation of Section 8.3.4 could be carried out to reduce the number of kernels to two. Rather than merging kernels that have exactly the same test length, any subset of kernels that have approximately the same length could be merged into a single kernel. This idea could be used to partition the set of kernels into two subsets, based on their test lengths and/or the number of FFs involved in testing them, and to merge the kernels in each subset. Clearly the resulting chain configuration for the two merged kernels may not be optimal for the original set of kernels. The heuristic used for partitioning the set of kernels into two subsets would influence the degree of optimality of the solution.

The number of chains to be constructed is required as an input parameter to the multipleChains algorithm. Every chain is associated with at least two I/O pins serving as the scan-in and scan-out pins, respectively. When the quasi-overlapped test scheme is used, however, note that some of the chains may be required to operate asynchronously with respect to each other, i.e., at different chain cycles, during the test. Hence one or more additional I/O pins may be required for test control under this scheme.

8.4.5 Experimental Results

In this section we study the results of applying the above multiple chain design algorithm to circuit examples, and compare the results with the traditional approach of constructing equal-length chains. Consider a circuit with two compatible kernels, $K_1$ and $K_2$, with test lengths $l_1 = 20$ and $l_2 = 80$, respectively. Let the number of FFs communicating with $K_1$ but not $K_2$ (i.e., $n_1$) be 15, and let the number of FFs communicating either with $K_2$ alone or with both $K_1$ and $K_2$ (i.e., $n_2 + n_{12}$) be 10.
The 25 FFs in this circuit could be connected into a single scan chain, in which case the test time (based on the combined test application scheme) is 2105 clock cycles. Table 8.1 (illustrated graphically in Figure 8.12) shows the results of using multiple chains. The notation used in the table is as follows: $k$ is the number of scan chains to be constructed; $k_1$ and $k_2$ represent the numbers of chains in $Ch_1$ and $(Ch_2 \cup Ch_{12})$, respectively; $max_1$ [$max_2$] represents the maximum length among the chains in $Ch_1$ [$(Ch_2 \cup Ch_{12})$]; $TT$ is the overall test time for this solution; and $TT_{eq}$ is the overall test time for the "traditional" chaining approach, in which FFs are assigned arbitrarily to chains provided the lengths of the chains are equal ($\pm 1$).

For the case of two chains, the optimal solution shown in the first row of Table 8.1 has lengths of 15 and 10, respectively, for the two chains. This results in a saving in test time of 12.2% compared to the "traditional" solution, i.e., $(1133 - 995)/1133$. For other numbers of chains, ranging up to 10, the corresponding results are shown in the table, and the overall test time is plotted in Figure 8.12. For this circuit example, the saving in test time compared to the solution with equal-length chains ranges between 3.7% and 24.5%, depending on the number of chains. The average saving among the solutions for 2, 3, ..., 10 chains is 16.0%.

Table 8.1: Results of multiple scan path chaining for the circuit with $l_1 = 20$, $l_2 = 80$, $n_1 = 15$, and $(n_2 + n_{12}) = 10$.

| k | k1 | k2 | max1 | max2 | TT | TT_eq | Saving |
|---|----|----|------|------|----|-------|--------|
| 2 | 1 | 1 | 15 | 10 | 995 | 1133 | 12.2% |
| 3 | 1 | 2 | 15 | 5 | 695 | 809 | 14.1% |
| 4 | 2 | 2 | 8 | 5 | 548 | 647 | 15.3% |
| 5 | 2 | 3 | 7 | 4 | 467 | 485 | 3.7% |
| 6 | 2 | 4 | 7 | 3 | 407 | 485 | 16.1% |
| 7 | 3 | 4 | 5 | 3 | 365 | 404 | 9.7% |
| 8 | 3 | 5 | 5 | 2 | 305 | 404 | 24.5% |
| 9 | 4 | 5 | 4 | 2 | 284 | 323 | 12.1% |
| 10 | 5 | 5 | 3 | 2 | 263 | 323 | 18.6% |

Table 8.2 shows the average saving in test time for variations of the above example. It should be noted that in any individual solution for a given number of chains, the saving must be non-negative.
It is possible, however, for the saving to be 0 in some cases. Clearly the average saving varies widely depending on the circuit characteristics. In general, the results of the study suggest that when the test lengths $l_1$ and $l_2$ of the two kernels differ widely, a greater saving is obtained, unless the ratio $l_1 : l_2$ is similar to the ratio $n_1 : (n_2 + n_{12})$, in which case the saving is less significant. This information could be useful as feedback to the process of partitioning for test. To summarize, the results show that the algorithm multipleChains can indeed be used to reduce the overall test time, but they also indicate that the savings depend on the circuit characteristics, and more study is required to understand this dependency better.

Figure 8.12: Decrease in overall test time with multiple scan chains. (Overall test time versus number of scan chains, for the optimal chain configuration and for the equal-length chain configuration.)

Table 8.2: Average saving in test time for different circuit examples over various numbers of scan chains.

| l1 | l2 | n1 | (n2 + n12) | Average saving |
|----|----|----|------------|----------------|
| 20 | 80 | 15 | 10 | 16.0% |
| 20 | 80 | 10 | 15 | 6.6% |
| 40 | 60 | 15 | 10 | 3.3% |
| 40 | 60 | 10 | 15 | 2.2% |
| 10 | 90 | 15 | 10 | 23.0% |

8.5 The Rest of the Iceberg

The study presented in this chapter is merely the tip of the iceberg of chaining subproblems. To get a handle on the chaining problem, we have started by looking only at fully compatible designs. The results obtained here are applicable to traditional full scan designs, in which every cloud of combinational logic forms a kernel, or to partial scan designs in which size-based partitioning is used to form subkernels.
Although many other chaining subproblems remain unsolved, the primary contribution of this work is in bringing out the implicit relationship between the scheduling problem (discussed in Chapter 7) and the chaining problem (which attempts to minimize test time based on the test scheduling and test application schemes to be used). One interesting result established in the multiple chain study is that, contrary to traditional wisdom, equal-length chains are not necessarily optimal; depending on the kernel characteristics and the test application scheme, the lowest overall test time may actually be achieved by using unequal-length multiple chains.

The results obtained in this chapter can be used to provide insights into more areas of the chaining problem space. The following is a list of directions that can be taken in future research.

• Take geometric constraints such as maximum routing area, maximum wire length between scan FFs, and maximum clock wire length into account.
• Use the total number of pins available as scan-in, scan-out, or scan path control pins as a constraint, rather than assuming the number of chains as given.
• In the multiple chain case, deal effectively with the case of more than two kernels.
• Allow kernels to be incompatible, in both the single chain and multiple chain problems.
• In the multiple chain case, use the canonical overlapped test scheme rather than the quasi-overlapped test scheme.
• In the separate and overlapped schemes, reduce the amount of shifting needed to apply each test by "pipelining" the shifting in of test patterns and the shifting out of test results. The modified test schemes, pipelined separate and pipelined overlapped, would help to reduce test time, but would need certain sets of scan registers to inhibit loading data at their inputs during certain clock cycles.
• Allow the scan chains to be reconfigurable using multiplexers.
  The overall test time could be reduced, at the expense of control complexity, by bypassing certain segments of the chains in some sessions.

Each of the extensions above requires a fair amount of analysis. The last idea, which suggests reconfigurable chains, completely redefines the chain construction problem by adding a whole new degree of freedom. Clearly there is a wide expanse of chaining subproblems yet to be explored in addition to those that have been addressed in this work.

Chapter 9
Conclusion

"The greater our knowledge increases, the greater our ignorance unfolds." (John F. Kennedy, 1962)

This thesis has attempted to solve many of the design issues associated with well-known serial scan design techniques. Scan design has traditionally been viewed within the design community as a rigid and expensive technique. However, the work presented here describes and analyzes many different ways in which scan design can be applied, leading to a more flexible design methodology in which high testability can be achieved and different design costs (such as area overhead, test time, and I/O pin count) can be traded off against each other.

The results of this work are being implemented in a prototype software system called SIESTA (System for IntEgrated Scan Test Application) [30]. SIESTA is built upon Cbase, an object-oriented framework for VLSI design and test applications [31]. Figure 9.1 shows the system architecture and its relationship to Cbase and other tools. Bold lines indicate portions of the work that are already implemented; broken lines indicate modules that are under development. SIESTA interacts with the user in text and graphic form by making use of the Cbase graphic user interface, as shown in Figure 9.2.

The research presented here can be classified into four major areas: partial scan, partitioning, test scheduling, and scan path chaining.
Below we describe the contributions made in each area by this thesis. We also list some of the open problems for future research, which in some cases are related problems that could not be fully addressed in this work, but in other cases arise as a direct consequence of the contributions of this work itself.

Figure 9.1: SIESTA 1.0 system architecture. (Modules shown include BALLAST and ACYST partial scan, CRETE circuit reorganization, partitioning for test, kernel minimization, test scheduling, scan path chaining, the USC Test Generation System, Cbase, and the Cbase user interface.)

Figure 9.2: SIESTA user interface. (Screen dump of the SIESTA scan registers menu for a BALLAST design.)

9.1 Partial Scan Design

The work on partial scan design has shown that there exist partial scan designs, with reduced overheads compared to full scan, that can achieve the main benefits of full scan itself. The partial scan approaches presented in this work can be applied to arbitrary synchronous circuits; however, they are most effective with data path and pipelined circuits. Although some area cost is unavoidable with partial scan design, in general the area cost is lower than with full scan.
Perhaps more important, the partial scan design approach allows a range of designs to be generated, each with different area/performance characteristics, and in fact allows area cost to be traded off against performance cost. The BALLAST partial scan technique, presented in Chapter 3, achieves full coverage of all detectable single stuck-at (SSA) faults using simple combinational ATPG, thus enjoying the main benefit of full scan design at reduced cost.

This work has also explored two different extensions to the BALLAST technique. The first, called ACYST, presented in Chapter 4, deals with kernels that are acyclic but allows them to be unbalanced. This leads to designs that have lower overheads compared to BALLAST but require some additional work in test generation. In particular, simple combinational ATPG cannot be used, although a combinational ATPG program that can handle multiple stuck-at (MSA) faults is sufficient. In general, for an unbalanced kernel a given test consists of a sequence of test patterns. In the ACYST approach, the structure of the kernel is analyzed to determine an efficient way to compact the test sequences, i.e., to reduce the number of patterns in a given test sequence (for an arbitrary fault) to a minimal value. This helps to minimize the test application time. ACYST also generates a condensed test generation model (TGM). By carrying out combinational ATPG with an MSA fault model on the condensed TGM, a complete set of minimal-length test sequences can be generated.

Another extension to BALLAST, presented in Chapter 5, takes advantage of any switches (MUXes or buses) present in the circuit to reduce test-related costs whenever possible. Switches can be utilized for test in many different ways. Two main uses of switches are studied in this work. The first takes advantage of the fact that kernels that appear to be unbalanced may actually show balanced behavior in the presence of switches.
By using this information, the overhead due to partial scan design can be reduced by taking advantage of switches that are present in the circuit for normal operation. The second use of switches is to set up data transfer paths called I-paths within the kernel. These can help to transport test data during test and thus gain access to inner portions of the kernel. This leads to reduced ATPG cost and can also help to increase the degree of sharing of test resources (i.e., scan path registers) among different parts of the circuit.

Figure 9.3 shows the space of partial scan designs for a given circuit. The horizontal axis represents the degree of scan, i.e., the fraction of the circuit's FFs that are included in the scan path. The vertical axis shows the cost of test generation for each given partial scan design. The space is divided into three regions. The leftmost region consists of partial scan designs generated by BALLAST, which have balanced (acyclic) kernels. The middle region consists of ACYST scan designs, which have acyclic but unbalanced kernels. The remaining region of the space consists of designs whose kernels contain cycles. Although the overheads in this region are lower than in the other regions, in general the ATPG cost for these circuits is orders of magnitude higher. Most earlier work has focused on this region of the design space; however, no conclusive solutions have been found, and this remains an area for future study. In this thesis, the main contribution of the work on partial scan has been to identify the various regions in the design space and to effectively generate solutions in the balanced and acyclic unbalanced regions.

Figure 9.3: The partial scan design space. (TPG cost versus percentage of scan, between NO SCAN at 0% and FULL SCAN at 100%; regions labeled BALLAST (combinational TPG, single-pattern tests), ACYST (combinational TPG with modified TGM, short multi-pattern test sequences), and CYCLIST (sequential TPG, test sequences grow exponentially).)

Areas for future work in the partial scan domain can be summarized as follows:

• Extending the BALLAST and ACYST techniques to deal with circuits containing storage elements other than D-latches and D-FFs, e.g., J-K FFs, T-FFs, and S-R FFs.
• Allowing for register/switch controls that are derived within the circuit rather than fed by primary inputs, and also allowing for gated clocks and/or multiple clocks.
• Studying the coverage of non-stuck-at faults and developing techniques for generating tests for these faults using partial scan.
• In this work, I-paths consist of only switches and registers; they could be generalized to contain modules like adders that are capable of transmitting data unchanged or through a bijective transformation.
• Developing a partial scan strategy for the cyclic region of the design space.

9.2 Partitioning

In Chapter 6, different ways of subdividing an acyclic/balanced kernel in a partial scan design have been presented. The partitioning process is driven by two main objectives: reduction in ATPG cost and reduction in test application time. Unfortunately, with the current state of the art, it is not possible to accurately link the partitioning process to these circuit parameters. In the absence of good estimators for these parameters, any elaborate partitioning scheme would be meaningless, since it would be analogous to a blind man taking pains to paint a beautiful picture when he cannot even see what he is doing.

What is clear, however, is that both parameters tend to increase with increasing circuit size.
In fact, the ATPG problem is theoretically intractable, although the run time of good heuristics tends to increase at a polynomial rate (between $O(n^2)$ and $O(n^3)$) in the average case. Test length is also generally higher for large circuits. The size of a circuit could be defined as one or a combination of several circuit features, such as the number of gates, the number of primary inputs, or the number of gates in any input-output path. Thus for very large circuits even a rudimentary form of partitioning could potentially be useful for reducing test costs.

In Chapter 6 a heuristic approach for logically partitioning a kernel so as to bound the maximum size of each resulting subkernel is presented. The size function is assumed to be defined by the user based on the specific ATPG and test application environments. Three different types of partitioning are used to break down the kernel: output-based, switch-based, and size-based. We have demonstrated that partitioning a circuit can not only reduce ATPG cost but also reduce the overall test time (see Chapter 7), provided the registers in the scan path are chained appropriately so as to maximize the efficiency of the partitioned test.

The main contribution of the partitioning study presented here is to demonstrate the usefulness of partitioning, and to help generate partitioning solutions in the design space. However, further work is required in the following areas.

• Studying the relationship of kernel features, such as size, number of inputs/outputs, and test length, to costs like ATPG effort and test time. The results could be used for developing fast estimators of these costs.
• Improving the procedure for merging subkernels (processKernels), based on either theoretical or empirical studies, and possibly using test length as one of the criteria used for merging.
• Developing more effective ways of size-based partitioning, by using additional size criteria, and by making the partition sizes uniform wherever possible.
• Enhancing the global partitioning heuristic, to make it more sensitive to the actual ATPG cost and test time for the resulting design.
• Taking into account the organization of the scan chain(s), if this information is available, to influence the formation of partitions for minimum overall test time.

The work described above would help in better allocating resources such as chip area overhead, test time, and CPU time for ATPG, and in trading them off against each other using various different partitioning solutions.

9.3 Test Scheduling

Given the various subkernels produced by partitioning, we have developed an algorithm in Chapter 7 for scheduling the tests for them in a time-efficient manner. Previous research on the test scheduling problem has focused mostly on circuits using the built-in self-test (BIST) methodology. We have shown that the scheduling problem for scan-testable circuits cannot be modeled using these earlier problems because in our problem, the test time for each kernel is not a constant; rather, it depends on what other kernels are to be tested at the same time. A new model based on a test relationship graph (TRG), which contains the information required for scheduling the tests, is presented in Chapter 7.

The manner in which the various scan registers in the circuit are organized into a single scan chain or into multiple scan chains has a strong influence on the scheduling of tests. The presence of multiple chains with unequal lengths leads to complex interactions among the kernels during test, depending on what chains are involved in testing each kernel. The TRG essentially captures the different types of interactions among the kernels and allows schedule information to be easily computed.
Based on the TRG model, an algorithm for searching the solution space to obtain the best test schedule is presented in Chapter 7. The algorithm assumes that the configuration of the scan chains is known. It allows one of two different scheduling disciplines to be specified. The first, scheduling with interruptions, allows the test for a kernel to be interrupted temporarily while the test control for the circuit is reconfigured to test a different subset of kernels. The second, scheduling without interruptions, requires that once a test for a kernel is started, it must be carried out in a continuous period of time with the same control configuration throughout, although the tests for other kernels compatible with it may be initiated or completed during this period. Although the same basic scheduling algorithm is used in both cases, the algorithm obtains an optimal schedule with the latter discipline but may sometimes produce a suboptimal schedule with the former discipline.

Future work on the test scheduling problem could be carried out in the following directions.

• Allowing for some or all scan registers to have individual HOLD controls, which permits tests to be applied in a pipelined manner with a lower chain cycle as defined in Section 7.2.
• Improving the efficiency of the implicit enumeration algorithm for scheduling, possibly by introducing a bounding criterion to further prune the search space, or alternatively developing fast suboptimal heuristics.
• Expanding the search space for schedules by using something other than a "first fit" strategy, or by removing the constraint that kernels must be considered one at a time in sequence. For example, after a portion of a test is executed, the remainder of the test could be considered later in the sequence. Expanding the search space in this way could lead to better solutions.
9.4 Scan Path Chaining

The scheduling problem described above assumes that the configuration of the single or multiple scan chains is fixed. The chaining problem, however, is to determine a configuration of the chains themselves that leads to the best test schedule. A chain configuration includes both the assignment of scan FFs to chains (in the case of multiple scan chains) and the ordering of scan FFs in each chain (in both the multiple and single scan chain cases). The optimal chaining problem in its most general form is extremely complex because of the many degrees of freedom involved in constructing the scan chains. The main achievements of this work with respect to the chaining problem are to bring out its interaction with the scheduling problem and to solve two constrained chaining problems. Both constrained problems involve fully compatible circuits, i.e., circuits in which all kernels can be tested simultaneously; hence they are applicable to full scan designs.

The first study deals with constructing a single scan chain and ordering the FFs within it such that the test time is minimized. An algorithm for solving this problem is presented, which returns an ordering of the FFs in the chain. It does not always determine an optimal ordering; however, as part of the solution it returns a confidence level that indicates the degree of optimality of the ordering.

The second study deals with constructing a given number of scan chains for a circuit consisting of exactly two compatible kernels. The problem is to assign each scan FF to one of the chains such that the test time is minimized. A special test application scheme that does not depend on the ordering of FFs in each chain is used. This problem is tackled by reducing it to a finite and reasonably small number of integer linear programming problems.
By solving each of these problems, the solution with the lowest test time can be taken as the globally optimal solution.

Although only two constrained chaining problems have been considered in this work, it is hoped that the experience and insight gained from the analysis will help to mount an assault, in future research, on the most general form of the scan path chaining problem. Some of the issues to be tackled are listed below.

• Extending the model used in the multiple chain study to the overlapped test application scheme (rather than the quasi-overlapped scheme).
• Constraining FFs that belong to the same register to be assigned to the same scan chain and to be adjacent to each other within the chain.
• Developing chaining solutions for circuits with incompatibilities among kernels, for both single and multiple chains.
• Designing reconfigurable scan chains through the use of multiplexers, so that the chain configuration can be altered in different sessions.
• Taking into account the routing area overhead for different scan chain configurations, and generating a range of solutions that trade off routing overhead against test time.
• Allowing for different types of clocking schemes, which affect the way in which scan path registers fed by distinct clock signals can be connected to each other.
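For the second constrained chaining problem (two compatible kernels, a given number of chains), the sketch below shows how the assignment of FFs to chains changes test time and why exhaustive search, which visits m^F assignments for F FFs and m chains, quickly becomes impractical. The cost function is a hypothetical toy model, not the quasi-overlapped scheme or the ILP formulation of the study: while both kernels are under test every chain is shifted, each vector costs the longest chain plus one capture cycle, and after the kernel with fewer vectors finishes only chains holding FFs of the other kernel are shifted (assuming per-chain shift control); the final unload is ignored. All names and data are illustrative.

```python
from itertools import product


def toy_test_time(assign, ffs, owner, n_vec, m):
    """Cycle count for exactly two compatible kernels tested concurrently (toy model).

    `assign[i]` is the chain index of `ffs[i]`; `owner[ff]` is the set of
    kernels that FF belongs to; `n_vec[kernel]` is that kernel's vector count.
    """
    lengths = [0] * m
    for chain in assign:
        lengths[chain] += 1
    short, long_ = sorted(n_vec, key=n_vec.get)  # kernel with fewer vectors first
    l_all = max(lengths)
    # Longest chain that still has to be shifted once the short kernel is done.
    l_long = max((lengths[c] for ff, c in zip(ffs, assign) if long_ in owner[ff]),
                 default=0)
    return (n_vec[short] * (l_all + 1)
            + (n_vec[long_] - n_vec[short]) * (l_long + 1))


def best_assignment(ffs, owner, n_vec, m):
    """Exhaustive search over all m ** len(ffs) chain assignments."""
    return min((toy_test_time(a, ffs, owner, n_vec, m), a)
               for a in product(range(m), repeat=len(ffs)))


FFS = ["a1", "a2", "b1", "b2", "b3", "b4"]
OWNER = {ff: {ff[0].upper()} for ff in FFS}  # a* belong to kernel A, b* to B
N_VEC = {"A": 10, "B": 2}

print(best_assignment(FFS, OWNER, N_VEC, 2))                     # unbalanced chains
print(toy_test_time((0, 1, 0, 0, 1, 1), FFS, OWNER, N_VEC, 2))   # balanced chains
```

In this toy instance the best assignment puts A's two FFs alone in one chain and B's four FFs in the other (34 cycles), beating the balanced 3 + 3 split (40 cycles), so simply balancing chain lengths is not necessarily optimal once kernels need different numbers of vectors.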
Permanent link (DOI): https://doi.org/10.25549/usctheses-oUC11257162
Unique identifier: UC11257162
Legacy identifier: DP22816